The propositions as types paradigm yields a correspondence between a propositional system and a lambda calculus --
variants of this are also known as the "Curry-Howard
Isomorphism". Lambek calculus corresponds to a typed lambda calculus
with subtypes. The more specific the type, the better the meaning of the
sentence is preserved. For example, consider the sentence "Socrates is a
man.". In the empty context, i.e., without any assumptions, the only
meaningful type that can be assigned to Socrates is subject.
If more information is available, e.g., if we include a story about Socrates's
life in the context, then it is possible to derive a much more specific type,
such as greek*human*male, where * stands for the
intersection type. Thus, by including a large corpus of text in the context,
the meaning of a sentence can be well parsed, assuming the text reveals any
information about the sentence.
Having a large context can clog up the inference machine. In the Forum 2010 project, the context are primarily based on IRC logs and Usenet newsgroups. Conventional theorem proving techniques cannot be used in this case. At the expense of not deriving the most specific types, performance can be significantly boosted if an artificial neural network (ANN) is incorporated into the system. Connectionist systems are a well established Artificial Intelligence technique for iteratively converging on an optimal solution where closed form methods are either impossible or prohibitively expensive. Recent work has looked at how to implement symbolic reasoning in an ANN framework [3].
A sentence can be analyzed in a parallel bottom-up fashion. In the first pass single words are analyzed, then the types of phrases are derived, then clauses, and in the end the whole sentence. This gives an average time complexity O(N log N), where N is the number of words in the sentence, assuming N processors are available.
The collection of all possible meanings forms a category Omega, with types as objects and meanings of sentences as morphisms. A given context C, such as the IRC logs, forms a reflective subcategory Sigma. Consider the monad T induced by it. Intuitively, this monad represents all possible responses to all possible sentences in the given context C. Consider an adjointness pair (A,X) that corresponds to the monad T. There is a whole spectrum of them, and each represents a state of mind. By choosing one of these pairs (A,X), we can simply compute a reply to t as A(X(t)).
We call these State Of Mind ADjointness pairs SOMAD's, also popularly known as "Forum 2010 personae". Each persona is just a SOMAD. The computation of a SOMAD is expensive, but it only has to be done once. In fact, a SOMAD can be incrementally updated. In Forum 2010, the neural nets that simulate the personae are incrementally trained on each day's Usenet feed, along with a basic regiment of recent IRC logs.
The simplest SOMAD is obtained by the Kleisli construction of adjointness (A,X) for monad T. This is known as the "Kleisli SOMAD", an unfortunate terminology since people tend to misunderstand it as "Kleisli persona" and then wonder why the Kleisli SOMAD does not behave like Kleisli. (Of course, a mathematical construction has nothing to do with the personality of its inventor!) The Kleisli SOMAD yields the statistically uncorrelated responses. It is in fact simulated by a randomly trained neural net. It is not computationally expensive, and thus usually gives very fast, and senseless, responses. On Forum 2010 the Kleisli SOMAD is known as the "System Monitor" persona, which was trained on random CMU Zephyr traffic.
On the other side of the spectrum is the Eilenberg-Moore construction, which represents a much different state of mind. It is the state of mind of an oracle that is allowed to draw conclusions from the knowledge represented by the whole monad T. In other words, in the Eilenberg-Moore adjointness, all facts represented by T are considered. Alas, this "omniscient" adjointness is non-constructible and undecidable, and consequently only of a theoretical value. One could say that it represents the thoughts of God, never to be known by man.
Other SOMAD's can be generated by various methods. See the implementation section 2.2 for discussion of Breen & Knight's work on this topic.
SIM uses techniques for rapid indexing and relevant information look-up similar to those in the Google WWW database. This significantly decreases search times and communication overload.
The first kind of SOMADs is computed from a large corpus of text of a known writer or philosopher. On Forum 2010 the following were used:
The second kind of SOMADs is the so-called multi-parametric persona SOMAD. In this case a SOMAD is determined by a number of parameters. Different settings of parameters give different personae. By using various optimization and best-fit techniques, the parameters are tuned to fit a given persona. This technique works well for personae for whom no large corpus of text is available, such as the "Bug-Eyed Earl" persona, based on the enigmatic Bug-Eyed Earl from the comic strip Red Meat.
The multi-parametric SOMADs are much more flexible and easier to generate. However, the criteria for setting the parameters are somewhat subjective, and do not always give best results. I am continuously experimenting with new techniques, and generate approximately one new persona every week, using Forum 2010 as a testing ground.
For results, it is best if you check it out yourself. Forum 2010 is located at
http://www.forum2010.org:2010/
Michael Harkavy, with his experience in complexity theory, pointed out the correct interpretation of the Eilenberg-Moore SOMAD.
[2] S. Mac Lane: "Categories for the Working Matehmatician", Springer-Verlag, 1971