Tuesday, February 08, 2005

Google is your friend

After a long hiatus, I'm going to start blogging again with this paper on "Automatic Meaning Discovery using Google" by Paul Vitanyi (the Kolmogorov complexity guy) and Rudi Cilibrasi, which has generated a flurry of interest, even sparking a slashdot discussion. The essence of the paper is this: use the page counts returned by the Google search engine to define a distribution over words and word pairs, and use that distribution to automatically extract meaning from the world wide web. For example, the page count returned by the query "horse"+"rider" (about 2,710,000) versus "hoarse"+"rider" (31,400) gives some information about the semantic association between these words. In this way they hope to exploit the huge but low-quality information source that is the world wide web to generate a lexicon and an ontology, in contrast to Cyc, for example, which builds a hand-crafted knowledge base.

The idea has been suggested before, but this is the first realistic attempt that I know of. Their approach is interesting for two reasons. First, it has strong theoretical justification: an argument based on Kolmogorov complexity and optimal string encodings. Basically, the metric they use, called the Normalized Google Distance (NGD), is universal w.r.t. the Google Distance of individual authors, i.e. the NGD of any two words is within a linear factor of the GD of those words in the web documents originating from any one source.
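As a rough sketch, here is the NGD formula as I understand it from the paper, computed from raw page counts. The individual-term counts below are invented for illustration; only the pair count comes from the numbers quoted above, and N is just a ballpark figure for the size of Google's index, not a number from the paper.

from math import log

def ngd(fx, fy, fxy, N):
    """Normalized Google Distance from raw page counts.

    fx, fy -- page counts for the individual terms
    fxy    -- page count for the conjunctive query "x"+"y"
    N      -- (roughly) the total number of pages indexed
    """
    lx, ly, lxy = log(fx), log(fy), log(fxy)
    return (max(lx, ly) - lxy) / (log(N) - min(lx, ly))

# Hypothetical individual counts, plus the "horse"+"rider" pair count
# quoted above; a smaller NGD means more closely associated terms.
print(ngd(50000000, 20000000, 2710000, N=8e9))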

Second, they have impressive experimental results, especially one involving the hierarchical classification of a set of numbers and colours. Another set of experiments uses the NGD between an instance word and a set of "anchor" words to define a feature vector that is used as input to an SVM. With the right set of anchor words, they were able to classify all words that are "electrical" terms with 100% accuracy.
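To make that concrete, here's a minimal sketch of how such anchor-word features could be assembled, reusing the ngd function sketched above. The anchor words and the counts dictionary are hypothetical, not the ones used in the paper; the resulting vectors would then be fed to any off-the-shelf SVM.

# Hypothetical anchor words -- the paper chooses its own anchors.
ANCHORS = ["electricity", "water", "colour", "religion"]

def anchor_features(word, counts, N):
    """Map a word to a vector of NGDs against the anchor words.

    counts[x] holds the page count for term x, and counts[(x, a)]
    the count for the pair -- all assumed to be fetched already.
    """
    return [ngd(counts[word], counts[a], counts[(word, a)], N)
            for a in ANCHORS]

# Each training word becomes one such vector, labelled e.g. as an
# "electrical" term or not, and an SVM is trained on the vectors.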

Tuesday, December 07, 2004

Godel Demystified

Apu Kapadia asked me some questions today about "Logic and Godel and stuff" and I sent him a long mail putting it all in a nutshell. I thought it might be useful to reproduce it here. The following is a massively condensed description of First Order Logic and its (in-)completeness theorems. I have not ventured into any philosophical speculations. A good book for that would be Douglas Hofstadter's Godel, Escher, Bach.

Every First Order Logic theory is in a language (say L) with 3 kinds of symbols - predicates, constants and functions. So you can make a statement like P(a,b) ^ Q(f(b),c) which means "P holds of a and b, and Q holds of f(b) and c".

To give semantics to a first order formula, each of these symbols must have an interpretation. This is done using what is called an L-structure. An L-structure consists of two things: a set of objects called the universe, and an interpretation for each symbol of L w.r.t. this universe. Each constant refers to some element of the universe, and each function symbol (predicate symbol) refers to a function (relation) over the universe. Depending on how you define your L-structure M, it either satisfies a first order formula phi written in the language L, or it doesn't. If it does, M is called a model of phi and you write M |= phi. For example, suppose G = (U, @) is an L-structure for a language L with just the binary relation symbol P, and @ (the interpretation of P in G) is a symmetric relation on U. Then G |= forall x forall y (P(x,y) -> P(y,x)).

If you have two formulas phi and psi s.t. every model of phi is also a model of psi, then you write phi |= psi.

Now we need to define provability. First we define a set of syntactic manipulations you can do to formulas to get other formulas. If by taking some formulas (axioms) Sigma and doing those manipulations you can get phi, then you write Sigma |- phi. For example, Modus Ponens: if you have Sigma |- phi => psi and Sigma |- phi, then you can conclude Sigma |- psi.
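Written as an inference rule, that's just:

\[
\frac{\Sigma \vdash \varphi \rightarrow \psi \qquad \Sigma \vdash \varphi}{\Sigma \vdash \psi}
\]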

Godel's Completeness theorem states that Sigma |- psi iff Sigma |= psi. So in one sense the notion of provability captures the notion of truth, but only if by "truth" you mean the statements your axioms entail.
Godel's Incompleteness theorem, however, says this: Let L contain the symbols {0, S, +, x, <}. Consider an L-structure NN defined in this way: take N, the set of natural numbers, as the universe, and define the symbols in the obvious way (S is the successor function, so S(0) is 1, S(S(0)) is 2, etc.). Let Th(NN) be the set of all formulas written in L that are true in the model NN. Can we find an axiomatization of Th(NN), i.e. a set of formulas Sigma s.t. Sigma |- phi iff NN |= phi? Can we do it so that Sigma is computable, i.e. there exists an algorithm that recognizes whether a given formula belongs to the set of axioms Sigma? Godel showed that this is impossible. Church showed in addition that Th(NN) is algorithmically undecidable.
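Stated loosely, side by side:

\[
\textbf{Completeness:}\quad \Sigma \vdash \varphi \;\iff\; \Sigma \models \varphi .
\]
\[
\textbf{Incompleteness:}\quad \text{there is no computable set of axioms } \Sigma \text{ such that } \Sigma \vdash \varphi \iff NN \models \varphi \text{ for every sentence } \varphi .
\]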

So when you're interested only in "truth" w.r.t. a particular model like NN, your proof system (with any computable set of axioms) will be incomplete.

Finally, to clear up a common misconception, the Continuum Hypothesis is not an example of such a Godel statement (true but not provable). The truth of CH depends on which model you choose from among the models that satisfy the axioms of set theory. In some models it is true, and in others it isn't.

In another post perhaps, I'll mention some lesser known but almost-as-wonderful results in logic such as the Compactness theorem, the existence of non-standard models of arithmetic and Tarski's "Undefinability of truth".

Saturday, December 04, 2004

Sequential Minimal Optimization

A Support Vector Machine is a linear classifier that attempts to maximise the margin (i.e. the distance between the separating hyperplane and the nearest training point). Although SVMs are not Bayes-efficient, in practice they often generalize well and are particularly useful in conjunction with the kernel trick, which lets the classifier work in a high-dimensional (even infinite-dimensional) feature space. The standard tutorial on SVMs is Burges' A Tutorial on Support Vector Machines for Pattern Recognition. Shivani Agarwal has a very good introductory presentation on SVMs.
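For reference, the standard hard-margin formulation for separable data (x_i, y_i) with labels y_i in {-1, +1} is:

\[
\min_{w,\,b}\ \tfrac{1}{2}\|w\|^2 \quad \text{subject to} \quad y_i\,(w \cdot x_i + b) \ge 1 \ \text{ for all } i ,
\]

where the distance from the hyperplane to the nearest point works out to 1/||w||, so minimizing ||w|| maximizes the margin.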

Training an SVM requires the solution of a very large quadratic programming problem (finding a Lagrange multiplier for each data point), which quickly becomes impractical for big training sets. Sequential Minimal Optimization approaches the solution by solving for two multipliers at each step (keeping the others fixed), i.e. coordinate-wise hill-climbing on the dual. Two at a time is the smallest workable subproblem, because the linear equality constraint ties the multipliers together, so you can't change just one in isolation. And because there are only two variables, the QP subproblem can be solved analytically, making the inner loop of the training program very fast.
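Here's a rough sketch of that analytic two-multiplier update, following Platt's description as I remember it; the pair-selection heuristics, the bias update and the error cache are all omitted, so treat it as an illustration rather than a working trainer.

def smo_pair_update(i, j, alpha, X, y, C, kernel, b):
    """One SMO step: optimize alpha[i] and alpha[j] analytically,
    holding every other multiplier fixed."""
    if i == j:
        return False

    # Current decision function and prediction errors for the pair.
    def f(x):
        return sum(alpha[k] * y[k] * kernel(X[k], x)
                   for k in range(len(y))) + b
    E_i, E_j = f(X[i]) - y[i], f(X[j]) - y[j]

    # Bounds L, H keep the pair on the constraint line
    # sum_k alpha_k * y_k = 0 and inside the box [0, C].
    if y[i] == y[j]:
        L, H = max(0.0, alpha[i] + alpha[j] - C), min(C, alpha[i] + alpha[j])
    else:
        L, H = max(0.0, alpha[j] - alpha[i]), min(C, C + alpha[j] - alpha[i])
    if L >= H:
        return False

    # Curvature of the objective along the constraint line.
    eta = kernel(X[i], X[i]) + kernel(X[j], X[j]) - 2.0 * kernel(X[i], X[j])
    if eta <= 0:
        return False  # Platt handles this degenerate case specially.

    # Unconstrained optimum for alpha[j], clipped back into [L, H];
    # alpha[i] then moves to keep the equality constraint satisfied.
    a_j = min(H, max(L, alpha[j] + y[j] * (E_i - E_j) / eta))
    a_i = alpha[i] + y[i] * y[j] * (alpha[j] - a_j)
    alpha[i], alpha[j] = a_i, a_j
    return True

# e.g. a linear kernel: kernel = lambda u, v: sum(a * b for a, b in zip(u, v))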

SMO is competitive with other SVM training methods such as Projected Conjugate Gradient "chunking" and in addition is easier to implement. SMO is the work of John Platt at Microsoft Research, and he maintains a reference page on the topic here.

Finally, there is also an SVM mailing list.

Saturday, November 27, 2004

Next Year's Conferences

I've compiled a list of AI and AI-related conferences that are happening next year. These are just some of the most important ones. If you can add to the list please do so. I've listed them in the format Name; Venue; Conference dates; Submission deadline.

International Joint Conference on Artificial Intelligence (IJCAI); Edinburgh, Scotland; Aug 1-5; Jan 21

American Association for Artificial Intelligence (AAAI); Pittsburgh, Pennsylvania; July 9-13; March 18 (right after the IJCAI notification date)

The European Conference on AI (ECAI) is not taking place next year because of IJCAI.

Uncertainty in Artificial Intelligence (UAI); Edinburgh, Scotland; July 26-29; March 16

International Conference on Machine Learning (ICML); Bonn, Germany; August 7-11; March 1

Conference on Learning Theory (COLT); Bertinoro, Italy; June 27-30; Feb 2

International Conference on Automated Planning & Scheduling (ICAPS); Monterey, California, USA; June 5-10; Nov 17, 2004

International Conference on Theory and Applications of Satisfiability Testing (SAT); St. Andrews, Scotland; June 19-23; Feb 20

UPDATE:

International Conference on Automated Deduction (CADE); Tallinn, Estonia; July 22-27; February 25

International Joint Conference on Neural Networks (IJCNN); Montreal, Canada; July 31 - August 4, 2005; Jan 31

Wednesday, November 17, 2004

Steve's Grand Vision

Steve Grand, the creator of Creatures, and an independent AI researcher in his own right, has written an article in the IEEE Intelligent Systems magazine called Moving AI out of its Infancy: Changing our preconceptions. I find his ideas interesting, but not as groundbreaking or revolutionary as he imagines.

He dismisses traditional AI and connectionist methods as unsuccessful and based on false assumptions. A more or less fair assessment perhaps, but I think he has misunderstood the nature of what he calls "New AI", specifically in claiming that it is influenced by the nervous systems of the simplest invertebrates. He is in search of what he calls the "periodic table" of AI, some radical new paradigm of intelligence that will transform AI from alchemy to chemistry, so to speak. I have serious issues with this. First, he bases his methods on being able to simulate the human brain, a very anthropocentric approach that may not be the best idea, given the tools of computation we have now and how little we understand (or even think we understand) the human brain.

Second, I think the view that intelligence must be based on one simple and easily described paradigm, which we just need to find and apply, is wrong. Researchers in computational brain theory seem to be coming round to the view that the intelligence of the brain doesn't follow from one particular model of computation/cognition (say, a Hopfield network) but is an emergent behaviour of some pretty complex sub-systems which are functionally and operationally quite different. (More on this in a later post.) Intelligence must be engineered, using many different ideas, and we will most likely approach it asymptotically, like evolution did.

The rest of the article mentions some of the insights he has, and I think a couple are worth mentioning: "Brains don't make decisions" and "Brains perform coordinate transforms". But I think these are just interesting alternative viewpoints on the problem itself and not necessarily constructive steps towards a solution. I was disappointed that he seems to be focusing on questions that are exclusively in the realm of robotics. What about common sense, learning, etc.? Oh well.

Tuesday, November 09, 2004

Conditional Preference Nets

Conditional Preference nets, not to be confused with coloured Petri nets, are a relatively new and hot area in decision theory today. At AAAI'04, there was a whole session devoted to them. CP-nets are an intuitive way of representing qualitative preferences over a set of possible outcomes, in a form analogous to a directed graphical model of a probability distribution, i.e. a Bayes Net. This obviously borrows ideas from utility theory and from reasoning in graphical models. The main idea that CP-nets borrow from utility theory is the attempt to elicit preferences over individual features of an outcome ceteris paribus (all else being equal). The classic example is "Given that fish is being served, I prefer white wine to red." The major difference is that whereas utility theory attempts to quantify preferences by comparing one outcome to a lottery between two others (e.g. if our preferences are A < B < C, then the utility of B can be determined by finding the value of p at which the user is indifferent between B and a lottery over A and C with distribution (p, 1-p)), a CP-net merely represents the qualitative preferences, but in a compact form.
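Concretely, with the usual normalization u(A) = 0 and u(C) = 1, that indifference point gives:

\[
u(B) \;=\; p\,u(A) + (1-p)\,u(C) \;=\; 1 - p .
\]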

It does this with a representation similar to the conditional probability tables used in a Bayes Net. Consider the following diagram and imagine the arrows on the edges all point downwards.

A   B
 \ /
  C
  |
  D

Then the preference for the attribute A could be A < A', and for B, B < B'. Since C has A and B as parents, its conditional preference has to be a function of their values. For example:
(A^B)V(A'^B') => C < C'
(A^B')V(A'^B) => C' < C
Finally, the preference for D is a function of C :
C => D < D'
C' => D' < D


The CP-net represents a complex 'joint preference distribution' in a compact form. The standard inference task in a CP-net is to compute dominating outcomes - those which are either preferred over or indifferent to all other outcomes. Reasoning with feasibility constraints over the outcomes can also be done; this is like reasoning in a Bayes Net with evidence nodes. Some recent extensions to the CP-nets paradigm include TCP-nets, which add statements about the relative importance of attributes; UCP-nets, which incorporate utilities into the network; and mCP-nets, which maintain separate CP-nets for multiple agents that can share or make visible certain attributes.
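As a toy illustration (my own sketch, not any published implementation), the example network above can be encoded as one table per attribute, keyed by the parents' values, with a lookup that returns the preferred value in a given context. Here X' stands for "not X", and in each ordering the most preferred value comes last, matching the X < X' notation above.

# One conditional preference table per attribute, mirroring the
# tables above; orderings list the most preferred value last.
CP_TABLES = {
    "A": {(): ["A", "A'"]},                 # A < A'
    "B": {(): ["B", "B'"]},                 # B < B'
    "C": {("A", "B"): ["C", "C'"],          # (A ^ B)   => C  < C'
          ("A'", "B'"): ["C", "C'"],        # (A' ^ B') => C  < C'
          ("A", "B'"): ["C'", "C"],         # (A ^ B')  => C' < C
          ("A'", "B"): ["C'", "C"]},        # (A' ^ B)  => C' < C
    "D": {("C",): ["D", "D'"],              # C  => D  < D'
          ("C'",): ["D'", "D"]},            # C' => D' < D
}
PARENTS = {"A": [], "B": [], "C": ["A", "B"], "D": ["C"]}

def preferred(attr, outcome):
    """Most preferred value of attr given the rest of the outcome
    (ceteris paribus); outcome maps each attribute to its value."""
    key = tuple(outcome[p] for p in PARENTS[attr])
    return CP_TABLES[attr][key][-1]

# e.g. when both A and B hold, the preferred value of C is C':
print(preferred("C", {"A": "A", "B": "B", "C": "C", "D": "D"}))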

A tutorial on CP-nets by Boutilier, Brafman, Domshlak, Hoos and Poole appeared in JAIR'04.

Sunday, October 24, 2004

Rats flying F-22's

Scientists at the University of Florida have successfully conducted a very intriguing experiment.

A collection of 25,000 "living" neurons was taken from the brain of a rat and cultured in a petri dish. Soon, they claimed, they had a living computation device with enough power to control an F-22 fighter jet. Using a multielectrode array to interface between the neurons and a desktop computer, they set up a bidirectional communication channel and "taught" the neurons to operate a flight simulator. Pretty soon, they had a network that could maintain relatively stable flight by controlling the pitch and roll of the aircraft in various weather conditions.

Pretty impressive, I'll admit. My roommate, a microbiologist, is all over me - "Ha ha, we'll have AI before you will." But it's one thing to learn to control an aircraft, another to do, say, common sense reasoning, as AI researchers have found. Pretty soon they're going to run into the same problems we did, when they try to "scale things up." They seem pretty optimistic:

"We're just starting out. But using this model will help us understand the crucial bit of information between inputs and the stuff that comes out ... And you can imagine the more you learn about that, the more you can harness the computation of these neurons into a wide range of applications."

Best of luck guys, you'll need it.