Google is your friend
The idea has been suggested before, but this is the first realistic attempt that I know of. Their approach is interesting for two reasons. First, it has strong theoretical justification: an argument based on Kolmogorov complexity and optimal string encodings. Basically, the metric they use, called the Normalized Google Distance (NGD), is universal with respect to the Google Distance (GD) of individual authors, i.e. the NGD of any two words is within a linear factor of the GD of those words in the web documents originating from any one source.
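To make the metric concrete, here is a minimal sketch of the NGD computed from raw page counts, using the formula as I understand it from the paper: NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))). The function name, the argument names and the example counts are my own placeholders, not anything from the authors' code, and the sketch assumes the index size N is larger than either individual count.

```python
import math

def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from page counts (illustrative sketch).

    fx, fy -- number of pages containing each term on its own
    fxy    -- number of pages containing both terms
    n      -- total number of indexed pages (assumed larger than fx and fy)
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        # Terms that never occur, or never occur together, are treated
        # as infinitely far apart.
        return math.inf
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))

# Made-up counts, purely for illustration.
print(ngd(100, 100, 10, 10_000))
```

Two terms that always appear together get a distance of 0, and the distance grows as the co-occurrence count shrinks relative to the individual counts.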
Second, they have impressive experimental results, especially one involving the hierarchical classification of a set of numbers and colours. Another set of experiments uses the NGD between an instance word and a set of "anchor" words to define a set of features that is used as input to an SVM. By choosing the right set of anchor words, they were able to classify all words that are "electrical" terms with 100% accuracy.
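The anchor-word idea can be sketched in a few lines: each word becomes a vector of its distances to the anchors, and those vectors are what a classifier would be trained on. This is only my reading of the setup, not the authors' code. The helper name `anchor_features`, the anchor words and the distance table below are all hypothetical, and the numbers are made up rather than real NGD values.

```python
from typing import Callable, List, Sequence

def anchor_features(word: str,
                    anchors: Sequence[str],
                    distance: Callable[[str, str], float]) -> List[float]:
    """Return one feature per anchor: the distance from `word` to that anchor."""
    return [distance(word, anchor) for anchor in anchors]

# Hypothetical distances, invented for illustration only.
TOY_DISTANCES = {
    ("voltage", "electricity"): 0.2,
    ("voltage", "religion"): 0.9,
    ("bishop", "electricity"): 0.8,
    ("bishop", "religion"): 0.3,
}

def toy_distance(a: str, b: str) -> float:
    # Symmetric lookup; unknown pairs default to a large distance of 1.0.
    return TOY_DISTANCES.get((a, b), TOY_DISTANCES.get((b, a), 1.0))

anchors = ["electricity", "religion"]
for word in ["voltage", "bishop"]:
    print(word, anchor_features(word, anchors, toy_distance))
```

In a real experiment the `distance` argument would be an NGD computed from search-engine counts, and the resulting vectors would be handed to an SVM implementation such as scikit-learn's `SVC` along with the class labels.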