Monday, January 28, 2008

Google Sets, Text Mining, and Enterprise 2.0

I was browsing Google Labs, as I do from time to time, to see what the alpha-geeks are up to.

I had never looked at Google Sets before, and when I looked at it this time I almost immediately dismissed it as useless. I mean, who cares if I can create sets of things? But then I had the idea to type in some bands that I like to see what would happen (the starting set was peaches, shiny toy guns, dresden dolls, goldfrapp, and the knife).

It instantly popped up with a long list of bands, many of which I know and like and some of which I had never heard of. It's the ones I had never heard of that got me thinking.

This is exactly the kind of problem that pharmaceutical scientists are trying to solve every day. They have a bunch of things that they know are related, and they want to find the other related things that they don't know about yet. But the text mining tools that they use to do it are very expensive and painful to use.

This set interface is so simple. So intuitive.

I imagine that the algorithm Google Sets is using is some kind of basic co-occurrence test, so there are lots of tools out there that are more sophisticated. On the other hand, I didn't get any hits for sharpening stones (which a naive match on "the knife" might have pulled in), so it has to be at least a little more than that.
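To make the co-occurrence guess concrete, here is a minimal Python sketch. It is not Google's algorithm, which I can only guess at. The function name expand_set, the min_score parameter, and the toy lists are all made up for illustration.

```python
from collections import Counter


def expand_set(seeds, documents, top_n=5, min_score=1):
    """Rank items that appear in the same lists as the seed items.

    Hypothetical helper: each "document" is just a list of item names,
    standing in for lists found on web pages or in indexed reports.
    """
    seed_set = {s.lower() for s in seeds}
    scores = Counter()
    for doc in documents:
        items = {item.lower() for item in doc}
        overlap = len(items & seed_set)  # how many seeds this list contains
        if overlap == 0:
            continue
        # Every non-seed item in the list gets credit for each seed it sits next to.
        for item in items - seed_set:
            scores[item] += overlap
    ranked = [item for item, score in scores.most_common() if score >= min_score]
    return ranked[:top_n]


# Toy data, invented purely for illustration.
seeds = ["peaches", "shiny toy guns", "dresden dolls", "goldfrapp", "the knife"]
documents = [
    ["peaches", "goldfrapp", "ladytron"],
    ["the knife", "goldfrapp", "ladytron", "fever ray"],
    ["dresden dolls", "amanda palmer"],
    ["the knife", "sharpening stones", "whetstone"],
    ["shiny toy guns", "ladytron"],
]

print(expand_set(seeds, documents, top_n=10, min_score=1))
print(expand_set(seeds, documents, top_n=10, min_score=2))
```

With min_score=1, anything that ever shares a list with a seed shows up, including "sharpening stones". Raising the threshold to 2 keeps only items that co-occur with the seeds more than once, and the knife-sharpening noise drops out. That is one cheap way a tool could be "a little more" than a bare co-occurrence test.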

If everything inside the pharmaceutical firewall (or better, inside and outside) could be indexed into a tool like this, would it be useful? Yes, I think so.

It seems like a big problem, but it is a tiny problem compared with the one that Google has apparently already solved.

David
