Why aren't my tools learning?

This is going to start sounding like whining soon, so this will be my last post on this particular topic (at least for a while...)

After forgetting to attach a file to an email message to Tim, I sent him a message to which I attached the version of the file that he had sent to me. Another obvious mistake that the mail client could catch.

This points to a general problem: my tools don't learn. They sit there all day, watching me work, but they never really pay attention to what I'm doing. I probably produce hundreds (thousands?) of events every day that could be fodder for any number of machine learning algorithms.

Shouldn't my applications (all of them!) notice the directories where I open files and start proposing one of those directories when I save a new document for the first time? Shouldn't my email client notice where I file messages and start offering to file them for me? Shouldn't my IDE notice that when I type the name of a variable that's a java.util.Set, I almost always call add, contains, or iterator?
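None of these suggestions needs anything more sophisticated than counting. A minimal sketch of the idea, with a hypothetical UsageLearner class and invented event data for illustration:

```python
from collections import Counter, defaultdict

class UsageLearner:
    """Hypothetical sketch: count observed (context, action) events and
    suggest the most frequent actions for a context."""

    def __init__(self):
        # context -> Counter of actions seen in that context
        self.counts = defaultdict(Counter)

    def observe(self, context, action):
        """Record one user event, e.g. ('save-dir', '/home/sg/papers')."""
        self.counts[context][action] += 1

    def suggest(self, context, n=1):
        """Return the n most frequently observed actions for this context."""
        return [action for action, _ in self.counts[context].most_common(n)]

# An IDE could feed it method-completion events for a variable's type:
learner = UsageLearner()
for method in ["add", "add", "contains", "iterator", "add"]:
    learner.observe("java.util.Set", method)

print(learner.suggest("java.util.Set", 2))  # -> ['add', 'contains']
```

The same three-line pattern covers the save-directory and mail-filing cases too; only the context key changes.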

If you're wondering whether collecting this kind of information and learning from it is useful, simply consider the Google spelling corrector. I'm guessing that they're using a noisy channel model trained on misspellings pulled from their query logs.
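In a noisy channel model, the corrector picks the candidate c maximizing P(c) * P(observed | c): a language-model prior times a channel model estimated from observed misspellings. A toy sketch; all the word counts and error probabilities below are invented for illustration:

```python
# Toy noisy channel spelling corrector: score(c) = P(c) * P(observed | c).
# In a real system both tables would be estimated from query logs.

# Language model: corpus frequency of each candidate word.
word_counts = {"the": 1000, "then": 50, "than": 40, "thesis": 5}
total = sum(word_counts.values())

# Channel model: probability of typing `observed` given intent `intended`.
error_prob = {
    ("teh", "the"): 0.7,
    ("teh", "then"): 0.01,
    ("thn", "then"): 0.4,
    ("thn", "than"): 0.3,
}

def correct(observed):
    """Return the most probable intended word under the noisy channel model."""
    candidates = [c for (o, c) in error_prob if o == observed]
    if not candidates:
        return observed  # no evidence; leave the word alone
    return max(candidates,
               key=lambda c: (word_counts.get(c, 0) / total)
                             * error_prob[(observed, c)])

print(correct("teh"))  # -> the
print(correct("thn"))  # -> then (prior for "then" outweighs "than")
```

Note how the prior settles "thn": the channel model slightly prefers "than", but "then" is frequent enough to win.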

Perhaps we can move past desktop search to desktop learning?


Hear, hear, brother.

I'm not sure what this would do to CPU cycles, but it sounds like a pretty low-impact option to me.

Now the only issue is building this into apps. It sounds more like an OS-level solution that developers can tap into. Having every developer under the sun establish their own algorithms would both tax their resources and hinder cross-pollination among apps.

Honestly, hasn't this whole train of thought been brought up before? I'd be interested in finding some other sources of debate/proposed solutions.

Obviously, desktop search is just now being pushed, so maybe this is the next logical jump from smart folders, etc. Maybe we'll see something in the release of Windows following Longhorn -- 2010? Or the next iteration of OS X in 2006/7.

The concept is certainly worth further discussion beyond programs alerting you to simple user errors.

Posted by Lee Dale on May 09, 2005 at 10:32 AM EDT #

Hi, Stephen. If you are interested in automated spelling correction, you might enjoy this paper by Silviu Cucerzan and Eric Brill at Microsoft Research, "Spelling correction as an iterative process that exploits the collective knowledge of web users". I have worked on related systems in the past and found the paper to be a good read.

Posted by Greg Linden on May 09, 2005 at 11:20 AM EDT #

On the topic of your IDE learning what methods to suggest, check out RASCAL, described in <cite>Knowledge reuse -- Software reuse</cite> by McCarey et al.

Posted by Rory Parle on May 09, 2005 at 12:12 PM EDT #

Check out "Mylar" at http://kerstens.org/mik/. Attention-oriented stuff from a guy who used to be at PARC.

Posted by Adam Rosien on May 09, 2005 at 01:01 PM EDT #

Yep - me too. This stuff would be good, but only if it's implemented properly - so none of this "You appear to be writing a letter..." nonsense!

I think my main problem is that in all the years I've been using a GUI, it hasn't got a lot better, nor has my mailer, nor my CD-playing application: we haven't moved terribly far from the basic functionality (apart from perhaps inventing a gazillion new toolkits and programming frameworks).

What's worse is that a lot of the time, I don't think we even need sophisticated machine learning techniques to make things easier on users. We're already annoying users by throwing out error messages every once in a while about something they're doing that we don't expect. Why not gather the error messages a typical app generates and get a programmer to really think about them for a bit? (cf. my small whinge about GNOME and ejecting volumes) It's the little things like this that could really make a difference.

Posted by Tim Foster on May 09, 2005 at 07:21 PM EDT #

I believe KDE is trying something similar to what you suggest, with the new Tenor project: Tenor, The Context Link Engine.
Quoting from that LinuxPlanet article:

The idea is to not throw away contextual information and not ignore meta data, as we do now. Instead, we should store these things for later reuse and retrieval.

Tenor support must be deeply rooted in the environment it works within.

If I save an image attached to a certain e-mail today, a lot of relevant context is lost forever (unless my brain accidentally remembers it): Who sent me the picture? Is the sender in my address book? Did she send mails before?...
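The mechanics of not losing that context could be as simple as writing a sidecar record at save time that a Tenor-like engine indexes later. A minimal sketch, with hypothetical function and field names:

```python
import json
import os
import tempfile
import time

def save_with_context(path, data, context):
    """Hypothetical sketch: write the file, plus a sidecar JSON record
    capturing where it came from (sender, source message, timestamp),
    so questions like "who sent me this picture?" stay answerable."""
    with open(path, "wb") as f:
        f.write(data)
    with open(path + ".context.json", "w") as f:
        json.dump({"saved_at": time.time(), **context}, f)

# e.g. a mail client saving an image attachment:
path = os.path.join(tempfile.mkdtemp(), "beach.jpg")
save_with_context(path, b"<jpeg bytes>",
                  {"source": "mail",
                   "sender": "alice@example.com",
                   "message_id": "<1234@example.com>"})

with open(path + ".context.json") as f:
    print(json.load(f)["sender"])  # -> alice@example.com
```

The "deeply rooted" part is exactly that every application would have to call something like this at every save, which is why it reads more like an OS or desktop-environment service than an app feature.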

Posted by Pramod Biligiri on May 09, 2005 at 08:52 PM EDT #

Thanks for all of the pointers, guys! I knew that there must be something like this going on out there. The Microsoft Office Clippy stuff has been around for a while, but as far as I can tell it's been a failure, mostly due to the annoyance factor.

I'll have to look into the Tenor stuff in my copious free time :-)

Posted by Stephen Green on May 10, 2005 at 02:46 AM EDT #

And, I just read an article in Information Week discussing Artificial Intelligence and noting goals from IBM, Intel, Microsoft and more. http://www.informationweek.com/showArticle.jhtml?articleID=161501161

Posted by Lee Dale on May 10, 2005 at 04:19 AM EDT #


This is Stephen Green's blog. It's about the theory and practice of text search engines, with occasional forays into Machine Learning and statistical NLP. Steve is the PI of the Information Retrieval and Machine Learning project in Oracle Labs.

