Tuesday Aug 11, 2009

Why do you want a free ISMIR registration?

Sun is sponsoring ISMIR again this year. As part of our sponsorship, we've been given two registrations. One of the registrations is going to our intern François as our way of saying thanks for all his hard work, but we still have one to give away.

We'd really like for it to go to someone who really needs it, so if you would like to get our second registration, please send me a paragraph (just one paragraph!) explaining why you need our free registration. Send it to me by this Friday, August 14th, at noon EST, and a panel of experts (well, me and Paul) will look at them and decide who gets the registration.

And don't worry if you've already registered: if we select you, you'll get a refund.

Tuesday Mar 10, 2009

New look

A new look for the blog. This is the new "Cloud" theme for blogs.sun.com. Restful, isn't it?

I realize it may seem a bit trendy (and bandwagon-jump-ony), but on the AURA project, we've been building a system that's meant to run in the cloud. I'll probably be blogging about this more in the future.

Plus, I must really mean it if I switched my theme from "Green", right?

Wednesday Mar 04, 2009

Look what's back in NetBeans!

I was having a quick look at the new NetBeans 6.7M2 this morning, and look what I found in the plugins list:

W00t! Jackpot is back!

Man, I could have totally used this a couple of weeks ago. I was switching from using custom logging in Minion to using java.util.logging. Jackpot would have relieved me of the need for a very evil Perl script (is there another kind? (I kid, I kid!)) and a lot of hand editing.
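For anyone who hasn't made that kind of switch, here's a rough sketch of what a converted call site might look like. The class, the method, and the "before" call in the comment are all made up for illustration (Minion's old logging API isn't shown here); only the java.util.logging calls are the real thing.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class LoggingMigration {
    // One logger per class, named after the class, is the usual convention.
    static final Logger LOG = Logger.getLogger(LoggingMigration.class.getName());

    // Hypothetical method standing in for code that used to call a custom logger.
    static void indexDocument(String key) {
        // Before (hypothetical custom API): log.debug(3, "Indexing " + key);
        // After: guard the string building so it costs nothing when FINE is off.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("Indexing " + key);
        }

        // ... the actual indexing work would go here ...

        // Before (hypothetical custom API): log.log(1, "Indexed " + key);
        LOG.info("Indexed " + key);
    }

    public static void main(String[] args) {
        // With the default configuration only the INFO message reaches the console.
        indexDocument("doc-1");
    }
}
```

Multiply that by every logging call in a search engine and you can see why a mechanical transform beats hand editing.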

Extra awesomeness: Prolog editing mode! Now I can dust off the family tree NLP system that I wrote for CS240 (using Waterloo Prolog on an IBM mainframe, no less) in 1987!

Wednesday Feb 25, 2009

The upside of blogging

Paul's been encouraging me to blog more, and I'm trying to do that. I think my blogging habits have tended more towards the long form than the short. Still, I've been writing my series of entries about the dictionaries in Minion, and now I have a list of things to do in Minion that will ensure that:

  • I will need to completely rewrite these posts in the next couple of months.
  • I will need to figure out how to tell my wife that I want to spend all weekend for the next four weeks hacking on my search engine.

I guess there are worse problems to have!

Thursday May 08, 2008

As close as I'll get to going to Harvard

Although I did attend the Harvard of the North!

I gave a talk at Harvard early in April on the history of search at Sun. If you're interested you can watch the video. I'm the big dude up at the front there. Also, you get a few thrilling seconds of Jim Waldo close to the start.

I go into a bit of background on the early work that formed the foundation for our engine, and some of the stuff that's going into Minion and Aura.

Monday Mar 17, 2008

My Dad can build a better indexer than your Dad!

An interesting (I think) thing about me is that I'm a second generation search guy. My father (searchdad?) worked at the Computation Center of the National Research Council of Canada, where he designed and helped build CAN/OLE (Canadian On-Line Enquiry) for the Canadian Institute of Scientific and Technical Information (CISTI). CAN/OLE offered access to a number of databases like Inspec.

I'm giving a talk in a couple of weeks that includes a couple of slides on the history of IR, and I sent Dad email asking what the CAN/OLE collections were like. They were (I'd say) big for the time: between 5 and 15 million records (although the records were only a few hundred bytes --- this wasn't full text!) and somewhere between 3 and 6 GB of storage (on IBM 3330 disks, mind you, so we're talking a lot of floor space here!).

Dad mentioned that a Google search for cisti +ole would yield some hits, and look what I found: Dad's even mentioned by name as the designer and one of the implementers. Cool!

The picture is my father and me circa 1976, about two years after CAN/OLE launched. I picked the picture out of a collection of family pictures that he sent me last year. As I was clipping us out, I realized that he's about 3 years younger in that picture than I am now. Yikes. Up until my father, the family profession was carpenter. I guess that now it's Information Retrieval. I'd better get cracking with my son!

Friday Sep 21, 2007

The Kids in the Hall episode that features my legs (and possibly my face)

While looking for a video that I wanted to show my manager I came across:

I was at the taping for this KITH sketch (they asked us to dress up so that the audience shots would look "Scandinavian"). I've seen this a few times (usually in a better resolution) and I'm pretty sure that at one point you can see my legs and a flash of my face.

Dang, I miss the Kids in the Hall.

The New Solaris Installer

Dave Miner blogged about it a while ago, but I got my first chance to try out the new Solaris Installer this week on my laptop.

I'm a latest-build sort of guy (ask anyone in Sun Labs East: if you want to install Solaris, talk to Steve. Just don't ask me to flash the BIOS on your motherboard!), so I've been using Live Upgrade which lets me keep using my computer while it's being upgraded.

Still, I wanted to see what the new installer was like, and I have to say, without a doubt, that it was the easiest install experience that I've ever had with Solaris. I'm not a really old hand at this: I've only been installing it myself since around build 65 of Solaris 10, but the difference is dramatic.

The user experience folks did a great job on the new installer. The interface is straightforward and the choices were easily understandable (of course, I may be a bit of a bad subject, since I could handle all of the complicated choices in the old installer!) It felt like there were only about three clicks and then the installer ran to completion (which still took a while, but, hey, you can do something else while it's running, eh?)

Aside from the interface, my most favorite thing about the new installer is that it noticed that I had a fair bit of room and set the disk up for Live Upgrade for me. This was always the most annoying part of doing an install: Fiddling around with the partition sizes to make the two root partitions, the swap partition, and an everything-else partition (Seriously, I'm doing this on a computer several orders of magnitude faster than the ones that we used to send men to the moon, do I really need to do the arithmetic on the partition sizes for it?)

Great job installer folk!

Friday Aug 31, 2007

But what I really want in a refactorer...

I'm back and refactoring. When we started developing our search engine, we were developing for deployment with the Portal Server. Java 1.3 had just come out, but the Portal Server ran in a 1.2.2 VM, so we developed using only the features available in 1.2.2.

Of course the world's changed since then, and part of what I've been doing is going through the code and modernizing it. I mentioned Jackpot and I've been using the "basic" refactoring for a while, but what I want is a set of "modernizing" transforms.

For example, the attributes that a field can have in the index were represented in the traditional ored-together-powers-of-two style. There are a number of well known problems with this style, not the least of which is that if you want to add attributes sanely in such a situation, you need to do all the checking yourself.

Converting the powers-of-two to an Enum was easy, but changing all the places where those powers of two were used was a total pain. The set of transforms was fairly straightforward (replace the name of the constant with the name of the enum, replace bitwise ors with equivalent EnumSet.of, etc.), and I found myself thinking that a reasonably smart refactorer would have gotten me about 90% of the way.
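To make the transform concrete, here's a minimal before-and-after sketch. The attribute names and the class are invented for the example (they're not the engine's real field attributes); the point is just the shape of the change from int flags to an EnumSet.

```java
import java.util.EnumSet;
import java.util.Set;

class FieldAttributesDemo {
    // Old style: ored-together powers of two (hypothetical names).
    static final int INDEXED = 1;
    static final int TOKENIZED = 2;
    static final int SAVED = 4;

    // New style: the same attributes as an enum.
    enum FieldAttribute { INDEXED, TOKENIZED, SAVED }

    // Old check: any int is accepted, so validating the flags is up to you.
    static boolean isSavedOld(int attrs) {
        return (attrs & SAVED) != 0;
    }

    // New check: the compiler only lets real attributes into the set.
    static boolean isSaved(Set<FieldAttribute> attrs) {
        return attrs.contains(FieldAttribute.SAVED);
    }

    public static void main(String[] args) {
        // Before: INDEXED | SAVED
        int oldAttrs = INDEXED | SAVED;
        // After: the constant names become enum names, the bitwise or becomes EnumSet.of
        EnumSet<FieldAttribute> newAttrs =
                EnumSet.of(FieldAttribute.INDEXED, FieldAttribute.SAVED);

        System.out.println(isSavedOld(oldAttrs));
        System.out.println(isSaved(newAttrs));
    }
}
```

Each individual rewrite is tiny, which is exactly why doing hundreds of them by hand feels like work a tool should be doing.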

The next big thing like this that we're facing is the Great Genericization of '07.

I guess if I want it, I'm going to have to write it myself.

Thursday Aug 30, 2007

Thanks Jackpot!

We've been working on code cleanup, documentation, and sample programming for the search engine. Along the way I wanted to replace a method call returning an Iterator backed by a collection that could change with a method that would return a copy of the List and then get its iterator, thereby avoiding the dreaded ConcurrentModificationException.
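Here's a small sketch of the problem and the fix, assuming a made-up class holding a list of terms (the real method and class in the engine aren't shown in this post). Handing out an iterator over the live list blows up if the list changes mid-iteration; handing out a copy doesn't.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

class SnapshotIteration {
    private final List<String> terms = new ArrayList<>();

    void add(String term) {
        terms.add(term);
    }

    // Before: an iterator backed by the live list, which may change underneath it.
    Iterator<String> liveIterator() {
        return terms.iterator();
    }

    // After: a copy of the list; callers iterate over the copy instead.
    List<String> snapshot() {
        return new ArrayList<>(terms);
    }

    // Shows the failure mode: modifying the list while walking the live iterator.
    static boolean liveIterationFails() {
        SnapshotIteration s = new SnapshotIteration();
        s.add("a");
        s.add("b");
        try {
            for (Iterator<String> it = s.liveIterator(); it.hasNext();) {
                s.add(it.next() + "-copy");
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("live iterator failed: " + liveIterationFails());

        SnapshotIteration s = new SnapshotIteration();
        s.add("a");
        s.add("b");
        // Safe: we walk the copy while adding to the live list.
        for (String term : s.snapshot()) {
            s.add(term + "-copy");
        }
        System.out.println(s.snapshot());
    }
}
```

The copy costs some memory and the caller sees a point-in-time view rather than later changes, which was an acceptable trade for us.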

This sounded like a good time to try Jackpot. I had to install the development build of NetBeans 6 to get the development update center, but once that was done, it took about 5 minutes to write the Jackpot query to replace the old code with the new.

Having done something simple, I wrote a set of queries to replace a bunch of delegated methods in a class with a getter for the object to which the methods were being delegated and then calling the appropriate method on that object. The queries took about 20 minutes to write and then a few more minutes to debug and run.
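In plain Java terms (not Jackpot's query syntax, and with invented class names), the delegation cleanup looks roughly like this:

```java
import java.util.ArrayList;
import java.util.List;

class DelegationDemo {
    // Hypothetical object that does the real work.
    static class Dictionary {
        private final List<String> entries = new ArrayList<>();

        void put(String term) {
            entries.add(term);
        }

        int size() {
            return entries.size();
        }
    }

    // Before: the owner re-exports each Dictionary method as a delegate.
    static class PartitionBefore {
        private final Dictionary dict = new Dictionary();

        void put(String term) {
            dict.put(term);
        }

        int size() {
            return dict.size();
        }
    }

    // After: the owner exposes a single getter and callers go through it.
    static class PartitionAfter {
        private final Dictionary dict = new Dictionary();

        Dictionary getDictionary() {
            return dict;
        }
    }

    public static void main(String[] args) {
        PartitionBefore before = new PartitionBefore();
        before.put("search");                  // old call site
        System.out.println(before.size());

        PartitionAfter after = new PartitionAfter();
        after.getDictionary().put("search");   // rewritten call site
        System.out.println(after.getDictionary().size());
    }
}
```

Every call site changes from partition.put(x) to partition.getDictionary().put(x), which is tedious by hand and a natural fit for a pattern-based rewrite.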

All told, I reckon it took about an hour to do what would have taken a few hours of typing, compiling and fixing bugs. Thanks, Jackpot!

Monday Jun 04, 2007

Obligatory Ottawa boy hockey blogging

My secret shame: I'm Canadian, but not much of a hockey fan (well, not a fan at all, really. Hal Stern's 1000 times the fan that I am.) Heck, I can barely skate.

Still I'm an Ottawa boy, so I guess I should say something about the Senators being in the Stanley Cup. Mostly, I just can't believe that they put Don Cherry on NBC. Nice to see that he still has the same tailor. I guess he got to meet the governator, which is nice for him.

The Sens came to Ottawa after I'd left, but my Mom says that the people there are pretty excited. Apparently, my niece and nephew are getting a lot of mileage out of the fact that Bryan Murray, the coach of the Sens, has a cottage on the same lake as our family cottage.

So, Go Sens Go!

Sunday Jun 03, 2007

A peek at Google's search quality team

From Slashdot, a good article on Amit Singhal and the Google search quality team (run by Udi Manber, formerly of A9). The article talks about the "200 signals" that the ranking algorithm considers when deciding what rank to give pages in response to a search.

The signals are not just the data from the page, but also metadata like the frequency of changes for a page and personalization information like previous searches you've run (if you're logged in, of course!)

The interesting thing is the description of how they evaluate queries. Each query is run against a number of classifiers that decide what kind of search it is and therefore what kind of pages to return for the search. This is probably one of those instances where having the logs of billions upon billions of searches and the computing power to analyze them gives Google a distinct advantage.

Also, keep in mind that all of this computation is done in far less than a second!

Tuesday May 29, 2007

Replacing My Deck Stairs: A Refactoring Parable

When we had the inspection for our house last December, the inspector pointed out the stairs leading down from the deck to the back yard. He said that the boards were starting to rot and we'd probably need to replace the stairs in the spring.

I'm not exactly a super handy man, but I figured that I could handle this problem. I went out to Home Depot and bought some nice pressure treated stair treads and the stair angles to attach them to the stringers (side note: there's a piece of zinc coated metal to connect pretty much any two pieces of wood.)

On Monday morning with my hammer and pry bar, I set out to remove the treads, thinking that I would have the new treads on by the end of the day (or surely Monday at the latest!) As I started pulling off the old treads I started to notice that the stringers were a bit weird. Definitely not like the ones in the deck building book!

Small pieces of wood had been tacked onto the stringers to support the stair treads and the risers. There didn't seem to be any rhyme or reason to the additions: the builder of the stairs simply added a piece of wood wherever he thought was necessary.

As I pulled off further treads, I noticed that at one point the middle stringer had boards nailed onto both sides. Upon further inspection, I found that the stringer had been cut in two (clearly with a circular saw) and then patched back together. Seems like it would have been a better idea to start from scratch and cut another stringer, but I guess they were either in a hurry or they wanted to save money.

Of course, this haphazard approach to connecting things to the stringers (mostly things were nailed on, with a few screws here and there. Why? You've got me!) led to cracks in the stringers at the tips of the treads, which meant that as I removed the treads, the stringers were breaking apart.

So, about halfway through the removal of the treads I realized that I was going to need to replace the middle stringer. About three quarters of the way through I realized that I was going to have to replace all of the stringers.

Let me tell you, once you realize that you don't need to keep some things intact, you can take things apart a lot faster.

As I got closer to the deck that the stairs landed on, I realized that the stringers were simply sitting on the deck boards and that those boards were rotted clean through. I'm glad it didn't collapse over the winter, and I guess that speaks to the strength of wood, even when people do strange things with it.

So, the deck boards had to go as well.

At this point it was getting to be pretty evident that not a lot of planning had gone into this deck. This became abundantly more clear when, as I was removing the deck boards, I realized that, while one railing post was centered on a joist, the other was about 3/4 of an inch away from a joist. Couldn't they have made the stairs 3/4 of an inch wider?

Throughout the project, I encountered a few boards that were in good shape and well attached, but I had committed to everything going.

I finally finished, about five hours after I started. What I was left with was this:

sore muscles and the wish that there was version control for the physical world. Now all I have to do is rebuild the deck and the stairs before my son's birthday in two weeks...

Wednesday May 02, 2007

New theme

First post in a while today and I decided I'd switch to a fancy new theme.

This theme is nice, not only because it gives interesting information about Sun's Eco Responsibility efforts, but because over there in the bottom right corner it contains a nice affirmation: Go Green!

I feel so motivated!

Thursday Jan 04, 2007

Achelbow climbing the charts...

Paul points out that we're already up to 73 hits for achelbow.

Looks like a lot of those are hits from various aggregators (like the very interesting Findory).

Note that there's a hit in there from my blog entry and Paul's blog entry, but also hits from the front page of blogs.sun.com. In a perfect world, the blog post would be canonicalized and we wouldn't see that hit, but I expect the folks at Google have a lot more pressing things to deal with...


This is Stephen Green's blog. It's about the theory and practice of text search engines, with occasional forays into Machine Learning and statistical NLP. Steve is the PI of the Information Retrieval and Machine Learning project in Oracle Labs.

