Wednesday Mar 17, 2010

Is Your Farnsworth Snowcrashing?

Listening to the Myst soundtrack (Revelation) really takes me back to the Myst games and their fantastic atmosphere. Have you seen Avatar? In the 90's, Myst was as immersive as Avatar is today. I still like Myst more, because it is more of a "private vacation spot" than the more hectic and (in comparison) more densely populated Avatar.

I'm surprised that Myst is not mentioned in this context more often, because for me it was the first example of Steampunk. It may not be pure Steampunk, but it has everything I consider its defining features: An anachronistic fiction that conjures up life in a world with the best of the future and past combined. Idealistic retro science-fiction. :-) Don't you think?

The past aspect has an enlightened Victorian Era feel to it: A peaceful time where something new is discovered every day, long before unromantic consequences like pollution and concrete slabs. The future aspect brings all the practical things that you wouldn't want to live without, even in a romantic past: Science, computers, video phones, submarines, (steam) trains and cars, blimps, space ships, rayguns.

When your car, or computer, or phone, breaks down, how often is "the electronics" (microcosmos) or "the network" (macrocosmos) to blame? In Real Life, you just stand there with a brick. Frustrating. In Steampunk however, everything is "human-sized", graspable, controllable. Steampunks never lose control, because they have a screwdriver, a barometer, and welding goggles: Three examples of artifacts that extend a person's dexterity, senses, and resistance. In a Steampunk world, that fixes any problem.

All the positive modern and sci-fi concepts (transport, communication, arts, science, medicine, education, etc.) are kept, while negative aspects are replaced by something natural, clean, "mechanic". Artificial materials like plastic and concrete are replaced by brass, wood, and marble. Neon lights are replaced by candles. Mass-produced prints are replaced by da Vinci-style hand-written sheets. Intangible electronics are replaced by gears and levers. Your computer turns into an Analytical Engine, and your screen is a laterna magica or a Farnsworth. The item's capabilities stay the same as today (or are even better), but Steampunks would never get a mundane "bluescreen of death". :p

Nah, bluescreens, that would be... Cyberpunk, wouldn't it? :-D Cyberpunk characters are stuck in a frustrating, complicated, dying, inhuman technoverse where everything (including people) is controlled and owned by a few powerful beings, like in the Matrix. In comparison, Steampunk characters lead very free, independent, and individual lives. Come on, Myst's Atrus has whole islands and planets to himself!

Firefly is another nice example of Steampunk that comes to mind, and I'm sure you can find more: It can be films (Sky Captain, Brazil) and games (WoW Gnomes), novels and graphic art, even fashion and other DIY crafts... Check out the pictures. Pretty, eh? :)

Friday Jun 05, 2009

JavaOne: Toys are Us!

And now the JavaOne session everybody has been waiting for: James Gosling's Toy Show! Here's a short rundown of the coolest tech demos we saw this morning:

Angela Caicedo and Simon Ritter are back, and this time they hacked a standard household wiimote. Simon brought a small whiteboard and attached a few markers. He used a JavaFX app to triangulate the position of the markers with data from the wiimote's IR sensors. He then projected a perspective-distorted image of a playing card onto the surface, and the image stayed in place even when he moved the board.

When he turned the board around, the app detected the motion and projected the back side of the playing card. :) It also supported more "motion sensor"-like features, such as shuffling to flip to a different card. To prove that there is no hidden magic in the board, he replaced it with an opened white umbrella (again with markers) and projected an image of planet Earth.
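
The post doesn't say how Simon's app computes the distortion. One common way to pre-distort a flat image so that its corners land on four tracked points is a planar homography (a perspective transform). Below is a minimal, dependency-free Python sketch, assuming the four marker positions are already known in projector coordinates (the real calibration between wiimote camera and projector is skipped); `homography`, `project`, and the marker coordinates are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [[float(v) for v in row] + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot to keep the arithmetic stable.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def homography(src, dst):
    """3x3 matrix H (with H[2][2] = 1) mapping four src points onto four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0*x + h1*y + h2) / (h6*x + h7*y + 1), rearranged to be linear in h.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        # v = (h3*x + h4*y + h5) / (h6*x + h7*y + 1), same trick.
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def project(h, x, y):
    """Map one point through H, including the perspective divide."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


# Card corners in image coordinates (unit square) and hypothetical marker positions.
card = [(0, 0), (1, 0), (1, 1), (0, 1)]
markers = [(10, 10), (110, 20), (100, 120), (5, 100)]
H = homography(card, markers)
```

When the board moves, only the four marker positions change, so in a sketch like this H would simply be recomputed every frame and the card image drawn through it.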

Continuing the theme of "JavaFX is for all the screens of your life", Angela presented her approach to a wiimote hack: Similar to her demo from last year, she relied on a Minority Report-like glove with IR markers. As opposed to Simon, she can turn any white surface (such as a wall) into a "touch screen". In her demo she projected a canvas and a color palette, and moved her finger in the air to paint lines and mix colors. She could reuse existing JavaFX features to animate a ball icon rolling along the drawn line, until the line ended and the ball dropped off the canvas. :)

Next, Tor Norbye demoed his new JavaFX designer tool: He placed an object node onto the canvas and recorded 3 different keyframes, then he let JavaFX interpolate the animation. The end result had the text node swing in and bounce off the floor to its intended position.
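
The post doesn't say which interpolation the tool uses. Here is a minimal Python sketch of the simplest option, linear interpolation between keyframes, using three hypothetical (time in ms, y position) keyframes for the swing-in-and-bounce text node; real animation tools typically also offer easing curves.

```python
from bisect import bisect_right


def interpolate(keyframes, t):
    """Value at time t for keyframes sorted by time; linear in between, clamped at the ends."""
    times = [time for time, _ in keyframes]
    if t <= times[0]:
        return keyframes[0][1]
    if t >= times[-1]:
        return keyframes[-1][1]
    i = bisect_right(times, t)  # index of the first keyframe strictly after t
    (t0, v0), (t1, v1) = keyframes[i - 1], keyframes[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


# Three hypothetical keyframes: start at the top, hit the floor, settle a bit higher.
bounce = [(0, 0.0), (600, 300.0), (1000, 250.0)]
frames = [interpolate(bounce, t) for t in range(0, 1001, 200)]
```

With these keyframes, `frames` runs down to the "floor" at 300 and then back up to 250, which is the bounce.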

Tor also demoed a very user-friendly interface for binding components to values: You drag a line from one component to the other, and a menu shows up with the target values that would make sense to bind. To give you an example: You may want to bind a slider's left/right side to the video's start/end position, and a toggle button to the play/pause action. Of course you can do that with all components and all properties (opacity, translation, color, rotation...). His tool allows you to save visual content for mobile and PC screens (and soon also TV).

We also learned of PlaySIM, a simulated SIM card (JavaCard) on a Sun SPOT. It allows you to set debugger breakpoints in live SIM card code. In the demo they used one Sun SPOT's motion sensor to trigger the menu of a phone (which was attached via a second SPOT). Check out for more details.

The FIRST robotics league brought one of their robots from this year's game, Lunacy: Robots throw balls and catch them in baskets. There are different periods in the game, e.g. one with human remote control, and one with autonomous robot control. The finals were very popular and filled the Georgia Dome. :) The robot on stage successfully collected balls, but then proceeded to throw them at James Gosling... Today's news is that from this year on (?) the students will be able to program robots in Java too (including on-device debugging), not only in C/C++ (if I got that right).

The big highlight for the NetBeans Community was Sven Reimer's NetBeans platform-based application: A controller used in satellite ground stations. Gosling recalled the time he used to analyze satellite data with a PDP-8 (?) that had less power than a smart card... :) Sven's app ran in demo mode only, since "some grumpy people didn't let us actually control satellites". ;( As a final surprise, Gosling became an honorary dream team member (well, he got the shirt) and received a copy of the community-translated (!) NetBeans platform book (originally in German).

Another interesting guest was Visuvi: Not only can you upload images (from a cell phone cam or in high resolution) to their search engine and have them analyzed (e.g. to answer the question "who painted that?"), but, most importantly, the new image analysis technology is used for cancer research (e.g. you can search a database of biopsy images for visually similar cases).

Other demos included a micro-financing app, a Solaris+JavaFX-powered jukebox for starving musicians, and a printer-scanner for teachers that scans the students' test sheets as well as the answer sheet and then calculates the scores.

The last demo was a video interview with the team behind the Lincoln car that was on display in the Java Pavilion all week. The plan was to create a fast drive-by-wire vehicle for the Realtime Java urban challenge. Mr. Perrone refitted a stylish Lincoln Continental: He added batteries and a generator, self-diagnostic sensors and GPS, and finally a touch screen UI. The brake lights and old speedometer are controlled electronically. They showed some cars on a test drive, but, sorry, I missed whether the Lincoln ended up being remote-controlled or not. (Please leave a comment if you caught that.)

I think this year's message was: Different communities use Java technology in different ways. Astonishing (and inspiring) what you can do! :-)

Thursday Feb 19, 2009

Starting With Solaris From A Linux Point of View

A few weeks ago, Arun wrote in his blog about how to install OpenSolaris on Virtual Box. Let me add some OpenSolaris usage tips that I collected over time (so if I forget them, I can go back to my blog). ;) They are intended for users who already have prior experience with Linux and the command line in general.

When I say "Solaris" below, I mean OpenSolaris 2008.11, here is how to upgrade from OpenSolaris 2008.05 to 2008.11.\*

Processes and commands

  • The Linux top command displays the list of running processes. On Solaris this command is called prstat.

  • If you're looking for one particular process (e.g. firefox), use pgrep -l firefox; if looking for processes by one user (joe), use pgrep -l -u joe. If looking for both, combine them into a quick pgrep -lu joe firefox.

  • Solaris has no sudo command like Linux's "substitute user do" to execute one-off commands with admin permissions. On Solaris, use su to 'assume the identity' of another local user, including the root user, if you know the password. In contrast to sudo, you're responsible for exiting back to your own identity. So, before changing config files, type su - to become root. Including the dash argument will also update your shell environment (most visibly, the command-line prompt will say 'root', so you can actually tell which of your shells is the root shell).

  • You install a .pkg file with the pkgadd -d name command (usually as the superuser).

  • One tip (luckily) not from personal experience: On Linux, you kill a process that runs amok by killing its process ID (pid); if the Linux user doesn't want to look up the pid(s), she can use killall name to kill all processes with that name: basically a handy shorthand for grepping the process list and killing them individually.
    On Solaris, this "kill by name" command is called pkill. The Solaris killall command, however, well, kills all processes, period. You might as well shut down and discard all unsaved changes...! You have been warned. :-p
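
The pgrep/pkill idea (select processes by name and/or user, then signal only those) can be sketched in a few lines of Python. This is a minimal sketch over a hypothetical in-memory process table; `Proc`, `pgrep`, and `pkill` here are illustration names, not the real tools' code, which reads the live process table. Passing your own `send` function lets you dry-run instead of calling os.kill.

```python
import os
import re
import signal
from collections import namedtuple

# Hypothetical process-table entry: process ID, owner, and command name.
Proc = namedtuple("Proc", "pid user name")


def pgrep(procs, pattern=None, user=None):
    """Like `pgrep -l [-u user] [pattern]`: return (pid, name) of matching processes."""
    hits = []
    for p in procs:
        if user is not None and p.user != user:
            continue
        if pattern is not None and not re.search(pattern, p.name):
            continue
        hits.append((p.pid, p.name))
    return hits


def pkill(procs, pattern, sig=signal.SIGTERM, send=os.kill):
    """Like `pkill pattern`: signal only the processes whose name matches.

    Unlike Solaris killall, nothing outside the match list is touched.
    """
    pids = [pid for pid, _ in pgrep(procs, pattern)]
    for pid in pids:
        send(pid, sig)
    return pids
```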

Devices and media

  • If you "insert a DVD" (using Virtual Box's ability to mount disks from the host drive, or ISO images) it will show up under /media/CDROM. I haven't tried USB media yet (probably not supported?)

  • You can get an overview of devices and drivers (disk drives, mice, network and graphics cards) from the main menu: System > About OpenSolaris > Devices. E.g. my network card driver is e1000g, so I know in the file system, my first network interface will be represented as /dev/e1000g0 (see "ifconfig" tip below).

  • I don't suggest messing around with device tables; it's just good to know files like /etc/mnttab and /etc/vfstab in case you want to look up a device path for another config file. (Strangely, I don't see my hard drive's /dev path though. Is that because it's virtual...?)

  • If your audio (e.g. on a MacBook, Core Audio + ICH driver) doesn't work, get drivers from Open Sound.

  • The shared folders feature is not available for the Mac/Solaris host/guest combo. I use the network (ftp, scp, or simply mail) to get files into or out of the VirtualBox.

  • Another way to get files from the Mac Finder to Solaris: Put the files in a folder, use the Apple Disk Utility to create a disk image (.dmg) from the folder, use the same utility to convert the .dmg to a DVD master image (.cdr), rename .cdr to .iso, then use VirtualBox's Virtual Media Manager menu to mount the disk image as a Solaris medium, and access it from the /media directory. Phew... If you know an easier way, please leave a comment. =-)


  • If Solaris does not seem to use the network interface, check whether Virtual Box is set to use the "host interface" (for me the setting defaulted to NAT).

  • Solaris uses the nwamd daemon to auto-detect your network and use DHCP, and it even detects wireless networks, which is very useful. There is a Network control panel, but if you're just a user running Solaris on a notebook or PC, look for other sources of the problem before you fiddle with nwamd:

    1. Use the command ifconfig -a to see whether the DHCP server assigned you an IP address. (Hint: Look up the interface name in the device list (see above), e.g. e1000g0. The IP address stands next to the word 'inet'. Entries flagged LOOPBACK don't count!) If not, check the cables and whether other PCs can access the same DHCP network.

    2. If you do have an IP address but still cannot open any web pages, test whether you can browse to a web page by its IP address: Use the host command on another machine to obtain a test address. (E.g. typing host followed by a domain name prints that domain's IP address.) If the browser is able to open the web page by its IP address, then you know you are online, but your name servers are not configured!

    3. In this case, open the file /etc/resolv.conf and add a nameserver entry for each name server. (You need to be the superuser to edit the file, see su above.) Copy the name servers' IP addresses from another machine (I got them from the Mac's Network System Preferences), or ask your admins.
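
Steps 1 and 3 can be sketched in Python. This is a minimal sketch, assuming Solaris-style ifconfig -a output like the made-up sample below (a header line with a flags=...<...> list, followed by an indented inet line); `non_loopback_addresses` and `add_nameservers` are hypothetical helpers, and the second one only builds the new file text, it does not write /etc/resolv.conf for you.

```python
import re

# Made-up sample in the style of Solaris `ifconfig -a` output.
SAMPLE_IFCONFIG = """\
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
e1000g0: flags=1004843<UP,BROADCAST,RUNNING,MULTICAST,DHCP,IPv4> mtu 1500 index 2
        inet 10.0.2.15 netmask ffffff00 broadcast 10.0.2.255
"""

HEADER = re.compile(r"^(\S+): flags=[0-9a-fA-F]+<([^>]*)>")
INET = re.compile(r"^\s+inet\s+(\S+)")  # "inet6" lines do not match


def non_loopback_addresses(ifconfig_output):
    """Step 1: map interface name -> IPv4 address, ignoring LOOPBACK interfaces."""
    addresses = {}
    name, is_loopback = None, False
    for line in ifconfig_output.splitlines():
        header = HEADER.match(line)
        if header:
            name = header.group(1)
            is_loopback = "LOOPBACK" in header.group(2).split(",")
            continue
        inet = INET.match(line)
        if inet and name and not is_loopback:
            addresses[name] = inet.group(1)
    return addresses


def add_nameservers(resolv_conf, servers):
    """Step 3: return resolv.conf text with a 'nameserver <ip>' line for each missing server."""
    lines = resolv_conf.splitlines()
    present = {fields[1] for fields in map(str.split, lines)
               if len(fields) >= 2 and fields[0] == "nameserver"}
    for server in servers:
        if server not in present:
            lines.append("nameserver " + server)
            present.add(server)
    return "\n".join(lines) + "\n"
```

An empty result from `non_loopback_addresses` corresponds to the "check the cables" case in step 1; servers that are already listed are not added twice.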

More about using and configuring Solaris, for example installing it on a Virtual Box and more info on nwamd.

Read the OpenSolaris Observatory blog to stay up-to-date.

\*) This method seems to have worked for many, it trashed my VirtualBox though. If you want to save time and already have an OpenSolaris 2008.11 DVD, use that, upgrading is not faster than a fresh install.

Monday Jun 23, 2008

The Deadly Doc Writer Approach

I just came across a (not so new, but still valid) article by Amazon's Werner Vogels. What's the best way to plan a new product? Work backwards.

It reminded me of a great talk I attended at ADHOC (MacHack) some years ago, "Writing books backwards", but that one turned out to have a different punchline. What Vogels means is: before you implement anything, write the press release and the manual. Interesting. (Is he a tech writer, by any chance? I always document my weekend projects so well, I never really get around to implementing them...!) ;)

  1. A press release is the summary of your completed work, when it's ready for the customer. Vogels' point is that thinking about the press release first forces you to think in a way that is perfect for customer-oriented planning. Write down concisely what the product is (category) and does (features), why anyone would want it (uniqueness), who it is for (target group), and how to obtain it. Looking at it from the point of view of a future customer helps you not to get lost in internal details early on.

  2. The next step of backward planning is to write a basic FAQ page. While laying out the project's (fake) press release, what disagreements did you and your colleagues have about feature priorities? Will your customers have similar misconceptions about the product's purpose? Anticipate the most important questions and find "one line" answers to them: What is this product, what system requirements do you target, how much will it cost, and how will customers obtain, install, troubleshoot, and update the product?

  3. Next you write a quickstart tutorial, showcasing how a typical user solves a task with your product. It should be very short and simple; (draft) screenshots of the user interface are a plus. (Tip: Use the NetBeans GUI builder for prototype screenshots.) Again, this forces you to think about the workflow from the user's point of view.

  4. The last level of backward planning (we're still planning here, we haven't implemented anything yet) is outlining the user manual. This forces you to clearly define concepts and processes — what to do with your product, and how. The manual should also include a reference section that allows for different ways (incl. syno-/hyper-/hyponyms) of finding information (but you should agree on one way to refer to a process throughout). If your product is used differently by different target groups, write separate content for each target group (don't split up each paragraph of the general manual into "if you are x, do this, otherwise do that"). For example, has two sets of FAQs, targeting either developers who work with the IDE, or developers who work on top of the Platform.

Of course you are not supposed to immediately publish these documents. Keep them as references on your internal planning wiki. Looking at the normal work process backwards during the planning phase has (hopefully) made your project's goal clearer to everybody involved.

It doesn't have to be software, many everyday types of plans can be approached backwards. For example, I once went to an anti-procra— uh, I mean, time management class: One of the strategies we learned was to write down how people will perceive our completed task. Next we determined what we have to do to make this outcome come true. Do this analysis for the best outcome (planning), and for the worst outcome (risk preparation). This risk preparation is called pre-mortem (or premortem) analysis, in contrast to post-mortem. If you think that's morbid - did I mention the time management trainer asked us to write our own obituary?!

Oh, and the "Writing Books Backwards" talk I mentioned? Turned out it wasn't suggesting to write books backwards. It was the same talk as the "Writing Books" presentation. So why the name? Well, in presentations, speakers often draw people's attention by reversing the order of the cognitive process: They show the cool conclusion first, and then elaborate how they figured it out. On that day, this particular talk was the first one in the morning, and the speaker expected people to be tired and show up late. So he decided he'd go through the slides in reverse order, starting out with the details, and working towards the results overview, so the late-comers wouldn't miss the main points!

Which probably shows that the most important factor for choosing a strategy (be it backward or forward) is still how well you know your target group. :)

Tuesday May 06, 2008

Maker Faire 2008 in San Mateo

A few pics from the Maker Faire...

After the faire... Yup, the instructions are in Japanese. But we went by the pictures and the robot worked right away!

Monday Mar 24, 2008

New Blog Spammer Hack?

My brother just discovered a mean blog content hack in an RSS feed. Somebody managed to insert a div with spam text into a blog entry's content (and in one case even into the description meta tag). As opposed to 'normal' comment spam (see rel=nofollow), content spam makes it look as if the blogger recommended the link, which (I presume) gives it a higher Google ranking.

So why does the blogger not notice the inserted text? The height and width of the div are zero, so the text is hidden. Some feed readers, however, preview entries without div styles, so the inserted text is visible in the RSS feed.

By googling for variations of the link text, I found 7 more blogs. Sure, eight is far from a botnet epidemic. Still it's strange how the same hidden text turns up in the content of eight unrelated blogs. Do they have anything in common?

The eight cases I saw all run on WordPress, but on different versions. This still does not explain why only these eight were affected. If someone had 'teh über h4ck' to insert arbitrary text into other people's blogs, there'd be A LOT more cases, you would think. So is the common denominator something simpler, such as a weak password? But then, why only WordPress...?

If you have a WordPress blog, please quickly search the page source for a div with style='overflow:auto;width:0;height:0;' and tell us whether you found one too. I'd really like to get to the bottom of this Easter mystery...
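
If you'd rather not eyeball the page source, here is a minimal sketch using Python's standard html.parser. It assumes the spam uses an inline style attribute with zero width and height, like the one above; other hiding tricks (display:none, CSS classes in a stylesheet) would slip through. `find_hidden_text` and `HiddenDivFinder` are hypothetical names.

```python
from html.parser import HTMLParser

ZERO = {"0", "0px"}


def _is_zero_size(style):
    """True if an inline style sets both width and height to zero."""
    decls = {}
    for part in style.split(";"):
        if ":" in part:
            key, value = part.split(":", 1)
            decls[key.strip().lower()] = value.strip().lower()
    return decls.get("width") in ZERO and decls.get("height") in ZERO


class HiddenDivFinder(HTMLParser):
    """Collects text that sits inside a zero-size <div> (or anything nested in one)."""

    def __init__(self):
        super().__init__()
        self.stack = []  # one flag per open <div>: are we inside a hidden div?
        self.hidden_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            inside = bool(self.stack and self.stack[-1])
            style = dict(attrs).get("style") or ""
            self.stack.append(inside or _is_zero_size(style))

    def handle_endtag(self, tag):
        if tag == "div" and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if self.stack and self.stack[-1] and data.strip():
            self.hidden_text.append(data.strip())


def find_hidden_text(html):
    finder = HiddenDivFinder()
    finder.feed(html)
    finder.close()
    return finder.hidden_text
```

Feed it the HTML of a blog entry; any non-empty result is worth a closer look.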

PS: Update

OK, I found out more. Somebody indeed exploited a bug in WordPress' XML-RPC interface to insert text into certain versions of WordPress blogs. They patched it, but users didn't update.

Do CMS providers like WordPress have something like the NetBeans Update Center? Can they send users a message reminding them to update? I assume not (unless the user signs up to a mailing list). :(

The recommendation is not only to update to the latest patched version, you also should change your password.

Monday Jan 28, 2008

The Dutch Designer Trashed My Online Presence

Just a boring online shop.

Or... is it?

Friday Dec 14, 2007

Solaris vs MacBook

During the Sun Tech Days in Frankfurt, I discovered a Solaris Installfest booth. Shortly before the end of the last session, I showed up with a MacBook Pro and Parallels Desktop, and asked for a Solaris disk... >:-D The guy said, "Sure... go ahead!"

So I created a virtual machine for Solaris 11 and inserted the DVD. I had to hit the (virtual machine's) reboot button two or three times until Parallels won the fight and wrested control over the DVD drive from the iron grasp of MacOS. Or something.

Anyway, I was positively surprised how slim the Solaris installer looked. No clicking through 32 pages to set up drivers. It's almost like MacOS, it asks for your time zone and locale etc, and figures out the rest.

Unfortunately, installation of 7 Gigs of software took longer than I expected. Everybody was leaving (see photo)... They switched off the network, and then the lights... And I was only 49% done. I had to go to the airport, I could not wait another hour in the conference center for the installation to complete. Well I thought, I know now that it works, so I'll just close the MacBook (to put it in sleep mode) and reinstall Solaris on Thursday. It's a pity to have to delete the half-finished VM and lose this one hour, but actually no big deal.

So on Thursday, back in the office, I woke up the MacBook, expecting a broken VM with a broken (interrupted) Solaris installation to greet me. Nothing of that sort! MacOS had sorted it out, the installer hadn't even noticed the day-long interruption. It simply continued where it had been, and completed the installation. :-o I don't know whether I was just lucky, or whether it's meant to be like that. Anyway, Solaris runs in the MacBook's VM!

There is only one problem: Parallels does not allow direct write access to the virtual Solaris partition, and seemingly it cannot offer Solaris a recognizable default network device. What, no MacBook drivers? ;-) Here is a solution I found in moazam's blog: Parallels comes with extra drivers for this purpose, you just have to know where to find them.

  1. In MacOS, mount the file /Library/Parallels/Tools/vmtools.iso (in the Parallels menu, "Devices > CD/DVD > Connect Image")
  2. In Solaris, read the Readme and then execute /media/PRLTOOLS/Drivers/Network/RTL8029/SOLARIS/ in the Terminal (as root!)
  3. Reboot Solaris (i.e. the VM)

One tip: The "Connect image" (mount) command is a very useful workaround to get data from MacOS into the Parallels VM! Your image will show up in Solaris' /media/ directory.

Another tip: Same as in MacOS, you can drag and drop files and folders from Nautilus into the Terminal, and it will spell out their paths. E.g. type "cd " into the Terminal, drag a folder onto the Terminal window, and it will complete to "cd '/path/to/whatever/directory/'".

PS: How did I manage to save this as draft and not post it all week?

Friday Oct 12, 2007

Thank Constantine it's Friday

Hehe, interesting page about the Roman calendar (from which ours is derived). The thing is an utter hack. Did you know that...?

  • ... It may have been God who invented the seven-day week, but Constantine I was the one who actually installed it 1700 years ago?
  • ... The year 2000 was actually one of the least likely dates for the world to end?
  • ... The Romans used to have 10 months per year with 30 or 31 days each? The missing (roughly 61) days in the end were just declared "winter".
  • ... The year used to start in March? It ended with the above-mentioned 2 months of winter, which later became our January and February (after we kicked out a rather confused leap month, Mercedonius). This is why the leap day is added in February -- at the former end of the year.
  • ... The months September, October, November, and December are named after the Latin words for seventh, eighth, ninth, and tenth? However, September isn't the seventh but the ninth month (etc.). This naming has been pretty much useless since 153 BC, when the beginning of the year was moved to January -- for 'administrative' reasons.
  • ... There even used to be numbered months called Quintilis and Sextilis (the fifth and the sixth)? They corresponded to our July and August (you guessed it, the seventh and the eighth). They were renamed in honor of the Calendar Service Packs Version 1 and 2 released by Julius Caesar and Augustus. Of course you cannot dedicate a lame 30-day month to an emperor; not after another emperor just got a whole cool 31-day month. So Augustus changed the days-per-month so his month could have 31 days too, resulting in the irregular 28/29-30-31-day pattern we have now.
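
The "roughly 61" days in the third item add up if you assume the commonly cited layout of the ten-month calendar, four months of 31 days and six of 30 (the post itself only says "30 or 31 days each"):

```latex
\underbrace{4 \times 31}_{\text{31-day months}} + \underbrace{6 \times 30}_{\text{30-day months}} = 124 + 180 = 304 \text{ days},
\qquad
365.25 - 304 = 61.25 \approx 61 \text{ days of ``winter''}
```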

Speaking of which: The above article mentions a pretty confusing English mnemonic, "Thirty days hath September...", to memorize the days per month. WTF? Never heard of it. You know what we learned at school?

Make a fist and look at your knuckles. Start at one knuckle (say, the one below the index finger). Count either a "hilltop" or a "valley" (between the knuckles) for each month. At the last knuckle (below the pinkie finger), you count the hilltop twice, and go back the same way. December will land you on the knuckle below the middle finger.

So what do you get? January, March, May, July, August, October, December are on a "hilltop"; February, April, June, September, November are in a "valley". Rule? Hilltop = 31 days, valley = 30 days, with exception of February. Much easier and more reliable. At least until the next emperor knuckles the whole system in the head.
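
The knuckle rule is easy to write down as a walk over seven positions and back again. A small Python sketch (the function names are made up), starting at the index finger knuckle as described:

```python
import calendar

KNUCKLES = ["index", "middle", "ring", "pinkie"]


def knuckle_walk():
    """Positions visited for January..December: ('hilltop' or 'valley', where)."""
    outward = []
    for i, knuckle in enumerate(KNUCKLES):
        outward.append(("hilltop", knuckle))
        if i < len(KNUCKLES) - 1:
            outward.append(("valley", knuckle + "/" + KNUCKLES[i + 1]))
    # Count the pinkie hilltop twice, then walk back the same way.
    return (outward + outward[::-1])[:12]


def days_in_month(year, month):
    """Hilltop = 31 days, valley = 30 days, February is the exception."""
    if month == 2:
        return 29 if calendar.isleap(year) else 28
    kind, _ = knuckle_walk()[month - 1]
    return 31 if kind == "hilltop" else 30
```

Comparing `days_in_month` against the standard library's `calendar.monthrange` for a few years confirms the rule for every month, and December indeed lands on `("hilltop", "middle")`.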


I once read a fantasy story, "The Six Fingers of Time", about a guy who learned to fade into a time dimension 60 times faster than ours. He had the theory that the Babylonians (and other people from this dimension), who basically invented 'modern' time measurement, must have had 12 fingers. Or why else would they have used the dozen (12), the gross (144), 60 minutes (12×5) per hour, and 24 hours (12×2) per day, etc.? Obviously it was as easy for them to count with 6 and 12 as it is for us to count with 5 and 10 -- right?

Well, there is a very simple explanation that doesn't require 12 fingers. All you need is 12 phalanges! A guy called Scott Reynen puts it very nicely on his page:

"The twelve months on our calendars, twelve hours on our clocks, and twelve inches to a foot all suggest a duodecimal (base twelve) number system, possibly derived from the twelve [phalanges] on the fingers of one hand (not counting the thumb). Duodecimal math is actually simpler than decimal math because twelve has more factors than ten."

Try it: Use the thumb as the pointer and point at the index finger's tip for one, its middle phalanx for two and the base phalanx for three, and so on. The base phalanx of the pinkie finger corresponds to twelve. This way you can easily count up to twelve with one hand, while everybody else with the lame 5-finger system only counts up to 5! How stupid is that? Since I read this theory, I have started counting with phalanges too, just to freak out the onlookers with my incredible single-handed counting ability.
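
The phalanx scheme is just a division by three with remainder. A small Python sketch (the function names are made up for illustration), which also lists the divisors behind the quote's "twelve has more factors than ten":

```python
COUNTING_FINGERS = ["index", "middle", "ring", "pinkie"]  # the thumb is the pointer
PHALANGES = ["tip", "middle", "base"]


def phalanx_for(n):
    """Which phalanx the thumb points at when counting n (1..12) on one hand."""
    if not 1 <= n <= 12:
        raise ValueError("one hand of phalanges counts from 1 to 12")
    finger, phalanx = divmod(n - 1, len(PHALANGES))
    return COUNTING_FINGERS[finger], PHALANGES[phalanx]


def divisors(n):
    """All positive divisors of n, to compare base twelve with base ten."""
    return [d for d in range(1, n + 1) if n % d == 0]
```

For example, `phalanx_for(5)` is the middle phalanx of the middle finger, and `divisors(12)` has six entries against four for `divisors(10)`.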


You think the geeks' binary system is cool, where you count up to 31 on the fingers of one hand? Oh yeah? Well, the phalanx method at least doesn't make you use a middle finger gesture for the number 4. :-P
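
For comparison, binary finger counting under the usual thumb-is-one assignment (an assumption, but it is the one that turns 4 into a lone middle finger). The function name is made up:

```python
# Lowest bit first: thumb = 1, index = 2, middle = 4, ring = 8, pinkie = 16.
FINGERS_LOW_TO_HIGH = ["thumb", "index", "middle", "ring", "pinkie"]


def raised_fingers(n):
    """Fingers to raise for n when counting in binary on one hand (0..31)."""
    if not 0 <= n < 2 ** len(FINGERS_LOW_TO_HIGH):
        raise ValueError("five binary fingers count from 0 to 31")
    return [finger for bit, finger in enumerate(FINGERS_LOW_TO_HIGH) if (n >> bit) & 1]
```

`raised_fingers(4)` returns just `["middle"]`, which is exactly the gesture the phalanx method spares you.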

Monday Jun 25, 2007

Scientists Running Wild on the Streets (2)

So I went to Science on the Streets (Veda v ulicich) over the weekend. Some impressions:

The virtual doppelganger was really that: They scanned people's faces, generated an animated 3D face from the pictures, and then added Czech speech synthesis and made it say stuff. (Yes, Czech tongue-twisters, too.) Some of the faces even came with facial gestures, and they also put a lot of effort into the teeth, tongue and eyes. There was no speech recognition though, nor was it hooked up to a dialog system that would require such an avatar for interaction.

The "three dimensional paint" turned out to be a liquid chemical substance in a marker, similar to window colors, but opaque. After drawing on paper, the lines foam up a bit and stand out of the picture. (Also interesting for the blind I assume?)

The "mammoth hunters" were a group of poor Czech men, women and children who somebody had talked into wearing "native clothing" and holding flintstone spears while camping in front of the Muzeum and smiling for the tourists. They also tried to play music on cattle horns. :) All the while, the mammoth was hiding inside the muzeum and never came out. Chicken.

One brainteaser I came across was a chain of flat wooden tiles held together by straps: You hold one tile and let the rest hang down, and one tile after the other kind of flips over and down, but the chain doesn't fall apart and no tiles change their position in the chain. Huh? I took one home to figure it out. It's very simple indeed, like one of those Rubik thingies, but still... Who comes up with these things?

The last impression is from the promised "Robots separating waste in the metro station" event. In short: They had not found a solution to automatic waste separation. Instead it was a normal student project very similar to the ones we did: I assume if they had advertised it as "Robots vaguely moving towards color-coded objects, sometimes picking them up, and then maybe throwing them in a random direction," fewer people would have bothered to come. ;-) However, you could see that the robots were custom-made from scrap metal and old computer parts, which was very cool.

Wednesday Jun 13, 2007

Scientists Running Wild on the Streets

Help! Somebody set free all the Czech scientists! By the end of next week, Cimrman's heirs will roam Prague in their eco-cars, and they will aim lie detectors at your spouse.

Seriously! It says so on a leaflet I picked up at the metro this morning. =-) (Well, it didn't literally say Cimrman. But it did say lie detector and spouse.)

I usually like advertisements as little as the next person, but on a boring metro ride, why not learn new vocabulary in an actual context, right? Well, the terms of public transport, the mortgage and the icteric ads just don't cut it anymore after a while. So I picked up a blue flyer from a student this morning that read "science on the streets!"

The coolest items are on display right at the Muzeum: A neural network robot that can learn tasks, a "virtual doppelganger" that informs you of the train schedule (doesn't specify whether this involves speech tech, we will see), and (my favorite item) "Robots will separate waste in the Muzeum metro station. Come and cheer them on!" ... :-D

Other scientists will be set free on various squares and metro stations all over Prague. They will demonstrate physics and chemistry experiments, solar telescopes, models of powerstations, a sort of paint for 3-dimensional pictures (?!), motorcycle crash tests, eco-friendly sources of energy, optical illusions and brain teasers, mammoth hunters, and orthopedic footwear (... huh? I swear I never know when Czechs are serious or not.) :-)

Time: June 22nd and 23rd, 2007.
Venues: Muzeum, Jiriho z Podebrad, Namesti Miru and Republiky, and at FJFI CVUT technical university.

Sunday Feb 11, 2007

Search Engineering - the End of N-Grams? (or Not)

My, I completely missed that: Heise now writes My Search Engine understands me (in German), and also A Chance to Rival Google tells you more (in English):

A start-up company called Powerset has set out for the holy grail -- a natural language search engine! One that does not (like the other one that starts with Goo and ends in gle) "simply" strip stop words off the search string and then make parallel searches of lexemes and synonyms in a database of cleverly sorted n-grams?
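
For contrast, the "simple" keyword approach described above boils down to something like the toy Python sketch below: drop stop words, then build word n-grams to look up. It only illustrates the general idea under made-up assumptions (the stop-word list and the helper names are hypothetical); it is not how Google actually works, and lexemes/stemming and synonyms are left out.

```python
# A tiny, made-up stop-word list; real engines use much larger ones.
STOP_WORDS = {"a", "an", "the", "of", "is", "are", "who", "that", "in", "about"}


def keywords(query):
    """Lower-case the words, strip punctuation, and drop stop words."""
    words = (w.strip("?!.,;:\"'").lower() for w in query.split())
    return [w for w in words if w and w not in STOP_WORDS]


def ngrams(tokens, n):
    """All runs of n consecutive tokens, e.g. the word bigrams of a query."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


query = "Who is that German who believes the early Middle Ages are made up?"
terms = keywords(query)
bigrams = ngrams(terms, 2)
```

Note how nothing here "understands" the question: the lookup only sees word fragments like ("middle", "ages").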

Powerset say they will use technology from Xerox PARC. Okay, those Xerox guys are good, but licensing their stuff alone doesn't solve the task. What exactly will they do, and how? Dang, the web page doesn't tell... :-[ Hmm... "Power set"...? Is that a hint of some sort? ;-)

Google is good because it already lets you search for natural input like "Who is that German who believes that 300 years of the history of the early middle ages are made up??" more or less successfully. Just hit "I feel lucky" and find a page where somebody blogged about this original conspiracy theory. But I had to try five times with different sentences to find this perfect hit. -- Well, still better than getting laughed at by your friends for asking them about phantom time, right? In real natural language search you would not have to try paraphrases yourself.

But don't expect Powerset to handle natural language search requests like "How many non-American and non-Russian Astronauts were deployed to the ISS?" by returning a list of names and a number. That would require the search engine not only to understand the question perfectly, but also to a) have access to a list of ISS astronauts, b) have access to a list of their respective nationalities, c) know that "non" means I want it to filter out all persons of American and Russian nationality (and not people whose names happen to look like a nationality), and then d) interpret "how many" as a request to count the remaining entries, and finally print the answer. That would "merely" require a mapping from every ever-so-crazy textbook question type the user comes up with to a step-by-step search solution, plus the tables to look the data up in... Then, after 12 years of research, a user types in "So how many legs do a farmer, a dog and 17 chickens have together?" and your engine gripes "Gimme a break" and dies of karoshi. \*Sigh\*
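Just to show how small steps a) to d) become once the data is actually there, here is a toy shell pipeline. The table is invented (placeholder names, including a hypothetical "Tom Russian" to demonstrate step c), and the function names are mine; the hard part, understanding the question and finding the right tables, is exactly what this sketch skips.

```shell
# Toy sketch of steps a) to d). The data is made up for illustration
# only: one "name|nationality" entry per line.
astronauts() {
  cat <<'EOF'
Astronaut A|American
Astronaut B|Russian
Astronaut C|German
Tom Russian|Japanese
Astronaut E|Canadian
EOF
}

# Step c): compare the nationality column only, so a person whose
# *name* looks like a nationality ("Tom Russian") is not dropped.
non_us_ru() {
  awk -F'|' '$2 != "American" && $2 != "Russian" { print $1 }'
}

astronauts | non_us_ru            # the list of names
astronauts | non_us_ru | wc -l    # step d): "how many" means counting them
```

Filtering on a column instead of grepping the whole line is what keeps step c) honest.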

Answering math questions is not what Powerset attempts to achieve. (If they do, then they really found the holy grail!) Basically search engines are 'only' a more user-friendly interface for common "a and b and (c or d or e)"-style search requests, they are not supposed to do math homework for you. Unless somebody wrote a web page about the exact same question using similar words, these questions won't work. You will have to search for "list of ISS astronauts and their nationalities" or something and count them yourself -- sorry.
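For the "a and b and (c or d or e)" part, here is a minimal sketch using grep over a directory of text files. The function name and the one-file-per-page layout are assumptions of mine; a real search engine would look terms up in a precomputed index instead of scanning every file for every query.

```shell
# boolean_search DIR A B 'C|D|E'
# Prints the files in DIR that contain word A AND word B AND at least
# one of the alternatives C, D, E (case-insensitive, whole words only).
boolean_search() {
  dir=$1 a=$2 b=$3 alts=$4
  # Start with the files matching A, then narrow down with B and (C|D|E).
  grep -lwi -- "$a" "$dir"/* 2>/dev/null |
    while IFS= read -r f; do
      if grep -qwi -- "$b" "$f" && grep -qwiE -- "$alts" "$f"; then
        printf '%s\n' "$f"
      fi
    done
}

# Usage: three tiny hypothetical "web pages" in a temporary directory.
docs=$(mktemp -d)
printf 'List of ISS astronauts and their nationalities\n' > "$docs/page1"
printf 'ISS astronauts eat freeze-dried food\n'           > "$docs/page2"
printf 'List of ISS missions\n'                           > "$docs/page3"
boolean_search "$docs" iss astronauts 'list|table|nationalities'   # prints only .../page1
rm -r "$docs"
```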

But there is still enough work to be done to make normal fact searching more intuitive, and that is more likely what Powerset are up to. They may try to use implicit context: You could type in "Kubiak eat now", and the search would fill in your city (i.e. the city through which your provider connects you) as the default location and list fast-food chains. Search results could also be clustered and labeled better: Are they news articles (presumably more reliable), blog entries or reviews (if you are searching for opinions)? Are they privately hosted, on company domains or on university sites? What media are they, and how old? If the user's browser is set to German but she searches for an English word, would she be interested in German results about the same topic too? Etc. I hope they hired a lot of user interface designers.

Ooh, Powerset even have job openings for computational linguists, man, I haven't seen that for a while. Well, I already have a job at the moment, thank you ;-), so I will have to wait until the end of the year to see the first prototype. \*Sigh\* Come on, Powerset, give us a Beta! Some Google apps have been in beta for how long? Since, like, the last millennium? See!

Uh-oh. Speaking of Google. I feel a disturbance in the force... Google just published their corpus of n-grams? Free. Gzipped. On 6 DVDs. Woah. \*Waves at Thorsten Brants from Saarbrücken!\*

Of course this does not mean that Google is releasing their n-grams in response to Powerset's announcement because, dunno, Powerset's new method will now be the end of n-gram usage or something. Obviously, Google did not publish the database column that says which web page each n-gram was found on (data that has to be refreshed regularly anyway)... :-P

If you don't know what an n-gram is: It's just a sequence of n words as it typically appears in text, and n-gram collections list such sequences sorted by frequency. Building one involves tokenizing loads and loads of text, amounts of text that can be found for free on the internet.

For instance, a 3-gram (trigram) is a typical sequence of three words torn out of context ("I am a, I am just, I am here, I am the, I am not" etc). So, if you have collected lots and lots of different trigrams ("not completed but, not count on, not belong to", or "to do it, to go away, to continue his" etc) and sorted them by frequency, then! \*drumroll\* You can calculate the statistically most probable English sentence! Which would look like (I am making an example up here) "I am not belong to continue his words are the one of the president of stupid like a big piece of all the things are no doubt that he is another step to..." Wahahahaha! :-D ... :-| Okay, maybe applied linguistics jokes are only funny for computational linguists.

Anyway, the real use case: Alphabetically sorted n-grams, annotated with the web pages they came from, speed up the search process significantly for a search engine provider (because jumping to a position in the alphabet is much faster than doing a full-text search over and over again).
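A minimal sketch of the counting step with standard Unix tools: lowercase the text, split it into one word per line, glue every three neighbouring words into a trigram, then sort by frequency. The tokenization is deliberately naive (letters and apostrophes only), it ignores sentence boundaries (which real n-gram collections usually mark), and the function name is made up.

```shell
# trigrams: read text on STDIN, print "count w1 w2 w3", most frequent first.
trigrams() {
  tr '[:upper:]' '[:lower:]' |
    tr -cs "[:alpha:]'" '\n' |       # one word per line; anything else separates words
    awk 'NF { w[++n] = $0 }          # collect the words, skipping empty lines
         END { for (i = 1; i + 2 <= n; i++) print w[i], w[i+1], w[i+2] }' |
    sort | uniq -c | sort -rn
}

echo "I am not here. I am not there. I am here." | trigrams
# first line of the output: "2 i am not" (every other trigram occurs once)
```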

Another useful thing you can do with Google's n-grams, even without the URLs of the pages they came from, is training speech recognition systems: If the system can't tell whether you said "Oh painter knew him tivo'ed phile ending sword mine amen sedate" or "Open a new empty word file and insert my name and the date", a quick n-gram frequency comparison tells it: The second interpretation is a bit more likely to occur in the English language. Now, aren't you glad n-grams were invented and you can get them for free? :-)
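A toy version of that comparison in shell and awk: count word pairs (bigrams, to keep it short) in a reference text, then score each candidate transcription by adding up log(1 + count) over its bigrams. The tiny corpus, the scoring formula and the function name are my own simplifications; real recognizers use huge counts tables and properly smoothed probabilities.

```shell
# score_candidates CORPUS_FILE < candidates
# Prints "score<TAB>candidate", best-scoring candidate first.
score_candidates() {
  LC_ALL=C awk '
    { $0 = tolower($0) }                  # case-insensitive matching
    NR == FNR {                           # first input: the corpus file
      for (i = 1; i < NF; i++) count[$i " " $(i + 1)]++
      next
    }
    {                                     # second input (STDIN): the candidates
      s = 0
      for (i = 1; i < NF; i++) s += log(1 + count[$i " " $(i + 1)])
      printf "%.3f\t%s\n", s, $0
    }
  ' "$1" - | LC_ALL=C sort -rn
}

corpus=$(mktemp)
cat > "$corpus" <<'EOF'
open a new file
insert my name and the date
open the word file
a new empty page
EOF

# The "open a new empty word file ..." reading comes out on top;
# the nonsense reading scores 0 because none of its bigrams occur.
score_candidates "$corpus" <<'EOF'
oh painter knew him tivo'ed phile ending sword mine amen sedate
open a new empty word file and insert my name and the date
EOF
rm -f "$corpus"
```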

Monday Oct 09, 2006

Userfriendliness Is for Wimps

I jotted down some notes while getting the hang of one of the 3D Mesh Editors I checked out, Blender. Blender is cool: It runs on Windows, Linux and MacOS, has a huge selection of professional features (incl. animation, scripting, UV skins, and skeleton kinematics), supports a dozen common formats (incl. .obj, .x, .md2, .lwo), and it is free.

The catch? Blender's non-existent user interface makes you wish you could use vi instead. :-(( If a good user interface is a dialog, then Blender is a monologue, and it speaks a very strange dialect. My notes, a kind of Blender for Beginners, list the top 10 things that made me go "Gaaah! Nobody told me!" after two Sundays of trying to get the hang of it. Be warned, there may be some sarcasm ahead.

If these top 10 revelations did not scare you away from Blender for good, there's a wikibook called Blender 3D: From Newb to Pro that will answer a lot more questions (such as, which combination of the three million levers do I have to pull to make glass and water transparent?). Have... fun. :-)

Wednesday Aug 09, 2006

The Perfect Trap to Catch a Newline

Hah, I need to write this down before I forget it again... One of the files I was editing was created on Windows: All the HTML was displayed in one long line with lots of \^M's instead of newlines -- pretty annoying when you just want to make some quick fixes with a simple Linux text editor.

The easiest way in Linux to replace one character (or set of characters) with another is the tr command. Usually, you give tr two arguments: the character to replace and the replacement. If tr gives you an error, note that it reads its input from STDIN, so don't give it a file name as an argument. And make sure the output file has a different name than the input file, otherwise the file will go all Ouroboros on you and eat itself.

But to search for a special character like \^M, you can't just type a \^ and an M; you need heavier ordnance. The \^ thingy is shaped like an upside-down v, and that was what finally reminded me how to do it: You press ctrl-v and then hit return. It will come out as \^M. Don't ask me why, it's magic. It also works with tab and other stuff. So the following line will replace all alien \^M characters with good and friendly \\n characters:

tr '\^M' '\\n' < file.html > file2.html # press ctrl-v return for \^M

The Wikipedia entry on newlines claims I could have searched for \\r instead of \^M, or used dos2unix if it's installed -- but oh well, next time. At least my trick works for any control character I want to type, not just "return", and tr is always installed.
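For the record, a sketch of the escape-sequence version: POSIX tr understands backslash escapes such as \\r and \\n, so no ctrl-v is needed. Which variant you want depends on the line endings. If the lines are separated only by carriage returns (one long line full of \^M), translate them into newlines; if the file has Windows-style CR LF pairs, delete the carriage returns instead, otherwise every line break turns into an empty line. The two function names are just for illustration.

```shell
# Lines separated by CR only: turn each carriage return into a newline.
cr_to_lf()   { tr '\r' '\n'; }

# Windows-style CR LF: drop the carriage returns, keep the newlines.
crlf_to_lf() { tr -d '\r'; }

# Same rule as with the ctrl-v version: read from STDIN and write to a
# *different* file, e.g.
#   crlf_to_lf < file.html > file2.html
printf 'line one\r\nline two\r\n' | crlf_to_lf
```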

What about other options? In emacs, I'd press esc shift-5 for the search&replace "dialog", and when I replace some crazy special character, I usually just cowardly select it with the mouse while no one's looking and paste it into the dialog by pressing the middle mouse button. Yes, generally this also works for newlines: You select from after the last character of a line to before the first character of the next line, and it will select the (invisible) newline character. (Copying and pasting newline characters works on MacOS too; only on Windows did I never succeed with this trick.) Unfortunately, emacs somehow couldn't generate a "normal" newline character as the replacement in this context. And replacing \^M with \\n in vi only gave me even crazier \^@ thingies. Hm, seems I'll stick with the shell then.

Unless anyone has a better solution. Feel free to leave a comment.

Tuesday Aug 01, 2006

Mind Hacks (they work)

Recently, I tried this 'Mind Hack' that lets you see better in the dark: I was fetching a drink from the fridge at night, and you know the problem: The moment you switch on the light (or the fridge does), your eyes adapt to it. When you go back and try to find your way through the dark, you are virtually blind.

Here comes the Hack: When you switch on the lights, you keep one eye closed. Then you fetch your drink or whatever it was. Then, when you walk back through the dark, you close the eye that's adapted to the light, and use the other one that's still adapted to darkness. I was too tired to find a mirror to see whether the eyes looked any different in that state. o_O But well, it worked! Fascinating.

Then yesterday, I was trying to remember a few things for today, so I tried a visual mnemonic: I pictured myself in the situation where I should remember something. For example, right after logging on to Linux, I should chmod u+x .xsession because I forgot to do that when I created the file. [No, I don't know why I wake up in the middle of the night thinking "I need to chmod .xsession!", but hey, if it helps get my system set up, I'm not complaining.] Okay, perfect, I managed to remember that one today pretty well.

But before logging on, when I switched on my computer... there was something else... I clearly remembered that I wanted to use the time it takes to boot to prepare something... I think in the kitchen... But what? Not tea... Well, so much for that visual mnemonic crap! :-/

Alright, I thought, scratch that, and started working. I checked my mail and read some stuff. A while later, I am still reading stuff and come across a line that says something about visual web development. Suddenly I think: "I wanted to put the juice I brought into the fridge!" That was it! But what reminded me? I remembered it exactly when I read the word "visual"... :-o So that visual crap does work! Just in unexpected ways.


NetBeans IDE, Java SE and ME, 3D Games, Linux, Mac, Cocoa, Prague, Linguistics.

