Wednesday Mar 27, 2013

Real-Time Topic Modeling of Microblogs

A new article on the front page of otn/java, “Real-Time Topic Modeling of Microblogs,” by Yogesh Tewari and Rajesh Kawad of Infosys Limited Labs in Bangalore, India, explores “the challenge of real-time extraction of topics from a continuous stream of incoming microblogs or tweets that are particular to an application” that they created. Given only the text of a tweet, the application is designed to accurately suggest relevant topics discussed in the tweet and to provide real-time timelines of the topics generated from the tweet streams.

They explain that this is no simple task, since a tweet, “considered as a text corpus, contains only 140 characters and second, given their brevity, tweets may not provide useful information and may contain different forms of text such as ‘smileys’ and short-form URLs. Finally, tweets are generated in real time.”
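
To make the noise problem concrete, here is a minimal sketch of the kind of cleanup a tweet might need before topic modeling. It is not taken from the article: the class name TweetCleaner, the regular expressions, and the rule of dropping one-character tokens are all assumptions chosen for illustration, and the smiley pattern covers only a few common ASCII forms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper (not from the article): turns a raw tweet into lowercase tokens.
final class TweetCleaner {
    // http/https links, including short-form URLs.
    private static final Pattern URL = Pattern.compile("https?://\\S+");
    // A few common ASCII smileys such as :) ;-) :D :( (illustrative, not exhaustive).
    private static final Pattern SMILEY = Pattern.compile("[:;=]-?[)(DPp]");
    // Anything that is not a letter, digit, '#', '@' or whitespace.
    private static final Pattern NOISE = Pattern.compile("[^\\p{L}\\p{N}#@\\s]");

    static List<String> tokenize(String tweet) {
        // Order matters: strip URLs first so "http://" is not mistaken for a smiley.
        String text = URL.matcher(tweet).replaceAll(" ");
        text = SMILEY.matcher(text).replaceAll(" ");
        text = NOISE.matcher(text).replaceAll(" ");
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            // Assumption: one-character tokens carry no topical information.
            if (t.length() > 1) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [loving, #java, lambdas]
        System.out.println(tokenize("Loving #Java 8 lambdas :) http://t.co/abc123"));
    }
}
```

Hashtags and mentions are kept as tokens here because they often carry the topic; a real pipeline might treat them differently.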

Yogesh and Rajesh apply latent Dirichlet allocation (LDA) to model the topics of tweets and use the Machine Learning for Language Toolkit (MALLET) API as their LDA implementation, all in a Java environment. MALLET, which is normally used as a command line–based Java tool, encapsulates the LDA implementation.
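
For readers unfamiliar with LDA, the following is a minimal sketch of the idea using collapsed Gibbs sampling in plain Java. It is a stand-in for illustration only and is not MALLET's implementation or the authors' code: the class name TinyLda, the tiny corpus of word ids, and the hyperparameter values are all assumptions. Each token is repeatedly reassigned to a topic with probability proportional to how common that topic is in its document and how common the word is in that topic.

```java
import java.util.Random;

// Hypothetical stand-in for illustration; the article relies on MALLET's LDA classes instead.
final class TinyLda {
    final int numTopics;
    final int vocabSize;
    final double alpha;       // document-topic smoothing
    final double beta;        // topic-word smoothing
    final int[][] docs;       // docs[d][i] = word id of the i-th token in document d
    final int[][] z;          // z[d][i] = topic currently assigned to that token
    final int[][] docTopic;   // docTopic[d][k] = tokens of document d assigned to topic k
    final int[][] topicWord;  // topicWord[k][w] = times word w is assigned to topic k
    final int[] topicTotal;   // topicTotal[k] = tokens assigned to topic k overall
    private final Random rng;

    TinyLda(int[][] docs, int vocabSize, int numTopics, double alpha, double beta, long seed) {
        this.docs = docs;
        this.vocabSize = vocabSize;
        this.numTopics = numTopics;
        this.alpha = alpha;
        this.beta = beta;
        this.z = new int[docs.length][];
        this.docTopic = new int[docs.length][numTopics];
        this.topicWord = new int[numTopics][vocabSize];
        this.topicTotal = new int[numTopics];
        this.rng = new Random(seed);
        // Start from a random topic assignment for every token.
        for (int d = 0; d < docs.length; d++) {
            z[d] = new int[docs[d].length];
            for (int i = 0; i < docs[d].length; i++) {
                z[d][i] = rng.nextInt(numTopics);
                count(d, i, 1);
            }
        }
    }

    // Adds (delta = 1) or removes (delta = -1) token i of document d from the count tables.
    private void count(int d, int i, int delta) {
        int k = z[d][i];
        int w = docs[d][i];
        docTopic[d][k] += delta;
        topicWord[k][w] += delta;
        topicTotal[k] += delta;
    }

    void run(int iterations) {
        double[] p = new double[numTopics];
        for (int it = 0; it < iterations; it++) {
            for (int d = 0; d < docs.length; d++) {
                for (int i = 0; i < docs[d].length; i++) {
                    count(d, i, -1); // exclude the token being resampled
                    int w = docs[d][i];
                    double sum = 0.0;
                    for (int k = 0; k < numTopics; k++) {
                        p[k] = (docTopic[d][k] + alpha) * (topicWord[k][w] + beta)
                                / (topicTotal[k] + vocabSize * beta);
                        sum += p[k];
                    }
                    // Draw a topic in proportion to the unnormalized weights p.
                    double u = rng.nextDouble() * sum;
                    int k = 0;
                    double acc = p[0];
                    while (acc < u && k < numTopics - 1) {
                        k++;
                        acc += p[k];
                    }
                    z[d][i] = k;
                    count(d, i, 1);
                }
            }
        }
    }

    // Index of the topic with the most tokens in document d.
    int dominantTopic(int d) {
        int best = 0;
        for (int k = 1; k < numTopics; k++) {
            if (docTopic[d][k] > docTopic[d][best]) {
                best = k;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up corpus: word ids 0-2 and 3-5 stand for two unrelated vocabularies.
        int[][] docs = { {0, 1, 2, 0, 1}, {3, 4, 5, 3, 4}, {0, 2, 1, 2}, {5, 4, 3, 5} };
        TinyLda lda = new TinyLda(docs, 6, 2, 0.1, 0.01, 42L);
        lda.run(200);
        for (int d = 0; d < docs.length; d++) {
            System.out.println("doc " + d + " -> topic " + lda.dominantTopic(d));
        }
    }
}
```

Because the sampler is randomized, topic numbers are arbitrary labels, and on texts as short as tweets the assignments can be unstable, which is part of the difficulty the authors describe.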

As they state: “Our targets are the actual Java classes that perform the LDA logic whose methods we invoke with required input in real-time. Storm is our choice of a free and open source distributed real time computation engine implemented in Java and running in a distributed mode. Storm is highly scalable and easily capable of handling incoming tweet streams. We use Twitter4J to stream tweets, which require valid Twitter authentication. So our task is to design a topology that will consume tweet streams and output a timeline of topics.”
Check out the article here.
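
The last step the authors mention, turning a stream of per-tweet topics into a timeline, can be sketched in plain Java as follows. This is only an illustration under stated assumptions: it does not use Storm or Twitter4J, the class name TopicTimeline and the fixed-width time buckets are hypothetical, and in the authors' design this kind of aggregation would presumably live inside their Storm topology.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical aggregator (not from the article): counts topics per fixed-width time bucket.
final class TopicTimeline {
    private final long bucketMillis;
    // bucket start time -> (topic -> count); TreeMaps keep both levels sorted.
    private final TreeMap<Long, Map<String, Integer>> buckets = new TreeMap<>();

    TopicTimeline(long bucketMillis) {
        if (bucketMillis <= 0) {
            throw new IllegalArgumentException("bucketMillis must be positive");
        }
        this.bucketMillis = bucketMillis;
    }

    // Records that a tweet seen at timestampMillis was assigned the given topic.
    void record(long timestampMillis, String topic) {
        long start = Math.floorDiv(timestampMillis, bucketMillis) * bucketMillis;
        buckets.computeIfAbsent(start, s -> new TreeMap<>()).merge(topic, 1, Integer::sum);
    }

    // Most frequent topic in the bucket starting at bucketStart, or null if the bucket is empty.
    String topTopic(long bucketStart) {
        Map<String, Integer> counts = buckets.get(bucketStart);
        if (counts == null) {
            return null;
        }
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) { // ties go to the alphabetically first topic
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    // Read-only view of the whole timeline, ordered by bucket start time.
    Map<Long, Map<String, Integer>> timeline() {
        return java.util.Collections.unmodifiableMap(buckets);
    }

    public static void main(String[] args) {
        TopicTimeline t = new TopicTimeline(60_000L); // one-minute buckets
        t.record(1_000L, "java");
        t.record(30_000L, "storm");
        t.record(59_999L, "java");
        t.record(60_000L, "storm");
        // prints {0={java=2, storm=1}, 60000={storm=1}}
        System.out.println(t.timeline());
    }
}
```

The sketch is single-threaded; a distributed engine would additionally have to merge partial counts from several workers.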

Devoxx UK Highlights

The London Java community put on a smashing first Java conference, coloured with local flavour. It opened on a note of patriotic fun: the national anthem played while a picture of the Queen was displayed with the quote "One likes to code". The participants stood politely, but did not sing.

The keynote, titled "The Programmer", was all about the act of programming, with insights into who programmers are and tips for getting better at it. Kevlin Henney received applause and cheers when he stated: "We didn't get into programming because we wanted to deliver business value. That's what we say during interviews." His knowledgeable presentation, backed up with research, was spot on, and it's worth any developer's time to watch the replay on Parleys.

Oh, and do "mind the gap" between the train and platform, as the station minders admonish us nonstop. Mind the Geek, too: that is the clever tagline of the conference. But, if I don't mind the geek, what do I risk? Broken code, twisted error messages, a memory leak or worse, I'm sure. Let us all mind our inner geeks, then?

With Devoxx UK, the number of Devoxxians will reach 5,500 across Europe this year. The hands-on labs, talks, quickies, birds-of-a-feather sessions and bash run from 9:30am to 10:00pm in the spacious Business Design Centre with its mezzanine. Seventy-five speakers present 50 sessions in 7 tracks: cloud, Java SE, methodologies, Java EE, web & big data, new languages on the JVM, and future Devoxx.

As if you didn't know, the French have already got a Holy Grail, and so refused to assist King Arthur and his Kiniggits in their quest. That was then. Now the Brits have borrowed the Grail from the French for the two-day conference and will return it for the beginning of Devoxx France, which starts tomorrow.
