
Sentiment Analysis and the Semantics of Spin

Sentiment analysis (how we feel based on what we say or write) is a branch (in my mind at least) of entity extraction, business intelligence and semantic web capabilities. Because most sentiment is expressed in unstructured information (speeches, documents, conversations, chats, etc.), it is very useful to be able to determine the sentiment of the communicator when we are consuming information. As humans we do this almost unconsciously. We can tell when someone is being sarcastic, or is pissed off, or is just letting us know about something.

The current political season in the US is rife with unstructured information full of sentiment. Our Canadian neighbors are taking full advantage of the season's speeches to test algorithms that parse speeches for "spin". Much of the research is taking place at Queen's University in Kingston, Ontario, Canada. The researchers there put their algorithm to the test against the speeches at the Democratic and Republican conventions and tested for spin. The graphic below lists the results.
semanticSpinAnalysis.jpg
You can read more about their methodology and how their mathematical system works in a neat article in New Scientist.

Comments (4)

Michael Feldstein:

Um...reading the article, I would say the approach is problematic to say the least. For example, it would rate the sentence, "I know I am going to hate this chalupa" as having a very high spin content. Likewise, "I want to kill this bad appropriations bill" would have a high spin content, while "We are the champions, my friends" would be considered un-spinful.

Michael Feldstein:

Oops! I misread slightly, so permit me to revise and extend. "We should oppose this bad appropriations bill" would be high spin, while "I am a champion of the common man" would not be. The general point still holds.

bex:

I call bullshit...

The algorithm counts usage of first person nouns - "I" tends to indicate less spin than "we", for example. It also searches out phrases that offer qualifications or clarifications of more general statements, since speeches that contain few such amendments tend to be high on spin.

So... telling tall tales about your personal past is not spin, but saying what your family did last week is spin.

If an argument is nuanced, this algorithm will give it the same spin score whether it was explained clearly or obfuscated. And both would score as having more "spin" than a painful and false oversimplification of a complex problem... such as "the fundamentals of our economy are strong."

What about framing? What about flat-out lying? Speeches have been pretty heavy in both of those, and yet they would rate low in the "spin" category because lies and frames are stated as "fact."

Any algorithm that puts Hillary's spin lower than Bill's or Obama's is pretty sloppy, if you ask me...

Guys, calling bullshit on the methodology is a bit hasty, don't you think? Skillicorn is widely published and peer reviewed in some pretty heady places (bibliography here: http://www.informatik.uni-trier.de/~ley/db/indices/a-tree/s/Skillicorn:David_B=.html). While the New Scientist article doesn't plumb the depths of the math behind the analysis, I do strongly suspect some overgeneralization on their part to make the article digestible for us humans.

What the article does do is point to trends in speech patterns that correlate with other "tells" such as psychology, facial analysis, and body language. It also points out how what they call "spin" may really be a proxy for emphasis or even empathy.

My point is that computationally derived evidence of intent, purpose, or meaning is a step forward for semantic and NLP technology. Think what this algorithm could do if applied to customer feedback in real time (phone, face to face, or web form submission). It could help a business identify the next best course of action to take with the customer. Offer them a discount if they're pissed, or upsell them if they're pleased.

The point is that this isn't some quack shilling snake oil in the back alley. Law enforcement, counterterrorism, and high tech are all working on similar paths, as this abstract from a 2006 high-tech conference illustrates:

In intelligence, law enforcement, and, increasingly, organizational settings there is interest in detecting deception; for example, in intercepted phone calls, emails, and web sites. Humans are not naturally good at detecting deception, but recent work has shown that deception is actually readily detectable - using markers that humans don't see but which software can readily compute. Pennebaker's model suggests that deceptive communication is characterized by changes in the frequency of four kinds of words: first-person pronouns, exception words, negative emotion words, and action words. We investigate what can be learned about the deception model by applying it to a large corpus of Enron emails. We show that each of the four kinds of words in the Pennebaker model acts as a separate latent factor for deception, rather than having their effects mixed together. (cite: http://portal.acm.org/citation.cfm?doid=1188966.1189005)
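
For anyone wondering what "markers that software can readily compute" might look like, here is a rough sketch in Python of counting the four kinds of words the abstract names. The word lists, the MARKERS table and the marker_frequencies function are all made up for illustration; they are not the lexicon or the code used in the cited work, and the sketch only measures frequencies. It makes no attempt to turn them into a deception or spin score.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny word lists, for illustration only.
# They are NOT the lexicon used by Pennebaker or in the cited paper.
MARKERS = {
    "first_person": {"i", "me", "my", "mine", "myself"},
    "exception": {"but", "except", "without", "unless", "however"},
    "negative_emotion": {"hate", "angry", "afraid", "worthless", "bad"},
    "action": {"go", "going", "take", "run", "move"},
}


def marker_frequencies(text):
    """Return, per category, the share of tokens in `text` found in that word list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    counts = Counter()
    for token in tokens:
        for category, words in MARKERS.items():
            if token in words:
                counts[category] += 1
    # Empty input yields 0.0 for every category rather than dividing by zero.
    return {
        category: (counts[category] / total if total else 0.0)
        for category in MARKERS
    }


if __name__ == "__main__":
    sample = "I know I am going to hate this chalupa"
    for category, share in marker_frequencies(sample).items():
        print(f"{category}: {share:.3f}")
```

A real system would need much larger, validated word lists, and it would compare these frequencies against a baseline for the same speaker or corpus, since the abstract talks about changes in frequency rather than raw counts.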
