Following the first blog in this series, in which we take data sets from "less typical" sources and analyze them with Oracle Data Visualization to unearth powerful insights, we move from comic books to traditional literature - and in particular, the best-selling author of all time. With April 23 marking both William Shakespeare’s birthday and the anniversary of his death, we turn our attention to his plays. We found some open source data, and Ismail Syed, Oracle UK intern, did the analysis and visualizations.
Who has the most lines?
We felt a great place to start was with who has the most lines, to see if there were any surprises. And there was a fairly big one.
Rather than being one of the better-known heroes or antiheroes, the individual with the most lines across Shakespeare’s complete works (according to our data set) is Falstaff, who appears in Henry IV Part I, Henry IV Part II and The Merry Wives of Windsor.
It is also interesting that the fourth most prominent speaker is the Duke of Gloucester, even though he appears in the most plays. Featuring as a minor character in both Henry IV Part II and Henry V, and as a major character in Henry VI Part I and Henry VI Part II, he is a representation of Humphrey, Duke of Gloucester: son of Henry IV, brother of Henry V and uncle of Henry VI. There’s also a Gloucester in King Lear (an earl rather than a duke), who is an entirely fictitious creation and so is better seen as a different character. If his lines are added to the total, Gloucester’s prominence jumps much higher, all the way to first place!
Another noteworthy point is how much airtime Shakespeare gives the 'Clown' in his plays: in total, these characters have the 13th most lines. That is nowhere near the most, but given that they tend to be brought in as light relief for the audience between the action, it shows that, even amidst some of the biggest tragedies, the clowns are given a voice to cheer things up a bit.
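For readers who want to try a similar count outside a visualization tool, here is a minimal Python sketch. It is not the workflow used for the charts in this post. It assumes the data can be reduced to one record per spoken line with a play title and a speaker name; that layout, the sample rows and the function names are all made up for illustration.

```python
from collections import Counter

# Made-up sample rows: (play, speaker, line_text).
# The real data set's layout is not described in this post, so this shape is assumed.
rows = [
    ("Henry IV Part I", "FALSTAFF", "placeholder line 1"),
    ("Henry IV Part I", "FALSTAFF", "placeholder line 2"),
    ("The Merry Wives of Windsor", "FALSTAFF", "placeholder line 3"),
    ("Hamlet", "HAMLET", "placeholder line 4"),
    ("Hamlet", "HAMLET", "placeholder line 5"),
    ("King Lear", "GLOUCESTER", "placeholder line 6"),
]


def lines_per_speaker(rows):
    """Count spoken lines per speaker name across all plays."""
    return Counter(speaker for _play, speaker, _text in rows)


def plays_per_speaker(rows):
    """Count how many distinct plays each speaker name appears in."""
    plays = {}
    for play, speaker, _text in rows:
        plays.setdefault(speaker, set()).add(play)
    return {speaker: len(titles) for speaker, titles in plays.items()}


totals = lines_per_speaker(rows)
appearances = plays_per_speaker(rows)

# Rank speakers by total lines, showing how many plays each name spans.
for speaker, count in totals.most_common():
    print(f"{speaker}: {count} lines in {appearances[speaker]} play(s)")
```

Note that grouping by name alone merges every character who shares that name, which is exactly the Gloucester question raised above. Grouping by the (play, speaker) pair instead would keep them apart.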
Which play has the most lines?
Hamlet has the most lines of any character within a single play, but that’s to be expected given that Hamlet is also Shakespeare’s longest play. The second-longest is the lesser-known Cymbeline, while the shortest of his plays is The Comedy of Errors, coming in at only 1,787 lines, compared to the Elizabethan average of around 3,000.
As you might expect, Shakespeare’s plays in general fit in with the average length of plays for the time, perhaps because he contributed so many to the market. We found the average length of his plays to be 2,751 lines. And, as you can see from the graph, apart from The Comedy of Errors, none veers too far from the average.
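The comparison against the average can be sketched in a few lines of Python. This is an illustration rather than the analysis behind the graph: the per-play figures in `sample` are invented, and only the last calculation uses numbers quoted in this post (1,787 lines for the shortest play and 2,751 for the average).

```python
from statistics import mean


def average_length(lengths):
    """Mean number of lines across a {play: line_count} mapping."""
    return mean(lengths.values())


def deviation_from_average(lengths):
    """Relative distance of each play from the mean (0.25 means 25% above)."""
    avg = average_length(lengths)
    return {play: (count - avg) / avg for play, count in lengths.items()}


# Invented line counts, used only to show the calculation.
sample = {"Play A": 3000, "Play B": 2400, "Play C": 1800}
avg = average_length(sample)
deviations = deviation_from_average(sample)

# Figures quoted in the post: the shortest play against the overall average.
shortest, overall_average = 1787, 2751
gap = (shortest - overall_average) / overall_average
print(f"Shortest play vs. average: {gap:.0%}")  # -35%
```

On the quoted figures, the shortest play comes in roughly a third below the average.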
Words past vs words present
We wanted to see which words were the most used in Shakespeare's plays, and then compare them to the words that are most used today. We did the latter by analyzing the most popular words in use on the internet today. Beyond the expected words like 'the', 'and' or 'or', you can see that 'thee' is one of the most popular words in Shakespeare, which might be expected. But ‘electronic’ also appears prominently in the word cloud, which is interesting given how few electronics were around at the time. Naturally, when you look at the popular words from the present day, far more of them involve computers, the internet or other online terms.
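A word cloud is essentially a picture of a word-frequency count, so the underlying tally can be sketched in Python. This is a minimal example under stated assumptions, not the method used for the word clouds here: the stop-word list, the tokenizing rule and the sample sentence are all invented for illustration.

```python
import re
from collections import Counter

# Assumed stop-word list: common words to leave out of the ranking.
STOP_WORDS = {"the", "and", "or", "a", "of", "to"}


def top_words(text, n=3, stop_words=STOP_WORDS):
    """Return the n most frequent words in text, ignoring case and stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(word for word in words if word not in stop_words)
    return counts.most_common(n)


# Invented sentence, not a quotation from the plays.
sample = "I love thee and thee alone, or so the poet told thee"
print(top_words(sample, 1))  # [('thee', 3)]
```

Running the same function over a present-day text and comparing the two rankings gives the past-versus-present contrast described above.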
Be more data driven
So as you can see, data visualization can surface some interesting insights. Here we’ve shown how much language has changed in the ~450 years since William Shakespeare was born, but also how prominent different characters are.
With today’s volume of data coming from such a variety of sources, business leaders have never faced greater pressure to be data-driven in their strategy and execution. Visual analytics tools offer organizations a tremendous advantage in this regard.
Should you be a frequent data user and want to move on from wrangling data purely in spreadsheets, you should consider Oracle Data Visualization. Why not find out more, view a short demo and sign up for a trial?
To become fully data-driven, your business needs more characters in your organization to explore their data and gain better insights faster. Any data—anytime, anywhere.