Today I want to show how we can read data from Oracle Streaming Service (OSS) using OCI Data Science.
The OCI Streaming service is mostly API-compatible with Apache Kafka, so you can use the Kafka APIs to produce to and consume from it. Based on that, we are going to use the kafka-python library.
To connect with Twitter we are going to use the Tweepy API.
Note: Because OSS is Kafka-compatible and we use a Kafka API, you can easily switch to a regular Kafka broker later.
Data Science Environment
First, your tenancy needs some configuration for Data Science; the OCI Data Science documentation describes the required policies and networking.
The Data Science environment also needs access to the internet, and for this you need to configure a NAT Gateway.
Note: I suggest you follow the "getting-started.ipynb" notebook first to validate your environment. If you get an "HTTPSConnectionPool" error, it means there is a problem in your NAT Gateway configuration.
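If you want a quick check that does not depend on the notebook, a small sketch like the one below can help. It only tests whether an outbound HTTPS request gets any answer at all. The function name `can_reach` and the `opener` parameter are my own (the parameter is there just to make the function easy to test), and the target URL is whatever public endpoint you choose.

```python
import urllib.error
import urllib.request


def can_reach(url, timeout=5, opener=urllib.request.urlopen):
    """Return True if an HTTP(S) request to `url` gets any response."""
    try:
        with opener(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered (even with a 4xx/5xx), so the network path works.
        return True
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection or timeout: what you would typically
        # see when the notebook subnet has no working route to the internet.
        return False


# Example call (the URL is only an illustration):
# can_reach("https://pypi.org")
```

A `False` here points at networking (NAT Gateway, route table, security rules) rather than at Tweepy or Kafka.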
Open the Data Science terminal and install the two libraries:
pip install tweepy
pip install kafka-python
OSS Environment
For OSS, you only need to take care of authentication, and for this you need:
Producer - Application that sends the messages.
Consumer - Application that receives the messages.
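As a rough sketch of how the connection settings could look with kafka-python: the keyword arguments (`bootstrap_servers`, `security_protocol`, `sasl_mechanism`, `sasl_plain_username`, `sasl_plain_password`) are real kafka-python options, but the endpoint pattern and the `tenancy/user/stream pool OCID` username format are written from my understanding of OSS, so please copy the actual values from the Kafka connection settings of your stream pool. The helper names and the `tweets-reader` group id are hypothetical.

```python
def oss_kafka_settings(region, tenancy, user, stream_pool_ocid, auth_token):
    """Build kafka-python keyword arguments for the OSS Kafka-compatible endpoint."""
    return {
        # Assumed endpoint pattern; check it against your stream pool.
        "bootstrap_servers": f"cell-1.streaming.{region}.oci.oraclecloud.com:9092",
        "security_protocol": "SASL_SSL",
        "sasl_mechanism": "PLAIN",
        # Assumed username format: tenancy name / user name / stream pool OCID.
        "sasl_plain_username": f"{tenancy}/{user}/{stream_pool_ocid}",
        # The password is an auth token generated for the user.
        "sasl_plain_password": auth_token,
    }


def make_producer(settings):
    """Create the application that sends the messages."""
    from kafka import KafkaProducer  # lazy import; needs kafka-python installed
    return KafkaProducer(**settings)


def make_consumer(topic, settings, group_id="tweets-reader"):
    """Create the application that receives the messages."""
    from kafka import KafkaConsumer  # lazy import; needs kafka-python installed
    return KafkaConsumer(
        topic, group_id=group_id, auto_offset_reset="earliest", **settings
    )
```

Keeping the settings in one dictionary is what makes the switch to a regular Kafka broker cheap: in the simplest case you would pass only `{"bootstrap_servers": "host:9092"}` and leave the rest of the code untouched.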
To connect to Twitter you will need API credentials from a Twitter developer account.
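To give an idea of the producer side, here is a minimal sketch of the function that pushes one tweet into the stream. It is not the full Tweepy wiring: Tweepy's streaming classes changed between major versions (in Tweepy 3.x this function would be called from `StreamListener.on_data`), so I leave that part out. The name `forward_tweet` is hypothetical; `producer` is assumed to be a kafka-python `KafkaProducer`, or anything with the same `send(topic, key=..., value=...)` method.

```python
import json


def forward_tweet(raw_data, producer, topic):
    """Send one raw tweet (a JSON string) to the stream; return True if sent."""
    try:
        tweet = json.loads(raw_data)
    except (TypeError, ValueError):
        return False  # keep-alive blank line or malformed payload
    if not isinstance(tweet, dict) or "text" not in tweet:
        return False  # notices such as deletes or limits, not actual tweets
    # Use the tweet id as the message key and the JSON document as the value.
    key = str(tweet.get("id", "")).encode("utf-8")
    value = json.dumps(tweet).encode("utf-8")
    producer.send(topic, key=key, value=value)
    return True
```

Filtering out non-tweet messages here keeps the consumer simple, because everything it reads from the stream is a tweet with a `text` field.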
Great, at this point the tweets are being captured. The execution speed will depend on how “hot/trending” the keywords you defined currently are on Twitter. I defined keywords to collect data about Donald Trump, which is always a hot topic.
Now let's do some analysis. We can create a class that gets the tweets and saves them in a file, and stops streaming when it reaches a limit.
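One possible shape for such a class is sketched below. The names `TweetFileSaver` and `save_from` and the one-JSON-object-per-line file layout are my assumptions for this example. `messages` is expected to be an iterable of records with a `.value` attribute holding the JSON bytes, which is what iterating over a kafka-python `KafkaConsumer` gives you.

```python
import json


class TweetFileSaver:
    """Append tweets to a file, one JSON object per line, up to `limit`."""

    def __init__(self, path, limit=100):
        self.path = path
        self.limit = limit
        self.count = 0

    def handle(self, raw_value):
        """Save one message; return False once the limit has been reached."""
        if self.count >= self.limit:
            return False
        if isinstance(raw_value, bytes):
            raw_value = raw_value.decode("utf-8")
        tweet = json.loads(raw_value)
        with open(self.path, "a", encoding="utf-8") as fh:
            fh.write(json.dumps(tweet) + "\n")
        self.count += 1
        return self.count < self.limit


def save_from(messages, saver):
    """Feed message values to `saver` until it asks to stop."""
    for message in messages:
        if not saver.handle(message.value):
            break
    return saver.count
```

With a real consumer this would be called as `save_from(consumer, TweetFileSaver("tweets.jsonl", limit=1000))`, where the file name and limit are just examples.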
Here we can convert the tweet data to a pandas DataFrame.
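A minimal sketch of that conversion, assuming the tweets were saved one JSON object per line and follow the classic Twitter v1.1 tweet layout (`created_at`, `user.screen_name`, `lang`, `source`, `text`). The function names and the choice of columns are mine; keep whatever fields you need.

```python
import json
import re


def tweets_to_rows(lines):
    """Turn JSON lines into flat dictionaries, one per tweet."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        rows.append({
            "created_at": tweet.get("created_at"),
            "user": (tweet.get("user") or {}).get("screen_name"),
            "lang": tweet.get("lang"),
            # `source` arrives as an HTML link; keep only its visible text.
            "source": re.sub(r"<[^>]+>", "", tweet.get("source") or ""),
            "text": tweet.get("text"),
        })
    return rows


def load_dataframe(path):
    """Read the saved file into a pandas DataFrame."""
    import pandas as pd  # lazy import; needs pandas installed
    with open(path, encoding="utf-8") as fh:
        return pd.DataFrame(tweets_to_rows(fh))
```

Calling `df = load_dataframe("tweets.jsonl")` (the file name is only an example) gives a DataFrame with the `lang` and `source` columns.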
This is a very simple DataFrame, but we can still check some interesting stuff:
df.lang.value_counts()
df.source.value_counts()
Now you can play with the data and do some nice analyses and visualizations as well.