Today I want to show how we can get Oracle Streaming Service (OSS) data using Data Science.
The OCI Streaming service is mostly API-compatible with Apache Kafka, so you can use the Kafka APIs to produce messages to and consume messages from it. Based on that, we are going to use the kafka-python API.
To connect with Twitter we are going to use the Tweepy API.
Data Science Environment
First, you need some configurations in your tenancy for Data Science.
You need to allow the Data Science environment to access the internet, and for this you need to configure a NAT Gateway.
Then open the Data Science terminal and install the two libraries:
pip install tweepy
pip install kafka-python
For the OSS you just need to check the authentication part, and for this, you need:
Producer - Application that sends the messages.
Consumer - Application that receives the messages.
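As a rough illustration of the authentication part, here is a minimal sketch of how the connection settings could be assembled for kafka-python. The helper names `build_oss_kafka_config` and `make_producer` are hypothetical, and the endpoint pattern and the `tenancy/user/stream pool OCID` username format are assumptions you should check against the Kafka connection settings shown for your stream pool in the OCI console. The password is an auth token generated for your user.

```python
import json


def build_oss_kafka_config(region, tenancy, user, stream_pool_ocid, auth_token):
    """Return keyword arguments for kafka-python clients talking to OSS."""
    return {
        # Assumed endpoint pattern; confirm it in your stream pool settings.
        "bootstrap_servers": f"cell-1.streaming.{region}.oci.oraclecloud.com:9092",
        # OSS is reached over TLS with SASL/PLAIN authentication.
        "security_protocol": "SASL_SSL",
        "sasl_mechanism": "PLAIN",
        # Assumed username format: tenancy name / user name / stream pool OCID.
        "sasl_plain_username": f"{tenancy}/{user}/{stream_pool_ocid}",
        "sasl_plain_password": auth_token,
    }


def make_producer(config):
    """Create a producer that serializes Python dicts as JSON bytes."""
    # Imported here so the config helper works without kafka-python installed.
    from kafka import KafkaProducer

    return KafkaProducer(
        value_serializer=lambda value: json.dumps(value).encode("utf-8"),
        **config,
    )
```

The same dictionary can be passed to a `KafkaConsumer` on the receiving side, so the producer and the consumer share one set of credentials.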
For connecting with Twitter you will need credentials.
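One way to keep those credentials out of the notebook is to read them from environment variables. This is only a sketch: the variable names and the `load_twitter_credentials` helper are hypothetical, and it assumes the four values used by Twitter's OAuth 1.0a flow (consumer key and secret, access token and secret).

```python
import os

# Hypothetical variable names; use whatever naming fits your environment.
TWITTER_ENV_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_twitter_credentials(env=None):
    """Return the four credentials as a dict, or fail listing what is missing."""
    env = os.environ if env is None else env
    missing = [name for name in TWITTER_ENV_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing Twitter credentials: " + ", ".join(missing))
    return {name: env[name] for name in TWITTER_ENV_VARS}
```

How these values are handed to Tweepy depends on the Tweepy version you installed, so check its authentication documentation before wiring them in.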
Great, at this point the tweets are being captured. The execution speed will depend on how “hot/trending” the keywords you defined currently are on Twitter. I defined keywords to collect data about Donald Trump, which is always a hot topic.
Now let's do some analysis. We can create a class that receives the tweets and saves them to a file, and stops streaming when it reaches a limit.
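A minimal sketch of such a class could look like the one below. It is not tied to a specific Tweepy version: the class name `TweetFileCollector` is hypothetical, and it assumes a callback that receives each tweet as a raw JSON string and returns `False` to ask the caller to stop, which is the convention older Tweepy stream listeners used for `on_data`.

```python
import json


class TweetFileCollector:
    """Append incoming tweets to a file, one JSON object per line."""

    def __init__(self, path, limit):
        self.path = path
        self.limit = limit
        self.count = 0

    def on_data(self, raw_data):
        """Store one tweet; return False once the limit has been reached."""
        if self.count >= self.limit:
            return False
        tweet = json.loads(raw_data)
        with open(self.path, "a", encoding="utf-8") as handle:
            handle.write(json.dumps(tweet) + "\n")
        self.count += 1
        # Keep streaming only while there is still room under the limit.
        return self.count < self.limit
```

Writing one JSON object per line keeps the file easy to append to while streaming and easy to read back line by line later.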
Here we can convert the tweet data to a pandas DataFrame.
This is a very simple DataFrame, but we can still check some interesting stuff:
Now you can play with it and do some nice analyses and visualizations as well.
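As a sketch of that conversion, assuming the tweets were saved as one JSON object per line, the helpers below pick a few common tweet fields and build a DataFrame from them. The function names and the choice of columns are illustrative, not the original code.

```python
import json


def load_tweet_records(path):
    """Read a JSON-lines file of tweets into a list of flat dicts."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            tweet = json.loads(line)
            user = tweet.get("user") or {}
            records.append({
                "created_at": tweet.get("created_at"),
                "user": user.get("screen_name"),
                "text": tweet.get("text"),
                "lang": tweet.get("lang"),
            })
    return records


def tweets_to_dataframe(path):
    """Build a pandas DataFrame with one row per saved tweet."""
    import pandas as pd  # imported lazily so the loader works without pandas

    return pd.DataFrame(load_tweet_records(path))
```

Missing fields simply become `None` in the resulting rows, so partial tweets do not break the load.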