The Rotten Tomatoes movie review dataset is a corpus of movie reviews used for sentiment analysis, originally collected by Pang and Lee. This competition presented a chance to benchmark sentiment-analysis ideas on the Rotten Tomatoes dataset.
We are asked to label phrases on a scale of five values: negative, somewhat negative, neutral, somewhat positive, positive. Obstacles like sentence negation, sarcasm, terseness, language ambiguity, and many others make this task very challenging.
The dataset is comprised of tab-separated files with phrases from the Rotten Tomatoes dataset. Training set contained 156060 rows.
I have retrained the spaCy language (“en”) model using train data provided.
I have trained for 5 iterations only it runs for about 1 hour. We can increase iterations to get better performance.
For more details refer my kaggle kernel in this link
This blog is also posted in my personal website here.