Eric D. Manley and Timothy M. Urness
Drake University
eric.manley@drake.edu, timothy.urness@drake.edu
Summary | Students use Rotten Tomatoes movie review data to predict the sentiment of new text. |
Topics | CS 1 Version: File I/O, early control structures, and accumulators/counters for the minimal version (optionally, it can include methods, arrays, lists, dictionaries, and/or min/max algorithms). CS 2 Version: Hash tables, custom classes, string manipulation. |
Audience | CS1/CS2 |
Difficulty | CS 1 Version: Easy. CS 2 Version: Intermediate. |
Strengths |
|
Weaknesses |
|
Dependencies | None |
Variants |
|
This assignment uses movie reviews from the Rotten Tomatoes database to do some simple sentiment analysis. Students will write programs that use the review text and a manually labeled review score to automatically learn how negative or positive the connotations of a particular word are. This can then be used to predict the sentiment of new text with reasonably good results. For example, student programs will be able to read text like this:
    The film was a breath of fresh air.
and predict that it is a positive review, while predicting negative sentiment for text like this:
    It made me want to poke out my eyeballs.
The data (with some pre-processing from us) is from a Sentiment Analysis project at Stanford (which used a much more sophisticated algorithm) and has been used for a Kaggle machine learning competition.
We have provided two examples of projects based on this idea that we have used in a CS 1 course and a CS 2 course, though there are many extensions that could be made for these or other higher-level courses.
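The word-scoring idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the reference solution for either course: it assumes each review carries an integer score from 0 (negative) to 4 (positive), it uses a tiny made-up list of reviews in place of the real data file, and the names `tokenize`, `learn_word_scores`, and `predict` are hypothetical. Each word is scored as the average score over its occurrences in labeled reviews, and new text is scored as the average over its known words.

```python
import string
from collections import defaultdict

# Hypothetical labeled reviews as (score, text) pairs. The 0-4 scale and
# these sample sentences are assumptions, not the real data set.
SAMPLE_REVIEWS = [
    (4, "a breath of fresh air"),
    (3, "fresh and funny"),
    (0, "made me want to leave"),
    (1, "dull and tired"),
]

NEUTRAL = 2.0  # assumed midpoint of the 0-4 scale


def tokenize(text):
    """Lowercase the text, split on whitespace, and trim punctuation."""
    words = (w.strip(string.punctuation) for w in text.lower().split())
    return [w for w in words if w]


def learn_word_scores(reviews):
    """Map each word to the average review score over its occurrences."""
    totals = defaultdict(int)   # accumulator: sum of scores per word
    counts = defaultdict(int)   # counter: number of occurrences per word
    for score, text in reviews:
        for word in tokenize(text):
            totals[word] += score
            counts[word] += 1
    return {word: totals[word] / counts[word] for word in totals}


def predict(text, word_scores):
    """Average the scores of the known words; unseen words are skipped."""
    known = [word_scores[w] for w in tokenize(text) if w in word_scores]
    if not known:
        return NEUTRAL  # nothing recognized, so fall back to neutral
    return sum(known) / len(known)


if __name__ == "__main__":
    scores = learn_word_scores(SAMPLE_REVIEWS)
    for sentence in (
        "The film was a breath of fresh air.",
        "It made me want to poke out my eyeballs.",
    ):
        value = predict(sentence, scores)
        label = "positive" if value > NEUTRAL else "negative"
        print(f"{label}: {sentence}")
```

With this toy data, the first sentence scores above the neutral midpoint and the second below it. The two dictionaries play the role of the accumulators and counters mentioned for the CS 1 version; a CS 2 version could replace them with a student-written hash table and a custom class per word.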