The project brings together everything you've learned this term to build something that I hope you'll be proud of. Its purpose is to give you practise using dictionaries, creating GUIs, deciding on some aspects of a program's specifications, and designing an entire program from scratch.
Note that 80% of your mark for the project is for completing a basic set of requirements, and the last 20% is for adding enhancements of your choosing.
If you've ever bought a book online, the bookseller's website has probably told you what other books you might like. This is handy for customers, but also very important for business. In September, online movie-rental company Netflix awarded one million dollars to the winners of the Netflix Prize. The competition simply asked for an algorithm that would perform 10% better than their own algorithm. Making good predictions about people's preferences was that important to this company. It is also a very current area of research in machine learning, which is part of the area of computer science called artificial intelligence.
So how might we write a program to make recommendations for books? Consider a user named Rabia. How is it that the program should predict books Rabia might like? The simplest approach would be to make almost the same prediction for every customer. In this case the program would simply calculate the average rating for all the books in the database, sort the books by rating and then from that sorted list, suggest the top 5 books that Rabia hasn't already rated. With this simple approach, the only information unique to Rabia used by the prediction algorithm was whether or not Rabia had read a specific book.
We could make a better prediction about what Rabia might like by considering her actual ratings in the past and how these ratings compare to the ratings given by other customers. Consider how you decide on movie recommendations from friends. If a friend tells you about a number of movies that s(he) enjoyed and you also enjoyed them, then when your friend recommends another movie that you have never seen, you probably are willing to go see it. On the other hand, if you and a different friend always tend to disagree about movies, you are not likely to go to see a movie this friend recommends.
A program can calculate how similar two users are by treating each of their ratings as a vector and calculating the dot product of these two vectors. (Remember that the dot product is just the sum of the products of each of the corresponding elements.) For example, suppose we had 3 books in our data-base and Rabia rated them [5, 3,-5], Suelyn rated them [1, 5,-3], Bob rated them [5, -3, 5], and Kalid rated them [1, 3, 0]. The similarity between Rabia and Bob is calculated as: (5 x 5) + (3 x -3) + (-5 x 5) = 25 - 9 - 25 = -9. The similarity between Rabia and Suelyn is: (5 x 1) + (3 x 5) + (-5 x -3) = 5 + 15 + 15 = 35. The similarity between Rabia and Kalid is (5 x 1) + (3 x 3) + (-5 x 0) = 5 + 9 + 0 = 14. We see that if both people like a book (rating it with a positive number) it increases their similarity and if both people dislike a book (both giving it a negative number) it also increases their similarity.
Once you have calculated the pair-wise similarity between Rabia and every other customer, you can then identify whose ratings are most similar to Rabia's. In this case Suelyn is most similar to Rabia, so we would recommend to Rabia the top books from Suelyn's list that Rabia hadn't already rated.
Your task for this project is to write a program that takes people's book ratings and makes book recommendations to them using this technique. We will refer to the people who use the program as "members".
These are the requirements you must satisfy for the first 80% of your grade.
Your program must keep track of the following information about each book in the system:
Your program must keep track of the following information about each member:
Your program must keep track of book ratings. Each rating has three pieces of information: the book, the member, and the rating. Each rating must be stored as an int with one of these values: -5, -3, 0, 1, 3, 5. They are to be interpreted as follows:
Rating | Meaning |
---|---|
-5 | Hated it! |
-3 | Didn't like it |
0 | Haven't read it |
1 | ok - neither hot nor cold about it |
3 | Liked it! |
5 | Really liked it! |
Your program must use a GUI for all input and output with the user.
Your program must give feedback indicating the result of every operation the user performs. For example, if the user tries to add a new member with an account name that is already in the system, it might display the message "Sorry -- that account name is already in use. Try again." It will be up to you to decide how you want to handle all the situations that may arise, except in the few cases where I specify the behaviour.
Below are the basic operations your program must provide. You will get to decide (1) how to handle all the "border" conditions that may arise and (2) what the GUI will look like.
When it starts, your program must load all the existing data (created in previous runs of the program) from a pickle file called data.pkl. (If the file doesn't exist -- because this is the first run of the program -- your program must initialize all the data structures to empty.) And when the user chooses to exit the program, it must automatically save the updated date to that file for use the next time the program runs.
Your program must provide the operations below to the user.
I have provided you with a baseline description of the application. You will decide how challenging to make this assignment by extending your application with new features ("enhancements") of your choice. These enhancements will be worth the final 20% of your project grade.
The additional features you implement could be practically anything! Here are a few suggestions:
To receive credit for your enhancements, you must submit a written description that
explains how your enhancements went beyond the baseline requirements.
Submit your description in a plain text
file named enhancements.txt
. (To ensure that your file is
in plain text, you should use Wing as your editor.) Your submission should
not exceed 150 words.
If you do not submit this file, with this name, your enhancements will
receive a mark of zero.
If you did not add enhancements, submit enhancements.txt
but
say in it "None";
you will still be able to earn up to 80%.
You may find that there are a few values that never change, and that you need to use throughout your program. For example, the name of your pickle file will never change while the program runs. When you have an unchanging value, you should use a named variable for it, rather than just sticking the value in everywhere. That makes it easier to change such a value in the future: you just change it in one place rather than hunting around for all occurrences of the value. It is accepted practise to give such variables names that are in all upper case and to define them at the top of the module so that they can be accessed throughout the module without having to pass them around. This is an exception to our general rule against use of global variables, and should only be used for a small number of values that will never change. Generally, all functions should receive the information they need to know in order to do their job through parameters, and should tell back their result through a return value (and possibly also through mutable parameters). This is a critical aspect of good programming style. If you are at all unsure about it, please ask me!
To help you break the project down and keep on track, here is a series of suggested milestones.
The specifications provided tell you what operations your program must offer, but not all the details of how it should behave under every circumstance. Go through all of the operations and make those decisions. For example, if the user wants to create a new account but gives an account name that has already been taken, what should happen? Write all your decisions down.
Next, plan what your enhancements will be. If you have ambitious plans, you would be wise to have some simpler enhancements in mind, just in case you get behind.
Finally, you should work with your partner to create a schedule for milestone completion. Write it down and refer back to it often.
This would be a good time to check the course discussion board for any relevant postings. Get in the habit of checking it every time you work on the project.
Write the outline of your recommender program, using the "recipe" we have used in lecture and lab for designing GUI programs. Include all the top-level operations that you want to provide in your basic version. (You can easily add new operations, if you need them for your enhancements, later.)
Once you have the outline written, run the code to see if everything is hooked together properly. You are on your way!
You will need some structures to represent all the information about books, members, and ratings. Decide on what structures you will use. There will be many options, and part of your grade will be for making good choices. Try to choose structures that will make it easy to perform the kinds of operations you will need to do. For example, will you ever need to search the members by name, to add multiple ratings for the same book, to look up all books that ever received a high rating, or to determine the average rating a user gave?
Now write the bodies of the functions for your operations. Remember to use top-down design to break your functions down into coherent pieces that are nice and small. For each helper function, be sure to write the docstring before you write the code. In fact, ideally you will do it in this order: write docstring, design test cases, write the function, test the function.
This milestone will require you to start writing code that involves the data structures you designed. This is a good time to think about how well your data structure(s) are working for you, and consider whether any revisions would be appropriate. Think also about all the special cases that can arise with your operations (e.g., trying to delete a member who doesn't exist), and how you are handling them. You may want to make some revisions.
Remember that each operation should give feedback about what your program just did (or didn't) do. You might display, for instance, "Successfully added new member Diane Horton" or "There is no member Fred Flintstone; unable to delete."
Although you have been testing pieces of your program throughout its development, now you should verify that it works properly as a whole.
You should reread the docstring for each function and make sure that you have tested it under every condition that the function is meant to handle. And if you think of cases during your testing that are not mentioned in your docstring, update it.
This would also be a good time to re-read the entire handout and make sure that you have met all the project requirements.
By this time, you have had the experience of implementing the basic requirements and you can make a better estimate as to how much work your planned enhancements will be. Put this together with how many days you have left -- you may not quite be on your planned schedule -- and decide whether to revise your plans. It is crucial that you make this decision explicitly, rather than just cross your fingers and hope it will all work out. One of the biggest mistakes students will tell you they made on their csc108 project is not being realistic about their enhancements.
Now that you have a realistic plan, go ahead and implement your enhancements.
If you develop your program in small, incremental steps, and you test your code as you go, then even if you run out of time at the end, you will still have a working program to hand in. It may not implement every feature, but what it does have will work, which will mean you can get part marks for correctness.
To mark your assignment, we will run your program by hand. We will also, as always, examine your Python code.
To mark your enhancements, we will read enhancements.txt
and run your
application to test the features
that are described. Only enhancements described in that file will be marked.
These are the aspects of your work that we will focus on in the marking:
Correctness: Your code should meet the project requirements. Correctness, as measured by our tests, will count for the largest single portion of your marks.
Decisions: You will make many decisions about the behaviour of your program under "border" conditions. The quality of those decisions will count for part of your mark.
GUI design: You have the freedom to design the GUI any way you like. We will assess the "useability" of your design. For example, can the user go straight to what they want to do, or are many unnecessary clicks required? Does the GUI always make it clear what's going on?
Docstrings and comments: We want to see great docstrings and internal comments. See the assignment rules for more details.
Data Structures: Your data structures should store all the necessary information in a way that uses space efficiently and makes it convenient to access the data.
Program design: Your program should be broken into functions, both to avoid repetitive code and to make the program easier to read. Each function should have a coherent purpose. Functions should use parameters to get all the information they need from the caller, and should use a return value to give back any relevant information to the caller.
Programming style: Your variables names should be meaningful and your code as simple and clear as possible.
Formatting style: Make sure that you review the rules for formatting your code.
Enhancements: As described in enhancements.txt
.
(See the Enhancements section for more info.)
Instructions about local submission mechanism.
Hand in
recommender.py
if __name__ == "__main__"
section, and includes any import statements that are needed.
We must be able to run your project simply by running the code in recommender.py.
enhancements.txt