
Let's go out to eat!
Show me places I would like
By learning my tastes.

Introduction

In the Yelp Maps assignment, students apply basic machine learning concepts to predict user ratings and cluster restaurants around a university campus using the Yelp Academic Dataset. The assignment primarily illustrates CS1 concepts such as data abstraction, sequence processing, and key-value pairs. The machine learning ideas are all introduced by the project text as a way of motivating course fundamentals, so the course itself does not need to cover them for students to complete the project.

The project builds a single visualization that is developed in independent phases. First, restaurants are clustered by location using the K-means algorithm. Second, ratings for restaurants (visualized using a color scale) are predicted using linear regression (simple regression only; no matrix inversions required). Finally, restaurants are filtered by search queries. Test cases provided with the project ensure that these pieces combine correctly.
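The first two phases can be illustrated with a minimal sketch in plain Python. This is not the project's starter code or solution: the function names (`distance`, `centroid`, `k_means`, `fit_line`), their signatures, and the random choice of initial centroids are all assumptions made for illustration. The regression is the one-variable least-squares fit, which needs only sums and no matrix inversion.

```python
from math import sqrt
from random import Random


def distance(a, b):
    """Euclidean distance between two (x, y) points."""
    return sqrt((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2)


def centroid(points):
    """Mean position of a non-empty list of (x, y) points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def k_means(points, k, max_updates=100, seed=0):
    """Group points into k clusters and return the k centroids (hypothetical API)."""
    # Assumption: start from k distinct input points chosen at random.
    centroids = Random(seed).sample(points, k)
    for _ in range(max_updates):
        # Assignment step: attach each point to its nearest centroid.
        groups = {i: [] for i in range(k)}
        for p in points:
            nearest = min(range(k), key=lambda i: distance(p, centroids[i]))
            groups[nearest].append(p)
        # Update step: move each centroid to the mean of its group;
        # a centroid with an empty group stays where it is.
        updated = [centroid(groups[i]) if groups[i] else centroids[i]
                   for i in range(k)]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def fit_line(xs, ys):
    """Least-squares fit of y = a + b * x; returns (a, b)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    a = mean_y - b * mean_x
    return a, b
```

For example, `fit_line([1, 2, 3, 4], [3, 5, 7, 9])` returns an intercept of 1 and a slope of 2, and calling `k_means` on two well-separated groups of locations with `k=2` yields one centroid near each group. In the project, the inputs would be restaurant locations and a user's past ratings rather than these toy values.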

Students neither write the visualization code nor need to know how to write a web server: the interactive visualization is served by Python's built-in HTTP server and rendered in a web browser. For students interested in web programming, the project offers a browser-based data visualization that they can explore and extend.
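For instructors curious about what serving a page with the standard library involves, here is a minimal sketch using the `http.server` module. It is not the project's own server code; the helper name `start_server` and the choice to serve a directory of static files from a background thread are assumptions for illustration.

```python
import threading
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def start_server(directory, port=0):
    """Serve the files in `directory` on localhost from a background thread.

    Passing port=0 lets the operating system pick a free port; the chosen
    port is available afterwards as server.server_address[1].
    """
    handler = partial(SimpleHTTPRequestHandler, directory=directory)
    server = ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

After `server = start_server("some_directory")`, the files in that directory can be opened in a browser at `http://127.0.0.1:<port>/`, and `server.shutdown()` stops the server.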

Getting Started

Try the project now by clicking this link! If you like it, you can deploy it to your students just by giving them this link or by hosting the contents of this zip on your own site. If you decide to make changes, please make sure to update the contents of maps.zip, because that's the starter package that students use.

Context

This project was developed to illustrate the concepts in Sections 2.2, 2.3.1-2.3.5, and 2.4.1-2.4.3 of Composing Programs, a free online CS1 textbook developed for CS 61A at UC Berkeley. The project presupposes an understanding of functions and expressions. We recommend assigning a prior project that reinforces these foundational topics, such as the Python version of Hog (Nifty Assignments 2010). This project was developed as the second project in a semester-long introductory course in which two-thirds of students have prior programming experience.

Instructor Guide

We distribute all unit tests to students, which allows them to detect and correct their own errors. Because we use this project regularly in large courses, solutions commonly appear on the Internet. We therefore recommend assigning this project as an instructional component of your course rather than as an assessment. Give students points for completing the assignment correctly, but expect that most students will receive all the points.

Feedback

In our end-of-semester survey, 26% of students rated Yelp Maps as their favorite project (second only to Ants vs. Somebees).

Some sample positive feedback, shamelessly cherry-picked from our end-of-semester survey:

  • The Yelp Maps project felt like one of the best because it was the most polished in terms of how it was presented and the design. I really love when the project instructions have lots of pictures and diagrams to a larger idea.
  • Data analysis projects are great and very applicable to a lot of fields. I think there should more emphasis on this for projects.
  • Yelp Maps brought many advanced topics to a level we could understand.
  • I loved [Yelp Maps]! It made me feel like I was making something that somebody would use.
  • I especially like the Yelp Maps project because it is an example of a practical application. I am most interested in programming apps that have practical daily uses, and this gave me an idea of how one might go about programming a complex app.
  • Yelp Maps taught me a lot because I think many of the confusions regarding [data structures] were solved as a result of the project!

You can see the final visualization here. These sandwich shop recommendations were generated for a sample user who tended to prefer expensive restaurants.

Metadata

Summary: Students develop a restaurant recommendation system using machine learning and the Yelp Academic Dataset.
Topics: Data abstraction, Python data structures, and introductory ideas in machine learning.
Audience: Appropriate for CS1 students familiar with data abstraction and the Python language.
Difficulty: This assignment is the second project in our CS1 class. Students are given one week, and it is the shortest of our four projects.
Strengths
  • The project introduces advanced ideas that students do not typically see until upper-division classes, such as machine learning. This allows them to build an application that is more practical and interesting than would otherwise be possible.
  • Students like the real-world application of analyzing large amounts of data and recommending restaurants to try.
  • The final D3.js visualization of predicted restaurant ratings is interactive, and a cool result to play with.
Weaknesses
  • The project has a lot of starter code and students are carefully guided through the project. Although this is helpful for most students who don’t know how to start a project from scratch, more advanced students occasionally feel constrained.
  • It is difficult to explain the mathematics behind the algorithms used; some students feel like they are just being told to implement an algorithm.
Dependencies
  • Yelp data for local businesses; the Yelp Academic Dataset includes information for businesses near 30 schools.
  • Students have to understand the fundamentals of data abstraction, the Python language, and in particular Python data structures such as lists and dictionaries.
Variants
  • More advanced visualizations (e.g., power diagrams or other weighted Voronoi diagrams) can be substituted for the default map.
  • Students can use the Yelp API to make suggestions based on their own Yelp ratings.
  • Students can learn about and implement additional clustering and regression machine learning algorithms.
  • Students can use additional data provided with the Yelp Academic Dataset to improve suggestions.