Bar Chart Racer

Author: Kevin Wayne (wayne@princeton.edu).


Overview.   Write a program to produce animated bar charts like the following one:



This animated bar chart visualizes the 10 most populous cities in the world, from 1500 to 2018. To generate this visualization, students will successively draw 519 individual bar charts (one per year of data), with a short pause between each drawing. Each bar chart contains bars for the 10 most populous cities in that year, sorted in descending order of population, and colored according to world region.
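
The frame-by-frame process described above can be sketched roughly as follows. This is only an illustration, not the reference solution: the function name `top_bars_per_period`, the `(period, name, value)` record layout, and the sample values are all assumptions made for the example, and the drawing step is left out.

```python
from itertools import groupby

def top_bars_per_period(records, k=10):
    """Yield (period, bars) pairs, where bars holds the k largest
    (name, value) pairs for that period in descending order of value."""
    # groupby only merges adjacent records, so this relies on the input
    # already being grouped by period (an assumption about the file layout).
    for period, group in groupby(records, key=lambda rec: rec[0]):
        ranked = sorted(group, key=lambda rec: rec[2], reverse=True)
        yield period, [(name, value) for _, name, value in ranked[:k]]

# Made-up records (period, name, value); not taken from any supplied file.
records = [
    ("t1", "Alpha", 30), ("t1", "Beta", 50), ("t1", "Gamma", 40),
    ("t2", "Alpha", 60), ("t2", "Beta", 55), ("t2", "Gamma", 41),
]
frames = list(top_bars_per_period(records, k=2))
```

In an actual racer, each yielded frame would be handed to a bar chart drawing routine, followed by a short pause before the next frame.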

Learning objectives. By the end of this assignment, students should be able to

Niftiness. The assignment is nifty because it combines graphics and real-world data to create a captivating visualization. Animated bar charts spread virally over social media in 2019 because they are a surprisingly simple, yet powerful, way to tell a story about categorical data over time. We supply a variety of real-world data sets drawn from human geography, sports, entertainment, and business.

Data files. One of the principal contributions of this assignment is the data curation for several fascinating input files, including population by city or country; movies by gross revenue; global brands by valuation; European football clubs by Elo rating; and characters in Avengers: Endgame by screen time.

input file            description                            period         data source
cities.txt            most populous cities in the world      1500–2018      John Burn-Murdoch
countries.txt         most populous countries in the world   1950–2100      United Nations
cities-usa.txt        most populous cities in the U.S.       1790–2018      U.S. Census Bureau
brands.txt            most valuable brands in the world      2000–2018      Interbrand
movies.txt            highest-grossing movies in the U.S.    1982–2019      Box Office Mojo
baby-names.txt        most popular baby names in the U.S.    1880–2018      U.S. Social Security
football.txt          the best football clubs in Europe      1960–2019      clubelo.com
endgame.txt           characters in Endgame by screen time   Minute 1–170   Prashant
game-of-thrones.txt   characters in Game of Thrones          S01E01–S8E06   Jeffrey Lancaster


Resources.


Sample executions. Here are some sample executions:


           


           


Metadata

Summary Write a program to create an animated bar chart.
Audience CS1. Our students are primarily scientists and engineers, not necessarily computer scientists.
Difficulty This assignment is not difficult. It takes one week in the second half of a CS1 course. The reference solution is approximately 100 lines of code.
Topics Here are the main topics that the assignment addresses:

  • Data visualization / graphics. Plot a sequence of bar charts to produce a compelling visualization.
  • Sorting arrays/lists. Define a total order for a user-defined type and sort an array/list of objects using the system sort.

  • Reading an input file. Read a text file one line at a time and parse each line (which consists of fields, separated by commas).
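
As a rough illustration of the sorting and input topics, the sketch below reads comma-separated lines, builds objects of a user-defined type, and sorts them with the system sort. The `Bar` class, the `read_bars` helper, and the three-field `name,category,value` layout are assumptions made for this example; the supplied data files may be organized differently. An in-memory stream stands in for a file opened with `open(path, encoding="utf-8")`.

```python
import io
from dataclasses import dataclass

@dataclass
class Bar:
    name: str
    category: str
    value: int

    def __lt__(self, other):
        # Order bars by value only; Python's built-in sort relies on "<".
        return self.value < other.value

def read_bars(stream):
    """Parse one Bar per non-empty line of a text stream."""
    bars = []
    for line in stream:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Hypothetical layout: name,category,value
        name, category, value = line.split(",")
        bars.append(Bar(name, category, int(value)))
    return bars

# Made-up sample data, used only to exercise the code.
sample = io.StringIO("Alpha,Group X,30\nBeta,Group Y,50\n\nGamma,Group X,40\n")
bars = read_bars(sample)
bars.sort(reverse=True)  # descending order of value
names = [bar.name for bar in bars]
```

Because the comparison looks only at `value`, bars with equal values keep their input order (Python's sort is stable). A fuller total order could break ties by name.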
Strengths

  • Quintessential data visualization example.

  • Easy to explain.

  • Motivates sorting in a fun and compelling context.

  • Real-world data files (and opportunities to construct new ones).

  • Broadly appealing not only to computer scientists but also to students in the natural sciences, engineering, and the social sciences.

  • Mesmerizing animations!
Weaknesses

  • Static bar chart library. Depending on the programming language, the code for drawing static bar charts can be tedious. We provide versions in Java (BarChart.java) and Python (barchart.py) so that students can skip this step.

  • Autograding. Grading programs that generate graphical output presents some special challenges. To circumvent this potential difficulty, our autograder intercepts all calls to the supplied BarChart.java library (so that there is no need to check what is drawn to the screen).
Dependencies

  • Input files. Must be able to read data one line at a time from a text file (UTF-8 encoded), such as with java.util.Scanner in Java or file objects in Python.

  • Graphics. Requires a library to draw 2D graphics. Our Java version of the assignment uses the open-source library StdDraw and our Python version uses matplotlib. However, any library that can draw lines, rectangles, and text (such as acm.graphics, PyGame, or stddraw.py) would be suitable.
Variants Many interesting variations of this assignment are suitable for CS1 (or even CS2).

  • Curate a data file. This involves identifying an interesting data source, downloading the raw data via web scraping, reorganizing the data, cleaning the data, and devising a strategy for maintaining the data.

    Some possibilities include Twitter accounts by number of followers, programming languages by number of programmers, religions by number of followers, countries by number of military personnel, web browsers by downloads, or political candidates by debate speaking time.

  • Enhance the visualization. There are limitless opportunities for creativity here.

    • Incorporate icons for each of the bars in the bar chart (e.g., flags for countries, company logos for brands, or avatars for Avengers characters).

    • Choreograph a more elegant animation effect when one bar overtakes another. This involves interpolation of the data.

    • Collapse consecutive dates with no movement (e.g., football Elo rating during summer months or top-grossing movies during dump months).

    • Integrate relevant historical analysis of the data. For example, the city of Vijayanagara went from being the 3rd most populous city in 1564 to dropping out of the ranking entirely, after it was destroyed over a period of five months.
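
For the interpolation idea mentioned in the list above, one simple option is linear interpolation between two consecutive time periods. The sketch below is an assumption about how that could be done, not part of the supplied libraries; the function name `interpolate` and the choice to treat a missing bar as having value 0 are illustrative.

```python
def interpolate(prev, curr, steps):
    """Return `steps` dictionaries (name -> value) that move linearly
    from the values in `prev` to the values in `curr`.
    The last dictionary carries the values of `curr`."""
    names = set(prev) | set(curr)
    frames = []
    for i in range(1, steps + 1):
        t = i / steps  # fraction of the way from prev to curr
        frames.append({
            # A name missing from one period is treated as 0 (an assumption).
            name: (1 - t) * prev.get(name, 0) + t * curr.get(name, 0)
            for name in names
        })
    return frames

# Made-up values: A rises from 0 to 10 while B falls from 10 to 0.
between = interpolate({"A": 0, "B": 10}, {"A": 10, "B": 0}, steps=2)
```

Sorting each intermediate dictionary by value and drawing it as its own frame makes one bar pass another gradually rather than in a single jump.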

  • Develop/refine a library to draw static bar charts. There are numerous opportunities for exploration, such as determining the x-axis labels to draw or choosing colors for the bars. This could, itself, be an interesting and independently useful programming assignment.

  • Reorganize the data within the data files. The data is supplied in the specified order (sorted by name and grouped by time period) and format (fields separated by commas). By providing the data to students in a different order (e.g., sorted by value or grouped by name) or format (e.g., JSON or XML), processing the data file can become easier or require additional data structures.

  • Substitute a priority queue for sorting. Instead of sorting, students could keep track of the top k bars using a priority queue (such as a sorted array or a binary heap).

  • Use a dictionary/map to color bars. The provided BarChart assigns colors to the bars according to their category (e.g., all cities in East Asia are assigned the same color). This functionality could be moved to BarChartRacer, thereby requiring students to use a dictionary/map.
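
The last two variants can be sketched together. The first function keeps the top k bars in a binary min-heap instead of sorting every bar; the second assigns one color per category using a dictionary. Both function names, the `(name, value)` pair layout, and the palette are hypothetical choices for this example and are not taken from the provided BarChart library.

```python
import heapq

def top_k(bars, k):
    """Return the k largest (name, value) pairs in descending order of value,
    using a min-heap that never holds more than k entries."""
    if k <= 0:
        return []
    heap = []  # min-heap of (value, name); heap[0] is the smallest kept bar
    for name, value in bars:
        if len(heap) < k:
            heapq.heappush(heap, (value, name))
        elif (value, name) > heap[0]:
            # The new bar beats the smallest kept bar, so swap it in.
            heapq.heapreplace(heap, (value, name))
    return [(name, value) for value, name in sorted(heap, reverse=True)]

def assign_colors(categories, palette):
    """Map each distinct category to a palette color, in order of first
    appearance, cycling through the palette if categories outnumber colors."""
    colors = {}
    for category in categories:
        if category not in colors:
            colors[category] = palette[len(colors) % len(palette)]
    return colors

# Made-up inputs, used only to exercise the code.
best = top_k([("A", 3), ("B", 9), ("C", 1), ("D", 7)], k=2)
colors = assign_colors(["East Asia", "Europe", "East Asia"], ["red", "blue"])
```

With n bars per time period, the heap approach does O(n log k) work rather than the O(n log n) of a full sort, although the difference is small for files of this size.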

Credits This assignment was inspired by tweets from Matt Navarra and John Burn-Murdoch in early 2019.