Natural Prestidigitation Assignment Meta-Page

This is the meta-information for the Natural Prestidigitation Assignment as part of a submission to Nick Parlante's SIGCSE Nifty Assignments panel.

This assignment explores Benford's Law. The law describes a surprising pattern in the frequency of occurrence of the digits 1–9 as the first digits of natural data. You might expect each digit to occur with equal frequency in "arbitrary" data. Indeed, in truly random data (over appropriate ranges) each digit does appear with equal frequency. However, a substantial amount of natural data from sweepingly diverse sources does not exhibit a uniform distribution. Instead, 1 is more common than 2, 2 more common than 3, and so forth. The image in the upper right illustrates this trend with some disparate data. The trend is most obvious in the aggregate frequency across all the data.

Many thanks to the students and staff of Computer Science 312 at UBC during 2004–2005 for helping to develop, critique, and collect data for this assignment.

Meta-data

Summary	Natural Prestidigitation — write a simple number-crunching program to explore a bizarre phenomenon in the distribution of first digits of numerical data (called Benford's Law) in real, student-collected data.
Topics	Simple, mathematically-focused, procedural-style programming. Uses loops, conditionals, integer division, modulo, and a bit of arrays and I/O.
Audience	Appropriate for intro programming. (Originally designed for a third year functional programming course.)
Difficulty	A fairly easy assignment, doable in 6 hours outside class for students most of the way through CS1. (3rd year UBC students just learning their first functional programming language finished in an average of 4 hours outside of class, including data collection.)
Strengths	This is a simple, solid practice problem of the sort our students often request. The niftyness comes with the Benford's Law tie-in, which gives students a taste of the power of computing in scientific exploration, uses real data that students collect on their own, and reveals a fascinating natural property of numbers. Also your only opportunity to explain that logarithms once came in books and catch financial fraudsters in one fell assignment! (For more on both, see the resources below.)
Weaknesses	Unlike most nifty assignments, this one appears rather dull. Even once under the surface, not all students will find the mathematics of nature exciting! Finally, the non-central use of arrays impedes use of the assignment earlier in CS1.
Dependencies	Requires a good grasp of integer division and modulo. Requires some sophistication with loops and conditionals: although no nested loops/ifs and no complex conditions are used, the edge cases can be tricky. Requires basic ability to understand, write, and use functions. Makes very light use of I/O and a one-dimensional array. Works fine in any language.
Variants	Various concepts can be emphasized or de-emphasized with minor changes: Emphasize design: remove the functional decomposition; require students to create and submit their own Remove arrays: process each datum as it's read (rather than collecting them in an array). Use ten separate variables for the ten digit counts. Requires one big cascading if/then and some ugly repeated code for printing out. Emphasize arrays: change the functional decomposition so that input numbers are explicitly split into a list of their digits De-emphasize loops: process each datum as it's read (rather than using one loop to collect data and another to iterate over the resulting list). Implement `countDigits` as `floor(log(num)) + 1` with a special case for 0. To pick out the `n`^th digit back, divide by `10ⁿ` and mod by `10`. Cuts down to just one simple loop at the expense of emphasizing some more complex math. Others: emphasize data cleaning, use a true database, emphasize graphics by plotting results, etc.

Supporting Materials

You'll be wanting the assignment itself. Before you give the assignment, the key thing you need to do is understand Benford's Law. Since MathWorld is authoritative but impenetrable, you might check out some other Benford's Law resources: Joe Walthoe's Benford article is much gentler and runs the gamut from history to derivation of the law. For the accountant's take, try the AICPA article on Benford's and fraud detection.

Here's some sample data files to get you started:

Sunspot data collected by Robin McQuinn. The number of sunspots that occur monthly, as compiled by a Belgian agency: http://sidc.oma.be/html/sunspot.html.
LiveJournal starts collected by Shirley Man. The number of new LiveJournal accounts created daily Jan 1, 2000–Feb 28, 2005, from http://www.livejournal.com/stats/stats.txt, field name "newbyday".
Library holdings collected by Christian Ayotte. The number of books held in a set of several thousand US libraries, from an Access database at: http://nces.ed.gov/pubs2004/data/PUPLD02a_Access.zip.

Meta-Commentary

The assignment document intentionally does not mention Benford's Law. Unlike a lot of nifty assignments, this one actually comes off as rather dull on first inspection. The only exciting part is that students are invited to acquire and process their own data, which may be a prospect of limited interest unless they discover Benford's odd pattern in the data on their own.

Instead, this assignment offers three nifty features:

A connection to real science. Some years ago, I heard the wonderful expression: "You have to move the milk to find the beer." In other words, the drudgery of massaging, perusing, refining, and exploring data leads is the work that leads to occasional, breath-taking discoveries. Benford's Law is one of those breath-taking discoveries: the shocking discovery that so much data of perfectly unremarkable extraction should hide such an unexpected, consistent, and beautiful pattern! Better still, students discover this surprising pattern in data that they themselves gather as a collaborative team!
A sample of the power of computing in science. Although Newcomb and Benford first noted this phenomenon on the basis of wear patterns in books of logarithms, computers make broad empirical study of this phenomenon (and other still-hidden properties of numbers!) feasible. Indeed, the second time I offered this assignment to students, we stumbled on a pattern in the distribution of second digits in our data and later found a mathematical description of the generalized distribution in MathWorld's entry on Benford's Law.
An interesting mathematical story. The discovery, rediscovery, and development of Benford's Law makes for an interesting history, for those who enjoy bringing stories to their classrooms. Modern students are sure to be surprised at Newcomb's and Benford's discovery of a pattern in the wear on books of logarithms, not least because they'll have no idea why anyone would keep a book full of logarithms! Benford's Law has also crept into unusual applications such as fraud detection in accountancy. Finally, Benford's connects to Zipf's Law and other Power Laws, incredibly crucial concepts in science and mathematics. (I seem to recall that Zipf himself was a pretty strange fellow, but I can't find any good documentation of that.)

That said, to milk the niftyness of this assignment, it is absolutely crucial that you introduce it and wrap it up carefully. In your introduction, make clear that, while the assignment seems straightforward, the code they write has the potential reveal some fascinating properties of real data in the world around us! Ask students to predict a distribution of first digits in real data and why that distribution should occur.

In your conclusion, bring together as many sets of submitted student data as possible and graph them individually and as a combined set. Talk about the pattern that emerges. Discuss anomalies and their effect on the overall pattern! (For example, one of my students found housing prices for a certain neighborhood. Most data was in a very tight range, so that many first digits appeared rarely. Nonetheless, for the range represented, the first digits followed Benford's Law well!) Discuss Benford's Law and point students at some web resources for further inquiry if they're interested.

P.S. Several people have used the title "Looking Out For Number One" for Benford's Law related articles. I didn't use that title because I didn't think of it until I saw others' work with the name. Still, it's a better name than "Natural Prestidigitation". You make your own choice :)

Steve Wolfman
Vancouver, British Columbia, Canada
2005/08/25

Extra info about this assignment: