sample plot of first digit data

Natural Prestidigitation Assignment Meta-Page

This is the meta-information for the Natural Prestidigitation Assignment as part of a submission to Nick Parlante's SIGCSE Nifty Assignments panel.

This assignment explores Benford's Law. The law describes a surprising pattern in the frequency of occurrence of the digits 1–9 as the first digits of natural data. You might expect each digit to occur with equal frequency in "arbitrary" data. Indeed, in truly random data (over appropriate ranges) each digit does appear with equal frequency. However, a substantial amount of natural data from sweepingly diverse sources does not exhibit a uniform distribution. Instead, 1 is more common than 2, 2 more common than 3, and so forth. The image in the upper right illustrates this trend with some disparate data. The trend is most obvious in the aggregate frequency across all the data.

Many thanks to the students and staff of Computer Science 312 at UBC during 2004–2005 for helping to develop, critique, and collect data for this assignment.

Meta-data

Summary Natural Prestidigitation — write a simple number-crunching program to explore a bizarre phenomenon in the distribution of first digits of numerical data (called Benford's Law) in real, student-collected data.
Topics
Simple, mathematically-focused, procedural-style programming. Uses loops, conditionals, integer division, modulo, and a bit of arrays and I/O.
Audience
Appropriate for intro programming. (Originally designed for a third year functional programming course.)
Difficulty
A fairly easy assignment, doable in 6 hours outside class for students most of the way through CS1. (3rd year UBC students just learning their first functional programming language finished in an average of 4 hours outside of class, including data collection.)
Strengths
This is a simple, solid practice problem of the sort our students often request. The niftyness comes with the Benford's Law tie-in, which gives students a taste of the power of computing in scientific exploration, uses real data that students collect on their own, and reveals a fascinating natural property of numbers. Also your only opportunity to explain that logarithms once came in books and catch financial fraudsters in one fell assignment! (For more on both, see the resources below.)
Weaknesses
Unlike most nifty assignments, this one appears rather dull. Even once under the surface, not all students will find the mathematics of nature exciting! Finally, the non-central use of arrays impedes use of the assignment earlier in CS1.
Dependencies
Requires a good grasp of integer division and modulo. Requires some sophistication with loops and conditionals: although no nested loops/ifs and no complex conditions are used, the edge cases can be tricky. Requires basic ability to understand, write, and use functions. Makes very light use of I/O and a one-dimensional array. Works fine in any language.
Variants
Various concepts can be emphasized or de-emphasized with minor changes:
  • Emphasize design: remove the functional decomposition; require students to create and submit their own
  • Remove arrays: process each datum as it's read (rather than collecting them in an array). Use ten separate variables for the ten digit counts. Requires one big cascading if/then and some ugly repeated code for printing out.
  • Emphasize arrays: change the functional decomposition so that input numbers are explicitly split into a list of their digits
  • De-emphasize loops: process each datum as it's read (rather than using one loop to collect data and another to iterate over the resulting list). Implement countDigits as floor(log(num)) + 1 with a special case for 0. To pick out the nth digit back, divide by 10n and mod by 10. Cuts down to just one simple loop at the expense of emphasizing some more complex math.
  • Others: emphasize data cleaning, use a true database, emphasize graphics by plotting results, etc.

Supporting Materials

You'll be wanting the assignment itself. Before you give the assignment, the key thing you need to do is understand Benford's Law. Since MathWorld is authoritative but impenetrable, you might check out some other Benford's Law resources: Joe Walthoe's Benford article is much gentler and runs the gamut from history to derivation of the law. For the accountant's take, try the AICPA article on Benford's and fraud detection.

Here's some sample data files to get you started:

Meta-Commentary

The assignment document intentionally does not mention Benford's Law. Unlike a lot of nifty assignments, this one actually comes off as rather dull on first inspection. The only exciting part is that students are invited to acquire and process their own data, which may be a prospect of limited interest unless they discover Benford's odd pattern in the data on their own.

Instead, this assignment offers three nifty features:

That said, to milk the niftyness of this assignment, it is absolutely crucial that you introduce it and wrap it up carefully. In your introduction, make clear that, while the assignment seems straightforward, the code they write has the potential reveal some fascinating properties of real data in the world around us! Ask students to predict a distribution of first digits in real data and why that distribution should occur.

In your conclusion, bring together as many sets of submitted student data as possible and graph them individually and as a combined set. Talk about the pattern that emerges. Discuss anomalies and their effect on the overall pattern! (For example, one of my students found housing prices for a certain neighborhood. Most data was in a very tight range, so that many first digits appeared rarely. Nonetheless, for the range represented, the first digits followed Benford's Law well!) Discuss Benford's Law and point students at some web resources for further inquiry if they're interested.

P.S. Several people have used the title "Looking Out For Number One" for Benford's Law related articles. I didn't use that title because I didn't think of it until I saw others' work with the name. Still, it's a better name than "Natural Prestidigitation". You make your own choice :)


Steve Wolfman
Vancouver, British Columbia, Canada
2005/08/25

Extra info about this assignment: