Natural Prestidigitation

The natural world is full of hidden and beautiful mathematics. The whorls of a conch shell hide the Fibonacci sequence and its Golden Ratio, plants grow in fractal patterns, and comets trace hyperbolic patterns through the solar system. All those beautiful patterns hide in the grungy data of human observation.

So, what are the populations of every town in Quebec and the number of posts by various authors to a hockey bulletin board hiding from you?

Assignment

In this assignment, you will write a program that determines the distribution of initial digits in a set of data. In the end, we want a program that reads in a number n and a list of numbers nums and outputs a list of 10 values: the frequency with which each digit 0–9 appears as the n^th digit of one of the input numbers. However, we'll break that problem down into easier steps. (Note: throughout this problem, you may assume that the numbers processed are non-negative or you can use the absolute value function to help you handle negative numbers in a reasonable way.)

Write a function countDigits. countDigits(num) calculates the number of digits in the integer num. countDigits should evaluate to 1 for 0–9, 2 for 10–99, 3 for 100–999, etc. Hint: use repeated division by 10 to calculate the number of digits in num. (There's also a tricky solution using logarithms that avoids the repeated division!)
Write a function nthDigitBack. nthDigitBack(n,num) finds the n^th lowest order digit in num, i.e., the n^th digit from the right. We take the rightmost digit to be the 0^th digit. nthDigitBack should evaluate to 0 for digits beyond the "start" of the number. Hint: use repeated division by 10 again, followed by the modulo operator to pick out just the rightmost digit. (You could also use exponentiation to avoid the repeated division!). For example:
- nthDigitBack(0,123) ⇒ 3
- nthDigitBack(1,123) ⇒ 2
- nthDigitBack(2,123) ⇒ 1
- nthDigitBack(3,123) ⇒ 0
- nthDigitBack(0,0) ⇒ 0
- nthDigitBack(3,18023) ⇒ 8
Write a function nthDigit, using nthDigitBack and countDigits. nthDigit(n,num) finds the n^th highest order digit of num, i.e., the n^th digit from the left. We take the leftmost digit to be the 0^th. nthDigit should evaluate to 0 for digits beyond the "end" of the number. For example:
- nthDigit(0,123) ⇒ 1
- nthDigit(1,123) ⇒ 2
- nthDigit(2,123) ⇒ 3
- nthDigit(3,123) ⇒ 0
- nthDigit(0,0) ⇒ 0
- nthDigit(3,18023) ⇒ 2
Write a function nthDigitTally1, using nthDigit. nthDigitTally1(n, num, tally) assumes that tally is a 10 element list tallying the number of n^th digits seen so far. It updates tally to reflect the n^th digit of num. In other words, if d is the n^th digit of num, then increment the d^th element of tally. Examples:
- nthDigitTally1(2, 1072, [0,0,1,2,0,0,3,0,9,0]) ⇒ [0,0,1,2,0,0,3,1,9,0]
- nthDigitTally1(0, 2541, [0,0,1,2,0,0,3,0,9,0]) ⇒ [0,0,2,2,0,0,3,0,9,0]

Write a function nthDigitTally, using nthDigitTally1. nthDigitTally(n, nums) returns a tally of frequencies of 0–9 as the n^th digits of all the numbers in nums.

Here's a sample test case. These are enrollments in Research Triangle Park colleges and universities in Fall 2000 (thanks to the "Research Triangle Regional Partnership" website: http://www.researchtriangle.org/data/enrollment.html).

Institution Enrollment

Duke University 12176

North Carolina Central University 5476

Louisburg College (Junior College) 543

Campbell University 3490

University of North Carolina at Chapel Hill 24892

North Carolina State University 28619

Meredith College 2595

Peace College 603

Shaw University 2527

St. Augustine's College 1465

Southeastern Baptist Theological Seminary 1858

Assume the variable enrollments contains the enrollment numbers from that table. Then:

Institution	Enrollment
Duke University	12176
North Carolina Central University	5476
Louisburg College (Junior College)	543
Campbell University	3490
University of North Carolina at Chapel Hill	24892
North Carolina State University	28619
Meredith College	2595
Peace College	603
Shaw University	2527
St. Augustine's College	1465
Southeastern Baptist Theological Seminary	1858

nthDigitTally(1, enrollments) ⇒ [0,3,4,1,0,2,1,0,0,0]

Write a function readMysteriousNumbers that reads whitespace-separated integers from input (terminated by end-of-file) and returns a list of the numbers suitable as input to nthDigitTally. Here's the university enrollment data from above:
```
12176
5476
543
3490
24892
28619
2595
603
2527
1465
1858
```
From this, readMysteriousNumbers should produce the list [12176, 5476, 543, 3490, 24892, 28619, 2595, 603, 2527, 1465, 1858].
Optional: To be human-readable, the data files should also allow labels for the data. We'll accomplish this by allowing commenting in the input file. Change readMysteriousNumbers to ignore anything between (* and *). (You may assume that the (* and *) symbols will be surround by whitespace and that nested comments — comments inside other comments — are not allowed.) Now, the unversity data set can look like:
```
(* Duke University *)                             12176
(* North Carolina Central University *)           5476
(* Louisburg College (Junior College) *)          543
(* Campbell University *)                         3490
(* University of North Carolina at Chapel Hill *) 24892
(* North Carolina State University *)             28619
(* Meredith College *)                            2595
(* Peace College *)                               603
(* Shaw University *)                             2527
(* St. Augustine's College *)                     1465
(* Southeastern Baptist Theological Seminary *)   1858
```
Finally, compose your main program to read a number n from input followed by a data set. The program should tally the n^th digits of the numbers in the data set and print out a table of the results. For example, given:
```
1
12176  5476  543  3490  24892  28619
2595   603   2527 1465  1858
```
Your program should print:
```
0s: 0
1s: 3
2s: 4
3s: 1
4s: 0
5s: 2
6s: 1
7s: 0
8s: 0
9s: 0
```

You need to submit your commented code implementing each step and a couple of tests to show that your code works.

Optional: If you want to find the patterns hidden in the numbers around you, try the following three-part bonus problem:

Find a data source on the web that no one else has used (see next part) and transform it into a format suitable for input to readMysteriousNumbers. The data must all be separate measurements of a single type of phenomenon. For example: measurements of university/college enrollments across different institutions (like above) or at the same institution across different years; measurements of the flow rates of all the major rivers in British Columbia; measurements of the height of 10000 randomly chosen Vancouver residents; measurements of the number of hits per day on the UBC computer science website over three years; measurements of the length in characters of each article in the Wikipedia; measurements of the population of the 1000 largest cities and townships in Canada; etc. Furthermore, there must be at least 250 measurements in the list (but more would be better!).
Post all of the following items to the course newsgroup with the title "Natural Prestidigitation Data": the URL for your data source, a description of the data source, one attachment with bare data suitable for readMysteriousNumbers, and one attachment with labelled data (using the (* *) style above).
Submit with your assignment the URL of your data, a description of the data source, and digit tallies for digit 1 and digit 2 of your data (using nthDigitTally). Are there any oddities in the tallies? What about in other students' data?