Summary | Encode
and decode files using a Hamming (7,4) algorithm. |
Topics |
Error
correcting codes Matrix multiplication Binary representations of data Binary I/O (optional) |
Audience |
Appropriate
for CS2 or a later
course. A simplified version could be used in CS1. |
Difficulty |
This
is an intermediate assignment, taking 1 or 2 weeks for a CS2
student. |
Strengths |
Integrates
disparate topics from programming and discrete math into a cool
application. Solves a classical problem in Computer Science. |
Weaknesses |
Like
so many other topics in CS, there are some very inelegant ways to
approach parts of the problem. For example, if encoding
Integers, Java's Integer
class contains a method: static
String
toBinaryString(int i), which
students can use to pull apart the Integer into its binary
representation rather than use the more appropriate bitwise operators. |
Dependencies |
Requires
an understanding of matrix multiplication and binary representations of
data, including bitwise operators and binary I/O. |
Variants |
There
are three steps to implementing Hamming codes and you can make
each easier or more difficult. Make the Data Bits Accessible Easy - Give the students an array of bits. Harder - Give the students an array of Integers and make them pull out the bits of each using bitwise operators. Most Difficult - Give the students an arbitrary file and make them read it byte by byte pulling the bits out of each byte. Calculate the Parity Bits Brute Force, but Easy to Understand - Add the appropriate data bits together to calculate each parity bit. Easy, but Harder to Understand - Use matrix multiplication (mod 2) to generate the code block. Most Difficult - Have the students develop the matrices needed on their own. Storing the Results Easy - Have the students write the code bits to the screen. Harder - Append a leading 0 to each code block and have the students write the byte to a file. Most Difficult - Have the students develop a BitStream class and write the results to a file using it. |
(The following discussion borrows, in part, from Wikipedia's Hamming code page.) Hamming codes are error-correcting codes. They are named after their inventor, Richard Hamming. Hamming was working for Bell labs in the 1940s. He grew frustrated with how often he had to restart his programs because a read error occurred. He developed Hamming codes as a way of detecting and correcting errors. Hamming codes can detect and correct single-bit errors or can detect (but not correct) up to two simultaneous bit errors.
Bit
position |
1 |
2 |
3 |
4 |
5 |
6 |
7 |
... |
|
Encoded
data bits |
p1 |
p2 |
d1 |
p3 |
d2 |
d3 |
d4 |
||
Parity |
p1 |
X |
|
X |
|
X |
|
X |
|
p2 |
|
X |
X |
|
|
X |
X |
||
p3 |
|
|
|
X |
X |
X |
X |
Shown are 7 encoded bits (3 parity, 4 data) but the pattern continues indefinitely. The key thing about Hamming Codes that can easily be seen from visual inspection is that any given bit has a unique parity bit coverage. For example, the only data bit covered by p1 and p3 (and no other bits) is bit 5 (d2). It is this unique bit coverage that lets a Hamming Code correct any single bit error.
Suppose we
want to use Hamming (7,4) to encode the byte 1011 0001.
The first thing we will do is split the byte into two Hamming code
data blocks, 1011 and 0001.
We expand the first block on the left to 7 bits: _ _ 1 _ 0 1 1.
The first missing bit (bit 1) is 0, because adding bits 3, 5 and 7
gives an even number (2).
The second missing bit (bit 2) is 1, because adding bits 3, 6 and 7
gives an odd number (3).
The last missing bit (bit 4) is 0, because adding bits 5, 6 and 7
gives an even number (2).
This means our 7 bit block is: 0 1
1 0
0 1 1
We expand the
second block to 7 bits using similar logic, giving: 1 1
0 1
0 0 1
Extra info about this assignment: