Hamming Codes SiGCSE 2011

Hamming Codes

Stuart Hansen
UW-Parkside
hansen@uwp.edu

Summary

Encode and decode files using a Hamming (7,4) algorithm.

Topics

Error correcting codes
Matrix multiplication
Binary representations of data
Binary I/O (optional)

Audience

Appropriate for CS2 or a later course. A simplified version could be used in CS1.

Difficulty

This is an intermediate assignment, taking 1 or 2 weeks for a CS2 student.

Strengths

Integrates disparate topics from programming and discrete math into a cool application.
Solves a classical problem in Computer Science.

Weaknesses

Like so many other topics in CS, there are some very inelegant ways to approach parts of the problem. For example, if encoding Integers, Java's Integer class contains a method: static String toBinaryString(int i), which students can use to pull apart the Integer into its binary representation rather than use the more appropriate bitwise operators.

Dependencies

Requires an understanding of matrix multiplication and binary representations of data, including bitwise operators and binary I/O.

Variants

There are three steps to implementing Hamming codes and you can make each easier or more difficult.

Make the Data Bits Accessible
Easy - Give the students an array of bits.
Harder - Give the students an array of Integers and make them pull out the bits of each using bitwise operators.
Most Difficult - Give the students an arbitrary file and make them read it byte by byte pulling the bits out of each byte.

Calculate the Parity Bits
Brute Force, but Easy to Understand - Add the appropriate data bits together to calculate each parity bit.
Easy, but Harder to Understand - Use matrix multiplication (mod 2) to generate the code block.
Most Difficult - Have the students develop the matrices needed on their own.

Storing the Results
Easy - Have the students write the code bits to the screen.
Harder - Append a leading 0 to each code block and have the students write the byte to a file.
Most Difficult - Have the students develop a BitStream class and write the results to a file using it.

Review of Hamming Codes

(The following discussion borrows, in part, from Wikipedia's Hamming code page.) Hamming codes are error-correcting codes. They are named after their inventor, Richard Hamming. Hamming was working for Bell labs in the 1940s. He grew frustrated with how often he had to restart his programs because a read error occurred. He developed Hamming codes as a way of detecting and correcting errors. Hamming codes can detect and correct single-bit errors or can detect (but not correct) up to two simultaneous bit errors.

Hamming codes require O(lg(n)) parity bits for n data bits. Each parity bit checks some (but not all) of the data bits. If an error occurs in a data bit, all the parity bits that checked that bit will show the error, allowing us to uniquely determine where the error occurred. The parity bit at position 2^k-1 checks bits in positions having bit k (1 based) set in their binary representation. The following table summarizes the technique.

Bit position		1	2	3	4	5	6	7	...
Encoded data bits		p1	p2	d1	p3	d2	d3	d4
Parity bit coverage	p1	X		X		X		X
	p2		X	X			X	X
	p3				X	X	X	X

Shown are 7 encoded bits (3 parity, 4 data) but the pattern continues indefinitely. The key thing about Hamming Codes that can easily be seen from visual inspection is that any given bit has a unique parity bit coverage. For example, the only data bit covered by p1 and p3 (and no other bits) is bit 5 (d2). It is this unique bit coverage that lets a Hamming Code correct any single bit error.

It also shows how a two-bit error can be detected but not corrected. For example, if bits 1 (p1) and 2 (p2) were flipped then this would be confused with bit 3 (d1) being flipped since the parity bit coverage of bit 3 is bits 1 and 2.

Hamming codes can be described in terms of the total number of bits per block and the number of data bits. Hamming(7,4) encodes 4 data bits into 7 bits by adding three parity bits, as the table above.

Example

Suppose we want to use Hamming (7,4) to encode the byte 1011 0001.
The first thing we will do is split the byte into two Hamming code data blocks, 1011 and 0001.
We expand the first block on the left to 7 bits: _ _ 1 _ 0 1 1.
The first missing bit (bit 1) is 0, because adding bits 3, 5 and 7 gives an even number (2).
The second missing bit (bit 2) is 1, because adding bits 3, 6 and 7 gives an odd number (3).
The last missing bit (bit 4) is 0, because adding bits 5, 6 and 7 gives an even number (2).
This means our 7 bit block is: 0 1 1 0 0 1 1

We expand the second block to 7 bits using similar logic, giving: 1 1 0 1 0 0 1
7 bits do not a byte make, so we can add a leading 0 to each code block.

Commentary

CS Students still like to know how things work. Most students find a single parity bit easy to understand. The additional parity bits are something many of them haven't encountered before, but they can understand how they work quite easily.

The assignment integrates matrix multiplication with binary representations of data, two topics typically covered early in a CS curriculum, but often not with an associated programming project.

CS students find alternative data representations cool. This assignment is in the same spirit as Huffman Trees, LZW compression and RSA encryption. It stretches students' brains into different ways of thinking about encoding data and they like it.

Hamming Codes

Review of Hamming Codes

Example

Correcting Errors

Commentary

Assignment Documents

SIGCSE Presentation