# Hamming Codes

Stuart Hansen
UW-Parkside
hansen@uwp.edu

**Summary:** Encode and decode files using a Hamming (7,4) algorithm.

**Topics:** Error-correcting codes, matrix multiplication, binary representations of data, binary I/O (optional).

**Audience:** Appropriate for CS2 or a later course. A simplified version could be used in CS1.

**Difficulty:** This is an intermediate assignment, taking 1 or 2 weeks for a CS2 student.

**Strengths:** Integrates disparate topics from programming and discrete math into a cool application. Solves a classical problem in Computer Science.

**Weaknesses:** Like so many other topics in CS, there are some very inelegant ways to approach parts of the problem. For example, if encoding Integers, Java's Integer class contains the method `static String toBinaryString(int i)`, which students can use to pull apart an Integer into its binary representation rather than using the more appropriate bitwise operators.

**Dependencies:** Requires an understanding of matrix multiplication and binary representations of data, including bitwise operators and binary I/O.

**Variants:** There are three steps to implementing Hamming codes, and each can be made easier or more difficult.

*Make the data bits accessible*
- Easy: Give the students an array of bits.
- Harder: Give the students an array of Integers and make them pull out the bits of each using bitwise operators.
- Most difficult: Give the students an arbitrary file and make them read it byte by byte, pulling the bits out of each byte.

*Calculate the parity bits*
- Brute force, but easy to understand: Add the appropriate data bits together to calculate each parity bit.
- Easy, but harder to understand: Use matrix multiplication (mod 2) to generate the code block.
- Most difficult: Have the students develop the matrices needed on their own.

*Store the results*
- Easy: Have the students write the code bits to the screen.
- Harder: Append a leading 0 to each code block and have the students write the byte to a file.
- Most difficult: Have the students develop a BitStream class and write the results to a file using it.
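The "matrix multiplication (mod 2)" variant can be sketched as follows. This is one possible Java encoding, not the assignment's required solution; the class name, method name, and the particular generator matrix `G` (which produces code bits in the p1 p2 d1 p3 d2 d3 d4 layout reviewed below) are illustrative.

```java
// Sketch of the matrix-multiplication (mod 2) variant of Hamming(7,4) encoding.
public class HammingMatrix {
    // Generator matrix G: row i gives code bit c(i+1) as a sum of data bits d1..d4
    static final int[][] G = {
        {1, 1, 0, 1},  // c1 = p1 = d1 + d2 + d4
        {1, 0, 1, 1},  // c2 = p2 = d1 + d3 + d4
        {1, 0, 0, 0},  // c3 = d1
        {0, 1, 1, 1},  // c4 = p3 = d2 + d3 + d4
        {0, 1, 0, 0},  // c5 = d2
        {0, 0, 1, 0},  // c6 = d3
        {0, 0, 0, 1}   // c7 = d4
    };

    // Multiply G by the 4-element data vector, reducing each entry mod 2
    static int[] encode(int[] d) {
        int[] c = new int[7];
        for (int i = 0; i < 7; i++) {
            int sum = 0;
            for (int j = 0; j < 4; j++) sum += G[i][j] * d[j];
            c[i] = sum % 2;
        }
        return c;
    }

    public static void main(String[] args) {
        // Encode the data bits 1011
        System.out.println(java.util.Arrays.toString(encode(new int[]{1, 0, 1, 1})));
        // prints [0, 1, 1, 0, 0, 1, 1]
    }
}
```

The "most difficult" variant asks students to derive a matrix like `G` themselves from the parity-coverage pattern.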

## Review of Hamming Codes

(The following discussion borrows, in part, from Wikipedia's Hamming code page.)  Hamming codes are error-correcting codes.  They are named after their inventor, Richard Hamming.  Hamming was working for Bell Labs in the 1940s.  He grew frustrated with how often he had to restart his programs because a read error occurred.  He developed Hamming codes as a way of detecting and correcting errors.  Hamming codes can detect and correct single-bit errors, or can detect (but not correct) up to two simultaneous bit errors.

Hamming codes require O(lg(n)) parity bits for n data bits.  Each parity bit checks some (but not all) of the data bits.  If an error occurs in a data bit, all the parity bits that check that bit will show the error, allowing us to determine uniquely where the error occurred.  The parity bit at position 2^(k-1) checks the bits in positions having bit k (1-based) set in their binary representation.  The following table summarizes the technique.

| Bit position | 1 | 2 | 3 | 4 | 5 | 6 | 7 | ... |
|---|---|---|---|---|---|---|---|---|
| Encoded data bits | p1 | p2 | d1 | p3 | d2 | d3 | d4 | |
| Parity bit coverage: p1 | X | | X | | X | | X | |
| p2 | | X | X | | | X | X | |
| p3 | | | | X | X | X | X | |

Shown are 7 encoded bits (3 parity, 4 data) but the pattern continues indefinitely. The key thing about Hamming Codes that can easily be seen from visual inspection is that any given bit has a unique parity bit coverage. For example, the only data bit covered by p1 and p3 (and no other bits) is bit 5 (d2). It is this unique bit coverage that lets a Hamming Code correct any single bit error.
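The coverage rule can be computed directly with a bitwise AND: parity check k covers exactly those positions whose binary representation has bit k set. A small sketch (the class and method names here are illustrative, not part of the assignment):

```java
// Which parity checks cover a given bit position?  Check k covers every
// position whose binary representation has bit k set (1-based).
public class Coverage {
    static String coveredBy(int pos) {
        StringBuilder sb = new StringBuilder();
        for (int k = 0; k < 3; k++)          // parity checks p1, p2, p3
            if ((pos & (1 << k)) != 0)       // is bit k set in pos?
                sb.append(sb.length() > 0 ? " " : "").append("p").append(k + 1);
        return sb.toString();
    }

    public static void main(String[] args) {
        for (int pos = 1; pos <= 7; pos++)
            System.out.println("bit " + pos + " is covered by: " + coveredBy(pos));
        // bit 5 (binary 101), for example, is covered by p1 and p3 only
    }
}
```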

The table also shows how a two-bit error can be detected but not corrected. For example, if bits 1 (p1) and 2 (p2) were both flipped, this would be indistinguishable from bit 3 (d1) being flipped, since the parity bit coverage of bit 3 is exactly bits 1 and 2.

Hamming codes can be described in terms of the total number of bits per block and the number of data bits.  Hamming(7,4) encodes 4 data bits into 7 bits by adding three parity bits, as shown in the table above.

## Example

Suppose we want to use Hamming (7,4) to encode the byte 1011 0001.
The first thing we will do is split the byte into two Hamming code data blocks, 1011 and 0001.
We expand the first block to 7 bits, leaving gaps for the parity bits: _ _ 1 _ 0 1 1.
The first missing bit (bit 1) is 0, because adding bits 3, 5 and 7 gives an even number (2).
The second missing bit (bit 2) is 1, because adding bits 3, 6 and 7 gives an odd number (3).
The last missing bit (bit 4) is 0, because adding bits 5, 6 and 7 gives an even number (2).
This means our 7 bit block is: 0 1 1 0 0 1 1

We expand the second block to 7 bits using similar logic, giving: 1 1 0 1 0 0 1
7 bits do not a byte make, so we can add a leading 0 to each code block.
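The brute-force encoding steps above can be sketched in Java as follows. This is one possible implementation of the "add the appropriate data bits together" variant; the class name, method name, and 1-based indexing scheme are illustrative choices, not prescribed by the assignment.

```java
// Encode a 4-bit data block (d1..d4) into a 7-bit Hamming(7,4) code word.
public class HammingEncode {
    // d holds data bits d1..d4; returns code bits c1..c7 in p1 p2 d1 p3 d2 d3 d4 order
    static int[] encode(int[] d) {
        int[] c = new int[8];            // indices 1..7 used, for readability
        c[3] = d[0];                     // data bits go to positions 3, 5, 6, 7
        c[5] = d[1];
        c[6] = d[2];
        c[7] = d[3];
        c[1] = (c[3] + c[5] + c[7]) % 2; // p1 covers bits 3, 5, 7
        c[2] = (c[3] + c[6] + c[7]) % 2; // p2 covers bits 3, 6, 7
        c[4] = (c[5] + c[6] + c[7]) % 2; // p3 covers bits 5, 6, 7
        return java.util.Arrays.copyOfRange(c, 1, 8);
    }

    public static void main(String[] args) {
        // Encode the first block of the worked example, 1011
        System.out.println(java.util.Arrays.toString(encode(new int[]{1, 0, 1, 1})));
        // prints [0, 1, 1, 0, 0, 1, 1], matching the example
    }
}
```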

## Correcting Errors

Suppose we write our first code block to a file, or send it out across the Internet, and bit 5 accidentally gets flipped, giving: 0 1 1 0 1 1 1.
We check our parity bits and find that p1 and p3 (bits 1 and 4) show an error has occurred.
1 + 4 = 5, so we know bit 5 is incorrect and we change it back to 0.
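This check-and-correct procedure can be sketched in Java as follows. Again the class and method names are illustrative; the key idea is that each failed parity check contributes its own position (1, 2, or 4), and the sum is the position of the flipped bit.

```java
// Locate and fix a single-bit error in a 7-bit Hamming(7,4) code word.
public class HammingCorrect {
    // code holds bits c1..c7 (index 0 = bit 1); returns the corrected word
    static int[] correct(int[] code) {
        int[] c = new int[8];            // shift to 1-based indexing
        for (int i = 1; i <= 7; i++) c[i] = code[i - 1];
        // Recompute each parity check; a failed check adds its parity bit's position
        int errorPos = 0;
        if ((c[1] + c[3] + c[5] + c[7]) % 2 != 0) errorPos += 1; // p1 at position 1
        if ((c[2] + c[3] + c[6] + c[7]) % 2 != 0) errorPos += 2; // p2 at position 2
        if ((c[4] + c[5] + c[6] + c[7]) % 2 != 0) errorPos += 4; // p3 at position 4
        if (errorPos != 0) c[errorPos] ^= 1;  // flip the offending bit back
        return java.util.Arrays.copyOfRange(c, 1, 8);
    }

    public static void main(String[] args) {
        int[] damaged = {0, 1, 1, 0, 1, 1, 1};   // the example block with bit 5 flipped
        System.out.println(java.util.Arrays.toString(correct(damaged)));
        // prints [0, 1, 1, 0, 0, 1, 1], the original code word
    }
}
```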

## Commentary

Hamming codes are not new.  They have been around since the 1950s.  In the 1960s and 70s, CS curricula often included a Files course where it was common to cover topics like error-correcting codes.  Modern hardware is much more reliable, and error-correcting codes are no longer considered part of the core CS curriculum.  Students nonetheless find this assignment engaging because:
• CS students still like to know how things work.  Most students find a single parity bit easy to understand.  The additional parity bits are something many of them haven't encountered before, but they can understand how they work quite easily.
• The assignment integrates matrix multiplication with binary representations of data, two topics typically covered early in a CS curriculum, but often not with an associated programming project.
• CS students find alternative data representations cool.  This assignment is in the same spirit as Huffman Trees, LZW compression and RSA encryption.  It stretches students' brains into different ways of thinking about encoding data and they like it.