CIS 22 - Data Structures

Part I - A Bare-Bones HTML Displayer


Overview

A Web browser is used to display HTML files and to allow a user to navigate the Web by entering specific addresses or following hypertext links. For this assignment, you will write a very simplified version of a browser that works only on local files (not on remote Web pages). Your HTML Displayer will be text-based, not graphical. (If you have never used a text-based browser, try out lynx.)

The Displayer

Your displayer should open an HTML file and display its contents appropriately, i.e., extra white space and line breaks in the file should be ignored. You should provide a default setting for the number of rows and the number of columns to be displayed, and the text should be displayed accordingly. You should keep displaying text on one line until there is no more room on that line (depending on the number of columns). Be careful not to insert a line break in the middle of a word!
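The wrapping rule above can be sketched as a greedy word wrap. This is only one possible approach, not a required design; the function name wrapText and its signature are made up for illustration, and the sketch ignores HTML tags entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: re-flows plain text into lines of at most `columns`
// characters. Any run of spaces, tabs, or newlines counts as one separator,
// so extra white space and line breaks in the input are ignored.
std::vector<std::string> wrapText(const std::string& text, std::size_t columns) {
    assert(columns > 0);
    std::vector<std::string> lines;
    std::string current;            // the line being built
    std::istringstream in(text);
    std::string word;
    while (in >> word) {            // operator>> skips all leading white space
        if (current.empty()) {
            // First word on a line is always placed, even if it is longer
            // than `columns`; words are never split.
            current = word;
        } else if (current.size() + 1 + word.size() <= columns) {
            current += ' ';
            current += word;
        } else {
            lines.push_back(current);   // no room left: start a new line
            current = word;
        }
    }
    if (!current.empty()) lines.push_back(current);
    return lines;
}
```

Each returned string is one display line. A word longer than the column limit is placed on a line by itself rather than broken, which is an assumption about how you may want to handle that case.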

If the input text contains a < BR > tag, you should insert a newline. Similarly, if the input text contains a < P > tag, you should insert a newline plus an additional blank line. For this assignment, you can ignore all other HTML tags except < A >, which is described below.

For example, suppose you have set the number of columns to 40, and you have an input file that looks like this:

One, two            
Buckle my shoe.             
< P >
Three, four
Shut the door.
Five, six
Pick up sticks.

It would be displayed as:

One, two Buckle my shoe.

Three, four Shut the door. Five, six        
Pick up sticks.
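One way to get the output shown above is to scan the input once, treating each piece as either a word or a tag. The sketch below is a hedged illustration, not the expected solution: the names tagName and render are hypothetical, tag names are matched case-insensitively with optional spaces inside the angle brackets, and a tag with no closing '>' is assumed to swallow the rest of the input.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Returns the upper-cased first token of the text between '<' and '>',
// e.g. " br " -> "BR" and "A HREF=..." -> "A".
std::string tagName(const std::string& inside) {
    std::size_t i = 0;
    while (i < inside.size() && std::isspace(static_cast<unsigned char>(inside[i]))) ++i;
    std::string name;
    while (i < inside.size() && !std::isspace(static_cast<unsigned char>(inside[i]))) {
        name += static_cast<char>(std::toupper(static_cast<unsigned char>(inside[i])));
        ++i;
    }
    return name;
}

// Hypothetical renderer: wraps words at `columns`, starts a new line at BR,
// starts a new line plus a blank line at P, and skips every other tag.
std::string render(const std::string& html, std::size_t columns) {
    assert(columns > 0);
    std::string out;    // finished lines, each ending in '\n'
    std::string line;   // the line being built
    std::size_t i = 0;
    while (i < html.size()) {
        unsigned char c = static_cast<unsigned char>(html[i]);
        if (c == '<') {
            std::size_t close = html.find('>', i);
            if (close == std::string::npos) close = html.size();  // assumed policy
            std::string name = tagName(html.substr(i + 1, close - i - 1));
            if (name == "BR" || name == "P") {
                out += line;
                out += '\n';
                line.clear();
                if (name == "P") out += '\n';   // the extra blank line
            }
            i = close + 1;
        } else if (std::isspace(c)) {
            ++i;                                // white space only separates words
        } else {
            std::size_t start = i;
            while (i < html.size() && html[i] != '<' &&
                   !std::isspace(static_cast<unsigned char>(html[i]))) ++i;
            std::string word = html.substr(start, i - start);
            if (line.empty()) {
                line = word;
            } else if (line.size() + 1 + word.size() <= columns) {
                line += ' ';
                line += word;
            } else {
                out += line;
                out += '\n';
                line = word;
            }
        }
    }
    if (!line.empty()) { out += line; out += '\n'; }
    return out;
}
```

Calling render on the sample input with 40 columns produces the four display lines shown above. Note that this sketch always puts a space between two words, so text such as "foo< B >bar" comes out as "foo bar"; whether that matters is a design decision for your own program.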

Hypertext Links

Hypertext links are embedded in HTML files using the < A > tag. When a browser encounters a link, it needs to do several things, and you have a similar task. First, you have to display the text so that it stands out. For our purposes, we will simply set off the text of the link with brackets [] and number the links.

For example, suppose you have the following input:

Here is some text.
And now < A HREF="one.html" > here is the first link. < /A >
And now we have a < A HREF ="two.html" > second link < /A >

It would be displayed as:

Here is some text. And now [1] here is 
the first link. [] And now we have a 
[2]second link []

The other thing you have to do is "remember" the target of each link. You can do that by maintaining an array corresponding to the links. Each entry of the array should contain the name of the file that is being linked to.
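The link bookkeeping might look something like the sketch below. It is an illustration under assumptions: the names LinkedText, hrefOf, and markLinks are hypothetical, a std::vector stands in for the array, tags other than < A > and < /A > are copied through untouched for a later pass, and a link that is never closed is closed automatically (one possible way to cope with badly formed input).

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

struct LinkedText {
    std::string text;                  // input with link tags replaced by [n] and []
    std::vector<std::string> targets;  // targets[k] is the file for link number k + 1
};

// Upper-cases a copy of s so names can be compared case-insensitively.
std::string upper(std::string s) {
    for (char& c : s) c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return s;
}

// Extracts the HREF value from the inside of an A tag; "" if there is none.
std::string hrefOf(const std::string& body) {
    std::size_t pos = upper(body).find("HREF");
    if (pos == std::string::npos) return "";
    pos += 4;
    while (pos < body.size() && std::isspace(static_cast<unsigned char>(body[pos]))) ++pos;
    if (pos >= body.size() || body[pos] != '=') return "";
    ++pos;
    while (pos < body.size() && std::isspace(static_cast<unsigned char>(body[pos]))) ++pos;
    if (pos < body.size() && body[pos] == '"') {           // quoted value
        std::size_t end = body.find('"', pos + 1);
        if (end == std::string::npos) end = body.size();   // missing closing quote
        return body.substr(pos + 1, end - pos - 1);
    }
    std::size_t end = pos;                                 // unquoted value
    while (end < body.size() && !std::isspace(static_cast<unsigned char>(body[end]))) ++end;
    return body.substr(pos, end - pos);
}

LinkedText markLinks(const std::string& html) {
    LinkedText result;
    bool inLink = false;
    std::size_t i = 0;
    while (i < html.size()) {
        if (html[i] != '<') { result.text += html[i++]; continue; }
        std::size_t close = html.find('>', i);
        if (close == std::string::npos) close = html.size();
        std::string body = html.substr(i + 1, close - i - 1);
        std::size_t s = 0;
        while (s < body.size() && std::isspace(static_cast<unsigned char>(body[s]))) ++s;
        std::size_t e = s;
        while (e < body.size() && !std::isspace(static_cast<unsigned char>(body[e]))) ++e;
        std::string name = upper(body.substr(s, e - s));
        if (name == "A") {
            if (inLink) result.text += "[]";               // previous link never closed
            result.targets.push_back(hrefOf(body));
            result.text += "[" + std::to_string(result.targets.size()) + "]";
            inLink = true;
        } else if (name == "/A") {
            if (inLink) result.text += "[]";
            inLink = false;
        } else {
            result.text += html.substr(i, close + 1 - i);  // leave other tags alone
        }
        i = close + 1;
    }
    if (inLink) result.text += "[]";                       // link open at end of input
    return result;
}
```

After the call, link number n typed by the user corresponds to targets[n - 1]. The marked text still has to go through your white space and wrapping logic before it is displayed.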

User Interface

After the file has been displayed, the user should be able to make several choices:

Extra Credit

Testing

Your program must be thoroughly tested. An incomplete test will result in the assignment being returned for resubmission. You can only resubmit once. After that, the assignment will no longer be accepted, so be sure you perform thorough tests the first time.

You should make sure that you test both successful and unsuccessful operations. For example, you should include a test that attempts to access a file that does not exist, as well as one that uses an invalid link number. You should display an appropriate error message in these cases.
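The two failure cases mentioned above could be checked along the following lines. This is a rough sketch only: the function names loadFile and targetForLink, the wording of the messages, and the use of std::cerr are all assumptions, not requirements.

```cpp
#include <cstddef>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>

// Reads the whole file into `contents`. Returns false and prints an error
// message if the file cannot be opened.
bool loadFile(const std::string& path, std::string& contents) {
    std::ifstream in(path);
    if (!in) {
        std::cerr << "Error: cannot open file '" << path << "'\n";
        return false;
    }
    contents.assign(std::istreambuf_iterator<char>(in),
                    std::istreambuf_iterator<char>());
    return true;
}

// Maps a 1-based link number typed by the user to its target file name.
// Returns false and prints an error message if the number is out of range.
bool targetForLink(const std::vector<std::string>& targets, long number,
                   std::string& target) {
    if (number < 1 || static_cast<std::size_t>(number) > targets.size()) {
        std::cerr << "Error: there is no link numbered " << number << "\n";
        return false;
    }
    target = targets[static_cast<std::size_t>(number) - 1];
    return true;
}
```

Returning a flag instead of exiting lets the program report the problem and go back to asking the user for input, which makes both cases easy to show in your test output.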

You should also deal with some kinds of badly formed HTML: what if a tag doesn't have a closing '>'? Or an < A > doesn't have a matching < /A >? Just as any other browser does, your code should do its best to display this HTML; you should not generate error messages for HTML syntax errors.

One set of input files can be downloaded here.

You must also make up your own input files to use for testing.

What to Submit

I do not want to look at your output on the screen. Dump the screen output, print it and submit a hardcopy. Also submit all of your input source files. (You do not need to submit the files you downloaded from here.) Obviously, submit all the program and header files.