CIS 22 - Data Structures

Extending the Browser: Site Listing


Overview

For this assignment, you will extend the browser that you have been working on by adding a site listing option. The idea is that if you start at the home page for a web site, the program should display a list of ALL the web pages that are reachable from that home page. You do not actually display each of these pages! You just provide a list of all the possible pages that can be reached starting at the current page.

Implementation

Your program already produces a list of links from the given page. You will be using those links to look further and find ALL the reachable pages.

The following is a pseudocode description of what to do:

Load the first page.
Find all the links.
Insert each of these links into a queue. 

Loop until the queue is empty{
	Remove a link from the front of the queue. 
	If this is a new link (see below) { 
		Load that page and find the links on that page.
		Insert those links onto the queue.
	}
}
The problem with this is that you might end up going in circles without ever ending -- if page 1 includes a link to page 2 and page 2 links back to page 1...

The solution is to maintain a list of all the "visited" links. When a link comes off the queue, it should be processed only if it has never been visited yet.

It is up to you how you implement the "visited" links (array, list, sorted, unsorted ...) BUT, you MUST document your implementation and explain in comments how you implemented it, and why you chose that implementation

Testing

Use the web pages I provided for Assignment #1 to test this. Also include additional test of your own. Make sure to handle broken links (i.e., links to nonexistent pages), and pages with no links at all.

Run your program and all the tests twice: once using the queue class implemented using an array and once using a queue class implemented using nodes.