An extended datafile, itcextwww-big.txt, is available. It was created from the same homepages as itcwww-big.txt, which can be found on the main page of the Search Engine project, but it contains more information about each page. The format of the extended datafile is illustrated by the following example:
*PAGE:http://www.it-c.dk
*TITLE:IT-C's homepage
*DIST:0
Here another page is referenced
*REF:http://www.it-c.dk/secondpage.html
at which this is written
*PAGE:http://www.it-c.dk/thirdpage.html
*TITLE:A third and different page
*DIST:1
Here some very +0important and some +4less important is written
The format is the same as for the other datafiles with the following changes:
A typical use of the extended datafile is to show the user the titles of the pages on which a word is found.
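As an illustration of the title lookup, here is a minimal Python sketch. It is not part of the project material, and the function names parse_pages and titles_by_word are made up for this example. It assumes that the only markers are the four seen in the example (*PAGE:, *TITLE:, *DIST: and *REF:), that a title runs to the end of its line or to the next marker, and that a leading +N on a word is an annotation rather than part of the word.

```python
import re
from collections import defaultdict

# Markers as they appear in the example; any other marker is an unknown here.
DIRECTIVE = re.compile(r"\*(PAGE|TITLE|DIST|REF):")
# Assumption: a leading "+<digits>" annotates a word and is not part of it.
WEIGHT_PREFIX = re.compile(r"^\+\d+")


def parse_pages(text):
    """Split the datafile text into one record per *PAGE entry."""
    # With a capturing group, split() yields [preamble, name, body, name, body, ...]
    parts = DIRECTIVE.split(text)
    pages = []
    for name, body in zip(parts[1::2], parts[2::2]):
        if name == "PAGE":
            tokens = body.split()
            pages.append({"url": tokens[0] if tokens else "", "title": None, "words": []})
            words = tokens[1:]
        elif not pages:
            continue  # marker before the first *PAGE: nothing to attach it to
        elif name == "TITLE":
            # The title is the rest of the line (or everything up to the next marker).
            first_line, _, rest = body.partition("\n")
            pages[-1]["title"] = first_line.strip()
            words = rest.split()
        else:
            # *DIST and *REF: the first token is the value, the rest are ordinary words.
            words = body.split()[1:]
        pages[-1]["words"] += [WEIGHT_PREFIX.sub("", w).lower() for w in words]
    return pages


def titles_by_word(pages):
    """Map each word to the titles of the pages containing it, in file order."""
    index = defaultdict(list)
    for page in pages:
        label = page["title"] or page["url"]  # fall back to the URL if no title was given
        for word in set(page["words"]):
            index[word].append(label)
    return dict(index)


SAMPLE = (
    "*PAGE:http://www.it-c.dk *TITLE:IT-C's homepage *DIST:0 "
    "Here another page is referenced *REF:http://www.it-c.dk/secondpage.html "
    "at which this is written *PAGE:http://www.it-c.dk/thirdpage.html "
    "*TITLE:A third and different page *DIST:1 "
    "Here some very +0important and some +4less important is written"
)

index = titles_by_word(parse_pages(SAMPLE))
print(index.get("important"))  # ['A third and different page']
print(index.get("here"))       # ["IT-C's homepage", 'A third and different page']
```

Splitting on the markers instead of on line breaks lets the same sketch read the data whether each marker sits on its own line or the text runs together.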
Another use is to show the most important pages first. For example, the search engine Google uses the way pages reference each other to decide how relevant each page is.
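A very rough way to experiment with this idea is to count how many different pages reference each URL and to list the most referenced ones first. The Python sketch below does only that; it is a much cruder measure than the one Google uses and is not a description of it. The function names inbound_counts and rank_by_references and the example.org sample data are made up for illustration, and the sketch assumes that every *REF: belongs to the most recent *PAGE: above it.

```python
import re
from collections import Counter

# Only the two markers needed for link counting; values are taken to end at whitespace.
LINK = re.compile(r"\*(PAGE|REF):(\S+)")


def inbound_counts(text):
    """Count, for every URL, how many distinct other pages reference it."""
    counts = Counter()
    seen = set()      # (source, target) pairs, so repeated links count once
    current = None    # URL of the page whose text is being read
    for kind, url in LINK.findall(text):
        if kind == "PAGE":
            current = url
            counts[url] += 0  # make sure unreferenced pages appear with count 0
        elif current is not None and url != current and (current, url) not in seen:
            seen.add((current, url))
            counts[url] += 1
    return counts


def rank_by_references(counts):
    """URLs ordered by falling reference count; ties are broken alphabetically."""
    return sorted(counts, key=lambda url: (-counts[url], url))


# Made-up sample: pages a and b both reference c, and c references a.
SAMPLE = """\
*PAGE:http://example.org/a
*TITLE:Page A
*DIST:0
see *REF:http://example.org/c and again *REF:http://example.org/c
*PAGE:http://example.org/b
*TITLE:Page B
*DIST:1
see *REF:http://example.org/c
*PAGE:http://example.org/c
*TITLE:Page C
*DIST:1
back to *REF:http://example.org/a
"""

counts = inbound_counts(SAMPLE)
print(rank_by_references(counts))
# ['http://example.org/c', 'http://example.org/a', 'http://example.org/b']
```

To order search results, the pages that match a query could be sorted with the same key, so that the most referenced matches come first.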