The extended datafile

About the extended datafile

An extended datafile, itcextwww-big.txt, is available. It was created from the same homepages as itcwww-big.txt, which can be found on the main page of the Search Engine project, but it contains more information about each homepage. The following example illustrates the format of the extended datafile:

*PAGE:http://www.it-c.dk
*TITLE:IT-C's homepage
*DIST:0
Here
another
page
is
referenced
*REF:http://www.it-c.dk/secondpage.html
at
which
this
is
written
*PAGE:http://www.it-c.dk/thirdpage.html
*TITLE:A third and different page
*DIST:1
Here
some
very
+0important
and
some
+4less
important
is
written
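
As an illustration, the example above could be read into one record per page with a few lines of Python. This is only a sketch: the names Page and parse_extended are made up for this example, the *DIST: value is stored as raw text without interpreting it, and words carrying a "+<digit>" prefix are kept exactly as they appear in the file.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """One *PAGE: entry of the extended datafile (hypothetical record type)."""
    url: str
    title: str = ""
    dist: str = ""  # raw text after *DIST:, not interpreted here
    words: list = field(default_factory=list)
    refs: list = field(default_factory=list)


def parse_extended(lines):
    """Group the lines of an extended datafile into Page records."""
    pages = []
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("*PAGE:"):
            current = Page(url=line[len("*PAGE:"):])
            pages.append(current)
        elif current is None:
            continue  # ignore anything before the first *PAGE: line
        elif line.startswith("*TITLE:"):
            current.title = line[len("*TITLE:"):]
        elif line.startswith("*DIST:"):
            current.dist = line[len("*DIST:"):]
        elif line.startswith("*REF:"):
            current.refs.append(line[len("*REF:"):])
        else:
            current.words.append(line)  # ordinary word, kept verbatim
    return pages


example = """*PAGE:http://www.it-c.dk
*TITLE:IT-C's homepage
*DIST:0
Here
another
page
is
referenced
*REF:http://www.it-c.dk/secondpage.html
at
which
this
is
written
*PAGE:http://www.it-c.dk/thirdpage.html
*TITLE:A third and different page
*DIST:1
Here
some
very
+0important
and
some
+4less
important
is
written
"""

pages = parse_extended(example.splitlines())
for page in pages:
    print(page.title, len(page.words), page.refs)
```

For the real datafile, the same function could be given an open file object instead of example.splitlines(), since it only iterates over lines.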

The format is the same as for the other datafiles with the following changes:

Possible Uses

A typical use of the extended datafile is to show the user the titles of the homepages on which a word can be found.
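
A minimal sketch of such a title lookup is given here in Python. The function names build_index and titles_for are invented for this example. It also assumes that a "+<digit>" prefix is a marker in front of a word and not part of the word itself, so it is stripped before indexing, and it compares words case-insensitively.

```python
import re

# Assumption: "+<digits>" in front of a word is a marker, not part of the word.
MARKER = re.compile(r"^\+\d+")


def build_index(lines):
    """Return (titles, index): URL -> title, and word -> list of page URLs."""
    titles = {}
    index = {}
    url = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("*PAGE:"):
            url = line[len("*PAGE:"):]
            titles.setdefault(url, url)  # fall back to the URL if no title follows
        elif url is None or not line:
            continue
        elif line.startswith("*TITLE:"):
            titles[url] = line[len("*TITLE:"):]
        elif line.startswith(("*DIST:", "*REF:")):
            continue  # not needed for a title lookup
        else:
            word = MARKER.sub("", line).lower()
            urls = index.setdefault(word, [])
            if url not in urls:  # list each page only once per word
                urls.append(url)
    return titles, index


def titles_for(word, titles, index):
    """Titles of all pages containing the word, in file order."""
    return [titles[u] for u in index.get(word.lower(), [])]


sample = """*PAGE:http://www.it-c.dk
*TITLE:IT-C's homepage
*DIST:0
Here
is
*REF:http://www.it-c.dk/secondpage.html
written
*PAGE:http://www.it-c.dk/thirdpage.html
*TITLE:A third and different page
*DIST:1
Here
+0important
important
"""

titles, index = build_index(sample.splitlines())
print(titles_for("important", titles, index))
print(titles_for("here", titles, index))
```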

Another use is to show the most important pages first. For example, the search engine Google uses the way pages reference each other to decide how relevant each page is.
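
One very simple way to experiment with this idea is to count, for each page, how many other pages reference it through *REF: lines, and to list the most referenced pages first. The Python sketch below does only that; it is a plain count of incoming references, not the method Google uses. The function names and the example.org URLs are made up for illustration.

```python
from collections import Counter


def count_incoming_refs(lines):
    """Count, for each URL, how many distinct other pages reference it."""
    incoming = Counter()
    seen = set()  # (source, target) pairs already counted
    source = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("*PAGE:"):
            source = line[len("*PAGE:"):]
            incoming[source] += 0  # make sure unreferenced pages are listed too
        elif line.startswith("*REF:") and source is not None:
            target = line[len("*REF:"):]
            # Ignore self-references and repeated references from the same page.
            if target != source and (source, target) not in seen:
                seen.add((source, target))
                incoming[target] += 1
    return incoming


def rank_pages(lines):
    """URLs ordered by number of incoming references, highest first."""
    counts = count_incoming_refs(lines)
    return sorted(counts, key=lambda url: (-counts[url], url))


sample = """*PAGE:http://example.org/a
*REF:http://example.org/b
*REF:http://example.org/c
*PAGE:http://example.org/b
*REF:http://example.org/c
*REF:http://example.org/c
*PAGE:http://example.org/c
word
"""

counts = count_incoming_refs(sample.splitlines())
ranking = rank_pages(sample.splitlines())
print(ranking)
```

Ties are broken alphabetically by URL here so that the order is reproducible; any other tie-breaking rule would serve equally well.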