HITS rests on two assumptions:
1. Good hub pages: a good hub page for a topic links to many good authoritative pages on that topic.
2. Good authoritative pages: a good authoritative page for a topic is linked to by many good hub pages for that topic.
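Note: the definition is circular (hubs are defined in terms of authorities and vice versa), which is why the scores are computed iteratively.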
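The circularity is not a problem, because the two definitions can be written as a pair of linear updates. As a sketch, assume an adjacency matrix E over the pages being scored, with E_xy = 1 if page x links to page y and 0 otherwise, and write h and a for the vectors of hub and authority scores (E and the vector notation are introduced here only for illustration). Alternating the updates and normalizing behaves like power iteration, so under the usual conditions (for example, a unique largest eigenvalue) the scores approach the principal eigenvectors of E^T E and E E^T:

```latex
\mathbf{a} \leftarrow E^{\top}\mathbf{h}, \qquad
\mathbf{h} \leftarrow E\,\mathbf{a}
\quad\Longrightarrow\quad
\mathbf{a} \propto E^{\top}E\,\mathbf{a}, \qquad
\mathbf{h} \propto E\,E^{\top}\mathbf{h}
```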
Algorithm:
1. Locate the root set: the user enters a query, and every document in the collection that contains at least one of the query terms is placed in the root set.
2. Locate the base set: starting from the root set, find every page that a root-set page links to, or that links to a root-set page, but is not itself in the root set; add these pages to the root set to form the base set.
3. Compute a hub score h(x) and an authority score a(x) for every page x in the base set: h(x) is the sum of a(y) over all pages y that x links to, and a(x) is the sum of h(y) over all pages y that link to x. (Initially, all h and a values are 1.)
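4. Normalize the h and a values (for example, scale each score vector so that its squared entries sum to 1) so that they do not grow without bound.
5. Repeat steps 3 and 4 until the scores converge.
6. Select the top-n pages by h value as the top hubs and the top-n pages by a value as the top authorities.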
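A minimal Python sketch of steps 1 to 6 follows. It assumes a tiny in-memory collection: `docs` maps page IDs to their text and `links` maps each page to the pages it links to. The function names (`root_set`, `base_set`, `hits`, `top_n`), the whitespace tokenization, the L2 normalization, and the tolerance-based stopping rule are all hypothetical illustrative choices, not something the notes prescribe.

```python
import math

def root_set(query, docs):
    """Step 1: pages containing at least one query term (naive whitespace tokens)."""
    terms = set(query.lower().split())
    return {page for page, text in docs.items() if terms & set(text.lower().split())}

def base_set(root, links):
    """Step 2: root set plus pages it links to and pages that link into it."""
    base = set(root)
    for src, dsts in links.items():
        for dst in dsts:
            if src in root:
                base.add(dst)  # page linked to by the root set
            if dst in root:
                base.add(src)  # page linking into the root set
    return base

def hits(base, links, tol=1e-8, max_iter=100):
    """Steps 3-5: alternate authority/hub updates, normalize, stop on convergence."""
    # Only edges with both endpoints inside the base set are used.
    edges = [(u, v) for u, vs in links.items() if u in base for v in vs if v in base]
    h = {p: 1.0 for p in base}  # all scores start at 1
    a = {p: 1.0 for p in base}
    for _ in range(max_iter):
        # a(x) = sum of h(y) over pages y that link to x
        new_a = {p: 0.0 for p in base}
        for u, v in edges:
            new_a[v] += h[u]
        # h(x) = sum of a(y) over pages y that x links to (uses the new a)
        new_h = {p: 0.0 for p in base}
        for u, v in edges:
            new_h[u] += new_a[v]
        # Assumed L2 normalization: squared entries of each vector sum to 1.
        for vec in (new_a, new_h):
            norm = math.sqrt(sum(x * x for x in vec.values())) or 1.0
            for p in vec:
                vec[p] /= norm
        delta = max((abs(new_h[p] - h[p]) + abs(new_a[p] - a[p]) for p in base), default=0.0)
        h, a = new_h, new_a
        if delta < tol:
            break
    return h, a

def top_n(scores, n):
    """Step 6: the n highest-scoring pages (ties broken by page ID)."""
    return sorted(scores, key=lambda p: (-scores[p], p))[:n]

# Hypothetical toy collection.
docs = {
    "A": "python tutorial",
    "B": "java guide",
    "C": "python reference",
    "D": "cooking recipes",
    "E": "gardening tips",
}
links = {"A": ["C"], "B": ["A", "C"], "D": ["B"], "E": ["D"]}

root = root_set("python", docs)   # {"A", "C"}
base = base_set(root, links)      # {"A", "B", "C"}
h, a = hits(base, links)
print("top hubs:", top_n(h, 1))          # top hubs: ['B']
print("top authorities:", top_n(a, 1))   # top authorities: ['C']
```

In the toy graph, B links to both A and C, so it ends up as the top hub, while C, which is linked from both A and B, ends up as the top authority. D and E never enter the base set because neither links to or is linked from a root-set page.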
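Comparison with PageRank:
1. Range: HITS scores only the pages in the query's base set; PageRank scores all pages.
2. Query dependence: HITS depends on the query and is computed online at query time; PageRank is independent of the query and is computed offline.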
[IR Course notes] Hyperlink-Induced Topic Search (HITS)