Relationship Between Two Conceptual Models and Algorithms



Before introducing specific link analysis algorithms, we first introduce two conceptual models and describe how the link analysis algorithms relate to one another. This helps readers understand the basic ideas and inheritance relationships of the algorithms from a macro perspective.

Random Walk Model

When people use the Internet, their behavior tends to follow a similar pattern: enter a website address, browse the page, and then follow links on the page to open new pages. The random walk model is an abstract conceptual model of this web-browsing behavior. It is introduced here because many link analysis algorithms, including PageRank, are built on it.

Figure 6-4 shows the random walk model. Initially, the user opens a browser and views the first webpage. Suppose a virtual clock keeps time, and set it to 1 at this moment. After reading the page, if the user is interested in a page that one of its links points to, the user clicks the link and moves to the second page, and the virtual clock advances to 2. If a page contains k outbound links, the user is equally likely to follow any one of them. The user repeats this process, moving between pages that are connected by links. If none of the links on a page is of interest, the user may instead type another URL into the browser and go to that page directly; this behavior is called teleporting (remote jump). Suppose the Internet contains m pages in total. The user is equally likely to teleport to any of them, that is, with probability 1/m. The random walk model is a conceptual model that abstracts these two browsing behaviors: following a link (direct jump) and teleporting (remote jump).
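The browsing process described above can be sketched as a small simulation. This is only an illustrative sketch: the text does not say how often a user teleports, so the `teleport_prob` parameter, its default value, and the function names are assumptions introduced here.

```python
import random

def next_page(current, links, all_pages, teleport_prob, rng):
    """Pick the page visited at the next tick of the virtual clock."""
    out_links = links.get(current, [])
    # Teleport if the user chooses to (assumed probability teleport_prob),
    # or if the current page has no outbound link to follow.
    if not out_links or rng.random() < teleport_prob:
        return rng.choice(all_pages)   # each of the m pages: probability 1/m
    return rng.choice(out_links)       # each of the k links: probability 1/k

def random_walk(start, links, all_pages, steps, teleport_prob=0.15, seed=0):
    """Return the list of pages visited, starting at clock time 1."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(steps):
        path.append(next_page(path[-1], links, all_pages, teleport_prob, rng))
    return path
```

Setting `teleport_prob` to 0 gives a walk that only follows links, as long as every page visited has at least one outbound link.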

 

Next we will give an example of a specific Random Walk model. For simplicity, this example does not introduce remote jump behavior.

In the example shown in Figure 6-5, assume that the Internet consists of three webpages, A, B, and C, and that the links between them are the directed edges between the page nodes in the figure. The transition probabilities between page nodes can be calculated from the link relationships. For example, node A has only one outbound link, which points to node B, so the probability of jumping from node A to node B is 1. Node C has links to both node A and node B, so the probability of jumping to either one is 1/2.
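The calculation of transition probabilities from links can be sketched as follows. The figure is not reproduced here, so the link set for page B (a single link to C) is an assumption inferred from the walk A, B, C described in the text. The function name is also hypothetical.

```python
def transition_probabilities(links):
    """Map each page to {target: probability}, splitting evenly over its out-links."""
    probs = {}
    for page, targets in links.items():
        k = len(targets)
        probs[page] = {t: 1.0 / k for t in targets} if k else {}
    return probs

# Three-page example: A -> B and C -> {A, B} come from the text.
# B -> C is an assumption (the figure is not available).
links = {"A": ["B"], "B": ["C"], "C": ["A", "B"]}
P = transition_probabilities(links)
```

Here `P["A"]` contains only B with probability 1, and `P["C"]` assigns 1/2 to each of A and B, matching the values given in the example.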

 

 

 

Assume that at time 1 the user is browsing page A, then follows the link to page B, and then moves on to page C. At page C there are two possible choices, jumping to page A or to page B, and each has probability 1/2.

Now assume that the Internet in this example contains 10 pages rather than three. If the user at page C does not want to jump back to page A or page B, the user can jump to any of the 10 pages with probability 1/10 each; this is a remote jump.
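To see how link following and remote jumps combine into one probability distribution, the sketch below mixes the two behaviors for a single step. The mixing weight `teleport_prob` is an assumption (the text gives no value for it), and the page names P4 to P10 are hypothetical placeholders for the seven extra pages.

```python
def step_distribution(current, links, all_pages, teleport_prob):
    """Probability of being on each page after one step from `current`."""
    m = len(all_pages)
    out_links = links.get(current, [])
    if not out_links:
        # No link to follow: the only option is a remote jump, 1/m per page.
        return {p: 1.0 / m for p in all_pages}
    # Remote-jump share, spread evenly over all m pages.
    dist = {p: teleport_prob / m for p in all_pages}
    # Link-following share, spread evenly over the k outbound links.
    for target in out_links:
        dist[target] += (1.0 - teleport_prob) / len(out_links)
    return dist

# Ten pages in total: A, B, C plus seven hypothetical pages P4..P10.
pages = ["A", "B", "C"] + [f"P{i}" for i in range(4, 11)]
links = {"A": ["B"], "B": ["C"], "C": ["A", "B"]}
dist = step_distribution("C", links, pages, teleport_prob=0.2)
```

With the assumed value of 0.2, pages A and B each receive about 0.42 (0.4 from the links plus 0.02 from teleporting), every other page receives about 0.02, and the probabilities sum to 1.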

Subset Propagation Model

The subset propagation model is a conceptual model abstracted from many link analysis algorithms (see Figure 6-6). The basic idea is that, when the algorithm is designed, the webpages of the Internet are divided into two or more subsets according to certain rules, and one of these subsets has special properties. Many algorithms start from this special subset: they assign initial weights to the pages in it and then, following the links between those pages and other pages, propagate the weights to the other pages in some way.

 

Some of the link analysis algorithms introduced in this chapter fit the subset propagation model, such as the HITS algorithm, the Hilltop algorithm, and their derivatives. The chapter on webpage anti-cheating (Chapter 8) presents more link analysis algorithms that fit this model.

The subset propagation model is a highly abstract algorithmic framework, and many algorithms can be regarded as instances of it. Their overall approach follows the process described above, but they usually differ in the following respects.

• How the special subset is defined, that is, what special property the webpages in the subset must have. Different algorithms use different rules.

• Once the special subset is determined, how its pages are given initial scores. Different algorithms use different scoring methods.

• How scores are propagated from the special subset to other webpages, and how far they are propagated. Different algorithms also differ considerably here.
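The three points of variation listed above can be pictured as pluggable parts of one skeleton. The sketch below is a hypothetical instantiation and not HITS, Hilltop, or any other published algorithm: the even split of weight over out-links, the `decay` factor, and the `max_hops` limit are all assumptions chosen only to make the framework concrete.

```python
def subset_propagation(pages, links, in_subset, initial_score, decay, max_hops):
    """Score pages by propagating weight outward from a special subset.

    in_subset(page)     -> bool   rule that defines the special subset
    initial_score(page) -> float  initial weight for a page in the subset
    decay, max_hops               how weight spreads, and how far
    """
    scores = {p: 0.0 for p in pages}
    frontier = {}
    for p in pages:
        if in_subset(p):
            scores[p] = initial_score(p)
            frontier[p] = scores[p]
    for _ in range(max_hops):
        next_frontier = {}
        for page, weight in frontier.items():
            targets = links.get(page, [])
            for t in targets:
                # Assumed rule: split the decayed weight evenly over out-links.
                share = decay * weight / len(targets)
                next_frontier[t] = next_frontier.get(t, 0.0) + share
        for t, w in next_frontier.items():
            scores[t] = scores.get(t, 0.0) + w
        frontier = next_frontier
    return scores
```

For example, with pages A, B, and C linked in a chain (A to B, B to C), a subset containing only A, an initial score of 1.0, `decay=0.5`, and `max_hops=2`, the scores come out as 1.0 for A, 0.5 for B, and 0.25 for C. Swapping in a different subset rule, scoring function, or propagation rule yields a different member of the same family.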

Note: The subset propagation model is an abstract model that the author has generalized from specific link analysis algorithms. It has not been explicitly proposed in any published literature, so readers should keep this in mind.

Link Analysis Algorithms

Researchers have proposed many link analysis algorithms so far. Figure 6-7 lists some of the influential ones and the relationships among them. An arrow between two algorithms in the figure indicates that one is an improvement on the other. For example, the SALSA algorithm combines the basic ideas of the PageRank and HITS algorithms. The other arrows are read in the same way, so the figure shows the inheritance relationships between the algorithms.

 

Although there are many link analysis algorithms, their conceptual models basically follow the random walk model and the subset propagation model described above. The figure also shows that PageRank and HITS are the two most important and representative link analysis algorithms; many later algorithms are derived from them.

In the following sections of this chapter, we introduce the PageRank, HITS, SALSA, topic-sensitive PageRank, and Hilltop algorithms in detail. Most of these algorithms have been used by commercial search engines and play an important role in practice. For the other link analysis algorithms listed in Figure 6-7, we briefly describe their principles at the end of this chapter.

 

 

 

 

-- This text is excerpted from "this is a search engine: detailed explanation of core technologies"

Book details: http://blog.csdn.net/broadview2006/article/details/7179396
