Click Models for Web Search (1)-Basic Click Models

Source: Internet
Author: User

This article introduces several basic click models. Each model makes different assumptions about how users interact with the search results page.

To define a model, we need to describe the observed variables, the hidden variables, the relationships between them, and how they depend on the model parameters. Once the model parameters have been estimated, we can predict the CTR or compute the likelihood of the observed data.

1. RANDOM CLICK MODEL (RCM)

This is the simplest model, with only one parameter:

$$P(C_u = 1) = \rho$$

Here $C_u$ is the binary random variable indicating a click on document $u$. Every document therefore has the same click probability $\rho$. Estimating the parameter is simple: the maximum likelihood estimate of $\rho$ is the global CTR, that is, the total number of clicks divided by the total number of documents shown. Once $\rho$ is known, the user's click behavior can be modeled with a Bernoulli distribution with parameter $\rho$.

This model is very simple and can be used as a baseline. Because it has only one parameter, it is also very unlikely to overfit.
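The estimate described above can be sketched in a few lines of Python. This is a minimal illustration, assuming each session is a dictionary with a `clicks` list of 0/1 values, one per displayed document; the session layout and the function name `estimate_rcm` are assumptions made for this example, not part of the original article.

```python
def estimate_rcm(sessions):
    """MLE of the single RCM parameter: total clicks / total documents shown."""
    shown = sum(len(s["clicks"]) for s in sessions)
    clicked = sum(sum(s["clicks"]) for s in sessions)
    return clicked / shown if shown else 0.0


# Toy click log (hypothetical data, for illustration only).
sessions = [
    {"query": "q1", "docs": ["d1", "d2", "d3"], "clicks": [1, 0, 0]},
    {"query": "q1", "docs": ["d1", "d2", "d3"], "clicks": [0, 1, 0]},
]

rho = estimate_rcm(sessions)  # 2 clicks over 6 displayed documents
```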

2. CLICK-THROUGH RATE MODELS (CTR)

We can build on the random click model by using more than one parameter. The parameters described below depend on the rank of a document or on the query-document pair.

2.1 RANK-BASED CTR MODEL (RCTR)

One of the most frequently used statistics in click-log studies is the CTR at different positions. According to Joachims et al. (2005), the CTR of the first document is about 0.45, while the CTR of the tenth document is below 0.05. Based on this observation, we can build a model in which the click probability depends on the rank $r$ of the document:

$$P(C_r = 1) = \rho_r$$

With maximum likelihood estimation (MLE), estimating the parameters amounts to computing the CTR of each rank on the training set. RCTR is also a simple model with little risk of overfitting, and it is often used as a baseline.
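As a rough sketch of that per-rank estimate, the snippet below counts clicks and impressions for every rank. It assumes each session is simply a list of 0/1 click indicators ordered by rank; this representation and the name `estimate_rctr` are assumptions for the example.

```python
from collections import defaultdict


def estimate_rctr(sessions):
    """Per-rank CTR: clicks at rank r divided by impressions at rank r."""
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for clicks in sessions:
        for rank, c in enumerate(clicks, start=1):
            shown[rank] += 1
            clicked[rank] += c
    return {rank: clicked[rank] / shown[rank] for rank in shown}


# Toy click log (hypothetical data); sessions may have different lengths.
rho_by_rank = estimate_rctr([[1, 0, 0], [1, 1, 0], [0, 0, 0], [1, 0]])
```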

2.2 DOCUMENT-BASED CTR MODEL (DCTR)

Another approach is to model the click-through rate of each query-document pair:

$$P(C_u = 1) = \rho_{uq}$$

DCTR has one parameter for every query-document pair, so it is more prone to overfitting than RCM or RCTR when past observations are used to predict new data. This is especially true when a query or document did not appear in the training data.
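The sketch below estimates one CTR per query-document pair and makes the unseen-pair problem visible. The session layout, the function names, and the idea of passing an explicit `fallback` value for unseen pairs are all assumptions for this illustration; the article itself does not prescribe how to handle unseen pairs.

```python
from collections import defaultdict


def estimate_dctr(sessions):
    """CTR per (query, document) pair, estimated by simple counting."""
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for s in sessions:
        for doc, c in zip(s["docs"], s["clicks"]):
            shown[(s["query"], doc)] += 1
            clicked[(s["query"], doc)] += c
    return {pair: clicked[pair] / shown[pair] for pair in shown}


def predict_dctr(params, query, doc, fallback):
    """Look up the pair's CTR; unseen pairs have no estimate at all,
    so the caller must supply a fallback (an assumption of this sketch)."""
    return params.get((query, doc), fallback)


# Toy click log (hypothetical data).
sessions = [
    {"query": "q1", "docs": ["d1", "d2"], "clicks": [1, 0]},
    {"query": "q1", "docs": ["d1", "d2"], "clicks": [0, 0]},
]
params = estimate_dctr(sessions)
seen = predict_dctr(params, "q1", "d1", fallback=0.1)
unseen = predict_dctr(params, "q2", "d1", fallback=0.1)
```

A natural choice for the fallback would be a coarser estimate such as the global CTR, but that is a design decision outside the model itself.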

3. POSITION-BASED MODEL (PBM)

Many click models rely on the so-called examination hypothesis:

$$C_u = 1 \Leftrightarrow E_u = 1 \text{ and } A_u = 1$$

That is, a user clicks on a document if and only if they examined the document ($E_u = 1$) and were attracted by it ($A_u = 1$). The random variables $E_u$ and $A_u$ are usually assumed to be independent, so that

$$P(C_u = 1) = P(E_u = 1) \cdot P(A_u = 1)$$

In the corresponding Bayesian network, the click variable $C_u$ depends on both $E_u$ and $A_u$.

Attractiveness is modeled with one parameter per query-document pair:

$$P(A_u = 1) = \alpha_{uq}$$

The parameter $\alpha_{uq}$ denotes the attractiveness of document $u$ for query $q$. Note that attractiveness is a property of the snippet that is displayed for the document on the search results page, not of the full content of the document. The two are related, but they cannot be treated as equal.

Joachims et al. (2005) showed that the probability that a user examines a document depends heavily on the position of the document on the search results page, and that it decreases with rank. To capture this, we introduce an examination parameter $\gamma_r$ for each rank $r$. The position-based model (PBM), introduced by Craswell et al. (2008), can then be written as:

$$P(C_u = 1) = P(E_u = 1) \cdot P(A_u = 1)$$
$$P(A_u = 1) = \alpha_{uq}$$
$$P(E_u = 1) = \gamma_{r_u}$$

where $r_u$ is the rank of document $u$.
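To make the PBM factorization concrete: the click probability is the product of an attractiveness parameter for the query-document pair and an examination parameter for the rank. The sketch below is illustrative only; the dictionaries `alpha` and `gamma` and their values are hypothetical, not estimated from any data.

```python
def pbm_click_probability(alpha, gamma, query, doc, rank):
    """P(click) under PBM: attractiveness of (query, doc) times
    the examination probability of the rank."""
    return alpha[(query, doc)] * gamma[rank]


# Hypothetical parameter values, chosen only for illustration.
alpha = {("q", "d1"): 0.6}
gamma = {1: 1.0, 2: 0.6, 3: 0.3}

# The same document is clicked less often when shown lower on the page.
p_top = pbm_click_probability(alpha, gamma, "q", "d1", rank=1)
p_low = pbm_click_probability(alpha, gamma, "q", "d1", rank=3)
```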

4. CASCADE MODEL (CM)

The cascade model, proposed by Craswell et al. (2008), assumes that the user scans the documents on the search results page from top to bottom until they find a relevant document. Under this assumption, the document at the first position is always examined, and a document at rank 2 or below is examined if and only if the previous document was examined and not clicked. The cascade model can be written as follows, where $u_r$ is the document at rank $r$:

$$C_r = 1 \Leftrightarrow E_r = 1 \text{ and } A_r = 1$$
$$P(A_r = 1) = \alpha_{u_r q}$$
$$P(E_1 = 1) = 1$$
$$P(E_r = 1 \mid E_{r-1} = 0) = 0$$
$$P(E_r = 1 \mid C_{r-1} = 1) = 0$$
$$P(E_r = 1 \mid E_{r-1} = 1, C_{r-1} = 0) = 1$$

The parameters of the cascade model are easy to estimate because all examination events are observed (deterministic) under the model: all documents up to and including the first clicked document are assumed to have been examined, and none of the documents after it. The cascade model can only describe sessions with at most one click, and it cannot explain non-linear examination patterns.

[Figure: Bayesian network of the cascade model]

The main difference between the cascade model (CM) and the position-based model (PBM) is that in PBM the click probability of a document does not depend on what happened at the documents above it, whereas in CM it does. In addition, CM does not allow more than one click in a session, while PBM does.
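Because examination is fully observed under the cascade model, the attractiveness of a query-document pair can be estimated by counting: clicks divided by the number of sessions in which the document was at or above the first click. The sketch below follows that idea. The session layout and function name are assumptions; sessions with several clicks fall outside the model, and this sketch simply uses the first click and ignores later ones.

```python
from collections import defaultdict


def estimate_cascade(sessions):
    """Counting estimate of attractiveness under the cascade model."""
    examined = defaultdict(int)
    clicked = defaultdict(int)
    for s in sessions:
        clicks = s["clicks"]
        # Index of the first click; with no click, every document was examined.
        last = clicks.index(1) if 1 in clicks else len(clicks) - 1
        for i in range(last + 1):
            pair = (s["query"], s["docs"][i])
            examined[pair] += 1
            clicked[pair] += clicks[i]
    return {pair: clicked[pair] / examined[pair] for pair in examined}


# Toy click log (hypothetical data).
sessions = [
    {"query": "q", "docs": ["a", "b", "c"], "clicks": [0, 1, 0]},
    {"query": "q", "docs": ["a", "b", "c"], "clicks": [1, 0, 0]},
    {"query": "q", "docs": ["a", "b", "c"], "clicks": [0, 0, 0]},
]
alpha = estimate_cascade(sessions)
```

In this toy log, document `c` counts as examined only in the third session, because in the other two the user stopped at an earlier click.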

5. USER BROWSING MODEL (UBM)

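The user browsing model (UBM), proposed by Dupret and Piwowarski in 2008, is an extension of PBM that borrows ideas from the cascade model. In UBM the examination probability depends not only on the rank of the document but also on the clicks above it:

$$P(E_r = 1 \mid C_1 = c_1, \ldots, C_{r-1} = c_{r-1}) = \gamma_{r r'}$$

where $r'$ is the rank of the most recent click above rank $r$: $r' = \max\{k \in \{0, \ldots, r-1\} : c_k = 1\}$, with $c_0 = 1$ by convention, so that $r' = 0$ when there is no earlier click.

Therefore, the formula above can also be written as:

$$P(E_r = 1 \mid C_{r'} = 1, C_{r'+1} = 0, \ldots, C_{r-1} = 0) = \gamma_{r r'}$$

[Figure: Bayesian network of the UBM]

In the network, the arrows on the left indicate the click events that the examination probability depends on. The click at the current rank, in turn, affects the examination probability of the documents below it.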
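The dependence on the previous click can be sketched as follows. The helper finds the rank of the most recent click above a given rank (0 if there is none), and the examination parameter is then looked up by the pair (rank, previous click rank). Multiplying by the attractiveness of the document gives the click probability conditioned on the clicks above. The function names, the dictionary `gamma` keyed by `(rank, previous_click_rank)`, and all numeric values are assumptions for this example.

```python
def previous_click_rank(clicks, rank):
    """Rank of the most recent click strictly above `rank`, or 0 if none.
    `clicks` is a list of 0/1 values ordered by rank (rank 1 first)."""
    for k in range(rank - 1, 0, -1):
        if clicks[k - 1] == 1:
            return k
    return 0


def ubm_conditional_click_probs(alphas, gamma, clicks):
    """P(click at rank r | clicks above r) = attractiveness * gamma[(r, r')]."""
    probs = []
    for rank in range(1, len(alphas) + 1):
        r_prev = previous_click_rank(clicks, rank)
        probs.append(alphas[rank - 1] * gamma[(rank, r_prev)])
    return probs


# Hypothetical parameters for a three-document results page.
alphas = [0.5, 0.4, 0.2]
gamma = {(1, 0): 1.0, (2, 0): 0.5, (2, 1): 0.8,
         (3, 0): 0.2, (3, 1): 0.4, (3, 2): 0.7}

probs = ubm_conditional_click_probs(alphas, gamma, clicks=[1, 0, 0])
```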

6. DEPENDENT CLICK MODEL (DCM)

The dependent click model (DCM), proposed by Guo et al. in 2009, is an extension of the cascade model that can handle sessions with multiple clicks. The model assumes that after clicking on a document, the user may still go on to examine other documents. That is:

$$P(E_r = 1 \mid E_{r-1} = 1, C_{r-1} = 1) = \lambda_{r-1}$$
$$P(E_r = 1 \mid E_{r-1} = 1, C_{r-1} = 0) = 1$$

Here $\lambda_r$ is the continuation parameter, which depends only on the rank of the clicked document.

To be consistent with the models introduced later, and to simplify parameter estimation, we introduce a satisfaction variable $S_r$ that represents whether the user is satisfied after a click:

$$P(S_r = 1 \mid C_r = 0) = 0$$
$$P(S_r = 1 \mid C_r = 1) = 1 - \lambda_r$$
$$P(E_r = 1 \mid S_{r-1} = 1) = 0$$
$$P(E_r = 1 \mid E_{r-1} = 1, S_{r-1} = 0) = 1$$
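One way to build intuition for DCM is to simulate a session: the user scans from the top, clicks an examined document with its attractiveness probability, and after a click continues with the continuation probability of that rank (otherwise the user is satisfied and stops). The following is a minimal sketch under those assumptions; the function name and the list-based parameters `alphas` and `lambdas` are hypothetical.

```python
import random


def simulate_dcm_session(alphas, lambdas, rng):
    """Simulate one session under DCM.
    alphas[i]  : attractiveness of the document at rank i + 1
    lambdas[i] : probability of continuing after a click at rank i + 1"""
    clicks = []
    examining = True
    for alpha, lam in zip(alphas, lambdas):
        if not examining:
            clicks.append(0)  # documents below the stopping point get no click
            continue
        click = 1 if rng.random() < alpha else 0
        clicks.append(click)
        if click and rng.random() >= lam:
            examining = False  # user is satisfied and stops scanning
    return clicks


rng = random.Random(0)
# With continuation probability 0 the model behaves like the cascade model.
cascade_like = simulate_dcm_session([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], rng)
# With continuation probability 1 the user never stops after a click.
never_stops = simulate_dcm_session([1.0, 1.0, 1.0], [1.0, 1.0, 1.0], rng)
```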

[Figure: Bayesian network of DCM]

7. CLICK CHAIN MODEL (CCM)

The click chain model (CCM) builds further on DCM. Its authors introduced a parameter to handle the case in which a user abandons the search without clicking anything. They also changed the continuation probability after a click so that it no longer depends on the rank of the document but on the relevance of the query-document pair. CCM can be written as:

$$P(E_r = 1 \mid E_{r-1} = 1, C_{r-1} = 0) = \tau_1$$
$$P(E_r = 1 \mid E_{r-1} = 1, C_{r-1} = 1) = \tau_2 (1 - \alpha_{u_{r-1} q}) + \tau_3 \, \alpha_{u_{r-1} q}$$

where $\tau_1$, $\tau_2$, and $\tau_3$ are three constants (continuation parameters).

As before, we can introduce a satisfaction variable $S_r$:

$$P(S_r = 1 \mid C_r = 0) = 0$$
$$P(S_r = 1 \mid C_r = 1) = \alpha_{u_r q}$$
$$P(E_r = 1 \mid E_{r-1} = 1, C_{r-1} = 0) = \tau_1$$
$$P(E_r = 1 \mid C_{r-1} = 1, S_{r-1} = 0) = \tau_2$$
$$P(E_r = 1 \mid C_{r-1} = 1, S_{r-1} = 1) = \tau_3$$

CCM assumes that the probability of satisfaction is equal to the probability of attractiveness. This is a strong assumption, because satisfaction is determined by the content of the document (seen after the click), while attractiveness depends mainly on the snippet of the document on the search results page.

Note that among the models discussed here, CCM is the only one that distinguishes the continuation probability after a click that satisfied the user from the continuation probability after a click that did not.

[Figure: Bayesian network of CCM]

8. DYNAMIC BAYESIAN NETWORK MODEL (DBN)

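The dynamic Bayesian network model (DBN), presented by Chapelle and Zhang in 2009, is another extension of the cascade model. Unlike CCM, DBN assumes that the user's satisfaction after clicking on a document depends not on the document's attractiveness (perceived relevance) but on its actual relevance $\sigma_{uq}$:

$$C_r = 1 \Leftrightarrow E_r = 1 \text{ and } A_r = 1$$
$$P(A_r = 1) = \alpha_{u_r q}$$
$$P(E_1 = 1) = 1$$
$$P(E_r = 1 \mid E_{r-1} = 0) = 0$$
$$P(S_r = 1 \mid C_r = 0) = 0$$
$$P(S_r = 1 \mid C_r = 1) = \sigma_{u_r q}$$
$$P(E_r = 1 \mid S_{r-1} = 1) = 0$$
$$P(E_r = 1 \mid E_{r-1} = 1, S_{r-1} = 0) = \gamma$$

Here $\gamma$ is the continuation probability that applies when the user either did not click on the previous document or clicked on it but was not satisfied.

[Figure: Bayesian network of DBN]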

8.1 SIMPLIFIED DBN MODEL (SDBN)

Parameter estimation becomes much easier if we assume that the continuation probability $\gamma$ is equal to 1. The resulting model is the simplified DBN (SDBN).

If we further assume that the user is always satisfied after clicking on a document, that is, $\sigma_{uq}$ is always 1, SDBN reduces to the cascade model.
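With the continuation probability fixed to 1, examination becomes observable: every document up to and including the last click was examined, and the user was satisfied only by the last click. Both parameter sets can then be estimated by counting, as in the sketch below. The session layout and names are assumptions; treating a session without clicks as "all documents examined" follows from the model assumptions but is spelled out here as a choice of this sketch.

```python
from collections import defaultdict


def estimate_sdbn(sessions):
    """Counting estimates of attractiveness and satisfaction under SDBN."""
    examined = defaultdict(int)
    clicked = defaultdict(int)
    satisfied = defaultdict(int)
    for s in sessions:
        clicks = s["clicks"]
        click_idx = [i for i, c in enumerate(clicks) if c]
        # Last clicked position; with no click, all documents were examined.
        last = click_idx[-1] if click_idx else len(clicks) - 1
        for i in range(last + 1):
            pair = (s["query"], s["docs"][i])
            examined[pair] += 1
            if clicks[i]:
                clicked[pair] += 1
                if i == last:
                    satisfied[pair] += 1  # only the last click satisfies
    attractiveness = {p: clicked.get(p, 0) / n for p, n in examined.items()}
    satisfaction = {p: satisfied.get(p, 0) / n for p, n in clicked.items() if n}
    return attractiveness, satisfaction


# Toy click log (hypothetical data).
sessions = [
    {"query": "q", "docs": ["a", "b", "c"], "clicks": [1, 1, 0]},
    {"query": "q", "docs": ["a", "b", "c"], "clicks": [1, 0, 0]},
    {"query": "q", "docs": ["a", "b", "c"], "clicks": [0, 0, 0]},
]
attractiveness, satisfaction = estimate_sdbn(sessions)
```

Pairs that were never clicked get no satisfaction estimate at all, which is why `satisfaction` may contain fewer keys than `attractiveness`.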

9. CLICK PROBABILITIES

Click models describe how users click on a search results page. The basic click models discussed here can compute both the full probability of a click on a given document and the conditional probability of a click on a document given the clicks above it in the same session. The former can be used to estimate the CTR of a document; the latter can be used to simulate click behavior or to compute the likelihood of observed clicks.

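For the simple models RCM, RCTR, DCTR, and PBM, the examination probability of a document does not depend on the clicks above it. For these models the conditional and the full click probability coincide: $P(C_u = 1 \mid \mathbf{C}_{<r_u}) = P(C_u = 1)$, where $\mathbf{C}_{<r_u}$ denotes the clicks above the rank of document $u$. For RCM, RCTR, and DCTR these probabilities are directly equal to the model parameters: $\rho$, $\rho_r$, and $\rho_{uq}$, respectively. For PBM the probability is $\alpha_{uq} \gamma_{r_u}$.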

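For the remaining basic click models, the examination of a document depends on the documents above it. For CM, DCM, CCM, DBN, and SDBN we have:

$$P(C_r = 1) = P(C_r = 1 \mid E_r = 1) \cdot P(E_r = 1) = \alpha_{u_r q} \, \varepsilon_r$$

where $\varepsilon_r = P(E_r = 1)$ and $u_r$ is the document at rank $r$.

The models differ in how the examination probability $\varepsilon_r$ is computed. The general rule for the models above is:

$$\varepsilon_1 = 1$$
$$\varepsilon_{r+1} = \varepsilon_r \cdot \left( P(E_{r+1} = 1 \mid E_r = 1, C_r = 1) \, \alpha_{u_r q} + P(E_{r+1} = 1 \mid E_r = 1, C_r = 0) \, (1 - \alpha_{u_r q}) \right)$$

For example, for the DBN model:

$$\varepsilon_{r+1} = \varepsilon_r \cdot \left( \gamma (1 - \sigma_{u_r q}) \, \alpha_{u_r q} + \gamma (1 - \alpha_{u_r q}) \right)$$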
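For DBN, this recursion can be sketched as a short loop: the examination probability starts at 1 for the first rank, the click probability at each rank is attractiveness times examination probability, and the examination probability of the next rank is scaled by the chance that the user continues (either no click, or a click that did not satisfy, each followed by continuation). The function name and the list-based parameters below are assumptions for this illustration.

```python
def dbn_click_probs(alphas, sigmas, gamma):
    """Full (unconditional) click probability at each rank under DBN.
    alphas[i] : attractiveness of the document at rank i + 1
    sigmas[i] : satisfaction probability after a click at rank i + 1
    gamma     : continuation probability"""
    probs = []
    eps = 1.0  # the first document is always examined
    for a, s in zip(alphas, sigmas):
        probs.append(a * eps)
        # Continue after an unsatisfying click, or after no click at all.
        eps *= gamma * ((1 - s) * a + (1 - a))
    return probs


# Hypothetical parameters.
probs = dbn_click_probs([0.5, 0.5], [0.5, 0.5], gamma=1.0)

# With gamma = 1 and sigma = 1 the recursion reduces to the cascade model.
cascade = dbn_click_probs([0.5, 0.5, 0.5], [1.0, 1.0, 1.0], gamma=1.0)
```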

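For CM, DCM, CCM, DBN, and SDBN, the conditional click probability can be computed as follows, where $\mathbf{C}_{<r}$ denotes the clicks above rank $r$:

$$P(C_r = 1 \mid \mathbf{C}_{<r}) = P(C_r = 1 \mid E_r = 1) \cdot P(E_r = 1 \mid \mathbf{C}_{<r}) = \alpha_{u_r q} \, \varepsilon_r$$

Here $\varepsilon_r$ denotes the conditional examination probability $P(E_r = 1 \mid \mathbf{C}_{<r})$, with

$$\varepsilon_1 = 1$$
$$\varepsilon_{r+1} = P(E_{r+1} = 1 \mid E_r = 1, C_r = 1) \cdot c_r + P(E_{r+1} = 1 \mid E_r = 1, C_r = 0) \cdot P(E_r = 1 \mid C_r = 0, \mathbf{C}_{<r}) \cdot (1 - c_r)$$

where $c_r$ indicates whether the document at rank $r$ was clicked in the given query session. By Bayes' rule:

$$P(E_r = 1 \mid C_r = 0, \mathbf{C}_{<r}) = \frac{P(C_r = 0 \mid E_r = 1) \cdot P(E_r = 1 \mid \mathbf{C}_{<r})}{P(C_r = 0 \mid \mathbf{C}_{<r})} = \frac{(1 - \alpha_{u_r q}) \, \varepsilon_r}{1 - \alpha_{u_r q} \, \varepsilon_r}$$

So the final computation is:

$$\varepsilon_{r+1} = P(E_{r+1} = 1 \mid E_r = 1, C_r = 1) \cdot c_r + P(E_{r+1} = 1 \mid E_r = 1, C_r = 0) \cdot \frac{(1 - \alpha_{u_r q}) \, \varepsilon_r}{1 - \alpha_{u_r q} \, \varepsilon_r} \cdot (1 - c_r)$$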
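As an illustration of the conditional computation, the sketch below instantiates it for DBN and uses the result to compute the log-likelihood of one observed session. After a click, the user was certainly examining, so the next examination probability is the continuation probability times the chance of not being satisfied; after a skip, the examination probability is first updated by Bayes' rule and then multiplied by the continuation probability. The function names and parameter lists are assumptions, and the sketch assumes all attractiveness values lie strictly between 0 and 1 so that no division by zero or log of zero occurs.

```python
import math


def dbn_conditional_click_probs(alphas, sigmas, gamma, clicks):
    """P(click at rank r | observed clicks above r) under DBN."""
    probs = []
    eps = 1.0  # conditional examination probability of the first rank
    for a, s, c in zip(alphas, sigmas, clicks):
        probs.append(a * eps)
        if c:
            # A click implies examination; continue only if not satisfied.
            eps = gamma * (1 - s)
        else:
            # Bayes update of "examined but not clicked", then continuation.
            eps = gamma * (1 - a) * eps / (1 - a * eps)
    return probs


def dbn_session_log_likelihood(alphas, sigmas, gamma, clicks):
    """Log-likelihood of one session as a sum of conditional log-probabilities."""
    probs = dbn_conditional_click_probs(alphas, sigmas, gamma, clicks)
    return sum(math.log(p if c else 1 - p) for p, c in zip(probs, clicks))


# Hypothetical parameters and sessions.
after_skip = dbn_conditional_click_probs([0.5, 0.5], [0.5, 0.5], 1.0, [0, 1])
after_click = dbn_conditional_click_probs([0.5, 0.5], [0.5, 0.5], 1.0, [1, 0])
ll = dbn_session_log_likelihood([0.5, 0.5], [0.5, 0.5], 1.0, [1, 0])
```

With a continuation probability of 1, a skipped first document leaves the second document examined with certainty, while a click on the first document halves the chance that the second one is examined.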

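For the UBM model, the computation is somewhat different, because the examination probability depends on the rank of the previous click rather than only on the events at the document directly above.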

[Table: summary of the click probability calculations for each of the models above]

10. SUMMARY

The models above have a lot in common. Most of them are built from the same components:

1) An attractiveness variable

2) A satisfaction variable

3) A definition of how the examination probability depends on the examination, satisfaction, and click events of previous documents

[Figure: summary chart of the basic click models]
