How to do user intent recognition

Source: Internet
Author: User

What is user intent recognition? It lets a search engine identify the information most relevant to the query a user enters. Take the query "Paladin biography": it can refer to a game, a TV drama, news, pictures, and so on. If intent recognition tells us that the user wants to watch the "Paladin biography" TV drama, we can return the TV series directly. This saves the user clicks, shortens search time, and greatly improves the experience.

General Search and Vertical Search
General search crawls Internet pages, matches keywords against an index, and displays each page's title, summary, URL, and other information [1]. Vertical search targets a specific domain, and its results are limited to that domain, for example product search or job search. An example is shown in Figure 1:



Because vertical search already restricts the user's intent to a specific domain, its results tend to be accurate. In general search, the question is how to understand what the user is looking for, and that is where intent recognition comes in.

Intent Classification
Based on survey statistics, Andrei Broder divided the needs behind user queries into three categories:
(1) Navigational
The user wants to reach a particular page, for example searching "Tsinghua University" to get to the official Tsinghua University website.
(2) Informational
The user wants to obtain information from the Web, for example searching "What is the status of Kobe Bryant?" to read pages about Kobe Bryant's recent status.
(3) Transactional
The user wants to carry out a transaction, for example searching "Tmall" to go shopping on Tmall.

In practice you are not limited to these three categories. You can define your own, for example maps, news, question answering, sports, games, film and television, and so on. Once the set of intent categories is fixed, user intent recognition becomes a classification problem: use the query string and related information to predict the query's intent category, then run the search within that category.

Intent Recognition
So how is intent recognition done?
1. Data Set
In the search domain, the data used for intent recognition is usually the user search log. A typical log record contains the time, the query string, the clicked URL, and the position of that URL in the results.
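As a minimal sketch, one such log record could be represented as follows. The tab-separated layout, the field order, and the names `LogRecord` and `parse_log_line` are assumptions made for illustration, not a real log format.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class LogRecord:
    timestamp: datetime  # when the query was issued
    query: str           # the raw query string
    clicked_url: str     # URL the user clicked
    position: int        # 1-based rank of the clicked URL in the results


def parse_log_line(line: str) -> Optional[LogRecord]:
    """Parse one tab-separated log line; return None if it is malformed."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 4:
        return None
    raw_time, query, url, raw_position = parts
    try:
        return LogRecord(datetime.fromisoformat(raw_time), query, url, int(raw_position))
    except ValueError:
        # Unparseable timestamp or non-numeric position.
        return None


# Hypothetical log line, for illustration only.
record = parse_log_line("2016-05-01T10:15:00\ttsinghua university\thttps://example.com/\t1")
```

Returning `None` for malformed lines keeps the parser usable as a first filter before the cleaning step.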
2. Data cleaning
Raw log data generally cannot be used directly. It contains a lot of noise and useless information that must be cleaned out first.
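The article does not say which cleaning rules to apply, so the sketch below only illustrates the idea with a few assumed rules: normalize the query text, drop records with an empty query or no click, drop overly long queries, and remove exact duplicates within a session. The dictionary keys and the `max_query_chars` limit are hypothetical.

```python
import re
from typing import Dict, Iterable, List


def normalize_query(query: str) -> str:
    """Lowercase, trim, and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", query).strip().lower()


def clean_records(records: Iterable[Dict[str, str]],
                  max_query_chars: int = 100) -> List[Dict[str, str]]:
    """Apply the assumed cleaning rules and return the surviving records."""
    cleaned = []
    seen = set()
    for rec in records:
        query = normalize_query(rec.get("query", ""))
        url = rec.get("url", "").strip()
        if not query or not url:
            continue  # empty query, or no clicked URL
        if len(query) > max_query_chars:
            continue  # very long queries are treated as noise here
        key = (rec.get("session", ""), query, url)
        if key in seen:
            continue  # exact duplicate within the same session
        seen.add(key)
        cleaned.append({**rec, "query": query, "url": url})
    return cleaned


raw = [
    {"session": "s1", "query": "  Paladin   Biography ", "url": "https://example.com/tv"},
    {"session": "s1", "query": "paladin biography", "url": "https://example.com/tv"},
    {"session": "s2", "query": "", "url": "https://example.com/x"},
    {"session": "s3", "query": "tmall", "url": ""},
]
cleaned = clean_records(raw)
```

In this example only the first record survives: the second is a duplicate after normalization, the third has no query, and the fourth has no click.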
3. Query expansion
As mentioned above, intent recognition can be treated as a text classification problem. The query string alone is not enough, because it carries too little information. A common approach is to bring in information from earlier search logs, such as the titles clicked for the query in the past, time, and synonyms. Some scenarios, such as map search, also use location information.
Information you can use for expansion includes:
(1) Clicked titles. Search logs are usually organized by session, where a session holds the search activity within one time period. The usable fields are query string, clicked title, click count, time, and so on. The same query may lead to the same click record in different sessions, so we can merge those records and add the titles to the query document;
(2) Similar query strings. Likewise, different queries that lead to the same click record can be added;
(3) Synonyms. A synonym thesaurus, or a synonym set obtained with Word2vec, can also be used for expansion.
This gives us a more informative query document. Note that when different queries in the same session build on one another, the user is refining the query in response to the search results, so the newly added words should carry more weight in intent classification, as in Figure 1 [3]:


Figure 1
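The expansion steps above can be sketched roughly as follows. The tuple layout of a click record, the whitespace tokenization (Chinese queries would need a word segmenter), and the function names are assumptions for illustration. Synonym expansion is omitted because it depends on an external thesaurus or a trained Word2vec model.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed click record layout: (session_id, query, clicked_url, clicked_title)
Click = Tuple[str, str, str, str]


def build_query_documents(clicks: List[Click]) -> Dict[str, List[str]]:
    """Merge clicked titles and similar queries into one document per query."""
    titles_by_query = defaultdict(set)
    urls_by_query = defaultdict(set)
    queries_by_url = defaultdict(set)
    for _session, query, url, title in clicks:
        titles_by_query[query].add(title)
        urls_by_query[query].add(url)
        queries_by_url[url].add(query)

    docs = {}
    for query, titles in titles_by_query.items():
        # Similar queries: other queries that led to the same clicked URL.
        similar = set()
        for url in urls_by_query[query]:
            similar |= queries_by_url[url]
        similar.discard(query)
        docs[query] = [query] + sorted(titles) + sorted(similar)
    return docs


def added_terms(previous_query: str, refined_query: str) -> List[str]:
    """Terms the user added when refining a query within a session.

    Returns an empty list if the refined query does not keep every term
    of the previous one.
    """
    previous = set(previous_query.split())
    refined = refined_query.split()
    if not previous <= set(refined):
        return []
    return [term for term in refined if term not in previous]


clicks = [
    ("s1", "paladin biography", "https://example.com/tv", "Paladin Biography TV series"),
    ("s2", "paladin biography", "https://example.com/tv", "Paladin Biography TV series"),
    ("s3", "paladin drama", "https://example.com/tv", "Paladin Biography TV series"),
]
docs = build_query_documents(clicks)
new_words = added_terms("paladin biography", "paladin biography tv series")
```

The terms returned by `added_terms` are the ones that would be given extra weight in the query document.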

4. Feature Engineering
Vectorizing the query documents above with TF-IDF gives a feature vector. Its dimensionality is usually very high, so we can select features by term frequency, chi-square, mutual information, and similar methods to keep the more informative ones.
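A minimal, dependency-free sketch of both ideas follows, assuming the query documents are already tokenized. The TF-IDF variant (relative term frequency times a smoothed inverse document frequency) is one common choice among several, and the chi-square score uses the standard 2x2 contingency table of term presence against class membership. The function names and toy data are illustrative.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one sparse TF-IDF vector (term -> weight) per document."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        vec = {}
        for term, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0
            vec[term] = tf * idf
        vectors.append(vec)
    return vectors


def chi_square(docs: List[List[str]], labels: List[str], term: str, label: str) -> float:
    """Chi-square statistic between 'document contains term' and 'document has label'."""
    a = b = c = d = 0
    for doc, doc_label in zip(docs, labels):
        has_term = term in doc
        in_class = doc_label == label
        if has_term and in_class:
            a += 1   # term present, in class
        elif has_term:
            b += 1   # term present, other class
        elif in_class:
            c += 1   # term absent, in class
        else:
            d += 1   # term absent, other class
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


# Toy query documents with assumed intent labels.
docs = [
    ["paladin", "drama", "episode"],
    ["paladin", "game", "download"],
    ["drama", "episode", "watch"],
    ["game", "download", "patch"],
]
labels = ["tv", "game", "tv", "game"]
vectors = tfidf_vectors(docs)
drama_score = chi_square(docs, labels, "drama", "tv")      # 4.0
paladin_score = chi_square(docs, labels, "paladin", "tv")  # 0.0
```

Here "drama" separates the two classes perfectly and gets the highest possible score for four documents, while "paladin" appears equally in both classes and scores zero, so it would be dropped first during selection.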

We can also add some numeric features, for example:

(1) Length of the query
(2) Frequency of the query
(3) Length of the title
(4) Frequency of the title
(5) BM25
(6) First word, last word, etc. of the query
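The features listed above could be computed along these lines. This is a sketch under assumptions: queries and titles are tokenized on whitespace, frequencies come from precomputed counters, and BM25 scores the query against a clicked title using the common defaults k1 = 1.5 and b = 0.75 with a non-negative IDF variant. None of these choices come from the article.

```python
import math
from collections import Counter
from typing import Dict, List


def bm25(query_terms: List[str], doc_terms: List[str], corpus: List[List[str]],
         k1: float = 1.5, b: float = 0.75) -> float:
    """BM25 score of one document for a query, with statistics taken from corpus."""
    n_docs = len(corpus)
    total_len = sum(len(doc) for doc in corpus)
    if n_docs == 0 or total_len == 0:
        return 0.0
    avg_len = total_len / n_docs
    term_freq = Counter(doc_terms)
    score = 0.0
    for term in set(query_terms):  # each distinct query term counted once
        df = sum(1 for doc in corpus if term in doc)
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
        freq = term_freq[term]
        norm = freq + k1 * (1 - b + b * len(doc_terms) / avg_len)
        score += idf * freq * (k1 + 1) / norm
    return score


def numeric_features(query: str, title: str, query_counts: Counter,
                     title_counts: Counter, title_corpus: List[List[str]]) -> Dict[str, object]:
    """Collect the hand-crafted features for one (query, clicked title) pair."""
    q_terms = query.split()
    t_terms = title.split()
    return {
        "query_len": len(q_terms),
        "query_freq": query_counts[query],
        "title_len": len(t_terms),
        "title_freq": title_counts[title],
        "bm25": bm25(q_terms, t_terms, title_corpus),
        "first_word": q_terms[0] if q_terms else "",
        "last_word": q_terms[-1] if q_terms else "",
    }


# Toy data, for illustration only.
titles = ["paladin biography tv series", "paladin biography game download", "tmall official site"]
title_corpus = [t.split() for t in titles]
query_counts = Counter({"paladin biography": 3, "tmall": 5})
title_counts = Counter(titles)
feats = numeric_features("paladin biography", titles[0], query_counts, title_counts, title_corpus)
```

The first and last words are categorical, so in a real pipeline they would still need to be encoded (for example one-hot) before being joined with the numeric values.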

Some statistical features can also be considered. For example, the paper [2] uses the number of different pages clicked, DPCN (Different Page Click Number), and PCNs (Page Click Number without subpage), among others:
(1) DPCN, the number of different pages clicked, counts the distinct result pages users clicked for a query string. The author's statistics show that for navigational queries the user's goal is very clear and one or two page clicks usually complete the query, whereas informational and transactional queries lead to clicks on more different pages. For example, when DPCN is not greater than 7, navigational intent accounts for 66.7% of query strings; when it is greater than 7, informational and transactional intent accounts for 83%.
(2) PCNs, the number of different source pages clicked, takes the most frequently clicked page in the results as the baseline and subtracts the number of its sub-pages from the number of different pages clicked. For example, if a query string w has a DPCN of 17 and sub-pages of the most frequently clicked page appear 15 times, then PCNs is 17 - 15 = 2. The purpose is to avoid counting sub-pages of the same page as different pages.
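The two statistics can be sketched as below. The paper's exact definition of a sub-page is not given here, so this sketch assumes a sub-page is any distinct clicked URL that extends the most frequently clicked URL by a path prefix; that rule, the function names, and the example URLs are all assumptions.

```python
from collections import Counter
from typing import List


def dpcn(clicked_urls: List[str]) -> int:
    """Different Page Click Number: distinct pages clicked for a query."""
    return len(set(clicked_urls))


def pcns(clicked_urls: List[str]) -> int:
    """DPCN minus the distinct sub-pages of the most frequently clicked page."""
    if not clicked_urls:
        return 0
    top_page, _count = Counter(clicked_urls).most_common(1)[0]
    prefix = top_page.rstrip("/") + "/"
    sub_pages = {
        url for url in set(clicked_urls)
        if url != top_page and url.startswith(prefix)
    }
    return dpcn(clicked_urls) - len(sub_pages)


# All clicks recorded for one query string (toy data).
clicked = (
    ["https://example.com/"] * 5
    + ["https://example.com/a", "https://example.com/b", "https://other.example.org/"]
)
different_pages = dpcn(clicked)    # 4 distinct pages
different_sources = pcns(clicked)  # 4 - 2 sub-pages = 2
```

Both values are meant to be used as features for the classifier alongside the others; the 7-click figure quoted above is a statistic reported by the paper's author, not a hard decision rule.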
5. Classifier Training
Once feature engineering is done, the next step is to choose a suitable classifier and train it. Because intent recognition can be treated as a multi-class classification task, classifiers such as SVMs or decision trees are common choices.
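SVMs and decision trees are normally taken from a machine learning library. To keep the example self-contained, the sketch below swaps in a small multinomial Naive Bayes classifier with add-one smoothing instead. It is a stand-in that only shows the train-then-predict shape of the step, not the classifier the article names. The class name and the toy training data are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import List


class NaiveBayesIntentClassifier:
    """Multinomial Naive Bayes over tokenized query documents."""

    def fit(self, docs: List[List[str]], labels: List[str]) -> "NaiveBayesIntentClassifier":
        self.class_counts = Counter(labels)      # documents per intent
        self.term_counts = defaultdict(Counter)  # intent -> term counts
        self.vocab = set()
        for doc, label in zip(docs, labels):
            self.term_counts[label].update(doc)
            self.vocab.update(doc)
        return self

    def predict(self, doc: List[str]) -> str:
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            total_terms = sum(self.term_counts[label].values())
            for term in doc:
                if term not in self.vocab:
                    continue  # ignore terms never seen in training
                count = self.term_counts[label][term]
                # Add-one (Laplace) smoothed log likelihood.
                score += math.log((count + 1) / (total_terms + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy query documents with assumed intent categories.
train_docs = [
    ["paladin", "drama", "episode"],
    ["watch", "drama", "online"],
    ["paladin", "game", "download"],
    ["game", "patch", "download"],
    ["tsinghua", "university", "official", "site"],
]
train_labels = ["tv", "tv", "game", "game", "navigation"]

model = NaiveBayesIntentClassifier().fit(train_docs, train_labels)
tv_guess = model.predict(["paladin", "drama"])
game_guess = model.predict(["game", "download"])
```

With real data, the TF-IDF vectors and numeric features from the previous step would be fed to the chosen classifier in the same fit and predict pattern.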

References
1. The purpose of the query is identified. Zhang Fan.
2. Automatic recognition of query intent in the query log. Li Yu.
3. CIKM Competition Data Mining contest winning algorithm. Chen Yunwen.
4. User query intent detection based on query log. Dong.
