Related Query Mining


1. What Is a Related Query

I also call related queries similar queries. The sequence of search terms that a single user enters within a short period of time, as recorded in the search log, forms a set of related queries. Two queries are related when there is some relationship between them that reflects what the user needed at that moment. This article covers related queries in the context of app search.

2. What Are They For

Related queries serve many purposes:

    • Misspelling correction: the user enters a misspelled word, fails to find the app, and then enters the correct word.
    • Alternative names: similar to misspelling correction, e.g. Chinese and English names (Cytus and Music World) or aliases (Ida's Dream and Monument Valley).
    • Synonyms: PvP and sparring, billiards and snooker, etc.
    • Apps of the same type: such as QQ, Meituan, and Nuomi.
    • Content supplement: such as TTKP and Everyday Cool Run, soil and potatoes, The Scar of Sky and Xuanyuan Sword.

When a user enters a query, the backend also searches with that query's related queries. This helps it understand the user better and return more results that are also more accurate, which reduces the number of searches, increases downloads, and improves the conversion rate.
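A minimal Python sketch of this query expansion, assuming related queries are stored as a dictionary from a query to its related queries and that some search function returns app IDs for a query. `expanded_search`, `toy_search`, and the toy index are hypothetical names for illustration, not the article's system.

```python
from typing import Callable, Dict, List


def expanded_search(
    query: str,
    related: Dict[str, List[str]],
    search: Callable[[str], List[str]],
    max_results: int = 20,
) -> List[str]:
    """Search the query, then each related query, merging app IDs without duplicates.

    Results for the user's own query come first, so expansion only adds to them.
    """
    results: List[str] = []
    seen = set()
    for q in [query] + related.get(query, []):
        for app_id in search(q):
            if app_id not in seen:
                seen.add(app_id)
                results.append(app_id)
                if len(results) >= max_results:
                    return results
    return results


# Hypothetical toy index: the alias "ida's dream" matches no app by itself.
INDEX = {"monument valley": ["monument_valley", "monument_valley_2"], "cytus": ["cytus"]}
RELATED = {"ida's dream": ["monument valley"]}


def toy_search(q: str) -> List[str]:
    return INDEX.get(q, [])


print(expanded_search("ida's dream", RELATED, toy_search))  # ['monument_valley', 'monument_valley_2']
```

Searching the original query first is a design choice: expansion can only append results, never push down what the user literally typed.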

3. How to Mine Them

Candidate Data

Analyze the daily user logs: from the search log, take the search terms each user entered within a short time window (15 or 30 minutes) and form candidate related query pairs <a, b>. The more days of logs analyzed, the better: more data yields more related query pairs and more accurate results.
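A minimal sketch of candidate-pair extraction, assuming each log record is a `(user_id, timestamp_seconds, query)` tuple; this schema and the function name `candidate_pairs` are assumptions. It counts how often each ordered pair <a, b> occurs within the window, which is the kind of co-occurrence count a similarity feature can be built from.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# (user_id, unix timestamp in seconds, query): an assumed log schema.
LogRecord = Tuple[str, int, str]


def candidate_pairs(records: Iterable[LogRecord], window_s: int = 30 * 60) -> Counter:
    """Count ordered pairs <a, b>: the same user searched a, then b within window_s seconds."""
    by_user: Dict[str, List[Tuple[int, str]]] = {}
    for user, ts, query in records:
        by_user.setdefault(user, []).append((ts, query))

    pairs: Counter = Counter()
    for events in by_user.values():
        events.sort()  # chronological order per user
        for i, (ts_a, a) in enumerate(events):
            for ts_b, b in events[i + 1:]:
                if ts_b - ts_a > window_s:
                    break  # later events are even further away
                if a != b:
                    pairs[(a, b)] += 1
    return pairs


logs = [
    ("u1", 0, "ttkp"), ("u1", 60, "everyday cool run"),
    ("u2", 0, "ttkp"), ("u2", 3600, "everyday cool run"),  # an hour apart: outside a 30-minute window
]
print(candidate_pairs(logs))  # Counter({('ttkp', 'everyday cool run'): 1})
```

Running the same function over more days of logs simply adds more records, which is why a longer history produces more candidate pairs.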

Feature Extraction

    • Co-occurrence similarity: a variant of conditional probability that adds a penalty for big queries (high-frequency top queries), because a big query is more likely to co-occur with any other query.
    • Edit distance: reflects how similar two queries are in their wording. For example, Daily Racing and Daily Speed both contain "daily", which contributes some similarity, but this feature should only supplement co-occurrence similarity.
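The text does not spell out the exact co-occurrence formula, so the sketch below uses a hypothetical one: the conditional probability P(b | a) = c(a, b) / f(a), divided by f(b)^α to penalize big queries, where c is the co-occurrence count, f the search count, and α an assumed knob. It is not calibrated to the similarity values quoted in the article. The edit-distance feature is the standard Levenshtein distance normalized to a similarity in [0, 1].

```python
def cooccurrence_sim(c_ab: int, f_a: int, f_b: int, alpha: float = 0.5) -> float:
    """Hypothetical co-occurrence similarity for the ordered pair <a, b>.

    c_ab / f_a is the conditional probability P(b | a); dividing by f_b ** alpha
    adds a penalty when b is a big (very frequent) query. alpha is an assumption.
    """
    if f_a <= 0 or f_b <= 0:
        return 0.0
    return c_ab / (f_a * f_b ** alpha)


def levenshtein(s: str, t: str) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        cur = [i]
        for j, ct in enumerate(t, start=1):
            cur.append(min(prev[j] + 1,               # delete cs
                           cur[j - 1] + 1,            # insert ct
                           prev[j - 1] + (cs != ct)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def edit_sim(s: str, t: str) -> float:
    """Edit distance normalized to a similarity in [0, 1]."""
    longest = max(len(s), len(t))
    return 1.0 if longest == 0 else 1.0 - levenshtein(s, t) / longest


def features(c_ab: int, f_a: int, f_b: int, a: str, b: str) -> list:
    """Feature vector for one candidate pair <a, b>."""
    return [cooccurrence_sim(c_ab, f_a, f_b), edit_sim(a, b)]
```

Because Python strings are sequences of Unicode code points, the same edit distance works character by character on Chinese app names.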

Model Training

    • Manually label sample data: for a certain number of query pairs, mark whether each pair is related or not.
    • Choose candidate machine learning algorithms, such as logistic regression, SVM, or decision trees, and train models on the labeled samples.
    • Use the trained models to predict on the original data, and pick the final algorithm based on how well it actually performs.
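A sketch of the training step using logistic regression, one of the candidate algorithms listed, written with plain batch gradient descent so it needs no libraries. The labeled samples and the two features (co-occurrence similarity, edit-distance similarity) are hypothetical; a library such as scikit-learn would make it easier to compare logistic regression against SVMs and decision trees.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.5, epochs: int = 5000) -> Tuple[List[float], float]:
    """Fit logistic regression by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float], threshold: float = 0.5) -> bool:
    """True means the pair is predicted to be a related query pair."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold


# Hypothetical hand-labeled pairs: [co-occurrence similarity, edit similarity] -> 1/0.
X = [[0.30, 0.8], [0.25, 0.6], [0.20, 0.7],
     [0.01, 0.1], [0.02, 0.2], [0.005, 0.0]]
y = [1, 1, 1, 0, 0, 0]
w, b = train_logreg(X, y)
```

The trained `(w, b)` would then be applied to every unlabeled candidate pair, and the algorithm whose predictions look best on real data is kept.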

Filling In Missing Data

The main factors that affect the final result are the time window used to group the search log and the number of days of logs. During implementation we found that small queries related to a big query could not be recalled from the big query, because the big query itself is searched so many times. But we need big queries to lead users to the small ones.

Take the pair <Monument Valley (44,736 searches), Sky Maze (200 searches)>: the two co-occur 89 times and the pair's similarity is 0.004. This is too low, so Monument Valley cannot recall Sky Maze.

The reverse pair <Sky Maze, Monument Valley> has a similarity of 0.069 and is judged to be a related query pair.

So we add a reverse pass. For each related query pair such as <Sky Maze, Monument Valley>, we check its reverse pair <Monument Valley, Sky Maze>. If Monument Valley is a big query (searched more than a certain number of times, e.g. 10,000) and the reverse pair's own similarity exceeds a threshold (e.g. 0.003), we also keep <Monument Valley, Sky Maze>.
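The reverse pass can be sketched as follows, using the Monument Valley / Sky Maze numbers from the example. The thresholds (10,000 searches, similarity 0.003) are the article's illustrative values; the function name `backfill_reverse` and the dictionary layout are assumptions.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]


def backfill_reverse(
    related: Set[Pair],
    sim: Dict[Pair, float],
    freq: Dict[str, int],
    big_query_min: int = 10_000,
    reverse_sim_min: float = 0.003,
) -> Set[Pair]:
    """For each related pair <a, b>, also keep the reverse pair <b, a> when b is a
    big query and the reverse pair's own similarity clears a (lower) threshold."""
    result = set(related)
    for a, b in related:
        reverse = (b, a)
        if freq.get(b, 0) > big_query_min and sim.get(reverse, 0.0) > reverse_sim_min:
            result.add(reverse)
    return result


# <Sky Maze, Monument Valley> is related (0.069); the reverse pair only scores 0.004.
freq = {"Monument Valley": 44_736, "Sky Maze": 200}
sim = {("Sky Maze", "Monument Valley"): 0.069, ("Monument Valley", "Sky Maze"): 0.004}
result = backfill_reverse({("Sky Maze", "Monument Valley")}, sim, freq)
# result now also contains ("Monument Valley", "Sky Maze").
```

The low reverse threshold applies only when the other side is a big query, so small queries are not flooded with weak reverse pairs.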

Online Feedback

The online system uses the offline data (related query pairs) to supplement the online search results or recall additional ones, showing users the apps that correspond to the related queries. Users then either download those apps or not. We collect this data to retrain the model.

Collect the list of apps downloaded for queryA: <appIds>.

From appIds, find the apps recalled by queryB, a related query of queryA: if an app's name is similar enough to queryB by edit distance (similarity above a certain value), the app is considered to have been recalled by queryB.

If the number of downloaded apps recalled by queryB exceeds a certain threshold, the pair is treated as a positive example: queryB is a related query of queryA.

If none of the apps recalled by queryB are downloaded, or fewer than a certain number are, the pair is treated as a negative example: queryB is not a related query of queryA.
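A sketch of turning download feedback into a label for <queryA, queryB>. The name-similarity cutoff, the case folding, and both count thresholds are assumptions, since the article only says "a certain value"; pairs whose evidence falls between the two thresholds are left unlabeled here. The edit-distance helpers are repeated so the block runs on its own.

```python
from typing import Iterable, Optional


def levenshtein(s: str, t: str) -> int:
    """Dynamic-programming edit distance."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        cur = [i]
        for j, ct in enumerate(t, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (cs != ct)))
        prev = cur
    return prev[-1]


def edit_sim(s: str, t: str) -> float:
    """Edit distance normalized to a similarity in [0, 1]."""
    longest = max(len(s), len(t))
    return 1.0 if longest == 0 else 1.0 - levenshtein(s, t) / longest


def label_from_feedback(
    downloaded_names: Iterable[str],
    query_b: str,
    name_sim_min: float = 0.5,
    pos_threshold: int = 2,
    neg_threshold: int = 1,
) -> Optional[int]:
    """Label <queryA, queryB> from the names of apps downloaded after searching queryA.

    Returns 1 (positive) if more than pos_threshold downloads were recalled by queryB,
    0 (negative) if fewer than neg_threshold were, and None otherwise.
    """
    q = query_b.lower()
    recalled = sum(1 for name in downloaded_names if edit_sim(name.lower(), q) >= name_sim_min)
    if recalled > pos_threshold:
        return 1
    if recalled < neg_threshold:
        return 0
    return None
```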

In this way, the results shown online give us real labeled data. We use it to replace the old samples, retrain the algorithm to get a new model, and re-predict on the original data, so the model's accuracy keeps improving.

Persisting Good Cases to Avoid Regression

Suppose <queryA, queryB> is initially a related pair, so whenever a user searches queryA, queryB's results also appear. Over time, users who enter queryA no longer need to follow up with queryB, so for a while the pair can no longer be mined from the logs. queryA then stops showing the apps that correspond to queryB, and users gradually go back to entering queryB after queryA to get the results they want. The effect oscillates, and we need to avoid this.

Therefore we persist every positive example from online feedback and add it to the final set of related queries as a whitelist. Accumulating positive examples this way reduces regression.
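A sketch of the whitelist, assuming pairs are persisted as JSON text; the storage format and the function names are hypothetical. Positive cases are only ever added, so a pair stays in the final related-query set even on days when mining misses it.

```python
import json
from typing import Iterable, Set, Tuple

Pair = Tuple[str, str]


def update_whitelist(whitelist: Set[Pair], feedback_positives: Iterable[Pair]) -> Set[Pair]:
    """Positive cases are only added, so the whitelist accumulates over time."""
    return set(whitelist) | set(feedback_positives)


def final_related(mined_today: Set[Pair], whitelist: Set[Pair]) -> Set[Pair]:
    """Union of freshly mined pairs and the persisted whitelist."""
    return set(mined_today) | set(whitelist)


def dump_whitelist(whitelist: Set[Pair]) -> str:
    """Serialize for persistence (sorted so the output is stable)."""
    return json.dumps(sorted([a, b] for a, b in whitelist))


def load_whitelist(text: str) -> Set[Pair]:
    return {(a, b) for a, b in json.loads(text)}


# Day 1: the pair is mined and confirmed by online feedback.
whitelist = update_whitelist(set(), {("queryA", "queryB")})
# Day 2: users no longer type queryB after queryA, so mining misses the pair...
mined_day2: Set[Pair] = set()
# ...but the whitelist keeps it in the final related-query set.
print(final_related(mined_day2, whitelist))  # {('queryA', 'queryB')}
```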

4. Overall Process

Putting these pieces together gives a dynamic, complete, and sustainable system in which online feedback continually improves offline mining.

