Hing Xinpeng: Baidu Search Algorithm Summary - Keyword Segmentation Algorithm


This article continues the earlier overview of Baidu's algorithm. For the details, see: Hing Xinpeng: Baidu algorithm Summary

First, about Chinese word segmentation:

1. Why Chinese word segmentation is difficult

One point needs stating first: the search habits of ordinary users are very different from those of SEO practitioners or people who are familiar with web search, and ordinary users happen to make up the bulk of Baidu's searches. Hing Xinpeng repeats this at the outset to stress how important Chinese word segmentation is to Baidu's search algorithm. Second-generation search engines such as Baidu and Google rely mainly on keyword matching, and there is a wide gap between how a user understands a keyword and how a program understands it.

In the author's view, Baidu is better than Google at Chinese word segmentation, and this is one of the key factors in Baidu beating Google. Chinese word segmentation is much more complex than English segmentation. (Other important languages such as Japanese, Korean, and Russian cause the same kind of trouble, which is one of the reasons Google has not managed to win in those markets either.) For reasons of space Hing Xinpeng does not go into detail here; interested readers can compare sentences in a Latin-script language with Chinese sentences. Chinese sentences not only have many synonyms; their word order is flexible, and they carry many modifiers beyond the subject and predicate (attributives, adverbials, complements, interjections, and so on).

  

A simple example: "how does Baidu rank", "how is Baidu ranked", "Baidu how to rank", "Baidu how ranked", "what does Baidu rank by", "on what basis does Baidu rank", "how does Baidu search rank" ... These phrasings all carry at least one meaning: "what are the rules (principles) by which Baidu ranks search results". Each of them can also carry other meanings, such as "how to get ranked on Baidu (the method for reaching that goal)" or "how Baidu produces its search rankings (the principle and process)" ...

Continuing the example: when a user enters one of the queries above (in most cases ordinary users treat Baidu as a cure-all, so unlike people who know SEO they search in ways that follow no rules), Baidu must quickly return the results the user needs. At that point the core problems Baidu faces are:

A. First, it needs to know what the user is searching for (semantic analysis, see part "Second");

B. Second, because Baidu's search is still based on keyword matching, the user's query has to be segmented (the next section analyzes how Baidu segments);

C. Then Baidu takes the segmentation result to its database and retrieves the matching snapshots;

D. The previous step only retrieves. The fourth step is ranking, which at this point is no challenge for Baidu (although from an SEO's point of view this step is really very difficult);

E. The fifth step is to return the results to the user on the results page, to place its advertising (Baidu paid-bid ads), and to promote its own products where appropriate (Baidu Knows, Baidu Library ...).

This is written a bit messily, and SEO consultant Hing Xinpeng apologizes for not finding a better way to state it. Readers are welcome to organize it and build on it.

2. Baidu's Chinese word segmentation method:

Baidu's Chinese word segmentation is backed not only by a huge volume of user searches (this differs from Google; Baidu is rooted in Chinese culture and understands Chinese better) but also by a large Chinese dictionary database, to which hot search terms and search-behavior data are added dynamically. (Judging from Baidu's recent algorithm adjustments, Baidu respects the user's search behavior more than before: the user's input comes first and Baidu's correction second. This is very important.) The following example shows how the query "Baidu how to rank" is segmented:

A. Natural segmentation: splits caused by punctuation and spaces. This is the first factor. A query in which the user separates "Baidu" from "how to rank" with a space or punctuation is first divided into "Baidu" and "how to rank". This is certain: to understand the intent behind a search, Baidu must first respect how the user typed it. SEO consultant Hing Xinpeng concluded this from hands-on observation; many SEO practitioners may not have noticed it, so it is worth a reminder here.

B. Dictionary segmentation: this is easy to understand. "Baidu how to rank" is divided into the words "Baidu", "how", and "rank", because these are words that exist in the Chinese dictionary. Baidu has a huge Chinese dictionary to draw on, so this is not difficult.

C. Phrase combination: the segmentation in B is clearly not enough. To better understand user intent, semantic coherence must be preserved, so the three words are recombined into "Baidu how to rank"; "Baidu how" + "rank"; "Baidu rank" + "how"; "how to rank" + "Baidu"; and the reversed forms of these combinations. Importance follows the order-priority principle first, followed by reverse-order and bidirectional combinations. A basic principle of segmentation analysis is to make the fewest cuts.

The three points above are segmentation in the usual sense. Baidu also has to handle some more troublesome cases, described in the next few points.
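Steps A to C can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Baidu's implementation: the three-word dictionary is hypothetical, the Chinese query 百度如何排名 is assumed to be the original behind the translated example, and greedy forward maximum matching is swapped in as a simple, well-known stand-in for dictionary segmentation.

```python
import re

# Hypothetical mini-dictionary; a real engine would use a very large lexicon.
DICTIONARY = {"百度", "如何", "排名"}
MAX_WORD_LEN = max(len(word) for word in DICTIONARY)

def natural_split(query):
    """Step A: split on the spaces and punctuation the user typed."""
    return [part for part in re.split(r"[\s,.?!，。？！、]+", query) if part]

def forward_max_match(text):
    """Step B: greedy dictionary segmentation, trying the longest word first."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # Fall back to a single character when no dictionary word matches.
            if size == 1 or candidate in DICTIONARY:
                words.append(candidate)
                i += size
                break
    return words

def segment(query):
    """Apply natural segmentation first, then dictionary segmentation."""
    return [w for chunk in natural_split(query) for w in forward_max_match(chunk)]

def candidate_phrases(words):
    """Step C: contiguous combinations in original order, fewest cuts first."""
    n = len(words)
    return ["".join(words[i:i + size])
            for size in range(n, 0, -1)
            for i in range(n - size + 1)]

words = segment("百度 如何排名？")
print(words)                     # ['百度', '如何', '排名']
print(candidate_phrases(words))  # full phrase first, single words last
```

The single-character fallback in `forward_max_match` also hints at what happens when a word is missing from the dictionary: the text degrades to individual characters that can be recombined later. Reverse-order and bidirectional combinations are left out to keep the sketch short.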

D. Single-character segmentation: if a user searches "Baidu how to rank" and Baidu cannot judge whether the user really means "how does Baidu rank", it is stuck, yet it still has to respect what the user typed. So the Chinese words are split further into single characters: "百" ("hundred"), "度" ("degree"), "如", "何", and "排名" ("ranking"). These pieces are then recombined into different phrases, which are matched against the database.

E. Homophones/typos: if someone searches "how white degree ranking", having mistyped "Baidu" (百度) as "whiteness" (白度), Baidu should correct the error. Judging from the recent adjustments, however, Baidu no longer corrects errors mainly by dictionary matching as it used to; it relies more on the behavior data accumulated after users search. (For example, if many users who search "whiteness" end up spending more time on pages for the keyword "Baidu", then Baidu's correction for "whiteness" will lean toward "Baidu".)

Of course, this word is only Hing Xinpeng's example; a real Baidu search for "whiteness" does not behave this way. For a real case, search Baidu for "beauty gauge car" (a mistyped form of "US-spec car"): Baidu shows a "Did you mean" prompt suggesting the corrected term. In addition, Baidu collects error-correction statistics and guides corrections through the suggested terms in the search drop-down box, the "Related searches" at the bottom of the results page, and Baidu Knows (which has a very large user base and is an important supplement to Baidu search).

F. New words: there are generally two sources of new words: a. recently popular expressions, for which Baidu adjusts its thesaurus based on accumulated search-behavior data and on monitoring of trending words across the web; b. new words in the language or words coined by users, which are adjusted mainly through accumulated search-behavior data, with manual additions for some of the new words.
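The behavior-based correction described in item E can be illustrated with a small sketch. Everything here is an assumption made for illustration: the log format `(query, landed_keyword, dwell_seconds)`, the `min_share` threshold, and the function name `build_corrections` are hypothetical and do not describe Baidu's actual data or logic.

```python
from collections import defaultdict

def build_corrections(click_log, min_share=0.6):
    """Suggest a correction when most dwell time for a query lands elsewhere.

    click_log: iterable of (query, landed_keyword, dwell_seconds) tuples.
    min_share: hypothetical threshold for how dominant the target must be.
    """
    dwell = defaultdict(lambda: defaultdict(float))
    for query, landed_keyword, seconds in click_log:
        dwell[query][landed_keyword] += seconds

    corrections = {}
    for query, by_keyword in dwell.items():
        best, best_time = max(by_keyword.items(), key=lambda kv: kv[1])
        share = best_time / sum(by_keyword.values())
        # Only correct when users clearly end up on a different keyword.
        if best != query and share >= min_share:
            corrections[query] = best
    return corrections

log = [
    ("白度", "百度", 120),  # searched the typo, stayed on "Baidu" pages
    ("白度", "白度", 20),
    ("百度", "百度", 90),
]
print(build_corrections(log))  # {'白度': '百度'}
```

The design point is that the correction comes from what users did after searching, not from a dictionary lookup, which matches the shift the author describes.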

Hing Xinpeng adds here that Baidu actually works very hard: it records every search a user makes (by program, of course). It generally records the search keyword, the pages visited and how they were reached (usually by link), and the time spent on each page. (This used to be hard to measure. Now Baidu gathers it through browser cookies, Baidu accounts, IP records, Baidu Statistics on sites that have installed it, and a large number of other auxiliary tools. Baidu is clever at getting onto sites in various ways; the recently popular Baidu Share button, for example, is in effect its biggest spy.) The calculation is generally based on how the user browses the snapshot pages Baidu served: which result was opened first, which was opened next, where the user stayed longest, and from which page the user finally left Baidu. In judging whether a page is useful to the user, Baidu takes as its primary criteria the page with the longest stay and the last page viewed before the user left Baidu, followed by the degree of interaction on those pages.
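The two primary criteria in the paragraph above (longest stay, last page before leaving) can be expressed as a tiny scoring sketch. The one-point-per-signal scoring and the session format are assumptions chosen for illustration; the source gives no actual weights.

```python
def score_session(visits):
    """Score pages from one search session.

    visits: ordered list of (url, dwell_seconds) for the results a user opened.
    Each page gets one point for being the longest stay and one point for
    being the last page viewed before leaving (hypothetical equal weights).
    """
    scores = {url: 0.0 for url, _ in visits}
    if not visits:
        return scores
    longest_url = max(visits, key=lambda visit: visit[1])[0]
    last_url = visits[-1][0]
    scores[longest_url] += 1.0
    scores[last_url] += 1.0
    return scores

session = [("site-a", 10), ("site-b", 95), ("site-c", 40)]
print(score_session(session))  # {'site-a': 0.0, 'site-b': 1.0, 'site-c': 1.0}
```

Interaction signals, which the author lists as a secondary factor, are omitted here.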

Second, on semantic analysis:

This was already touched on in the previous part. It is listed separately only to distinguish "semantic analysis" of search behavior from "segmentation". The two complement each other: semantic analysis draws its conclusions from segmentation combined with data on users' browsing habits. As mentioned above, Baidu gathers statistics on user behavior in many ways, and it uses those statistics, together with data on the keywords used and how they were entered, to support segmentation matching.

Even so, with so many pages and billions of searches a day, it is hard for Baidu to compute everything (Baidu keeps improving its methods and algorithms to accomplish this vast project). It mainly relies on sampled statistics for popular searches and random statistics for other searches to carry out semantic analysis. (This is a hypothetical inference that SEO consultant Hing Xinpeng draws from hands-on observation.)

  

What is most elusive about Baidu is not so much the ranking algorithm as the semantic analysis algorithm. Just as SEOs do not understand Baidu's algorithm, Baidu does not fully understand users' search intent (so Baidu keeps studying, adjusting, and refining, just as SEOs do). Elusiveness is one reason. More importantly, these calculations are not limited to text, segmentation, and matching degree; they draw on statistics, linear algebra, logic, behavioral science, psychology, and many other disciplines to design the structure of the algorithm, which is then continually repaired and refined. Baidu itself describes this as "massive basic algorithms", to say nothing of how difficult each discipline is in its own right. This is the root reason hard-pressed SEOs never manage to figure out Baidu's algorithm. As one of those hard-pressed SEOs, Hing Xinpeng does not understand it either. Those who can are mostly math or computer geniuses or top talent, who went off long ago to do their own research or inventions. Why would they spend their time chasing after Baidu?

Moreover, Baidu's own "human intervention" in search results and its "monopoly" have drawn all kinds of criticism. Worse, SEOs who keep gaming the rankings for their own benefit and push low-quality information to users are looked down on by the experts who truly understand search algorithms ... So if you think you are that good, do not do SEO. If, as an SEO, you understand why Hing Xinpeng wrote this article, then look at SEO from the height of SEM, web operations, or online marketing, rather than lying in front of a computer at midnight churning out external links to make a living.

That was a digression; back to the topic. If you lack the ability to design algorithms the way Baidu does but still want to dig something useful for SEO out of semantic analysis, Hing Xinpeng suggests studying users' search habits for the terms you are optimizing. For example, Hing Xinpeng recently provided web operations services for the Shanghai Zhibao US-spec car site www.zhibaosuv.com and found that the term "US-spec car" was attracting more and more attention. Many SEOs and webmasters optimizing for this term cling to the single term "US-spec car", whereas users' searches may produce many derived terms such as "US-spec car", "US-spec car SUV", "US-spec SUV", "US-spec car sales", "US-spec car distribution", "US-spec car dealers", and "US-spec car import agent", and even more meaningful long-tail keywords such as "where to buy a US-spec car" and "where in Shanghai are US-spec cars sold". If you understand users' search intentions and then target your SEO accordingly, the results will be better.

Third, on keyword matching degree:

1. Key order for keyword segmentation matching:

This is Hing Xinpeng's summary drawn from hands-on SEO work combined with what other users have shared. Its accuracy is not high, but it can serve as a reference. The general idea behind the segmentation algorithm is the "keyword ratio": the proportion of the page information that the keyword takes up. The parameters usually include: title (page title), meta description (page description/summary), meta keywords (page keywords), the page's h1~h6 tags, anchor text (ordered by emphasis and page position), body text (emphasis such as font, size, color, and surrounding background or text; position priority generally runs from top left to bottom right), and the HTML attributes of images and other page files.
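A minimal sketch of the "keyword ratio" idea follows. The field names and weights in `FIELD_WEIGHTS` are hypothetical, chosen only to show that different page parts can count differently; the source lists the parameters but gives no numbers.

```python
# Hypothetical weights: the source names the fields but not their weights.
FIELD_WEIGHTS = {
    "title": 3.0,
    "description": 2.0,
    "h1": 2.0,
    "anchor": 1.5,
    "body": 1.0,
}

def keyword_ratio(text, keyword):
    """Share of the text's characters taken up by occurrences of the keyword."""
    if not text or not keyword:
        return 0.0
    return text.count(keyword) * len(keyword) / len(text)

def weighted_keyword_score(page, keyword):
    """Weighted average of the keyword ratio over the page fields present."""
    fields = [name for name in page if name in FIELD_WEIGHTS]
    total_weight = sum(FIELD_WEIGHTS[name] for name in fields)
    if total_weight == 0:
        return 0.0
    weighted = sum(FIELD_WEIGHTS[name] * keyword_ratio(page[name], keyword)
                   for name in fields)
    return weighted / total_weight

page = {"title": "美规车报价", "body": "美规车"}
print(keyword_ratio(page["title"], "美规车"))  # 0.6
print(round(weighted_keyword_score(page, "美规车"), 3))  # 0.7
```

Position on the page and visual emphasis, which the author also mentions, would need extra inputs and are not modeled here.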

2. Keyword matching degree calculation:

After segmentation, the words of the phrase are matched against the keywords. If one word in the phrase is unrelated to the others, it is dropped from matching, but it still counts toward the word total when the matching degree of the other words is calculated. Take "Baidu how to rank" as the example. In the general sense, this search phrase is segmented into "Baidu how to rank"; "Baidu how" + "rank"; "Baidu rank" + "how"; ... Then "Baidu how to rank" has a matching degree of 100%, followed by its reorderings ("how Baidu rank", "how to rank Baidu", "Baidu rank how", "rank how Baidu"); "Baidu rank" has a matching degree of 1/3 + 1/3 = 2/3; "how to rank" has a matching degree of 1/2; "Baidu" has a matching degree of 1/3 ... This is only a rough estimate. The actual segmentation algorithm adds further parameters, such as forward-order priority, reverse-order priority, bidirectional priority, the fewest-cuts principle ... (Hing Xinpeng's knowledge is limited, so the specific algorithm cannot be shared here; this is only a basic line of analysis for readers' reference. Real queries also contain punctuation marks, spaces, single characters, and so on.)
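The fractions above can be reproduced with a short sketch that counts how many of the query's segments a candidate contains. The function name `match_degree` is hypothetical, and the Chinese terms are assumed to be the originals behind the translated example. One possible reading of the 1/2 figure for "how to rank" is that it comes from the two-part natural segmentation ("Baidu" + "how to rank") rather than the three-word one; that reading is an assumption, not something the source states.

```python
from fractions import Fraction

def match_degree(query_terms, candidate_terms):
    """Fraction of the query's segments that appear in the candidate."""
    if not query_terms:
        return Fraction(0)
    hits = sum(1 for term in query_terms if term in candidate_terms)
    return Fraction(hits, len(query_terms))

three_words = ["百度", "如何", "排名"]   # dictionary segmentation
two_parts = ["百度", "如何排名"]         # natural segmentation (assumed reading)

print(match_degree(three_words, ["百度", "如何", "排名"]))  # 1
print(match_degree(three_words, ["百度", "排名"]))          # 2/3
print(match_degree(three_words, ["百度"]))                  # 1/3
print(match_degree(two_parts, ["如何排名"]))                # 1/2
```

Order priority and the fewest-cuts principle, which the author lists as additional parameters, are deliberately left out of this rough estimate.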

3. Title keyword matching degree:

The keywords in the title are calculated the same way as in point 2, by segment matching within the title itself. Hing Xinpeng would like to add two observations here: a. judging from observation, after Baidu takes a snapshot, the archived snapshot has probably already been tagged with its possible segmentations and matching-degree data (otherwise Baidu search would not be so efficient); b. each time a user searches, Baidu segments the query and, based on the result, looks for the maximum match among the segmentations stored with the archived snapshots.

In addition, the accepted length of the title is generally considered to be no more than 80 characters (including punctuation and spaces, equivalent to about 40 Chinese characters). Judging from the snapshot titles in Baidu search results, however, Baidu applies different limits to different sites according to their weight: generally 60 characters, and up to 70 characters for some sites. The excess is replaced with "...", but that does not mean Baidu ignores it. Take "www.zhibaosuv.com": when writing the title, Hing Xinpeng put "Zhibao US-spec car SUV" at the very end, yet when you search Baidu for "Zhibao US-spec car SUV", the snapshot title displays "Zhibao US-spec car SUV" normally and the overlong title is shown with the earlier part omitted instead.

Generally, unless there is a special need, it is recommended not to exceed the accepted 80 characters. Otherwise you not only dilute the keyword matching degree but also affect the search engine's scoring of the snapshot.

"A tip for writing titles": while on this subject, here is a small tip from Hing Xinpeng. An enterprise site has few pages, so the page that usually ranks is the homepage, and the homepage title must be crafted carefully. Keywords that really cannot fit can go in the description. In addition, it is recommended to put the site name last, so that the key keywords get a better matching degree, and to enclose the site name in a pair of brackets. Although this costs 4 characters, the name stands out more in the search results, which attracts users' attention and improves the site's visibility and click-through rate.

Incidentally, Hing Xinpeng found in practice that updating the head tags too frequently leads to a reduction in weight. (In general, a change to the head puts the page into a snapshot observation period, and the revised title shows up in search results after a delay of 1-3 weeks; the exact delay depends on how the changed keywords are reflected in updates to the page content and in the anchor text of external links.) If the head tags are modified more than twice a month, Baidu will directly grab text from the page as the description summary. For pages whose title is updated frequently, Google will directly take a prominent phrase from the page layout and use it as the title.

4. Description keyword matching degree:

The calculation is similar to that for the title, but Baidu does not segment the description the way it segments the title. The description is matched only against the keywords in the title, the keywords in the meta keywords, and the keywords that bring the page heavy traffic. The matching degree of a keyword in the description follows the order-priority principle and is calculated from the keyword's share of the description's total characters and from its coherence.

The description is a summary of the page. SEO practitioners must follow the rules: do not stuff it with irrelevant information or with keywords that the page text does not contain, so as not to lose points.

The accepted maximum length of the description is 200 characters, and the Baidu snapshot generally displays about 140 characters. Hing Xinpeng recommends no more than 160 characters, because a longer description not only dilutes keyword matching; after Baidu's recent algorithm adjustment, keyword matching is also no longer done for the part of the description beyond what the snapshot displays. Taking the Zhibao US-spec car site www.zhibaosuv.com as the example again: Hing Xinpeng placed "US-spec GMC" at the very end of the description, and after the latest algorithm adjustment it no longer shows (this may of course be an isolated case and is for reference only).
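To see which keywords would fall outside the displayed part of a description, a sketch like the following can help. It assumes the same unit counting as for titles (non-ASCII counts 2, ASCII counts 1), which the source does not confirm for descriptions, and uses the roughly 140-character display length mentioned above as the default. The function names are hypothetical.

```python
def visible_prefix(text, limit=140):
    """Leading part of the text that fits in `limit` units."""
    used, end = 0, 0
    for ch in text:
        step = 2 if ord(ch) > 127 else 1  # assumed: Chinese counts double
        if used + step > limit:
            break
        used += step
        end += 1
    return text[:end]

def split_keywords_by_visibility(description, keywords, limit=140):
    """Return (visible, hidden) keywords relative to the displayed prefix."""
    shown = visible_prefix(description, limit)
    visible = [kw for kw in keywords if kw in shown]
    hidden = [kw for kw in keywords
              if kw in description and kw not in shown]
    return visible, hidden

# A long description with one keyword at the start and one at the very end.
description = "美规车" + "x" * 140 + "GMC"
print(split_keywords_by_visibility(description, ["美规车", "GMC"]))
# (['美规车'], ['GMC'])
```

If the author's observation holds, keywords in the `hidden` list would no longer contribute to matching, so the most valuable terms belong near the start of the description.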

5. Meta keywords matching degree:

For Baidu, the meta keywords seemingly do not count toward matching, but there is one thing Baidu cares about a great deal: do not put keywords in the meta keywords that the page itself does not contain. Doing so may be treated as cheating. This applies even more to Google, which is stricter than Baidu about keyword cheating.

The meta keywords are generally accepted to be no more than 100 characters. Hing Xinpeng's understanding is as follows. For Google, the keywords must not be too numerous and must match the page; a typical page can tolerate ten or so keywords. For Baidu, it is recommended to choose keywords according to their Baidu weight (which can be checked with webmaster tools or Aizhan); words that carry weight can be added to the keywords.

For corporate websites, the limits on the title and description leave no room for the company's full name. In that case, consider putting the company's full name and abbreviation in the meta keywords, because the page's copyright information will generally include the company name and abbreviation.

6. Keyword matching degree in the page content:

The page content itself is not segmented for calculation. Instead, the words in the tags and the segmentations archived with the snapshot are used to count which matching keywords the page contains, how many times they occur, and what share of the page's total characters they take up.

The key places for keywords in a page are the H tags and other important tags. In the Baidu snapshot, of course, the text as displayed on the page is the main standard. In general, keywords in link anchor text, keywords in prominent positions on the page, and keywords shown with emphasis (font, color) matter more. This has to be analyzed for the specific page. SEO practitioners can open the Baidu snapshot directly from the search results for a keyword and see the degree of matching: the best match is highlighted in yellow, followed by red, blue, and green.

The snapshot is a static page stored in Baidu's database, not the live page, which is why people speak of snapshot updates. The snapshot's source code shows that Baidu records only the basic page code and text and does not store images or other files; the images shown in a snapshot are loaded from the file addresses recorded in the page.

The existence of the snapshot is the root reason everyone cares so much about how fast Baidu updates a site. If the snapshot is not refreshed, the chance of ranking shrinks, and your site's snapshot sits in Baidu's snapshot database like an orphan. Having written this far, Hing Xinpeng adds one more observation. In the past everyone believed static pages were more popular with search engines. With the development of Web 2.0 and the trend toward a social internet, this seems to be reversing: static and pseudo-static pages are beginning to be abandoned by search programs ... Hing Xinpeng's understanding is that if a page is static, the search engine is more likely to assume its content is updated slowly, which naturally lowers how often the spider visits ...

Fourth, keyword matching in practice: case analysis

The sections above describe Hing Xinpeng's superficial understanding of Baidu's Chinese word segmentation, semantic analysis, and keyword matching. The following example focuses on how to make a page match its keywords. Usually the task an SEO receives is that a client or boss hands over a site, names a few keywords, and says go. Apart from adding keywords to the head tags and collecting large numbers of keyword-related articles, the rest seems to consist of using all kinds of tools to mass-produce external links, so that for a while a mess of information containing "www.zhibaosuv.com" floods the major forums, blogs, online shops, and classified sites ... (Of course, Hing Xinpeng is no better and builds external links in roughly the same way, but basically without tools, choosing highly relevant sites as far as possible and posting external links in a targeted way.)

A better way to do SEO is to survey and analyze users' needs during ranking optimization, combine that with the client's other requirements, plan the site, and integrate the SEO intentions into the construction of the site (www.jianzhan001.com). That way SEO is less tiring and better results are easier to achieve. Take the Shanghai Zhibao client mentioned above. At the start of building the site, the agency carried out a fairly detailed statistical analysis based on the client's franchise of imported US-spec SUVs, using the Baidu search index, the Google keyword list, Baidu's related-search suggestions, and Webmaster Tools (tool.chinaz.com). Finally, the keywords were determined from the client's main products (US-spec Mercedes-Benz, Audi, Cayenne, Land Rover, Ford, Toyota, and so on). (Keyword planning should consider the degree of competition on Baidu, the number of pages indexed, how fresh the snapshots in the first page of results are, and Baidu's summaries, in order to judge the difficulty, and then decide in light of the budget and workload.)

In the site design plan, the agency designed the display column as the "US-spec Car channel", with the keywords in turn as its categories, implemented as a drop-down menu. (Hing Xinpeng's reminder: keywords in the navigation bar's anchor text are very important. Clients now ask for more and more keywords, so Hing Xinpeng recommends making the navigation a list channel on the left side of the page, which has proven effective in practice, for example on the three-wo color steel site.) Next, consider the drop-down menu and the recently popular row-and-column navigation at the bottom of the page: where space on the homepage is limited, link to the column pages for the keywords at the bottom as auxiliary navigation, work anchor text into the homepage text where appropriate, and set the Alt attribute on the main images.

In the title design, "US-spec car" naturally comes first. Next, following the priority order of the keywords, the homepage title was designed as "US-spec car _ US-spec Mercedes-Benz, US-spec BMW, US-spec Land Rover, US-spec Cayenne, US-spec Audi" followed by the bracketed site name "Zhibao US-spec car SUV". The other keywords have lower search volume and value, so they go in the description. The description begins with "Shanghai Zhibao Famous Car Company, top US-spec vehicle importer, luxury SUV US-spec distribution specialist", which shows the company name, highlights the company's characteristics, and contains the core keyword "US-spec car". It is followed by "US-spec BMW X5/X6, US-spec Mercedes-Benz ML/GL series, US-spec Porsche Cayenne, US-spec Audi Q7 ...", which covers the key product-model keywords such as "US-spec BMW X5" and "US-spec Audi Q7". Because of the character limits on the page's head, many keywords cannot be included. For www.zhibaosuv.com the agency therefore handled the external links and the code optimization of each page, completing the head tags, other tags, and links on all pages and ensuring that no page title is duplicated. Take the US-spec car channel page "http://www.zhibaosuv.com/Brand.asp": its title is "US-spec car, US-spec Mercedes-Benz configuration, luxury SUV US-spec version price _ Zhibao US-spec Car channel", in which the core keyword, the page's key keywords, the site name, and the page name are all well represented. The product sub-pages under the column page are generated when new products are published in the back end; each page's title and description are filled dynamically with the product name and a brief summary.

In operating the site, in order to capture more valuable keyword traffic, the Zhibao US-spec car news releases use original information as far as possible, with attractive pictures and tables to improve the readability of the page. At the same time, the author does not forget to highlight the keywords in each article and link them to form anchor text, which helps build and enrich the site's internal links. This produced a clear gain in search performance. In addition, the news updates focus on including the keywords that could not fit within the head limits, and the homepage pulls in the latest news headlines to keep the homepage fresh.

This has run a bit long. Baidu's algorithm cannot be explained in a sentence or two, and what has been compiled and published here only scratches the surface. In terms of SEO value, it is an analysis of one understanding of SEO and of how Baidu calculates keyword matching. SEO colleagues are welcome to join the discussion. Hing Xinpeng's microblog: http://t.qq.com/zhyhyhz; comments and criticism are welcome. This article is from Hing Xinpeng's blog (http://www.jiangxinpeng.com/); when reproducing it, please credit the source with a link.

This article address: http://www.jiangxinpeng.com/?p=45


