Today we give a brief introduction to the second system in a search engine's overall workflow: the data analysis system, which comes right after the web crawling system. The data analysis system processes the pages the crawler brings back. Below are its main concepts and steps, answering the question: how does the data analysis system handle these pages?
1. Extract text
A web page contains various kinds of code (HTML, JavaScript, etc.) that cannot be used for ranking calculations, so the first thing the data analysis system does is strip out the code and extract the text content. Figure 1 shows a page before text extraction, and Figure 2 shows it after the text has been extracted:
Figure 1 (before) and Figure 2 (after): text extraction
This step is straightforward and should be easy for everyone to understand.
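To make the idea concrete, here is a minimal sketch of text extraction using Python's standard-library `html.parser`. It is an illustration only, not how any particular search engine does it: the class name `TextExtractor` and function `extract_text` are hypothetical, and the sketch assumes that text inside `<script>` and `<style>` tags is code rather than content.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    SKIP_TAGS = {"script", "style"}  # assumption: these tags hold code, not content

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a tag whose text we ignore
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep only non-empty text that is not inside a skipped tag.
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html: str) -> str:
    """Return the page's visible text, joined with single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


if __name__ == "__main__":
    page = ("<html><head><style>p{color:red}</style></head><body>"
            "<h1>SEO</h1><script>var x = 1;</script><p>Data analysis</p>"
            "</body></html>")
    print(extract_text(page))  # SEO Data analysis
```

Real pages also need handling for comments, hidden elements and character encodings, which this sketch ignores.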
2. Content denoising
Much of the content on a website has nothing to do with a page's main content and is useless for search engine ranking, for example navigation text and the copyright information at the bottom of the page. This content is the page's "noise," and search engines remove it; the whole process is called "denoising." So how does a search engine decide which content is noise? Across a site's content pages, the real content differs from page to page, while the "noise" is usually identical: the navigation text is the same on every page, and so is the copyright notice at the bottom.
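The "same on every page" heuristic described above can be sketched as follows. This is a simplified illustration under stated assumptions, not a search engine's actual method: each page is assumed to be already split into text blocks, and the function name `remove_template_noise` and its `threshold` parameter are hypothetical. A block that appears on at least `threshold` of the site's pages is treated as noise.

```python
from collections import Counter


def remove_template_noise(pages, threshold=1.0):
    """Drop text blocks that appear on at least `threshold` fraction of pages.

    `pages` is a list of pages, each a list of text blocks (e.g. paragraphs).
    With threshold=1.0, only blocks present on every page are removed.
    """
    if len(pages) < 2:
        # Cannot tell template from content with a single page.
        return [list(blocks) for blocks in pages]
    # Count how many pages each block appears on (at most once per page).
    block_counts = Counter()
    for blocks in pages:
        block_counts.update(set(blocks))
    min_pages = threshold * len(pages)
    return [[b for b in blocks if block_counts[b] < min_pages] for blocks in pages]


if __name__ == "__main__":
    site = [
        ["Home | News | About", "Article one text", "Copyright Example Inc."],
        ["Home | News | About", "Article two text", "Copyright Example Inc."],
        ["Home | News | About", "Article three text", "Copyright Example Inc."],
    ]
    print(remove_template_noise(site))
    # [['Article one text'], ['Article two text'], ['Article three text']]
```

A threshold below 1.0 tolerates templates that vary slightly between pages, at the risk of discarding genuinely repeated content.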
3. Word segmentation
Word segmentation simply means splitting a sentence or phrase into individual words. How the words are split depends on each search engine's own dictionary and segmentation algorithm, so every search engine does it differently. Segmentation is handled separately for Chinese and for English. Segmentation technology is internal to the search engine and there is little we SEOs can do about it; we mainly take it into account when writing page titles and calculating keyword density.
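Chinese text has no spaces between words, so segmentation relies on a dictionary. As one illustration, here is a sketch of forward maximum matching, a classic and simple dictionary-based method; it is not the algorithm of any specific search engine (the text above notes that each engine uses its own). The tiny dictionary and the function name `forward_max_match` are hypothetical.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy forward maximum matching segmentation.

    At each position, take the longest dictionary word (up to max_len
    characters) that starts there; otherwise emit a single character.
    """
    words = []
    i = 0
    while i < len(text):
        match = text[i]  # single-character fallback
        # Try the longest candidates first, down to two characters.
        for size in range(min(max_len, len(text) - i), 1, -1):
            candidate = text[i:i + size]
            if candidate in dictionary:
                match = candidate
                break
        words.append(match)
        i += len(match)
    return words


if __name__ == "__main__":
    toy_dictionary = {"搜索", "引擎", "搜索引擎", "数据", "分析"}  # hypothetical
    print(forward_max_match("搜索引擎数据分析", toy_dictionary))
    # ['搜索引擎', '数据', '分析']
```

Greedy matching is fast but can mis-split ambiguous strings, which is one reason real engines combine larger dictionaries with statistical models.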
4. Stop-word removal
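Whether an article is in Chinese or English, it contains many words that appear very frequently but contribute almost nothing to the content, such as Chinese particles and interjections (的, 地, 啊, and so on) and English words like "the", "to", "of" and "a". These are called stop words, and search engines remove them before further processing.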
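Stop-word removal is essentially a set lookup after segmentation. The sketch below is illustrative only: the stop list holds just the English examples from the text above (real lists are much longer and engine-specific), and the names `STOP_WORDS` and `remove_stop_words` are hypothetical.

```python
# Illustrative stop list using only the examples mentioned in the text.
STOP_WORDS = {"the", "to", "of", "a"}


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Return the tokens that are not stop words (case-insensitive)."""
    return [t for t in tokens if t.lower() not in stop_words]


if __name__ == "__main__":
    tokens = ["The", "crawler", "sends", "pages", "to", "the", "indexer"]
    print(remove_stop_words(tokens))  # ['crawler', 'sends', 'pages', 'indexer']
```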
5. Duplicate page removal
This one is easy to understand: the search engine compares your page against pages it has crawled before, and if it finds duplicates it deletes them to reduce meaningless repeated information. This is why webmasters everywhere hunt for original articles or produce "pseudo-original" rewrites. Search engine algorithms are quite powerful, though: so-called pseudo-original content made by simply inserting a few extra characters or words, or by merely rearranging the order of paragraphs, cannot escape detection.
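To see why reordering paragraphs does not fool a duplicate check, consider w-shingling with Jaccard similarity, a well-known published near-duplicate technique. It is swapped in here purely for illustration; the text does not say which algorithm search engines use. The function names, the shingle size `k=3` and the `0.5` threshold are hypothetical choices. Because a document is compared as a *set* of short word sequences, reordering paragraphs only changes the few shingles that span a paragraph boundary.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (contiguous word sequences)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(doc1, doc2, k=3, threshold=0.5):
    """Treat two documents as duplicates if their shingle sets overlap enough."""
    return jaccard(shingles(doc1, k), shingles(doc2, k)) >= threshold


if __name__ == "__main__":
    original = ("search engines remove duplicate pages. "
                "they compare new pages with old ones.")
    reordered = ("they compare new pages with old ones. "
                 "search engines remove duplicate pages.")
    print(jaccard(shingles(original), shingles(reordered)))  # 8 shared of 12 total, about 0.667
    print(is_near_duplicate(original, reordered))  # True
```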
6. Link analysis
This is the last step of the search engine's data analysis system. It analyzes a page's internal links and external links to calculate the page's weight, and that weight in turn affects how the page ranks for keywords.
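One widely known way to turn links into a page weight is PageRank. The sketch below uses it as an example of the idea; it is an assumption that this resembles what any given search engine does today, since their real formulas are not public. The function name `pagerank`, the damping factor `0.85` and the fixed iteration count are illustrative choices, and the link graph is hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simple power-iteration PageRank.

    `links` maps each page to the list of pages it links to.
    Returns a dict of page -> weight; the weights sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal weight everywhere
    for _ in range(iterations):
        # Every page receives a small base share (the "random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its weight evenly along its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no outgoing links): spread its weight evenly.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    site = {"home": ["about", "post"], "about": ["home"], "post": ["home"]}
    for page, weight in sorted(pagerank(site).items()):
        print(page, round(weight, 3))
```

In this example the home page ends up with the highest weight because both other pages link to it, which mirrors the article's point that internal links shape how weight is distributed.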
This article is from: http://www.lxmseo.com/data-analysis.html