The data analysis system is the second system in a search engine's overall workflow. It runs right after the spider crawling system, and its main job is to process the web pages that the spider crawls back. Today, Xiao Qiang will explain in detail how a search engine's data analysis system works, along with several important concepts. As just mentioned, the data analysis system mainly analyzes the content the spider brings back. So how does that analysis work?
Web page structuring
What does it mean to structure a web page? Web pages are made of HTML, and what the search engine spider ultimately brings back is the page's HTML code. Put simply, structuring a page means stripping out the HTML code and keeping the content. In the figures below, Figure 1 shows the web page before structuring and Figure 2 shows it after structuring.
Figure 1: The web page before structuring
Figure 2: The web page after structuring
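To make the idea concrete, here is a minimal sketch in Python that structures a page by discarding the HTML tags and keeping only the text. It relies solely on the standard library's html.parser module. The class name TextExtractor, the helper structure_page, and the choice to also drop the contents of script and style tags are assumptions for illustration, not how any particular search engine does it.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the visible text of a page and drops the HTML markup."""

    # Tags whose contents are code rather than readable text (an assumption).
    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True decodes entities like &amp;
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep non-empty text that is not inside <script> or <style>.
        if text and self._skip_depth == 0:
            self.parts.append(text)


def structure_page(html):
    """Hypothetical helper: return the page's text blocks with tags removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.parts


page = (
    "<html><head><title>Demo</title><style>p{color:red}</style></head>"
    '<body><div class="nav">Home | News</div><p>Main article text.</p>'
    '<div class="footer">Copyright 2012</div></body></html>'
)
print(structure_page(page))
# ['Demo', 'Home | News', 'Main article text.', 'Copyright 2012']
```

Note that the menu text and the copyright line survive this step; removing them is the job of content denoising.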
Content denoising for Web pages
Even after the page has been structured, it still contains text the search engine does not need, such as the navigation menu text and the copyright information at the bottom of the page. The search engine only wants the main content, so at this stage it removes the noise from the structured page. Put simply, denoising means deleting all text that is not part of the main content, such as the menu text and the copyright text at the bottom.
So how does the search engine's data analysis system decide which text is menu text and which is copyright information?
It is actually quite simple. Across a site's content pages, only the main content differs; almost everything else is the same. For example, every page has a navigation bar with the same text, and the same goes for the copyright notice. This analysis is, of course, based on the HTML source.
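The "what stays the same across pages is noise" idea can be sketched as a simple template detector: a text block that appears on most pages of the same site is treated as navigation or copyright text. This is a minimal sketch under stated assumptions: the input is a list of text blocks per page (for example, the output of an HTML-to-text step), remove_template_text is a hypothetical name, and the 80% cut-off is an arbitrary illustrative threshold.

```python
from collections import Counter


def remove_template_text(pages, threshold=0.8):
    """Drop text blocks that appear on at least `threshold` of a site's pages.

    pages: one list of text blocks per page, all from the same site.
    """
    # Count how many pages contain each block (once per page).
    page_count = Counter()
    for blocks in pages:
        page_count.update(set(blocks))
    limit = threshold * len(pages)
    # A block shared by most pages is treated as navigation/copyright noise.
    return [[b for b in blocks if page_count[b] < limit] for blocks in pages]


site = [
    ["Home | News", "Article one body", "Copyright 2012"],
    ["Home | News", "Article two body", "Copyright 2012"],
    ["Home | News", "Article three body", "Copyright 2012"],
]
print(remove_template_text(site))
# [['Article one body'], ['Article two body'], ['Article three body']]
```

The approach only works when several pages of the site are compared; with a single page, every block looks "shared" and would be removed.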
Duplicate page checking
Duplicate checking is easy to understand: the search engine takes all the pages its spider has crawled from your site and compares your page against them to see whether it duplicates any of them. If it does, the duplicate is deleted.
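One way to sketch duplicate checking is to break each page into overlapping word sequences ("shingles") and compare the resulting sets with Jaccard similarity. Shingling is a standard near-duplicate detection technique swapped in here for illustration; the article does not say which method any search engine actually uses, and the shingle size k=3, the 0.9 threshold, and the function names are assumptions.

```python
def shingles(text, k=3):
    """Return the set of overlapping k-word sequences in the text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Share of shingles the two sets have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def is_duplicate(page, crawled_pages, threshold=0.9, k=3):
    """True if `page` is (nearly) identical to any already-crawled page."""
    target = shingles(page, k)
    return any(
        jaccard(target, shingles(other, k)) >= threshold for other in crawled_pages
    )


crawled = ["The quick brown fox jumps over the lazy dog"]
print(is_duplicate("the quick brown fox jumps over the lazy dog", crawled))  # True
print(is_duplicate("search engines analyse crawled pages", crawled))  # False
```

Comparing shingle sets rather than raw strings lets the check tolerate case changes and catch pages that copy most of another page's text.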
Word segmentation
What is word segmentation? Simply put, it cuts a sentence into n words. There is Chinese word segmentation and English word segmentation. The search engine has its own dictionary database containing a large number of words, and it segments text according to that dictionary. One more point: during segmentation, meaningless words such as "the" and "ah" are removed.
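Dictionary-based segmentation can be illustrated with forward maximum matching, a classic method for Chinese text: at each position, take the longest dictionary word that matches, falling back to a single character. The toy dictionary and stop-word list below are hypothetical (搜索引擎 means "search engine", 数据分析 "data analysis", 系统 "system", and 的 is a possessive particle), and real search engines use far larger dictionaries and more sophisticated segmenters.

```python
def fmm_segment(text, dictionary):
    """Forward maximum matching: greedily take the longest dictionary word."""
    max_len = max(len(w) for w in dictionary)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit;
        # a single character is always accepted as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


def segment(text, dictionary, stop_words):
    """Segment the text, then drop stop words that carry no meaning."""
    return [w for w in fmm_segment(text, dictionary) if w not in stop_words]


# Hypothetical toy dictionary and stop-word list.
DICTIONARY = {"搜索", "引擎", "搜索引擎", "的", "数据", "分析", "数据分析", "系统"}
STOP_WORDS = {"的", "啊"}

print(fmm_segment("搜索引擎的数据分析系统", DICTIONARY))
# ['搜索引擎', '的', '数据分析', '系统']
print(segment("搜索引擎的数据分析系统", DICTIONARY, STOP_WORDS))
# ['搜索引擎', '数据分析', '系统']
```

Because it prefers the longest match, the sketch keeps 搜索引擎 as one word instead of splitting it into 搜索 and 引擎.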
Analyzing the page's URL
This is the last step of the data analysis system. Based on external and internal factors, such as external links and internal links, the search engine judges the weight of the URL that corresponds to the page, and this weight affects the page's keyword rankings.
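The article does not say how this weight is calculated. As one well-known example of turning links into a weight, here is a simplified PageRank power iteration. PageRank is swapped in purely for illustration; the link graph, the damping factor of 0.85, and the iteration count are assumptions, and the sketch covers only link signals, not the other factors mentioned above.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links: maps each URL to the list of URLs it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small base share (the "random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its weight evenly along its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its weight over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical site: the home page is linked from both inner pages
# (internal links) and from another site (an external link).
links = {
    "/": ["/a", "/b"],
    "/a": ["/"],
    "/b": ["/"],
    "https://other.example/": ["/"],
}
for url, score in sorted(pagerank(links).items(), key=lambda kv: -kv[1]):
    print(f"{url}: {score:.3f}")
```

The home page, which receives both internal and external links, ends up with the highest weight, while the external page that nothing links to keeps only the base share.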
Original article: http://www.shizhanqiang.com/2012071065.html