Introduction to the data analysis system of a search engine

Source: Internet
Author: User
Keywords: SEO


Today we briefly introduce the second system in a search engine's overall workflow: the data analysis system, which comes after the web crawling system. The data analysis system is mainly used to process the pages that have been crawled. Below are the system's key concepts and main steps, which answer the question: how does the data analysis system handle these pages?

1. Extract text

We all know that a web page contains various kinds of code (HTML, JavaScript, and so on) that cannot be used in ranking calculations, so the first thing the data analysis system does is strip out the code and extract the text content. Figure 1 shows a page before text extraction, and Figure 2 shows it after the text has been extracted:

[Figure 1 and Figure 2: extracting text]

This part is clear enough that we should all understand it.
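To make the step concrete, here is a minimal sketch of text extraction using Python's standard `html.parser` module. The class name `TextExtractor` and the function `extract_text` are hypothetical names chosen for this illustration; a real search engine's extractor is certainly far more elaborate.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while we are inside a skipped tag
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self._chunks.append(text)

    def text(self):
        return " ".join(self._chunks)


def extract_text(html):
    """Return the text content of an HTML string with the code removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


page = (
    "<html><head><title>Demo</title><style>p{color:red}</style></head>"
    "<body><p>Hello <b>world</b></p><script>var x = 1;</script></body></html>"
)
print(extract_text(page))
```

The tags, the style rule, and the script are dropped, and only the words a visitor would read remain.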

2. Content denoising

Many parts of a web page have nothing to do with its main content and are useless for search engine rankings, such as the navigation text and the copyright information at the bottom. This content is the "noise" of the page, and search engines remove it; the whole process is known as "denoising." So how does a search engine determine which content is noise? On a site's content pages, the real content differs from page to page, while the "noise" is generally the same everywhere: the navigation text is identical on every page, and so is the copyright notice at the bottom.
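The idea that noise is whatever repeats across pages can be sketched in a few lines. This is only an illustration under assumptions: each page is assumed to be already split into text blocks, and the function name `remove_noise` and the `min_share` threshold are hypothetical, not something the search engines have published.

```python
from collections import Counter


def remove_noise(pages, min_share=0.8):
    """Drop text blocks that appear on at least `min_share` of the pages.

    pages: list of pages, each page being a list of text blocks (strings).
    Returns a new list of pages with the shared blocks removed.
    """
    # Count on how many pages each block appears (once per page at most).
    block_page_count = Counter()
    for blocks in pages:
        block_page_count.update(set(blocks))

    threshold = min_share * len(pages)
    return [
        [block for block in blocks if block_page_count[block] < threshold]
        for blocks in pages
    ]


site_pages = [
    ["Home | About | Contact", "How crawlers fetch pages", "Copyright Example Site"],
    ["Home | About | Contact", "How text is extracted", "Copyright Example Site"],
    ["Home | About | Contact", "How links are analyzed", "Copyright Example Site"],
]
for cleaned in remove_noise(site_pages):
    print(cleaned)
```

Here the navigation and copyright blocks appear on every page and are removed, while the one block that is unique to each page survives.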

3. Word segmentation

Word segmentation simply means dividing a sentence or phrase into n words. As for how the words are divided, each search engine segments text according to its own dictionary and segmentation algorithm, so the results differ from one search engine to another. Word segmentation is divided into Chinese word segmentation and English word segmentation. Segmentation technology is internal to the search engine, so there is very little we SEOers can do about it; mainly, we take it into account when writing page titles and calculating keyword density.
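As a rough illustration of dictionary-based segmentation, the sketch below uses forward maximum matching, one well-known textbook approach. It is not the algorithm of any particular search engine, and the tiny dictionary and the function name `forward_max_match` are assumptions made for this example.

```python
def forward_max_match(text, dictionary):
    """Segment `text` by always taking the longest dictionary word at the
    current position; unknown characters become single-character words."""
    max_len = max((len(word) for word in dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest possible candidate first, then shrink.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical dictionary, for illustration only.
demo_dictionary = {"搜索", "引擎", "搜索引擎", "数据", "分析", "系统"}
print(forward_max_match("搜索引擎数据分析系统", demo_dictionary))
```

English text already has spaces between words, so a first approximation there is simply `sentence.split()`; Chinese has no such separators, which is why a dictionary is needed.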

4. Removing useless words (stop words)

Whether an article is in Chinese or English, it contains many words that occur very frequently but have no influence on the content. In Chinese these include words such as "地" and "啊"; in English, words such as "the", "to", "of", and "a". The search engine removes these useless words.
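Once the text is segmented, filtering out such words is straightforward. The sketch below assumes the text is already a list of tokens; the stop word list holds only a handful of illustrative entries, and `remove_stop_words` is a hypothetical name.

```python
# A tiny illustrative stop word list; real lists are much longer.
STOP_WORDS = {"the", "to", "of", "a", "地", "啊"}


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Return the tokens that are not stop words (case-insensitive)."""
    return [token for token in tokens if token.lower() not in stop_words]


tokens = ["The", "introduction", "to", "the", "system", "of", "a", "search", "engine"]
print(remove_stop_words(tokens))
```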

5. Page deduplication

This is easy to understand: the search engine compares your page against the pages it has already crawled, and if it finds duplicates, it deletes them to reduce meaningless repetition of information. This is why we webmasters look everywhere for original and pseudo-original articles. Search engine algorithms are quite powerful, though: so-called pseudo-original articles made by simply inserting a few extra characters or changing the order of the paragraphs cannot escape detection.
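To see why reordering paragraphs does not help, consider the sketch below. It compares two texts by their sets of overlapping character sequences ("shingles") and the Jaccard similarity of those sets. This is a common textbook technique swapped in for illustration, not the search engines' actual method; the function names, the shingle length `k`, and the 0.8 threshold are all assumptions.

```python
def shingles(text, k=4):
    """Return the set of all k-character substrings of the normalized text."""
    normalized = " ".join(text.lower().split())
    if len(normalized) <= k:
        return {normalized}
    return {normalized[i:i + k] for i in range(len(normalized) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(text_a, text_b, threshold=0.8, k=4):
    """Treat two texts as duplicates when their shingle sets mostly overlap."""
    return jaccard(shingles(text_a, k), shingles(text_b, k)) >= threshold


first = "search engines remove duplicate pages."
second = "original content ranks better."
original = first + " " + second
reordered = second + " " + first  # same paragraphs, swapped order

print(is_near_duplicate(original, reordered))
print(is_near_duplicate(original, "zzzz qqqq xxxx"))
```

Because the shingle sets ignore where in the document each fragment occurs, swapping paragraphs changes only the few shingles that straddle the paragraph boundary, so the reordered text is still flagged as a duplicate.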

6. Page link analysis

This is the last step of the search engine's data analysis system. It analyzes the page's internal links and external links, calculates the page's weight value, and then uses that weight to influence keyword rankings.
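The article does not say how the weight is calculated. As one possible illustration, the sketch below uses a simplified PageRank-style iteration, in which a page passes a share of its weight to every page it links to. The function name `link_weights`, the damping factor, the iteration count, and the sample link graph are all assumptions for this example.

```python
def link_weights(links, damping=0.85, iterations=50):
    """Compute a PageRank-style weight for every page.

    links: dict mapping each page to the list of pages it links to.
    Returns a dict mapping each page to its weight; the weights sum to 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's weight evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its weight to all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph: three internal pages plus one external site.
demo_links = {
    "home": ["about", "post"],
    "about": ["home"],
    "post": ["home"],
    "external": ["home"],
}
weights = link_weights(demo_links)
for page in sorted(weights, key=weights.get, reverse=True):
    print(page, round(weights[page], 3))
```

In this toy graph the home page receives links from every other page, so it ends up with the highest weight, while the external site, which nobody links to, ends up with the lowest.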

This article is from: http://www.lxmseo.com/data-analysis.html
