Web page structuring is the process of preserving valuable information

Source: Internet
Author: User


Having introduced the goal of Web page structuring, I want to stress that structuring is the process by which the valuable information on a site is preserved. I chose this title for a reason: to remind SEO practitioners once again that understanding how search engines work is the foundation of good website optimization.

Things are no longer what they were a few years ago, when SEO meant modifying the title, description, and keywords and posting external links, and page rankings would respond. Today that alone does not work. Everyone knows these techniques, so presenting them as your own advantage is outdated. In addition, with black-hat SEO rampant and sites optimizing specifically for Baidu or for Google, search engines are constantly adjusting their algorithms. To stay at the forefront of SEO, you have to follow the development of the Internet, keep a clear head, and keep absorbing new knowledge. Only then can a site's optimization results stay within your control.

Back to the point. Briefly, Web page structuring is the process of preserving valuable information. Once you understand the goal of Web page structuring, you will see that it extracts the five attributes that embody a page's value and content: the title, anchor text, body title (content title), body (content), and forward links (link). For search engines, these five attributes are the valuable information (and, of course, they are valuable content for the user as well).
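As a rough illustration, the five attributes could be held in a small record like the one below. This is only a sketch: the class name `StructuredPage` and its field names are hypothetical and are not taken from any search engine.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StructuredPage:
    """Hypothetical container for the five attributes kept after structuring."""
    title: str = ""            # the page title
    content_title: str = ""    # the body title (headline of the article)
    content: str = ""          # the body text itself
    anchor_texts: List[str] = field(default_factory=list)  # anchor text
    links: List[str] = field(default_factory=list)         # forward links


page = StructuredPage(
    title="Example page",
    content_title="An example headline",
    content="The body text of the page.",
    links=["http://example.com/next"],
)
```

Everything else in the HTML source (layout tags, scripts, styling) is discarded once these fields are filled.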

Let's take a concrete look at how the Web page is structured.

Web page structuring first analyzes the tag tree to obtain the corresponding tags, and then uses a voting algorithm to determine the body text and other page data that cannot be judged from the HTML tags alone. Practitioners summarize this as two steps:

HTML tag tree

First, create an HTML tag tree (tag-tree).

Most static pages on the World Wide Web are HTML pages. HTML is a markup language: it stores all of the content it describes inside tags, according to HTML syntax. To describe how a page's content is organized, the tags in the page are collected in the order they appear and their structure is recorded. Because tags nest inside one another, the result is naturally a tree, and we call this tree of a page's tags the page's "tag tree".
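The idea of collecting tags in order of appearance and recording their nesting can be sketched with Python's standard `html.parser` module. This is a minimal sketch, not how any particular search engine does it; `Node`, `TagTreeBuilder`, `build_tag_tree`, and `outline` are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser

# Void elements never have a closing tag, so they must not stay "open".
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class Node:
    """One element in the tag tree (hypothetical helper)."""
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []   # nested elements, in document order
        self.text = []       # text fragments directly inside this element


class TagTreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node   # descend: later tags nest inside this one

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this name; ignore stray closers.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.text.append(data.strip())


def build_tag_tree(html):
    builder = TagTreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def outline(node, depth=0):
    """Return the tree as indented lines, one tag per line."""
    lines = ["  " * depth + node.tag]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines
```

Calling `print("\n".join(outline(build_tag_tree(html))))` on a page's source shows the nesting at a glance. Real pages often contain malformed markup, which a production parser would have to handle far more carefully than this sketch does.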

Users who view a page see only the friendly, rendered information. The HTML tags in the source file, which exist to help a browser such as IE interpret the page, are never shown to the user. A search engine's analysis system therefore has to interpret the page the way a browser does, and to do so it builds an HTML tag tree. Building the tag tree and identifying the text each tag describes is an important step in structuring a page, and it is enough to extract the page title. In real pages, however, a given tag does not describe only one kind of text. Advertising, for example, may sit in the same kind of tag as the article, yet it is not the real body text, and treating it as such would hurt the user's search experience. This is why the body text is identified by voting, as described next.

Second, identify the body text blocks by voting, and assemble them into the body text in depth-first traversal order.

Deciding which text block is the body relies on a method called a "voting algorithm", which is widely used in search engines. Almost everyone has taken part in a vote or election in daily life: officials are elected and resolutions are adopted by vote, and an athlete's routine is scored by a panel of judges. The rationale is that the majority opinion is usually correct, and that many subjective opinions, once combined, become more objective. Although each individual score is subjective, the method and its result are regarded as relatively objective and credible.

How does voting work for body text extraction? First, the search engine defines a set of rules and scores each text block against them. The block with the highest score is considered likely enough to be the body text to be accepted. The rules themselves also need feedback from a sufficiently large number of pages before they yield fair and objective scores. Finally, because HTML tags are nested, traversing the selected blocks depth-first assembles them into a complete body text.
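The scoring step can be sketched as follows. The article does not say which rules search engines actually use, so the three rules here (text length, share of link text, sentence punctuation) and their thresholds are illustrative assumptions, and `Block`, `score`, and `extract_body` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """A tag-tree node that may hold text (hypothetical structure)."""
    tag: str
    text: str = ""          # text directly inside this element
    link_chars: int = 0     # characters of `text` that sit inside links
    children: list = field(default_factory=list)


def collect_text(block):
    """Depth-first, pre-order walk: a block's own text, then each child in order."""
    parts = [block.text] if block.text else []
    for child in block.children:
        parts.extend(collect_text(child))
    return parts


def total_link_chars(block):
    return block.link_chars + sum(total_link_chars(c) for c in block.children)


# Each rule casts one vote (1) or abstains (0). Thresholds are assumed values.
def long_enough(text, links):
    return 1 if len(text) >= 80 else 0


def mostly_plain_text(text, links):
    return 1 if text and links / len(text) < 0.3 else 0


def has_sentences(text, links):
    return 1 if sum(text.count(p) for p in ".!?") >= 2 else 0


RULES = [long_enough, mostly_plain_text, has_sentences]


def score(block):
    """Sum the votes of all rules for one candidate block."""
    text = " ".join(collect_text(block))
    links = total_link_chars(block)
    return sum(rule(text, links) for rule in RULES)


def extract_body(candidates):
    """Pick the highest-scoring block and join its text in depth-first order."""
    best = max(candidates, key=score)
    return " ".join(collect_text(best))
```

With these rules, a navigation bar or an advertisement made up almost entirely of link text collects few votes, while a block of ordinary paragraphs collects all three. A real system would tune its rules and weights against feedback from many pages, as the paragraph above notes.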

(Source: www.zhangxundf.cn)
