An approach to internal links in website articles

Source: Internet
Author: User

Internal links: as the name implies, internal linking means adding links to relevant text in your site's content that point to related pages within the same site. A sensibly built internal link structure can improve how well search engines index the site and can raise the site's weight. Like external links, internal links matter too.

  Traditional way

In the article systems or news publishing systems we built in the past, internal links (tags) were usually implemented as follows:

Database: an article table with the fields (ID, title, body, Adddate, userid), and a keyword table (the internal-link list) with the fields (ID, name, link).
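The two tables above could be sketched as follows. This is only an illustration using Python's built-in sqlite3 module with an in-memory database; the column types and the UNIQUE constraint on name are assumptions, since the original only lists the field names.

```python
import sqlite3

def create_schema(conn):
    """Create the article table and the keyword (internal-link) table."""
    conn.executescript("""
        CREATE TABLE article (
            id      INTEGER PRIMARY KEY,
            title   TEXT NOT NULL,
            body    TEXT NOT NULL,
            adddate TEXT,
            userid  INTEGER
        );
        CREATE TABLE keyword (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE,  -- assumption: one link per word
            link TEXT NOT NULL
        );
    """)

conn = sqlite3.connect(":memory:")
create_schema(conn)

# Hypothetical sample row, for illustration only.
conn.execute("INSERT INTO keyword (name, link) VALUES (?, ?)",
             ("Python", "/tags/python"))
row = conn.execute("SELECT link FROM keyword WHERE name = ?",
                   ("Python",)).fetchone()
```

The UNIQUE constraint also gives the name column an index, so looking up a single word does not require scanning the whole keyword table.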

When an article is published, loop over every entry in the internal-link list and replace each match in the article body.
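A minimal sketch of this loop is shown below. The function name link_passive, the sample keywords, and the choice to link only the first occurrence of each word are assumptions made for illustration, not details from the original system.

```python
def link_passive(body, keywords):
    """Try every keyword against the body, whether or not it occurs there.

    keywords maps a word to its link. The cost grows with the size of the
    keyword list, not with the length of the article.
    """
    for name, link in keywords.items():
        # Assumption: only the first occurrence of each word is linked.
        body = body.replace(name, f'<a href="{link}">{name}</a>', 1)
    return body

# Hypothetical data; a real list might hold tens of thousands of entries.
keywords = {"NetEase": "/k/1", "Baidu": "/k/2", "Sina": "/k/3"}
result = link_passive("NetEase and Baidu are large sites.", keywords)
```

Note that every entry is scanned for, including "Sina", which never appears in the text. A naive replace like this can also touch text inside links that were inserted earlier, so a real implementation would need to guard against that.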

This does deliver the desired functionality, but consider an internal-link list with a large number of entries, such as 20,000, 50,000, or more. Every time an article is published or revised, the whole list has to be scanned, and it is easy to imagine how inefficient that becomes. So how do large websites such as NetEase News and Baidu Encyclopedia do it? If they followed the approach above, the system would collapse within a few months.

  Analysis and comparison

How many words are there in a typical article (excluding HTML code)? 10,000? I think a 10,000-word article is already very long, and few readers would have the patience to read 10,000 words displayed on a single page. For the sake of page layout and user experience, editors usually split long articles into several pages or chapters. What if we first extract from the article the words that might need internal links, and then look those words up in the database? Would that greatly improve efficiency? The answer is yes. Take a 10,000-word article and assume that every word in it needs a link: even in this worst case, the loop runs 10,000 times. That is much better than the 20,000 or 50,000 iterations in the example above.

In the traditional approach, the entire internal-link list is traversed whether or not its entries appear in the article. The new approach instead collects in advance all the words in the article that might need a link, and then uses those words to query the internal-link list one by one. Put side by side, the difference is clear.

The new idea is: extract from the article the words that need links, and then query the internal-link list for them.
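The idea can be sketched roughly as follows, assuming the candidate words have already been extracted from the article. The name link_active is hypothetical, and a plain dict stands in for an indexed database query on the keyword table.

```python
def link_active(body, candidates, keyword_index):
    """Look up only the words that occur in the article.

    candidates is the list of words extracted from the article;
    keyword_index maps a word to its link (a stand-in for an indexed
    database lookup). The cost grows with the article, not with the list.
    """
    linked = set()
    for word in candidates:
        if word in linked:
            continue  # each word is linked at most once (an assumption)
        link = keyword_index.get(word)
        if link is None:
            continue  # not in the internal-link list: skip it
        body = body.replace(word, f'<a href="{link}">{word}</a>', 1)
        linked.add(word)
    return body

# Hypothetical data, for illustration only.
keyword_index = {"NetEase": "/k/1", "Baidu": "/k/2", "Sina": "/k/3"}
text = "NetEase and Baidu are large sites."
candidates = ["NetEase", "and", "Baidu", "are", "large", "sites"]
result = link_active(text, candidates, keyword_index)
```

Here the number of lookups is bounded by the number of candidate words, however many entries the internal-link list contains.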

  A feasible implementation

For now, let us call the traditional approach the passive way and the new approach the active way.

The active way can be implemented as follows.

Using Chinese word segmentation, we can split a piece of text into words. Then filter the resulting words: remove common possessive pronouns, adverbs, interjections, and so on, and keep nouns, brand names, place names, trademarks, and the like. Alternatively, segment the text against your own thesaurus table. Finally, look up the remaining words in the internal-link list: if a word exists there, add its link; if it does not, skip it.
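The segmentation and filtering steps might look like the sketch below. It uses forward maximum matching against a small vocabulary as a stand-in for a real Chinese word segmenter, and a stop-word set as a stand-in for part-of-speech filtering; both choices, along with every name and sample word here, are assumptions for illustration.

```python
def segment_fmm(text, vocabulary, max_len=4):
    """Forward maximum matching: at each position take the longest
    vocabulary word, falling back to a single character."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocabulary:
                tokens.append(piece)
                i += size
                break
    return tokens

# Hypothetical stop words standing in for pronouns, adverbs, and so on.
STOP_WORDS = {"的", "和", "了", "很"}

def candidate_words(text, vocabulary, stop_words=STOP_WORDS):
    """Keep multi-character tokens that are not stop words."""
    return [t for t in segment_fmm(text, vocabulary)
            if len(t) > 1 and t not in stop_words]

# Hypothetical thesaurus table.
vocabulary = {"网易", "新闻", "百度", "百科", "百度百科"}
words = candidate_words("网易新闻和百度百科", vocabulary)
```

The words that survive the filter are the ones to look up in the internal-link list. Because the longest match wins, "百度百科" is kept as a single word rather than being split into "百度" and "百科".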

The above is only a preliminary idea; an actual implementation needs to take many more factors into account. I think the key point is the word segmentation.

Source: submitted by reader Shen Li


