Internal links: as the name implies, internal linking means adding links to relevant text in your site's content that point to related pages within the same site. A sensible internal link structure can improve search engine indexing and the site's weight. Compared with external links, internal links are just as important.
Traditional way
When we used to build article systems or news publishing systems, internal links (tags) were usually implemented in the following way:
Database: the Article table (articles) has the fields ID, title, body, Adddate, userid; the keyword table (the internal link list) has the fields ID, name, link.
When an article is published, loop over every entry in the internal link list and replace each match in the body of the article.
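As a rough illustration of this traditional loop, here is a minimal Python sketch. The function name `link_passively`, the sample keywords, and the choice to link only the first occurrence of each name are assumptions made for the example, not details from the original system.

```python
from typing import Iterable, Tuple


def link_passively(body: str, keywords: Iterable[Tuple[str, str]]) -> str:
    """Walk the whole keyword list and link the first occurrence of each name."""
    for name, link in keywords:
        # One scan of the body per row in the list, whether or not the name occurs.
        # Note: a plain replace can also match inside markup inserted earlier.
        body = body.replace(name, f'<a href="{link}">{name}</a>', 1)
    return body


# Hypothetical rows of the keyword table: (name, link).
keywords = [("Python", "/tag/python"), ("SEO", "/tag/seo"), ("Redis", "/tag/redis")]
print(link_passively("Python helps with SEO.", keywords))
```

The work grows with the number of rows in the keyword list, no matter how short the article is.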
This does achieve the desired functionality, but what if the internal link list holds a large amount of data, say 20,000, 50,000, or more entries? You can imagine how slow each publish or edit of an article becomes. So how do NetEase News, Baidu Encyclopedia, and other large websites do it? If they followed the approach above, the system would collapse within a few months.
Analysis and comparison
How many words does a normal article contain (excluding HTML code)? 10,000? I think a 10,000-word article is already very long, and few people would have the patience to read 10,000 words displayed on a single page. For the sake of page layout and user experience, editors usually split articles that are too long into several pages or chapters. If we could extract in advance the words in the article that might need links, and then look those words up in the database, would efficiency improve greatly? The answer is yes. Take a 10,000-word article and assume that every word in it needs a link: the number of iterations is at most 10,000, however large the internal link list grows, compared with 20,000, 50,000, or more in the example above. Much better, isn't it?
In the traditional way, the entire internal link list is traversed whether you want it or not. The new idea instead collects in advance all the words that might need links, and then uses each of those words to query the internal link list. Put side by side, the difference is clear.
The new idea is: extract from the article the words that need links, and then query the internal link list for them.
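The lookup step might be sketched as follows, using Python's built-in `sqlite3` module and the keyword table described earlier. The helper name `find_links`, the in-memory database, and the sample rows are assumptions for illustration; a real site would use its own database and may need to query long word lists in chunks.

```python
import sqlite3


def find_links(conn, words):
    """Return {name: link} for the candidate words found in the keyword table."""
    unique = sorted(set(words))
    if not unique:
        return {}
    # One parameter placeholder per candidate word.
    placeholders = ",".join("?" for _ in unique)
    rows = conn.execute(
        f"SELECT name, link FROM keyword WHERE name IN ({placeholders})", unique
    )
    return dict(rows.fetchall())


conn = sqlite3.connect(":memory:")
# UNIQUE on name gives the lookup an index to use.
conn.execute("CREATE TABLE keyword (id INTEGER PRIMARY KEY, name TEXT UNIQUE, link TEXT)")
conn.executemany(
    "INSERT INTO keyword (name, link) VALUES (?, ?)",
    [("Python", "/tag/python"), ("SEO", "/tag/seo"), ("Redis", "/tag/redis")],
)
print(find_links(conn, ["Python", "article", "SEO"]))
```

Only the words taken from the article are sent to the database, so the cost depends on the article rather than on the size of the table.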
A feasible implementation
For the moment, let us call the traditional way the passive approach and the new idea the active approach.
The active approach can be implemented as follows.
Using Chinese word segmentation, we can split a piece of text into words. Then, from the resulting vocabulary, filter out common possessive pronouns, adverbs, interjections, and so on. Keep nouns, brands, place names, trademarks, and the like, or segment according to your own thesaurus table. Then look up the remaining words in the internal link list: if a word exists there, add the link to it; if it does not, skip it.
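The steps above can be sketched roughly as below. This is only an illustration under stated assumptions: `simple_segment` is a stand-in that splits on word boundaries (a real Chinese segmenter would be swapped in), `STOP_WORDS` is a tiny hypothetical filter list, the internal link list is modeled as an in-memory dict, the body is assumed to be plain text without HTML, and each word is linked only on its first occurrence.

```python
import re
from typing import Callable, Dict, List

# Hypothetical filter list standing in for pronouns, adverbs, interjections, etc.
STOP_WORDS = {"the", "a", "is", "my", "very", "oh", "with", "and"}


def simple_segment(text: str) -> List[str]:
    """Stand-in segmenter: word and non-word runs, so "".join(tokens) == text."""
    return re.findall(r"\w+|\W+", text)


def link_actively(
    body: str,
    link_table: Dict[str, str],
    segment: Callable[[str], List[str]] = simple_segment,
) -> str:
    linked = set()  # words that already received a link
    out = []
    for token in segment(body):
        link = None
        # Filtered words and already-linked words are never looked up.
        if token.lower() not in STOP_WORDS and token not in linked:
            link = link_table.get(token)  # one lookup per remaining token
        if link is not None:
            out.append(f'<a href="{link}">{token}</a>')
            linked.add(token)
        else:
            out.append(token)  # not in the list: pass it through unchanged
    return "".join(out)


link_table = {"Python": "/tag/python", "SEO": "/tag/seo"}
print(link_actively("Oh, Python is very good with SEO and Python.", link_table))
```

Rebuilding the text from its tokens, instead of running string replacements on the body, avoids accidentally matching words inside links that were just inserted.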
The above is only a preliminary idea; an actual implementation needs to take many more factors into account. I think the key point is the word segmentation.
Source: submitted by reader Shen Li