Your own search engine (1): full-text index (1)

Source: Internet
Author: User
The first step in building a search engine is to build its own full-text index. A full-text index is an inverted table, that is, an index from words to documents. An ordinary article is stored the other way round: it contains many words, so the article is the key and its words are the content. When we need to find every article that contains a given word and there is no inverted index, we have to match that word against the content of every article, which is obviously intolerably slow. This is where the inverted index comes in. In an inverted index the word is the key, and the content is the list of all documents that contain the word, together with the positions where it occurs. In theory, then, building the index is a conversion from:
Doc => (word1, word2, ...)
to:
Word1 => (doc1, doc2, ...)
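
As a minimal sketch of this conversion, the Python snippet below turns a small set of documents into a mapping from each word to the documents and positions where it occurs. The names `tokenize` and `build_inverted_index`, the whitespace tokenizer, and the sample documents are assumptions made for illustration; they are not the code of this series.

```python
from collections import defaultdict

def tokenize(text):
    """Split text into lowercase words (a deliberately naive tokenizer)."""
    return text.lower().split()

def build_inverted_index(docs):
    """Map each word to {doc_id: [positions]} for a dict of doc_id -> text."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, word in enumerate(tokenize(text)):
            # Record both the document and the position inside it.
            index[word].setdefault(doc_id, []).append(position)
    return index

# Hypothetical sample documents.
docs = {
    "doc1": "the quick brown fox",
    "doc2": "the lazy dog",
}
index = build_inverted_index(docs)
print(index["the"])  # {'doc1': [0], 'doc2': [0]}
```

Looking up a word is now a single dictionary access instead of a scan over every document.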

The conversion raises a performance issue, because for each word in a document the first thing to do is to check whether that word has already appeared in the article. There are actually two similar operations here. The first converts the words of a single article into inverted form; the second merges the inverted entries produced by the first step into the overall index. I think it is better to keep the two operations separate: if every word occurrence in an article is merged directly into the large index, many unnecessary comparisons are made, which reduces performance.
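
One way to picture the two separate operations is the sketch below. It assumes plain Python dictionaries for both the per-document result and the overall index, and the function names `invert_document` and `merge_into` are hypothetical. The point is only that the large index is touched once per distinct word of a document rather than once per occurrence.

```python
def invert_document(text):
    """Step 1: collect the positions of each word within a single document."""
    local = {}
    for position, word in enumerate(text.lower().split()):
        local.setdefault(word, []).append(position)
    return local

def merge_into(global_index, doc_id, local):
    """Step 2: merge one document's entries into the overall index.

    The overall index is looked up once per distinct word.
    """
    for word, positions in local.items():
        global_index.setdefault(word, {})[doc_id] = positions

global_index = {}
merge_into(global_index, "doc1", invert_document("to be or not to be"))
merge_into(global_index, "doc2", invert_document("to do"))
print(global_index["to"])  # {'doc1': [0, 4], 'doc2': [0]}
```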
To speed up index creation, I want to use a balanced binary tree here, with every word stored as a node of the tree. Inserting a node then takes on the order of log(n) comparisons rather than the n comparisons of a linear scan, which is a large improvement. Tomorrow is the weekend, so I will implement the code then.
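
Until that implementation appears, here is a rough sketch of the idea. The text does not say which balanced tree is intended, so this example assumes an AVL tree; `Node`, `insert`, `find` and the sample documents are all hypothetical names chosen for illustration.

```python
class Node:
    def __init__(self, word):
        self.word = word
        self.postings = {}  # doc_id -> list of positions
        self.left = None
        self.right = None
        self.height = 1

def height(node):
    return node.height if node else 0

def update(node):
    node.height = 1 + max(height(node.left), height(node.right))

def rotate_right(y):
    x = y.left
    y.left = x.right
    x.right = y
    update(y)
    update(x)
    return x

def rotate_left(x):
    y = x.right
    x.right = y.left
    y.left = x
    update(x)
    update(y)
    return y

def rebalance(node):
    """Restore the AVL property at this node and return the subtree root."""
    update(node)
    balance = height(node.left) - height(node.right)
    if balance > 1:
        if height(node.left.left) < height(node.left.right):
            node.left = rotate_left(node.left)
        return rotate_right(node)
    if balance < -1:
        if height(node.right.right) < height(node.right.left):
            node.right = rotate_right(node.right)
        return rotate_left(node)
    return node

def insert(node, word, doc_id, position):
    """Record one occurrence of word and return the new subtree root."""
    if node is None:
        node = Node(word)
        node.postings[doc_id] = [position]
        return node
    if word < node.word:
        node.left = insert(node.left, word, doc_id, position)
    elif word > node.word:
        node.right = insert(node.right, word, doc_id, position)
    else:
        # Word already present: only extend its postings.
        node.postings.setdefault(doc_id, []).append(position)
        return node
    return rebalance(node)

def find(node, word):
    """Return the postings for word, or None if it is not in the tree."""
    while node is not None and node.word != word:
        node = node.left if word < node.word else node.right
    return node.postings if node else None

root = None
for doc_id, text in {"doc1": "a b c d e f g", "doc2": "c a"}.items():
    for position, word in enumerate(text.split()):
        root = insert(root, word, doc_id, position)

print(find(root, "c"))  # {'doc1': [2], 'doc2': [0]}
```

Even though the words of doc1 arrive in sorted order, which would degrade an unbalanced binary search tree into a linked list, the rotations keep the tree's height at 3 for these seven words, so each lookup or insert needs only a handful of comparisons.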

To be continued.
