One algorithm per week (1)---inverted index

Source: Internet
Author: User

I first came across the inverted index in Elasticsearch, whose indexes are built on it. ES actually uses Lucene underneath, and the inverted index is at the core of Lucene.

Online, the inverted index is described as the best way to implement the mapping from words to documents.

Why is it called an inverted index? I think the Chinese translation of the name is a poor one. (In fact, I feel that many programming terms are translated badly, which is a real obstacle for programmers trying to learn and understand them. But everyone uses these names, so you have to follow along. Sometimes, once you really understand a concept, you find its Chinese name is terrible. So my advice from experience: when you see a term, look up the English right away and read the English documentation.) The character 排 ("row") in the Chinese name is very misleading; I think "reverse index" would be a better translation.

That is because an inverted index means "use content to look up location" instead of the usual "use location to look up content": given a word, it tells you which documents contain it.
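The two directions can be compared in a small sketch. This is only an illustration: the toy corpus and the names `forward_index`, `inverted_index` and `search` are made up here and are not Lucene or ES APIs.

```python
from collections import defaultdict

# Hypothetical toy corpus: document id -> text.
docs = {
    1: "lucene builds an inverted index",
    2: "elasticsearch uses lucene",
}

# "Use location to look up content": given a document id, get its words.
forward_index = {doc_id: text.split() for doc_id, text in docs.items()}

# "Use content to look up location": given a word, get the documents
# that contain it.
inverted_index = defaultdict(set)
for doc_id, words in forward_index.items():
    for word in words:
        inverted_index[word].add(doc_id)


def search(word):
    """Return the sorted ids of documents that contain `word`."""
    return sorted(inverted_index.get(word, set()))


print(search("lucene"))
print(search("index"))
```

With the forward index, answering "which documents contain this word" would mean scanning every document; with the inverted index it is a single dictionary lookup.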

Next, my own understanding:

In Lucene, a document is first analyzed. For English, this means removing stop words, normalizing case, and reducing inflected forms to their base form (every word is reverted to its most basic form). ES also emphasizes this analysis step, and it lets users specify an analyzer (a specific language uses a specific analyzer).
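As a rough sketch of the three analysis steps for English, the code below lowercases the text, drops stop words, and strips a few suffixes. It is not how Lucene's analyzers are implemented: the stop-word list, the suffix rules, and the function names `naive_stem` and `analyze` are all assumptions made for illustration.

```python
import re

# Assumed stop-word list; real analyzers ship much longer ones.
STOP_WORDS = {"a", "an", "and", "the", "is", "are", "in", "of", "to"}

# Assumed suffix rules standing in for a real stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word):
    """Strip one common suffix, keeping at least three characters of stem."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def analyze(text):
    """Lowercase, tokenize, drop stop words, then stem each remaining token."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [naive_stem(token) for token in tokens if token not in STOP_WORDS]


print(analyze("The Dogs are Jumping in the Park"))
```

The suffix stripping here is deliberately crude and will mangle many words; it only shows where stemming sits in the pipeline.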

And then, the process of building the index:

The detailed process is written up in this blog post, which is very easy to follow.

http://www.cnblogs.com/fly1988happy/archive/2012/04/01/2429000.html

The basic idea is that for each word (that is, each term produced by the analysis step above), you record how many documents it appears in, how often it occurs in each one, and its positions within each document.
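The bookkeeping described above can be sketched with a nested dictionary. This is a minimal in-memory version, assuming the documents have already been analyzed into lists of terms; the function names are hypothetical.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to {doc_id: [positions]}.

    `docs` maps a document id to its list of already analyzed terms.
    """
    index = defaultdict(dict)
    for doc_id, terms in docs.items():
        for position, term in enumerate(terms):
            index[term].setdefault(doc_id, []).append(position)
    return index


def doc_count(index, term):
    """Number of documents the term appears in."""
    return len(index.get(term, {}))


def term_frequency(index, term, doc_id):
    """Number of times the term occurs in one document."""
    return len(index.get(term, {}).get(doc_id, []))


# Hypothetical analyzed documents.
docs = {
    1: ["lucene", "index", "lucene"],
    2: ["index", "search"],
}
index = build_index(docs)
print(doc_count(index, "index"), term_frequency(index, "lucene", 1))
print(index["lucene"][1])
```

Storing the position list makes the frequency redundant in this sketch (it is just the list length), but keeping the three quantities conceptually separate matches the description above.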

This produces a dictionary file (term dictionary), a frequency file (frequencies), and a position file (positions).

The dictionary file also records additional information: pointers from each keyword into the frequency file and the position file, and field information (which field the keyword belongs to).
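To make the role of the pointers concrete, here is a simplified sketch in which two flat lists stand in for the frequency and position files, and list offsets stand in for file pointers. It is not Lucene's actual on-disk format; the entry layout, the `field` default, and the function names are assumptions.

```python
def build_files(postings, field="content"):
    """Split term -> {doc_id: [positions]} into three flat structures.

    Returns (dictionary, frequencies, positions). All terms are assumed
    to come from a single field, named by the hypothetical `field` argument.
    """
    dictionary = {}
    frequencies = []  # flat list of (doc_id, term_frequency) pairs
    positions = []    # flat list of term positions
    for term in sorted(postings):
        dictionary[term] = {
            "field": field,
            "doc_count": len(postings[term]),
            "freq_ptr": len(frequencies),  # offset into `frequencies`
            "pos_ptr": len(positions),     # offset into `positions`
        }
        for doc_id in sorted(postings[term]):
            term_positions = postings[term][doc_id]
            frequencies.append((doc_id, len(term_positions)))
            positions.extend(term_positions)
    return dictionary, frequencies, positions


def lookup(term, dictionary, frequencies, positions):
    """Follow the pointers and rebuild {doc_id: [positions]} for one term."""
    entry = dictionary[term]
    start = entry["freq_ptr"]
    cursor = entry["pos_ptr"]
    result = {}
    for doc_id, freq in frequencies[start:start + entry["doc_count"]]:
        result[doc_id] = positions[cursor:cursor + freq]
        cursor += freq
    return result


# Hypothetical postings for two terms.
postings = {"lucene": {1: [0, 2]}, "index": {1: [1], 2: [0]}}
dictionary, frequencies, positions = build_files(postings)
print(lookup("lucene", dictionary, frequencies, positions))
```

The point of the split is that a search only has to load the small dictionary to find a term, then jump straight to the relevant slices of the two larger files.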
