Inverted Index Technology

Source: Internet
Author: User
Inverted index technology is mainly used for full-text search. By analyzing the non-primary-key content of each record, it establishes a mapping from the keywords in the content back to the primary key (PK). The resulting index file is stored and made available to applications for efficient retrieval.
To build a custom inverted index, the algorithm should perform the following steps:

1. Information analysis: perform lexical and syntactic analysis on the content K of each input data item or record (P, K).
This mainly involves word segmentation, filtering, and similar operations on the content, yielding a set of keywords (K1, K2, K3, ...).
2. Establish the inverted relationship: unlike a general index, which goes from the primary key to the content, an inverted index uses the keywords in the content to establish a relationship back to the primary key P.
In this way, the forward relation (P, K) is split into a set of inverted relationships (K1, P), (K2, P), (K3, P), ...
3. Merge the relationships: once these relationships are established, treat each keyword (K1, K2, ...) as a primary key and group the primary keys that share it, building an index from each K to multiple P: (K1, (P1, P2, P3, ...)), (K2, (P1, P2, P3, ...)), ...
4. Index storage: store the index sets of all keywords K in a unified data structure.
5. Optimization mechanism: the set of keywords K is generally large, so the index is large as well, and an optimization mechanism such as compression is usually needed. For an example, see the implementation in Lucene, which compresses keywords into <prefix length, suffix> pairs.

6. [Optional] Application interface: once the algorithm above is implemented, provide an interface (API) for applications to use.
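
The steps above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not a production design: the stop-word list, the regular-expression tokenizer, the in-memory dictionary used as the "unified data structure", and the function names analyze, build_index, and search are all hypothetical choices made for this example.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list, used only for this illustration.
STOP_WORDS = {"a", "an", "the", "is", "of", "and", "to", "in"}


def analyze(text):
    """Step 1: segment text into lowercase keywords and filter stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {t for t in tokens if t not in STOP_WORDS}


def build_index(records):
    """Steps 2-4: turn (P, K) records into a mapping K -> sorted list of P."""
    postings = defaultdict(set)
    for pk, content in records.items():
        for keyword in analyze(content):
            postings[keyword].add(pk)  # one inverted relation (K, P)
    # Step 3: group all P that share a keyword; step 4: keep them in one dict.
    return {k: sorted(pks) for k, pks in postings.items()}


def search(index, query):
    """Step 6: return the primary keys of records containing every query keyword."""
    keywords = analyze(query)
    if not keywords:
        return []
    result = None
    for k in keywords:
        pks = set(index.get(k, ()))
        result = pks if result is None else result & pks
    return sorted(result)


# Sample records: primary key P -> content K (made-up data).
records = {
    1: "The inverted index is used for full-text search",
    2: "A general index maps the primary key to content",
    3: "Full-text search engines compress the index",
}
index = build_index(records)
print(index["index"])                     # [1, 2, 3]
print(search(index, "full-text search"))  # [1, 3]
```

Keeping each posting list sorted is a common choice because it makes intersections and later compression easier; a real system would also persist the structure to disk rather than hold it in a dictionary.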
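
The <prefix length, suffix> compression mentioned in step 5 can be illustrated with front coding over a sorted keyword list. The sketch below shows the general technique only and is not Lucene's actual file format; the function names front_encode and front_decode are assumptions introduced for this example.

```python
def front_encode(sorted_terms):
    """Encode each term as (length of prefix shared with previous term, suffix)."""
    encoded = []
    prev = ""
    for term in sorted_terms:
        n = 0
        limit = min(len(prev), len(term))
        while n < limit and prev[n] == term[n]:
            n += 1
        encoded.append((n, term[n:]))
        prev = term
    return encoded


def front_decode(encoded):
    """Rebuild the original terms from (prefix length, suffix) pairs."""
    terms = []
    prev = ""
    for prefix_len, suffix in encoded:
        term = prev[:prefix_len] + suffix
        terms.append(term)
        prev = term
    return terms


terms = sorted(["index", "indexes", "indexing", "inverted"])
encoded = front_encode(terms)
print(encoded)  # [(0, 'index'), (5, 'es'), (5, 'ing'), (2, 'verted')]
print(front_decode(encoded) == terms)  # True
```

The scheme pays off because neighbouring keywords in sorted order tend to share long prefixes, so only the differing suffix needs to be stored.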
