Dictionaries and Tolerant Retrieval in Information Retrieval

Source: Internet
Author: User
Tags: data structures, mysql, index
1. Main content: spelling-correction techniques that keep retrieval robust when a query contains spelling errors or a term has several accepted spellings. Topics:
① data structures that support fast dictionary lookup;
② wildcard queries;
③ misspelled queries and automatic spelling correction (isolated-term correction of individual words, and correction of the query string as a whole).

2. Data structures for dictionary search: the first step is to determine whether each query term is in the vocabulary. Background reading: (1) "Data structures and algorithms behind MySQL indexes"; (2) "From B-trees, B+ trees and B* trees to R-trees"; (3) "An in-depth study of B-tree indexes". There are two main approaches:
① hashing;
② search trees. Inserting and deleting terms can unbalance a search tree, so one approach lets the number of subtrees under an internal node vary within a fixed interval. The B-tree is such a tree: it can be viewed as "flattening" several levels of a binary tree into one. This is especially effective when memory cannot hold the whole dictionary and part of it must stay on disk, because flattening lets the upcoming comparisons be prefetched when a node is read into memory.

3. Wildcard queries: the k-gram index.

4. Spelling correction. See:
① English: "How to Write a Spelling Corrector";
② Chinese translation: "How to Write a Spelling Checker";
③ "Bayesian Inference and Its Internet Applications (III): Spell Checking";
④ "Spelling Correction".

5. Phonetic correction (the Soundex algorithm). Basic idea: apply a phonetic hash to each term so that terms with similar pronunciations map to the same value.
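
The search-tree approach for dictionary lookup can be sketched with a minimal in-memory B-tree supporting insert and lookup only (no deletion). This is an illustrative sketch, not a production index: the names `Node`, `BTree`, `search` and `insert` are hypothetical, nodes live in memory rather than on disk pages, and the minimum degree `t` is an assumed tunable parameter. Every non-root node holds between t-1 and 2t-1 keys, so an internal node has between t and 2t subtrees, which is the "fixed interval" mentioned above. Hashing, by comparison, corresponds to Python's built-in `set`: fast exact lookup, but no ordering.

```python
import bisect


class Node:
    def __init__(self, leaf=True):
        self.keys = []      # sorted terms stored in this node
        self.children = []  # len(keys) + 1 subtrees when the node is internal
        self.leaf = leaf


class BTree:
    """Minimal B-tree of minimum degree t (insert and lookup only)."""

    def __init__(self, t=2):
        if t < 2:
            raise ValueError("minimum degree t must be at least 2")
        self.t = t
        self.root = Node()

    def search(self, key):
        node = self.root
        while True:
            # One binary search inside a node stands in for several
            # levels of binary-tree comparisons ("flattening").
            i = bisect.bisect_left(node.keys, key)
            if i < len(node.keys) and node.keys[i] == key:
                return True
            if node.leaf:
                return False
            node = node.children[i]

    def insert(self, key):
        if self.search(key):
            return  # each term is stored once
        if len(self.root.keys) == 2 * self.t - 1:
            # Full root: split it so the tree grows by one level at the top.
            new_root = Node(leaf=False)
            new_root.children.append(self.root)
            self._split_child(new_root, 0)
            self.root = new_root
        node = self.root
        while not node.leaf:
            i = bisect.bisect_left(node.keys, key)
            if len(node.children[i].keys) == 2 * self.t - 1:
                # Split full children on the way down so the leaf has room.
                self._split_child(node, i)
                if key > node.keys[i]:
                    i += 1
            node = node.children[i]
        bisect.insort(node.keys, key)

    def _split_child(self, parent, i):
        t = self.t
        full = parent.children[i]
        right = Node(leaf=full.leaf)
        median = full.keys[t - 1]
        right.keys = full.keys[t:]
        full.keys = full.keys[:t - 1]
        if not full.leaf:
            right.children = full.children[t:]
            full.children = full.children[:t]
        # The median moves up: the parent gains one key and one subtree.
        parent.keys.insert(i, median)
        parent.children.insert(i + 1, right)


vocab = BTree(t=2)
for term in ["brutus", "caesar", "calpurnia", "noble", "ambitious", "hath", "told"]:
    vocab.insert(term)
print(vocab.search("caesar"), vocab.search("cesar"))  # True False
```

Because keys stay sorted, the same structure can also enumerate all terms sharing a prefix, which a hash table cannot do; on disk, each node would typically be sized to one page so that a single read brings in a whole group of upcoming comparisons.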
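
A wildcard lookup through a k-gram index might look like the sketch below, assuming bigrams (k = 2), `$` as the word-boundary marker, and `*` as the only wildcard character. The query is padded with `$`, split on `*`, and only the k-grams that do not span a `*` are intersected. The intersection can admit false positives, so candidates are post-filtered against the original pattern. `KGramIndex` and `wildcard` are hypothetical names; `fnmatch.fnmatchcase` from the standard library does the final pattern check.

```python
import fnmatch
from collections import defaultdict


def kgrams(text, k):
    """All contiguous substrings of length k (empty set if text is shorter)."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


class KGramIndex:
    def __init__(self, vocabulary, k=2):
        self.k = k
        self.vocabulary = set(vocabulary)
        self.postings = defaultdict(set)  # k-gram -> terms containing it
        for term in self.vocabulary:
            for gram in kgrams(f"${term}$", k):  # '$' marks word start and end
                self.postings[gram].add(term)

    def wildcard(self, pattern):
        # Collect k-grams from each '*'-free fragment of the padded query.
        grams = set()
        for piece in f"${pattern}$".split("*"):
            grams |= kgrams(piece, self.k)
        if grams:
            candidates = set.intersection(
                *(self.postings.get(g, set()) for g in grams))
        else:
            candidates = set(self.vocabulary)  # e.g. the bare query '*'
        # Post-filter: keep only terms that really match the pattern.
        return sorted(t for t in candidates if fnmatch.fnmatchcase(t, pattern))


index = KGramIndex(["red", "reduce", "retired", "remove", "revive", "relative"])
print(index.wildcard("red*"))   # ['red', 'reduce']
print(index.wildcard("re*ve"))  # ['relative', 'remove', 'revive']
```

Here "retired" contains every bigram of `red*` ($r, re, ed) yet does not start with "red", which is why the post-filtering step is needed.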
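
Isolated-term spelling correction can be sketched as follows: compute the Levenshtein edit distance from the query term to each vocabulary term, keep terms within an assumed `max_distance` of 2, and prefer the closest, breaking ties by corpus frequency as a crude stand-in for the prior probability used in the Bayesian formulation. This is a simplification: Norvig's corrector generates candidate strings by applying edits rather than scanning the whole vocabulary, and a full scan only suits small dictionaries. The function names and the toy frequency table are hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def correct(word, counts, max_distance=2):
    """Return the closest known term, ties broken by higher frequency."""
    if word in counts:
        return word
    best, best_key = None, None
    for term, freq in counts.items():
        d = levenshtein(word, term)
        if d <= max_distance:
            key = (d, -freq, term)  # smaller distance first, then more frequent
            if best_key is None or key < best_key:
                best, best_key = term, key
    return best if best is not None else word  # no candidate: leave unchanged


# Hypothetical toy frequencies, for illustration only.
counts = {"spelling": 50, "spewing": 5, "selling": 30, "corrector": 10}
print(correct("speling", counts))  # spelling
```

Both "spelling" and "spewing" are one edit away from "speling"; the frequency tie-break picks "spelling". Correcting the query string as a whole would additionally need context, such as how often candidate terms co-occur.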
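
The phonetic hash behind Soundex can be illustrated with a simplified variant: keep the first letter, map the remaining letters to digit classes (vowels plus H, W, Y to 0), collapse runs of identical digits, drop the zeros, and pad or truncate to four characters. This is a sketch of a common textbook simplification, not the official American Soundex, which adds rules for H and W and for a first letter whose code matches the next letter; for example, this version gives "Ashcraft" the code A226, whereas official Soundex gives A261. The names `CODES` and `soundex` are hypothetical.

```python
# Digit class for every letter A-Z; '0' marks vowels and H, W, Y.
CODES = {}
for letters, digit in (("AEIOUHWY", "0"), ("BFPV", "1"), ("CGJKQSXZ", "2"),
                       ("DT", "3"), ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(term):
    """Simplified Soundex code: first letter plus three digits."""
    letters = [c for c in term.upper() if c in CODES]  # ignore non A-Z chars
    if not letters:
        return ""
    digits = [CODES[c] for c in letters[1:]]
    collapsed = []
    for d in digits:  # keep one digit from each run of identical digits
        if not collapsed or collapsed[-1] != d:
            collapsed.append(d)
    code = "".join(d for d in collapsed if d != "0")  # then remove zeros
    return (letters[0] + code + "000")[:4]


print(soundex("Hermann"), soundex("Herman"))  # H655 H655
```

At query time, the same hash is applied to the query term, and every vocabulary term with the same code is returned as a candidate.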
