Sinsing's analysis of full-text indexing in PostgreSQL (Part 1)

Source: Internet
Author: User
Tags: postgresql

Full-text search provides the ability to identify natural-language documents that satisfy a query and, optionally, to sort them by relevance to that query. The most common type of search finds all records that contain the given query terms and returns them in order of their similarity to the query.
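As a rough illustration of matching plus relevance ordering, the sketch below runs a query against three made-up sentences. The sample rows and the column aliases are assumptions for this example; `to_tsvector`, `to_tsquery`, `ts_rank`, and the `@@` match operator are PostgreSQL built-ins, and the built-in `english` configuration is assumed to be available.

```sql
-- Hypothetical sample rows; only the built-in 'english' configuration is assumed.
SELECT doc,
       ts_rank(to_tsvector('english', doc), query) AS rank
FROM (VALUES
        ('The cat chased a rat'),
        ('Rats, rats and more rats near the cat'),
        ('A dog slept all day')
     ) AS t(doc),
     to_tsquery('english', 'cat & rat') AS query
WHERE to_tsvector('english', doc) @@ query   -- keep only matching documents
ORDER BY rank DESC;                          -- most relevant first
```

Only the rows that contain both query terms should be returned; the row about the dog is filtered out by the `@@` condition.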

The ~, ~*, LIKE, and ILIKE operators lack many features needed for this: ① They offer no linguistic support; for example, they cannot recognize the plural form of a word. ② They provide no effective way to rank or order results. ③ They tend to be slow, usually because they cannot use an index efficiently.
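The first limitation can be seen with a small comparison. This is a minimal sketch with a made-up sentence, again assuming the built-in `english` configuration; the aliases `like_match` and `fts_match` are only illustrative names.

```sql
-- Substring matching: 'satisfies' does not contain the string 'satisfy',
-- so LIKE reports no match.
SELECT 'This document satisfies the query' LIKE '%satisfy%' AS like_match;

-- Full-text matching: both word forms are normalized before comparison,
-- so under the 'english' configuration this is expected to match.
SELECT to_tsvector('english', 'This document satisfies the query')
       @@ to_tsquery('english', 'satisfy') AS fts_match;
```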

A full-text index allows documents to be preprocessed and an index to be saved for fast searching later. The main preprocessing tasks are: ① Parsing documents into tokens. The parser identifies different classes of tokens, such as numbers, compound words, and e-mail addresses, so that each class can be processed differently. PostgreSQL uses a parser to perform this step. ② Converting tokens into lexemes. A lexeme is a string, just like a token, but it has been normalized so that different forms of the same word become identical. For example, normalization typically removes suffixes. PostgreSQL uses dictionaries to perform this step; it provides a variety of standard dictionaries, and custom dictionaries can be created for specific needs. ③ Storing the preprocessed documents in a form optimized for searching. Typically, each document is represented as a sorted array of normalized lexemes.
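The three preprocessing steps can be observed directly from SQL. The sketch below uses the built-in `ts_debug` and `to_tsvector` functions with the `english` configuration; the first input sentence and its e-mail address are made up for illustration, and the second statement is the example used in the PostgreSQL documentation.

```sql
-- Step ①: the parser splits the text into tokens and assigns each a class
-- (shown in the "alias" column). The "lexemes" column shows the result of
-- step ② for each token; stop words yield an empty array.
SELECT alias, token, lexemes
FROM ts_debug('english', 'Mail bob@example.com about 42 fat rats');

-- Steps ② and ③: to_tsvector normalizes the tokens and returns a sorted
-- list of normalized words (lexemes) with their positions in the document.
SELECT to_tsvector('english', 'a fat  cat sat on a mat - it ate a fat rats');
-- Result given in the PostgreSQL documentation:
-- 'ate':9 'cat':3 'fat':2,11 'mat':7 'rat':12 'sat':4
```

Note that the stop words "a", "on", and "it" do not appear in the result, and "rats" has been reduced to "rat".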

Dictionaries allow fine-grained control over how tokens are normalized. With appropriate dictionaries, we can: ① Define stop words that should not be indexed. ② Map synonyms to a single word. ③ Map a phrase to a single word using a thesaurus dictionary. ④ Map different forms of a word to a canonical form using a dictionary such as Ispell. ⑤ Map different forms of a word to a canonical form using stemming rules such as those of the Snowball stemmers.
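A dictionary can be tested on its own with the built-in `ts_lexize` function. The sketch below follows the examples in the PostgreSQL documentation; the dictionary name `public.simple_dict` is just an illustrative name, and the `english` stop-word file is assumed to be installed with the server (it ships with standard PostgreSQL builds).

```sql
-- The built-in Snowball stemmer for English reduces a word to its stem.
SELECT ts_lexize('english_stem', 'stars');   -- {star}

-- A stop word is recognized but produces no lexeme, so it is not indexed.
SELECT ts_lexize('english_stem', 'a');       -- {}

-- A custom dictionary based on the "simple" template: it lower-cases
-- words and drops anything listed in the english stop-word file.
CREATE TEXT SEARCH DICTIONARY public.simple_dict (
    TEMPLATE = pg_catalog.simple,
    STOPWORDS = english
);

SELECT ts_lexize('public.simple_dict', 'YeS');   -- {yes}
SELECT ts_lexize('public.simple_dict', 'The');   -- {}
```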

A document is the unit of searching in a full-text search system; it can be an article or an e-mail message, for example. The text search engine must be able to parse documents and store the associations between lexemes (keywords) and the documents that contain them. These associations are later used to find documents that contain the query terms.

In PostgreSQL, a document is usually a text field in a row of a database table, or possibly a combination (concatenation) of such fields, which may be stored in several tables or obtained dynamically. In other words, a document can be assembled from different parts for indexing, and it does not have to be stored anywhere as a whole.
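A minimal sketch of a document built from several fields follows. The `messages` table, its columns, its rows, and the index name are hypothetical and exist only for this example; `coalesce` guards against NULL fields, which would otherwise make the whole concatenation NULL.

```sql
-- Hypothetical table: the "document" is title plus body, never stored as one value.
CREATE TEMP TABLE messages (
    mid    integer PRIMARY KEY,
    title  text,
    body   text
);

INSERT INTO messages VALUES
    (1, 'Index tuning', 'Notes on searching large tables'),
    (2, 'Holiday plans', NULL);

-- Expression index over the concatenated fields, so the preprocessed
-- document can be searched without reparsing every row.
CREATE INDEX messages_fts_idx ON messages
    USING GIN (to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, '')));

-- The query must repeat the same expression for the index to be usable.
SELECT mid, title
FROM messages
WHERE to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, ''))
      @@ to_tsquery('english', 'index & search');
```

Here only message 1 should match: "Index" comes from its title and "searching" from its body, which shows that the two fields are searched as one document. On a table this small the planner will likely scan sequentially rather than use the index; the index matters for larger tables.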

Copyright notice: This is an original article by the blog author and may not be reproduced without the author's permission.
