Using PostgreSQL to achieve millisecond full-text indexing

Source: Internet
Author: User
Tags apache solr postgresql

Lateral is a content referral service provider whose simulations use PostgreSQL to store documents. Each document contains a text column and a JSON column that stores metadata such as the title, date, and URL. They want to create a quick search function for the simulation program, search the full document and title, and generate recommendations. Recently, lateral chief technology officer Max wrote about their approach.

To achieve this, you can choose an open source solution Apache SOLR or Elasticsearch, or choose a managed solution elastic or algolia, but for the following reasons, they chose the full-text search feature of PostgreSQL:

    • No additional software or libraries to install
    • Can reuse the database interfaces they use in the application
    • No need to configure additional servers
    • No increase in costs
    • Data can be stored in a controlled location
    • No need to synchronize data between different data sources

Although the accuracy of PostgreSQL search and the speed of large-scale queries are flawed, Max believes that it can meet their application scenarios. Here's what they're doing:

  • Create a column, TSV, to store tsvector values;
  • Create an index on the newly created column and populate the column with the following statement:
    UPDATE data_rows SET tsv=setweight (To_tsvector (COALESCE (meta->> ' title ', ') '), ' A ') | | Setweight (To_tsvector (Coalesce (text, ')), ' D ');

    Note here that the weight of the JSON column is A,text, and the weight of the column is D;
  • Create a TSV column update function, create a trigger on a table, and perform a TSV column update function when updating and adding rows.
  • When everything is ready, replace "your query" in the following code and execute:
    Select ID, meta->> ' title ' as title, meta from (  SELECT ID, Meta, TSV  from Data_rows, plainto_tsquery (' Your query ') As Q  WHERE (TSV @@ Q)) as T1 ORDER by TS_RANK_CD (T1.TSV, plainto_tsquery (' Your query ')) DESC LIMIT 5;

    After testing, the query can be completed in about 50 milliseconds. If the full document is returned, it will increase by approximately 350 milliseconds, which is more likely to be affected by Network load. If only 200 characters in the document are returned, only about 100 milliseconds are added.

  • Using PostgreSQL to achieve millisecond full-text indexing

    Related Article

    Contact Us

    The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

    If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

    A Free Trial That Lets You Build Big!

    Start building with 50+ products and up to 12 months usage for Elastic Compute Service

    • Sales Support

      1 on 1 presale consultation

    • After-Sales Support

      24/7 Technical Support 6 Free Tickets per Quarter Faster Response

    • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.