Lateral is a content referral service provider whose simulations use PostgreSQL to store documents. Each document contains a text column and a JSON column that stores metadata such as the title, date, and URL. They want to create a quick search function for the simulation program, search the full document and title, and generate recommendations. Recently, lateral chief technology officer Max wrote about their approach.
To achieve this, you can choose an open source solution Apache SOLR or Elasticsearch, or choose a managed solution elastic or algolia, but for the following reasons, they chose the full-text search feature of PostgreSQL:
- No additional software or libraries to install
- Can reuse the database interfaces they use in the application
- No need to configure additional servers
- No increase in costs
- Data can be stored in a controlled location
- No need to synchronize data between different data sources
Although the accuracy of PostgreSQL search and the speed of large-scale queries are flawed, Max believes that it can meet their application scenarios. Here's what they're doing:
Create a column, TSV, to store tsvector values;Create an index on the newly created column and populate the column with the following statement:UPDATE data_rows SET tsv=setweight (To_tsvector (COALESCE (meta->> ' title ', ') '), ' A ') | | Setweight (To_tsvector (Coalesce (text, ')), ' D '); Note here that the weight of the JSON column is A,text, and the weight of the column is D;Create a TSV column update function, create a trigger on a table, and perform a TSV column update function when updating and adding rows.When everything is ready, replace "your query" in the following code and execute:Select ID, meta->> ' title ' as title, meta from ( SELECT ID, Meta, TSV from Data_rows, plainto_tsquery (' Your query ') As Q WHERE (TSV @@ Q)) as T1 ORDER by TS_RANK_CD (T1.TSV, plainto_tsquery (' Your query ')) DESC LIMIT 5; After testing, the query can be completed in about 50 milliseconds. If the full document is returned, it will increase by approximately 350 milliseconds, which is more likely to be affected by Network load. If only 200 characters in the document are returned, only about 100 milliseconds are added. |
Using PostgreSQL to achieve millisecond full-text indexing