Describes the jsonb data type in PostgreSQL, postgresqljsonb

Source: Internet
Author: User

Describes the jsonb data type in PostgreSQL, postgresqljsonb

PostgreSQL 9.4 is loading a new feature called jsonb, which is a new type of data that can store JSON data that supports GIN indexes. In other words, the most important feature in the coming update is that, if this feature is not important, place ipvs in the recommended position of the file-based database system.

Since January 1, 9.2, an integrated JSON data type already exists, with a complete set of functions (such as data generation and data deconstruct), and 9.3 new operators. When the JSON data type is used, the data is stored as a completely identical copy, and the function still works on it. In addition, the background operation is required for re-analysis.

The JSONB data type is stored in degraded 2 yuan format. Therefore, inserting this data is more efficient than JSON data, because the background does not need to be re-analyzed, so it can run more quickly, GIN indexes are also considered. For this reason, we recommend that you use jsonb instead of json to create a program (you can also use json as needed ). Remember that jsonb uses the same operator and function. Readers can read my previous post to inspire you (or simply read the Postgres file ).
 

Now let's take a look at how JSONB works and compare it with JSON. The tested data is 8.6 million geobase type data, which is about 1 GB in size, including many fields such as city name and country code (see the complete list here. First, store the data to a new table in the database through raw copy, and then convert the table to JSON/JSONB through a set of tables with a fill factor of 100, then we can see how much space each occupies.
 

=# COPY geodata FROM '$HOME/Downloads/allCountries.txt';COPY 8647839=# CREATE TABLE geodata_jsonb (data jsonb) with (fillfactor=100);CREATE TABLE=# CREATE TABLE geodata_json (data json) with (fillfactor=100);CREATE TABLE=# \timingTiming is on.=# INSERT INTO geodata_json SELECT row_to_json(geodata) FROM geodata;INSERT 0 8647839Time: 287158.457 ms=# INSERT INTO geodata_jsonb SELECT row_to_json(geodata)::jsonb FROM geodata;INSERT 0 8647839Time: 425825.967 ms

It takes a little longer to generate JSONB data. Is the size different?
 

=# SELECT pg_size_pretty(pg_relation_size('geodata_json'::regclass)) AS json,     pg_size_pretty(pg_relation_size('geodata_jsonb'::regclass)) AS jsonb; json  | jsonb ---------+--------- 3274 MB | 3816 MB(1 row)

Indexing JSON data starts from version 9.3, for example, using an operator (note that because it returns text, '->' is used, and varies depending on the query, different keywords are used for indexing)
 

=# CREATE INDEX geodata_index ON  geodata_json ((data->>'country_code'), (data->>'asciiname'));CREATE INDEX=# SELECT pg_size_pretty(pg_relation_size('geodata_index'::regclass))  AS json_index; json_index ------------ 310 MB(1 row)=# SELECT (data->>'population')::int as population,     data->'latitude' as latitude,     data->'longitude' as longitude  FROM geodata_json WHERE data->>'country_code' = 'JP' AND    data->>'asciiname' = 'Tokyo' AND    (data->>'population')::int != 0; population | latitude | longitude ------------+----------+-----------  8336599 | 35.6895 | 139.69171(1 row)=# -- Explain of previous query                            QUERY PLAN                            ------------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on geodata_json (cost=6.78..865.24 rows=215 width=32)  Recheck Cond: (((data ->> 'country_code'::text) = 'JP'::text) AND ((data ->> 'asciiname'::text) = 'Tokyo'::text))  Filter: (((data ->> 'population'::text))::integer <> 0)  -> Bitmap Index Scan on geodata_index (cost=0.00..6.72 rows=216 width=0)     Index Cond: (((data ->> 'country_code'::text) = 'JP'::text) AND ((data ->> 'asciiname'::text) = 'Tokyo'::text)) Planning time: 0.172 ms(6 rows)

In this example, planner can use bitmap index scanning and use the previously generated index.


Now, a new feature of JSONB is to check the data capacity containing the operator @>. Such data can be indexed using GIN. Such operator data also includes ?,? | And? & (To check whether a given keyword exists ). GIN indexes work for two types of operators:

Default operator class, the four listed earlier;

Jsonb_hash_ops, which only supports @>. However, it performs well when searching data and occupies less disk space;

Here is how it works:
 

=# CREATE INDEX geodata_gin ON geodata_jsonb   USING GIN (data jsonb_hash_ops);CREATE INDEX=# SELECT (data->>'population')::int as population,   data->'latitude' as latitude,   data->'longitude' as longitude  FROM geodata_jsonb WHERE data @> '{"country_code": "JP", "asciiname": "Tokyo"}' AND    (data->>'population')::int != 0; population | latitude | longitude ------------+----------+-----------  8336599 | 35.6895 | 139.69171(1 row) =# SELECT pg_size_pretty(pg_relation_size('geodata_gin'::regclass)) AS jsonb_gin; jsonb_gin----------- 1519 MB(1 row)=# -- EXPLAIN of previous query                   QUERY PLAN                   ------------------------------------------------------------------------------------- Bitmap Heap Scan on geodata_jsonb (cost=131.01..31317.76 rows=8605 width=418)  Recheck Cond: (data @> '{"asciiname": "Tokyo", "country_code": "JP"}'::jsonb)  Filter: (((data ->> 'population'::text))::integer <> 0)  -> Bitmap Index Scan on geodata_gin (cost=0.00..128.86 rows=8648 width=0)     Index Cond: (data @> '{"asciiname": "Tokyo", "country_code": "JP"}'::jsonb) Planning time: 0.134 ms

Based on application requirements, you may want to use low-space indexes, such as the index type of B-tree built on JSON data. GIN indexes have more advantages, because it overwrites all JSON fields and checks the capacity;

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.