Describes the jsonb data type in PostgreSQL, postgresqljsonb
PostgreSQL 9.4 is loading a new feature called jsonb, which is a new type of data that can store JSON data that supports GIN indexes. In other words, the most important feature in the coming update is that, if this feature is not important, place ipvs in the recommended position of the file-based database system.
Since January 1, 9.2, an integrated JSON data type already exists, with a complete set of functions (such as data generation and data deconstruct), and 9.3 new operators. When the JSON data type is used, the data is stored as a completely identical copy, and the function still works on it. In addition, the background operation is required for re-analysis.
The JSONB data type is stored in degraded 2 yuan format. Therefore, inserting this data is more efficient than JSON data, because the background does not need to be re-analyzed, so it can run more quickly, GIN indexes are also considered. For this reason, we recommend that you use jsonb instead of json to create a program (you can also use json as needed ). Remember that jsonb uses the same operator and function. Readers can read my previous post to inspire you (or simply read the Postgres file ).
Now let's take a look at how JSONB works and compare it with JSON. The tested data is 8.6 million geobase type data, which is about 1 GB in size, including many fields such as city name and country code (see the complete list here. First, store the data to a new table in the database through raw copy, and then convert the table to JSON/JSONB through a set of tables with a fill factor of 100, then we can see how much space each occupies.
=# COPY geodata FROM '$HOME/Downloads/allCountries.txt';COPY 8647839=# CREATE TABLE geodata_jsonb (data jsonb) with (fillfactor=100);CREATE TABLE=# CREATE TABLE geodata_json (data json) with (fillfactor=100);CREATE TABLE=# \timingTiming is on.=# INSERT INTO geodata_json SELECT row_to_json(geodata) FROM geodata;INSERT 0 8647839Time: 287158.457 ms=# INSERT INTO geodata_jsonb SELECT row_to_json(geodata)::jsonb FROM geodata;INSERT 0 8647839Time: 425825.967 ms
It takes a little longer to generate JSONB data. Is the size different?
=# SELECT pg_size_pretty(pg_relation_size('geodata_json'::regclass)) AS json, pg_size_pretty(pg_relation_size('geodata_jsonb'::regclass)) AS jsonb; json | jsonb ---------+--------- 3274 MB | 3816 MB(1 row)
Indexing JSON data starts from version 9.3, for example, using an operator (note that because it returns text, '->' is used, and varies depending on the query, different keywords are used for indexing)
=# CREATE INDEX geodata_index ON geodata_json ((data->>'country_code'), (data->>'asciiname'));CREATE INDEX=# SELECT pg_size_pretty(pg_relation_size('geodata_index'::regclass)) AS json_index; json_index ------------ 310 MB(1 row)=# SELECT (data->>'population')::int as population, data->'latitude' as latitude, data->'longitude' as longitude FROM geodata_json WHERE data->>'country_code' = 'JP' AND data->>'asciiname' = 'Tokyo' AND (data->>'population')::int != 0; population | latitude | longitude ------------+----------+----------- 8336599 | 35.6895 | 139.69171(1 row)=# -- Explain of previous query QUERY PLAN ------------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on geodata_json (cost=6.78..865.24 rows=215 width=32) Recheck Cond: (((data ->> 'country_code'::text) = 'JP'::text) AND ((data ->> 'asciiname'::text) = 'Tokyo'::text)) Filter: (((data ->> 'population'::text))::integer <> 0) -> Bitmap Index Scan on geodata_index (cost=0.00..6.72 rows=216 width=0) Index Cond: (((data ->> 'country_code'::text) = 'JP'::text) AND ((data ->> 'asciiname'::text) = 'Tokyo'::text)) Planning time: 0.172 ms(6 rows)
In this example, planner can use bitmap index scanning and use the previously generated index.
Now, a new feature of JSONB is to check the data capacity containing the operator @>. Such data can be indexed using GIN. Such operator data also includes ?,? | And? & (To check whether a given keyword exists ). GIN indexes work for two types of operators:
Default operator class, the four listed earlier;
Jsonb_hash_ops, which only supports @>. However, it performs well when searching data and occupies less disk space;
Here is how it works:
=# CREATE INDEX geodata_gin ON geodata_jsonb USING GIN (data jsonb_hash_ops);CREATE INDEX=# SELECT (data->>'population')::int as population, data->'latitude' as latitude, data->'longitude' as longitude FROM geodata_jsonb WHERE data @> '{"country_code": "JP", "asciiname": "Tokyo"}' AND (data->>'population')::int != 0; population | latitude | longitude ------------+----------+----------- 8336599 | 35.6895 | 139.69171(1 row) =# SELECT pg_size_pretty(pg_relation_size('geodata_gin'::regclass)) AS jsonb_gin; jsonb_gin----------- 1519 MB(1 row)=# -- EXPLAIN of previous query QUERY PLAN ------------------------------------------------------------------------------------- Bitmap Heap Scan on geodata_jsonb (cost=131.01..31317.76 rows=8605 width=418) Recheck Cond: (data @> '{"asciiname": "Tokyo", "country_code": "JP"}'::jsonb) Filter: (((data ->> 'population'::text))::integer <> 0) -> Bitmap Index Scan on geodata_gin (cost=0.00..128.86 rows=8648 width=0) Index Cond: (data @> '{"asciiname": "Tokyo", "country_code": "JP"}'::jsonb) Planning time: 0.134 ms
Based on application requirements, you may want to use low-space indexes, such as the index type of B-tree built on JSON data. GIN indexes have more advantages, because it overwrites all JSON fields and checks the capacity;