Pg_shard is a postgresql sharding extension. Can be used for shards, replicates tables, and high availability. It can be distributed seamlessly (or distributed) without modifying the applications. ) SQL. As a standalone extension,pg_shard is applicable with many nosql scenarios.
For a pg_shard cluster, the various nodes of PG are divided into master node and worker node two classes. Master node is primarily used to store metadata and as a portal for all queries.
You can select any PG node in cluster as master and the other nodes as workers.
The simplest test method is to configure both master and worker instances on the same host. When the same host is configured, each PG instance runs on a different port and can simply apply "localhost" as the node name of the worker node. In general, you can configure one PG instance per host, which is better suited to the complexity of the production environment. In this configuration scenario, these PG instances need to be configured to communicate with each other. (primarily configuration of the postgresql.conf listen_address parameters and pg_hba.conf files). Regardless of how it is deployed, master must be able to connect to Workder Node via TCP without a password.
Pg_shard Environment Construction
Compiling and installing Pg_shard
Modify the shared_preload_libraries= ' Pg_shard ' of the postgresql.conf file
Create the Pg_shard configuration file pg_worker_list.conf file in the following format:
# hostname Port-number
Worker-1 5432
Worker-2 5433
You need to restart master node.
To shard a table
On the master node:
CREATE EXTENSION Pg_shard;
SELECT master_create_distributed_table (' table_name ', ' partition_column ');
On the worker node:
SELECT master_create_worker_shards (' table_name ', ' shards_num ', ' rep_num ');
Once you have created the shards, you can query on the cluster. Currently, update and delete need to include partition Column in the WHERE Condition clause.
Pg_shard's management tools
Pgs_distrubution_metadata.partition;
Pgs_distrubution_metadata.shard;
Pgs_distrubution_metadata.shard_placement;
Restrictions on the use of Pg_shard
1. Cross-shard transactions are not supported
2. The only constraint on other column except partition key and foreign key is not supported
3, does not support distrubuted Join (but Pg_shard's production company's CITUSDB is supported, not supported in the open source version)
Syntax not supported by Pg_shard
1, does not support the modification of the table, if you want to modify the table, you need to execute the script on each worker
2. DROP TABLE
3. INSERT into ... select ...
Currently unsupported and imperfect technical points are many, not recommended in production use. Can be used in a single, simple place in the application scenario. For example:
Tbl_example (ID integer,val jsonb);
Such a table to emulate a MongoDB-like Shard.
---------------------------------------------------------------------------
At present, only some simple and common functions are tested, red bold is not supported, red non-bold is error information, for reference.
Test environment: Master + 3 worker
To shard a table:
sharddb=# SELECT master_create_distributed_table (' customer_reviews ', ' customer_id ');
sharddb=# SELECT master_create_worker_shards (' customer_reviews ', 16, 2);
At this point, the tables you see at different nodes are different:
Master
sharddb=# \d+
List of relations
Schema | Name | Type | Owner | Size | Description
-----{}---------------{}-------+----------
Public | Customer_reviews | Table | Postgres | KB |
(1 row)
Worker-1:
sharddb=# \d+
List of relations
Schema | Name | Type | Owner | Size | Description
-----{}---------------------{}-------+----------
Public | customer_reviews_10000 | Table | Postgres | KB |
......
Public | customer_reviews_10015 | Table | Postgres | KB |
(rows)
Worker-2:
sharddb=# \d+
List of relations
Schema | Name | Type | Owner | Size | Description
-----{}---------------------{}-------+----------
Public | customer_reviews_10000 | Table | Postgres | KB |
......
Public | customer_reviews_10015 | Table | Postgres | KB |
(rows)
Worker-3:
sharddb=# \d+
List of relations
Schema | Name | Type | Owner | Size | Description
-----{}---------------------{}-------+----------
Public | customer_reviews_10001 | Table | Postgres | KB |
......
Public | customer_reviews_10014 | Table | Postgres | KB |
(rows)
On the three worker nodes, the tables are distributed differently, with a total of 16 tables, and the tables are created separately in the form of 22 save copies.
Simple writing of data and queries:
sharddb=# INSERT into Customer_reviews (customer_id, review_rating) VALUES (' HN802 ', 5);
sharddb=# INSERT into Customer_reviews VALUES (' HN802 ', ' 2004-01-01 ', 1, ten, 4, ' b00007b5dn ', ' Tug of War ', 133191, ' Music ', ' Indie Music ', ' Pop ', ' {} ');
sharddb=# INSERT into Customer_reviews (customer_id, review_rating) VALUES (' Fa2k1 ', 10);
Select without a WHERE clause
sharddb=# select * from Customer_reviews;
customer_id | Review_date | review_rating | Review_votes | Review_helpful_votes | product_id | Product_title | Product_sales_rank | Product_group | product_category | Product_subcategory | Similar_product_ids
----------{}----------------{}------------------------{}---------------{}-----------------------{}------------- --------------+------------------
HN802 | | 5 | | | | | | | | |
HN802 | 2004-01-01 | 1 | 10 | 4 | B00007b5dn | Tug of War | 133191 | Music | Indie Music | Pop | {}
Fa2k1 | | 10 | | | | | | | | |
(3 rows)
AVG without a WHERE clause
sharddb=# SELECT avg (review_rating) from Customer_reviews;
Avg
--------------------
5.3333333333333333
(1 row)
AVG with GROUP BY clause
sharddb=# SELECT Customer_id,avg (review_rating) from Customer_reviews GROUP by customer_id;
customer_id | Avg
----------+------------------
Fa2k1 | 10.0000000000000000
HN802 | 3.0000000000000000
(2 rows)
AVG with HAVING clause
sharddb=# SELECT Customer_id,avg (review_rating) as avgrating from Customer_reviews GROUP by customer_id have Customer_i D <> ' fa2k1 ';
customer_id | Avgrating
----------+-----------------
HN802 | 3.0000000000000000
(1 row)
AVG without a WHERE clause
sharddb=# SELECT avg (review_rating) from customer_reviews WHERE customer_id = ' HN802 ';
Avg
--------------------
3.0000000000000000
(1 row)
COUNT, null value
sharddb=# SELECT Count (*) from customer_reviews;
Count
-------
3
(1 row)
sharddb=# SELECT Count (*) from Customer_reviews WHERE review_helpful_votes <> 4;
Count
-------
0
(1 row)
sharddb=# SELECT Count (*) from customer_reviews WHERE review_helpful_votes = 4;
Count
-------
1
(1 row)
sharddb=# SELECT Count (*) from Customer_reviews WHERE review_helpful_votes is NULL;
Count
-------
2
(1 row)
^
sharddb=# SELECT Count (*) from customer_reviews WHERE review_helpful_votes are not NULL;
Count
-------
1
(1 row)
Update operation with partition condition column:
sharddb=# UPDATE customer_reviews SET review_votes = ten WHERE customer_id = ' HN802 ';
UPDATE 2
sharddb=#
Update operation without a partition condition column:
sharddb=# UPDATE customer_reviews SET review_votes = 1 WHERE review_votes = 10;
Error:cannot modify multiple shards during a single query
sharddb=#
Delete operation without a partition condition column:
sharddb=# DELETE from Customer_reviews WHERE review_votes <> 99;
Error:cannot modify multiple shards during a single query
sharddb=#
Update operations with partitioning criteria columns and other columns:
sharddb=# UPDATE customer_reviews SET review_votes = 1 WHERE customer_id = ' HN802 ' and review_votes = 10;
UPDATE 2
sharddb=#
Management tools:
Pgs_distribution_metadata SCHEMA master node for storing metadata
sharddb=# \dn+
List of schemas
Name | Owner | Access Privileges | Description
------------------------{}--------------------+---------------------
Pgs_distribution_metadata | Postgres | |
Public | Postgres | postgres=uc/postgres+| Standard Public schema
| | =uc/postgres |
(2 rows)
sharddb=# SELECT * from Pgs_distribution_metadata.partition;
relation_id | Partition_method | Key
----------{}----------------------
24842 | H | customer_id
(1 row)
sharddb=# SELECT * from Pgs_distribution_metadata.shard;
ID | relation_id | storage | min_value | Max_value
----{}----------{}-----------------
10000 | 24842 | t |-2 147483648 | -1879048194
10001 | 24842 | t | -1879048193 | -1610612739
10002 | 24842 | t | -1610612738 | -1342177284
10003 | 24842 | s [ -1342177283 | -1073741829
10004 | 24842 | t | -1073741828 | -805306374
10005 | 24842 | t | -805306373 | -536870919
10006 | 2 4842 | s [ -536870918 | -268435464
10007 | 24842 | t | -268435463 |-9
10008 | 24842 | t |-8 | 268435446
10009 | 24842 | t | 26843544 9 2 536870901
10010 | 24842 | t | 536870902 | 805306356
10011 | 24842 | t | 805306357 | 1073741811
10012 | 24842 | s [ 1073741812 | 1342177266
10013 | 24842 | t | 1342177267 | 1610612721
10014 | 24842 | t | 1610612722 | 1879048176
10015 | 248 42 | s [ 1879048177 | 2147483647
(rows)
sharddb=# SELECT * from Pgs_distribution_metadata.shard_placement;
ID | shard_id | Shard_state | Node_name | Node_port
--------------{}-------------
1 | 10000 | 1 | localhost | 5433
2 | 10000 | 1 | localhost | 5434
......
32 | 10015 | 1 | localhost | 5434
(rows)
add a table, but first write a few data to do Shard, there will be a few serious errors, so be sure to follow the creation of table--shard--> write data, otherwise before doing shard write data are in an invisible state, and no hint :
1) data cannot be redistributed automatically
2) The table was not successfully created in worker nodes, and there are no error hints
3) The master node queries all Shard, as well as the information for the Customer_detail table (releation_id=24940), and in fact, after dropping the table, the table in the PG system table has been deleted.
(master)
sharddb=# INSERT into Customer_detail VALUES (' HN802 ', ' a '), (' HN802 ', ' B '), (' Fa2k1 ', ' C ');
INSERT 0 3
sharddb=#
sharddb=# SELECT master_create_distributed_table (' Customer_detail ', ' customer_id ');
Master_create_distributed_table
---------------------------------
(1 row)
sharddb=#
sharddb=# SELECT master_create_worker_shards (' Customer_detail ', 16, 2);
Master_create_worker_shards
-----------------------------
(1 row)
sharddb=#
sharddb=#
sharddb=# select * from Customer_detail;
customer_id | Customer_val
----------+-----------
(0 rows)
(Worker nodes)
sharddb=# drop table Customer_detail;
Error:table "Customer_detail" does not exist
sharddb=#
(master)
sharddb=# SELECT * from Pgs_distribution_metadata.shard;
ID | relation_id | Storage | Min_value | Max_value
----{}----------{}-----------------
10000 | 24842 | T | -2147483648 | -1879048194
...... Slightly
10031 | 24940 | T | 1879048177 | 2147483647
(rows)
sharddb=# CREATE TABLE tbl_detail (customer_id text, fid integer, detailval text);
CREATE TABLE
sharddb=#
sharddb=#
sharddb=# SELECT master_create_distributed_table (' Tbl_detail ', ' customer_id ');
Master_create_distributed_table
---------------------------------
(1 row)
sharddb=# SELECT master_create_worker_shards (' Tbl_detail ', 16, 2);
Master_create_worker_shards
-----------------------------
(1 row)
sharddb=#
sharddb=#
sharddb=# select customer_id from Customer_reviews;
customer_id
-------------
HN802
HN802
Fa2k1
(3 rows)
Insert test data, you cannot use the following syntax to BULK INSERT, only one row of inserts
sharddb=# INSERT into Tbl_detail VALUES (' HN802 ', 1, ' a '), (' HN802 ', 2, ' B '), (' HN802 ', 3, ' C '), (' Fa2k1 ', 4, ' d ');
Error:cannot perform distributed planning for the given query
Detail:multi-row inserts to distributed tables is not supported.
sharddb=#
sharddb=# INSERT into Tbl_detail VALUES (' HN802 ', 1, ' a ');
INSERT 0 1
sharddb=# INSERT into Tbl_detail VALUES (' HN802 ', 2, ' B ');
INSERT 0 1
sharddb=# INSERT into Tbl_detail VALUES (' HN802 ', 3, ' C ');
INSERT 0 1
sharddb=# INSERT into Tbl_detail VALUES (' Fa2k1 ', 4, ' d ');
INSERT 0 1
sharddb=#
sharddb=#
sharddb=#
sharddb=# SELECT * from Tbl_detail;
customer_id | FID | Detailval
----------{}-----------
HN802 | 1 | A
HN802 | 2 | B
HN802 | 3 | C
Fa2k1 | 4 | D
(4 rows)
A simple join test
sharddb=#
sharddb=# SELECT a.*,b.* from Customer_reviews A, tbl_detail B WHERE a.customer_id = b.customer_id;
Error:cannot perform distributed planning for the given query
Detail:joins is not supported in distributed queries.
sharddb=#
Unable to view explain, this is a mishap, and VACUUM, ANALYZE also need to operate separately in each worker.
sharddb=# EXPLAIN SELECT * from Tbl_detail;
Error:explain commands on distributed tables is unsupported
sharddb=#
Indexes are created on master and cannot be synchronized on other workers, and drop is not valid for worker
sharddb=# CREATE INDEX concurrently on Tbl_detail (CUSTOMER_ID);
CREATE INDEX
sharddb=# \d+ Tbl_detail
Table "Public.tbl_detail"
Column | Type | Modifiers | Storage | Stats Target | Description
----------{}----------{}------------+----------
customer_id | text | | Extended | |
FID | Integer | | Plain | |
Detailval | text | | Extended | |
Indexes:
"Tbl_detail_customer_id_idx" Btree (customer_id)
sharddb=#
sharddb=# DROP INDEX Tbl_detail_customer_id_idx;
DROP INDEX
sharddb=#
ALTER table will not throw an error, but it will no longer be able to read the data correctly if the same operation is not done on other nodes
sharddb=# ALTER TABLE tbl_detail ADD COLUMN newcolumn text DEFAULT NULL;
ALTER TABLE
sharddb=# select * from Tbl_detail;
Warning:bad result from localhost:5434
Detail:remote message:column "Newcolumn" does not exist
Warning:bad result from localhost:5433
Detail:remote message:column "Newcolumn" does not exist
Error:could not receive query results
sharddb=#
DROP DATABASE needs attention order
When the master node exists SHARDDB, when the worker deletes the database, it will report an error and drop the object on the master node before you can manually delete the object:
postgres=# DROP DATABASE sharddb;
Error:database "SHARDDB" is being accessed by other users
Detail:there is 1 other session using the database.
[Original]pg_shard use scene and function test