Installation and use of CSTORE_FDW and source code analysis

Source: Internet
Author: User
Tags bulk insert disk usage

I. Introduction to CSTORE_FDW

HTTPS://GITHUB.COM/CITUSDATA/CSTORE_FDW, this external table extension is developed by Citusdata Corporation, which uses the Orc_file format to store data in columns.

  

Advantage 1: Because of the compression, so the storage on disk greatly reduced, compression ratio can reach 2-4 times

Advantage 2: Data internal block storage, the block data for the max and Min value of the record, in the query can be a jump block query

Advantage 3: In the query, not all the disk data is load to memory, but rather select the column according to the recorded skiplist offset to load the required data, reduce IO

Second, installation and use

[[email protected] ~]# git clone https://github.com/citusdata/cstore_fdw.git

After downloading, modify the Pgconfig in the makefile file to be specified in the installation directory for example:/usr/local/postgres/bin/pgconfig

[[email protected] ~]# make && make install

Configure the end of the postgres.conf file to add:

shared_preload_libraries = ' CSTORE_FDW '

Start the database:

[Email protected] ~]$ pg_ctl-d db1-l logfile start-m fast

[Email protected] ~]$ Psql

  

postgres=# Create extension CSTORE_FDW; Create extensionpostgres=# Create server Cstore_server foreign data wrapper cstore_fdw; Create serverpostgres=# Create FOREIGN TABLE customer_reviewspostgres-# (postgres (#     customer_id text,postgres (#     review_date date,postgres (#     review_rating integer,postgres (#     Review_votes integer,postgres (#     Review_helpful_votes integer,postgres (#     product_id CHAR (#), Postgres (#     Product_title text,postgres (#     Product_sales_rank bigint,postgres (#     Product_group text,postgres (#     product_category TEXT,postgres (#     Product_subcategory textpostgres (#) postgres-# SERVER cstore_serverpostgres-# OPTIONS (compression ' pglz ');

PG native table takes up disk size:

postgres=# INSERT INTO Customer_reviews select * from Customer;insert 0 176774postgres=# Select Pg_relation_size (' Custome R '); Pg_relation_size------------------        145489920 (1 row)

Disk size consumed after CSTORE_FDW external expansion compression:

[Email protected] 13056]$ ll/home/postgres/db1/cstore_fdw/13056

-RW-------1 postgres postgres 6236569 Dec 5 10:07 278237
-RW-------1 postgres postgres 5 10:07 278237.footer

After comparison, disk usage is reduced a lot!!

Third, source code analysis

The implementation of the external table in Postgres is equivalent to an engine that is implemented by hooking up the C-language function pointer

datumcstore_fdw_handler (Pg_function_args) {Fdwroutine *fdwRoutine = Makenode (fdwroutine); fdwroutine->getforeignrelsize = Cstoregetforeignrelsize;fdwroutine->getforeignpaths = Cstoregetforeignpaths;fdwroutine->getforeignplan = Cstoregetforeignplan;fdwroutine->explainforeignscan = Cstoreexplainforeignscan;fdwroutine->beginforeignscan = cstorebeginforeignscan;//1fdwroutine-> Iterateforeignscan = Cstoreiterateforeignscan;//2fdwroutine->rescanforeignscan = CStoreReScanForeignScan;// 3fdwroutine->endforeignscan = Cstoreendforeignscan;//4fdwroutine->analyzeforeigntable = Cstoreanalyzeforeigntable;fdwroutine->planforeignmodify = cstoreplanforeignmodify;//5fdwroutine-> beginforeignmodify = Cstorebeginforeignmodify;//6fdwroutine->execforeigninsert = CStoreExecForeignInsert;// 7fdwroutine->endforeignmodify = Cstoreendforeignmodify;//8pg_return_pointer (fdwRoutine);} 

1, 2, 3, 4 constitute a query operation such as: SELECT * from Customer_reviews;

5, 6, 7, 8 constitute an insert operation, for example: INSERT INTO customer_reviews select * from Customer;

It is particularly important to note that the Tableentry->rtekind = = Rte_subquery is judged by cstoreplanforeignmodify in this function when inserting,

So insert into XX values xxx This insertion is not supported.

It is observed from the source that the flushstripe operation is performed in the cstoreendforeignmodify, that is, whether inserting a data or inserting the data in bulk, the flushstripe operation will be performed.

If you insert a piece of data, this data takes up a stripe of disk space

If it is bulk INSERT, then according to the default stripe size, block size to split, to meet the stripe to brush the disk, and then the remaining does not meet stripe as another stripe, if the data is a stripe, the query load data will be quite slow.

Finally, it is concluded that the compression efficiency is not obvious for a single insert or transactional database, and the compression ratio is quite significant for bulk inserts, and the query will be faster.

Rcfile format vs. orc format:

There is also for rcfile this format, the string type of compression is not very obvious processing, unlike the Orc format, orc with dictionary compression processing, and Rcfile did not

Https://github.com/gokhankici/orc_fdw

This external table extension simply reads the files in the ORC format and does not have a write operation, and the operation of the write file is developed using the Java language.

Installation and use of CSTORE_FDW and source code analysis

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.