Research on fast bulk insertion of large data volumes into a PostgreSQL 9.5.4 database


Background


In some scenarios, such as database migration or SQL log analysis, large amounts of data must be loaded into the database quickly. PostgreSQL offers several approaches to fast bulk insertion. How efficient is each approach, and how can data-loading speed be tuned?


Scenario


SQL log analysis is a tool that acquires JDBC logs, parses the SQL, and emits analysis results. During the analysis phase, a large number of JDBC logs must be parsed and the structured results loaded into the database for later processing. To compare the data-loading efficiency of the different schemes, the analysis phase serves as the experimental scenario: timing starts when parsing of the JDBC logs (multiple files) begins and ends when the structured data has been loaded and the indexes built.


Environment preparation
    • Database environment

Name                          Value
Operating system              CentOS 6.5
CPU                           Intel(R) Xeon(R) CPU E5-2698 v3 @ 2.30GHz, 64 logical cores
Memory                        316 GB
Disk                          RAID 10, write speed 1 GB/s
Database version              PostgreSQL 9.5.4
Database memory parameters    shared_buffers: 30GB, work_mem: 4MB, maintenance_work_mem: 64MB
Database CPU parameters       max_worker_processes: 16


Table creation statements


drop table if exists T_JDBC_SQL_RECORD;
--No primary key: C_BH is not used in queries, so the key is dropped up front to speed up inserts
create table T_JDBC_SQL_RECORD (
C_BH VARCHAR(32),
C_BH_PARSE VARCHAR(32) NULL,
C_BH_GROUP VARCHAR(32) NULL,
C_BH_SQL VARCHAR(32) NULL,
DT_ZXSJ TIMESTAMP NULL,
N_RUNTIME INT NULL,
C_RZLJ VARCHAR(600) NULL,
N_STARTLINE INT NULL,
N_ENDLINE INT NULL,
N_SQLTYPE INT NULL,
N_SQLCOMPLEX INT NULL,
C_IP VARCHAR(100) NULL,
C_PORT VARCHAR(100) NULL,
C_XTBS VARCHAR(100) NULL,
N_CHECKSTATUS INT default 0,
N_SQL_LENGTH INT NULL,
N_SQL_BYTE INT NULL,
N_5MIN INT NULL,
C_METHOD VARCHAR(600) NULL,
C_PSSQL_HASH VARCHAR(300) NULL,
N_IS_BATCH INT,
N_RESULTSET INT
);

drop table if exists T_JDBC_SQL_CONTENT;
CREATE TABLE T_JDBC_SQL_CONTENT (
C_BH VARCHAR(32) NOT NULL,
C_PSSQL_HASH VARCHAR(300) NULL,
C_SQL_TEXT varchar NULL,
C_PSSQL_TEXT varchar NULL
);
Index statements
create index i_jdbc_sql_record_zh01 on t_jdbc_sql_record (c_bh_group, dt_zxsj, N_CHECKSTATUS, C_PSSQL_HASH);
create index i_jdbc_sql_record_pshash on t_jdbc_sql_record (c_pssql_hash);
create index i_jdbc_sql_content_pshash on t_jdbc_sql_content (c_pssql_hash);
alter table t_jdbc_sql_content add constraint t_jdbc_sql_content_pkey primary key (C_BH);
Asynchronous commit and unlogged tables
 -- Asynchronous commit; restart the database after the change
 alter system set synchronous_commit to off;
 -- Unlogged tables
 create unlogged table t_jdbc_sql_record
 ...
 create unlogged table t_jdbc_sql_content
 ...
JDBC log volume
19 JDBC log files, 2 GB of logs in total, about 6 million records
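All of the loading schemes below use multiple threads to feed records into the database. As a rough sketch of that pattern, assuming names of my own choosing (Python stands in for the actual JDBC-based middleware, and `execute_batch` is a hypothetical stand-in for a real driver call):

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

def load_batches(rows, batch_size, workers, execute_batch):
    """Split rows into fixed-size batches and feed them to a thread pool.

    `execute_batch` stands in for a real driver round trip (e.g. JDBC
    addBatch/executeBatch); it receives one batch and returns a row count.
    """
    def batches(it):
        it = iter(it)
        while True:
            chunk = list(islice(it, batch_size))
            if not chunk:
                return
            yield chunk

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker handles one batch at a time; results are row counts.
        return sum(pool.map(execute_batch, batches(rows)))

# Illustrative run: 10,000 fake records in batches of 1,000 on 4 threads.
loaded = load_batches(range(10_000), 1000, 4, execute_batch=len)
```

Each worker commits its own batch, which is the essential difference between the batch schemes and the single-row inserts of scheme 1.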

Scheme settings

Scheme 1: create the structured tables and their indexes, then load data with multi-threaded single-row inserts
Scheme 2: create the structured tables and their indexes, then load data with multi-threaded batch inserts
Scheme 3: create the structured tables and their indexes, set the database to asynchronous commit, then load data with multi-threaded batch inserts
Scheme 4: create the structured tables, set the database to asynchronous commit, load data with multi-threaded batch inserts, then build the indexes
Scheme 5: create the structured tables and their indexes, make the tables unlogged, then load data with multi-threaded batch inserts
Scheme 6: create the structured tables as unlogged tables, load data with multi-threaded batch inserts, then build the indexes
Scheme 7: create the structured tables, load data with multi-threaded batch inserts, then build the indexes

Experiment results
During each experiment the JDBC log volume, the parsing code, and the middleware environment were kept unchanged; only the order of steps and the database parameters were varied.

Run          Scheme 1  Scheme 2  Scheme 3  Scheme 4  Scheme 5  Scheme 6  Scheme 7
First run    3596s     2043s     1164s     779s      545s      528s      1192s
Second run   4092s     2068s     1283s     843s      528s      528s      1227s
Third run    3891s     2177s     1378s     858s      536s      537s      1248s
Average      3859s     2096s     1275s     826s      536s      531s      1222s

Result analysis
Scheme 1 vs. scheme 2 (same database parameters, same order of steps):
Scheme 1: single-row insert commits, 3859 seconds
Scheme 2: batch insert commits, 2096 seconds
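A batch commit can be realized either through the driver's batch API or by sending one multi-row INSERT per batch. A minimal sketch of building such a statement (table and column names taken from the DDL above; the `%s` placeholder style is an assumption about the driver):

```python
def multirow_insert_sql(table, columns, nrows):
    """Build one parameterized INSERT covering `nrows` rows, so that a
    whole batch is sent and committed as a single statement."""
    row = "(" + ", ".join(["%s"] * len(columns)) + ")"
    return "insert into {} ({}) values {}".format(
        table, ", ".join(columns), ", ".join([row] * nrows))

# One statement carrying a 1000-row batch for the content table.
sql = multirow_insert_sql("t_jdbc_sql_content",
                          ["c_bh", "c_pssql_hash"], 1000)
```

The parameter values for all rows of the batch are then bound and the statement executed in one round trip, amortizing per-commit overhead across the batch.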
Schemes 2, 3, and 5 (same order of steps: create tables -> create indexes -> multi-threaded batch insert):
Scheme 2: synchronous commit (waits for the WAL flush), 2096 seconds
Scheme 3: asynchronous commit (does not wait for the WAL flush), 1275 seconds
Scheme 5: no WAL logging, 536 seconds
Scheme 2 vs. scheme 7 (both synchronous commit):
Scheme 2: indexes created before loading, 2096 seconds
Scheme 7: indexes created after loading, 1222 seconds
Scheme 3 vs. scheme 4 (both asynchronous commit):
Scheme 3: indexes created before loading, 1275 seconds
Scheme 4: indexes created after loading, 826 seconds
Scheme 5 vs. scheme 6 (neither writes WAL):
Scheme 5: indexes created before loading, 536 seconds
Scheme 6: indexes created after loading, 531 seconds
Summary
In this scenario:

Batch commits are roughly 45% faster than single-row commits
Asynchronous commit is roughly 40% faster than synchronous commit
Unlogged tables are roughly 75% faster than synchronous commit
With WAL logging and synchronous commit, building indexes after the load is roughly 40% faster than building them first
With WAL logging and asynchronous commit, building indexes after the load is roughly 35% faster than building them first
Without WAL logging, building indexes after the load is only slightly faster than building them first
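The percentages above follow directly from the averaged timings; a quick arithmetic check (scheme numbers and averages copied from the results table):

```python
# Average elapsed seconds per scheme, from the results table above.
avg = {1: 3859, 2: 2096, 3: 1275, 4: 826, 5: 536, 6: 531, 7: 1222}

def pct_faster(slow, fast):
    """Whole-percent reduction in elapsed time going from `slow` to `fast`."""
    return round(100 * (avg[slow] - avg[fast]) / avg[slow])

batch_vs_single   = pct_faster(1, 2)  # ~46
async_vs_sync     = pct_faster(2, 3)  # ~39
unlogged_vs_sync  = pct_faster(2, 5)  # ~74
index_after_sync  = pct_faster(2, 7)  # ~42
index_after_async = pct_faster(3, 4)  # ~35
```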
The fastest combination for loading data is:
unlogged table + multi-threaded batch insert + building indexes after the load
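As a sketch, the winning combination simply orders its steps as: create unlogged tables, bulk-load, build indexes last. The helper below only arranges statement text in that order; the DDL bodies are elided and the function name is my own:

```python
def fastest_load_plan(table_ddl, index_ddl):
    """Order the statements for the fastest combination found above:
    unlogged tables first, bulk load in the middle, indexes last."""
    unlogged = [ddl.replace("create table", "create unlogged table", 1)
                for ddl in table_ddl]
    return unlogged + ["-- multi-threaded batch inserts run here"] + list(index_ddl)

plan = fastest_load_plan(
    ["create table t_jdbc_sql_content (...)"],  # columns elided
    ["create index i_jdbc_sql_content_pshash on t_jdbc_sql_content (c_pssql_hash)"])
```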

Conjecture:
During the insert process, index maintenance accounts for roughly 35% to 40% of the total time, most of which is spent on log persistence.

Other observations:
Some other metrics collected during the experiments still need analysis; for example, database write I/O never exceeded 100 MB/s under any scheme.

