I evaluated the various file-loading methods of MySQL and PostgreSQL and timed each one on a virtual machine. Since the tests ran on a VM, treat the timings as rough references only; the relative differences matter more than the absolute numbers.
MySQL tools:
1. The bundled mysqlimport tool.
2. The LOAD DATA INFILE ... command.
3. A script written with the mysql-connector-python driver.
PostgreSQL tools:
1. The third-party pgloader tool.
2. The COPY ... FROM ... command.
3. A Python script written with psycopg2.
The structure of the test table:
mysql> desc t1;
+----------+-----------+------+-----+-------------------+-------+
| Field    | Type      | Null | Key | Default           | Extra |
+----------+-----------+------+-----+-------------------+-------+
| id       | int(11)   | NO   | PRI | NULL              |       |
| rank     | int(11)   | NO   |     | NULL              |       |
| log_time | timestamp | YES  |     | CURRENT_TIMESTAMP |       |
+----------+-----------+------+-----+-------------------+-------+
3 rows in set (0.00 sec)
mysql> select count(*) from t1;
+----------+
| count(*) |
+----------+
|  1000000 |
+----------+
1 row in set (6.80 sec)
The test CSV file is t1.csv.
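The post doesn't show how t1.csv was produced. Here is a minimal Python sketch that generates a comparable file; sequential ids, random ranks, and a fixed timestamp are my assumptions, while the quoting and line endings match the MySQL load options used below.

import csv
import random
from datetime import datetime

# Hypothetical generator for a comparable t1.csv (the original file is not shown).
with open('/tmp/t1.csv', 'w', newline='') as f:
    # fields enclosed by '"', terminated by ',', lines ending in \r\n,
    # matching the MySQL load options used in the tests below
    writer = csv.writer(f, quoting=csv.QUOTE_ALL, lineterminator='\r\n')
    ts = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    for i in range(1, 1000001):
        writer.writerow([i, random.randint(1, 1000000), ts])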
MySQL's built-in loader (time: 24 seconds):
mysql> load data infile '/tmp/t1.csv' into table t1 fields terminated by ',' enclosed by '"' lines terminated by '\r\n';
Query OK, 1000000 rows affected (24.21 sec)
Records: 1000000 Deleted: 0 Skipped: 0 Warnings: 0
MySQL Python script (time: 23 seconds):
Running 23.289 Seconds
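The script itself isn't shown in the post. A minimal sketch of what it might look like, assuming it drives LOAD DATA through the mysql-connector-python driver with the same credentials as the mysqlimport test below (allow_local_infile is required by recent connector versions):

import time
import mysql.connector

# Hypothetical connection parameters (root/root and t_girl come from the
# mysqlimport test below); the script body is an assumption.
conn = mysql.connector.connect(user='root', password='root',
                               database='t_girl',
                               allow_local_infile=True)
cur = conn.cursor()

start = time.time()
cur.execute(
    "LOAD DATA LOCAL INFILE '/tmp/t1.csv' INTO TABLE t1 "
    "FIELDS TERMINATED BY ',' ENCLOSED BY '\"' "
    "LINES TERMINATED BY '\\r\\n'"
)
conn.commit()
print('Running %.3f Seconds' % (time.time() - start))

cur.close()
conn.close()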
MySQL's bundled mysqlimport (time: 23 seconds):
# time mysqlimport t_girl '/tmp/t1.csv' --fields-terminated-by=',' --fields-enclosed-by='"' --lines-terminated-by='\r\n' --use-threads=2 -uroot -proot
t_girl.t1: Records: 1000000 Deleted: 0 Skipped: 0 Warnings: 0
real 0m23.664s
user 0m0.016s
sys 0m0.037s
PostgreSQL's built-in COPY (time: 7 seconds):
t_girl=# copy t1 from '/tmp/t1.csv' with delimiter ',';
COPY 1000000
Time: 7700.332 ms
psycopg2 driver copy_from method (time: 6 seconds):
# python load_data.py
Running 5.969 Seconds.
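load_data.py isn't shown either. A minimal sketch of what it might look like, assuming a local connection to the t_girl database (dbname and user taken from the t_girl=# prompt above; everything else is an assumption); copy_from streams the file through the COPY protocol, mirroring the "copy t1 from ... with delimiter ','" test:

import time
import psycopg2

# Hypothetical connection parameters; adjust host/password as needed.
conn = psycopg2.connect(dbname='t_girl', user='t_girl')
cur = conn.cursor()

start = time.time()
with open('/tmp/t1.csv') as f:
    # stream the file into t1 via COPY, comma-delimited
    cur.copy_from(f, 't1', sep=',')
conn.commit()
print('Running %.3f Seconds.' % (time.time() - start))

cur.close()
conn.close()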
pgloader CSV import (time: 33 seconds):
# pgloader commands.load
             table name       read   imported     errors      time
-----------------------  ---------  ---------  ---------  ---------
                 ytt.t1    1000000    1000000          0    33.514s
-----------------------  ---------  ---------  ---------  ---------
      Total import time    1000000    1000000          0    33.514s
pgloader pulling data directly from MySQL (time: 51 seconds):
# pgloader commands.mysql
             table name       read   imported     errors      time
-----------------------  ---------  ---------  ---------  ---------
        fetch meta data          2          2          0     0.138s
                     t1    1000000    1000000          0    51.136s
-----------------------  ---------  ---------  ---------  ---------
      Total import time    1000000    1000000          0    51.274s
The commands.load and commands.mysql files used above are listed below.
commands.load:
LOAD CSV
FROM '/tmp/ytt.csv' WITH ENCODING utf-8
(
   id, rank, log_time
)
INTO postgresql://t_girl:<password>@<host>:5432/t_girl?ytt.t1
WITH skip header = 0,
     fields optionally enclosed by '"',
     fields escaped by backslash-quote,
     fields terminated by ','
SET work_mem to '32MB', maintenance_work_mem to '64MB';
commands.mysql:
LOAD DATABASE
FROM mysql://python_user:<password>@<host>:3306/t_girl?t1
INTO postgresql://t_girl:<password>@<host>:5432/t_girl?ytt.t1
WITH data only
SET maintenance_work_mem to '64MB',
    work_mem to '3MB',
    search_path to 'ytt';
The pgloader manual:
http://pgloader.io/howto/pgloader.1.html