This post evaluates several ways of loading data files into MySQL and PostgreSQL, along with their timings. The tests were run on virtual machines, so treat the numbers as rough reference points rather than precise benchmarks.
MySQL tools:
1. The built-in mysqlimport utility.
2. LOAD DATA INFILE ...
3. A script written with the mysql-connector-python driver.
PostgreSQL tools:
1. The third-party pgloader tool.
2. COPY ... FROM ... at the command line.
3. A Python script written with psycopg2.
Test Table Structure:
mysql> desc t1;
+----------+-----------+------+-----+-------------------+-------+
| Field    | Type      | Null | Key | Default           | Extra |
+----------+-----------+------+-----+-------------------+-------+
| id       | int(11)   | NO   | PRI | NULL              |       |
| rank     | int(11)   | NO   |     | NULL              |       |
| log_time | timestamp | YES  |     | CURRENT_TIMESTAMP |       |
+----------+-----------+------+-----+-------------------+-------+
3 rows in set (0.00 sec)

mysql> select count(*) from t1;
+----------+
| count(*) |
+----------+
|  1000000 |
+----------+
1 row in set (6.80 sec)
Test CSV file:
t1.csv
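The post does not show how the test file was produced. Purely for reference, here is a minimal Python sketch that would generate a file of the same shape (one million rows of id, rank, log_time); the rank values and timestamps are made up for illustration:

# gen_t1_csv.py -- hypothetical generator for the test file, not from the original post
import csv
import random
from datetime import datetime

now = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
with open('/tmp/t1.csv', 'w', newline='') as f:
    # QUOTE_ALL matches the enclosed-by-'"' option used in the MySQL loads;
    # csv.writer's default line terminator is '\r\n', matching the load commands below
    writer = csv.writer(f, quoting=csv.QUOTE_ALL)
    for i in range(1, 1000001):
        writer.writerow([i, random.randint(1, 1000000), now])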
MySQL LOAD DATA INFILE: (24 seconds)
mysql> load data infile '/tmp/t1.csv' into table t1 fields terminated by ',' enclosed by '"' lines terminated by '\r\n';
Query OK, 1000000 rows affected (24.21 sec)
Records: 1000000  Deleted: 0  Skipped: 0  Warnings: 0
MySQL Python script (mysql-connector-python): (23 seconds)
>>>
Running 23.289 Seconds
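The loading script itself is not included in the post. A minimal sketch of what it might look like with mysql-connector-python, issuing the same LOAD DATA statement through the driver and timing it (the connection credentials here are assumptions):

# load_data.py -- hypothetical reconstruction using mysql-connector-python
import time
import mysql.connector

# credentials and host are assumed, not from the original post
conn = mysql.connector.connect(user='root', password='root',
                               host='127.0.0.1', database='t_girl')
cur = conn.cursor()

start = time.time()
cur.execute(
    "load data infile '/tmp/t1.csv' into table t1 "
    "fields terminated by ',' enclosed by '\"' lines terminated by '\\r\\n'"
)
conn.commit()
print("Running %.3f Seconds" % (time.time() - start))

cur.close()
conn.close()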
MySQL's built-in mysqlimport: (23 seconds)
[root@mysql56-master ~]# time mysqlimport t_girl '/tmp/t1.csv' --fields-terminated-by=',' --fields-enclosed-by='"' --lines-terminated-by='\r\n' --use-threads=2 -uroot -proot
t_girl.t1: Records: 1000000  Deleted: 0  Skipped: 0  Warnings: 0

real    0m23.664s
user    0m0.016s
sys     0m0.037s
PostgreSQL COPY: (7 seconds)
t_girl=# copy t1 from '/tmp/t1.csv' with delimiter ',';
COPY 1000000
Time: 7700.332 ms
psycopg2 driver's copy_from method: (6 seconds)
[root@postgresql-instance scripts]# python load_data.py
Running 5.969 Seconds.
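Again, the script itself is not shown in the post; a minimal sketch using psycopg2's copy_from, which streams the file through the same COPY protocol as the psql command above (credentials are assumptions):

# load_data.py -- hypothetical reconstruction using psycopg2
import time
import psycopg2

# credentials match the connection strings in the appendix; still an assumption
conn = psycopg2.connect(dbname='t_girl', user='t_girl', password='t_girl',
                        host='127.0.0.1', port=5432)
cur = conn.cursor()

start = time.time()
with open('/tmp/t1.csv') as f:
    cur.copy_from(f, 't1', sep=',')  # equivalent to COPY t1 FROM STDIN with delimiter ','
conn.commit()
print("Running %.3f Seconds." % (time.time() - start))

cur.close()
conn.close()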
pgloader importing the CSV: (33 seconds)
[root@postgresql-instance ytt]# pgloader commands.load
                    table name       read   imported     errors            time
                        ytt.t1    1000000    1000000          0         33.514s
------------------------------  ---------  ---------  ---------  --------------
------------------------------  ---------  ---------  ---------  --------------
             Total import time    1000000    1000000          0         33.514s
pgloader pulling data directly from MySQL: (51 seconds)
[root@postgresql-instance ytt]# pgloader commands.mysql
                    table name       read   imported     errors            time
               fetch meta data          2          2          0          0.138s
------------------------------  ---------  ---------  ---------  --------------
                            t1    1000000    1000000          0         51.136s
------------------------------  ---------  ---------  ---------  --------------
------------------------------  ---------  ---------  ---------  --------------
             Total import time    1000000    1000000          0         51.274s
Appendix: commands.load and commands.mysql

commands.load:

LOAD CSV
     FROM '/tmp/ytt.csv' WITH ENCODING UTF-8
          (id, rank, log_time)
     INTO postgresql://t_girl:t_girl@127.0.0.1:5432/t_girl?ytt.t1
     WITH skip header = 0,
          fields optionally enclosed by '"',
          fields escaped by backslash-quote,
          fields terminated by ','
      SET work_mem to '32MB', maintenance_work_mem to '64MB';

commands.mysql:

LOAD DATABASE
     FROM mysql://python_user:python_user@192.168.1.131:3306/t_girl?t1
     INTO postgresql://t_girl:t_girl@127.0.0.1:5432/t_girl?ytt.t1
     WITH data only
      SET maintenance_work_mem to '64MB', work_mem to '3MB', search_path to 'ytt';

pgloader manual: http://pgloader.io/howto/pgloader.1.html