Quick de-duplication Analysis of tens of millions of mysql merged tables target: existing tables a and B, merge the data in the two tables to the c table. The data volume in Table a and table B is about 2 million. Basic information operating system version: CentOS release 5.6 64-bit operating system memory: 8 GB database version: 5.1.56-community 64-bit database initialization parameters: default database table and data table: www.2cto.com mysql> desc a2kw; + ------- + ------------- + ------ + ----- + --------- + ------- + | Field | Type | Null | Key | Default | Extra | + ------- + ------------- + ------ + ----- + --------- + ------- + | c1 | varchar (20) | YES | MUL | NULL | c2 | varchar (30) | YES | NULL | c3 | varchar (12) | YES | NULL | c4 | varchar (20) | YES | NULL | + ------- + ------------- + ------ + ----- + --------- + ------- + 4 rows in set (0.00 sec) Table bmysql> desc b2kw; + ------- + ------------- + ------ + ----- + --------- + ------- + | Field | Type | Null | Key | Default | Extra | + ------- + ------------- + ------ + ----- + --------- + ------- + | c1 | varchar (20) | YES | NULL | c2 | varchar (30) | YES | NULL | c3 | varchar (12) | YES | NULL | c4 | varchar (20) | YES | NULL | + ------- + ------------- + ------ + ----- + --------- + ------- + 4 rows in set (0.00 sec) the data in tables a and B is as follows: mysql> select * from a2kw limit 10; + ----------- + ------ + ---------- + | c1 | c2 | c3 | c4 | + ----------- + ------ + ---------- + | 662164461 | 131545534 | TOM0 | 20120520 | 226662142 | 605685564 | | TOM0 | 20120516 | 527008225 | 172557633 | TOM0 | 20120514 | 574408183 | 350897450 | TOM0 | 20120510 | 781619324 | 583989494 | TOM0 | 20120510 | 158872754 | 775676430 | TOM0 | 20120512 | 815875622 | 631631832 | TOM0 | 20120514 | 905943640 | 477433083 | TOM0 | 20120514 | 660790641 | 616774715 | TOM0 | 20120512 | 999083595 | 953186525 | TOM0 | 20120513 | + ----------- + ----------- + ------ + ---------- + 10 rows in set (0.01 sec) basic Steps: www.2cto.com 1. Create an index on Table B. mysql> select count (*) from b2kw; + ---------- + | count (*) | + ---------- + | 20000002 | + ---------- + 1 row in set (0.00 sec) mysql> create index ind_b2kw_c1 on b2kw (c1); Query OK, 20000002 rows affected (1 min 2.94 sec) Records: 20000002 Duplicates: 0 Warnings: 0 data volume: 20000002, time: 1 min 2.94 sec2, insert a and B into the temp table of the intermediate table, create mysql> create table temp select * from c2kw where 1 = 2; Query OK, 0 rows affected (0.00 sec) Records: 0 Duplicates: 0 Warnings: 0 insert data mysql> insert into temp select * from a2kw; Query OK, 20000002 rows affected (13.23 sec) Records: 20000002 Duplicates: 0 Warnings: 0 mysql> insert into temp select * from b2kw; Query OK, 20000002 rows affected (13.27 sec) Records: 20000002 Duplicates: 0 Warnings: 0 www.2cto.com mysql> select count (*) from temp; + ---------- + | count (*) | + ---------- + | 40000004 | + ---------- + 1 row in set (0.00 sec) data volume: 40000004, time: 26.50 sec3, temp Joint index, forced index remove duplicate data mysql> create index ind_temp_c123 on temp (c1, c2, c3); Query OK, 40000004 rows affected (3 min 43.87 sec) Records: 40000004 Duplicates: 0 Warnings: 0 view execution plan mysql> explain select c1, c2, c3, max (c4) from temp FORCE INDEX
(Ind_temp_c123) group by c1, c2, c3; + ---- + ------------- + ------- + --------------- + ----------
----- + --------- + ------ + ---------- + ------- + | Id | select_type | table | type | possible_keys | key
| Key_len | ref | rows | Extra | + ---- + ------------- + ------- + --------------- + -------------
-- + --------- + ------ + ---------- + ------- + | 1 | SIMPLE | temp | index | NULL | ind_temp_c123 | 71
| NULL | 40000004 | + ---- + ----------- + ------- + www.2cto.com ------- + --------------- + --------
------- + --------- + ------ + ---------- + ------- + 1 row in set (0.05 sec) mysql> insert into c2kw select c1, c2, c3, max (c4) from temp
Force index (ind_temp_c123) group by c1, c2, c3; Query OK, 20000004 rows affected (2 min 0.85 sec) Records: 20000004 Duplicates: 0 Warnings: 0 the actual cost is: 6 min
4. Delete the intermediate table mysql> drop table temp; Query OK, 0 rows affected (0.99 sec). The actual cost is: 1 sec.
5. create a c index mysql> create index ind_c2kw_c1 on c2kw (c1); Query OK, 20000004 rows affected (49.74 sec) Records: 20000004 Duplicates: 0 Warnings: 0 mysql> create index ind_c2kw_c2 on c2kw (c2); Query OK, 20000004 rows affected (1 min 47.20 sec) Records: 20000004 Duplicates: 0 Warnings: 0 mysql> create index ind_c2kw_c3 on c2kw (c3); Query OK, 20000004 rows affected (2 min 42.02 sec) Records: 20000004 Duplicates: 0 Warnings: 0 the actual cost is: 5 minutes www.2cto.com
6. Clear mysql tables a and B> truncate table a2kw; Query OK, 0 rows affected (1.15 sec) mysql> truncate table b2kw; Query OK, 0 rows affected (1.34 sec) the actual cost is: the total time spent by 3sec is about 15 minutes. The author RuleV5