How to insert random string data into MySQL

Source: Internet
Author: User

Application scenario:
When testing, you sometimes need to insert a batch of records into the database, and the following scripts are handy for that.

Create a table:

CREATE TABLE `tables_a` (
  `id` INT NOT NULL DEFAULT '0',
  -- The column length is garbled in the source; 255 is an assumption that matches rand_string's return type.
  `name` CHAR(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Create a function that produces a random string:

SET GLOBAL log_bin_trust_function_creators = 1;
DROP FUNCTION IF EXISTS rand_string;
DELIMITER //
CREATE FUNCTION rand_string(n INT)
RETURNS VARCHAR(255)
BEGIN
        -- 62 candidate characters: a-z, A-Z, 0-9
        DECLARE chars_str VARCHAR(62) DEFAULT 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';
        DECLARE return_str VARCHAR(255) DEFAULT '';
        DECLARE i INT DEFAULT 0;
        WHILE i < n DO
                -- Append the character at a random 1-based position in 1..62.
                SET return_str = CONCAT(return_str, SUBSTRING(chars_str, FLOOR(1 + RAND() * 62), 1));
                SET i = i + 1;
        END WHILE;
        RETURN return_str;
END//
DELIMITER ;
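For readers who would rather generate the test strings on the client side (as the JDBC example later in this article does), the same character-picking logic can be sketched in Java. This is only an illustration: the class name RandStringSketch and the method randString are hypothetical and not part of the original script.

```java
import java.util.Random;

// Hypothetical helper, not part of the original script: mirrors rand_string(n).
class RandStringSketch {
    // The same 62-character alphabet used by the MySQL function.
    static final String CHARS =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

    // Builds a string of n characters drawn uniformly at random from CHARS.
    static String randString(int n, Random rng) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            // MySQL's SUBSTRING is 1-based, hence FLOOR(1 + RAND() * 62) there.
            // Java's charAt is 0-based, so nextInt(62) yields an index in 0..61.
            sb.append(CHARS.charAt(rng.nextInt(CHARS.length())));
        }
        return sb.toString();
    }
}
```

The only difference from the stored function is the index base: the SQL version adds 1 because SUBSTRING counts positions from 1.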

Create the procedure that inserts rows into the table. x is the id to start from, y is the id to stop at (exclusive), and z is the length of the random string to generate.

DELIMITER //
CREATE PROCEDURE test(x INT, y INT, z INT)
BEGIN
  DECLARE i INT DEFAULT x;
  -- Insert one row per id in [x, y), each with a random name of length z.
  WHILE i < y DO
    INSERT INTO tables_a VALUES (i, rand_string(z));
    SET i = i + 1;
  END WHILE;
END//
DELIMITER ;

MySQL random data generation and insert

The DBLP dataset contains very few citations: on average, each paper cites only 0.2 others. One paper that uses DBLP as its experimental dataset mentions that citation information can be added at random. Inspired by this, I decided to add 20 random references to each paper and wrote the following SQL statement:

String sql = "INSERT INTO citation (PID1, PID2) VALUES ((SELECT PID FROM papers LIMIT ?, 1), (SELECT PID FROM papers LIMIT ?, 1))";

The statement is submitted to the database in batches through a PreparedStatement.

The first parameter is the row position of the citing paper, from 0 to n (n is the total number of rows in papers). The second parameter is one of 20 random numbers in the range 0 to n, generated in Java. Both are set inside nested for loops, and every 10,000 rows are submitted to the database as one batch.
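The loop described above might look roughly like the following sketch. The original Java code is not shown in the article, so the class name CitationBatchSketch, the method names, and the constants are assumptions. The sketch draws offsets from [0, n) because an offset of n would select no row, and it does not filter out duplicates or self-citations, since the article does not say whether the original did.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Random;

// Hypothetical sketch of the batching loop; names and structure are assumptions.
class CitationBatchSketch {
    static final int REFS_PER_PAPER = 20;  // random references added per paper
    static final int BATCH_SIZE = 10_000;  // rows sent per executeBatch() call

    // Returns REFS_PER_PAPER random row offsets in the range [0, n).
    static int[] randomOffsets(int n, Random rng) {
        int[] offsets = new int[REFS_PER_PAPER];
        for (int k = 0; k < offsets.length; k++) {
            offsets[k] = rng.nextInt(n);
        }
        return offsets;
    }

    // Queues one INSERT per (paper, random paper) pair and flushes every BATCH_SIZE rows.
    // sql is expected to have two integer placeholders, as in the statement above.
    static void insertCitations(Connection conn, String sql, int n, Random rng)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            int pending = 0;
            for (int row = 0; row < n; row++) {
                for (int offset : randomOffsets(n, rng)) {
                    ps.setInt(1, row);     // position of the citing paper
                    ps.setInt(2, offset);  // position of the randomly cited paper
                    ps.addBatch();
                    if (++pending == BATCH_SIZE) {
                        ps.executeBatch();
                        pending = 0;
                    }
                }
            }
            if (pending > 0) {
                ps.executeBatch();  // flush the final partial batch
            }
        }
    }
}
```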

This code makes clever use of LIMIT to pick tuples at random, and at first I was rather pleased with it. Because all of the SELECT work is done inside the database, with no extra round trips over JDBC, I expected it to finish quickly. However, inserting 100,000 rows (10,000 × 10) took 22 minutes. The final experiment needs 4,000,000 rows, which at the same rate means 40 × 22 minutes, or more than 14 hours.

I then went back over the design and wrote a series of similar test programs to find the bottleneck, which turned out to be SELECT ... LIMIT: this operation is extremely time-consuming, because the server has to read and discard every row before the requested offset. LIMIT was chosen in the first place because the randomly generated value is just a number that has to be mapped to a tuple, that is, to a row position, and the papers table has no usable row id by default because its primary key is not an ascending integer. The fix is to add a temporary AUTO_INCREMENT column named temp to the papers table, insert the citations, and then drop the column. The SQL statement becomes:

String sql = "INSERT INTO citation (PID1, PID2) VALUES ((SELECT PID FROM papers WHERE temp = ?), (SELECT PID FROM papers WHERE temp = ?))";
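The article does not show how the temporary column was added and removed, so the following is only a sketch under assumptions. The ALTER TABLE statements, the class name TempColumnSketch, and its methods are hypothetical. MySQL requires an AUTO_INCREMENT column to be indexed, which is why the sketch declares it UNIQUE; with default server settings the existing rows are numbered from 1, so a 0-based random index has to be shifted by one.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical sketch; the statements and names are assumptions, not the author's code.
class TempColumnSketch {
    // An AUTO_INCREMENT column must be indexed in MySQL, hence UNIQUE.
    static final String ADD_TEMP =
        "ALTER TABLE papers ADD COLUMN temp INT NOT NULL AUTO_INCREMENT UNIQUE";
    static final String DROP_TEMP =
        "ALTER TABLE papers DROP COLUMN temp";

    // Maps a 0-based random index to a temp value, assuming numbering starts at 1.
    static int tempValueFor(int zeroBasedIndex) {
        return zeroBasedIndex + 1;
    }

    // Adds the column, runs the caller's insert step, then drops the column again.
    static void withTempColumn(Connection conn, Runnable insertStep) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.executeUpdate(ADD_TEMP);
            try {
                insertStep.run();
            } finally {
                st.executeUpdate(DROP_TEMP);  // clean up even if the insert step fails
            }
        }
    }
}
```

Because temp is indexed, each subquery becomes an index lookup instead of a scan up to the requested offset.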

Inserting 100,000 rows again now takes 38 seconds. Efficiency has improved dramatically, though I do not know whether it can be optimized further.
