How to insert random string data in MySQL
Application scenarios:
Sometimes you need a large number of records in the database for testing, and scripts like the ones below are a convenient way to generate them.
Create a table:
CREATE TABLE `tables_a` (
  `id` int(10) NOT NULL DEFAULT '0',
  `name` char(50) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Create a function that generates a random string:
set global log_bin_trust_function_creators = 1;
DROP FUNCTION IF EXISTS rand_string;
DELIMITER //
CREATE FUNCTION rand_string(n INT)
RETURNS VARCHAR(255)
BEGIN
  DECLARE chars_str varchar(100) DEFAULT 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';
  DECLARE return_str varchar(255) DEFAULT '';
  DECLARE i INT DEFAULT 0;
  WHILE i < n DO
    SET return_str = concat(return_str, substring(chars_str, FLOOR(1 + RAND()*62), 1));
    SET i = i + 1;
  END WHILE;
  RETURN return_str;
END //
delimiter ;
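The index arithmetic in rand_string is easy to get off by one, so it can help to check the same logic outside MySQL. The following is only a sketch that mirrors the function in Java; the class name RandString and the method randString are hypothetical and are not part of the original scripts. FLOOR(1 + RAND()*62) yields a 1-based position from 1 to 62, which corresponds to a 0-based index from 0 to 61 in Java.

```java
import java.util.Random;

public class RandString {
    // Same 62-character alphabet as chars_str in the MySQL function.
    static final String CHARS =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

    // Mirrors rand_string(n): append one randomly chosen character n times.
    static String randString(int n, Random rnd) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            // MySQL uses a 1-based position in 1..62;
            // Java strings are 0-based, so the equivalent index is 0..61.
            int pos = rnd.nextInt(CHARS.length());
            sb.append(CHARS.charAt(pos));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints one random 10-character string.
        System.out.println(randString(10, new Random()));
    }
}
```

Note that the MySQL function returns VARCHAR(255), so n should stay at or below 255 there; the Java sketch has no such limit.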
Create a procedure that inserts rows into the table. The ids start at x and stop before y, and z is the length of the random string generated for each row.
delimiter //
create procedure test(x int(10), y int(10), z int(10))
begin
  DECLARE i INT DEFAULT x;
  while i < y do
    insert into tables_a values(i, rand_string(z));
    set i = i + 1;
  end while;
end //
delimiter ;
MySQL random data generation and insertion
The DBLP database contains very little citation information: on average, a paper cites only 0.2 other papers. A paper that used DBLP as its experimental dataset mentions that citation information can be added randomly. Inspired by this, I decided to add 20 random citations to each paper, so I wrote the following SQL statement:
String SQL = "insert into citation (pId1, pId2) values ((select pId from papers limit ?, 1), (select pId from papers limit ?, 1))";
The statement is submitted to the database in batch mode through a PreparedStatement.
The first parameter is the row offset of the citing paper, from 0 to N (N is the total number of rows in papers). The second parameter is one of 20 non-repeating random numbers generated in Java, also in the range 0 to N. The parameters are bound inside nested for loops, and the accumulated rows are submitted to the database one batch at a time.
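The "20 non-repeating random numbers" step could be sketched as below. This is an illustration only, not the original program: the class name RandomCitations, the method pickDistinct, and the row count of 1000 are assumptions. The upper bound is treated as exclusive here, since a limit offset of N would select no row.

```java
import java.util.LinkedHashSet;
import java.util.Random;
import java.util.Set;

public class RandomCitations {
    // Draws `count` distinct row offsets from 0 (inclusive) to n (exclusive).
    static Set<Integer> pickDistinct(int count, int n, Random rnd) {
        if (count > n) {
            throw new IllegalArgumentException(
                "cannot draw " + count + " distinct values from " + n);
        }
        Set<Integer> picked = new LinkedHashSet<>();
        while (picked.size() < count) {
            picked.add(rnd.nextInt(n)); // a repeated value is ignored by the set
        }
        return picked;
    }

    public static void main(String[] args) {
        int n = 1000;           // hypothetical number of rows in papers
        int refsPerPaper = 20;  // from the text: 20 random citations per paper
        Random rnd = new Random();
        for (int paper = 0; paper < 3; paper++) { // first 3 papers only, for display
            for (int cited : pickDistinct(refsPerPaper, n, rnd)) {
                // (paper, cited) are the values that would be bound
                // to the two ? placeholders of the insert statement.
                System.out.println(paper + " -> " + cited);
            }
        }
    }
}
```

The sketch does not prevent a paper from citing itself; the source does not say whether the original program excluded that case.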
This code uses the limit clause to pick a random tuple, which I was quietly pleased with at first. I thought that handing all the select statements over to the database would save extra JDBC calls, so the job should finish quickly. Instead, inserting 100,000 (10000*10) records took 22 minutes. The final experiment needs to insert millions of records, which at this rate would take about 14 h.
So I started to reflect and wrote a series of similar programs to find the time bottleneck, and finally pinned it on select ... limit, which is a very expensive operation. With limit ?, 1 the server has to read and discard every row before the requested offset, so each lookup gets slower as the offset grows. I had chosen limit because the numbers are generated randomly and each number has to be mapped to a tuple, that is, to a row id. Because the primary key of the papers table is not an auto-increment int, no such row id exists by default. The fix is to add an auto_increment column named temp to the papers table, which can be dropped again once the citations have been added. MySQL requires an auto_increment column to be indexed, so where temp = ? becomes an index lookup. The SQL statement then changes to:
String SQL = "insert into citation (pId1, pId2) values ((select pId from papers where temp = ?), (select pId from papers where temp = ?))";
Inserting 100,000 records again now takes 38 s. The efficiency is greatly improved, but I don't know whether further optimization is possible.
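For reference, the batched submission through a PreparedStatement might look roughly like the sketch below. It is not the original program: the class name CitationBatchInsert, the method insertAll, the pairs list, and the batchSize parameter are all assumptions made for illustration, and the SQL assumes the temp column described above exists on papers. It uses only standard java.sql calls (addBatch, executeBatch, commit).

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class CitationBatchInsert {
    // Insert statement that looks up both papers through the temp column.
    static final String SQL =
        "insert into citation (pId1, pId2) values ("
        + "(select pId from papers where temp = ?), "
        + "(select pId from papers where temp = ?))";

    // pairs: each int[2] holds the temp values of the citing and the cited paper.
    // batchSize is a hypothetical tuning knob: rows sent per executeBatch() call.
    static void insertAll(Connection conn, List<int[]> pairs, int batchSize)
            throws SQLException {
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // commit once per batch instead of once per row
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            int pending = 0;
            for (int[] pair : pairs) {
                ps.setInt(1, pair[0]);
                ps.setInt(2, pair[1]);
                ps.addBatch();
                if (++pending == batchSize) {
                    ps.executeBatch();
                    conn.commit();
                    pending = 0;
                }
            }
            if (pending > 0) { // flush the last, partially filled batch
                ps.executeBatch();
                conn.commit();
            }
        } finally {
            conn.setAutoCommit(oldAutoCommit); // restore the caller's setting
        }
    }
}
```

Turning off auto-commit and committing once per batch is one commonly used way to reduce per-row overhead; whether it improves on the 38 s measured above has not been tested here.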