PHP: processing a TXT file and importing massive data into the database. There is a TXT file containing 100,000 records, in the following format:
Column 1  Column 2  Column 3  Column 4  Column 5
a  00003131  0     0  adductive#1 adducting#1 adducent#1
a  00003356  0     0  nascent#1
a  00003553  0     0  emerging#2 emergent#2
a  00003700  0.25  0  dissilient#1
...................... (about 100,000 more) ......................
The requirement is to import the data into a database table with the following structure:
word_id: automatically incremented
word: a record such as [adductive#1 adducting#1 adducent#1] must be converted into three SQL records
value = column 3 - column 4; if the value is 0, the record is not written to the data table.
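The full table definition is not given in the text; a minimal sketch consistent with the INSERT statement used below might look like this (the column types and engine are assumptions, not taken from the original):

```sql
CREATE TABLE words_sentiment (
    word_id     INT UNSIGNED NOT NULL AUTO_INCREMENT,  -- automatic increment
    word        VARCHAR(64)  NOT NULL,
    senti_type  TINYINT      NOT NULL,   -- the script always inserts 1 here
    senti_value DECIMAL(6,4) NOT NULL,   -- column 3 minus column 4
    word_type   TINYINT      NOT NULL,   -- the script always inserts 2 here
    PRIMARY KEY (word_id)
);
```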
[Php]
<?php
$file = 'words.txt'; // TXT source file with 100,000 records
ini_set('memory_limit', '-1'); // lift the memory limit before reading, otherwise an error is reported
$lines = file_get_contents($file);
$line = explode("\n", $lines);
$i = 0;
$sql = "INSERT INTO words_sentiment (word, senti_type, senti_value, word_type) VALUES ";
foreach ($line as $key => $li) {
    $arr = explode("\t", $li); // the five columns are tab-separated
    if (count($arr) < 5) {
        continue; // skip empty or malformed lines
    }
    $senti_value = $arr[2] - $arr[3]; // value = column 3 - column 4
    if ($senti_value != 0) {
        if ($i >= 20000 && $i < 25000) { // import in batches to avoid failure
            $mm = explode(" ", $arr[4]); // [adductive#1 adducting#1 adducent#1] becomes 3 SQL records
            foreach ($mm as $m) {
                $nn = explode("#", $m);
                // word may contain single quotes (e.g. jack's), so escape it
                // and enclose it in double quotation marks
                $word = addslashes($nn[0]);
                $sql .= "(\"$word\", 1, $senti_value, 2),";
            }
        }
        $i++;
    }
}
// echo $i;
$sql = substr($sql, 0, -1); // remove the trailing comma
// echo $sql;
// About 40 seconds for 5,000 entries; importing too many at once exceeds max_execution_time and fails.
file_put_contents('20000-25000.txt', $sql);
?>
1. When importing massive amounts of data, pay attention to PHP's runtime limits; you can raise them temporarily, otherwise an error like this is reported:
Allowed memory size of 33554432 bytes exhausted (tried to allocate 16 bytes)
2. PHP reads and writes TXT files with:
file_get_contents()
file_put_contents()
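If memory_limit cannot be raised (for example on shared hosting), reading the file line by line with fgets() keeps memory use constant regardless of file size. A minimal self-contained sketch, assuming the tab-separated format shown above (the sample file written here is hypothetical, just to make the example runnable):

```php
<?php
// Line-by-line reading: only one line is in memory at a time,
// instead of the whole file as with file_get_contents().
// For the demo we first write a tiny sample file to a temp path.
$path = tempnam(sys_get_temp_dir(), 'words');
file_put_contents($path, "a\t00003131\t0\t0.25\tadductive#1 adducting#1\n"
                       . "a\t00003356\t0\t0\tnascent#1\n");

$count = 0;
$fh = fopen($path, 'r');
while (($li = fgets($fh)) !== false) {          // one line at a time
    $arr = explode("\t", rtrim($li, "\r\n"));   // tab-separated columns, as above
    if (count($arr) < 5) {
        continue;                               // skip empty or malformed lines
    }
    $senti_value = $arr[2] - $arr[3];           // value = column 3 - column 4
    if ($senti_value != 0) {
        $count++;                               // only non-zero records are kept
    }
}
fclose($fh);
unlink($path);
echo $count; // 1 (only the first sample line has a non-zero value)
?>
```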
3. Import the data in batches; smaller batches have a lower chance of failure.
4. Before importing a large volume of data, test the script several times on a small sample, for example 100 records.
5. If PHP's memory_limit is still insufficient after these adjustments, the script will still fail to run.
(We recommend raising memory_limit in php.ini rather than using a temporary ini_set() call.)
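The batching idea in points 3 and 5 can be generalized: instead of hard-coding a range such as 20000-25000, split the collected value tuples into fixed-size chunks with array_chunk() and emit one INSERT statement per chunk. A sketch with hypothetical sample values (the record count and tuple contents are invented for illustration):

```php
<?php
// Build one INSERT statement per chunk of 5,000 value tuples, so no single
// statement runs long enough to hit max_execution_time.
$values = array();
for ($n = 0; $n < 12000; $n++) {               // hypothetical: 12,000 parsed records
    $values[] = "(\"word$n\", 1, 0.25, 2)";
}
$statements = array();
foreach (array_chunk($values, 5000) as $chunk) {
    $statements[] = "INSERT INTO words_sentiment (word, senti_type, senti_value, word_type) VALUES "
                  . implode(",", $chunk) . ";";
}
echo count($statements); // 3 statements: 5000 + 5000 + 2000 rows
?>
```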