Reading a very large file with PHP: example code

Source: Internet
Author: User
Tags: fread, mssql
At the end of last year, account databases from a number of websites were leaked. I took the opportunity to download a few of them, planning to analyze the account information the way a data analyst would. The data had already been "collated" by others, but it is still useful for study; after all, it is a very large amount of data.

The trouble with so much data is that a single file is very large, and just opening it is not easy. Notepad is out of the question. The MSSQL client cannot open such a large SQL file either; it reports insufficient memory straight away. Reportedly this is because the MSSQL client reads all the data into memory at once, so if the file is larger than the available memory, the system can crash.

Navicat Premium
I recommend Navicat Premium here. It is impressive: it opens SQL files of several hundred megabytes with no trouble at all. The client also supports connections to MSSQL, MySQL, Oracle and other databases; its many other features are left for you to explore.

Although Navicat can open the 274 MB CSDN SQL file, viewing the content this way is of little use, and it is inconvenient to query, classify or compute statistics on the account information. The only way is to read the records one by one, split each record into its fields, and store those fields in a database in the matching columns so they can be used later.
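As a rough sketch of the splitting step described above: the article does not give the record layout, so this example assumes a hypothetical line format of three fields separated by " # " (user name, password, email). The function name splitRecord and the field names are illustrative only.

```php
<?php
// Hypothetical record layout: "username # password # email".
// The real layout of the dump is not given in the article; adjust the
// delimiter and field names to match the actual file.
function splitRecord($line, $delimiter = " # ") {
    $line = rtrim($line, "\r\n");            // drop the line terminator
    $parts = explode($delimiter, $line, 3);  // split into at most three fields
    if (count($parts) !== 3) {
        return null;                         // malformed line: let the caller skip it
    }
    return array(
        'username' => trim($parts[0]),
        'password' => trim($parts[1]),
        'email'    => trim($parts[2]),
    );
}

$record = splitRecord("alice # secret123 # alice@example.com\r\n");
print_r($record);
```

Each returned array could then be bound to a prepared INSERT statement, one column per field.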

Reading oversized files using PHP
PHP has several ways to read a file; choosing the one that suits the target file can noticeably improve execution efficiency. Because the CSDN database file is very large, we should not try to read it all at once; after all, every piece of data read also has to be split and written. A more appropriate way is to read the file section by section: by combining PHP's fseek and fread you can read just a chosen part of the file. The example code follows:

// Note: the default of 20 for $count is an assumed value.
function readBigFile($filename, $count = 20, $tag = "\r\n") {
    $content = "";            // final content
    $current = "";            // holds the piece just read
    $step = 1;                // how many bytes to advance on each read
    $tagLen = strlen($tag);
    $start = 0;               // starting position
    $i = 0;                   // line counter
    $handle = fopen($filename, 'rb'); // open read-only; pointer at the start of the file
    if ($handle === false) {
        return false;         // the file could not be opened
    }
    while ($i < $count && !feof($handle)) {
        fseek($handle, $start, SEEK_SET);  // move the pointer to the current offset
        $current = fread($handle, $step);  // read $step bytes
        $content .= $current;              // append to the result
        $start += $step;                   // advance by one step
        // take the last $tagLen bytes read so far
        $substrTag = substr($content, -$tagLen);
        if ($substrTag == $tag) {          // is it a newline or other separator?
            $i++;
            $content .= "<br/>";
        }
    }
    // close the file
    fclose($handle);
    // return the result
    return $content;
}

$filename = "csdn.sql";  // the file to read
$tag = "\n";             // line separator; note this must be in double quotes
$count = 100;            // number of lines to read
$data = readBigFile($filename, $count, $tag);
echo $data;

The value to pass for $tag depends on the system that produced the file: Windows uses "\r\n", Linux/Unix uses "\n", and classic Mac OS (before OS X) uses "\r".
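If the origin of a file is unknown, the separator can be guessed from the start of the file. The following is a minimal sketch of that idea; detectLineEnding is a hypothetical helper, not a PHP built-in, and the 4096-byte sample size is an arbitrary assumption. It can guess wrong if the sample happens to end between the "\r" and the "\n" of the only line break it contains.

```php
<?php
// Guess the line separator by inspecting the first $sampleSize bytes.
// detectLineEnding is a hypothetical helper written for this example.
function detectLineEnding($filename, $sampleSize = 4096) {
    $handle = fopen($filename, 'rb');
    if ($handle === false) {
        return false;            // the file could not be opened
    }
    $sample = (string) fread($handle, $sampleSize);
    fclose($handle);

    // Check the two-byte separator first, since it contains both others.
    if (strpos($sample, "\r\n") !== false) {
        return "\r\n";           // Windows
    }
    if (strpos($sample, "\n") !== false) {
        return "\n";             // Linux/Unix
    }
    if (strpos($sample, "\r") !== false) {
        return "\r";             // classic Mac OS
    }
    return false;                // no separator found in the sample
}
```

The result can be passed straight to a line-reading function as its separator argument, falling back to a default when false is returned.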

The general flow of the program: first define the variables needed for reading, then open the file, position the pointer at the given offset, and read the given number of bytes. Each piece read is appended to a variable, until the requested number of lines has been read or the end of the file is reached.

Never assume that everything in the program will run as planned.

The code above can fetch data of a given size from a given position in the file, but the whole process runs only once and does not retrieve all the data. To get all the data you could wrap it in an outer loop that runs until the end of the file, but that wastes system resources and, with a file this large, may even cause PHP to hit its execution time limit. Another approach is to record the pointer position after each read and, on the next run, position the pointer where the last run stopped, so that no single run has to read the file from beginning to end.
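The second approach, remembering where the last read stopped, might look roughly like this. It is a sketch under assumptions, not the author's code: readLinesFrom is a hypothetical helper, it uses fgets() rather than the byte-by-byte loop, and it therefore assumes "\n" or "\r\n" line endings. The batch size of 1000 is arbitrary.

```php
<?php
// Read up to $maxLines lines starting at byte $offset, and report the
// offset at which the next call should resume.
function readLinesFrom($filename, $offset, $maxLines) {
    $handle = fopen($filename, 'rb');
    if ($handle === false) {
        return false;
    }
    fseek($handle, $offset, SEEK_SET);   // jump to where the last run stopped
    $lines = array();
    while (count($lines) < $maxLines && ($line = fgets($handle)) !== false) {
        $lines[] = rtrim($line, "\r\n"); // store the line without its terminator
    }
    $next = ftell($handle);              // position to remember for the next run
    fclose($handle);
    return array('lines' => $lines, 'offset' => $next);
}

// Process the file batch by batch, carrying the offset forward.
$file = 'csdn.sql';
$offset = 0;
if (is_readable($file)) {
    do {
        $batch = readLinesFrom($file, $offset, 1000);
        if ($batch === false) {
            break;
        }
        // ... split and store $batch['lines'] here ...
        $offset = $batch['offset'];
    } while (count($batch['lines']) > 0);
}
```

To spread the work over several requests instead of one loop, the offset could be saved to a file, session or database after each batch and loaded again at the start of the next request.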

In fact, I still have not imported the CSDN data into a database: only a few days after the leak, cnBeta had already published an analysis. They moved fast. Seeing that others have already done it takes away much of the motivation, but for the sake of learning I should still find time to finish the job.
