Using PHP to read oversized files: example code

Source: Internet
Author: User
Tags: count, fread, MSSQL, split

At the end of last year, account databases from a number of websites were leaked, which was quite a big deal. I took the opportunity to download a few of them, intending to practice data analysis on the account information. Although others have already "organized" this data, working through it yourself is still a useful exercise, especially with such a large volume of data.





The problem with this much data is that a single file is enormous, and just opening it is hard. Don't even think about Notepad; it simply freezes. The MSSQL client cannot open such a large SQL file either and reports that there is not enough memory. Reportedly this is because it reads the whole file into memory at once, so if the data is larger than the available memory, the system can grind to a halt.





Navicat Premium


I recommend Navicat Premium here. It is quite powerful and opens SQL files of several hundred megabytes easily, without any lag. It also supports connections to MSSQL, MySQL, Oracle and many other databases; its other features are worth exploring on your own.





Although Navicat can open this 274 MB CSDN SQL file, merely viewing the contents is of little use, and it is inconvenient for querying, classifying or computing statistics over the account records. The only way forward is to read the data record by record, split each record into its fields, and insert those fields into a database table with matching columns so they can be used later.
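As a rough sketch of the splitting step, the function below breaks one record line into named fields. It assumes, purely for illustration, that each record holds a username, password and email separated by " # "; the real layout of the dump may differ, and the delimiter, field names and the function name splitRecord are all hypothetical.

```php
<?php
// Hypothetical record layout: "username # password # email".
// Adjust $delimiter and $fields to match the actual file.
function splitRecord(string $line, string $delimiter = ' # ', array $fields = ['username', 'password', 'email']): ?array
{
    // Strip the trailing line ending ("\n", "\r\n" or "\r") before splitting.
    $line = rtrim($line, "\r\n");
    // Limit the split to count($fields) parts so that a delimiter inside the
    // last field does not produce extra columns.
    $parts = explode($delimiter, $line, count($fields));
    if (count($parts) !== count($fields)) {
        return null; // malformed record: wrong number of fields
    }
    // Pair each field name with its (whitespace-trimmed) value.
    return array_combine($fields, array_map('trim', $parts));
}

$record = splitRecord("alice # secret123 # alice@example.com\r\n");
print_r($record);
```

The resulting associative array could then be bound to an INSERT statement (for example a PDO prepared statement) whose columns match the field names.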





Using PHP to read oversized files


PHP offers several ways to read files, and choosing the one that suits the target file can noticeably improve efficiency. Because the CSDN database file is so large, we avoid reading it all at once; after all, every piece of data we read also has to be split and written to the database. A better approach is to read the file piece by piece: by combining PHP's fseek and fread, you can read any specified portion of a file. Here is the example code:







<?php
function readBigFile($filename, $count, $tag = "\r\n") {
    $content = "";  // final content
    $current = "";  // content read in the current step
    $step = 1;      // how many bytes to read per step
    $tagLen = strlen($tag);
    $start = 0;     // starting position
    $i = 0;         // line counter
    $handle = fopen($filename, 'r'); // open read-only, pointer at the start of the file
    if ($handle === false) {
        return false; // the file could not be opened
    }
    while ($i < $count && !feof($handle)) {
        fseek($handle, $start, SEEK_SET); // place the pointer $start bytes from the beginning
        $current = fread($handle, $step); // read $step bytes
        $content .= $current;             // append to the result string
        $start += $step;                  // move forward by the step size
        // Take the last $tagLen characters to check whether a delimiter was just read
        $substrTag = substr($content, -$tagLen);
        if ($substrTag == $tag) { // is it a newline or another delimiter?
            $i++;
            $content .= "<br/>";
        }
    }
    fclose($handle); // close the file
    return $content; // return the result
}

$filename = "csdn.sql"; // file to read
$tag = "\n";            // line separator; note that this must use double quotes
$count = 100;           // number of lines to read
$data = readBigFile($filename, $count, $tag);
echo $data;





The value passed to the function's $tag parameter depends on the system the file came from: Windows uses "\r\n", Linux/Unix uses "\n", and classic Mac OS (before OS X) uses "\r".
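If you are not sure which convention a file uses, one option is to inspect a small sample from its start before choosing $tag. The helper below is a hypothetical sketch, not part of the original code: it reads the first few kilobytes and guesses "\r\n", "\n" or "\r" from the first line terminator it finds.

```php
<?php
// Hypothetical helper: guess the line terminator of a file from a sample.
function detectLineEnding(string $filename, int $sampleSize = 8192): string
{
    $handle = fopen($filename, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $filename");
    }
    $sample = (string) fread($handle, $sampleSize); // only the first $sampleSize bytes
    fclose($handle);

    $cr = strpos($sample, "\r");
    $lf = strpos($sample, "\n");
    if ($cr === false && $lf === false) {
        return PHP_EOL; // no terminator in the sample: fall back to the platform default
    }
    if ($cr === false || ($lf !== false && $lf < $cr)) {
        return "\n"; // Linux/Unix style
    }
    // A "\r" comes first: "\r\n" (Windows) if directly followed by "\n",
    // otherwise "\r" (classic Mac OS).
    return ($lf === $cr + 1) ? "\r\n" : "\r";
}
```

The returned string could then be passed as the $tag argument. A sample that happens to end exactly on a "\r" may be misread as classic Mac OS, so a larger $sampleSize is safer for files with very long lines.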





How the program works: it first defines the basic variables needed for reading, then opens the file, positions the pointer at the specified offset, and reads the specified number of bytes. Each chunk it reads is appended to a variable, and this repeats until the requested number of lines has been read or the end of the file is reached.
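The core move described here, positioning the pointer with fseek and then reading a fixed number of bytes with fread, can be isolated into a small helper. This is a sketch rather than the author's code, and the name readRange is hypothetical. Reading a larger block per call (a few kilobytes, say) is generally far cheaper than the one-byte steps used in the example above.

```php
<?php
// Hypothetical helper: read $length bytes starting at byte $offset.
// Returns fewer than $length bytes when the end of the file is reached.
function readRange(string $filename, int $offset, int $length): string
{
    $handle = fopen($filename, 'rb'); // read-only, binary-safe
    if ($handle === false) {
        throw new RuntimeException("Cannot open $filename");
    }
    // Move the pointer $offset bytes from the start of the file.
    if (fseek($handle, $offset, SEEK_SET) !== 0) {
        fclose($handle);
        throw new RuntimeException("Cannot seek to $offset");
    }
    $data = '';
    // fread may return fewer bytes than requested, so keep reading until
    // we have $length bytes or hit the end of the file.
    while (strlen($data) < $length && !feof($handle)) {
        $chunk = fread($handle, $length - strlen($data));
        if ($chunk === false || $chunk === '') {
            break;
        }
        $data .= $chunk;
    }
    fclose($handle);
    return $data;
}
```

For example, readRange("csdn.sql", 1048576, 4096) would return the 4 KB that start at the 1 MB mark.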





Never assume that everything in a program will run as planned.





With the code above you can read a given amount of data from a given position in the file, but the whole process runs only once and cannot retrieve all of the data. To get everything, you could wrap it in an outer loop that runs until the end of the file, but that wastes system resources and, because the file is so large, may even make the PHP script exceed its execution time limit. Another approach is to record and store the pointer position after each read; the next run then positions the pointer where the previous one stopped, so no single run has to read the file from beginning to end.
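The second approach, remembering where the last read stopped, might look like the sketch below. It swaps the byte-by-byte loop for fgets, which reads one line per call, and uses ftell to report the offset to resume from. The function names readLinesFrom and processNextBatch, and the idea of keeping the offset in a small state file, are assumptions for illustration; a database row or any other storage would work as well. Note that fgets splits on "\n", so it handles "\n" and "\r\n" files but not classic Mac "\r" files.

```php
<?php
// Hypothetical sketch: read up to $count lines starting at byte $offset and
// return them together with the offset where the next call should resume.
function readLinesFrom(string $filename, int $offset, int $count): array
{
    $handle = fopen($filename, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $filename");
    }
    fseek($handle, $offset, SEEK_SET); // jump to where the previous run stopped
    $lines = [];
    // Check the count first so no extra line is consumed past the batch.
    while (count($lines) < $count && ($line = fgets($handle)) !== false) {
        $lines[] = rtrim($line, "\r\n"); // fgets keeps the line ending; strip it
    }
    $next = ftell($handle); // byte position just after the last line read
    fclose($handle);
    return [$lines, $next];
}

// One batch per run: load the saved offset, read, then save the new offset.
function processNextBatch(string $filename, string $stateFile, int $count): array
{
    $offset = is_file($stateFile) ? (int) file_get_contents($stateFile) : 0;
    [$lines, $next] = readLinesFrom($filename, $offset, $count);
    file_put_contents($stateFile, (string) $next);
    return $lines; // the caller splits these records and writes them to the database
}
```

Calling processNextBatch("csdn.sql", "csdn.offset", 100) once per request keeps each run short, which helps stay under PHP's max_execution_time.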





In fact, I have not yet imported the CSDN database, because a few days after the leak cnBeta published an analysis of it; they were really quick. Seeing that others had already done it, I did not have much motivation left, but for the sake of learning I still need to find time to finish the job.


