Reading a large file in PHP: example code

Source: Internet
Author: User
At the end of last year, databases of account information from a number of websites were leaked, which caused quite a sensation. I downloaded several of them too, planning to analyze the account information the way a data analyst would. Although the data has already been "organized" by others, working through it yourself is still worthwhile; after all, it is a very large amount of data.

The problem with this much data is that each file is very large and hard to open. Don't count on Notepad; it will simply crash. The MSSQL client cannot open such a large SQL file either and reports insufficient memory. Reportedly, MSSQL loads all the data it reads into memory at once, so when the data is larger than the available memory, the program crashes.

Navicat Premium
I recommend Navicat Premium, which is quite powerful: SQL files of a few hundred megabytes open easily. The client also supports MSSQL, MySQL, Oracle, and other databases, and has many other features that I will explore on my own.

Although Navicat can open CSDN's 274 MB SQL file, simply viewing the raw content is not very useful: it is inconvenient to query, classify, and count the account information. The only practical approach is to read the records one by one, split each record into its fields, and store those fields in a database as separate columns, which makes later use much easier.
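The splitting step can be sketched in PHP as follows. This is a minimal sketch, assuming each record occupies one line with fields separated by " # " in the order username, password, email; that layout and the function name parseRecord are assumptions for illustration, not taken from the actual file.

```php
<?php
// Hypothetical record format, assumed for illustration only:
// "username # password # email" on a single line.
function parseRecord(string $line, string $separator = ' # '): ?array
{
    // Strip any trailing "\r" and "\n" characters (Windows or Unix line ending).
    $line = rtrim($line, "\r\n");
    if ($line === '') {
        return null; // skip blank lines
    }
    // Limit to 3 parts: any extra separators end up in the last field.
    $parts = explode($separator, $line, 3);
    if (count($parts) !== 3) {
        return null; // malformed record; the caller decides how to handle it
    }
    return [
        'username' => $parts[0],
        'password' => $parts[1],
        'email'    => $parts[2],
    ];
}

$row = parseRecord("alice # secret123 # alice@example.com\r\n");
// $row['email'] is "alice@example.com"
```

Each returned array could then be written to a database table with one column per key, for example through a PDO prepared statement; the table design is left open here.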

Use PHP to read large files
PHP offers many ways to read files, and choosing one that suits the target file can noticeably improve execution efficiency. Because the CSDN database file is very large, we avoid reading it all at once; after all, every record that is read must also be split and written to the database. A suitable approach is to read the file piece by piece: by combining fseek with fread, PHP can read any portion of a file. Here is the example code:

function readBigFile($filename, $count = 20, $tag = "\r\n") {
    $content = "";          // final content
    $current = "";          // the piece read in the current step
    $step = 1;              // number of bytes read per step
    $tagLen = strlen($tag); // length of the line separator
    $start = 0;             // start position
    $i = 0;                 // counter of lines read
    // Open the file read-only (binary-safe); the pointer starts at the beginning of the file.
    $handle = fopen($filename, 'rb');
    if ($handle === false) {
        return false;       // the file could not be opened
    }
    while ($i < $count && !feof($handle)) {
        fseek($handle, $start, SEEK_SET); // move the pointer to the current read position
        $current = fread($handle, $step); // read $step bytes
        $content .= $current;             // append them to the result
        $start += $step;                  // advance by one step
        // Take as many trailing characters as the separator is long
        $substrTag = substr($content, -$tagLen);
        if ($substrTag == $tag) {         // have we just read a line break (or other separator)?
            $i++;
            $content .= "\n";             // add an extra line break after each line in the output
        }
    }
    // Close the file
    fclose($handle);
    // Return the result
    return $content;
}
$filename = "csdn.sql"; // file to read
$tag = "\n";            // line separator; note that double quotes are required here
$count = 100;           // number of lines to read
$data = readBigFile($filename, $count, $tag);
echo $data;

For the value of $ tag passed in a function, the input value varies according to the system: "\ r \ n" for Windows and "\ n" for linux/unix ", "\ r" is used for Mac OS ".

The program works roughly as follows: it first defines a few variables used for reading, then opens the file, moves the pointer to the given position, and reads a block of the given size. Each read is appended to a variable, and reading stops once the requested number of lines has been read or the end of the file is reached.
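The seek-then-read step on its own can be illustrated with a small helper that fetches a block of bytes from any offset. This is a sketch under the assumption that byte offsets are what the caller wants; the name readRegion is hypothetical. Unlike readBigFile, which reads one byte per iteration, it fetches the whole block with a single fread call.

```php
<?php
// Hypothetical helper: read $length bytes starting at byte $offset,
// without reading the rest of the file.
function readRegion(string $filename, int $offset, int $length): string
{
    if ($length <= 0) {
        return ''; // fread() does not accept a zero or negative length
    }
    $handle = fopen($filename, 'rb'); // read-only, binary-safe
    if ($handle === false) {
        throw new RuntimeException("Cannot open $filename");
    }
    // Move the pointer to the requested position (0 = first byte).
    if (fseek($handle, $offset, SEEK_SET) !== 0) {
        fclose($handle);
        throw new RuntimeException("Cannot seek to offset $offset");
    }
    // fread() returns fewer bytes if the end of the file comes first.
    $data = fread($handle, $length);
    fclose($handle);
    return $data === false ? '' : $data;
}
```

A fixed-size region will usually end in the middle of a record, so a caller would still have to look for the separator and carry the incomplete tail over to the next read.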

Never assume that everything in the program will run as planned.

With the code above we can read data of a given position and size, but the function runs only once and cannot fetch the whole file. To read everything, you could wrap it in an outer loop that checks for the end of the file, but since each call starts again from byte 0, this wastes system resources, and with a file this large the script may even exceed PHP's maximum execution time. A better approach is to record the pointer position after each read and, on the next run, seek straight to that position, so the file never has to be re-read from the beginning.
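The position-recording approach can be sketched as follows. This is not the author's code: instead of fseek plus one-byte fread calls, it uses PHP's stream_get_line(), which reads up to a given delimiter, and ftell() to obtain the position where the next run should continue. The function name readRecordsFrom and the 1 MB per-record limit are assumptions.

```php
<?php
/**
 * Hypothetical resumable reader: reads up to $count records starting at
 * byte $offset and returns them together with the offset at which the
 * next call should resume.
 */
function readRecordsFrom(string $filename, int $offset, int $count, string $tag = "\r\n"): array
{
    $handle = fopen($filename, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $filename");
    }
    // Jump straight to where the previous batch ended.
    if (fseek($handle, $offset, SEEK_SET) !== 0) {
        fclose($handle);
        throw new RuntimeException("Cannot seek to offset $offset");
    }

    $records = [];
    while (count($records) < $count) {
        // Reads up to $tag (the delimiter is not included in the result);
        // a record longer than 1 MB would be split (assumed not to occur).
        $line = stream_get_line($handle, 1048576, $tag);
        if ($line === false) {
            break; // end of file reached
        }
        $records[] = $line;
    }
    $next = ftell($handle); // position just after the last record read
    fclose($handle);

    return [$records, $next];
}
```

A caller would persist the returned offset between runs (for example in a small state file; the storage is left open here) and pass it back on the next call. An empty batch means the end of the file has been reached, and because the delimiter is already stripped, each record can go straight to the splitting step.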

In fact, I still haven't imported the CSDN data into a database, because a few days after the leak an analysis had already appeared on cnBeta. Ha, that was fast. Once I saw that someone else had done it, I lost the motivation to do it myself. Still, to learn the technique, I should take the time to finish it.
