Reading and Writing Files in PHP under High Concurrency

Source: Internet
Author: User
Tags: file copy, flock

Background

1. For applications where page views (PV) and concurrency are low, none of this matters; ordinary file operations work perfectly well.

2. When concurrency is high, it is very likely that multiple processes will operate on the same file at once while we read and write it. Without exclusive access to the file, data is easily lost.

For example: in an online chat room (assuming the chat content is written to a file), user A and user B both need to save data to the file at the same time. A opens the file first and updates the data in it, but B has also opened the same file and is also preparing to update it. When A writes the modified file back, B is still holding the version it originally opened. When B then saves its version back, data is lost: B has no idea that, while it held the file open, A also changed it, so when B saves its changes, A's update is overwritten.

The general solution to this kind of problem:

1. When a process wants to operate on the file, it first locks it against the others.

2. Only that process then has the right to modify the file. Other processes can still read it without any problem, but any process that tries to update it will have its operation rejected.

3. When the locking process finishes updating the file, it releases the exclusive lock, and the file returns to a modifiable state.

4. Likewise, if another process finds the file unlocked when it wants to operate on it, it can safely lock the file and have it all to itself.

So the typical code looks like this:

$fp = fopen("/tmp/lock.txt", "w+");
if (flock($fp, LOCK_EX)) {
    fwrite($fp, "Write something here\n");
    flock($fp, LOCK_UN);
} else {
    echo "Couldn't lock the file!";
}
fclose($fp);

But in PHP, flock does not always work so well. Under heavy concurrency, the lock is often held exclusively and not released promptly, or not released at all, producing a deadlock that drives the server's CPU usage very high and sometimes hangs the server completely. This reportedly happens on many Linux/Unix systems.

So think carefully before using flock().

Does that mean there is no solution? Not at all. If flock() is used properly, the deadlock problem can be avoided. And if you prefer not to use flock() at all, there are also good alternatives for solving our problem.

Scenario One: set a timeout when locking the file.

The approximate implementation is as follows:

if ($fp = fopen($fileName, 'a')) {
    $startTime = microtime(true);
    do {
        // Non-blocking attempt to take the exclusive lock
        $canWrite = flock($fp, LOCK_EX | LOCK_NB);
        if (!$canWrite) {
            usleep(round(rand(0, 100) * 1000));
        }
    } while (!$canWrite && (microtime(true) - $startTime) < 0.001); // 1 ms timeout
    if ($canWrite) {
        fwrite($fp, $dataToSave);
        flock($fp, LOCK_UN);
    }
    fclose($fp);
}

The timeout is set to 1 ms. Within that window the process repeatedly tries to obtain the lock, and as soon as it succeeds it gets the right to operate on the file directly. Once the timeout limit is reached, it must give up immediately and leave the lock to another process.

Scenario Two: avoid the flock() function and use a temporary file to resolve read/write conflicts.

The general principle is as follows:

1. Copy the file that needs updating to a temporary directory, save the file's last-modified time in a variable, and give the temporary file a random name that is unlikely to collide.

2. After updating the temporary file, check whether the original file's last-modified time still matches the previously saved value.

3. If the modification times match, rename the modified temporary file over the original. To ensure the file status stays in sync with the update, the cached file status needs to be cleared.

4. If, however, the modification times do not match, the original file was modified in the meantime. In that case, delete the temporary file and return false, indicating that another process is operating on the file.

The approximate implementation code is as follows:

$dir_fileopen = "tmp";

function randomid() {
    return time() . substr(md5(microtime()), 0, rand(5, 12));
}

function cfopen($filename, $mode) {
    global $dir_fileopen;
    clearstatcache();
    do {
        $id = md5(randomid());
        $tempfilename = $dir_fileopen . "/" . $id . md5($filename);
    } while (file_exists($tempfilename));

    if (file_exists($filename)) {
        $newfile = false;
        copy($filename, $tempfilename);
    } else {
        $newfile = true;
    }

    $fp = fopen($tempfilename, $mode);
    // Keep the handle, the target name, the random id, the target's
    // mtime at open time, and whether the target is a brand-new file.
    return $fp ? array($fp, $filename, $id, @filemtime($filename), $newfile) : false;
}

function cfwrite($fp, $string) {
    return fwrite($fp[0], $string);
}

function cfclose($fp, $debug = "off") {
    global $dir_fileopen;
    $success = fclose($fp[0]);
    clearstatcache();
    $tempfilename = $dir_fileopen . "/" . $fp[2] . md5($fp[1]);
    if ((@filemtime($fp[1]) === $fp[3]) || ($fp[4] == true && !file_exists($fp[1]))) {
        rename($tempfilename, $fp[1]);
    } else {
        unlink($tempfilename);
        // Another process modified the target file in the meantime,
        // so this process's update is rejected.
        $success = false;
    }
    return $success;
}

$fp = cfopen('lock.txt', 'a+');
cfwrite($fp, "welcome to beijing.\n");
cfclose($fp, 'on');

The functions used in the code above need some explanation:

1. rename() renames a file or directory, much like mv on Linux, and is a convenient way to update the path or name of a file or directory. When the code above was tested on Windows, however, rename() raises a notice if the new file name already exists; on Linux it works fine.

2. clearstatcache() clears the file status cache. PHP caches all file attribute information to provide better performance, but when multiple processes delete or update a file, PHP may not have updated the cached file attributes yet, so you can end up reading a last-modified time that is not the real, current value. That is why this function is used to clear the saved cache.
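A minimal sketch of the pattern described above (the demo file name is an assumption, not from the original): read the mtime, let the file change, then clear the stat cache before reading it again.

```php
<?php
// Demonstrates PHP's stat cache: filemtime() results are cached per
// filename, so after a later modification the cached value can be stale.
$file = sys_get_temp_dir() . '/statcache_demo.txt'; // hypothetical demo file

file_put_contents($file, "first\n");
$before = filemtime($file);   // primes the stat cache

sleep(1);                     // ensure the modification time can differ
touch($file, time());         // update the file's modification time

$stale = filemtime($file);    // may still come from the stat cache
clearstatcache();             // drop the cached file attributes
$fresh = filemtime($file);    // now reflects the real mtime on disk

unlink($file);
```

The same pattern is why cfopen() and cfclose() above call clearstatcache() before comparing modification times.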

Scenario Three: randomize reads and writes across many files to reduce the probability of concurrent access.

This scenario seems to be used mostly for recording user access logs.

First a random space needs to be defined: the larger the space, the smaller the probability of concurrent access to the same file. Assuming the random read/write space is [1, 500], our log files are spread across log1 through log500. On each user access, the data is written to a randomly chosen file between log1 and log500.

If two processes are logging at the same moment, process A may be updating log32 while process B updates log399. For B to also land on log32, the probability is only 1/500, which is practically zero.

When the access logs need to be analyzed, we simply merge these log files first and then analyze them.

One benefit of recording logs this way is that processes rarely need to queue for the same file, so each operation completes very quickly.
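The scheme above can be sketched in a few lines (the log directory, the file-name pattern, and the [1, 500] range are assumptions based on the text, not code from the original):

```php
<?php
// Scenario Three sketch: spread writes across N log files so that two
// concurrent writers collide only with probability 1/N.
const LOG_DIR   = '/tmp/logs_demo'; // hypothetical log directory
const LOG_COUNT = 500;              // size of the random space, per the text

// Called on each user access: append the line to a random log file.
function log_append(string $line): void {
    if (!is_dir(LOG_DIR)) {
        mkdir(LOG_DIR, 0777, true);
    }
    $n = rand(1, LOG_COUNT); // pick one of log1 ... log500
    file_put_contents(LOG_DIR . "/log{$n}", $line . "\n", FILE_APPEND);
}

// Run offline before analysis: merge all the pieces into one stream.
function log_merge(): string {
    $all = '';
    foreach (glob(LOG_DIR . '/log*') as $file) {
        $all .= file_get_contents($file);
    }
    return $all;
}
```

Each request only ever calls log_append(); log_merge() is the merge step the text mentions, run once before the logs are analyzed.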

Scenario Four: put all processes that want to operate on the file into a queue, and let a dedicated service complete the file operations.

Each process in the queue is, in effect, just a description of the operation it wants performed, so the service only needs to take the next pending operation off the queue and execute it. If many processes want to operate on the file, that is fine: they line up behind one another, and as long as they are willing to queue, it does not matter how long the queue gets.
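One way to sketch this scheme (the spool directory, file names, and function names are illustrative assumptions, not from the original): each process drops its write request into a spool directory as its own uniquely named file, and a single dedicated worker drains the queue, so it is the only process that ever touches the target file and no locking on it is needed.

```php
<?php
// Scenario Four sketch: producers enqueue, one worker owns the target file.
const SPOOL_DIR   = '/tmp/spool_demo';          // hypothetical queue directory
const TARGET_FILE = '/tmp/spool_demo_target.txt'; // hypothetical target file

// Any process can call this: it never writes the target file directly,
// it only leaves a uniquely named job file in the spool directory.
function enqueue_write(string $data): void {
    if (!is_dir(SPOOL_DIR)) {
        mkdir(SPOOL_DIR, 0777, true);
    }
    $job = SPOOL_DIR . '/' . microtime(true) . '-' . uniqid('', true);
    file_put_contents($job, $data);
}

// Only the dedicated service runs this: because it is the single writer,
// appends to the target file never conflict with other processes.
function drain_queue(): int {
    $jobs = glob(SPOOL_DIR . '/*');
    sort($jobs); // process in roughly arrival order
    foreach ($jobs as $job) {
        file_put_contents(TARGET_FILE, file_get_contents($job), FILE_APPEND);
        unlink($job);
    }
    return count($jobs);
}
```

In a real deployment the worker would run drain_queue() in a loop (or be driven by a message queue such as the System V queue functions); the sketch only shows the division of roles.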

Each of the preceding schemes has its own benefits. Broadly, they fall into two categories:

1. Those that require queueing (slower): Scenarios One, Two and Four.

2. Those that do not require queueing (faster): Scenario Three.

When designing a caching system, we typically would not use Scenario Three, because in Scenario Three the analysis program and the writer are out of sync: at write time, no thought at all is given to how hard reading back will be; writing is all that matters. Imagine a cache updated by writing to random files: reading the cache back would then require touching many more files. Scenario One or Two is completely different: although writes have to wait (retrying when the lock cannot be acquired), reading the file back is very convenient. And the whole point of adding a cache is to eliminate the data-read bottleneck and thereby improve system performance.

Original address: http://hqlong.com/2009/01/530.html

