How to solve the problem of concurrent read and write file conflicts in PHP

Source: Internet
Author: User
Tags flock md5 rand

For applications with low daily traffic or little concurrency, you generally do not need to worry about any of this; ordinary file operations are perfectly adequate. When concurrency is high, however, several processes are likely to read and write the same file at the same moment, and if access to the file is not made exclusive, data is easily lost.

For example, take an online chat room (assume the chat content is written to a file) in which user A and user B both need to save data to the file at the same time. A opens the file first and updates the data in it, but B has also just opened the same file and is about to update it too. By the time A has written the file, B already has the old version open. When B then saves, data is lost: B had no idea that A changed the file while B was editing its own copy, so B's save overwrites A's update.
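The lost update can be reproduced in a single script by interleaving the two users' steps by hand. This is only a sketch: the temporary file and the chat lines are made up for illustration, and a real race would involve two separate processes rather than one script.

```php
<?php
// Simulate users A and B in one script; in production these would be two processes.
$file = tempnam(sys_get_temp_dir(), 'chat');   // hypothetical chat log
file_put_contents($file, "hello\n");

$copyA = file_get_contents($file);   // A opens the file
$copyB = file_get_contents($file);   // B opens the same file before A saves

file_put_contents($file, $copyA . "A: hi\n");    // A saves first
file_put_contents($file, $copyB . "B: hey\n");   // B saves its stale copy over A's

$final = file_get_contents($file);
echo $final;                          // A's line is missing from the output
unlink($file);
```

Both writes succeed without any error, which is why this kind of data loss tends to go unnoticed.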

The usual solution to this problem is for a process to lock the file before operating on it, so that only that process may modify it. Other processes that merely read the file are not a problem, but any process that tries to update it is refused until the lock is released. Once the process holding the lock has finished its update, it releases the exclusive lock and the file can be changed again. Likewise, if a process finds the file unlocked when it wants to operate on it, it can go ahead and lock the file for its own exclusive use.

So the general solution would be:

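    $fp = fopen('/tmp/lock.txt', 'c+');    // 'c+' opens without truncating, unlike 'w+'
    if (flock($fp, LOCK_EX)) {             // blocks until the exclusive lock is granted
        ftruncate($fp, 0);                 // truncate only once the lock is held
        fwrite($fp, "Write something here\n");
        fflush($fp);                       // flush the output before releasing the lock
        flock($fp, LOCK_UN);               // release the lock
    } else {
        echo "Couldn't lock the file!";
    }
    fclose($fp);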
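Readers can cooperate with such a writer by taking a shared lock, which lets several processes read at once while holding off an update. The following is a minimal sketch under that assumption; the helper name readLocked() and the temporary file are made up for illustration.

```php
<?php
// Hypothetical helper: read a whole file under a shared (read) lock.
function readLocked(string $path)
{
    $fp = fopen($path, 'r');
    if ($fp === false) {
        return false;
    }
    $data = false;
    if (flock($fp, LOCK_SH)) {          // many readers may hold LOCK_SH at once
        $data = stream_get_contents($fp);
        flock($fp, LOCK_UN);
    }
    fclose($fp);
    return $data;
}

$path = tempnam(sys_get_temp_dir(), 'lock');   // stand-in for /tmp/lock.txt
// The LOCK_EX flag makes file_put_contents() hold an exclusive lock while writing.
file_put_contents($path, "Write something here\n", LOCK_EX);
$contents = readLocked($path);
echo $contents;
unlink($path);
```

Note that flock() is advisory on most Unix systems, so it only protects against other processes that also call flock() on the same file.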

In PHP, however, flock() does not always seem to behave that well. Under heavy concurrency, the exclusive lock often appears not to be released promptly, or not released at all, which results in deadlock, very high CPU usage on the server, and sometimes a server that stops responding entirely. This seems to happen on many Linux/Unix systems, so think carefully before relying on flock().

Does that mean there is no solution? Not at all. Used properly, flock() can avoid the deadlock problem entirely, and if you would rather not use flock(), there are other good ways to solve the problem. The approaches I have collected and summarized are roughly as follows.

Scenario One: Set a timeout when locking the file. A typical implementation looks like this:

    if ($fp = fopen($fileName, 'a')) {
        $startTime = microtime(true);          // float seconds; plain microtime() returns a string
        do {
            // LOCK_NB makes flock() return immediately instead of blocking,
            // so the retry loop and the timeout can actually take effect
            $canWrite = flock($fp, LOCK_EX | LOCK_NB);
            if (!$canWrite) {
                usleep(rand(0, 100) * 1000);   // back off for a random 0-100 ms
            }
        } while (!$canWrite && (microtime(true) - $startTime) < 1.0);

        if ($canWrite) {
            fwrite($fp, $dataToSave);
            fflush($fp);
            flock($fp, LOCK_UN);
        }
        fclose($fp);
    }

The timeout is set to 1 second. If the lock is not obtained, the attempt is repeated after a short random pause until the process gains the right to operate on the file. Once the timeout has been reached, however, the process must give up immediately and leave the lock to other processes.
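The same retry loop can be wrapped in a function so that the caller learns whether the write actually happened. This is a sketch rather than a definitive implementation: the name appendWithTimeout(), its parameters, and the 1-second default are assumptions made for illustration.

```php
<?php
// Hypothetical helper: append $data to $path, giving up after $timeoutSec seconds.
function appendWithTimeout(string $path, string $data, float $timeoutSec = 1.0): bool
{
    $fp = fopen($path, 'a');
    if ($fp === false) {
        return false;
    }
    $deadline = microtime(true) + $timeoutSec;
    $locked = false;
    do {
        // Non-blocking attempt: returns false at once if another handle holds the lock.
        $locked = flock($fp, LOCK_EX | LOCK_NB);
        if (!$locked) {
            usleep(random_int(1, 100) * 1000);   // random 1-100 ms pause before retrying
        }
    } while (!$locked && microtime(true) < $deadline);

    $ok = false;
    if ($locked) {
        $ok = fwrite($fp, $data) !== false;
        fflush($fp);
        flock($fp, LOCK_UN);
    }
    fclose($fp);
    return $ok;   // false means the timeout expired or the write failed
}

$path = tempnam(sys_get_temp_dir(), 'log');
$first = appendWithTimeout($path, "line 1\n");
$second = appendWithTimeout($path, "line 2\n");
$saved = file_get_contents($path);
unlink($path);
```

Returning a boolean lets the caller decide what to do on timeout, for example retrying later or reporting an error, instead of silently dropping the data.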

Scenario Two: Do not use flock(); use temporary files to resolve read-write conflicts instead. The general principle is as follows:

(1) Copy the file to be updated into a temporary directory, save the original file's last modification time in a variable, and give the temporary file a random name that is unlikely to collide.

(2) After the temporary file has been updated, check whether the original file's last modification time still matches the saved value.

(3) If the modification times match, rename the modified temporary file over the original file. To make sure the check sees the file's current state, the cached file status must be cleared first.

(4) If the modification time differs from the saved value, the original file was modified in the meantime. Delete the temporary file and return false to indicate that another process is operating on the file.

The approximate implementation code is as follows:

    $dir_fileopen = 'tmp';   // directory for temporary copies; must exist and be writable

    // Build a hard-to-guess identifier for the temporary file name.
    function randomid() {
        return time() . substr(md5(microtime()), 0, rand(5, 12));
    }

    // Open a private temporary copy of $filename.
    // Returns array(handle, original name, id, original mtime, is-new-file) or false.
    function cfopen($filename, $mode) {
        global $dir_fileopen;
        clearstatcache();
        do {
            $id = md5(randomid());
            $tempfilename = $dir_fileopen . '/' . $id . md5($filename);
        } while (file_exists($tempfilename));
        if (file_exists($filename)) {
            $newfile = false;
            copy($filename, $tempfilename);
        } else {
            $newfile = true;
        }
        $fp = fopen($tempfilename, $mode);
        return $fp ? array($fp, $filename, $id, @filemtime($filename), $newfile) : false;
    }

    // Write to the temporary copy.
    function cfwrite($fp, $string) {
        return fwrite($fp[0], $string);
    }

    // Close the temporary copy and move it over the original if nobody else changed it.
    function cfclose($fp, $debug = 'off') {
        global $dir_fileopen;
        $success = fclose($fp[0]);
        clearstatcache();
        $tempfilename = $dir_fileopen . '/' . $fp[2] . md5($fp[1]);
        if ((@filemtime($fp[1]) == $fp[3]) || ($fp[4] == true && !file_exists($fp[1]))) {
            rename($tempfilename, $fp[1]);
        } else {
            unlink($tempfilename);
            // Another process is operating on the target file, so the current process is refused.
            $success = false;
        }
        return $success;
    }

    $fp = cfopen('lock.txt', 'a+');
    cfwrite($fp, "Welcome to Beijing.\n");
    cfclose($fp, 'on');

Two of the functions used in the code above need some explanation:

(1) rename(): renames a file or directory, much like mv on Linux, which makes it a convenient way to update the path or name of a file or directory. When I tested the code above on Windows, however, a notice was raised saying that the file already exists whenever the new filename was already present. On Linux it worked fine.

(2) clearstatcache(): clears the cached file status. PHP caches the results of file status calls such as filemtime() for performance. When several processes delete or update a file, that cache may not be refreshed in time, so the last modification time PHP returns can be stale. Call this function to clear the cache before checking.
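The two functions can be tried out in isolation. In the sketch below, freshMtime() and replaceViaRename() are hypothetical helper names, and the file under the system temporary directory is only a stand-in for a real target file.

```php
<?php
// Return the modification time as currently on disk, bypassing PHP's stat cache.
function freshMtime(string $path)
{
    clearstatcache(true, $path);   // drop cached status for this path only
    return @filemtime($path);      // false if the file does not exist
}

// Replace $target by writing a temporary file and renaming it over the target.
function replaceViaRename(string $target, string $contents): bool
{
    // Create the temp file next to the target so rename() stays on one filesystem.
    $tmp = tempnam(dirname($target), 'tmp');
    if ($tmp === false || file_put_contents($tmp, $contents) === false) {
        return false;
    }
    if (!rename($tmp, $target)) {
        unlink($tmp);              // clean up if the rename was refused
        return false;
    }
    return true;
}

$target = sys_get_temp_dir() . '/rename_demo_' . uniqid() . '.txt';
file_put_contents($target, "old\n");
$replaced = replaceViaRename($target, "new\n");
$after = file_get_contents($target);
$mtime = freshMtime($target);
unlink($target);
$gone = freshMtime($target);       // the file no longer exists
```

Keeping the temporary file in the same directory as the target matters because a rename within one filesystem replaces the target in a single step on POSIX systems, whereas a rename across filesystems has to copy the data.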

Scenario Three: Spread writes randomly across multiple files to reduce the likelihood of contention.

This scenario appears to be widely used for recording user access logs. First define a random range; the larger the range, the lower the likelihood of two processes colliding. Suppose the range is [1, 500], so the log files are named log1 through log500. On each user visit the data is written to a randomly chosen file among them. If two processes write logs at the same moment, process A might update log32 while process B updates log399. For B to pick log32 as well, the probability is only about 1/500, which is close to zero. When the access logs need to be analyzed, simply merge the files first and then analyze them. One benefit of logging this way is that processes rarely have to queue, so each operation completes quickly.
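A minimal sketch of this idea follows. The helper names writeShardedLog() and mergeShardedLogs(), the demo directory, and the log lines are assumptions made for illustration; the LOCK_EX flag is an added safeguard for the rare collision and is not part of the description above.

```php
<?php
// Append one line to a randomly chosen log file among log1..log$shards.
function writeShardedLog(string $dir, string $line, int $shards = 500): string
{
    $file = $dir . '/log' . random_int(1, $shards);
    // LOCK_EX covers the rare case of two writers picking the same file.
    file_put_contents($file, $line . "\n", FILE_APPEND | LOCK_EX);
    return $file;
}

// Merge every log file in $dir into one array of lines for analysis.
function mergeShardedLogs(string $dir): array
{
    $lines = [];
    foreach (glob($dir . '/log*') as $file) {
        $read = file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
        $lines = array_merge($lines, $read);
    }
    return $lines;
}

$dir = sys_get_temp_dir() . '/shardlog_' . uniqid();
mkdir($dir);
for ($i = 1; $i <= 20; $i++) {
    writeShardedLog($dir, "visit $i");
}
$all = mergeShardedLogs($dir);
echo count($all), " entries\n";

// Clean up the demo directory.
array_map('unlink', glob($dir . '/log*'));
rmdir($dir);
```

The merged lines come back in no particular order, so each entry should carry its own timestamp if the order of visits matters to the analysis.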

Scenario Four: Put all pending file operations into a queue, and let one dedicated service carry them out. Each entry in the queue corresponds to one concrete operation, so the service only has to take entries from the queue one at a time and perform them. If a large number of processes want to operate on the file, that is fine: they simply line up at the back of the queue, and as long as they are willing to wait, it does not matter how long the queue gets.
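One simple way to sketch such a queue without extra infrastructure is a spool directory: each process drops its request as a small file, and a single worker applies the requests to the target file. Everything here (enqueueWrite(), drainQueue(), the directory layout, the job file naming) is hypothetical; a real deployment might use a message queue or a long-running daemon instead.

```php
<?php
// Producer side: store one write request as its own file in the queue directory.
function enqueueWrite(string $queueDir, string $data): bool
{
    $tmp = tempnam($queueDir, 'tmp');   // unique file name
    if ($tmp === false || file_put_contents($tmp, $data) === false) {
        return false;
    }
    // Rename into place so the worker never picks up a half-written job.
    $job = sprintf('%s/job-%.6F-%s', $queueDir, microtime(true), bin2hex(random_bytes(4)));
    return rename($tmp, $job);
}

// Worker side: apply all queued requests to $target and return how many were handled.
function drainQueue(string $queueDir, string $target): int
{
    $jobs = glob($queueDir . '/job-*');
    sort($jobs);    // timestamp-prefixed names give approximate arrival order
    $done = 0;
    foreach ($jobs as $job) {
        // Only this worker writes to $target, so it needs no lock of its own.
        file_put_contents($target, file_get_contents($job), FILE_APPEND);
        unlink($job);
        $done++;
    }
    return $done;
}

$queueDir = sys_get_temp_dir() . '/queue_' . uniqid();
mkdir($queueDir);
$target = $queueDir . '/chat.txt';
foreach (['a', 'b', 'c'] as $msg) {
    enqueueWrite($queueDir, $msg . "\n");
}
$done = drainQueue($queueDir, $target);
$lines = file($target, FILE_IGNORE_NEW_LINES);
echo $done, " jobs written\n";

// Clean up the demo directory.
unlink($target);
rmdir($queueDir);
```

Because only the worker ever touches the target file, the writers never contend with each other; the cost is that a write is not visible until the worker has processed it.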

Each of the scenarios above has its own merits. They can be grouped into two categories:

(1) Queuing required (slower): scenarios one, two, and four.

(2) No queuing required (faster): scenario three.

When designing a caching system we generally do not use scenario three, because its writer and its analysis step are not synchronized: at write time no thought is given to how hard the data will be to read back, only to getting it written. Imagine updating a cache this way: with randomly chosen files, reading the cache would involve many more steps. Scenarios one and two are quite different: writes may have to wait (a failed lock attempt is retried), but reading the file is very convenient. The purpose of adding a cache, after all, is to improve system performance by reducing data-read bottlenecks.

The above is a summary of personal experience and some collected material. If anything is wrong or missing, corrections are welcome.
