Consider the following Python code snippet. It performs some operations on data in a file and saves the result back to the file:
with open(filename) as f:
    input = f.read()
output = do_something(input)
with open(filename, 'w') as f:
    f.write(output)
It looks simple, right? Yet it is not as easy as it looks. I debug applications on production servers and regularly encounter strange behavior.
Here are some examples of failure modes I have seen:
A runaway server process spews out huge amounts of logs and the disk fills up. write() raises an exception after the file has been truncated, and the file ends up empty.
Several instances of the application run in parallel. When they are done, the file contains gibberish because the output of the instances got interleaved.
After a write completes, the application triggers some follow-up action. A few seconds later, the power fails. After we restart the server, we see the old file content again; the data already handed to other applications no longer matches what we see in the file.
None of this is new. The aim of this article is to present common methods and techniques to Python developers who lack experience in systems programming. I will provide code examples so that developers can easily apply these methods to their own code.
What does "reliability" mean?
Broadly speaking, reliability means that an operation performs its required function under all stated conditions. For file operations, that function is creating, replacing, or appending to the content of a file. Here we can take inspiration from database theory: the ACID properties of the classic transaction model will serve as our guide for improving reliability.
Before we start, let's look at how our example relates to the four ACID properties:
Atomicity requires that a transaction either succeeds completely or fails completely. In the example above, a full disk may result in only part of the content being written to the file. Additionally, other programs reading the file while it is being written may get a partially complete version or even cause write errors.
Consistency means that the operation must take the data from one valid state to another. Consistency can be split into two parts: internal and external consistency. Internal consistency means that the file's data structures are consistent. External consistency means that the file's content agrees with data related to it. In this example, it is hard to reason about the application's consistency because we do not know the application. But since consistency requires atomicity, we can at least say that internal consistency is not guaranteed.
Isolation is violated if several identical transactions, executed concurrently, produce different results than they would in serial execution. Obviously, the code above offers no protection against operation failures or other isolation violations.
Durability means that changes, once made, stay made. Before we tell the user about success, we must be sure that our data has hit reliable storage and is not merely sitting in a write cache. The code above assumes that a call to write() triggers immediate disk I/O, but the POSIX standard does not guarantee this.
Use a database system whenever possible
If we can get all four ACID properties, we have made a big step forward in reliability. But this requires serious coding effort. Also, why reinvent the wheel? Most database systems already provide ACID transactions.
Reliable data storage is a solved problem. If you need reliable storage, use a database. Chances are that, without decades of effort, you will not solve this problem yourself as well as those who have focused on it for years. If you do not want to install a big database server, you can use SQLite: it has ACID transactions, is small and free, and is included in Python's standard library.
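As a quick illustration of what SQLite buys you, the following sketch (the table name and keys are made up) shows atomic commit and rollback with the standard sqlite3 module:

```python
import sqlite3

# Hypothetical key-value table; use a file path instead of ':memory:'
# for durable storage.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT)')

# Using the connection as a context manager commits on success ...
with conn:
    conn.execute('INSERT INTO kv VALUES (?, ?)', ('greeting', 'hello'))

# ... and rolls the whole transaction back on an exception (atomicity).
try:
    with conn:
        conn.execute('INSERT INTO kv VALUES (?, ?)', ('other', 'x'))
        raise RuntimeError('simulated crash')
except RuntimeError:
    pass

rows = conn.execute('SELECT key FROM kv ORDER BY key').fetchall()
```

The rolled-back insert leaves no trace; only the committed row survives.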
The article could end here, but there are well-founded reasons not to use a database. They are usually constraints on file format or file location, and both are hard to control in a database system. The reasons include:
We must process files in a fixed format or location generated by other applications,
We must write files that other applications consume (with the same restrictions as above),
Our files must be easy for humans to read or modify.
If we implement reliable file updates ourselves, here are some programming techniques for reference. Below I present four common file update patterns; after that, I discuss the steps needed to satisfy each ACID property within each pattern.
File update patterns
Files can be updated in many ways, but I see at least four common patterns. They will serve as the basis for the rest of this article.
Truncate-write
This is probably the most basic pattern. In the following example, the assumed domain model code reads the data, performs some computation, and re-opens the existing file in write mode:
with open(filename, 'r') as f:
    model.read(f)
model.process()
with open(filename, 'w') as f:
    model.write(f)
A variant of this pattern opens the file in read/write mode ('a+' mode in the example below), seeks to the beginning, explicitly calls truncate(), and rewrites the content:
with open(filename, 'a+') as f:
    f.seek(0)
    model.input(f.read())
    model.compute()
    f.seek(0)
    f.truncate()
    f.write(model.output())
This variant opens the file only once and keeps it open the whole time, which can simplify locking, for example.
Write-replace
Another widely used pattern writes the new content to a temporary file and then replaces the original file:
with tempfile.NamedTemporaryFile(
        'w', dir=os.path.dirname(filename), delete=False) as tf:
    tf.write(model.output())
    tempname = tf.name
os.rename(tempname, filename)
This method is more robust against errors than truncate-write; see the discussion of atomicity and consistency below. It is used by many applications.
These two patterns are so common that the ext4 filesystem in the Linux kernel even detects them automatically and fixes some of their reliability deficiencies. But do not rely on this feature: you are not always on ext4, and the administrator may have turned it off.
Append
The third pattern appends new data to an existing file:
with open(filename, 'a') as f:
    f.write(model.output())
This pattern is used for writing log files and other tasks that accumulate data for later processing. Technically it is extremely simple. An interesting extension is to perform regular updates only by appending, and to reorganize the file into a more compact form periodically.
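As a sketch of that extension, the periodic reorganization can itself use the write-replace pattern so that readers never see a half-compacted file; compact() and the keep predicate below are hypothetical helpers, not part of any library:

```python
import os
import tempfile

def compact(path, keep):
    # Keep only the lines the caller still cares about; 'keep' is a
    # hypothetical predicate, e.g. one that drops superseded records.
    with open(path) as f:
        lines = [line for line in f if keep(line)]
    # Write-replace: readers see either the old file or the compacted one.
    fd, tmpname = tempfile.mkstemp(dir=os.path.dirname(path) or '.')
    with os.fdopen(fd, 'w') as tf:
        tf.writelines(lines)
    os.rename(tmpname, path)

workdir = tempfile.mkdtemp()
logpath = os.path.join(workdir, 'log')
with open(logpath, 'a') as f:
    f.write('keep 1\ndrop 2\nkeep 3\n')
compact(logpath, lambda line: line.startswith('keep'))
```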
Spooldir
Here we treat a directory as the logical data store and create a uniquely named file for each record:
with open(unique_filename(), 'w') as f:
    f.write(model.output())
This pattern shares the accumulating property of the append pattern. A huge advantage is that we can put small amounts of metadata into the file name; for example, this can be used to convey information about processing status. A particularly clever implementation of the spooldir pattern is the maildir format. Maildirs use a naming scheme with additional subdirectories to perform update operations in a reliable and lock-free way. The md and gocept.filestore libraries provide convenient wrappers for maildir operations.
If your file name generation does not guarantee unique results, it may even be necessary to require that the file is really new. The following calls the low-level os.open() with suitable flags:
fd = os.open(filename, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o666)
with os.fdopen(fd, 'w') as f:
    f.write(...)
After opening the file with O_EXCL, we use os.fdopen() to convert the raw file descriptor into a regular Python file object.
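The effect of O_EXCL can be demonstrated directly: a second attempt to create the same name fails instead of silently clobbering existing data (the path and contents below are made up for the demonstration):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'spoolfile')

# The first creation succeeds because the file does not exist yet.
fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o666)
with os.fdopen(fd, 'w') as f:
    f.write('first record\n')

# A second O_EXCL open on the same name raises FileExistsError
# instead of overwriting the existing data.
try:
    os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o666)
    collided = False
except FileExistsError:
    collided = True
```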
Apply ACID properties to file updates
Next, I will try to enhance the file update patterns. Let us see, in turn, what we can do to satisfy each ACID property. I will keep things as simple as possible, since we are not aiming to write a complete database system. Note that the material in this section is not exhaustive, but it can give you a good starting point for your own experiments.
Atomicity
The write-replace pattern gives you atomicity because the underlying os.rename() is atomic: at any given point in time, a process sees either the old file or the new file. This pattern is naturally robust against write errors: if the write operation raises an exception, the rename is never performed, so there is no risk of overwriting the correct old file with a corrupted new one.
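That robustness can be sketched as a small helper; write_replace() and the simulated disk-full failure below are illustrative assumptions, not a fixed API:

```python
import os
import tempfile

def write_replace(filename, write_fn):
    # Write the new content to a temp file in the same directory, then
    # atomically rename it over the target. 'write_fn' is a hypothetical
    # callback that writes the new content to the given file object.
    dirname = os.path.dirname(filename) or '.'
    fd, tempname = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, 'w') as tf:
            write_fn(tf)
        os.rename(tempname, filename)
    except Exception:
        os.unlink(tempname)  # a failed write never touches the old file
        raise

workdir = tempfile.mkdtemp()
target = os.path.join(workdir, 'data.txt')
with open(target, 'w') as f:
    f.write('old content')

write_replace(target, lambda f: f.write('new content'))

def failing_writer(f):
    f.write('partial')
    raise IOError('simulated full disk')

try:
    write_replace(target, failing_writer)
except IOError:
    pass
```

After the failed attempt, the target still holds the last good content and the temp file has been cleaned up.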
The append pattern is not atomic, because there is the risk of appending incomplete records. But there is a trick to make updates atomic: annotate each appended record with a checksum. When reading the log later, ignore all records that do not have a valid checksum. This way, only complete records are processed. In the following example, the application makes periodic measurements and each time appends a one-line JSON record to the log. We compute the CRC32 checksum of the record's byte representation and append it to the same line:
with open(logfile, 'ab') as f:
    for i in range(3):
        measure = {'timestamp': time.time(), 'value': random.random()}
        record = json.dumps(measure).encode()
        checksum = '{:8x}'.format(zlib.crc32(record)).encode()
        f.write(record + b' ' + checksum + b'\n')
The example code simulates the measurements by creating a random value each time:
$ cat log
{"timestamp": 1373396987.258189, "value": 0.9360123151217828} 9495b87a
{"timestamp": 1373396987.25825, "value": 0.40429005476999424} 149afc22
{"timestamp": 1373396987.258291, "value": 0.232021160265939} d229d937
To process the log file, we read one record per line, split off the checksum, and compare it against the record we just read:
with open(logfile, 'rb') as f:
    for line in f:
        record, checksum = line.strip().rsplit(b' ', 1)
        if checksum.decode() == '{:8x}'.format(zlib.crc32(record)):
            print('read measure: {}'.format(json.loads(record.decode())))
        else:
            print('checksum error for record {}'.format(record))
Now we can simulate a truncated write by chopping off the last line:
$ cat log
{"timestamp": 1373396987.258189, "value": 0.9360123151217828} 9495b87a
{"timestamp": 1373396987.25825, "value": 0.40429005476999424} 149afc22
{"timestamp": 1373396987.258291, "value": 0.23202
When the log is read, the last incomplete line is rejected:
$ read_checksummed_log.py log
read measure: {'timestamp': 1373396987.258189, 'value': 0.9360123151217828}
read measure: {'timestamp': 1373396987.25825, 'value': 0.40429005476999424}
checksum error for record b'{"timestamp": 1373396987.258291, "value":'
This approach of adding checksums to log records is used in a huge number of applications, including many database systems.
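For a self-contained round trip of the technique, the following sketch appends checksummed records, simulates the truncated write programmatically instead of with cat, and then filters out the damaged record on read. The helper names are made up, and the checksum is zero-padded ('{:08x}') here so it can never contain a space that would collide with the field separator:

```python
import json
import os
import tempfile
import time
import zlib

logfile = os.path.join(tempfile.mkdtemp(), 'log')

def append_measure(measure):
    # One JSON record per line, followed by its CRC32 checksum.
    record = json.dumps(measure).encode()
    checksum = '{:08x}'.format(zlib.crc32(record)).encode()
    with open(logfile, 'ab') as f:
        f.write(record + b' ' + checksum + b'\n')

append_measure({'timestamp': time.time(), 'value': 1.0})
append_measure({'timestamp': time.time(), 'value': 2.0})

# Simulate a write that was cut short: chop 5 bytes off the end.
with open(logfile, 'r+b') as f:
    f.truncate(os.path.getsize(logfile) - 5)

good, bad = [], []
with open(logfile, 'rb') as f:
    for line in f:
        record, checksum = line.strip().rsplit(b' ', 1)
        if checksum.decode() == '{:08x}'.format(zlib.crc32(record)):
            good.append(json.loads(record.decode()))
        else:
            bad.append(record)
```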
Individual files in a spooldir can likewise carry a checksum in each file. Another, possibly simpler, approach is to borrow from the write-replace pattern: first write the file aside, then move it to its final location. Design a naming scheme that protects files still being written from being processed by consumers. In the following example, all files ending in .tmp are ignored by readers and can thus be used safely during write operations:
newfile = generate_id()
with open(newfile + '.tmp', 'w') as f:
    f.write(model.output())
os.rename(newfile + '.tmp', newfile)
Finally, truncate-write is non-atomic, and unfortunately I cannot offer a variant of it that satisfies atomicity. Right after truncating, the file is empty and no new content has been written yet. If a concurrent program reads the file at that moment, or if an exception occurs and our program aborts, we see neither the old version nor any new one.
Consistency
Most of what I said about atomicity also applies to consistency. In fact, atomic updates are a prerequisite for internal consistency. External consistency means updating several files in sync, which is not easily done. Lock files can be used to ensure that read and write access do not interfere with each other. Consider a directory whose files must be consistent with one another; a common pattern is to designate a lock file that controls access to the whole directory.
Example of a writer:
with open(os.path.join(dirname, '.lock'), 'a+') as lockfile:
    fcntl.flock(lockfile, fcntl.LOCK_EX)
    model.update(dirname)
Example of a reader:
with open(os.path.join(dirname, '.lock'), 'a+') as lockfile:
    fcntl.flock(lockfile, fcntl.LOCK_SH)
    model.readall(dirname)
This method only works if we control all readers. Since there can be only one writer active at a time (the exclusive lock blocks all shared locks), the scalability of this method is limited.
Taking it one step further, we can apply the write-replace pattern to an entire directory. This involves creating a new directory for each generation of updates and switching a symbolic link once the update is complete. For example, a mirroring application maintains a directory containing compressed archives and an index file that lists file names, sizes, and checksums. When the upstream mirror is updated, it is not enough to make the updates of the archives and the index file atomic in isolation; instead, we must present the archives and the index file together to avoid mismatched checksums. To solve this, we maintain a subdirectory for each generation and switch a symbolic link to activate a generation:
mirror
|-- 483
|   |-- a.tgz
|   |-- b.tgz
|   `-- index.json
|-- 484
|   |-- a.tgz
|   |-- b.tgz
|   |-- c.tgz
|   `-- index.json
`-- current -> 483
Here the new generation 484 is in the process of being updated. When all archives are ready and the index file has been written, we can switch the current symbolic link atomically. Other applications always see either the completely old or the completely new generation. Readers need to os.chdir() into the current directory and, importantly, must not refer to files by their full path names; otherwise a race condition appears when a reader opens current/index.json and then current/a.tgz after the symbolic link has changed.
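One detail worth noting: os.symlink() refuses to overwrite an existing link, so a common way to get an atomic switch is to create the new link under a temporary name and rename it over current; the rename itself is the atomic step. The directory names below are made up for the sketch:

```python
import os
import tempfile

def switch_current(mirror, generation):
    # Create the new link aside, then atomically rename it over 'current'.
    # os.rename() replaces an existing symlink atomically on POSIX.
    tmplink = os.path.join(mirror, 'current.tmp')
    os.symlink(generation, tmplink)
    os.rename(tmplink, os.path.join(mirror, 'current'))

mirror = tempfile.mkdtemp()
for gen in ('483', '484'):
    os.mkdir(os.path.join(mirror, gen))

switch_current(mirror, '483')
switch_current(mirror, '484')  # readers see either 483 or 484, never neither
```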
Isolation
Isolation means that concurrent updates to the same file are serializable: there exists a serial schedule that produces the same results as the parallel schedule that actually took place. "Real" database systems use advanced techniques like MVCC to maintain serializability while allowing high degrees of parallelism. Back in our scenario, we end up using locks to serialize file updates.
Locking truncate-write updates is easy: just acquire an exclusive lock before all file operations. The following example code reads an integer from a file, increments it, and finally updates the file:
def update():
    with open(filename, 'r+') as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        n = int(f.read())
        n += 1
        f.seek(0)
        f.truncate()
        f.write('{}\n'.format(n))
Locking write-replace updates is a bit trickier: using the lock the way truncate-write does may lead to update conflicts. A naive implementation could look like this:
def update():
    with open(filename) as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        n = int(f.read())
        n += 1
        with tempfile.NamedTemporaryFile(
                'w', dir=os.path.dirname(filename), delete=False) as tf:
            tf.write('{}\n'.format(n))
            tempname = tf.name
        os.rename(tempname, filename)
What is wrong with this code? Imagine two processes competing to update the file. The first one goes ahead, but the second blocks in the fcntl.flock() call. When the first process replaces the file and releases its lock, the file descriptor the second process holds open now points to a "ghost" file with the old content, no longer reachable by any path name. To avoid this conflict, we must check after fcntl.flock() returns that the file we opened is still the same file on disk. So I wrote a new LockedOpen context manager to replace the built-in open context; it makes sure we actually open the right file:
class LockedOpen(object):

    def __init__(self, filename, *args, **kwargs):
        self.filename = filename
        self.open_args = args
        self.open_kwargs = kwargs
        self.fileobj = None

    def __enter__(self):
        f = open(self.filename, *self.open_args, **self.open_kwargs)
        while True:
            fcntl.flock(f, fcntl.LOCK_EX)
            fnew = open(self.filename, *self.open_args, **self.open_kwargs)
            if os.path.sameopenfile(f.fileno(), fnew.fileno()):
                fnew.close()
                break
            else:
                f.close()
                f = fnew
        self.fileobj = f
        return f

    def __exit__(self, _exc_type, _exc_value, _traceback):
        self.fileobj.close()
Locking an append update is as simple as locking a truncate-write update: acquire an exclusive lock, then append. Long-running processes that keep the file permanently open can release the lock between updates to let other processes in.
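A locked append can be sketched as follows (POSIX only, since it relies on fcntl; the record format is made up):

```python
import fcntl
import os
import tempfile

def append_record(path, line):
    # Hold an exclusive lock for the duration of the append; the lock is
    # released when the file object is closed at the end of the 'with'.
    with open(path, 'a') as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        f.write(line)

logpath = os.path.join(tempfile.mkdtemp(), 'log')
append_record(logpath, 'first\n')
append_record(logpath, 'second\n')
```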
The spooldir pattern has the elegant property of not requiring any locks. It does, however, depend on a flexible naming scheme and robust file name generation. The maildir specification is a good example of a spooldir design; it can easily be adapted to other situations and is not just for handling e-mail.
Durability
Durability is a bit special: it depends not only on the application but also on the OS and hardware configuration. In theory, we can assume that calls to os.fsync() or os.fdatasync() do not return until data has reached persistent storage. In practice, we may run into several problems: we may face incomplete fsync implementations or awkward disk controller configurations that provide no persistence guarantee at all. There is a discussion from MySQL developers that goes into great detail about where things can go wrong. Some database systems, like PostgreSQL, even offer a choice of persistence mechanisms so that the administrator can select the best one at runtime. The unlucky, however, can only use os.fsync() and hope it is implemented correctly.
With truncate-write, we need to issue a sync after writing, before closing the file. Note that this usually involves another level of write caching: the glibc buffer holds back writes even before they are passed to the kernel. So to get an empty glibc buffer, we must flush() it before syncing:
with open(filename, 'w') as f:
    model.write(f)
    f.flush()
    os.fdatasync(f)
Alternatively, you can call Python with the-u parameter to obtain unbuffered writes for all file I/O.
Most of the time I prefer os.fdatasync() over os.fsync() to avoid a synchronous metadata update (ownership, size, mtime, ...). Metadata updates can result in disk seek operations, which slow the whole thing down quite a bit.
Applying the same trick to write-replace updates is only half a win. We make sure that the newly written file's content has reached non-volatile storage before the old file is replaced, but what about the replace operation itself? We have no guarantee that the directory update has been carried out. Much has been written on the net about how to sync directory updates. But in our case, with the old and new file in the same directory, we can get away with a rather simple solution:
os.rename(tempname, filename)
dirfd = os.open(os.path.dirname(filename), os.O_DIRECTORY)
os.fsync(dirfd)
os.close(dirfd)
We call the low-level os.open() to open the directory (Python's built-in open() does not support opening directories) and then perform os.fsync() on the directory's file descriptor.
Append updates are treated the same way as the truncate-write case mentioned above.
The spooldir pattern has the same directory-sync problem as write-replace. Fortunately, the same solution applies: first sync the file, then sync the directory.
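Combined into one helper, a durable spooldir write might look roughly like this; a sketch assuming a POSIX system, with error handling kept minimal:

```python
import os
import tempfile

def write_durable(path, data):
    # Flush Python's buffer, force the file's data to stable storage,
    # then sync the containing directory so the new name survives a crash.
    with open(path, 'w') as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    dirfd = os.open(os.path.dirname(path) or '.', os.O_DIRECTORY)
    try:
        os.fsync(dirfd)
    finally:
        os.close(dirfd)

spooldir = tempfile.mkdtemp()
write_durable(os.path.join(spooldir, 'record-001'), 'payload\n')
```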
Summary
Reliable file updates are possible. I have demonstrated how to satisfy all four ACID properties. The code shown here serves as a toolbox: master these programming techniques and apply them as your needs dictate. Sometimes you do not need all the ACID properties, but only one or two. I hope this article helps you make well-informed decisions about what to implement and what to leave out.