A Free Trial That Lets You Build Big!
Start building with 50+ products and up to 12 months usage for Elastic Compute Service
Consider the following Python code fragment. Do something with the data in the file, and then save the results back to the file:
Looks pretty simple, doesn't it? May not seem as simple as at first glance. I'm debugging apps on a product server, and there's always strange behavior.
This is an example of the failure mode I have seen:
The runaway server process overflowed a lot of logs and the disk was filled up. Write () throws an exception after the file is truncated, and the file becomes empty.
Several instances of the application are executed in parallel. At the end of each instance, the contents of the file eventually become a heavenly book because of the output mixed with multiple instances.
After the write operation is complete, the application triggers some subsequent actions. Power off after a few seconds. After we restarted the server, we again saw the old file contents. The data that has been passed to other applications is no longer consistent with what we see in the file.
There is nothing new in the next section. The purpose of this article is to provide common methods and techniques for Python developers who lack experience in system programming. I will provide code examples that allow developers to easily apply these methods to their own code.
What does "reliability" mean?
In a broad sense, reliability means that the operation can perform the functions it needs under all specified conditions. As for the operation of the file, this function is the question of creating, replacing, or appending the contents of the file. Here you can get inspiration from the database theory. The ACID properties of the classic transaction model are used as guidance to improve reliability.
Before we begin, let's take a look at how our examples relate to the ACID4 nature:
Atomicity (atomicity) requires that the transaction be either completely successful or fail completely. In the example above, a full disk may cause some content to be written to the file. In addition, if other programs are reading the files while they are being written, they may get a partially completed version, or even cause a write error
Consistency (consistency) indicates that the operation must be from one state of the system to another state. Consistency can be divided into two parts: internal and external consistency. Internal consistency means that the data structure of a file is consistent. External consistency means that the contents of a file are consistent with the data it is related to. In this example, because we do not understand the application, it is difficult to infer conformance. But because consistency requires atomicity, we can at least say that there is no guarantee of internal consistency.
Isolation (isolation) violates isolation if multiple identical transactions cause different results in concurrent execution transactions. It is clear that the code above does not protect against operation failure or other isolation failures.
Persistence (durability) means that change is permanent. Before we tell our users to succeed, we have to make sure that our data store is reliable and not just a write cache. The above code has successfully written to the data on the assumption that we call the Write () function and disk I/O is executed immediately. However, the POSIX standard does not guarantee this assumption.
Use database systems whenever possible
If we can obtain acid four properties, then we have achieved long-term growth in reliability. But it takes a lot of coding credit. Why repeat the invention of the wheel? Most database systems already have acid transactions.
Reliability data storage is already a problem that has been resolved. If you need reliability storage, use a database. It is likely that without decades of effort, your own ability to solve this problem is not as good as those who have been focused on this for years. If you don't want to install a large database server, you can use SQLite, which has acid transactions, is small, free, and it is included in the Python standard library.
The article should have ended here, but there are some well-founded reasons for not using the data. They are usually file formats or file location constraints. Both of these are difficult to control in the database system. The reasons are as follows:
We have to deal with other applications that produce a fixed format or a file in a fixed location,
We must write the file for the consumption of other applications (and apply the same restrictions)
Our documents must be easy for people to read or modify.
If we implement a reliable file update ourselves, here are some programming techniques for reference. I'll show you four common action file update modes below. After that, I'll discuss what steps to take to satisfy acid properties in each file update mode.
File update mode
Files can be updated in a variety of ways, but I think there are at least four common patterns. These four patterns will be the basis for the remainder of this article.
This is probably the most basic pattern. In the following example, the assumed domain model code reads the data, performs some calculations, and then reopen the existing file in write mode:
A variant of this pattern opens the file in read-write mode ("plus" mode in Python), looking for the starting position, explicitly calling truncate (), overwriting the file contents.
The advantage of this variant is to open the file only once and always keep the file open. This, for example, simplifies the addition of locks.
Another widely used pattern is to write new content to a temporary file, and then replace the original file:
This method is more robust to error than truncation-write method. Consider the following discussion of atomicity and consistency. Many applications use this method.
These two patterns are so common that the Ext4 file system in the Linux kernel can even detect these patterns and automatically fix some reliability flaws. But do not rely on this feature: You do not always use EXT4, and administrators may turn off this feature.
The third mode is to append new data to the existing file:
This pattern is used to write log files and other tasks that aggregate data processing. Technically speaking, its salient features are extremely simple. An interesting extension is that the regular operations update only by appending operations, and then periodically rearrange the files to make them more compact.
Here we make the directory a logical data store, creating a new uniquely named file for each record:
This pattern has the same cumulative characteristics as the attached mode. A huge advantage is that we can put a small amount of metadata in the filename. For example, this can be used to convey information about the processing status. A particularly ingenious implementation of the Spooldir pattern is the Maildir format. Maildirs uses a named scheme with additional subdirectories to perform the update operation in a reliable, unlocked manner. The MD and Gocept.filestore Libraries provide a convenient encapsulation for maildir operations.
If your filename generation does not guarantee unique results, it may even be necessary to require that the file be actually new. Then invoke the lower-level os.open () with the appropriate flags:
After opening the file in O_excl, we use Os.fdopen to convert the original file descriptor into a normal Python file object.
Apply ACID properties to file updates
Next, I will try to strengthen the file update mode. In turn, let's see what we can do to satisfy the ACID properties. I will keep it as simple as possible because we are not writing a complete database system. Please note that the material in this section is not exhaustive, but it can provide a good starting point for your own experiment.
The write-replace pattern provides atomicity because the underlying os.rename () is atomic. This means that at any given point in time, the process either sees the old file or sees the new file. This pattern is naturally robust to write errors: If the Write action triggers an exception, the rename operation is not executed, and all the risk of overwriting the correct old file with the corrupted new file is not available.
The attach mode is not atomic because there is a risk of attaching incomplete records. But there is a trick to make updates atomic: The checksum is labeled for each write operation. After reading the log, ignore any records that do not have a valid checksum. In this way, only the complete records will be processed. In the following example, the application does a periodic measurement, with a line of JSON records attached to the log each time. We calculate the CRC32 checksum for the byte representation of the record, and then attach to the same line:
The example code simulates the measurement by creating a random value each time.
To process This log file, we read one row at a time, separating the checksum from the record we read.
We now simulate the truncated write operation by truncating the last line:
When the log is read, the last incomplete line is rejected:
The method of adding checksums to logging is used in a large number of applications, including many database systems.
A single file in the Spooldir can also add checksums to each file. Another possible simpler approach is to borrow write-replace mode: first write the file to one side, then move to the final position. Design a naming scheme that protects files that are being processed by consumers. In the following example, all files ending with. tmp are ignored by the reader, so they can be used safely during the write operation.
Finally, the truncation-writes the non atomic sex. I'm sorry I can't provide a variant that satisfies the atom. After the interception operation is completed, the file is empty and no new content has been written. If the concurrent program is now reading the file or if there is an exception, the program is aborted, and we will not see a new version of the version as soon as we see it.
Much of what I'm talking about atomicity can also be applied to consistency. In fact, atomic updating is a prerequisite for internal consistency. External consistency means that several files are updated synchronously. This is not easy to do, and lock files can be used to ensure that read-write access does not interfere. Consider the files in a directory that need to be consistent with each other. The common pattern is to specify a lock file to control access to the entire directory.
Examples of writing programs:
Examples of reading programs:
This method only takes effect if all read programs are controlled. Because there is only one writer activity at a time (an exclusive lock blocks all shared locks), all of the methods have limited scalability.
Further, we can apply the write-replace pattern to the entire directory. This involves creating a new directory for each update and changing the compliance link after the update completes. For example, a mirror application maintains a directory containing compressed packages and index files that list file names, file sizes, and checksums. When a high image is updated, it is not enough to isolate the compression packet and the index file from the atomic update only. Instead, we need to provide both a compressed package and an index file to avoid checksum mismatches. To solve this problem, we maintain a subdirectory for each build, and then change the symbolic link to activate the build.
The new build 484 is in the process of being updated. When all the compression packs are ready and the index files are updated, we can switch the current symbolic link with an atomic call to Os.symlink (). Other applications always see either completely old or completely new builds. The reading program needs to use Os.chdir () to enter the current directory, and it is important not to specify the file with the full path name. No competition conditions occur when the Read program opens Current/index.json, and then the current/a.tgz is turned on, but the symbolic link has changed.
Isolation means that concurrent updates to the same file are serializable-there is a serial schedule that allows the actual execution of parallel scheduling to return the same result. The "real" database system uses advanced technologies such as MVCC to maintain serializable while allowing for high levels of parallelism. Back to our scene, we finally used the lock to update the serial file.
It is easy to lock the truncation-write update. It is only possible to obtain an exclusive lock before all file operations. The following example code reads an integer from the file and then increments it, and the file is last updated:
Using write-replace mode to lock updates is a bit of a hassle. Using locks like truncation-write can cause update conflicts. Some naïve implementation might look like this.
What's wrong with this code? Imagine two processes competing to update a file. The first process runs in the front, but the second process blocks the Fcntl.flock () call. When the first process replaces the file and releases the lock, the file descriptor that is now opened in the second process points to a "ghost" file containing the old content (no path name is available). To avoid this conflict, we must check that the open file is the same as the one returned by Fcntl.flock (). So I wrote a new Lockedopen context Manager to replace the built-in open context. To make sure that we actually have the correct file open:
def __init__ (self, filename, *args, **kwargs):
Self.filename = filename
Self.open_args = args
Self.open_kwargs = Kwargs
Self.fileobj = None
def __enter__ (self):
f = open (Self.filename, *self.open_args, **self.open_kwargs)
Fcntl.flock (f, fcntl. LOCK_EX)
fnew = open (Self.filename, *self.open_args, **self.open_kwargs)
If Os.path.sameopenfile (F.fileno (), Fnew.fileno ()):
f = fnew
Self.fileobj = f
def __exit__ (self, _exc_type, _exc_value, _traceback):
Adding locks is as simple as cutting-write updates: You need an exclusive lock and then the append is done. A long-running process that will open a file for long periods of time can release the lock when it is updated and allow the other to enter.
The Spooldir pattern has a very graceful nature that it does not require any locks. In addition, you build on the use of flexible naming patterns and a robust file generation. The Mail directory specification is a good example of a spooldir model. It can be easily adapted to other situations, not just mail processing.
Persistence is a little special because it depends not only on the application, but also on the OS and the hardware configuration. Theoretically, we can assume that the Os.fsync () or Os.fdatasync () call does not return results if the data does not reach persistent storage. In reality, we may encounter several problems: we may face an incomplete fsync implementation, or a bad disk controller configuration, they will not provide any warranty of permanence. There was a discussion from the MySQL developer about where the error occurred. Some database systems, such as PostgreSQL, even provide a choice of persistence mechanisms so that administrators can choose the best one at run time. However, unlucky people can only use Os.fsync () and expect it to be implemented correctly.
By truncating-write mode, we need to send a sync signal before we close the file at the end of the write operation. Note this usually involves another level of write caching. The glibc cache will even block it inside the process before it is passed to the kernel by the write operation. Also in order to get an empty glibc cache, we need to flush it before synchronizing ():
Alternatively, you can call Python with the parameter-U to obtain an buffered write for all file I/O.
Most of the time compared to Os.fsync () I prefer os.fdatasync () to avoid synchronizing metadata updates (ownership, size, mtime ...). ）。 Metadata updates can eventually result in disk I/O search operations, which can slow the process down a lot.
Using the same technique for writing-replacement style updates is only half the success. We have to make sure that the contents of the newly written file have been written to nonvolatile memory before replacing the old file, but what about the replacement operation? We cannot guarantee that the directory update is performing just fine. There are a lot of long speeches on the web about how to sync directory updates. But in our case, the old and new files are in the same directory, we can use a simple solution to avoid this problem.
We call the underlying Os.open () to open the directory (the Python's own open () method does not support opening the directory), and then executes Os.fsync () on the directory file descriptor.
Treating the Append update is similar to the truncation-write I said.
Spooldir mode is the same directory synchronization problem as the write-replace pattern. Fortunately, you can use the same solution: the first step is to synchronize the files, and then synchronize the directories.
This makes it possible to make a reliable update file. I have demonstrated the four properties that satisfy acid. The instance code for these shows acts as a toolbox. Master this programming technology to meet your needs most. Sometimes you don't need to satisfy all the acid properties, and you probably just need one or two. I hope this article will help you to make a fully understood decision, what to achieve and what to discard.
Start building with 50+ products and up to 12 months usage for Elastic Compute Service