Is the file actually written to the disk?

Author: eaglet

Please credit the source when reposting.

 

After writing a file, we call FileStream.Close or FileStream.Flush, or we wrap the write in using (FileStream fs = new FileStream(...)) {}. Is the file actually written to the disk at that point? Most people would say yes, but I want to tell you: not necessarily!

 

Background

My company has thousands of computers running our system at the same time. In production we found that files we had written were sometimes entirely or partially filled with zeros, even though the program had definitely closed the file handle. This problem plagued me for a long time. It occurs with a probability of about 1‰, mostly when a machine restarts: after we update the software, the machine restarts automatically, and afterwards some of the updated files have the correct size but contain only zeros.

Problem Analysis

The first thing to consider was a bug in our program. But after analysis we found nothing wrong with it. The problem occurred even when we wrote files with static helper methods such as File.WriteAllBytes.

Is it because of the disk cache? In Windows, "Enable write caching on the device" can be set for every disk. If this option is enabled, a write first goes to the disk cache and is written to the physical disk only when a size or time threshold is reached. If the computer restarts before the content has been completely written to the disk, data is lost. The setting looks like this:

[Figure: the disk's write-caching policy settings]

 

 

So I disabled this option, but the problem persisted.

What the hell is going on?

In addition to the cache at the hardware level of the disk, Windows also has a file cache at the operating system level.

This feature is called File Caching and has been available since Windows 2000. It is shown in the following figure:

[Figure: Windows file caching]

 

 

When a process writes data to a disk, the file is first cached in the System File Cache according to certain policies, and it is written to the physical disk only after a certain threshold is reached. Because the System File Cache is transparent to the application, calling Close or Flush on the file only ensures that the data has reached the operating system's file cache. It does not ensure that the data has actually been written to the disk. This mechanism gives good write performance, but it increases the risk of data loss. From the application's point of view the write has succeeded, yet the data may not be on the physical disk, and the software has no way to know whether it is. This leads to a lot of confused logic. In particular, some service processes use lock files to coordinate multiple processes, such as mongoe.net and mongodb, and they often find their lock files still locked after a restart. I suspect this is also related to this mechanism.
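The three write patterns mentioned so far can be sketched as follows. This is a minimal illustration, not the code from our system: the class name CachedWriteSketch and its method names are hypothetical, and only the File and FileStream calls are real .NET API. All three return once the data has been handed to the operating system, which is the situation described above.

```csharp
using System;
using System.IO;

// Hypothetical helper class for illustration only.
public static class CachedWriteSketch
{
    // Pattern 1: static helper. Opens, writes and closes the file.
    public static void WriteWithHelper(string path, byte[] data)
    {
        File.WriteAllBytes(path, data);
    }

    // Pattern 2: using block. Dispose flushes FileStream's buffer
    // and closes the handle when the block ends.
    public static void WriteWithUsing(string path, byte[] data)
    {
        using (FileStream fs = new FileStream(path, FileMode.Create, FileAccess.Write))
        {
            fs.Write(data, 0, data.Length);
        }
    }

    // Pattern 3: explicit Flush and Close.
    public static void WriteWithFlushAndClose(string path, byte[] data)
    {
        FileStream fs = new FileStream(path, FileMode.Create, FileAccess.Write);
        try
        {
            fs.Write(data, 0, data.Length);
            fs.Flush(); // empties FileStream's own buffer into the OS
        }
        finally
        {
            fs.Close();
        }
    }
}
```

Reading the file back immediately after any of these calls will normally return the correct bytes, because the read is typically served from the same operating system cache. That check therefore says nothing about what is on the physical disk.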

 

So what are the advantages of this mechanism?

Of course, Microsoft provides this mechanism for a reason. Its biggest advantage is that it greatly improves read performance. We can do the following experiment:

Open a large file and read it sequentially. The first read after the system starts up is very slow and is limited mainly by the disk's read speed, because nothing is cached yet. However, if we exit the process and run it again to read the same large file, the read is about a hundred times faster than before, whether it is sequential or random. This is because the operating system cache comes into play and the data is read from memory. The cache is global, so the file cache is not cleared when the process exits. The new version of my open-source full-text index project HubbleDotNet makes full use of this mechanism, greatly improving the speed of reading the index for the first time after the machine restarts.

How the Windows file caching mechanism works and how to tune it are outside the scope of this article. I will cover them in future articles.
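The experiment can be reproduced with a small timing helper like the one below. It is only a sketch: ReadTimingSketch and ReadSequentially are hypothetical names, the 64 KB buffer is an arbitrary choice, and the measured times depend entirely on the machine and on what is already cached.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical helper class for illustration only.
public static class ReadTimingSketch
{
    // Reads the whole file sequentially, returns the number of bytes read,
    // and reports the wall-clock time taken through the out parameter.
    public static long ReadSequentially(string path, out TimeSpan elapsed)
    {
        byte[] buffer = new byte[64 * 1024]; // arbitrary read size
        long total = 0;
        Stopwatch watch = Stopwatch.StartNew();
        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            int read;
            while ((read = fs.Read(buffer, 0, buffer.Length)) > 0)
            {
                total += read;
            }
        }
        watch.Stop();
        elapsed = watch.Elapsed;
        return total;
    }
}
```

To repeat the experiment, call ReadSequentially on a large file from a freshly started machine, exit the process, start it again, and compare the two elapsed values.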

Solution

Back to this question

Can we disable this file cache? The answer is no. Fortunately, Windows provides applications with a flag called FILE_FLAG_WRITE_THROUGH. With this flag, data written by the application goes to the disk directly at the same time as it is written to the cache.

The code in C# is as follows:

using (FileStream fs = new FileStream(filePath, FileMode.Create, FileAccess.Write, FileShare.None, 8192, FileOptions.WriteThrough))
{
    fs.Write(data, 0, data.Length); // data: the byte[] to write (name assumed here)
}

 

After we made this change in the program, more than 2,000 machines ran for six months without a single case of file data loss (previously it happened several times a month), which basically proves that the approach is effective.

In-depth questions:

1. Will data still be lost if the disk cache is left enabled after WriteThrough is adopted?

Look at the disk settings figure above. The disk cache has two checkboxes. The first is whether to enable write caching on the device, and the second is whether to turn off Windows write-cache buffer flushing on the device. If the second option is checked, data may be lost. If it is not checked, no data is lost.

By default, the second option is not checked. In our actual tests, the disk cache did not cause data loss.

2. Does WriteThrough reduce disk write performance?

I think random writes may be affected. For sequential writes, however, the FileStream class already buffers data itself, so the impact is small. The exception is when you call the Windows file API directly and each write is small. That does hurt performance, because every write operation triggers a physical disk write.
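One way to check this on your own hardware is to time many small sequential writes through a WriteThrough stream with different FileStream buffer sizes. The sketch below assumes hypothetical names (SmallWriteSketch, WriteRecords), and passing a buffer size of 1 is used only to approximate unbuffered writes. No timings are quoted here because they vary widely between disks.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical helper class for illustration only.
public static class SmallWriteSketch
{
    // Writes recordCount records of recordSize bytes each through a
    // WriteThrough FileStream and returns the elapsed wall-clock time.
    public static TimeSpan WriteRecords(string path, int recordCount, int recordSize, int bufferSize)
    {
        byte[] record = new byte[recordSize];
        Stopwatch watch = Stopwatch.StartNew();
        using (FileStream fs = new FileStream(path, FileMode.Create, FileAccess.Write,
                                              FileShare.None, bufferSize, FileOptions.WriteThrough))
        {
            for (int i = 0; i < recordCount; i++)
            {
                // With a large bufferSize, FileStream collects several records
                // before handing them to the operating system in one write.
                fs.Write(record, 0, record.Length);
            }
        }
        watch.Stop();
        return watch.Elapsed;
    }
}
```

For example, compare WriteRecords(path, 10000, 16, 8192) with WriteRecords(path, 10000, 16, 1). If the reasoning above holds, the second call should be noticeably slower, since each small record reaches the operating system as its own write.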
