File system consistency is related to the method of the application, and if you do not call sync (), you need to do a good job of missing some of the data due to client exceptions or service-side failures. Missing data This is unacceptable to the application. So you need to call Sync () in the right place, for example, after writing a certain amount of data, although sync () is used to minimize the burden of HDFs, he still has a negligible overhead. So you need to weigh the robustness and throughput of your data, one of the good points of balance: Test your application to choose the balance between sync frequency performance
One of the design goals of Hadoop is the ability to store data on a reliable distributed cluster, where HDFS allows data loss, so data backup is significant. What data is backed up and where the data is backed up is critical. In the backup process, the most preemptive backup should be those that are not recoverable, the business value of important data emphasis: do not think that the HDFS copy mechanism can replace the data backup
The importance of consistency model to system design, data backup