APFS, Apple's latest file system, through the eyes of a ZFS developer

Source: Internet
Author: User

Objective:

This article is a translation of the blog series "APFS in Detail" by Adam Leventhal, who took part in the development of the ZFS file system and is now CTO of Delphix.

APFS (Apple File System) is the new file system that Apple plans to launch in 2017 as the successor to HFS+. According to the information made public so far, it offers many of the advanced features found in other contemporary file systems and will be the single file system that unifies the entire Apple product ecosystem. It is one of Apple's most important technical updates for the next ten years. (See reference 1.)

ZFS is one of the most modern and advanced file systems known to the industry. It was originally developed by Sun Microsystems (later acquired by Oracle) for the Solaris operating system. Its features include high storage capacity, integrated file system and volume management, a lightweight file system built on a new on-disk logical structure, copy-on-write, snapshots and clones, dynamic striping, variable block sizes, encryption, and data compression. It was announced in 2004 and merged into the main Solaris source tree in 2005; it was used in production environments in 2012 and a stable version was introduced in 2013, so it took nearly 10 years to go from prototype to maturity. (See reference 2.)

The author took part in developing several of the core technologies of ZFS, and his background shows a very deep understanding of Unix-like operating systems and contemporary file systems. Through this article readers can gain a deep understanding of the technical details of APFS, and also learn about the history and development of several file systems and the technical differences between them.

So, coming from someone with such a deep technical background, is his evaluation of APFS more criticism than praise? Is it disappointment after high expectations, nit-picking, or a demand for technical excellence? Every reader will form an impression from their own point of view. Either way, we hope the article brings plenty of inspiration.

About Adam


According to publicly available information, Adam first became known as one of the three members of the DTrace development team; DTrace won the gold medal in The Wall Street Journal's 2006 Technology Innovation Awards. After graduating in 2001 he joined the Solaris kernel development team at Sun, and he left Oracle in August 2010, after Oracle had acquired Sun. During his time at Sun and Oracle he was also involved in developing a number of core technologies of the ZFS file system. (See references 3 and 4.)

Blog Original

The blog is divided into six parts. The original posts are:
APFS in Detail: Overview
APFS in Detail: Encryption, Snapshots, and Backup
APFS in Detail: Space Efficiency and Clones
APFS in Detail: Performance
APFS in Detail: Data Integrity
APFS in Detail: Conclusions

Notes on the translation:
1. "Metadata" is Apple's term for the file system's own structural data (as opposed to user data); the author uses a different term for the equivalent information in ZFS, which is rendered here as "atomic data".
2. Because of differences between English and Chinese usage, and the author's own style, some passages are rendered more explicitly than in the original in order to make the context clear.

Blog text

Here is the translation of the original blog:

Overview

Apple has just announced a new file system that will make its way into all of its OS variants (macOS, tvOS, iOS and watchOS) starting next year. Most of the media coverage so far has been little more than an expansion of Apple's developer documentation. Because I wanted to understand the details better, I attended the APFS team's presentation and Q&A session at WWDC. Two members of the team, Dominic Giampaolo and Eric Tamura, gave an overview and patiently answered questions in a packed room. Based on that information and on some first-hand use, I want to offer an analysis both as a user of Apple-ecosystem products and as a long-time operating system and file system developer.

I've divided my comments into sections and published them as several blog posts. Feel free to skip between the topics that interest you, or jump straight to the conclusions (or to my tweet summary). (Translator's note: in short, APFS is best at encryption and worst at data integrity.)

Basics

APFS, the Apple File System, was started in 2014 by a team led by Dominic Giampaolo, and was developed independently from scratch (an earlier version of this post guessed that it was built on Core Storage; Dominic corrected me). I asked Dominic whether he had looked for inspiration in other modern file systems such as BSD's HAMMER, Linux's btrfs, or OpenZFS (used by Solaris, illumos, FreeBSD, Mac OS X, Ubuntu, and others), all of which have features similar to what APFS wants to deliver. (Note that Apple once built a fairly complete port of ZFS, though Dominic apparently was not involved.) Dominic explained that, as a self-described file system developer (he built the BeOS file system; unfortunately, Apple bought NeXTSTEP instead), he knew of these technologies but did not dig into them deeply, so as not to be influenced too much.

He praised his outstanding team, which matters a great deal, because there is a well-known adage that it takes ten years to mature a file system (translator's note: the author implies that even this team cannot escape that rule). My experience with ZFS more or less bears it out. Apple intends to roll APFS out broadly after only 3-4 years of development, so it will need to mature quickly.

Paying down debt

The HFS file system was introduced in 1985, when the Mac 512K (that is 512K of memory; holy smokes!) was Apple's flagship product. HFS+, a significant update, shipped in 1998 on the G3 PowerMac with its 4GB hard drive. Since then storage capacities have grown by orders of magnitude, and HFS+ has been pulled in different directions by different groups for different devices (the iOS team secretly made its own variant of HFS, and even the Mac OS team was not fully aware of it) and with different features (journaling, case insensitivity, and so on). It is old, it is messy, and, more importantly, it lacks many features that most operating systems treat as basic, such as nanosecond timestamps, checksums, snapshots, and sparse file support. Add support for large devices and you have the list of features APFS needs to deliver.

APFS exists first of all to pay down the technical debt that Apple has accumulated in HFS+ (for comparison, ZFS was started in 2001 to replace UFS, which dates from 1977): it unifies all the variants, introduces the required features and, of course, refreshes the foundations starting from the underlying code.

Compression, a feature common in file systems, is conspicuously missing from the APFS feature list. Conceptually it is simple. To appeal to Dominic's nostalgia for BeOS, I even recalled an interview from 2000 in which he talked about how compression improves the performance of the whole system because I/O is always slower than computation (well known now, a novel idea then), and I asked the development team why the feature is not there (we included it in the initial ZFS development). The Apple employees agreed with this view and, without promising anything, strongly hinted (this is Apple secrecy at work) that compression is a feature everyone can expect in APFS. I would be surprised if compression were not included in the public release.

Encryption

Encryption is clearly a core feature of APFS. The requirement comes from the varied needs of different devices, such as multiple keys within a file system on the iPhone and per-user keys on laptops. I heard the word "innovative" many times at WWDC, and APFS's support for several different encryption options is where the word fits best. It supports:

    1. No encryption
    2. Single-key encryption of metadata and user data
    3. Multi-key encryption, with separate keys for metadata, files, and even parts of a file

Multi-key encryption is especially useful for mobile devices: all data may be encrypted, yet unlocking the device provides access to an additional key and therefore to additional data. Unfortunately, encryption does not work in the beta version of macOS Sierra (when a new encrypted volume is created with the diskutil command, the file system reports that the volume is unencrypted).

Related to encryption: while experimenting with the diskutil command (which warns that APFS data is likely to be corrupted and asks the user to confirm, unless you add the option -ihavebeenwarnedthatapfsisprereleaseandthatimaylosedata to the command line), I noticed that APFS apparently supports constant-time cryptographic file system erase, which the diskutil output calls "effaceable". This presumably means that the file system is encrypted with a key that cannot be extracted from APFS; if so, a secure erase only needs to delete the key rather than overwrite the entire disk again and again. Several iOS documents say this feature requires special hardware support, so it will be interesting to see what that hardware is on macOS. Either way, let's not tell the FBI or the NSA, agreed?
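To make the idea of cryptographic erase concrete, here is a minimal Python sketch. It is not how APFS or any Apple hardware implements it: the `EffaceableVolume` class and the `keystream_xor` helper are hypothetical names, and the cipher is a toy (SHA-256 used as a keystream, reused across files) that must not be used for real encryption. It only shows why deleting one key is enough to make everything on the "disk" unreadable.

```python
import hashlib
import os


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with SHA-256(key || counter) blocks.

    Illustration only; the same keystream is reused for every file.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + counter).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


class EffaceableVolume:
    """Hypothetical volume whose contents are only ever stored encrypted."""

    def __init__(self) -> None:
        self._key = os.urandom(32)  # the only secret; it never leaves the volume
        self._ciphertext = {}       # file name -> encrypted bytes ("the disk")

    def write(self, name: str, data: bytes) -> None:
        self._ciphertext[name] = keystream_xor(self._key, data)

    def read(self, name: str) -> bytes:
        if self._key is None:
            raise PermissionError("volume key was erased")
        return keystream_xor(self._key, self._ciphertext[name])

    def secure_erase(self) -> None:
        # Constant time: forget one 32-byte key and leave the ciphertext alone.
        self._key = None


vol = EffaceableVolume()
vol.write("notes.txt", b"secret plans")
assert vol.read("notes.txt") == b"secret plans"
vol.secure_erase()  # the ciphertext is still on "disk" but can no longer be read
```

The cost of `secure_erase` does not depend on how much data the volume holds, which is the property the "constant-time" label refers to.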

Snapshots and backups

APFS brings one of the most wanted file system features: snapshots. A snapshot preserves the state of the file system at a specific point in time, keeping the old data while the file system continues to be used and updated. This is done space-efficiently: the old data is retained and only new data consumes additional space, with changes tracked efficiently. The feature is especially valuable for backup, because it makes it possible to track efficiently which data has changed since the last backup.

ZFS includes a snapshot serialization mechanism that makes backing up a file system and sending the data to a remote machine efficient. Will APFS do the same? Dominic's answer was: probably not. ZFS sends only the changed data, while Time Machine may have exclusion lists. That can be overcome, so let's see how Apple does it. For now APFS is incompatible with Time Machine, because APFS lacks support for hard links to directories, a rather ugly mechanism that probably contributes to Time Machine's questionable reliability. Hopefully APFS will provide an efficient serialization mechanism to support Time Machine backups.
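The idea of sending only the data that changed between two snapshots can be sketched in a few lines of Python. This is a simplified model, not the ZFS send format: snapshots are assumed to be plain dictionaries from block number to contents, the function names `changed_blocks` and `apply_stream` are made up for the illustration, and changes are found by comparing contents, whereas a real file system would track which blocks were written after the older snapshot.

```python
def changed_blocks(old_snapshot: dict, new_snapshot: dict) -> dict:
    """Return only what a receiver holding old_snapshot needs to reach new_snapshot.

    Each snapshot maps a block number to that block's contents.
    """
    writes = {n: data for n, data in new_snapshot.items()
              if old_snapshot.get(n) != data}
    frees = [n for n in old_snapshot if n not in new_snapshot]
    return {"writes": writes, "frees": frees}


def apply_stream(snapshot: dict, stream: dict) -> dict:
    """Replay an incremental stream on top of a copy of the older snapshot."""
    result = dict(snapshot)
    result.update(stream["writes"])
    for n in stream["frees"]:
        del result[n]
    return result


monday = {0: b"boot", 1: b"thesis v1", 2: b"photo"}
tuesday = {0: b"boot", 1: b"thesis v2", 3: b"notes"}

stream = changed_blocks(monday, tuesday)
# Block 0 is unchanged, so it is not part of the stream at all.
assert apply_stream(monday, stream) == tuesday
```

An exclusion list of the kind Time Machine uses would have to be applied as an extra filter on `writes`, which is the complication the paragraph above alludes to.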

The tools needed to use snapshots are not included in the macOS Sierra beta, but Eric Tamura did demonstrate snapshots at WWDC. Using DTrace (a technology Apple ported from OpenSolaris) I found a tantalizingly named new system call, fs_snapshot. I will leave it to others to reverse engineer its proper use.

Management

APFS has another new feature called space sharing. A single APFS container that spans a device can hold multiple volumes. Apple contrasts this with the fixed allocation of space needed to support multiple HFS+ instances, a comparison that seems more showy than substantial; in fact both ZFS and btrfs have a similar concept of a shared storage pool with nested file systems.

Talking with Dominic and other members of the APFS team, we discussed how volumes are the unit at which users control snapshots and encryption. You may want different volumes to follow different policies: for example, you want to snapshot and back up system data, but there is no need to back up (or snapshot) the /private/var/vm/sleepimage file, which stores the contents of memory during hibernation.

Space sharing is more of an operational convenience than a key feature. You can think of a volume as a special folder with its own snapshots and encryption, which is why Apple's marketing department has not recruited me yet. (Unfortunately, this feature does not work in the macOS Sierra beta: adding a volume to create a multi-volume container produces an unknown error (do you know what 69625 means?), and an oversized disk image can trigger the same error. Translator's note: the original author has since deleted this remark from the blog.)

Space efficiency

The trend in modern file systems is to store data more efficiently rather than simply add storage capacity. The common techniques are compression (mentioned earlier) and deduplication. Deduplication detects identical blocks and avoids storing them more than once. It works well for file servers, where many users or many virtual machines may hold multiple copies of the same file, but is probably of little use to the single user or handful of users of a typical Apple device (yes, Apple does have a server product, but not really). My experience supporting ZFS also taught me that deduplication is very hard to do well.
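As an illustration of block-level deduplication in general (APFS does not offer it, and this is not how ZFS implements it), the following Python sketch stores each distinct block once, keyed by its SHA-256 digest. The `DedupStore` class and its 4096-byte default block size are assumptions made for the example.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: identical blocks are kept only once."""

    def __init__(self, block_size: int = 4096) -> None:
        self.block_size = block_size
        self.blocks = {}  # digest -> block contents, stored once
        self.files = {}   # file name -> ordered list of digests

    def write(self, name: str, data: bytes) -> None:
        digests = []
        for off in range(0, len(data), self.block_size):
            block = data[off:off + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)  # no-op if already stored
            digests.append(digest)
        self.files[name] = digests

    def read(self, name: str) -> bytes:
        return b"".join(self.blocks[d] for d in self.files[name])

    def stored_bytes(self) -> int:
        return sum(len(b) for b in self.blocks.values())


store = DedupStore()
image = b"A" * 4096 + b"B" * 4096
store.write("vm1.img", image)
store.write("vm2.img", image)  # second copy adds no new blocks
assert store.stored_bytes() == 8192
```

Even this toy hints at why the real thing is hard: every write needs a hash computation and a table lookup, and the table of digests has to stay fast as it grows with the amount of stored data.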

Apple's particular contribution to space efficiency is constant-time cloning of files and folders. As an aside, on macOS a "file" is in many cases actually a folder: this is a convenient way of logically grouping multiple files into one inseparable unit. Right-click an application and choose "Show Package Contents" and you will see what I mean. So I will say "file" rather than "file and folder" to spare the reader's patience (translator's note: the author is joking that the article is already long enough).

If you copy a file within the same file system (or possibly the same container) in APFS, no data is duplicated. Instead, a fixed amount of metadata is updated and the existing on-disk data is shared. Modifying either copy causes new space to be allocated (copy-on-write, or COW for short).
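The behavior described above can be modeled with reference-counted blocks. The Python sketch below is an assumption-laden toy, not APFS's on-disk design: `CloneFS` and its methods are hypothetical, and `clone` copies a per-file list of block ids, so it is proportional to the number of blocks rather than truly constant time (a real implementation would share a single pointer to the file's extent structure).

```python
class CloneFS:
    """Toy file system: files are lists of block ids; blocks are reference counted."""

    def __init__(self) -> None:
        self.blocks = {}    # block id -> contents
        self.refcount = {}  # block id -> number of files pointing at it
        self.files = {}     # file name -> list of block ids
        self._next_id = 0

    def _alloc(self, data: bytes) -> int:
        block_id = self._next_id
        self._next_id += 1
        self.blocks[block_id] = data
        self.refcount[block_id] = 1
        return block_id

    def _release(self, block_id: int) -> None:
        self.refcount[block_id] -= 1
        if self.refcount[block_id] == 0:  # last reference gone: space is freed
            del self.refcount[block_id]
            del self.blocks[block_id]

    def create(self, name: str, chunks: list) -> None:
        self.files[name] = [self._alloc(c) for c in chunks]

    def clone(self, src: str, dst: str) -> None:
        # Metadata only: share the existing blocks and bump their refcounts.
        ids = list(self.files[src])
        for block_id in ids:
            self.refcount[block_id] += 1
        self.files[dst] = ids

    def write_block(self, name: str, index: int, data: bytes) -> None:
        # Copy-on-write: allocate a new block instead of touching a shared one.
        old = self.files[name][index]
        self.files[name][index] = self._alloc(data)
        self._release(old)

    def read(self, name: str) -> bytes:
        return b"".join(self.blocks[b] for b in self.files[name])

    def delete(self, name: str) -> None:
        for block_id in self.files.pop(name):
            self._release(block_id)

    def used_blocks(self) -> int:
        return len(self.blocks)


fs = CloneFS()
fs.create("thesis", [b"intro ", b"body ", b"end"])
fs.clone("thesis", "thesis backup")
assert fs.used_blocks() == 3          # the clone consumed no data blocks
fs.write_block("thesis backup", 1, b"BODY ")
assert fs.used_blocks() == 4          # only the modified block is new
assert fs.read("thesis") == b"intro body end"
```

Deleting "thesis" at this point would free only one block, because the other two are still referenced by the clone.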

This feature demos well, and I had not seen it in other file systems (update: btrfs supports it and calls it reflinks, i.e. reference links), but I have doubts about how useful it is. Copying data between devices, for example to a USB flash drive, still takes time. Why would I copy a file locally? The common case I can think of is the layman's version control: "thesis", "thesis backup", "thesis old", and "thesis saved while editing drunk".

There are basically three types of files:

    • Files that are always completely rewritten: images, Microsoft Office documents and video files, etc.
    • Files that are appended to, such as log files
    • Files based on a record structure, such as a database

For ordinary users, the vast majority of files are of the first type. With APFS I can copy a document and benefit from space sharing, but as soon as the new version is saved the benefit is gone. The idea may work better for large files.

Personally, my only use case is taking a file, say a "fair use" copy of an episode of Game of Thrones, and copying it to my Dropbox folder. Today I have to decide whether or not to move the file into Dropbox; a clone would let me avoid that choice, but so would a hard link.

Cloning can also cause confusion: copying a file may consume no storage, and deleting a file may free none. Imagine having to track down and remove every clone of a large file in order to free up space.

The APFS engineers did not seem to have many practical use cases in mind; at WWDC they asked developers for suggestions (the best I heard was cloning virtual machines, which is hardly the mainstream market). If the feature is mainly about versioning, I am surprised that Apple did not offer a more elegant solution. Imagine APFS letting a user browse the history of any file in a per-file Time Machine that creates a new version automatically and transparently every time the file changes. You could browse previous versions, review the modification history, or delete all versions at once. In fact, Apple introduced something like this 5 years ago, but I had never heard of it until I researched this post (check whether you have ever clicked "Browse All Versions..."). APFS could make it more concise, simpler to use, and universally available to all applications. None of this solves my Game of Thrones storage problem, and of course I don't think that is really a problem.
Side note: copies made in the Finder use space-efficient cloning, but the cp command does not.

Performance

APFS claims to be optimized for flash. Flash memory (NAND) is what sits inside your fast SSD. When Apple put flash into the iPod and iPhone and changed those industries, the huge volumes involved changed the economics of flash. As a consumer of flash, Apple reshaped the industry (as it often does), helping hybrid drives and all-flash arrays gain ground in the market. Ten years ago SSDs were about as expensive as DRAM; now they compete with hard drives.

SSDs mimic the block addressing of traditional hard drives, but internally they work completely differently. Magnetic media can read and write any sector directly, whereas flash reads and writes in pages but can only erase in much larger blocks. This is managed by the flash translation layer (FTL), which resembles a file system: it maintains a virtual mapping between block addresses and physical locations so that page and block operations look like a hard disk's sector operations. Apple controls the whole stack (SSD, FTL, and file system) and could optimize these parts to work together in a completely different way. APFS, however, appears only to be aware of the characteristics of NAND: it is a file system with flash-friendly features rather than one written for the flash interface, which is what you might have expected in 2016.
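A toy model may help show why an FTL "resembles a file system". The Python sketch below is a deliberately simplified assumption, not any vendor's firmware: `ToyFTL` maps logical pages to physical pages, never overwrites in place, and reports which erase blocks contain only stale pages. Garbage collection and wear leveling are left out.

```python
class ToyFTL:
    """Toy flash translation layer: out-of-place writes plus a logical-to-physical map."""

    def __init__(self, blocks: int, pages_per_block: int) -> None:
        self.pages_per_block = pages_per_block
        total = blocks * pages_per_block
        self.flash = [None] * total   # physical pages; None means erased
        self.valid = [False] * total  # does the page hold current data?
        self.mapping = {}             # logical page -> physical page
        self.next_free = 0            # pages are consumed strictly in order

    def write(self, logical: int, data: bytes) -> None:
        if self.next_free >= len(self.flash):
            raise RuntimeError("no erased pages left; garbage collection needed")
        old = self.mapping.get(logical)
        if old is not None:
            self.valid[old] = False   # the stale copy stays until its block is erased
        phys = self.next_free
        self.next_free += 1
        self.flash[phys] = data
        self.valid[phys] = True
        self.mapping[logical] = phys

    def read(self, logical: int) -> bytes:
        return self.flash[self.mapping[logical]]

    def erasable_blocks(self) -> list:
        """Erase blocks that are fully written and hold no valid pages."""
        n = self.pages_per_block
        result = []
        for b in range(len(self.flash) // n):
            pages = range(b * n, (b + 1) * n)
            written = all(self.flash[p] is not None for p in pages)
            if written and not any(self.valid[p] for p in pages):
                result.append(b)
        return result


ftl = ToyFTL(blocks=2, pages_per_block=2)
ftl.write(0, b"a")
ftl.write(1, b"b")
ftl.write(0, b"c")  # "overwrite" lands on a fresh page in the second block
ftl.write(1, b"d")
assert ftl.read(0) == b"c"
assert ftl.erasable_blocks() == [0]  # the first block now holds only stale pages
```

The host sees a disk where logical page 0 was simply overwritten, while the flash underneath has written four pages and still needs to erase a whole block to get any of them back.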

APFS includes support for TRIM, a command in the ATA protocol that lets the file system tell the SSD (more precisely, the FTL) that a range of space is free. SSDs need a certain amount of free space to perform well. They have more physical capacity than they advertise: for example, a "1TB" SSD may actually contain 1 TiB (2^40 = 1024^4 bytes) of flash but expose only 931 GiB, quietly matching the storage industry's self-serving definition of 1TB (1000^4 = 1 trillion bytes). With this spare space the FTL can deliver high performance and a long life. TRIM is a required feature for a modern file system, so it is no surprise that APFS supports it. The catch is that TRIM only helps, particularly for performance, when there is free space to release. If your disk is nearly full, TRIM can do nothing for you. I doubt that TRIM brings APFS much benefit beyond being a placebo for users.
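The capacity figures in this paragraph are easy to check. The short Python calculation below assumes, as the example in the text does, a drive with exactly 2^40 bytes of raw flash that advertises 10^12 bytes; real drives vary in how much spare capacity they hold back.

```python
TIB = 2 ** 40  # 1024**4 bytes of raw NAND (assumed for this example)
TB = 10 ** 12  # 1000**4 bytes, the advertised "1TB"
GIB = 2 ** 30

advertised_gib = TB / GIB      # what the operating system reports in binary units
spare_bytes = TIB - TB         # flash the FTL keeps for itself
spare_fraction = spare_bytes / TB

print(f"advertised capacity: {advertised_gib:.2f} GiB")
print(f"spare flash: {spare_bytes} bytes ({spare_fraction:.2%} of advertised)")
```

Under these assumptions the drive shows roughly 931 GiB and keeps just under 10% of the advertised capacity in reserve, which is the headroom the FTL works with when the file system is full and TRIM has nothing left to report.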

APFS also pays attention to latency. Apple's primary goal is to avoid the spinning beachball of death. APFS uses I/O QoS (quality of service) to prioritize accesses, so that requests visible to the user take precedence over background activity that is not time-sensitive. This is an undeniable benefit to users and a mark of a mature file system.
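I/O prioritization of this kind can be sketched with a priority queue. The Python example below is only an illustration of the general idea: the `IOScheduler` class and the two QoS constants are hypothetical, and nothing here reflects how APFS or the macOS I/O stack actually schedules requests. In particular, it does nothing to stop background work from starving.

```python
import heapq
import itertools

# Hypothetical QoS classes: a lower number is served first.
USER_INTERACTIVE = 0
BACKGROUND = 1


class IOScheduler:
    """Serve user-visible requests before background ones, FIFO within a class."""

    def __init__(self) -> None:
        self._queue = []
        self._seq = itertools.count()  # tie-breaker that preserves arrival order

    def submit(self, qos: int, request: str) -> None:
        heapq.heappush(self._queue, (qos, next(self._seq), request))

    def next_request(self) -> str:
        _qos, _seq, request = heapq.heappop(self._queue)
        return request

    def __len__(self) -> int:
        return len(self._queue)


sched = IOScheduler()
sched.submit(BACKGROUND, "index files for search")
sched.submit(USER_INTERACTIVE, "open document")
sched.submit(BACKGROUND, "write backup chunk")

order = [sched.next_request() for _ in range(len(sched))]
assert order[0] == "open document"  # the user's request jumps the queue
```

The sequence counter matters: without it, two requests of the same class would be compared by their payload, and arrival order would be lost.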

Data integrity

There is no doubt that the most important job of a file system is to protect the integrity of data: "This is my data; don't lose it, and don't change it behind my back." If the file system could be fully trusted, the only remaining reason for backup would be operator error. File systems have several mechanisms for keeping data safe.

Redundancy

APFS makes no claims about data redundancy. As Eric Tamura said at WWDC, the vast majority of Apple products have a single storage device (one logical SSD), which makes RAID moot. Redundancy is instead provided at lower layers, such as Apple RAID, hardware RAID controllers, SANs, or even within the single storage device itself.

As an aside, the SSDs in many products that run APFS are made up of multiple independent NAND chips, and high-end SSDs do implement redundancy inside the device, at the cost of capacity and performance. As noted above, APFS's SSD optimization does not reach below the block interface, so such redundancy remains a function of the hardware itself.

Moreover, APFS takes away the usual way for a user to get data redundancy: copying files. Copying a file actually produces a lightweight clone, not a second copy of the data. If the underlying storage is damaged, all the "copies" are damaged, whereas with full local copies the damage might affect only one of them.

Consistency

Computer systems can fail at any time through crashes, bugs, and power loss, so a file system must be able to recover from these situations. The oldest approach is to run a tool at startup that checks the file system and repairs inconsistencies (fsck, short for file system check). More modern systems either use an always-consistent on-disk format or narrow the window of inconsistency, reducing the need for a full fsck pass. ZFS, for example, builds the new state on disk and then switches to it with a single atomic operation, so the on-disk state is always consistent.

Overwriting data in place can produce inconsistency: if the file system needs to rewrite several regions, there is a window in which some regions are new and others are old. Copy-on-write (COW) avoids this by always allocating and writing new regions first and only then releasing the old ones, rather than modifying the original data. APFS claims a "novel copy-on-write metadata scheme"; Dominic emphasized its novelty but did not elaborate. Later in the conversation he made it clear that APFS does not use the ZFS mechanism of copying all the metadata above the changed data, which is what allows ZFS to update the file system structure in a single atomic operation.
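The ZFS-style mechanism mentioned here, copying everything on the path from the changed data up to the root and then switching the root in one step, can be sketched with immutable tuples in Python. This is a conceptual model only: `CowTree` is a hypothetical two-level structure, a Python attribute assignment stands in for the atomic on-disk root update, and APFS's own scheme is, per the text, different and undisclosed.

```python
class CowTree:
    """Two-level tree (root -> leaves -> values) whose nodes are never modified in place."""

    def __init__(self, leaves) -> None:
        self.root = tuple(tuple(leaf) for leaf in leaves)

    def update(self, leaf_index: int, slot: int, value):
        old_root = self.root
        # 1. Write a new copy of the leaf next to the old one.
        leaf = list(old_root[leaf_index])
        leaf[slot] = value
        new_leaf = tuple(leaf)
        # 2. Write a new root that points at the new leaf and at the
        #    untouched old leaves (those are shared, not copied).
        new_root = old_root[:leaf_index] + (new_leaf,) + old_root[leaf_index + 1:]
        # 3. The single switch: until this line every reader sees the old
        #    tree, afterwards every reader sees the new one.
        self.root = new_root
        return old_root  # still a complete, consistent view of the old state


tree = CowTree([["a", "b"], ["c", "d"]])
before = tree.update(1, 0, "X")

assert tree.root == (("a", "b"), ("X", "d"))
assert before == (("a", "b"), ("c", "d"))  # the old state was never damaged
assert tree.root[0] is before[0]           # the unchanged leaf is shared
```

A crash before step 3 leaves the old root in place and a crash after it leaves the new one, so there is no intermediate state for an fsck-style tool to repair. Keeping `before` around is also, in effect, a snapshot.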

Surprisingly, APFS also comes with an fsck_apfs tool, and even after asking Dominic I do not understand why it is needed. In ZFS there is no fsck, because the file system itself knows when something is wrong rather than relying on a separate tool to find out. Dominic seemed puzzled as to why ZFS would give up fsck, so maybe this is just me.

Checksums

Checksums were conspicuously absent from the introduction to APFS. A checksum is a digest or summary of data that is used to detect (and possibly correct) data errors. The story here is subtle. APFS checksums only its own metadata, not user data. The case for checksumming metadata is strong: metadata is small (so the checksums take up little space), and losing it can mean losing a great deal of data. If high-level metadata is damaged, all the data on the disk can become unreadable. This is exactly why ZFS keeps duplicate copies of metadata and three copies of top-level metadata.
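The basic mechanics of a checksum, detecting that a block no longer matches what was written, fit in a few lines of Python. CRC-32 from the standard `zlib` module is used here purely as a convenient stand-in; it is not the checksum used by APFS or ZFS, and the helper names are made up for the example.

```python
import zlib


def store(block: bytes):
    """Return the block together with the checksum computed at write time."""
    return block, zlib.crc32(block)


def verify(block: bytes, checksum: int) -> bool:
    """Recompute the checksum at read time and compare it with the stored one."""
    return zlib.crc32(block) == checksum


def flip_bit(block: bytes, bit: int) -> bytes:
    """Simulate silent corruption by flipping a single bit."""
    damaged = bytearray(block)
    damaged[bit // 8] ^= 1 << (bit % 8)
    return bytes(damaged)


data, checksum = store(bytes(range(256)) * 16)  # one 4096-byte block
assert verify(data, checksum)

rotted = flip_bit(data, 12345)
assert len(rotted) == len(data)      # the device would report nothing unusual
assert not verify(rotted, checksum)  # but the mismatch is detected on read
```

Detection is all a checksum provides on its own. Repairing the block requires a second, intact copy somewhere, which is why checksums and redundancy are usually discussed together.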

More interesting is why user data is not checksummed. The APFS engineers I talked to stressed that Apple's storage devices have strong ECC. Both SSDs and magnetic disks use redundant data to detect and correct errors, and the engineers maintained that Apple devices do not return bad data. NAND uses extra data, for example 128 bytes per 4KB page, so that errors can be detected and corrected. (For comparison, ZFS uses a 32-byte checksum for blocks from 512 bytes upward, which is small in comparison; but note that an SSD needs its ECC just to cope with the expected analog variation in the media.) The devices are specified with a bit error rate low enough that no errors are expected over the life of a device. Even so, there are other sources of error, and that is where file system checksums would be invaluable. SSDs have many components, and high-volume consumer products do not provide end-to-end ECC protection, so data can be corrupted in transit, to say nothing of firmware bugs that can lose data.

The Apple employees were interested in my experience with bit rot, in which data silently loses integrity over time. I have seen many cases where a device reported no error but ZFS correctly detected corrupted data. Apple qualifies its device suppliers rigorously, and I agree that its components are of high quality. The engineers claimed that users of Apple products have nothing to fear from bit rot; but if your software cannot detect errors, how do you know how your devices really perform in the field? ZFS has found data errors on storage arrays costing millions of dollars, and I would be amazed if it found none on Apple's TLC NAND chips. Recall the recent recall over storage problems in the iPhone 6: Apple devices have in fact gone wrong.

As someone who keeps data he cares about on a Mac, who has lost data on HFS, and who has seen even expensive enterprise-class devices lose data, I would gladly give up 16 bytes per 4KB in exchange for data integrity. That is well under 1% of the device's capacity.
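The cost of that trade is simple arithmetic, shown here as a small Python calculation. The 16-byte checksum per 4KB block comes from the text; the 256 GB device is just an example size chosen for illustration.

```python
CHECKSUM_BYTES = 16
BLOCK_BYTES = 4096

overhead = CHECKSUM_BYTES / BLOCK_BYTES  # fraction of capacity spent on checksums


def checksum_cost(device_bytes: int) -> int:
    """Bytes of a device that per-block checksums would occupy."""
    return device_bytes * CHECKSUM_BYTES // BLOCK_BYTES


print(f"overhead: {overhead:.2%}")
print(f"on a 256 GB device: {checksum_cost(256 * 10**9)} bytes")
```

The overhead works out to 1/256 of capacity, about 0.4%, or 1 GB on a 256 GB device.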

Scrub

As time goes by you may want to check for bit rot, and it seems fsck_apfs can do that. But as mentioned earlier, with no redundancy and no checksums on user data, a scrub can only help find errors, not repair them. And what if I buy a bargain drive from Fry's instead of Apple's gold-plated premium parts? Might that persuade Apple to change its mind (that is, to drop the assumption that Apple users only use Apple hardware, and to add features such as user data checksums and redundancy)?

Conclusions

I am not sure that Apple absolutely had to replace HFS+, but the impression they give is that maintaining software more than 30 years old had become more expensive than writing something new. APFS is the result.

Based on Apple's presentation, I surmise that its core goals are:
- Satisfy all users (laptop, phone, and watch)
- Encryption as a first-class feature
- Snapshots for modern backup

All of this will benefit every Apple user and, based on the WWDC demos, APFS is on the right track (although the macOS Sierra beta still has a long way to go).

In the process of building a new file system, the APFS team added some expected features. HFS dates from the era when the 400KB floppy disk (the obsolete but ubiquitous save icon) ruled the world, and any file system started after 2014 should take large storage devices and SSDs into account. Copy-on-write (COW) and snapshots are essential features. Faster copies in the Finder are not exactly a detour, but the use cases are unclear; it looks like a classic solution in search of a problem, though it still makes an interesting demo. The spinning beachball is what APFS really wants to avoid.

Some design goals seem to be missing or secondary: performance, openness, and data integrity. Squeezing maximum throughput out of a device is not very important on watchOS and matters only to a small subset of macOS users; everyone will be interested to see how APFS performs when it is released (premature comparisons are misleading and unfair to the team). The APFS developer documentation has this to say about open source: "No open source at this time". I do not expect APFS to be open sourced now or in the future, but I would be happy for Apple to prove me wrong. If APFS becomes a world-class product I would like to see it in Linux and FreeBSD, and perhaps even Microsoft would give up its own ReFS. In my experience with ZFS, open source accelerates the path to excellence. It is a pity that APFS lacks checksums for user data and does not provide data redundancy. Data integrity is job one for a file system, and I believe it matters just as much on watches and phones as on servers.

APFS also still needs to improve in stability, and that matters to every Apple user on every device. At this point success and failure are equally possible. Apple has shared APFS with the world and with developers at an early stage; having decided over the last few years to start from first principles rather than adopt an existing modern technology, it should now take data integrity and openness seriously. I am impressed by Apple's plan to move to APFS within 18 months, and the transition will be exciting however it goes.

References:
    1. APFS: https://en.wikipedia.org/wiki/Apple_File_System
    2. ZFS: https://en.wikipedia.org/wiki/ZFS
    3. Adam Leventhal (programmer): https://en.wikipedia.org/wiki/Adam_Leventhal_(programmer)#cite_note-4
    4. Adam's LinkedIn Introduction: HTTPS://WWW.LINKEDIN.COM/IN/ADAMLEVENTHAL?AUTHTYPE=NAME&AUTHTOKEN=USBC
