Using ZFS on Linux (1)

This article explores two methods for using ZFS on Linux. The first uses Filesystem in Userspace (FUSE) to push the ZFS file system into user space, which avoids the licensing issues. The second is a native port of ZFS that integrates into the Linux kernel while avoiding the intellectual property issues.

Linux has an interesting relationship with file systems. Because Linux is open, it tends to be a key development platform both for next-generation file systems and for innovative file system ideas. Two interesting recent examples are the massively scalable Ceph and the continuous-snapshot file system nilfs2 (along with, of course, the evolution of mainstay file systems such as the fourth extended file system, ext4). Linux also serves as an archaeological site for legacy file systems: DOS VFAT, Macintosh HFS, VMS ODS-2, and the Plan 9 remote file system protocol. Yet among all the file systems supported in Linux, one generates considerable interest because of the features it implements: the Zettabyte File System (ZFS).

ZFS was designed and developed by Sun Microsystems (under Jeff Bonwick). It was first announced in 2004 and integrated into Sun Solaris in 2005. Although pairing the most popular open operating system with the most talked-about, feature-rich file system would be an ideal match, licensing issues have restricted the integration. Linux is protected by the GNU General Public License (GPL), while ZFS is covered by Sun's Common Development and Distribution License (CDDL). These license agreements have different goals and introduce conflicting restrictions. Fortunately, this does not mean that you as a Linux user cannot enjoy ZFS and its features.

ZFS Introduction

Calling ZFS a file system is a bit of a misnomer, because it is much more than a file system in the traditional sense. ZFS combines the concepts of a logical volume manager with a feature-rich, massively scalable file system. Let's start by exploring some of the principles on which ZFS is based. First, ZFS uses a pooled storage model instead of the traditional volume-based model. This means that ZFS views storage as a shared pool that can be dynamically allocated (and shrunk) as needed. This is an advantage over the traditional model, in which a file system resides on a volume and an independent volume manager administers these assets. Embedded within ZFS is an implementation of an important set of features, such as snapshots, copy-on-write clones, continuous integrity checking, and data protection through RAID-Z. Furthermore, you can use your favorite file system (such as ext4) on top of a ZFS volume. This means that you get ZFS features such as snapshots on an independent file system that may not support them directly.

However, ZFS is not just a collection of features that make up a useful file system. Rather, it is a collection of integrated and complementary features that together make an outstanding file system. Let's take a look at some of these features and then see some of them in practice.

Storage pool

As discussed earlier, ZFS incorporates volume management to abstract the underlying physical storage devices from the file system. Rather than viewing physical block devices directly, ZFS operates on storage pools (called zpools). A storage pool is built from virtual drives, each of which can be physically represented by a drive or part of a drive. In addition, these pools can be constructed dynamically, even while the pool is in active use.
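
The pooled model can be pictured with a small sketch. This is only a toy under stated assumptions, not how ZFS tracks space internally: the class name StoragePool, its methods, and the device and dataset names are all hypothetical, and capacities are plain byte counts.

```python
class StoragePool:
    """Toy pooled-storage model: every dataset draws from one shared budget."""

    def __init__(self):
        self.vdevs = {}  # hypothetical virtual device name -> capacity in bytes
        self.used = {}   # hypothetical dataset name -> bytes currently allocated

    def add_vdev(self, name, capacity):
        # A device can be added while the pool is in use; the extra
        # capacity is immediately available to every dataset.
        self.vdevs[name] = capacity

    @property
    def free(self):
        return sum(self.vdevs.values()) - sum(self.used.values())

    def allocate(self, dataset, nbytes):
        if nbytes > self.free:
            raise OSError("pool has insufficient free space")
        self.used[dataset] = self.used.get(dataset, 0) + nbytes

    def release(self, dataset, nbytes):
        # Released space returns to the shared pool, not to a fixed volume.
        self.used[dataset] = max(0, self.used.get(dataset, 0) - nbytes)


pool = StoragePool()
pool.add_vdev("disk0", 100)
pool.allocate("home", 60)
pool.allocate("var", 30)
pool.add_vdev("disk1", 100)  # grow the pool while datasets are live
print(pool.free)
```

No dataset is tied to a fixed-size volume here, which is the contrast with the traditional volume-based model.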

Copy-on-write

ZFS uses a copy-on-write model to manage stored data. This means that data is never written in place (never overwritten); instead, new blocks are written and the metadata is updated to reference them. Copy-on-write is advantageous for several reasons, not only because it enables features such as snapshots and clones. Because data is never overwritten, it is simpler to ensure that the storage is never left in an inconsistent state (the older data remains in place until the new write operation is complete). This allows ZFS to be transaction based and makes features such as atomic operations much simpler to implement.

An interesting side effect of the copy-on-write design is that all writes to the file system become sequential writes (because remapping is always occurring). This behavior avoids hot spots in the storage and exploits the performance of sequential writes (which are faster than random writes).
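
As an illustration of never overwriting data in place, here is a minimal sketch. It assumes an in-memory, append-only list standing in for the disk and a dictionary standing in for the metadata; the name CowStore is hypothetical and the real on-disk structures are far more involved.

```python
class CowStore:
    """Toy copy-on-write store: new data goes to a new location every time."""

    def __init__(self):
        self.blocks = []     # append-only "disk": physical index -> contents
        self.block_map = {}  # metadata: logical block number -> physical index

    def write(self, logical, data):
        # 1. Write the new contents to a fresh physical location.
        self.blocks.append(bytes(data))
        # 2. Build updated metadata on the side.
        new_map = dict(self.block_map)
        new_map[logical] = len(self.blocks) - 1
        # 3. Commit by swapping one reference. Until this line runs,
        #    readers still see the old, fully consistent state.
        self.block_map = new_map

    def read(self, logical):
        return self.blocks[self.block_map[logical]]


store = CowStore()
store.write(0, b"old contents")
before = store.block_map          # metadata as of the first write
store.write(0, b"new contents")
assert store.read(0) == b"new contents"
assert store.blocks[before[0]] == b"old contents"  # old block is untouched
```

Physical locations are handed out in increasing order, which mirrors the observation that copy-on-write turns updates into sequential writes.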

Data Protection

A storage pool made up of virtual devices can be protected using one of ZFS's many protection schemes. You can mirror the pool across two or more devices (RAID 1), or protect it with parity (similar to RAID 5) but across dynamic stripe widths (described later). ZFS supports a variety of parity schemes based on the number of devices in the pool. For example, you can protect three devices with RAID-Z (RAID-Z 1); with four devices, you can use RAID-Z 2 (double parity, similar to RAID 6). For even greater protection, you can use RAID-Z 3 for triple parity across a larger number of disks.

For speed (but with no data protection beyond error detection), you can stripe across devices (RAID 0). You can also create striped mirrors, similar to RAID 10.

An interesting property of ZFS comes from the combination of RAID-Z, copy-on-write transactions, and dynamic stripe widths. In a traditional RAID 5 architecture, all disks must hold their data within the stripe, or the stripe is inconsistent. Because there is no way to update all disks atomically, this can produce the well-known RAID 5 write hole problem (in which a stripe is inconsistent across the drives of the RAID set). Because ZFS works in transactions and never needs to write data in place, the write hole is eliminated. Another advantage of this approach appears when a disk fails and must be rebuilt. A traditional RAID 5 system uses data from the other disks in the set to recreate the data for the new drive. RAID-Z instead traverses the available metadata so that it reads only the data relevant to the geometry and avoids reading unused space on the disk. This behavior becomes more important as disks grow larger and rebuild times increase.
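
The idea behind single-parity protection can be sketched with XOR. This is a simplified illustration that assumes equal-length blocks and a fixed stripe; it ignores the dynamic stripe widths discussed above, and the function name xor_blocks is hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte strings together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks, one per (hypothetical) data disk, plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Simulate losing the second disk: XOR of the survivors and the parity
# yields the missing block, because x ^ x == 0 cancels the known blocks.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
```

Double and triple parity need additional, independent parity calculations, which this sketch does not attempt.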


Checksums

Although data protection provides the ability to regenerate data after a failure, it says nothing about the validity of the data in the first place. ZFS solves this problem by generating a 32-bit checksum (or 256-bit hash) in the metadata for each block written. This checksum is verified when the block is read, to avoid silent data corruption. In a volume with data protection (mirroring or RAID-Z), the alternate data can be read or regenerated automatically.

Checksums are stored with the metadata in ZFS, so phantom writes can be detected and, if data protection (RAID-Z) is provided, corrected.
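
A minimal sketch of verify-on-read with automatic repair follows. It assumes a two-way mirror held in memory and uses SHA-256 from Python's hashlib as the 256-bit hash; the class MirroredBlock is hypothetical and is not ZFS's actual read path.

```python
import hashlib


def checksum(data):
    return hashlib.sha256(bytes(data)).digest()


class MirroredBlock:
    """Toy mirrored block whose checksum is kept apart from the data copies."""

    def __init__(self, data):
        self.copies = [bytearray(data), bytearray(data)]  # two mirror sides
        self.expected = checksum(data)  # stored with the metadata

    def read(self):
        for good in self.copies:
            if checksum(good) != self.expected:
                continue  # silent corruption detected, try the other side
            # Heal any copy that fails verification from the good one.
            for i, copy in enumerate(self.copies):
                if checksum(copy) != self.expected:
                    self.copies[i] = bytearray(good)
            return bytes(good)
        raise IOError("every copy failed checksum verification")


block = MirroredBlock(b"important data")
block.copies[0][0] ^= 0xFF          # flip bits on one side of the mirror
assert block.read() == b"important data"
assert block.copies[0] == bytearray(b"important data")  # repaired on read
```

Because the expected checksum lives with the metadata rather than next to the data, a copy that was never written or was written incorrectly fails verification instead of being trusted.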

Snapshots and clones

Because of the copy-on-write nature of ZFS, features such as snapshots and clones become simple to provide. Because ZFS never overwrites data but writes it to a new location, older data can be preserved (although in the normal case it is marked for removal to conserve disk space). A snapshot preserves the older blocks in order to maintain the state of the file system at a given instant in time. This approach is also space efficient, because no copy is required (unless all of the data in the file system is rewritten). A clone is a form of snapshot that is writable. In this case, the initial unwritten blocks are shared by each clone, and blocks that are written are available only to the specific file system clone.
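
Snapshots and clones can be sketched on top of a block map that is never updated in place. The following is an assumption-laden toy: Dataset, snapshot, and clone are hypothetical names, the "disk" is a shared Python list, and nothing is ever freed.

```python
from types import MappingProxyType


class Dataset:
    """Toy file system: a block map over a shared, append-only block list."""

    def __init__(self, blocks=None, block_map=None):
        self.blocks = blocks if blocks is not None else []  # shared storage
        self.block_map = dict(block_map or {})  # logical -> physical index

    def write(self, logical, data):
        self.blocks.append(bytes(data))  # never overwrite an existing block
        self.block_map[logical] = len(self.blocks) - 1

    def read(self, logical):
        return self.blocks[self.block_map[logical]]

    def snapshot(self):
        # Only the map is copied, as a read-only view. No data is copied.
        return MappingProxyType(dict(self.block_map))

    def clone(self, snap):
        # A writable dataset that starts out sharing every block of the snapshot.
        return Dataset(self.blocks, snap)


fs = Dataset()
fs.write(0, b"v1")
fs.write(1, b"shared")
snap = fs.snapshot()
fs.write(0, b"v2")                  # the live file system moves on
assert fs.blocks[snap[0]] == b"v1"  # the snapshot still sees the old block

clone = fs.clone(snap)
clone.write(1, b"clone only")       # visible only in the clone
assert clone.read(0) == b"v1" and fs.read(1) == b"shared"
```

Taking the snapshot costs only a copy of the map, which is why the approach is space efficient until blocks are rewritten.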

Variable Block Size

Traditional file systems are made up of statically sized blocks that match the back-end storage (512 bytes). ZFS implements variable block sizes for a variety of uses (commonly up to 128 KB, but you can change this value). One important use of variable block sizes is compression (because the resulting block, when compressed, is ideally smaller than the original). In addition to making better use of the storage network (because sending less data to the storage takes less time), this capability minimizes waste in the storage system.

Beyond compression, support for variable block sizes also means that you can tune the block size for the particular workload you expect, to improve performance.
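
To see why variable block sizes pair well with compression, consider this small sketch. It assumes a 128 KB logical record, 512-byte sectors, and zlib as a stand-in compressor; the names RECORD_SIZE, SECTOR, and stored_size are hypothetical and the rounding rule is a simplification.

```python
import zlib

RECORD_SIZE = 128 * 1024  # assumed logical record size in bytes
SECTOR = 512              # assumed smallest unit the back-end storage accepts


def stored_size(record):
    """Bytes a record would occupy if blocks may shrink after compression."""
    compressed = zlib.compress(record)
    # Keep the compressed form only when it actually saves space.
    payload = compressed if len(compressed) < len(record) else record
    # Round up to a whole number of sectors.
    return -(-len(payload) // SECTOR) * SECTOR


compressible = b"\0" * RECORD_SIZE
print(stored_size(compressible), "bytes stored for", RECORD_SIZE, "logical bytes")
```

With fixed-size blocks, the compressible record would still occupy the full record size on disk; with variable sizes it occupies only the sectors its compressed form needs.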

Other features

ZFS incorporates many other features, such as deduplication (to minimize duplicated data), configurable replication, encryption, an adaptive replacement cache for cache management, and online disk scrubbing (to identify latent errors and fix them while they can still be fixed). It does all of this with enormous scalability, supporting 16 exabytes of addressable storage (2^64 bytes).
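
Two of these points lend themselves to a short sketch: block-level deduplication and the capacity arithmetic. The code below assumes content hashing with SHA-256 and an in-memory table; DedupStore is a hypothetical name, and an exabyte is taken in its binary sense of 2^60 bytes.

```python
import hashlib


class DedupStore:
    """Toy deduplicating store: identical blocks are kept only once."""

    def __init__(self):
        self.blocks = {}    # content hash -> block contents (stored once)
        self.refcount = {}  # content hash -> number of references

    def write(self, data):
        key = hashlib.sha256(data).hexdigest()
        if key not in self.blocks:
            self.blocks[key] = bytes(data)  # first occurrence: store it
        self.refcount[key] = self.refcount.get(key, 0) + 1
        return key  # the caller keeps this reference instead of the data


store = DedupStore()
first = store.write(b"same block")
second = store.write(b"same block")
store.write(b"different block")
assert first == second and len(store.blocks) == 2

# Capacity arithmetic: 2^64 bytes divided by 2^60 bytes per exabyte.
EXABYTE = 2 ** 60
assert 2 ** 64 // EXABYTE == 16
```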
