Abridged summary version of Btrfs

Source: Internet
Author: User


Scalability: Extents, B-trees, dynamic inode allocation, and other features ensure that Btrfs still performs well on large machines, and that overall performance does not degrade as system capacity grows.

Data consistency: Btrfs uses COW transactions to ensure file system consistency. It also supports checksums, which guard against silent corruption. Traditional file systems cannot do this.

Multi-device management and related features: snapshots, clones, and management of multiple physical devices.

Minimum unit: Extent. An extent is a run of contiguous blocks, defined by its starting block plus its length.
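To make the "starting block plus length" definition concrete, here is a minimal Python sketch. The `Extent` class and the `blocks_to_extents` helper are hypothetical names made up for illustration; they are not Btrfs's on-disk structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    start: int   # first block number of the run
    length: int  # number of contiguous blocks in the run

    @property
    def end(self) -> int:
        # First block number after the run.
        return self.start + self.length


def blocks_to_extents(blocks):
    """Collapse a collection of block numbers into contiguous extents."""
    extents = []
    for block in sorted(set(blocks)):
        if extents and extents[-1].end == block:
            # The block directly follows the last extent: grow that extent.
            last = extents[-1]
            extents[-1] = Extent(last.start, last.length + 1)
        else:
            # Gap found: start a new extent of length 1.
            extents.append(Extent(block, 1))
    return extents


extents = blocks_to_extents([100, 101, 102, 103, 200, 201])
print(extents)
# [Extent(start=100, length=4), Extent(start=200, length=2)]
```

Six block numbers collapse into two (start, length) pairs, which is why extent-based bookkeeping stays small for large contiguous files.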

Dynamic inode: Because EXT2/3 uses pre-allocated inodes, the number of files is limited by the number of inodes. Btrfs allocates inodes dynamically: each inode is just a node in the B-tree, new inodes can be inserted without a fixed limit, and their physical storage location is allocated dynamically. Btrfs therefore has no preset limit on the number of files.

SSD optimization: Btrfs users can enable special optimizations for SSDs with a mount option.
Btrfs's COW technology fundamentally avoids repeated writes to the same physical unit. When the SSD option is enabled, Btrfs also adjusts its underlying block allocation policy: it aggregates multiple disk space allocation requests into one contiguous block of 2 MB. I/O over a large contiguous address range lets the firmware inside the SSD optimize reads and writes better, which improves I/O performance.

COW transactions:
To understand COW transactions, you must first understand the terms COW and transaction.
What is COW?
COW (copy-on-write) means that each time data on disk is updated, the new data is written to a new block; once that write succeeds, the relevant data structures are updated to point to the new block.
What is a transaction?
COW alone can only guarantee the atomicity of a single data update. However, many file system operations update several pieces of metadata. Creating a file, for example, requires the following metadata changes:
Modify the extent tree to allocate disk space
Create a new inode and insert it into the FS tree
Add a directory entry and insert it into the FS tree
If any one of these steps fails, the file cannot be created, so the steps are defined together as one transaction.
COW transactions guarantee file system consistency, so the system does not need to run fsck after a reboot.
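The all-or-nothing behavior can be sketched with a toy model in Python. This is only an illustration under simplifying assumptions: the `CowStore` class, the dictionary used as "disk state", and the step functions are hypothetical and do not mirror Btrfs's real trees.

```python
class CowStore:
    """Toy copy-on-write store: committed state is never modified in place."""

    def __init__(self):
        self.root = {}  # committed state: key -> value

    def transaction(self, steps):
        """Run all steps on a private copy; publish it only if every step succeeds."""
        working = dict(self.root)   # shallow copy; the old root stays untouched
        try:
            for step in steps:
                step(working)
        except Exception:
            return False            # old root is still valid, nothing to repair
        self.root = working         # a single pointer switch is the commit
        return True


# Hypothetical steps for creating one file (the three updates listed above).
def allocate_extent(state):
    state["extent:42"] = (1000, 8)

def insert_inode(state):
    state["inode:42"] = {"size": 0}

def add_dir_entry(state):
    state["dirent:/a.txt"] = 42

def failing_step(state):
    raise IOError("simulated failure in the middle of a transaction")


store = CowStore()
ok = store.transaction([allocate_extent, insert_inode, add_dir_entry])
committed = dict(store.root)
bad = store.transaction([allocate_extent, failing_step])
print(ok, bad, store.root == committed)  # True False True
```

Because the failed transaction only ever touched its private copy, the committed state is unchanged and still consistent, which is the property that makes a post-crash fsck unnecessary in this model.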

Checksum:
Checksums ensure the reliability of data and avoid silent corruption. Because of hardware faults, data read from disk can be wrong. For example, block A holds 0x55 but the read returns 0x54; because the read operation itself reports no error, upper-layer software cannot detect the mistake.
The solution is to store a checksum of the data and verify it after reading. If the checksum does not match, the data is known to be wrong.
EXT2/3 has no checksums and fully trusts the disk. Unfortunately, disk errors persist: silent corruption occurs not only on inexpensive IDE hard drives but also on expensive RAID arrays. And with the growth of storage networks, even if data is read from disk correctly, it is hard to guarantee that it passes through network devices intact.
When Btrfs reads data, it also reads the corresponding checksum. If the data read from disk does not match the checksum, Btrfs first tries to read a mirrored copy of the data; if no mirrored copy exists, Btrfs returns an error. Before writing data to disk, Btrfs computes its checksum, and the checksum and data are then written to disk together.
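The read and write paths described above can be sketched in a few lines of Python. Assumptions: each "disk" is a plain dictionary, `write_block` and `read_block` are hypothetical helper names, and `zlib.crc32` stands in for the checksum function (it is not claimed to be the algorithm Btrfs uses).

```python
import zlib


def write_block(mirrors, data):
    """Compute the checksum before writing, then store data and checksum together."""
    csum = zlib.crc32(data)
    for mirror in mirrors:
        mirror["data"] = data
        mirror["csum"] = csum


def read_block(mirrors):
    """Return the first copy whose checksum matches; fail if no copy does."""
    for mirror in mirrors:
        if zlib.crc32(mirror["data"]) == mirror["csum"]:
            return mirror["data"]
    raise IOError("checksum mismatch on every copy")


primary, backup = {}, {}
write_block([primary, backup], b"\x55")

# Simulate silent corruption: the disk returns 0x54 without reporting an error.
primary["data"] = b"\x54"

print(read_block([primary, backup]))  # b'U' (0x55), served from the mirror
```

With only one copy, the same corruption would make `read_block` raise an error instead of handing wrong data to the application.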

Multi-Device Management:
Adding devices and dynamic expansion:
Btrfs supports adding devices dynamically. After attaching a new disk to the system, the user can run a Btrfs command to add the device to the file system.
To make the most of device space, Btrfs divides disk space into multiple chunks. Each chunk can use a different allocation policy. For example, some chunks store only metadata and some store only data; some chunks can be configured as mirrored, while others can be configured as striped. This gives the user very flexible configuration options.

Subvolume:
A subvolume is a very elegant concept: a part of the file system is configured as a complete sub-file system, called a subvolume.
With subvolumes, a large file system can be divided into several sub-file systems that share the underlying device space and draw from the underlying devices when they need disk space. This model has a number of advantages, such as making full use of disk bandwidth and simplifying space management.
Making full use of disk bandwidth means that the file systems can read and write the underlying disks in parallel, because every sub-file system can access all of the disks. Traditional file systems cannot share an underlying disk device, whether physical or logical, and so cannot read and write in parallel.
Simplified management is relative to LVM and other volume management software. With the storage pool model, the size of each file system adjusts automatically. With LVM, if a file system runs out of space, it does not automatically use free space on other disk devices; it must be resized manually with LVM's management commands.
A subvolume can be mounted as a root directory at any mount point. This is a very useful feature with many applications.
Suppose the administrator wants some users to access only part of the file system: for example, users should be able to access everything under /var/test/ but nothing else under /var/. Then /var/test can be made into a subvolume. The /var/test subvolume is a complete file system that can be mounted with the mount command. If it is mounted at /test and users are given access to /test, they can reach only the contents of /var/test.

Snapshots and clones:
Using snapshots, an administrator can stop the database at time T1 and take a snapshot of the system. This typically takes only a few seconds, after which the database service can be resumed immediately. At any later time, the administrator can back up the contents of the snapshot; user modifications to the database made in the meantime do not affect the snapshot. When the backup is complete, the administrator can delete the snapshot and free the disk space.
Snapshots are generally read-only; when a system supports writable snapshots, a writable snapshot is called a clone. Cloning also has many applications. For example, install the basic software in one system, then create a clone for each user; each user works in their own clone without affecting other users' disk space. This is very similar to virtual machines.
As described earlier, Btrfs uses COW transactions. As shown in Figure 1-10, after a COW transaction ends, if the original nodes A, C, and E are not deleted, then A, C, E, D, and F still fully represent the file system as it was before the transaction began. This is the basic principle behind snapshots.
Btrfs uses reference counts to decide whether to delete the original nodes after a transaction commits. It maintains a reference count for each node: when another node references the node, the count is incremented by one, and when that reference goes away, the count is decremented by one. When the count reaches zero, the node is deleted. For an ordinary tree root, the reference count is incremented by one at creation because the superblock refers to the root block. The initial reference count of every other node in the tree is likewise one.
When the COW transaction commits, the superblock is modified to point to the new root A', and the reference count of the original root block A drops by one to 0, so node A is deleted. Deleting A decrements the reference counts of its descendants; those of descendants such as C and E also drop to 0, so they are deleted too. Nodes D and F, by contrast, had their counters incremented during COW because A' references them, so their counters do not reach zero and they are not deleted.
When a snapshot is created, Btrfs copies the root node A to SA and sets the reference count of SA to 2. When the transaction commits, the reference count of SA does not reach zero, so SA is not deleted and the user can continue to access the files in the snapshot through root SA.
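The reference-count bookkeeping can be traced with a small Python model. Several things here are assumptions, since the figure is not reproduced in this text: the tree shape (A has children C and D, C has children E and F) is made up so that A, C, E are copied while D and F are shared, and the snapshot is modeled as one extra reference on the old root rather than as a physical copy named SA.

```python
class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.refs = 0
        for child in self.children:
            child.refs += 1          # a parent holds one reference to each child


def release(node, deleted):
    """Drop one reference; at zero, delete the node and release its children."""
    node.refs -= 1
    if node.refs == 0:
        deleted.append(node.name)
        for child in node.children:
            release(child, deleted)


def build_tree():
    e, f, d = Node("E"), Node("F"), Node("D")
    c = Node("C", [e, f])
    a = Node("A", [c, d])
    a.refs += 1                      # the superblock references the root
    return a, d, f


def cow_update(d, f):
    """COW the path A -> C -> E; D and F are shared with the new tree."""
    e2 = Node("E'")
    c2 = Node("C'", [e2, f])
    a2 = Node("A'", [c2, d])
    a2.refs += 1                     # the superblock will point at A'
    return a2


def commit(old_root):
    deleted = []
    release(old_root, deleted)       # the superblock drops the old root
    return deleted


# Without a snapshot: the replaced nodes are freed, the shared ones survive.
a, d, f = build_tree()
cow_update(d, f)
deleted_plain = commit(a)
print(deleted_plain)                 # ['A', 'C', 'E']

# With a snapshot: the old root holds a second reference, so nothing is freed.
a, d, f = build_tree()
a.refs += 1                          # snapshot reference (count becomes 2)
cow_update(d, f)
deleted_with_snapshot = commit(a)
print(deleted_with_snapshot)         # []
```

In the second run the old tree stays fully reachable through the snapshot reference, which matches the idea that a snapshot is simply the pre-transaction tree that was never released.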

Software RAID:
Btrfs metadata is protected with RAID1 by default. As mentioned earlier, Btrfs divides device space into chunks, and some chunks are configured to store only metadata. Btrfs divides such a chunk into two stripes and writes metadata to both stripes at the same time, which protects the metadata.

Other Features:
The other features listed on the Btrfs home page are harder to classify. They are advanced techniques found in modern file systems that improve time or space efficiency.

Delayed allocation:
In a file system, frequent allocation and release of small amounts of space causes fragmentation. With delayed allocation, when the user needs disk space, the data is first kept in memory and an allocation request is sent to the disk space allocator. The allocator does not assign real disk space immediately; it just records the request and returns.
Allocation requests can be frequent, so the allocator may receive many of them during the delay. Some can be merged, and some may even be canceled. By waiting in this way, the file system can often avoid unnecessary allocations and combine several small requests into one large one, which improves I/O efficiency.
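A toy allocator shows the record, wait, then merge pattern. This is a sketch under stated assumptions: `DelayedAllocator` and its methods are hypothetical names, the "disk" is just a counter of the next free block, and the real Btrfs allocator is far more involved.

```python
class DelayedAllocator:
    def __init__(self):
        self.pending = {}           # request id -> number of blocks wanted
        self.next_free = 0          # next free block on the toy disk
        self.real_allocations = 0   # how many times real space was carved out

    def request(self, req_id, nblocks):
        """Only record the request; no disk space is assigned yet."""
        self.pending[req_id] = nblocks

    def cancel(self, req_id):
        """For example, a temporary file deleted before it reached the disk."""
        self.pending.pop(req_id, None)

    def flush(self):
        """Serve all surviving requests from one contiguous allocation."""
        placement = {}
        start = self.next_free
        for req_id, nblocks in self.pending.items():
            placement[req_id] = (self.next_free, nblocks)
            self.next_free += nblocks
        if self.next_free > start:
            self.real_allocations += 1
        self.pending.clear()
        return placement


alloc = DelayedAllocator()
alloc.request("a", 2)
alloc.request("b", 3)
alloc.request("c", 1)
alloc.cancel("b")                   # canceled during the delay: never allocated
placement = alloc.flush()
print(placement)                    # {'a': (0, 2), 'c': (2, 1)}
print(alloc.real_allocations)       # 1
```

Three requests turn into a single contiguous allocation of three blocks, and the canceled request costs nothing.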

Inline file:
Systems often contain large numbers of small files, a few hundred bytes or smaller. Allocating a separate data block to each one causes internal fragmentation and wastes disk space. Btrfs stores the contents of small files in metadata and allocates no additional disk blocks for the file data, which reduces internal fragmentation and speeds up file access.
Thanks to inline files, Btrfs handles small files efficiently and avoids this fragmentation problem.
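The space saving is simple arithmetic, sketched here in Python. The 4096-byte block size and the 2048-byte inline limit are assumptions chosen for illustration, not values taken from Btrfs, and `space_used` is a hypothetical helper.

```python
BLOCK_SIZE = 4096     # assumed block size for this sketch
INLINE_LIMIT = 2048   # hypothetical threshold below which data is stored inline


def space_used(file_size, inline=True):
    """Bytes consumed by the file's data with or without inline storage."""
    if inline and file_size <= INLINE_LIMIT:
        return file_size                     # kept inside the metadata item
    blocks = -(-file_size // BLOCK_SIZE)     # ceiling division
    return blocks * BLOCK_SIZE


size = 300
wasted = space_used(size, inline=False) - space_used(size, inline=True)
print(space_used(size, inline=False), space_used(size, inline=True), wasted)
# 4096 300 3796
```

Under these assumptions a 300-byte file wastes 3796 bytes of a dedicated block, a cost that is repeated for every small file when inline storage is not used.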

Directory index:
When a directory contains a large number of files, a directory index can significantly reduce file search time. Btrfs itself uses a B-tree to store directory entries, so looking up a file in a given directory is very efficient.

Compression:
Most of us have used ZIP, WinRAR, and other compression software; compressing a large file can save disk space effectively. Btrfs has built-in compression.
It is often assumed that compressing data before writing it to disk consumes a lot of CPU time and so inevitably reduces file system read and write efficiency. However, as hardware has developed, the gap between CPU processing time and disk I/O time has kept widening. In some cases, spending some CPU time and memory greatly reduces the number of disk I/O operations, which increases overall efficiency.
For example, suppose writing a file takes 100 disk I/O operations uncompressed, but after a small amount of CPU time spent compressing it, only 10 disk I/O operations are needed to write the compressed file. In this case I/O efficiency improves rather than degrades. Of course, this depends on the compression ratio. Currently Btrfs uses the deflate/inflate algorithm provided by zlib for compression and decompression. In the future, Btrfs should support more compression algorithms to meet different users' needs.
Some types of files, such as JPEG files, cannot be compressed any further. Trying to compress them simply wastes CPU. For this reason, if Btrfs finds after compressing several blocks of a file that the compression ratio is poor, it does not compress the rest of the file. This improves the file system's I/O efficiency to some extent.
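The "try a few blocks, then give up" idea can be illustrated with Python's `zlib` module. This is a sketch, not Btrfs's actual heuristic: the block size, the number of probe blocks, and the 10% saving threshold are all assumptions, and random bytes stand in for already-compressed data such as JPEG.

```python
import os
import zlib

BLOCK = 4096          # assumed block size for this sketch
PROBE_BLOCKS = 4      # hypothetical: blocks to try before deciding
MIN_SAVING = 0.10     # hypothetical: require at least a 10% size reduction


def store_file(data):
    """Return (bytes_written, compressed) using a simple give-up heuristic."""
    probe = data[:BLOCK * PROBE_BLOCKS]
    if len(zlib.compress(probe)) > len(probe) * (1 - MIN_SAVING):
        return len(data), False          # compresses poorly: store it as is
    return len(zlib.compress(data)), True


text = b"btrfs " * 20000                 # highly repetitive, compresses well
noise = os.urandom(len(text))            # stands in for JPEG-like data

print(store_file(text))                  # far fewer bytes written, flag True
print(store_file(noise))                 # original size, flag False
```

Only the probe is compressed for the incompressible input, so the CPU cost of the failed attempt stays bounded regardless of file size.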

Pre-allocation:
Many applications need to pre-allocate disk space. Through the posix_fallocate interface they can tell the file system to reserve space on disk without writing data yet. If the underlying file system does not support fallocate, the application has to use write to fill the file with placeholder data in order to reserve enough disk space.
Having the file system reserve the space is more efficient and reduces fragmentation, because all of the space is allocated at once and is therefore more likely to be contiguous. Btrfs supports posix_fallocate.
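From an application's point of view, pre-allocation looks like the following Python sketch, which calls `os.posix_fallocate` (available on Unix-like systems). The `preallocate` helper and the file name are hypothetical, and the example runs in a temporary directory on whatever file system is available, not necessarily Btrfs.

```python
import os
import tempfile


def preallocate(path, size):
    """Reserve `size` bytes for `path` without writing any payload data."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.posix_fallocate(fd, 0, size)   # offset 0, length `size`
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "reserved.bin")
    preallocate(path, 1024 * 1024)
    size = os.path.getsize(path)

print(size)  # 1048576
```

The file reaches its full size after a single call, with no application-level write loop to fill it with placeholder data.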


Summary:
At this point we have discussed many Btrfs features in detail, but Btrfs offers more than these. Btrfs is still in the experimental development phase and will gain more features.
Btrfs also has an important drawback: when an error occurs in a node of the B-tree, the file system loses all file information under that node. This problem is called "error diffusion", and EXT2/3 avoids it.
Still, I hope you are starting to agree with me that Btrfs is the most promising file system for the future of Linux.


File system creation:
mkfs.btrfs
-L 'LABEL'
-d <type> (data): raid0, raid1, raid5, raid6, raid10, single
-m <profile> (metadata): raid0, raid1, raid5, raid6, raid10, single, dup
-O <feature>
-O list-all: lists all supported features
View properties:
btrfs filesystem show
Mount the file system:
mount -t btrfs /dev/sdb MOUNT_POINT

Transparent compression (after copying a file in, its reported size is exactly the same; how the compression works internally remains to be investigated):
mount -o compress={lzo|zlib} DEVICE MOUNT_POINT

Shrink the file system (in my test with two 20 GB hard disks, I could shrink by at most 17 GB; anything more produced an error):
btrfs filesystem resize -13g /mydata
Grow the file system:
btrfs filesystem resize +13g /mydata
or (grow to the maximum size)
btrfs filesystem resize max /mydata
Add a hard disk:
btrfs device add /dev/sdd /mydata
Balance (move) the existing data in /mydata onto the newly added disk (this consumes I/O resources):
btrfs balance start /mydata
btrfs balance status /mydata    # view status
Note: this can take a long time if there is a lot of data.
Balance also supports pause, resume, cancel, and so on; see man btrfs-balance.
Remove a physical device (Btrfs automatically moves the data off it first and then removes it):
btrfs device delete /dev/sdb /mydata

Change the RAID level (a bit confusing; the relationship between these profiles is worth checking):
Change the data profile (acts on data chunks):
btrfs balance start -dconvert=raid0 /mydata
Change the metadata profile (acts on metadata chunks):
btrfs balance start -mconvert=raid1 /mydata
Change the system profile (acts on system chunks; btrfs balance requires -f for this):
btrfs balance start -f -sconvert=raid5 /mydata
btrfs subcommands:
filesystem
device
balance
subvolume    # management commands for subvolumes

Create a subvolume named cache:
btrfs subvolume create /mydata/cache
Mount a subvolume:
mount -o subvol=cache /dev/sdb /mnt
You can also mount by subvolume ID:
mount -o subvolid=268 /dev/sdb /mnt
View subvolume IDs:
btrfs subvolume list /mydata

Delete a subvolume (note: mount the parent volume first):
btrfs subvolume delete /mydata/cache

Create a snapshot (the snapshot must be in the same volume as the original subvolume):
btrfs subvolume snapshot /mydata/logs /mydata/logs_snapshot
Delete a snapshot:
btrfs subvolume delete /mydata/logs_snapshot
Create a snapshot of a single file:
cp --reflink file.txt file.txt_snapshot
Convert another file system to Btrfs (tested here with EXT4; other file systems must be tested thoroughly in advance):
Force a check first:
fsck -f /dev/sdd1
Convert (data is preserved):
btrfs-convert /dev/sdd1
Check that it took effect:
btrfs filesystem show
Roll back to the previous file system (EXT4 in this case):
btrfs-convert -r /dev/sdd1
Verify:
blkid /dev/sdd1
