Introduction to new features of ext4

From: http://kernelnewbies.org/Ext4

Ext4 is part of the Linux 2.6.28 kernel; see the release notes for that kernel for more details.

  1. Introduction
  2. Ext4 features
    1. Compatibility
    2. Bigger filesystem/file sizes
    3. Sub directory scalability
    4. Extents
    5. Multiblock allocation
    6. Delayed allocation
    7. Fast fsck
    8. Journal checksumming
    9. "No journaling" Mode
    10. Online defragmentation
    11. Inode-related features
    12. Persistent preallocation
    13. Barriers on by default
  3. How to Use ext4
    1. Creating a New ext4 filesystem from scratch
    2. Migrate existing ext3 filesystems to ext4
    3. Mount an existing ext3 filesystem with ext4 without changing the format

1. Introduction

Ext4 is the evolution of the most used Linux filesystem, ext3. In many ways, ext4 is a deeper improvement over ext3 than ext3 was over ext2. Ext3 was mostly about adding journaling to ext2, but ext4 modifies important data structures of the filesystem, such as the ones destined to store the file data. The result is a filesystem with an improved design, better performance, reliability and features.

2. Ext4 features

2.1. Compatibility

Any existing ext3 filesystem can be migrated to ext4 with an easy procedure that consists of running a couple of commands in read-only mode (described in the next section). This means that you can improve the performance, storage limits and features of your current filesystems without reformatting and/or reinstalling your OS and software environment. If you need the advantages of ext4 on a production system, you can upgrade the filesystem. The procedure is safe and doesn't risk your data (obviously, a backup of critical data is recommended, even if you aren't updating your filesystem :). Ext4 will use the new data structures only on new data; the old structures will remain untouched and it will be possible to read/modify them when needed. This means, of course, that once you convert your filesystem to ext4 you won't be able to go back to ext3 again (although there's a possibility, described in the next section, of mounting an ext3 filesystem with ext4 without using the new disk format, so that you'll be able to mount it with ext3 again, but you lose many of the advantages of ext4).

2.2. Bigger filesystem/file sizes

Currently, ext3 supports a maximum filesystem size of 16 TB and a maximum file size of 2 TB. Ext4 adds 48-bit block addressing, so it will have a maximum filesystem size of 1 EB and a maximum file size of 16 TB. 1 EB = 1,048,576 TB (1 EB = 1024 PB, 1 PB = 1024 TB, 1 TB = 1024 GB). Why 48-bit and not 64-bit? There are some limitations that would need to be fixed to make ext4 fully 64-bit capable, and they have not been addressed in ext4. The ext4 data structures have been designed keeping this in mind, so a future update to ext4 will implement full 64-bit support at some point. 1 EB will be enough (really) until that happens. (Note: the code to create filesystems bigger than 16 TB is, at the time of writing this article, not in any stable release of e2fsprogs. It will be in future releases.)
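The filesystem size limits above follow directly from the width of a block number. The sketch below assumes 4 KB blocks (the block size used in the examples later in this article) and 32-bit block numbers for ext3; the function and constant names are illustrative only.

```python
BLOCK_SIZE = 4 * 1024   # bytes per block (assumed: 4 KB blocks)
TB = 1024 ** 4          # the article uses binary units: 1 TB = 1024 GB
EB = 1024 ** 6

def max_filesystem_bytes(address_bits, block_size=BLOCK_SIZE):
    """Largest size reachable with block numbers of the given width."""
    return (2 ** address_bits) * block_size

ext3_limit = max_filesystem_bytes(32)   # assumed 32-bit block numbers
ext4_limit = max_filesystem_bytes(48)   # 48-bit block addressing

print(ext3_limit // TB, "TB")   # 16 TB
print(ext4_limit // EB, "EB")   # 1 EB
print(ext4_limit // TB, "TB")   # 1048576 TB
```

The per-file limits (2 TB and 16 TB) come from other fields in the on-disk format and are not modelled here.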

2.3. Sub directory scalability

Right now the maximum possible number of subdirectories contained in a single directory in ext3 is 32000. Ext4 breaks that limit and allows an unlimited number of subdirectories.

2.4. Extents

Traditional Unix-derived filesystems like ext3 use an indirect block mapping scheme to keep track of each block that holds the data of a file. This is inefficient for large files, especially on large file delete and truncate operations, because the mapping keeps an entry for every single block, and big files have many blocks -> huge mappings, slow to handle. Modern filesystems use a different approach called "extents". An extent is basically a bunch of contiguous physical blocks. It basically says "the data is in the next n blocks". For example, a 100 MB file can be allocated into a single extent of that size, instead of needing to create the indirect mapping table for 25600 blocks (4 KB per block). Huge files are split into several extents. Extents improve the performance and also help to reduce the fragmentation, since an extent encourages contiguous layouts on the disk.
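To make the difference concrete, here is a minimal sketch that collapses a per-block list (what an indirect mapping has to record) into (start, length) extents. It only illustrates the idea and says nothing about the real on-disk extent format; the helper names and the block numbers are made up for the example.

```python
BLOCK_SIZE = 4 * 1024   # 4 KB per block, as in the text

def blocks_needed(file_bytes, block_size=BLOCK_SIZE):
    """Number of blocks needed to hold file_bytes (rounded up)."""
    return -(-file_bytes // block_size)

def to_extents(blocks):
    """Collapse a list of physical block numbers into (start, length) runs."""
    extents = []
    for block in blocks:
        if extents and extents[-1][0] + extents[-1][1] == block:
            start, length = extents[-1]
            extents[-1] = (start, length + 1)   # block continues the last run
        else:
            extents.append((block, 1))          # block starts a new run
    return extents

n = blocks_needed(100 * 1024 * 1024)            # the 100 MB file from the text
contiguous = list(range(5000, 5000 + n))        # hypothetical physical layout
print(n)                                        # 25600
print(to_extents(contiguous))                   # [(5000, 25600)]
print(to_extents([10, 11, 12, 40, 41, 7]))      # [(10, 3), (40, 2), (7, 1)]
```

A contiguous file needs a single entry however large it is, while a fragmented file needs one entry per fragment.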

2.5. Multiblock allocation

When ext3 needs to write new data to the disk, there's a block allocator that decides which free blocks will be used to write the data. But the ext3 block allocator only allocates one block (4 KB) at a time. That means that if the system needs to write the 100 MB of data mentioned in the previous point, it will need to call the block allocator 25600 times (and it was just 100 MB!). Not only is this inefficient, it also doesn't allow the block allocator to optimize the allocation policy, because it doesn't know how much total data is being allocated; it only knows about a single block. Ext4 uses a "multiblock allocator" (mballoc) which allocates many blocks in a single call, instead of a single block per call, avoiding a lot of overhead. This improves the performance, and it's particularly useful with delayed allocation and extents. This feature doesn't affect the disk format. Also, note that the ext4 block/inode allocator has other improvements, described in detail in a paper linked from the original article.

2.6. Delayed allocation

Delayed allocation is a performance feature (it doesn't change the disk format) found in a few modern filesystems such as XFS, ZFS, btrfs or Reiser 4, and it consists of delaying the allocation of blocks as much as possible, contrary to what traditional filesystems (such as ext3, reiser3, etc) do: allocate the blocks as soon as possible. For example, if a process write()s, the filesystem code will immediately allocate the blocks where the data will be placed, even if the data is not being written to the disk right now and is going to be kept in the cache for some time. This approach has disadvantages. For example, when a process is writing continually to a file that grows, successive write()s allocate blocks for the data, but they don't know if the file will keep growing. Delayed allocation, on the other hand, does not allocate the blocks immediately when the process write()s; rather, it delays the allocation of the blocks while the file is kept in cache, until it is really going to be written to the disk. This gives the block allocator the opportunity to optimize the allocation in situations where the old system couldn't. Delayed allocation plays very nicely with the two previous features mentioned, extents and multiblock allocation, because in many workloads, when the file is finally written to the disk, it will be allocated in extents whose block allocation is done with the mballoc allocator. The performance is much better, and the fragmentation is much improved in some workloads.
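The effect on fragmentation can be shown with a deliberately simplified toy model. It is not how the ext3 or ext4 allocators actually pick blocks: the "disk" is just a counter of the next free block, and the file names and write pattern are hypothetical. Two files are appended to alternately, one block per write().

```python
from collections import defaultdict

def count_extents(blocks):
    """Number of contiguous runs in a list of block numbers."""
    return sum(1 for i, b in enumerate(blocks)
               if i == 0 or b != blocks[i - 1] + 1)

def immediate_allocation(writes):
    """Toy model: hand out the next free block at every write()."""
    next_free = 0
    layout = defaultdict(list)
    for name, nblocks in writes:
        for _ in range(nblocks):
            layout[name].append(next_free)
            next_free += 1
    return dict(layout)

def delayed_allocation(writes):
    """Toy model: only count blocks at write(); allocate at flush time."""
    pending = defaultdict(int)
    for name, nblocks in writes:
        pending[name] += nblocks              # data stays in the cache
    next_free = 0
    layout = {}
    for name, total in pending.items():       # flush: one request per file
        layout[name] = list(range(next_free, next_free + total))
        next_free += total
    return layout

writes = [("a.log", 1), ("b.log", 1)] * 4     # interleaved one-block writes

early = immediate_allocation(writes)
late = delayed_allocation(writes)
print(count_extents(early["a.log"]), count_extents(late["a.log"]))   # 4 1
```

In the immediate model the two files end up interleaved block by block; in the delayed model each file is known to need four blocks by the time it is flushed, so it can be placed in a single run.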

2.7. Fast fsck

Fsck is a very slow operation, especially the first step: checking all the inodes in the filesystem. In ext4, a list of unused inodes will be stored at the end of each group's inode table (with a checksum, for safety), so fsck will not check those inodes. The result is that total fsck time improves by a factor of 2 to 20, depending on the number of used inodes (http://kerneltrap.org/Linux/Improving_fsck_Speeds_in_Ext4). It must be noted that it's fsck, and not ext4, that builds the list of unused inodes. This means that you must run fsck to get the list of unused inodes built, and only the next fsck run will be faster (you need to run fsck in order to convert an ext3 filesystem to ext4 anyway). There's also another feature that takes part in this fsck speed-up, "flexible block groups", which also speeds up filesystem operations.

2.8. Journal checksumming

The journal is the most used part of the disk, making the blocks that form part of it more prone to hardware failure. And recovering from a corrupted journal can lead to massive corruption. Ext4 checksums the journal data to know if the journal blocks are failing or corrupted. But journal checksumming has a bonus: it allows one to convert the two-phase commit system of ext3's journaling to a single phase, speeding up filesystem operation by up to 20% in some cases, so reliability and performance are improved at the same time. (Note: the part of the feature that improves the performance, the asynchronous logging, is turned off by default for now, and will be enabled in future releases, when its reliability improves.)
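The idea behind a checksummed commit can be sketched as follows. This is an illustration only: the record layout is hypothetical and CRC32 from Python's zlib module is used as a stand-in checksum, which should not be read as a description of the real journal format.

```python
import zlib

def make_commit_record(blocks):
    """Build a toy commit record with a CRC32 over all transaction blocks."""
    crc = 0
    for block in blocks:
        crc = zlib.crc32(block, crc)          # running checksum
    return {"nblocks": len(blocks), "crc32": crc}

def transaction_is_valid(blocks, record):
    """Recompute the checksum at recovery time and compare."""
    return make_commit_record(blocks) == record

tx = [b"inode table update", b"bitmap update", b"directory entry"]
record = make_commit_record(tx)

damaged = [tx[0], b"bitmap updatf", tx[2]]    # one byte changed on "disk"

print(transaction_is_valid(tx, record))        # True
print(transaction_is_valid(damaged, record))   # False
```

Because recovery can tell from the checksum whether every block of a transaction is intact, a transaction whose blocks do not match its commit record can simply be discarded, which is what makes a single-phase commit plausible.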

2.9. "No journaling" Mode

Journaling ensures the integrity of the filesystem by keeping a log of the ongoing disk changes. However, it is known to have a small overhead. Some people with special requirements and workloads can run without a journal and its integrity advantages. In ext4 the journaling feature can be disabled, which provides a small performance improvement.

2.10. Online defragmentation

While delayed allocation, extents and multiblock allocation help to reduce the fragmentation, filesystems can still fragment with usage. For example: you write three files in a directory, laid out contiguously on the disk. Some day you need to update the file in the middle, but the updated file has grown a bit, so there's not enough room for it. You have no option but to either place the excess data in another part of the disk, which will cause a seek, or allocate the updated file contiguously in another place, far from the other two files, resulting in seeks if an application needs to read all the files in a directory (say, a file manager doing thumbnails on a directory full of images). Besides, the filesystem can only care about certain types of fragmentation; it can't know, for example, that it must keep all the boot-related files contiguous, because it doesn't know which files are boot-related. To solve this issue, ext4 will support online defragmentation, and there's an e4defrag tool which can defragment individual files or the whole filesystem.

2.11. Inode-related features

Larger inodes, nanosecond timestamps, fast extended attributes, inode reservation...

  • Larger inodes: ext3 supports configurable inode sizes (via the -I mkfs parameter), but the default inode size is 128 bytes. Ext4 will default to 256 bytes. This is needed to accommodate some extra fields (like nanosecond timestamps or inode versioning), and the remaining space of the inode will be used to store extended attributes that are small enough to fit in that space. This will make the access to those attributes much faster, and improves the performance of applications that use extended attributes by a factor of 3 to 7.

  • Inode reservation consists of reserving several inodes when a directory is created, expecting that they will be used in the future. This improves the performance, because when new files are created in that directory they'll be able to use the reserved inodes. File creation and deletion is hence more efficient.

  • Nanosecond timestamps means that inode fields like "modified time" will be able to use nanosecond resolution instead of the second resolution of ext3.
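From user space, the timestamp resolution is visible through the ordinary stat interface. The following sketch reads a file's modification time as an integer number of nanoseconds using Python's os.stat; whether the sub-second part is actually non-zero depends on the filesystem holding the temporary file, which is an assumption outside this example's control.

```python
import os
import tempfile

with tempfile.NamedTemporaryFile() as f:
    f.write(b"hello")
    f.flush()
    st = os.stat(f.name)
    # st_mtime_ns is the modification time as integer nanoseconds since the epoch
    seconds = st.st_mtime_ns // 1_000_000_000
    nanos = st.st_mtime_ns % 1_000_000_000
    print("mtime:", seconds, "s +", nanos, "ns")
```

On a filesystem that only stores whole seconds, the nanosecond part would be expected to read as zero.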

2.12. Persistent preallocation

This feature, available in ext3 in the latest kernel versions, and emulated by glibc in the filesystems that don't support it, allows applications to preallocate disk space: applications tell the filesystem to preallocate the space, and the filesystem preallocates the necessary blocks and data structures, but there's no data on it until the application really needs to write the data in the future. This is what P2P applications do on their own when they "preallocate" the necessary space for a download that will last hours or days, but implemented much more efficiently by the filesystem and with a generic API. This has several uses: first, to avoid applications (like P2P apps) doing it themselves inefficiently by filling a file with zeros. Second, to reduce fragmentation, since the blocks will be allocated at one time, as contiguously as possible. Third, to ensure that applications always have the space they know they will need, which is important for RT-ish applications, since without preallocation the filesystem could get full in the middle of an important operation. The feature is available via the libc posix_fallocate() interface.
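As a minimal sketch of that interface, Python exposes the same call as os.posix_fallocate(fd, offset, len) on Unix systems. The size below is an arbitrary example value, and the file is a throwaway temporary file.

```python
import os
import tempfile

SIZE = 1024 * 1024   # 1 MB, an arbitrary example size

fd, path = tempfile.mkstemp()
try:
    # Ask the filesystem to reserve SIZE bytes starting at offset 0.
    # No application data has been written yet.
    os.posix_fallocate(fd, 0, SIZE)
    st = os.fstat(fd)
    reported_size = st.st_size            # file length after preallocation
    allocated_bytes = st.st_blocks * 512  # space the filesystem reports as allocated
    print(reported_size, allocated_bytes)
finally:
    os.close(fd)
    os.remove(path)
```

The call raises OSError if the space cannot be reserved, so an application finds out up front rather than in the middle of a long write.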

2.13. Barriers on by default

This is an option that improves the integrity of the filesystem at the cost of some performance (you can disable it with "mount -o barrier=0", which is worth trying if you're benchmarking). From an LWN article: "The filesystem code must, before writing the [journaling] commit record, be absolutely sure that all of the transaction's information has made it to the journal. Just doing the writes in the proper order is insufficient; contemporary drives maintain large internal caches and will reorder operations for better performance. So the filesystem must explicitly instruct the disk to get all of the journal data onto the media before writing the commit record; if the commit record gets written first, the journal may be corrupted. The kernel's block I/O subsystem makes this capability available through the use of barriers; in essence, a barrier forbids the writing of any blocks after the barrier until all blocks written before the barrier are committed to the media. By using barriers, filesystems can make sure that their on-disk structures remain consistent at all times."

3. How to Use ext4

At this time, all relevant distros support it. GRUB also supports ext4. Just use it.

Switching to ext4 is very easy. There are three different ways to switch:

3.1. Creating a New ext4 filesystem from scratch

  • The easiest one, recommended for new installations. Just update your e2fsprogs package to a version with ext4 support, and create the filesystem with mkfs.ext4.

3.2. Migrate existing ext3 filesystems to ext4

You need to use the tune2fs and fsck tools on the filesystem, and that filesystem needs to be unmounted. Run:

  • tune2fs -O extents,uninit_bg,dir_index /dev/yourfilesystem

After running this command you must run fsck. If you don't do it, ext4 will not mount your filesystem. This fsck run is needed to return the filesystem to a consistent state. It will tell you that it finds checksum errors in the group descriptors. This is expected, and it's exactly what needs to be rebuilt to be able to mount the filesystem as ext4, so don't be surprised by them. Since fsck asks you what to do each time it finds one of those errors, always say yes. If you don't want to be asked, add the "-p" parameter to the fsck command; it means "automatic repair":

  • fsck -pDf /dev/yourfilesystem

There's another thing that must be mentioned. All your existing files will continue using the old indirect mapping to map all the blocks of data. The online defrag tool will be able to migrate each one of those files to the extent format (using an ioctl that tells the filesystem to rewrite the file with the extent format; you can use it safely while you're using the filesystem normally).

3.3. Mount an existing ext3 filesystem with ext4 without changing the format

You can mount an existing ext3 filesystem with ext4 but without using features that change the disk format. This means you will be able to mount your filesystem with ext3 again. You can mount an existing ext3 filesystem with "mount -t ext4 /dev/yourpartition /mnt". Doing this without having done the conversion process described in the previous point will force ext4 to not use the features that change the disk format, such as extents; it will use only the features that don't change the disk format, such as mballoc or delayed allocation. You'll be able to mount your filesystem as ext3 again. But obviously you'll be losing the advantages of the ext4 features that don't get used...
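After mounting, it can be handy to confirm which filesystem driver is serving a mount point. On Linux this is listed in /proc/mounts, whose lines start with the device, the mount point and the filesystem type. The sketch below parses text in that layout; the sample lines are made up for the example and the function name is illustrative.

```python
def parse_mounts(text):
    """Map mount point -> (device, filesystem type) from /proc/mounts-style text."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            device, mount_point, fs_type = fields[:3]
            table[mount_point] = (device, fs_type)
    return table

# Hypothetical sample; on a real system pass open("/proc/mounts").read() instead.
sample = (
    "/dev/sda1 / ext3 rw,relatime 0 0\n"
    "/dev/yourpartition /mnt ext4 rw,relatime 0 0\n"
)

mounts = parse_mounts(sample)
print(mounts["/mnt"])   # ('/dev/yourpartition', 'ext4')
```

Note that the type shown is the driver in use for the mount; it does not by itself say whether the on-disk format has been converted.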
