On the Design of the Linux UBI Subsystem

Source: Internet
Author: User

Problem domain

Flash storage devices have the following characteristics:

    • Bad blocks are present
    • Limited service life
    • The storage medium is unstable
    • Slow read and write speeds
    • No random access (NAND)
    • Bits can only be changed from 0 to 1 by erasing
    • The minimum read or write unit is a page or sub-page
    • Cheap

Given these characteristics, the core functional and quality requirements of a flash file system cover the following aspects:

    • Write
    • Performance
    • Reliability
    • Durability

From these requirements it follows that a flash file system needs to provide the following:

    • Data protection
    • Bad block management
    • Garbage collection
    • Wear leveling
    • Partition management
    • File management
    • Performance optimization

In the UBIFS file system, five of these seven (data protection, bad block management, garbage collection, wear leveling, and partition management) are implemented by the UBI subsystem, which is the focus of this analysis.

Architecture model

UBI is the layer that UBIFS is built on: it sits above MTD and below UBIFS.

The UBI subsystem is itself divided into several modules, each of which is introduced in a later section.

The design structure is as follows:

To manage each layer of the architecture stack, UBI exports several control interfaces to user space:
/dev/mtd0:
MTD object, the entity for operating on the MTD device
/dev/ubi_ctrl:
UBI control object, used to attach UBI to an MTD device and to detach it again
/dev/ubi0:
UBI abstraction layer object, the entity for UBI device operations
/dev/ubi0_0:
UBI volume object, the entity for UBI volume operations
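
As a rough illustration of how /dev/ubi_ctrl is used for attaching, the sketch below packs an attach request and computes the ioctl number. The struct layout, the magic 'o', the command number 64, and the _IOW encoding are assumptions taken from a reading of include/uapi/mtd/ubi-user.h and the asm-generic ioctl macros; they should be checked against the headers of the target kernel and architecture. The attach() helper is hypothetical and needs root privileges and a real MTD device.

```python
import struct

# Assumed values, to be verified against include/uapi/mtd/ubi-user.h.
UBI_CTRL_IOC_MAGIC = ord("o")
UBI_DEV_NUM_AUTO = -1          # let the kernel choose the UBI device number
ATTACH_REQ_FMT = "=iiih10x"    # ubi_num, mtd_num, vid_hdr_offset, max_beb_per1024, padding


def iow(magic: int, nr: int, size: int) -> int:
    """_IOW() in the asm-generic encoding (write direction = 1 at bit 30)."""
    return (1 << 30) | (size << 16) | (magic << 8) | nr


def pack_attach_req(mtd_num: int, ubi_num: int = UBI_DEV_NUM_AUTO,
                    vid_hdr_offset: int = 0, max_beb_per1024: int = 0) -> bytes:
    """Pack the assumed layout of struct ubi_attach_req."""
    return struct.pack(ATTACH_REQ_FMT, ubi_num, mtd_num,
                       vid_hdr_offset, max_beb_per1024)


ATTACH_REQ_SIZE = struct.calcsize(ATTACH_REQ_FMT)
UBI_IOCATT = iow(UBI_CTRL_IOC_MAGIC, 64, ATTACH_REQ_SIZE)


def attach(mtd_num: int, ctrl_path: str = "/dev/ubi_ctrl") -> None:
    """Hypothetical helper: attach MTD device mtd_num through the control node."""
    import fcntl
    import os
    fd = os.open(ctrl_path, os.O_RDONLY)
    try:
        fcntl.ioctl(fd, UBI_IOCATT, bytearray(pack_attach_req(mtd_num)))
    finally:
        os.close(fd)
```

In practice the ubiattach tool from mtd-utils wraps this request, so the sketch is mainly useful for seeing what travels over the control node.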

UBI Data Model

Data is at the heart of modeling and design. UBI has two top-level data objects: ubi_attach_info and ubi_volume_desc. The data relationship model is as follows:

UBI persistent data design

Wear leveling, logical block management, and volume management all depend on metadata that UBI must store persistently, such as block erase counts, LEB/PEB mappings, volume/LEB mappings, the volume table, and Fastmap data. The specific data structures are:

OOB
ubi_ec_hdr - UBI erase counter header
ubi_vid_hdr - on-flash UBI volume identifier header
ubi_vtbl_record - a record in the volume table
ubi_fm_sb - UBI fastmap super block
ubi_fm_hdr - header of the fastmap data set
ubi_fm_scan_pool - fastmap pool of PEBs to be scanned while attaching
ubi_fm_ec - stores the erase counter of a PEB
ubi_fm_volhdr - fastmap volume header
ubi_fm_eba - denotes an association between a PEB and a LEB
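
To make the on-flash format concrete, here is a minimal sketch that packs and validates an erase counter header. The field order (magic, version, padding, ec, vid_hdr_offset, data_offset, image_seq, padding, hdr_crc), the big-endian encoding, the 64-byte size, and the CRC convention (CRC-32 seeded with 0xFFFFFFFF and no final inversion) are assumptions based on a reading of drivers/mtd/ubi/ubi-media.h; the function names are invented for this example.

```python
import struct
import zlib

UBI_EC_HDR_MAGIC = 0x55424923    # ASCII "UBI#"
UBI_EC_HDR_SIZE = 64
# magic, version, 3 pad bytes, ec, vid_hdr_offset, data_offset, image_seq, 32 pad bytes
_EC_BODY_FMT = ">IB3xQIII32x"


def ubi_crc32(data: bytes) -> int:
    """Assumed UBI convention: undo zlib's final inversion."""
    return zlib.crc32(data) ^ 0xFFFFFFFF


def pack_ec_hdr(ec: int, vid_hdr_offset: int, data_offset: int,
                image_seq: int = 0, version: int = 1) -> bytes:
    body = struct.pack(_EC_BODY_FMT, UBI_EC_HDR_MAGIC, version, ec,
                       vid_hdr_offset, data_offset, image_seq)
    return body + struct.pack(">I", ubi_crc32(body))


def parse_ec_hdr(raw: bytes) -> dict:
    """Validate size, magic, and CRC, then return the header fields."""
    if len(raw) != UBI_EC_HDR_SIZE:
        raise ValueError("EC header must be 64 bytes")
    body = raw[:-4]
    (stored_crc,) = struct.unpack(">I", raw[-4:])
    magic, version, ec, vid_off, data_off, image_seq = struct.unpack(_EC_BODY_FMT, body)
    if magic != UBI_EC_HDR_MAGIC:
        raise ValueError("bad magic")
    if stored_crc != ubi_crc32(body):
        raise ValueError("bad CRC")
    return {"version": version, "ec": ec, "vid_hdr_offset": vid_off,
            "data_offset": data_off, "image_seq": image_seq}
```

A bad magic number or CRC here corresponds to the header-corruption case that the attaching code has to classify.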

The data structure definitions, storage locations, and sample data for ubi_ec_hdr, ubi_vid_hdr, ubi_vtbl_record, and the OOB area are as follows:

ubi_ec_hdr is stored in page 0 of each PEB (1 MB); each page (8 KB) has an OOB section (0x1C0 bytes).

ubi_vid_hdr: for a PEB that has been assigned to a volume, the VID header is stored in page 1 of the PEB (8 KB pages) or in sub-page 1 of page 0 (2 KB sub-pages).

ubi_vtbl_record entries form the data of the layout volume (UBI_LAYOUT_VOLUME_ID). LEB 0 and LEB 1 back each other up, and the records are stored starting at page 2 of the PEB.

Each PEB has an OOB area: the first few bytes of the OOB hold the bad block marker (marked orange), the trailing bytes hold the ECC data (marked green), and any bytes left in between are unused (marked yellow). The size of each segment depends on the page format, the number of ECC bits, and the bad block marking scheme defined by each flash manufacturer.

UBI Attaching Subsystem

The core task of the attaching subsystem is to create and initialize the UBI device. Its core data is the ubi_attach_info object, which anchors the data model of the previous section. Attaching involves:

    • Creating the ubi_ainf_volume objects;
    • Scanning the EC header and VID header of every PEB;
    • Reading the bad block marker in the OOB area, counting the bad blocks, and initializing ai->bad_peb_count;
    • If the EC header of a PEB holds an invalid value at attach time, initializing that EC header with the average EC value;
    • If two PEBs are found with the same lnum, keeping the PEB with the larger sequence number (sqnum) and putting the PEB with the smaller one on the ai->erase list.
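
The rule for two PEBs that claim the same LEB can be sketched as below. This is a simplified model: ScannedPeb and resolve_duplicates are names invented for the example, and the kernel's comparison also takes the copy flag and data CRC into account, which is left out here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScannedPeb:
    pnum: int      # physical eraseblock number
    vol_id: int    # volume the PEB belongs to
    lnum: int      # logical eraseblock number inside that volume
    sqnum: int     # sequence number from the VID header


def resolve_duplicates(pebs):
    """Keep the newest PEB per (vol_id, lnum); older copies go on the erase list."""
    newest = {}
    erase = []
    for peb in pebs:
        key = (peb.vol_id, peb.lnum)
        current = newest.get(key)
        if current is None:
            newest[key] = peb
        elif peb.sqnum > current.sqnum:
            erase.append(current)   # the older copy loses the conflict
            newest[key] = peb
        else:
            erase.append(peb)
    return newest, erase
```

Such a conflict typically arises when power is cut after a new copy of a LEB has been written but before the old PEB has been erased.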

Verify the EC header, VID header, and data of each block and classify the error type: blocks with correctable errors go on the ai->erase list, blocks with uncorrectable errors go on the ai->corr or ai->alien list, and blocks with no errors go on the ai->free list. The specific classification rules are listed below.

Error types:

    • UBI_IO_FF: all 0xFF;
    • UBI_IO_FF_BITFLIPS: all 0xFF, but with a correctable ECC error;
    • UBI_IO_BAD_HDR: EC or VID header corruption (e.g. a magic number error or CRC error);
    • UBI_IO_BAD_HDR_EBADMSG: EC or VID header corruption caused by an uncorrectable ECC error;
    • UBI_IO_BITFLIPS: there are ECC errors, but they can be corrected;

PEB classification:

    • Free: normal block;
    • Erase: block waiting to be erased;
    • Corr: corrupted block, no longer takes part in wear leveling;
    • Alien: abnormal block, no longer takes part in wear leveling;
    • Scrub: block to be scrubbed; its data is moved to a normal block, and it is erased to confirm that there is no problem;
    • Torture: block to be tortured; its data is moved to a normal block, and it is repeatedly written, read, and erased to confirm that there is no problem;
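
One possible decision function that combines the two header read results into a classification is sketched below. It is a deliberately simplified model loosely following the rules above and a reading of the kernel's scan code, not the exact kernel logic; the alien and torture cases are omitted, and the names Io and classify_peb are invented for the example.

```python
from enum import Enum, auto


class Io(Enum):
    OK = auto()
    FF = auto()
    FF_BITFLIPS = auto()
    BAD_HDR = auto()
    BAD_HDR_EBADMSG = auto()
    BITFLIPS = auto()


_BAD = (Io.BAD_HDR, Io.BAD_HDR_EBADMSG)


def classify_peb(ec_res: Io, vid_res: Io, data_all_ff: bool = True) -> str:
    """Return 'free', 'erase', 'corr', 'scrub', or 'used' for one scanned PEB."""
    if ec_res in (Io.FF, Io.FF_BITFLIPS):
        return "erase"                      # no EC header at all: erase and rewrite it
    ec_bad = ec_res in _BAD
    flips = Io.BITFLIPS in (ec_res, vid_res)
    if vid_res is Io.FF:                    # no VID header: the block holds no data
        return "erase" if (ec_bad or flips) else "free"
    if vid_res is Io.FF_BITFLIPS:
        return "erase"
    if vid_res in _BAD:
        # Assumption: an empty data area suggests an interrupted write (safe to
        # erase), while leftover data suggests unexpected corruption (preserve).
        return "erase" if (ec_bad or data_all_ff) else "corr"
    return "scrub" if flips else "used"     # valid VID header: block is in use
```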

UBI EBA Subsystem

The EBA subsystem mainly provides the following functions:

    • LEB/PEB mapping table management: the upper layer sees only LEBs and no longer needs to care about block read/write error handling, block replacement, and similar details;
    • LEB sequence counter management: the sequence counter records ordering and is used to resolve LEB/PEB mapping conflicts;
    • LEB access interface encapsulation: read, write, copy, check, unmap, atomic change, and so on;
    • LEB access protection: concurrent access to each LEB is protected by a read-write semaphore (rwsem);

The EBA offers two ways to write: ubi_eba_write_leb and ubi_eba_atomic_leb_change. ubi_eba_write_leb is used for plain writes, while ubi_eba_atomic_leb_change is used to modify or append to a block. ubi_eba_write_leb performs read verification after writing; if an -EIO error occurs, the data on the old PEB is moved to a new PEB, the new data is also written to the new PEB, and the old PEB is tortured. To avoid destroying existing data, ubi_eba_atomic_leb_change implements atomic writes with an out-of-place update and takes ubi->alc_mutex for serialization. The process is as follows:

    1. Read the LEB data (done in UBIFS)
    2. Check whether the length of the data to write is 0; if it is, unmap the LEB
    3. Allocate and initialize the vid_hdr
    4. Allocate a new PEB (ubi_wl_get_peb)
    5. Write the vid_hdr to the new PEB
    6. Write the old LEB data plus the new data to the new PEB
    7. Recycle the old PEB (ubi_wl_put_peb)
    8. Update the LEB map (vol->eba_tbl)
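
The steps above can be modeled with a small in-memory simulation. TinyEba and its fields are hypothetical stand-ins for the kernel structures, the caller is assumed to pass the complete new LEB content (step 1), and error handling and locking are left out.

```python
class TinyEba:
    """Toy model of an out-of-place (atomic) LEB update."""

    def __init__(self, free_pebs):
        self.free = list(free_pebs)   # PEB numbers available for allocation
        self.flash = {}               # pnum -> (vid_hdr, data)
        self.eba_tbl = {}             # lnum -> pnum
        self.pending_erase = []       # PEBs handed back for erasure
        self.sqnum = 0                # global sequence counter

    def atomic_leb_change(self, lnum, data):
        old = self.eba_tbl.get(lnum)
        if len(data) == 0:                               # step 2: empty write unmaps
            if old is not None:
                self.pending_erase.append(old)
                del self.eba_tbl[lnum]
            return
        self.sqnum += 1
        vid_hdr = {"lnum": lnum, "sqnum": self.sqnum,    # step 3
                   "data_size": len(data)}
        new = self.free.pop(0)                           # step 4
        self.flash[new] = (vid_hdr, bytes(data))         # steps 5 and 6
        if old is not None:                              # step 7
            self.pending_erase.append(old)
        self.eba_tbl[lnum] = new                         # step 8


eba = TinyEba(free_pebs=[7, 8, 9])
eba.atomic_leb_change(0, b"version 1")
eba.atomic_leb_change(0, b"version 2")
```

Because the old PEB is untouched until the new one is fully written, a power cut in the middle leaves either the old or the new content intact, and the sequence number decides which copy wins at the next attach.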

UBI Wear-Leveling Subsystem

Wear leveling is one of the core functions of UBI. This subsystem is responsible for PEB allocation, recycling, erasure, scrubbing, and wear leveling. Of these, scrubbing, erasure, and wear leveling are dispatched asynchronously by the UBI background thread.
UBI wear leveling is based on the erase count of each PEB and uses a static wear-leveling strategy. Static wear leveling rests on one assumption: data in a PEB with a low erase count (EC) is more stable than data in a PEB with a high erase count. Data is therefore moved from a low-EC PEB to a high-EC PEB, which frees the low-EC PEB for reuse and evens out wear.

PEB allocation, recycling, erasure, and scrubbing all trigger a wear-leveling check. Wear leveling itself costs erase cycles, so to keep it from running too often and aggravating wear, its trigger frequency is controlled by UBI_WL_THRESHOLD, which should not be set too small. This strategy still has a weakness: in extreme cases certain blocks could be erased repeatedly. To avoid that, WL_FREE_MAX_DIFF (2 * UBI_WL_THRESHOLD) limits the range from which the most-worn free PEB is selected.
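
A minimal sketch of the trigger check follows, working on plain lists of erase counts instead of red-black trees. The default threshold of 4096 is assumed to match the kernel configuration default, and the function names are made up for the example.

```python
UBI_WL_THRESHOLD = 4096    # assumed default; configurable in the kernel


def pick_free_target(free_ecs, max_diff):
    """Most-worn free PEB whose EC is still below min(free) + max_diff."""
    limit = min(free_ecs) + max_diff
    return max(ec for ec in free_ecs if ec < limit)


def needs_wear_leveling(used_ecs, free_ecs, threshold=UBI_WL_THRESHOLD):
    """True when moving the coldest used PEB's data is worth an erase cycle."""
    if not used_ecs or not free_ecs:
        return False
    target = pick_free_target(free_ecs, 2 * threshold)   # WL_FREE_MAX_DIFF
    return target - min(used_ecs) >= threshold
```

Capping the target at min(free) + WL_FREE_MAX_DIFF keeps a single, extremely worn free PEB from being chosen as the destination over and over.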

Based on the PEB classification made during attaching, initialization of the wear-leveling module schedules erasure of the erase blocks, schedules torture of the torture blocks, and builds the red-black trees used for wear leveling and scrubbing. PEB allocation is implemented by the ubi_wl_get_peb interface, which hands out a free PEB with a medium erase count. PEB recycling is implemented by the ubi_wl_put_peb interface; after a PEB is returned, erase_worker erases it.
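
How a "medium" free PEB might be chosen can be sketched as follows. This is an approximation under stated assumptions: the kernel walks a red-black tree and takes the tree root when the spread of erase counts is small, and the median of a sorted list is used here as a stand-in for that root; pick_mean_free and the default max_diff of 8192 (2 * 4096) are illustrative.

```python
def pick_mean_free(free_ecs, max_diff=8192):
    """Pick a free PEB erase count that is neither the least nor the most worn."""
    ecs = sorted(free_ecs)
    if not ecs:
        raise ValueError("no free PEBs")
    if ecs[-1] - ecs[0] < max_diff:
        return ecs[len(ecs) // 2]            # small spread: take a middle entry
    limit = ecs[0] + max_diff // 2           # large spread: stay near the low end
    return max(ec for ec in ecs if ec < limit)
```

Handing out a medium-worn PEB leaves the least-worn free blocks for long-lived data and the most-worn ones as targets for wear-leveling moves.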

UBI IO Subsystem

The IO subsystem provides a unified read/write interface for the upper modules. It mainly includes:

    • Unified encapsulation of the PEB read/write interface, including wrapping of MTD read/write, parameter checks, and read/write checks (read IO check, write verify check). The checks are controlled by ubi->dbg.chk_io and are disabled by default.
    • Unified encapsulation of the read/write interface for the UBI EC and VID headers, including validation, support for non-aligned storage, and support for storing the VID header in a sub-page.
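
The effect of sub-page support on header placement can be worked through with a small calculation. The 64-byte header sizes and the alignment rules are assumptions based on a reading of the UBI initialization code; peb_layout and its parameter names are invented for this sketch, and the numbers use the 1 MB PEB, 8 KB page, and 2 KB sub-page mentioned earlier.

```python
UBI_EC_HDR_SIZE = 64
UBI_VID_HDR_SIZE = 64


def align_up(value: int, alignment: int) -> int:
    return (value + alignment - 1) // alignment * alignment


def peb_layout(peb_size: int, min_io_size: int, hdrs_min_io_size=None) -> dict:
    """Where the VID header and the LEB data start inside one PEB."""
    if hdrs_min_io_size is None:             # no sub-page support: headers use full pages
        hdrs_min_io_size = min_io_size
    vid_hdr_offset = align_up(UBI_EC_HDR_SIZE, hdrs_min_io_size)
    leb_start = align_up(vid_hdr_offset + UBI_VID_HDR_SIZE, min_io_size)
    return {"vid_hdr_offset": vid_hdr_offset,
            "leb_start": leb_start,
            "leb_size": peb_size - leb_start}


full_page = peb_layout(1 << 20, 8192)        # VID header in page 1, data from page 2
sub_page = peb_layout(1 << 20, 8192, 2048)   # VID header in sub-page 1 of page 0
```

With sub-pages both headers share page 0, so each LEB gains one page of usable space (8192 bytes in this configuration).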

UBI Fastmap Subsystem

Fastmap shortens UBI initialization (attach) time by making the attach time constant instead of growing linearly with the number of PEBs. (It is an experimental feature, not used in the product, and was not studied here.)

 

Resources

Linux kernel 3.14-rc6 source code
http://en.wikipedia.org/wiki/Wear_leveling
www.linux-mtd.infradead.org

--eof--