Knowledge about Linux and storage

Last Update:2018-03-01 Source: Internet

Author: User


Linux means many things to many people. Its strength lies in its flexibility to support entirely different usage models. One of its most important strengths, however, is its dominance in the storage domain. When we think of Linux and storage, we often think of direct-attached storage or the latest file system, but there is much more to Linux storage than meets the eye. The storage elements in Linux are not only stable but also cutting-edge.

This article explores the storage technologies that put Linux at the center of the storage field. It starts at the bottom, with storage architectures, and then works up the stack through services, features, file systems, and future directions (see Figure 1).

Figure 1. Storage stack architecture explored in this article

How storage is attached to the platform is the key to the overall storage architecture. Three common architectures cover the vast majority of models:

Direct-attached storage (DAS)
Storage area network (SAN)
Network-attached storage (NAS)

Linux supports all three models and has evolved along with them.

Figure 2 illustrates these models, focusing on where the file system and the storage reside. In the DAS model, storage is attached directly to the platform; this model accounts for the vast majority of storage usage. A SAN separates the storage from the platform and makes it accessible through a block-level storage protocol. Finally, NAS provides an architecture similar to SAN but operates at the file level.

Figure 2. Main storage architectures

Direct-attached storage

Linux supports a wide range of DAS interfaces, including older standards such as parallel Advanced Technology Attachment (ATA, also known as Integrated Drive Electronics [IDE]), parallel SCSI, and Fibre Channel, as well as newer storage interfaces such as Serial Attached SCSI (SAS), Serial ATA (SATA), and external SATA (eSATA). You will also find newer interconnects such as USB 3.0 (through the Extensible Host Controller Interface [xHCI]) and FireWire (Institute of Electrical and Electronics Engineers [IEEE] 1394).

Storage area network

A SAN consolidates block-level storage so that it can be shared among a number of servers. The storage appears local to each server, and the endpoint storage device can provide additional services, such as backup and replication, to the client systems.

SAN protocols and interfaces are broad and varied. Linux supports typical SAN protocols such as Fibre Channel and the Internet Fibre Channel Protocol (iFCP). It also supports newer protocols such as SAS, Fibre Channel over Ethernet (FCoE), and Internet SCSI (iSCSI), along with RDMA-based variants: iSCSI Extensions for Remote Direct Memory Access (iSER) and the SCSI RDMA Protocol (SRP), which carries SCSI over RDMA on InfiniBand.

Ethernet as a storage transport is fully implemented in Linux, which illustrates the strength and flexibility of these approaches. Linux also fully supports 10-gigabit Ethernet (10GbE), which allows the construction of high-performance SANs. You will also find the ATA over Ethernet (ATAoE) protocol, which carries the ATA protocol over ubiquitous Ethernet.

Network-attached storage

Last but not least is NAS. NAS consolidates storage on the network so that different types of clients can access it at the file level. The two most popular protocols, both fully supported in Linux, are Network File System (NFS) and Server Message Block/Common Internet File System (SMB/CIFS).

Although the original SMB implementation was proprietary, it was reverse engineered so that Linux could support it. Later SMB revisions were publicly documented, which made development in Linux simpler.

Linux continues to pick up enhancements and extensions to NFS. NFS is now a stateful protocol, with optimizations that separate data from metadata and allow parallel data access. As with Ethernet-based SANs, 10GbE support in Linux allows high-performance NAS repositories.

Other storage architectures

Not every storage architecture fits neatly into DAS, SAN, or NAS. Because Linux is open, new technologies are easier to develop within it, which is why you can find the latest cutting-edge technologies in Linux.

One architecture worth mentioning is object storage. Although it is not new, it is interesting. An object storage architecture separates files from their metadata and stores them independently, on separate data and metadata servers. This separation provides some advantages, such as minimizing the metadata bottleneck (because you interact with the metadata server only to locate and open files). Data can also be striped across multiple data servers and accessed in parallel to improve performance. Object storage is implemented in Linux in several ways, including support for the Object Storage Device (OSD) specification, Lustre (a name derived from Linux and cluster), and the Extended Object File System (exofs).

Another architecture is content-addressable storage (CAS), which uses a hash of the data as its name and address. This technique, also called fixed-content storage (FCS), is useful because it makes duplicate data easy to identify: if the hash is strong enough, identical data produces an identical hash, which allows simple deduplication. The Venti architecture supports this approach in Linux (in addition to Plan 9 from Bell Labs).
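
The idea of using a hash as both name and address can be sketched in a few lines of Python. This is only an illustration of the principle, not how Venti is implemented; the `ContentStore` class and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib


class ContentStore:
    """Toy content-addressable store (hypothetical): the digest is the address."""

    def __init__(self):
        self._objects = {}  # hex digest -> stored bytes

    def put(self, data):
        address = hashlib.sha256(data).hexdigest()
        # Identical content always hashes to the same address,
        # so a duplicate is stored only once.
        self._objects.setdefault(address, data)
        return address

    def get(self, address):
        return self._objects[address]

    def __len__(self):
        return len(self._objects)


store = ContentStore()
first = store.put(b"quarterly report")
second = store.put(b"quarterly report")  # same content, written again
other = store.put(b"something else")

print(first == second)  # the duplicate resolves to the same address
print(len(store))       # number of distinct objects actually stored
```

Because the address is derived from the content, deduplication needs no separate comparison step: writing the same data twice simply lands on the same key.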

Storage services: logical volume management

Although storage virtualization was once unique to high-end storage systems, it is now a standard feature of Linux. One of the most important services available in Linux is the Logical Volume Manager (LVM). LVM is a thin layer (with accompanying user-space tools) that sits on top of the physical storage provided by the underlying storage architecture and abstracts it into one or more logical volumes that are simpler to manage. For example, whereas a physical disk cannot be resized, a logical volume can be resized by adding space to it or removing space from it.
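
To make the resizing idea concrete, here is a minimal Python sketch of a volume manager that hands out fixed-size extents from a shared pool. It is a simplified model under stated assumptions, not LVM's actual data structures; the `VolumeGroup` class, its methods, and the device names are hypothetical.

```python
class VolumeGroup:
    """Toy volume group (hypothetical): a pool of extents from several disks."""

    def __init__(self, physical_volumes):
        # physical_volumes maps a device name to its size in extents.
        self.free = [(dev, i)
                     for dev, count in physical_volumes.items()
                     for i in range(count)]
        self.volumes = {}  # logical volume name -> list of (device, extent)

    def create(self, name, extents):
        self.volumes[name] = []
        self.resize(name, extents)

    def resize(self, name, extents):
        lv = self.volumes[name]
        if extents > len(lv):
            needed = extents - len(lv)
            if needed > len(self.free):
                raise ValueError("not enough free extents")
            # Grow: take extents from the pool, whichever disk they live on.
            lv.extend(self.free[:needed])
            del self.free[:needed]
        else:
            # Shrink: hand the trailing extents back to the pool.
            self.free.extend(lv[extents:])
            del lv[extents:]


vg = VolumeGroup({"sda": 4, "sdb": 4})
vg.create("home", 3)
vg.resize("home", 6)  # the volume now spans both physical disks
print(sorted({dev for dev, _ in vg.volumes["home"]}))
```

The logical volume grows past the end of the first disk without the caller noticing, which is the abstraction the paragraph above describes.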

By abstracting logical devices from physical devices, LVM enables a number of other storage features: read-only and read/write snapshots of volumes, striping of data across volumes for performance (Redundant Array of Independent Disks [RAID]-0), mirroring of data across volumes (RAID-1), and migration of volumes between physical devices, even while online.
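
The difference between striping and mirroring comes down to how a logical block number is mapped to devices. The following Python sketch shows one common mapping; the function names and the `stripe_blocks` parameter are hypothetical and chosen for illustration only.

```python
def stripe_location(logical_block, num_devices, stripe_blocks=1):
    """RAID-0 style mapping: return (device index, block on that device)."""
    stripe = logical_block // stripe_blocks  # which stripe unit this block is in
    offset = logical_block % stripe_blocks   # position inside the stripe unit
    device = stripe % num_devices            # stripe units rotate across devices
    device_block = (stripe // num_devices) * stripe_blocks + offset
    return device, device_block


def mirror_locations(logical_block, num_devices):
    """RAID-1 style mapping: every device holds a copy at the same block."""
    return [(device, logical_block) for device in range(num_devices)]


# Four consecutive logical blocks striped over two devices.
print([stripe_location(block, num_devices=2) for block in range(4)])
# The same logical block mirrored on two devices.
print(mirror_locations(5, num_devices=2))
```

With striping, consecutive blocks land on different devices and can be read in parallel; with mirroring, any one device can satisfy the read.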

For data protection beyond mirroring, Linux includes the md driver (the name stands for multiple disks), which provides a variety of RAID functions. It implements software RAID and supports RAID-4 (striping with a dedicated parity disk), RAID-5 (striping with distributed parity), RAID-6 (striping with dual distributed parity), and RAID-10 (striped and mirrored data).
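
The parity used by RAID-4 and RAID-5 is a bytewise XOR across the data blocks of a stripe, which is what lets a single lost block be rebuilt. The Python sketch below shows only that arithmetic; it is not the md implementation, and the helper name `xor_blocks` is made up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# One stripe: three data blocks plus a parity block computed from them.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Suppose the disk holding data[1] fails. XOR of the survivors and the
# parity block reproduces the missing block.
recovered = xor_blocks([data[0], data[2], parity])
print(recovered == data[1])
```

RAID-5 differs from RAID-4 only in where the parity block is placed: it rotates from stripe to stripe instead of living on one dedicated disk.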

LVM depends on another component, the device mapper, which also provides multipath support (among other features). In a SAN environment, for example, there are usually multiple paths through the fabric to the storage. Multipathing protects against the failure of any given path: storage remains usable as long as at least one path to the endpoint exists.
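
The failover behavior of multipathing can be illustrated with a short Python sketch that tries each path in turn. This models only the idea; it does not reflect the device mapper's interfaces, and `PathFailed`, `read_with_failover`, and the two example paths are hypothetical.

```python
class PathFailed(Exception):
    """Raised by a path that cannot reach the storage endpoint."""


def read_with_failover(paths, block):
    """Try each path in order and return the first successful read."""
    last_error = None
    for path in paths:
        try:
            return path(block)
        except PathFailed as err:
            last_error = err  # remember the failure and try the next path
    raise PathFailed("all %d paths failed" % len(paths)) from last_error


def broken_path(block):
    raise PathFailed("link down")


def healthy_path(block):
    return "data-%d" % block


# The first path is down, but the read still succeeds over the second.
print(read_with_failover([broken_path, healthy_path], 7))
```

The caller sees an error only when every path has failed, which matches the guarantee described above.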

Storage features

In the past few years, two relatively simple features have been added to the Linux storage stack that show how the storage ecosystem is evolving:

Data integrity
Support for solid-state disks (SSDs)

Data integrity

The first change deals with the use of commodity drives in enterprise storage settings. Although enterprise-class drives (such as SAS drives) are reliable, SATA drives are built to different requirements and costs. As a result, SATA drives can be subject to what is called silent data corruption: errors can be introduced into data read from the disk without being detected. To address this problem and make SATA drives usable in enterprise settings, an integrity code is added to each block on the disk (the disk then uses 520-byte sectors instead of the traditional 512-byte sectors). The drive itself can also verify data as it is written, checking that the integrity code matches the data. Errors are therefore caught at write time rather than being discovered later, when nothing can be done about them.

This mechanism is called the Data Integrity Field (DIF). As shown in Figure 3, it is an 8-byte trailer on each data block that contains a cyclic redundancy check (CRC), a reference tag (typically part of the logical block address [LBA]), and an application-defined application tag. The reference tag is useful for catching misdirected writes, in which a block is written to the wrong address, while the application tag can be used to catch other errors in the software stack. For example, if a PDF document is being written, the application tag can be set to a value that denotes PDF data. When the PDF file is read back, the application tag of each block can be checked to confirm that every block carries the PDF value. Linux has supported DIF since kernel version 2.6.27.
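
The following Python sketch shows how such an 8-byte trailer could be built and checked for a 512-byte sector. It is a user-space illustration, not the kernel's code. The field widths and order (2-byte CRC guard, 2-byte application tag, 4-byte reference tag, big-endian) and the CRC polynomial 0x8BB7 follow the common T10 layout and are assumptions here, since the text above does not spell them out; the function names are hypothetical.

```python
import struct


def crc16_t10dif(data):
    """Bitwise CRC-16, polynomial 0x8BB7, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x8BB7) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def protect(sector, lba, app_tag):
    """Append guard CRC, application tag, and reference tag to a sector."""
    if len(sector) != 512:
        raise ValueError("expected a 512-byte sector")
    guard = crc16_t10dif(sector)
    ref_tag = lba & 0xFFFFFFFF  # low 32 bits of the logical block address
    return sector + struct.pack(">HHI", guard, app_tag, ref_tag)


def verify(block, expected_lba):
    """Check the CRC and confirm the block was meant for this address."""
    sector, trailer = block[:512], block[512:]
    guard, _app_tag, ref_tag = struct.unpack(">HHI", trailer)
    return (guard == crc16_t10dif(sector)
            and ref_tag == (expected_lba & 0xFFFFFFFF))


payload = bytes(range(256)) * 2
block = protect(payload, lba=100, app_tag=0x5044)
print(len(block))          # 512 data bytes plus the 8-byte trailer
print(verify(block, 100))  # intact block read from the right address
print(verify(block, 101))  # same block found at the wrong address
```

The reference-tag comparison is what catches a misdirected write: the data and its CRC are internally consistent, but the block is not where it claims to be.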

Figure 3. DIF structure for a 512-byte sector

SSD support

The introduction of SSDs is changing the storage ecosystem in several ways. These drives remove some of the large latencies inherent in rotating disks, so they provide a way to keep data flowing to the CPU. However, SSDs differ from hard disk drives (HDDs) in that they wear out. The internal storage of an SSD can be written only a limited number of times (the limit depends on the technology), so it is important to write data as efficiently as possible. Worse, SSDs must move data around internally as part of garbage collection and wear leveling. This movement causes additional writes to storage that wears out, so it should be minimized.

Another difference between SSDs and traditional storage is that an HDD does not care whether the data on the disk is valid. If the file system invalidates data, the data can simply remain on the disk with no ill effect. That is not true of SSDs because of wear leveling: the drive keeps moving blocks it believes are still in use. For this reason, Linux now supports the ability of a file system to tell the SSD which blocks have been discarded (starting with kernel version 2.6.29). This feature allows the SSD to exclude those blocks from the wear-leveling process and helps increase drive endurance.
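
A toy model makes the benefit of discard visible. In this Python sketch, an erase block must copy every page it still considers live before it can be erased; the `EraseBlock` class, the `gc_cost` helper, and the page counts are hypothetical and greatly simplified compared with a real flash translation layer.

```python
class EraseBlock:
    """Toy flash erase block: tracks the pages the drive believes are live."""

    def __init__(self, live_pages):
        self.live_pages = set(live_pages)

    def discard(self, pages):
        # The file system reports that these pages no longer hold live data.
        self.live_pages -= set(pages)

    def pages_to_copy(self):
        """Pages that must be rewritten elsewhere before the block is erased."""
        return len(self.live_pages)


def gc_cost(file_system_sends_discard):
    block = EraseBlock(range(64))  # 64 pages, all written at some point
    deleted = range(16, 64)        # the file system later frees 48 of them
    if file_system_sends_discard:
        block.discard(deleted)
    return block.pages_to_copy()


print(gc_cost(file_system_sends_discard=False))  # drive copies stale pages too
print(gc_cost(file_system_sends_discard=True))   # only live pages are copied
```

Without the discard, the drive has no way to know that most of the block is stale, so it spends write cycles preserving data nobody will read again.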

File systems

What really sets Linux apart from other operating systems is its large collection of file systems. In Linux, you can find traditional client file systems such as the third extended file system (ext3) and the fourth extended file system (ext4), but you will also find advanced distributed file systems, cluster file systems, and parallel file systems. You can also find new, cutting-edge file systems that build on new ideas and address new problems in the storage domain.

On the cutting edge, Linux today supports both ZFS and the B-tree file system (Btrfs, sometimes pronounced "Butter FS"). These two file systems are competitors, and both use copy-on-write semantics (blocks are never overwritten in place). Both also support data deduplication, internal data protection (RAID-like protection), checksums on data and metadata, and other storage features such as snapshots.
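
Copy-on-write is also what makes snapshots cheap, as the following Python sketch suggests. It is a deliberately small model, not how ZFS or Btrfs are implemented; the `CowVolume` class and its methods are hypothetical.

```python
class CowVolume:
    """Toy copy-on-write volume: a write never overwrites existing data."""

    def __init__(self):
        self._store = {}     # physical location -> data
        self._next_free = 0  # next unused physical location
        self.block_map = {}  # logical block -> physical location

    def write(self, logical, data):
        # Always place new data in a fresh location, then repoint the map.
        location = self._next_free
        self._next_free += 1
        self._store[location] = data
        self.block_map[logical] = location

    def read(self, logical, block_map=None):
        mapping = self.block_map if block_map is None else block_map
        return self._store[mapping[logical]]

    def snapshot(self):
        # A snapshot copies only the mapping; the data blocks are shared.
        return dict(self.block_map)


vol = CowVolume()
vol.write(0, b"version 1")
snap = vol.snapshot()
vol.write(0, b"version 2")

print(vol.read(0))        # current contents
print(vol.read(0, snap))  # contents as of the snapshot
```

Because the old block is never touched, the snapshot remains valid without copying any data at the time it is taken.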

Linux is also home to distributed file systems. One example is Lustre, a massively parallel distributed file system that supports thousands of nodes and scales to petabytes of storage capacity. Ceph offers similar capabilities and was merged into the mainline Linux kernel in version 2.6.34 (2010). Other examples on Linux include GlusterFS and the General Parallel File System (GPFS).

You can also find special-purpose file systems in Linux, including log-structured file systems such as the New Implementation of a Log-structured File System (NILFS2) and object-based file systems such as exofs. Because Linux finds itself in so many usage models, you will also find file systems for resource-constrained environments (such as embedded systems) and for low-latency applications (such as high-performance computing [HPC]). File systems in the embedded space include Yet Another Flash File System version 2 (YAFFS2), Journaling Flash File System version 2 (JFFS2), and the Unsorted Block Image File System (UBIFS). File systems in the HPC space include Parallel NFS (pNFS), Lustre, and GPFS.

Future of Linux storage

Because of its openness and its large developer community, Linux is and will continue to be a target for file system and general storage research.

One of the latest changes in storage is the use of remote services to store archived data economically and efficiently. Known today as cloud storage, these services are offered by many vendors, who provide efficient and transparent remote access to centralized storage under varying service-level agreements (covering characteristics such as protection and bandwidth). Two examples are Ubuntu One and Dropbox. Another service, SpiderOak, can back up your local home directory to the cloud at very low cost.

What other features might appear in Linux? Candidates include support for large sectors (larger than 512 bytes), thin provisioning to avoid reserved but unused capacity (advertising more storage than is physically available), storage deduplication (to maximize usable capacity), and more efficient storage stacks that take advantage of the speed and efficiency of new drives such as SSDs. However the storage ecosystem evolves, Linux will be at the forefront.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
