Introduction to Linux Kernel Engineering--User space device management


User Space Device Management

All the devices that user space sees are placed in the /dev directory (which is, of course, just a directory and can be changed), and the partition holding the file system also appears there as an ordinary device. Kernels before 2.6 (the 2.4 series) had devfs, whose idea was very good: the kernel managed device nodes dynamically, so that when a user accessed a device's node, the devfs driver would load the device driver, and even the device number of each node was obtained dynamically. But the author of that mechanism stopped maintaining his code, and after discussion the kernel developers replaced the kernel-side devfs with the user-space udev, so devfs is now obsolete. The user-space udev loads the device driver when a device is discovered and dynamically creates the node in the /dev directory. /dev is then just a directory, not a mounted devfs file system.

Of course, udev is only an application program and can be replaced with other programs; BusyBox's mdev, for example, does the same work.

Device Change Notification Client: udevd

The system detects devices when it starts up, and device changes after startup must also be recognized; in essence, the hardware found at boot is just another device change. A device change known only to the kernel is meaningless, because the users of devices are user-space programs: the kernel is only the manager, and a resource that is managed but never used is a pointless existence. So how does the kernel notify user programs of changes to devices? This mechanism is called uevent.

The kernel notifies user-space programs of changes in device resources by sending uevent events to user space, and the specific content of each change is carried in the parameter buffer that accompanies the uevent. The user-space program that responds to these events is called udevd (or some other name), but the overall scheme of kernel notification plus user-space response is called udev.

But this is just the current mechanism; Linux is a constantly evolving system. The same function was previously served by the hotplug program (re-executed for every device change, so multiple instances could run at once) and by devfs (which created a pile of node files in /dev in advance, with nothing dynamic about it). udevd, by contrast, is a background service rather than a program executed once per event. Handling events in one daemon eliminates the reentrancy problems and speeds up the user-space response to kernel device changes.

So udevd is the user-space side that detects kernel device changes; what does it do with what it detects? That is determined by how Linux works. User-space programs reference every device through the /dev directory, unless a higher-level wrapper such as mount hides that reference from them. For application programs, then, the /dev directory is the only way to deal directly with a kernel device: the DRM device gives direct access to the video card, a tty device to a serial port, sr0 to the CD-ROM, sda to a disk. The kernel determines how each device can be used, but which device is used is specified by the user (for example, if you modify a file, the entire operation is driven inside the kernel, but first you have to mount the partition containing the file into your file system).

Therefore, the most important function of udevd is to create device nodes under /dev.

But udevd is not the only application used for this work; it is merely the most widely used implementation of the udev protocol. BusyBox's mdev, again, can perform the same function.

In addition, modern operating systems tend to integrate all services under unified management. Some managers, such as inetd, start the program entities only when they are needed; others integrate the programs into the supervisor itself, such as the now widely used systemd. If you look at your processes, the daemon running in the background may not be a standalone udevd program at all but the /lib/systemd/systemd-udevd service. This is the result of systemd's unified jurisdiction; even the initrd uses this service directly, a sign of the trend toward one manager ruling everything.

So what kind of communication does the kernel use to talk to this user-space service program? The answer is netlink.
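A minimal sketch of that channel (assuming Linux headers and sufficient privileges; this is an illustration, not udevd's actual code): a socket of type NETLINK_KOBJECT_UEVENT, bound to the kernel's multicast group, receives one packet per uevent.

```c
/* Listen on the kernel uevent netlink channel and print each event. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>

int main(void)
{
    struct sockaddr_nl addr = {
        .nl_family = AF_NETLINK,
        .nl_pid    = 0,          /* let the kernel assign an address */
        .nl_groups = 1,          /* multicast group for kernel uevents */
    };
    int fd = socket(AF_NETLINK, SOCK_DGRAM, NETLINK_KOBJECT_UEVENT);
    if (fd < 0 || bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("uevent socket");
        return 1;
    }

    char buf[4096];
    for (;;) {
        ssize_t n = recv(fd, buf, sizeof(buf) - 1, 0);
        if (n <= 0)
            break;
        buf[n] = '\0';
        /* A uevent is "ACTION@DEVPATH" followed by NUL-separated
         * KEY=VALUE strings (ACTION=, DEVPATH=, SUBSYSTEM=, ...). */
        for (char *p = buf; p < buf + n; p += strlen(p) + 1)
            printf("%s\n", p);
        printf("--\n");
    }
    close(fd);
    return 0;
}
```

Plug in a USB stick while this runs and the add events for the device, its partitions, and its SCSI ancestors all arrive on this one socket, which is exactly the stream udevd consumes.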

Other Applications Capturing Hotplug Events

Is udevd, listening for netlink events, the only way to learn of kernel device changes? Definitely not. The kernel's implementation allows for a variety of setups; you can even specify a hotplug program as before. The kobject machinery implements this function too, under the name uevent_helper, visible in user space as /sys/kernel/uevent_helper. By writing a program's path into this file, you make the kernel notify that program of every uevent in passing.

The existence of this file requires kernel support: CONFIG_UEVENT_HELPER=y in the kernel configuration enables the mechanism, and CONFIG_UEVENT_HELPER_PATH="" sets the default helper path.
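A minimal sketch of such a helper (the log path and the binary's install location are assumptions; the kernel passes the event details to the helper in environment variables such as ACTION, DEVPATH, and SUBSYSTEM):

```c
/* Sketch of a uevent helper: assumes its compiled path, e.g.
 * /usr/local/sbin/uevent-logger, has been written to
 * /sys/kernel/uevent_helper. The kernel runs it once per event. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    FILE *log = fopen("/tmp/uevents.log", "a");   /* log path is illustrative */
    if (!log)
        return 1;
    const char *keys[] = { "ACTION", "DEVPATH", "SUBSYSTEM", "MAJOR", "MINOR" };
    for (size_t i = 0; i < sizeof(keys) / sizeof(keys[0]); i++) {
        const char *v = getenv(keys[i]);           /* event data arrives as env vars */
        fprintf(log, "%s=%s ", keys[i], v ? v : "(unset)");
    }
    fputc('\n', log);
    fclose(log);
    return 0;
}
```

Note the cost this sketch makes visible: one fork and exec per event, which is exactly the reentrancy and overhead problem that led to the daemon-based udevd.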

Device Types

The kernel defines two types of device: character devices and block devices. The devices in the /dev directory do not necessarily correspond to specific hardware (zero and tty, for example), and one piece of hardware may correspond to multiple nodes (sda and sda1, for example). Most devices that play special functions are character devices, and because many devices are virtual, frameworks build their own device subtypes on top of these two classes.
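A minimal sketch of telling the two classes apart from user space (the paths are just examples): stat() reports the class in st_mode and the device number in st_rdev.

```c
/* Inspect a /dev node's class and major:minor device number. */
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>   /* major(), minor() */

static void inspect(const char *path)
{
    struct stat st;
    if (stat(path, &st) < 0) {
        perror(path);
        return;
    }
    if (S_ISCHR(st.st_mode))
        printf("%s: character device %u:%u\n", path,
               major(st.st_rdev), minor(st.st_rdev));
    else if (S_ISBLK(st.st_mode))
        printf("%s: block device %u:%u\n", path,
               major(st.st_rdev), minor(st.st_rdev));
    else
        printf("%s: not a device node\n", path);
}

int main(void)
{
    inspect("/dev/zero");  /* character device 1:5 */
    inspect("/dev/sda");   /* block device, if present */
    return 0;
}
```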

The input device is a character device, and many input-related devices are managed through it; in other words, the input virtual device serves the other input devices.

Disk devices are generally sda, sdb, and so on, where the s stands for SCSI. In the past there were also hd, fd, and others: fd meant a floppy disk and hd an IDE hard disk. Because SATA and SCSI handling has been heavily unified and the software can already issue the same commands to both, a SATA device is also an sd device to Linux, which cares only about the software side. sr is the CD-ROM, and there is generally also a cdrom node file.

tty devices represent terminals (historically serial ports); many virtual consoles are simulated on top of them and reached with Ctrl+Alt+F1 through F7. There is another way to simulate a terminal as well: the pty used by graphical interfaces. Opening a terminal window in Ubuntu gives you a pty, where the p means pseudo.
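A minimal sketch of how a terminal emulator obtains such a pty (illustrative; a real emulator would then fork a shell on the slave side):

```c
/* Allocate a pseudo-terminal master and print the name of its slave
 * side, the node a terminal emulator hands to the shell. */
#define _XOPEN_SOURCE 600
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int master = posix_openpt(O_RDWR | O_NOCTTY);
    if (master < 0 || grantpt(master) < 0 || unlockpt(master) < 0) {
        perror("pty");
        return 1;
    }
    printf("slave side: %s\n", ptsname(master));  /* e.g. /dev/pts/3 */
    close(master);
    return 0;
}
```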

loop is the loopback device, a block device. It is not a piece of hardware itself: when you mount a file onto a directory, that file is treated as a virtual disk, complete with a partition structure, and the device backing it is a loop device.
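A minimal sketch of the attachment step behind mount -o loop (assumes a backing file such as /tmp/disk.img already exists and the program runs as root):

```c
/* Attach a regular file to a free loop device via the loop-control
 * interface, the same step "mount -o loop" performs internally. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/loop.h>

int main(void)
{
    int ctl = open("/dev/loop-control", O_RDWR);
    if (ctl < 0) {
        perror("/dev/loop-control");
        return 1;
    }
    int devnr = ioctl(ctl, LOOP_CTL_GET_FREE);   /* e.g. 0 -> /dev/loop0 */
    if (devnr < 0) {
        perror("LOOP_CTL_GET_FREE");
        return 1;
    }

    char path[32];
    snprintf(path, sizeof(path), "/dev/loop%d", devnr);

    int loopfd = open(path, O_RDWR);
    int filefd = open("/tmp/disk.img", O_RDWR);  /* backing file is assumed */
    if (loopfd < 0 || filefd < 0 ||
        ioctl(loopfd, LOOP_SET_FD, filefd) < 0) {
        perror("attach");
        return 1;
    }
    printf("attached /tmp/disk.img to %s\n", path);
    return 0;
}
```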

User-Oriented Organization of Kernel Data Structures: kobject

The way the Linux kernel organizes data is to implement, for each way of organizing data, a struct that defines an element of that organization. Any other part that wants to use this data organization only needs to embed the corresponding structure, and it can place itself at a specific location in the data structure. The list implementation is one example; another very important one is kobject.

kobject is likewise a pre-built structure the kernel prepares, this time for a tree-shaped organization. Each kobject is a node of this tree; each kset is a non-leaf node, containing ksets or kobjects. By definition, a kset is itself a kobject. A kset may contain other ksets, or it may be the highest-level hierarchical type defined by the organization, the ksubsystem. From the tree's point of view there is no difference between a ksubsystem and a kset; the data organization, built on top of the tree, simply gives the top of the tree an additional definition distinct from kset and names it ksubsystem.
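A minimal user-space sketch of the embedding pattern kobject relies on (the struct names here are illustrative, not kernel code): a generic node lives inside a specific structure, and container_of() recovers the outer structure from a pointer to the node.

```c
/* The embedding pattern: generic code manipulates the embedded node,
 * specific code recovers its own structure with container_of(). */
#include <stdio.h>
#include <stddef.h>

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct kobj {                 /* stand-in for struct kobject */
    const char *name;
    struct kobj *parent;      /* tree edge: child -> parent */
};

struct my_device {            /* any subsystem object embeds a node */
    int id;
    struct kobj kobj;
};

int main(void)
{
    struct my_device dev = { .id = 42, .kobj = { .name = "dev42" } };

    /* Generic tree code sees only the embedded node... */
    struct kobj *node = &dev.kobj;

    /* ...and the owning code recovers its structure from it. */
    struct my_device *d = container_of(node, struct my_device, kobj);
    printf("%s has id %d\n", node->name, d->id);
    return 0;
}
```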

Commands and tools for manipulating user space device nodes

Mknod
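mknod is the command (and the system call behind it) for doing by hand what udevd does automatically: creating a device node with a given type and major/minor number. A minimal sketch (the path is illustrative; 1:5 is the kernel's zero device; requires root):

```c
/* Create a character device node by hand, as mknod(1) or udevd would. */
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>   /* makedev() */

int main(void)
{
    /* S_IFCHR: character device; makedev(1, 5): the "zero" driver. */
    if (mknod("/tmp/myzero", S_IFCHR | 0666, makedev(1, 5)) < 0) {
        perror("mknod");
        return 1;
    }
    printf("created /tmp/myzero (char 1:5, same driver as /dev/zero)\n");
    return 0;
}
```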

Disk Management: MBR and GPT

See the bootloader chapter.

LVM

Overview

A problem that both Linux and Windows often face is that a disk partitioning scheme does not stay adequate over long use. Windows has excellent tools for dynamic resizing, but the operation is time-consuming and requires a shutdown and restart. That is no great matter for individual users, so it is fine for home use. But Linux is more than a home system: businesses usually do not want to restart their machines when a partition needs to grow, and they need the operation done faster. This is the need LVM was born for; LVM2 is the version now under development.

The main idea of LVM is not to hand physical partitions such as sda1 or sdb2 directly to the file system, but to allow multiple physical disks to be organized into one pool: a number of disks are combined into a volume group, and logical volumes are then carved out of the volume group at will.


A volume group is called a VG (volume group), a physical disk a PV (physical volume), and a logical volume an LV (logical volume). A PV is not divided into a file system as partitions used to be, but into equal-sized storage units called PEs (physical extents); the default size of a PE is 4 MB, and the PE is the smallest unit LVM can address. The logical volume LV is likewise divided into an addressable basic unit, called the LE (logical extent). Within the same volume group, LEs are the same size as PEs and correspond to them one to one; with the default 4 MB extents, for example, a 1 GB LV consists of 256 LEs, each backed by some PE on one of the PVs.

Just as a non-LVM system saves the metadata containing partition information in the partition table at the beginning of the disk, the metadata related to logical volumes and volume groups is stored in the VGDA (volume group descriptor area) at the beginning of each physical volume. The VGDA includes the following: PV descriptors, VG descriptors, LV descriptors, and a number of PE descriptors.

When the system brings LVM up, the VG is activated and the VGDA is loaded into memory to establish the actual physical storage location of each LV. When the system performs an I/O operation, the actual physical location is found through the mapping mechanism the VGDA established.
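A minimal sketch of that LE-to-PE translation (the structures and numbers are illustrative; real LVM keeps the mapping in its in-memory copy of the VGDA metadata):

```c
/* Translate a byte offset in an LV to (PV, PE, offset) via an LE table. */
#include <stdio.h>
#include <stdint.h>

#define EXTENT_SIZE (4u * 1024 * 1024)   /* default 4 MB extents */

struct pe_ref {               /* one logical extent's backing PE */
    int      pv;              /* which physical volume */
    uint32_t pe;              /* extent index on that PV */
};

/* A toy LV of 4 LEs, deliberately scattered over two PVs. */
static const struct pe_ref lv_map[] = {
    { .pv = 0, .pe = 17 }, { .pv = 0, .pe = 18 },
    { .pv = 1, .pe =  3 }, { .pv = 0, .pe = 99 },
};

int main(void)
{
    uint64_t lv_offset = 6u * 1024 * 1024;     /* byte 6 MB into the LV */

    uint64_t le  = lv_offset / EXTENT_SIZE;    /* which logical extent */
    uint64_t off = lv_offset % EXTENT_SIZE;    /* offset within it */
    const struct pe_ref *t = &lv_map[le];

    printf("LV offset %llu -> LE %llu -> PV %d, PE %u, offset %llu\n",
           (unsigned long long)lv_offset, (unsigned long long)le,
           t->pv, t->pe, (unsigned long long)off);
    return 0;
}
```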

LVM is not an alternative to file systems but a partitioning method below the file system. After the LVM volumes are created, an LV can be formatted with a file system just like a traditional partition.

Advantages and Disadvantages

The biggest advantage of LVM is the ability to take snapshots, which is unthinkable with traditional disks and otherwise easy to implement only in virtualization mechanisms such as VMware's. Snapshots rely on copy-on-write: newly written and modified PEs are recorded through a copy-on-write table, and the original PE is not modified in place; a new one is used instead, so the state at snapshot time can always be traced back.
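A minimal sketch of that copy-on-write idea at extent granularity (illustrative only; real LVM snapshots store these exceptions in a dedicated COW volume with its own on-disk format):

```c
/* Snapshot copy-on-write: preserve an extent's old contents on first write. */
#include <stdio.h>

#define N_PE 8

static int origin[N_PE]   = { 10, 11, 12, 13, 14, 15, 16, 17 };
static int cow_copy[N_PE];            /* preserved old contents */
static int has_copy[N_PE];            /* the copy-on-write table */

/* Writing to the origin first preserves the old extent for the snapshot. */
static void origin_write(int pe, int value)
{
    if (!has_copy[pe]) {
        cow_copy[pe] = origin[pe];     /* copy out before first write */
        has_copy[pe] = 1;
    }
    origin[pe] = value;
}

/* Reading the snapshot prefers preserved copies over the live origin. */
static int snapshot_read(int pe)
{
    return has_copy[pe] ? cow_copy[pe] : origin[pe];
}

int main(void)
{
    origin_write(2, 99);
    printf("origin PE2 = %d, snapshot PE2 = %d\n", origin[2], snapshot_read(2));
    printf("snapshot PE3 = %d (never written, read from origin)\n",
           snapshot_read(3));
    return 0;
}
```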

Another advantage is scalability: the size of a volume can be adjusted without downtime. This is especially useful for RAID systems.

The design of LVM is also the source of its obvious drawback: when the physical PEs of a volume are discontiguous, performance can suffer greatly.

In addition to LVM there are other similar mechanisms, such as EVMS and dmraid, but LVM is the most widely used.

How It Works


This mechanism cannot be implemented in user space alone; it requires supporting code in kernel space, namely device-mapper (the dm_mod module). The dm_mod module performs the translation of I/O requests, which is essentially a mapping, just as in memory management. The code is located in drivers/md. This code is a vivid embodiment of the separation of policy and mechanism: the policy is specified from user space, and the mechanism is provided by the kernel.

The module models three entities: the mapped device, the mapping table, and the target device. The mapping table records the mapping between the other two. The mapped device is provided by the kernel and exists only logically; the target device, which exists physically, is where every operation on the logical device is eventually redirected.

We know that the core data request of the generic block layer is the bio, and a bio cannot span multiple physical devices. Therefore, when the mapping requires it, a bio is split into multiple clones, each sent to its own target device.
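A minimal sketch of why the split happens (sector counts and device names are illustrative): when a mapped device is the concatenation of targets, a request that crosses a target boundary must be cut into per-target pieces, just as device-mapper clones a bio.

```c
/* Split a request over the targets of a toy linear mapping table. */
#include <stdio.h>
#include <stdint.h>

struct target {
    uint64_t    start;        /* first sector on the mapped device */
    uint64_t    len;          /* sectors covered by this target */
    const char *dev;          /* backing physical device */
};

/* Sectors 0-999 map to sdb, 1000-1799 to sdc. */
static const struct target table[] = {
    { .start = 0,    .len = 1000, .dev = "/dev/sdb" },
    { .start = 1000, .len =  800, .dev = "/dev/sdc" },
};

int main(void)
{
    uint64_t sector = 900, count = 300;   /* request crossing the boundary */

    for (size_t i = 0; count && i < sizeof(table) / sizeof(table[0]); i++) {
        const struct target *t = &table[i];
        if (sector >= t->start + t->len)
            continue;                      /* request starts past this target */
        uint64_t chunk = t->start + t->len - sector;
        if (chunk > count)
            chunk = count;                 /* one clone per target touched */
        printf("clone: %llu sectors at %llu -> %s\n",
               (unsigned long long)chunk,
               (unsigned long long)(sector - t->start), t->dev);
        sector += chunk;
        count  -= chunk;
    }
    return 0;
}
```

Running this prints two clones, 100 sectors for /dev/sdb and 200 for /dev/sdc, which is precisely the splitting the text describes.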
