3. Device Files
Unix-like systems are built around the concept of a file. They handle I/O devices as special files called device files, so the same system calls used to interact with regular files on disk can be used directly on I/O devices.
Depending on the characteristics of the underlying device driver, device files are of the following types:
1. Block devices. Their data can be accessed randomly, and from the user's point of view the time needed to transfer any data block is small and roughly the same.
2. Character devices. Their data either cannot be accessed randomly, or can be accessed randomly but with an access time that depends heavily on the position of the data within the device (for example, a magnetic tape drive).
3. Network devices. No device file corresponds to them.
Device files are real files stored in the filesystem. However, their inodes do not need to include pointers to data blocks on disk, because device files are empty. Instead, the inode must contain an identifier of the hardware device corresponding to the character or block device file.
Traditionally, this identifier consists of the type of the device file (character or block) and a pair of numbers. The first number, called the major number, identifies the device type. Typically, all device files that have the same major number and the same type share the same set of file operations, because they are handled by the same device driver. The second number, called the minor number, identifies a specific device among the group of devices that share the same major number.
The mknod() system call creates device files. Its parameters specify the name of the device file, its type, and its major and minor numbers. Device files are usually placed in the /dev directory.
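As a sketch of the call described above, assuming Linux with glibc: the file type is passed in the mode argument, and makedev() from <sys/sysmacros.h> packs the major and minor numbers into a dev_t. The helper name create_device_file() is hypothetical, and creating device nodes normally requires root privileges (the CAP_MKNOD capability), so the call may fail with EPERM.

```c
#define _DEFAULT_SOURCE          /* exposes mknod() and makedev() in glibc */
#include <errno.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>       /* makedev(), major(), minor() */

/* Hypothetical helper: create a device file of the given type
 * (S_IFCHR or S_IFBLK) with the given major and minor numbers.
 * Returns 0 on success, or -1 with errno set (typically EPERM
 * when the caller lacks the CAP_MKNOD capability). */
static int create_device_file(const char *path, mode_t type,
                              unsigned int maj, unsigned int min)
{
    if (type != S_IFCHR && type != S_IFBLK) {
        errno = EINVAL;          /* only device files are handled here */
        return -1;
    }
    /* The file type travels in the mode argument and the device number in
     * the dev_t built by makedev(); permissions are still subject to umask. */
    return mknod(path, type | 0666, makedev(maj, min));
}
```

For example, create_device_file("/dev/mynull", S_IFCHR, 1, 3) would create a second name for the null device.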
Device files usually correspond to hardware devices (such as the hard disk /dev/hda) or to a physical or logical partition of a hardware device (such as the disk partition /dev/hda2). In some cases, however, a device file is not associated with any real hardware device but represents a fictitious logical device. For example, /dev/null is a device file corresponding to a "black hole": all data written to it is discarded.
The kernel does not care about the name of a device file; it cares only about its type and its major and minor numbers. Device file names matter only to applications, which use them to refer to devices.
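Because the kernel identifies a device only by its type and its major and minor numbers, a program can recover exactly that identifier from any device file with stat(2): st_mode carries the type and st_rdev the device number. The sketch below assumes glibc's major()/minor() macros from <sys/sysmacros.h>; struct dev_id and device_id_of() are hypothetical names. On a typical Linux system, /dev/null is a character device with major number 1 and minor number 3.

```c
#define _DEFAULT_SOURCE
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>       /* major(), minor() */

/* Hypothetical summary of what the kernel uses to identify a device. */
struct dev_id {
    char type;                   /* 'c' = character, 'b' = block, '-' = other */
    unsigned int maj;
    unsigned int min;
};

/* Fill *id from the file at path; the file's name plays no role.
 * Returns 0 on success, -1 if stat() fails. */
static int device_id_of(const char *path, struct dev_id *id)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    if (S_ISCHR(st.st_mode))
        id->type = 'c';
    else if (S_ISBLK(st.st_mode))
        id->type = 'b';
    else
        id->type = '-';
    id->maj = major(st.st_rdev);
    id->min = minor(st.st_rdev);
    return 0;
}
```

Two device files with different names but the same type and numbers refer to the same device.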
3.1 User Mode Handling of Device Files
In traditional Unix systems (and in earlier versions of Linux), the major and minor numbers of device files are each only 8 bits long. This is not enough for high-end systems; for example, a large cluster may need a huge number of SCSI disks, each with up to 15 partitions.
The real problem is that device files are allocated once and for all and stored permanently in the /dev directory, so every logical device in the system must have an associated device file with a well-defined device number. The Documentation/devices.txt file records the officially registered device numbers and /dev nodes, and the include/linux/major.h file may also contain macros for the major numbers of devices. Because the number of different hardware devices is enormous, however, the official registry works well for ordinary Linux systems but not for large-scale systems such as the storage cluster above: a system with more than 16 SCSI disks, for example, must change the standard allocation of major and minor numbers, which requires modifying the kernel source code and makes the system hard to maintain.
To solve these problems, Linux 2.6 enlarged the encoding of device numbers: the major number is now encoded in 12 bits and the minor number in 20 bits. Both are usually combined into a single 32-bit dev_t variable; the MAJOR() and MINOR() macros extract the two numbers from it, and MKDEV() merges them into a dev_t.
- Inside the kernel, bits 0-19 hold the minor number and bits 20-31 hold the major number.
- When a dev_t has to be exported to user space, bits 0-7 hold the low part of the minor number, bits 8-19 hold the major number, and bits 20-31 hold the remaining part of the minor number. For major and minor numbers below 256, this yields the same value as the old 16-bit encoding.
- As long as code consistently uses the helper functions that convert between dev_t and its external representation, it will not need to change even if the internal data type changes again in the future.
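The common macros (MAJOR(), MINOR(), MKDEV()) and the conversion helpers such as new_encode_dev() and new_decode_dev() are defined in <linux/kdev_t.h>.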
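The following user-space sketch re-implements, for illustration only, the two layouts described above: MKDEV(), MAJOR(), and MINOR() for the internal 12/20-bit split, and new_encode_dev()/new_decode_dev() for the external layout. It is written from the bit ranges listed above rather than copied from the kernel header; kdev_t32 is a hypothetical stand-in for the kernel's 32-bit dev_t.

```c
#include <stdint.h>

typedef uint32_t kdev_t32;       /* hypothetical stand-in for the 32-bit dev_t */

/* Internal layout: minor in bits 0-19, major in bits 20-31. */
#define MINORBITS     20
#define MINORMASK     ((1U << MINORBITS) - 1)
#define MAJOR(dev)    ((unsigned int)((dev) >> MINORBITS))
#define MINOR(dev)    ((unsigned int)((dev) & MINORMASK))
#define MKDEV(ma, mi) ((kdev_t32)(((uint32_t)(ma) << MINORBITS) | (uint32_t)(mi)))

/* External layout: minor bits 0-7 -> bits 0-7, major -> bits 8-19,
 * minor bits 8-19 -> bits 20-31. */
static uint32_t new_encode_dev(kdev_t32 dev)
{
    unsigned int major = MAJOR(dev);
    unsigned int minor = MINOR(dev);
    return (minor & 0xff) | (major << 8) | ((minor & ~0xffU) << 12);
}

/* Inverse of new_encode_dev(). */
static kdev_t32 new_decode_dev(uint32_t dev)
{
    unsigned int major = (dev & 0xfff00) >> 8;
    unsigned int minor = (dev & 0xff) | ((dev >> 12) & 0xfff00);
    return MKDEV(major, minor);
}
```

For instance, MKDEV(8, 1) is 0x00800001 internally but encodes to 0x801 externally, the same value the old 16-bit scheme used for major 8, minor 1.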
These additional device numbers are not statically allocated in the official registry, because they should be used only when dealing with unusual demands for device numbers. In practice, the preferred way to assign device numbers and create device files is highly dynamic.
3.2 Dynamically Allocating Device Numbers
Each device driver specifies, during its registration phase, the range of device numbers it will handle. A driver can, however, request only the size of the range without specifying exact values; in this case, the kernel allocates a suitable range of free numbers to the driver. Therefore, drivers for new hardware devices no longer need an assigned number in the official registry; they can simply use the device numbers that are currently unused in the system.
In this case, however, device files cannot be created once and for all; they must be created right after the device driver is initialized with the proper major and minor numbers. Thus, there must be a standard way to export the device numbers used by each driver to user-mode applications. The device driver model provides an elegant solution: the major and minor numbers are stored in the dev attributes contained in the subdirectories of /sys/class.
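Each dev attribute holds the device number as text of the form "major:minor" followed by a newline (for example, /sys/class/mem/null/dev usually contains "1:3"). A minimal parser under that assumption might look like this; parse_dev_attr() and read_dev_attr() are hypothetical helper names.

```c
#include <stdio.h>

/* Parse the text of a sysfs "dev" attribute ("major:minor\n").
 * Returns 0 on success, -1 if the text is malformed. */
static int parse_dev_attr(const char *text, unsigned int *maj, unsigned int *min)
{
    char extra;
    /* After the two numbers there must be nothing or a newline. */
    int n = sscanf(text, "%u:%u%c", maj, min, &extra);
    if (n == 2 || (n == 3 && extra == '\n'))
        return 0;
    return -1;
}

/* Read and parse the dev attribute at path, e.g. "/sys/class/mem/null/dev". */
static int read_dev_attr(const char *path, unsigned int *maj, unsigned int *min)
{
    char buf[32];
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char *ok = fgets(buf, sizeof buf, f);
    fclose(f);
    return ok ? parse_dev_attr(buf, maj, min) : -1;
}
```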
3.3 Dynamically Creating Device Files
The Linux kernel can create device files dynamically: there is no need to fill the /dev directory with a device file for every conceivable hardware device, because device files can be created on demand. Thanks to the device driver model, the Linux 2.6 kernel offers a simple way to do this. A set of user-mode programs, collectively known as the udev toolset, must be installed in the system. At system startup, the /dev directory is emptied; then the udev program scans the subdirectories of /sys/class looking for dev files. For each such file (each combination of major and minor numbers represents a logical device supported by the kernel), udev creates a corresponding device file in /dev. It also assigns device file names and creates symbolic links according to its configuration files, so as to resemble the traditional naming scheme for Unix device files. In the end, /dev contains device files for all the devices supported by the kernel on this system, and nothing else.
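To make the boot-time scan concrete, here is a toy model of it; it is not udev's actual implementation. It walks <root>/class/<class>/<device>/dev, parses each "major:minor" attribute, and reports the device name and numbers through a callback, at the point where real udev would call mknod() in /dev and apply its naming rules. scan_sys_class(), found_fn, and the root parameter (normally "/sys") are assumptions made for this sketch.

```c
#define _DEFAULT_SOURCE
#include <dirent.h>
#include <stdio.h>

/* Called once per logical device found; real udev would create /dev/<name>. */
typedef void (*found_fn)(const char *name, unsigned int maj, unsigned int min,
                         void *ctx);

/* Scan <root>/class/<class>/<device>/dev.
 * Returns the number of devices found, or -1 if <root>/class is missing. */
static int scan_sys_class(const char *root, found_fn found, void *ctx)
{
    char path[1024];
    int count = 0;

    snprintf(path, sizeof path, "%s/class", root);
    DIR *classes = opendir(path);
    if (classes == NULL)
        return -1;

    struct dirent *c;
    while ((c = readdir(classes)) != NULL) {
        if (c->d_name[0] == '.')
            continue;                       /* skip "." and ".." */
        char class_dir[1024];
        snprintf(class_dir, sizeof class_dir, "%s/%s", path, c->d_name);
        DIR *devs = opendir(class_dir);     /* sysfs entries may be symlinks */
        if (devs == NULL)
            continue;

        struct dirent *d;
        while ((d = readdir(devs)) != NULL) {
            if (d->d_name[0] == '.')
                continue;
            char dev_attr[2048];
            snprintf(dev_attr, sizeof dev_attr, "%s/%s/dev", class_dir, d->d_name);
            FILE *f = fopen(dev_attr, "r");
            if (f == NULL)
                continue;                   /* no dev attribute: no device file */
            unsigned int maj, min;
            if (fscanf(f, "%u:%u", &maj, &min) == 2) {
                found(d->d_name, maj, min, ctx);
                count++;
            }
            fclose(f);
        }
        closedir(devs);
    }
    closedir(classes);
    return count;
}
```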
Device files are often created after system initialization, either when a module containing a device driver is loaded or when a hot-pluggable device is added to the system. The udev toolset can create the corresponding device files automatically because the device driver model supports device hotplugging. Whenever a new device is discovered, the kernel spawns a new process that executes the user-mode shell script /sbin/hotplug, passing any useful information about the new device to the script as environment variables. The script reads its configuration files and takes care of any operation required to complete the initialization of the new device. If the udev toolset is installed, the script also creates the appropriate device file in the /dev directory.
3.4 VFS Handling of Device Files
Although device files reside in the system directory tree, they are fundamentally different from regular files and directories. When a process accesses a regular file, it accesses some data blocks in a disk partition through a filesystem; when a process accesses a device file, it just drives a hardware device.
To do this, the VFS changes the default file operations of a device file when it is opened. As a result, each system call on the device file is translated into an invocation of a device-related function instead of the corresponding function of the hosting filesystem. The device-related function acts on the hardware device to perform the operation requested by the process. (Note that during pathname lookup, a symbolic link to a device file behaves just like the device file itself.)
Suppose a process executes open() on a device file. The corresponding service routine resolves the pathname to the device file and sets up the corresponding inode object, dentry object, and file object. The inode object is initialized by reading the corresponding inode on disk through a suitable filesystem function (usually ext2_read_inode() or ext3_read_inode()). When this function determines that the disk inode refers to a device file, it invokes init_special_inode(), which initializes the i_rdev field of the inode object with the major and minor numbers of the device file and sets the i_fop field of the inode object to the address of either the def_blk_fops or the def_chr_fops file operation table, according to the type of device file. The service routine of the open() system call then invokes dentry_open(), which allocates a new file object and sets its f_op field to the address stored in i_fop, that is, once again to the address of def_blk_fops or def_chr_fops. Thanks to these two tables, every system call issued on a device file activates a function of the device driver rather than a function of the underlying filesystem.
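The effect of init_special_inode() and dentry_open() can be modeled in a few lines. The structures below are drastically simplified, hypothetical stand-ins for struct inode, struct file, and struct file_operations (FIFOs and sockets are omitted, and in the real kernel def_chr_fops mainly provides an open method that then installs the driver's own table). The point is only that the file object inherits i_fop, so the device operation table, not the filesystem's, handles later system calls.

```c
#define _DEFAULT_SOURCE
#include <stdint.h>
#include <sys/stat.h>            /* mode_t, S_IFCHR, S_IFBLK, S_ISCHR, S_ISBLK */

/* Simplified, hypothetical stand-ins for the kernel structures. */
struct file_operations { const char *name; };
struct inode { mode_t i_mode; uint32_t i_rdev; const struct file_operations *i_fop; };
struct file  { const struct file_operations *f_op; };

static const struct file_operations ext2_file_ops = { "filesystem" };
static const struct file_operations def_chr_fops  = { "character device layer" };
static const struct file_operations def_blk_fops  = { "block device layer" };

/* Model of init_special_inode(): record the device number and replace the
 * filesystem's operation table with the device one. */
static void init_special_inode(struct inode *inode, mode_t mode, uint32_t rdev)
{
    inode->i_mode = mode;
    if (S_ISCHR(mode)) {
        inode->i_fop = &def_chr_fops;
        inode->i_rdev = rdev;
    } else if (S_ISBLK(mode)) {
        inode->i_fop = &def_blk_fops;
        inode->i_rdev = rdev;
    }
}

/* Model of the relevant step of dentry_open(): the file object copies i_fop. */
static void dentry_open(const struct inode *inode, struct file *filp)
{
    filp->f_op = inode->i_fop;
}
```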
4. Device Drivers
A device driver is the set of kernel routines that makes a hardware device respond to the programming interface defined by the canonical set of VFS functions (open, read, lseek, ioctl, and so on) that control a device. The actual implementation of all these functions is delegated to the device driver. Because each device has a unique I/O controller, and therefore unique commands and unique state information, most I/O devices have their own drivers.
Before a device driver can be used, several activities must take place.
4.1 Registering the Device Driver
Each system call issued on a device file is translated by the kernel into an invocation of the corresponding function of a device driver. To make this possible, the device driver must register itself: this means allocating a new device_driver descriptor, inserting it in the data structures of the device driver model, and linking it to the corresponding device file (or files). Accessing a device file whose driver has not been registered returns the error code -ENODEV.
If the device driver is statically compiled into the kernel, its registration is performed during kernel initialization. Conversely, if it is compiled as a module, its registration is performed when the module is loaded; in the latter case, the device driver can also unregister itself when the module is unloaded.
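A toy model of the -ENODEV rule: a hypothetical table indexed by major number stands in for the data structures of the device driver model, and opening a device file whose major number has no registered driver fails. Real drivers register through kernel interfaces such as pci_register_driver(); all toy_* names below are made up for this sketch, which is single-threaded and uses no locking.

```c
#include <errno.h>
#include <stddef.h>

#define MAX_MAJOR 4096                  /* 12-bit major numbers */

/* Hypothetical, minimal driver descriptor. */
struct toy_driver {
    const char *name;
    int (*open)(unsigned int minor);
};

static const struct toy_driver *drivers[MAX_MAJOR];

/* Registration: link the driver to every device file with this major number. */
static int toy_register_driver(unsigned int major, const struct toy_driver *drv)
{
    if (major >= MAX_MAJOR)
        return -EINVAL;
    if (drivers[major] != NULL)
        return -EBUSY;                  /* major number already taken */
    drivers[major] = drv;
    return 0;
}

/* Unregistration, e.g. when the module containing the driver is unloaded. */
static void toy_unregister_driver(unsigned int major)
{
    if (major < MAX_MAJOR)
        drivers[major] = NULL;
}

/* open() on a device file: dispatch to the driver or fail with -ENODEV. */
static int toy_open_device_file(unsigned int major, unsigned int minor)
{
    if (major >= MAX_MAJOR || drivers[major] == NULL)
        return -ENODEV;
    return drivers[major]->open(minor);
}
```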
For example, consider a PCI device. Its driver must allocate a descriptor of type pci_driver, which the PCI kernel layer uses to handle the device. After initializing some fields of this descriptor, the device driver invokes pci_register_driver().
Actually, the pci_driver descriptor includes an embedded device_driver descriptor; pci_register_driver() simply initializes the fields of the embedded driver descriptor and then invokes driver_register() to insert the driver in the data structures of the device driver model.
When a device driver is being registered, the kernel looks for hardware devices that the driver could handle but that are not yet supported. To do this, it relies on the match method of the relevant bus_type descriptor and on the probe method of the device_driver object. If a hardware device that the driver can handle is discovered, the kernel invokes device_register() to insert the device in the device driver model.
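The embedding and the match/probe sequence described above can be sketched in user space. The structures are hypothetical simplifications with toy_ names: toy_pci_driver embeds a generic toy_device_driver, the bus match method compares vendor and device IDs, and container_of() (re-implemented here with offsetof()) recovers the outer descriptor from the embedded member, which is the idiom the kernel uses for this kind of embedding. Binding a device here stands in for inserting it in the device driver model.

```c
#include <stddef.h>

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct toy_device;
struct toy_device_driver;

struct toy_bus_type {
    const char *name;
    int (*match)(const struct toy_device *dev, struct toy_device_driver *drv);
};

struct toy_device_driver {              /* generic descriptor */
    const char *name;
    const struct toy_bus_type *bus;
    int (*probe)(struct toy_device *dev);
};

struct toy_device {
    unsigned short vendor, device;      /* IDs read from the bus */
    struct toy_device_driver *driver;   /* NULL = not yet handled */
};

struct toy_pci_driver {                 /* bus-specific descriptor */
    unsigned short vendor, device;      /* one supported ID, for simplicity */
    struct toy_device_driver driver;    /* embedded generic descriptor */
};

/* PCI-like match method: compare IDs stored in the outer descriptor. */
static int toy_pci_match(const struct toy_device *dev, struct toy_device_driver *drv)
{
    struct toy_pci_driver *pdrv = container_of(drv, struct toy_pci_driver, driver);
    return dev->vendor == pdrv->vendor && dev->device == pdrv->device;
}

static const struct toy_bus_type toy_pci_bus = { "pci", toy_pci_match };

/* Model of driver_register(): offer every unhandled device to the bus match
 * method and then to probe(); returns how many devices the driver took. */
static int toy_driver_register(struct toy_device_driver *drv,
                               struct toy_device *devs, size_t n)
{
    int bound = 0;
    for (size_t i = 0; i < n; i++) {
        struct toy_device *dev = &devs[i];
        if (dev->driver == NULL && drv->bus->match(dev, drv) && drv->probe(dev) == 0) {
            dev->driver = drv;
            bound++;
        }
    }
    return bound;
}

/* Model of pci_register_driver(): fill the embedded descriptor, then register it. */
static int toy_pci_register_driver(struct toy_pci_driver *pdrv, const char *name,
                                   int (*probe)(struct toy_device *),
                                   struct toy_device *devs, size_t n)
{
    pdrv->driver.name = name;
    pdrv->driver.bus = &toy_pci_bus;
    pdrv->driver.probe = probe;
    return toy_driver_register(&pdrv->driver, devs, n);
}
```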
4.2 Initializing the Device Driver
Registering a device driver and initializing it are two different things. A device driver is registered as soon as possible, so that user-mode applications can use it through the corresponding device files. Conversely, a device driver is initialized at the last possible moment, because initializing a driver means allocating precious system resources, which are then unavailable to other drivers.
To make sure that resources are obtained when needed but are not requested again once they have been granted, device drivers usually adopt the following schema:
1. A usage counter keeps track of the number of processes that are currently accessing the device file. The counter is increased in the open method of the device file and decreased in the release method. (More precisely, the counter records the number of file objects referencing the device file, because child processes may share file objects.)
2. The open method checks the value of the usage counter before the increment. If the counter is zero, the device driver must allocate the resources and enable interrupts and DMA on the hardware device.
3. The release method checks the value of the usage counter after the decrement. If the counter is zero, no process is using the hardware device any longer, so the method disables interrupts and DMA on the I/O controller and then releases the allocated resources.
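The usage-counter schema can be sketched as follows. This is a single-threaded user-space model with hypothetical names (struct foo_hw, foo_open(), foo_release()); a real driver would protect the counter with a lock or use an atomic type, and the boolean flag stands in for requesting an IRQ line, enabling DMA, and so on.

```c
#include <stdbool.h>

/* Hypothetical per-device state. */
struct foo_hw {
    int usage;              /* file objects currently referencing the device */
    bool resources;         /* resources allocated, interrupts and DMA enabled */
    int activations;        /* how many times resources were acquired */
};

static int foo_open(struct foo_hw *hw)
{
    if (hw->usage == 0) {   /* check BEFORE incrementing */
        hw->resources = true;
        hw->activations++;
    }
    hw->usage++;
    return 0;
}

static int foo_release(struct foo_hw *hw)
{
    hw->usage--;
    if (hw->usage == 0)     /* check AFTER decrementing */
        hw->resources = false;
    return 0;
}
```

Two overlapping opens allocate the resources only once, and they are freed only when the last file object is released.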
4.3 Monitoring I/O Operations
The duration of an I/O operation is often unpredictable. It may depend on mechanical considerations (the current position of a disk head with respect to the block to be transferred), on truly random events (when a data packet arrives on the network card), or on human factors (a paper jam in the printer). In any case, the device driver that started an I/O operation must rely on a monitoring technique that signals either the termination of the I/O operation or a timeout.
If the operation terminates, the device driver reads the status register of the I/O interface to determine whether the I/O operation was carried out successfully. In the case of a timeout, the driver knows that something went wrong, because the maximum time interval allowed to complete the operation elapsed and nothing happened.
Two techniques are available to monitor the end of an I/O operation: polling mode and interrupt mode.
4.3.1 Polling Mode
The CPU repeatedly checks (polls) the status register of the device until its value signals that the I/O operation has been completed. The polling technique is more subtle than it looks, because the driver must also remember to check for possible timeouts. Ways to keep track of the timeout:
1. Decrement a counter on each iteration and give up when it reaches zero.
2. On each iteration, read the jiffies tick counter and compare it with the value read before the loop started.
3. If the I/O operation takes a relatively long time to complete, say on the order of milliseconds, these schemas are inefficient, because the CPU wastes precious machine cycles waiting for the I/O operation to complete. In such cases, it is preferable to voluntarily relinquish the CPU after each polling iteration by inserting an invocation of schedule() inside the loop.
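A user-space sketch of a polling loop with a timeout: the status register is simulated by a callback, CLOCK_MONOTONIC plays the role of jiffies (method 2 above), and sched_yield() stands in for the voluntary schedule() call. DEVICE_DONE, read_status_fn, and poll_until_done() are hypothetical names.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <sched.h>
#include <time.h>

#define DEVICE_DONE 0x01        /* hypothetical "operation completed" bit */

/* Simulated status register read (in a driver: inb() on the status port). */
typedef unsigned int (*read_status_fn)(void *dev);

static long elapsed_ms(const struct timespec *start)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    return (long)(now.tv_sec - start->tv_sec) * 1000L
         + (now.tv_nsec - start->tv_nsec) / 1000000L;
}

/* Poll until DEVICE_DONE is set or timeout_ms elapses.
 * Returns 0 on completion, -ETIMEDOUT on timeout. */
static int poll_until_done(read_status_fn read_status, void *dev, long timeout_ms)
{
    struct timespec start;
    clock_gettime(CLOCK_MONOTONIC, &start);    /* like saving jiffies */
    for (;;) {
        if (read_status(dev) & DEVICE_DONE)
            return 0;
        if (elapsed_ms(&start) >= timeout_ms)
            return -ETIMEDOUT;
        sched_yield();                         /* give up the CPU between polls */
    }
}
```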
4.3.2 Interrupt Mode
Interrupt mode can be used only if the I/O controller is capable of signaling the end of an I/O operation through an IRQ line. Consider the following example.
Suppose a user issues a read() system call on a character device file associated with a simple input device. An input command is sent to the control register of the device. After an unpredictably long time interval, the device puts a single byte of data in its input register. The device driver then returns this byte as the result of the read() system call.
In essence, the driver includes two functions:
1. The foo_read() function, which implements the read method of the file object;
2. The foo_interrupt() function, which handles the interrupt.
The foo_read() function is invoked whenever the user reads the device file.
The device driver relies on a custom descriptor of type foo_dev_t that includes a semaphore sem, which protects the hardware device from concurrent accesses; a wait queue wait; a flag intr, which is set when the device issues an interrupt; and a single-byte buffer data, which is written by the interrupt handler and read by the read method. In general, all I/O drivers that use interrupts rely on data structures accessed by both the interrupt handler and the read and write methods. The address of the foo_dev_t descriptor is usually stored in the private_data field of the device file's file object or in a global variable.
The main operations of the foo_read() function are the following:
1. Acquire the foo_dev->sem semaphore, thus ensuring that no other process is accessing the device;
2. Clear the intr flag;
3. Issue the read command to the I/O device;
4. Execute wait_event_interruptible() to suspend the process until the intr flag becomes 1.
After some time, the device issues an interrupt to signal that the I/O operation is completed and that the data is ready in the proper DEV_FOO_DATA_PORT data port. The interrupt handler sets the intr flag and wakes up the process.
When the scheduler decides to resume the process, the second part of foo_read() is executed, which does the following:
1. Copies the character in the foo_dev->data variable into the user address space;
2. Releases the foo_dev->sem semaphore.
For simplicity, this example has no timeout control, but real device drivers do. Timeouts are usually implemented through static or dynamic timers: the timer must be set to the right time when the I/O operation is started and removed when the operation terminates.
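The foo_interrupt() function reads the byte from the data port into foo_dev->data, sets the intr flag to 1, and wakes up the process sleeping in the foo_dev->wait wait queue.
Note that the interrupt handler uses none of its three parameters, which is quite common.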
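The foo_read()/foo_interrupt() interplay can be imitated in user space, assuming a POSIX system: a SIGALRM signal plays the device interrupt, the signal handler plays foo_interrupt(), and sigsuspend() plays wait_event_interruptible(). The model is single-threaded, so the foo_dev->sem semaphore is omitted; the timer that "answers" after about 10 ms and the device_input variable are hypothetical stand-ins for the device and its data port.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>
#include <sys/time.h>

/* Hypothetical user-space stand-in for foo_dev_t (no semaphore needed here). */
typedef struct {
    volatile sig_atomic_t intr;             /* set to 1 by the "interrupt handler" */
    volatile sig_atomic_t data;             /* single-byte buffer */
} foo_dev_t;

static foo_dev_t foo_dev;
static volatile sig_atomic_t device_input = 'A';  /* byte the fake device returns */

/* Plays foo_interrupt(): fetch the byte and set intr. Returning from the
 * handler makes sigsuspend() return, which acts as the wake-up. */
static void foo_interrupt(int signo)
{
    (void)signo;                            /* unused, like the handler's parameters */
    foo_dev.data = device_input;            /* like reading DEV_FOO_DATA_PORT */
    foo_dev.intr = 1;
}

/* Plays foo_read(): start the operation and sleep until intr == 1. */
static int foo_read(unsigned char *buf)
{
    struct sigaction sa;
    struct itimerval cmd;
    sigset_t block, old, wait_mask;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = foo_interrupt;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    /* Block the "interrupt" so it cannot arrive between clearing intr and
     * going to sleep, the race that wait_event_interruptible() also avoids. */
    sigemptyset(&block);
    sigaddset(&block, SIGALRM);
    sigprocmask(SIG_BLOCK, &block, &old);

    foo_dev.intr = 0;                               /* step 2: clear intr */
    memset(&cmd, 0, sizeof cmd);
    cmd.it_value.tv_usec = 10000;                   /* device answers in ~10 ms */
    if (setitimer(ITIMER_REAL, &cmd, NULL) != 0) {  /* step 3: "send READ" */
        sigprocmask(SIG_SETMASK, &old, NULL);
        return -1;
    }

    wait_mask = old;
    sigdelset(&wait_mask, SIGALRM);
    while (!foo_dev.intr)                           /* step 4: sleep */
        sigsuspend(&wait_mask);

    sigprocmask(SIG_SETMASK, &old, NULL);
    *buf = (unsigned char)foo_dev.data;             /* copy to the caller */
    return 1;                                       /* one byte read */
}
```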
4.4 Accessing the I/O Shared Memory
Depending on the device and on the bus type, I/O shared memory in the PC architecture may be mapped within different physical address ranges. Typically:
For most devices connected to the ISA bus
The I/O shared memory is usually mapped into the physical addresses ranging from 0xa0000 to 0xfffff; this gives rise to the "hole" between 640 KB and 1 MB in the physical memory layout.
For devices connected to the PCI bus
The I/O shared memory is mapped into 32-bit physical addresses near the 4 GB boundary.
Intel introduced the Accelerated Graphics Port (AGP) standard, an enhancement of PCI suited to high-performance graphics cards. Besides having its own I/O shared memory, this kind of card can directly address portions of the motherboard's RAM by means of a special hardware circuit named Graphics Address Remapping Table (GART). The GART circuitry enables AGP cards to sustain much higher data transfer rates than older PCI cards. From the kernel's point of view, however, it does not really matter where the physical memory is located, and GART-mapped memory is handled like the other kinds of I/O shared memory.
How does a device driver access an I/O shared memory location?
Let's start with the PC architecture, which is relatively simple to handle. Remember that kernel programs act on linear addresses, so I/O shared memory locations must be expressed as addresses greater than PAGE_OFFSET. In the following, we assume that PAGE_OFFSET is 0xc0000000, that is, that kernel linear addresses lie in the fourth gigabyte.
Device drivers must therefore translate the physical addresses of I/O shared memory locations into linear addresses in kernel space. In the PC architecture, this can be done simply by ORing the 32-bit physical address with the constant 0xc0000000. For example, suppose the kernel needs to store the value of the I/O location at physical address 0x000b0fe4 in t1 and the value of the I/O location at physical address 0xfc000000 in t2. One might think that the following statements do the job:
t1 = *((unsigned char *)(0xc00b0fe4));
t2 = *((unsigned char *)(0xfc000000));
During the initialization phase, the kernel has mapped the available RAM's physical addresses into the initial portion of the fourth gigabyte of the linear address space. Therefore, the paging unit maps the linear address 0xc00b0fe4 in the first statement back to the original I/O physical address 0x000b0fe4, which falls inside the "ISA hole" between 640 KB and 1 MB. This works fine.
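The arithmetic of the first statement can be checked with a small sketch. It assumes PAGE_OFFSET = 0xc0000000 as above, and additionally assumes that only physical addresses below a given limit (the end of directly mapped RAM, at most 896 MB on a typical 32-bit x86 kernel) are covered by the kernel's initial mapping; phys_to_kernel_linear() is a hypothetical helper. For addresses in that range, ORing with PAGE_OFFSET is the same as adding it.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_OFFSET 0xc0000000u   /* assumed 3 GB / 1 GB split, as in the text */

/* Translate a physical address into the kernel's directly mapped linear
 * address. Addresses at or above direct_map_limit (such as PCI I/O memory
 * near 4 GB) are not covered and must be mapped explicitly instead.
 * Returns false if the address is not directly mapped. */
static bool phys_to_kernel_linear(uint32_t phys, uint32_t direct_map_limit,
                                  uint32_t *linear)
{
    if (phys >= direct_map_limit)
        return false;
    *linear = phys | PAGE_OFFSET;   /* equals phys + PAGE_OFFSET here */
    return true;
}
```

With 512 MB of RAM, for instance, 0x000b0fe4 translates to 0xc00b0fe4, while 0xfc000000 lies far beyond the end of RAM and has no direct mapping.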
There is a problem with the second statement, however, because its I/O physical address is greater than the last physical address of the system RAM. Therefore, the linear address 0xfc000000 does not necessarily correspond to the physical address 0xfc000000. In such cases, the kernel Page Tables must be modified to include a linear address that maps the I/O physical address. This can be done by invoking the ioremap() or ioremap_nocache() function. The first function, which is similar to vmalloc(), invokes get_vm_area() to create a new vm_struct descriptor for a linear address interval of the size of the requested I/O shared memory area. Both functions then update the corresponding Page Table entries of the canonical kernel Page Tables. ioremap_nocache() differs from ioremap() in that it also disables the hardware cache when the remapped linear addresses are referenced.
The correct form of the second statement might therefore look like:
io_mem = ioremap(0xfbf00000, 0x200000);
t2 = *((unsigned char *)(io_mem + 0x100000));
The first statement creates a new 2 MB linear address interval that maps the physical addresses starting from 0xfbf00000; the second one reads the memory location at physical address 0xfc000000 (0xfbf00000 + 0x100000). To remove the mapping later, the device driver must use the iounmap() function.
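The offset arithmetic used with ioremap() is easy to get wrong, so here is a small checker. It assumes only that a mapping of size bytes starting at physical address base covers [base, base + size), and that the byte at physical address p is then found at io_mem + (p - base); offset_in_window() is a hypothetical name, and 64-bit arithmetic avoids overflow near 4 GB.

```c
#include <stdbool.h>
#include <stdint.h>

/* Compute the offset of physical address p inside an ioremap() window
 * [base, base + size). Returns false if p lies outside the window. */
static bool offset_in_window(uint64_t base, uint64_t size, uint64_t p,
                             uint64_t *offset)
{
    if (p < base || p - base >= size)
        return false;
    *offset = p - base;
    return true;
}
```

For instance, a 2 MB window at 0xfb000000 ends at 0xfb1fffff, so offset 0x100000 in it refers to physical address 0xfb100000; reaching 0xfc000000 at offset 0x100000 requires a window that starts at 0xfbf00000.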
On some architectures other than the PC, I/O shared memory cannot be accessed properly by simply dereferencing the linear address that points to the physical memory location. Therefore, Linux defines the following architecture-dependent functions, which should be used when accessing I/O shared memory:
readb(), readw(), readl()
Read 1, 2, or 4 bytes, respectively, from an I/O shared memory location.
writeb(), writew(), writel()
Write 1, 2, or 4 bytes, respectively, into an I/O shared memory location.
memcpy_fromio(), memcpy_toio()
Copy a block of data from an I/O shared memory location to dynamic memory, and vice versa.
memset_io()
Fill an I/O shared memory area with a fixed value.
The recommended way to access the I/O location at physical address 0xfc000000 is thus:
io_mem = ioremap(0xfbf00000, 0x200000);
t2 = readb(io_mem + 0x100000);
Thanks to these functions, the differences between platforms in how I/O shared memory is accessed can be hidden from device drivers.