Hardware Foundations of Linux Driver Design (Part 1)
This chapter summarizes the hardware background needed to learn Linux device driver programming.
1. Processors
1.1 General-Purpose Processors
General-purpose processors (GPPs) are not optimized, in architecture or instruction set, for any specific application domain. They have general-purpose architectures and instruction sets so that they can support complex operations and make it easy to add new features. In general, both embedded microcontrollers (MCUs) and microprocessors (MPUs) contain a general-purpose processor core.
An MPU typically refers to the CPU (central processing unit) itself, while an MCU emphasizes integrating the CPU, memory, and peripheral circuits on a single chip. An embedded microcontroller generally integrates a CPU core with multiple peripheral circuits, built around mainstream CPU cores such as ARM, MIPS, and PowerPC. The 51-series single-chip microcomputer is a typical MCU with integrated peripheral circuits.
CPU architectures fall into two categories: the von Neumann architecture and the Harvard architecture. The von Neumann architecture, also known as the Princeton architecture, merges program instruction memory and data memory into a single memory. Instruction addresses and data addresses point to different physical locations in the same memory, so instructions and data have the same width. The Harvard architecture stores instructions and data separately, so the two can have different widths. In addition, the Harvard architecture uses separate program and data buses as dedicated paths between the CPU and each memory, which gives it high execution efficiency.
From the instruction-set point of view, CPUs can also be divided into two categories: RISC (reduced instruction set computer) and CISC (complex instruction set computer). CISC emphasizes making each instruction more powerful, which reduces object code size, but its instructions are complex and take many cycles. RISC emphasizes a minimal instruction set with single-cycle instruction execution, at the cost of larger object code. CPU cores such as ARM, MIPS, and PowerPC all use RISC instruction sets. Today, RISC and CISC are clearly converging.
1.2 Digital Signal Processors
A digital signal processor (DSP) is designed for the algorithms used in communications, image, speech, and video processing. It contains a dedicated hardware multiplier, so a DSP multiply instruction usually completes in a single cycle, and algorithms with many repeated multiplications, such as convolution, digital filtering, the FFT (fast Fourier transform), correlation, and matrix operations, are optimized accordingly. A DSP generally uses a modified Harvard architecture with separate address and data buses, and the two buses are shared between the program memory and the data memory. DSPs come in two types: fixed-point and floating-point. A floating-point DSP implements floating-point arithmetic in hardware and can complete it in a single cycle, so it processes floating-point arithmetic faster than a fixed-point DSP, which can only emulate floating-point arithmetic with fixed-point operations.
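To make the idea of emulating fractional arithmetic with fixed-point operations concrete, here is a minimal sketch of multiplication in the Q15 format (a 16-bit signed integer with 15 fractional bits). The choice of Q15 and the helper names `to_q15`, `from_q15`, and `q15_mul` are assumptions made for illustration; they do not come from any particular DSP toolchain.

```python
Q = 15            # number of fractional bits (Q15 is an assumed format)
ONE = 1 << Q      # the integer 32768 represents 1.0


def saturate(v: int) -> int:
    """Clamp to the 16-bit signed range used by Q15."""
    return max(-ONE, min(ONE - 1, v))


def to_q15(x: float) -> int:
    """Convert a value in [-1, 1) to its Q15 integer representation."""
    return saturate(int(round(x * ONE)))


def from_q15(v: int) -> float:
    """Convert a Q15 integer back to a float for inspection."""
    return v / ONE


def q15_mul(a: int, b: int) -> int:
    """Multiply two Q15 values using only integer operations."""
    product = a * b              # the raw product has 30 fractional bits
    product += 1 << (Q - 1)      # add half an LSB so the shift rounds
    return saturate(product >> Q)


half = to_q15(0.5)
print(from_q15(q15_mul(half, half)))  # 0.25
```

The multiply itself is a plain integer multiply followed by a shift, which is the kind of operation a single-cycle hardware multiplier handles well. The saturation step is one common way to handle overflow, for example when multiplying -1.0 by -1.0.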
General-purpose processors and digital signal processors are also converging and borrowing each other's strengths. One example is the digital signal controller (DSC), which is an MCU plus a DSP; the Blackfin series belongs to the DSC category. Chip manufacturers have also introduced many ARM+DSP dual-core processors.
A network processor is a programmable device used for various tasks in the telecommunications field, such as protocol analysis, route lookup, voice/data aggregation, and firewalls. It typically consists of several microcode processors and several hardware coprocessors. The microcode processors work in parallel inside the network processor, and pre-programmed microcode controls the processing flow. Some complex, standard operations (such as memory operations, routing table lookup, QoS congestion control, and traffic scheduling) are handled by the hardware coprocessors to further improve performance, combining service flexibility with high performance.
An application-specific integrated circuit (ASIC) is often a low-cost, high-performance solution. ASICs are designed for one particular application and neither have nor need flexible programmability. Implementing a given function in an ASIC is often cheaper and more efficient than implementing it directly on a CPU or in a CPLD (complex programmable logic device) or FPGA (field-programmable gate array).
In a real project, the hardware design often selects among general-purpose processors, digital signal processors, domain-specific processors, CPLDs/FPGAs, and ASICs according to the application's requirements. In a complex system these chips may all be present at once, cooperating with each playing to its strengths. In a smartphone, for example, an MCU can handle the graphical user interface and the user's key input and run a multitasking operating system, a DSP can handle audio and video encoding and decoding, and ASICs can be used in the RF section.
2. Memory
Memory can be divided into read-only memory (ROM), flash memory (Flash), random access memory (RAM), optical media storage, and magnetic media storage.
2.1 ROM and Flash
ROM can be subdivided into non-programmable ROM, programmable ROM (PROM), erasable programmable ROM (EPROM), and electrically erasable programmable ROM (EEPROM, also written E2PROM). EEPROM can be erased entirely under software control, which is very convenient.
NOR Flash connects to the CPU through a typical SRAM-like interface and needs no additional control circuitry. NOR Flash supports execute in place (XIP): a program can run directly from NOR. NAND Flash, in contrast, must be connected to the CPU through a corresponding control circuit, although the NAND Flash interface signals can also be generated with address lines or GPIOs. NAND Flash is accessed in blocks and does not support execute in place.
The Common Flash Interface (CFI) is an open, standard interface for reading information from NOR Flash devices. It lets system software query the parameters of the installed Flash devices, including the device array structure, electrical and timing parameters, and the functions the device supports. With CFI, new and improved devices can replace older ones without changes to the system software.
A NAND Flash interface mainly consists of the following signals:
- I/O lines: addresses, commands, and data are all transferred over this set of lines, which is typically 8 or 16 bits wide.
- Chip enable (CE#): if CE# is not asserted, the NAND device stays in standby mode and does not respond to any control signal.
- Write enable (WE#): WE# is responsible for writing data, addresses, or commands to the NAND device.
- Read enable (RE#): RE# enables data output.
- Command latch enable (CLE): when CLE is high, the command is latched into the NAND command register on the rising edge of WE#.
- Address latch enable (ALE): when ALE is high, the address is latched into the NAND address register on the rising edge of WE#.
- Ready/busy (R/B#): if the NAND device is busy, the R/B# signal goes low. The signal is open-drain and requires a pull-up resistor.
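The latching rules in the list above can be sketched as a small software model. This is a toy simulation for illustration only, not a driver and not the behavior of any specific chip: the class `NandBusModel`, the helper `write_cycle`, and the byte values used are all hypothetical, and the handling of a new command or of CLE and ALE both being high is an assumption.

```python
class NandBusModel:
    """Toy model of the CE#/WE#/CLE/ALE latch behavior; not a real NAND chip."""

    def __init__(self):
        self.command = None    # NAND command register
        self.address = []      # NAND address register, one byte per address cycle
        self.data = []         # bytes received in plain data cycles
        self._prev_we_n = 1    # WE# idles high

    def drive(self, ce_n, we_n, cle, ale, io):
        """Apply one set of pin levels; latch `io` on a rising edge of WE#."""
        rising_edge = self._prev_we_n == 0 and we_n == 1
        self._prev_we_n = we_n
        if ce_n == 1:
            return                     # CE# not asserted: standby, ignore inputs
        if not rising_edge:
            return                     # nothing is latched without a WE# rising edge
        if cle and not ale:
            self.command = io          # command cycle
            self.address = []          # assumption: a new command restarts addressing
        elif ale and not cle:
            self.address.append(io)    # address cycle
        elif not cle and not ale:
            self.data.append(io)       # data cycle
        # CLE and ALE both high is treated as undefined and ignored (assumption)


def write_cycle(dev, io, cle=0, ale=0, ce_n=0):
    """Pulse WE# low and then high while holding the other pins steady."""
    dev.drive(ce_n, 0, cle, ale, io)
    dev.drive(ce_n, 1, cle, ale, io)


dev = NandBusModel()
write_cycle(dev, 0x00, cle=1)   # one command byte (arbitrary example value)
write_cycle(dev, 0x12, ale=1)   # two address bytes (arbitrary example values)
write_cycle(dev, 0x34, ale=1)
print(dev.command, dev.address)
```

Because commands, addresses, and data share the same I/O lines, the CLE and ALE levels at the WE# rising edge are what tell the device how to interpret each byte.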
Because of the intrinsic characteristics of Flash, reading and writing data occasionally produces one or a few bit errors, known as bit flips, and bit flips are far more likely in NAND Flash than in NOR Flash. Bit flips cannot be avoided, so an error detection/error correction (EDC/ECC) algorithm should be used with NAND Flash. Flash programming can only change a 1 to a 0; it cannot change a 0 to a 1. The corresponding block must therefore be erased before it is programmed. Erasing sets every bit to 1, so every byte in the block becomes 0xFF.
The various ROM, Flash, and magnetic media memories described above are non-volatile memory (NVM): their contents are not lost when power is removed. RAM is the opposite.
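The program and erase rules can be modeled in a few lines. The sketch below is a simplified software model, not driver code; the class name `FlashBlock` and its methods are hypothetical. Programming is modeled as a bitwise AND with the existing contents, which captures the fact that a program operation can clear bits but never set them.

```python
ERASED = 0xFF  # every bit of an erased byte is 1


class FlashBlock:
    """Simplified model of one erasable Flash block."""

    def __init__(self, size: int):
        self.cells = bytearray([ERASED] * size)

    def erase(self) -> None:
        """Erase the whole block: every byte returns to 0xFF."""
        for i in range(len(self.cells)):
            self.cells[i] = ERASED

    def program(self, offset: int, data: bytes) -> None:
        """Program bytes; bits can only go from 1 to 0, never from 0 to 1."""
        for i, value in enumerate(data):
            self.cells[offset + i] &= value


block = FlashBlock(4)
block.program(0, b"\xA5")
block.program(0, b"\x5A")       # programming again without erasing
print(hex(block.cells[0]))      # 0x0, not 0x5a

block.erase()
block.program(0, b"\x5A")
print(hex(block.cells[0]))      # 0x5a
```

The second program call shows why erasing first is required: writing 0x5A over 0xA5 without an erase leaves 0x00, because no bit can return to 1.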
2.2 RAM
RAM can be divided into static RAM (SRAM) and dynamic RAM (DRAM). DRAM stores data as electric charge in capacitors. Because the capacitors leak and lose charge, a DRAM device must be refreshed periodically. SRAM is static: it holds its value as long as power is supplied and needs no refresh cycle. Each SRAM cell consists of six transistors, while a DRAM cell consists of one transistor and one capacitor.
The commonly used SDRAM and DDR SDRAM both belong to the DRAM category. They run on a clock synchronized with the CPU's external memory controller (note: not the CPU's operating frequency). Compared with SDRAM, DDR SDRAM transfers data on both the rising and falling edges of the clock, doubling the data transfer rate at the same clock frequency. There are also RDRAM (Rambus DRAM) and Direct RDRAM, which use RSL (Rambus Signaling Level) technology.
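The doubling can be checked with simple arithmetic. The sketch below computes a theoretical peak rate only; the function names and the example figures (a 100 MHz clock and a 16-bit data bus) are assumptions chosen for illustration and do not describe any particular memory part.

```python
def transfers_per_second(clock_hz: int, double_data_rate: bool) -> int:
    """SDRAM uses one clock edge per cycle; DDR SDRAM uses both edges."""
    edges_used = 2 if double_data_rate else 1
    return clock_hz * edges_used


def peak_bytes_per_second(clock_hz: int, bus_width_bits: int,
                          double_data_rate: bool) -> int:
    """Theoretical peak rate, ignoring refresh and command overhead."""
    return transfers_per_second(clock_hz, double_data_rate) * bus_width_bits // 8


clock = 100_000_000   # hypothetical 100 MHz memory clock
width = 16            # hypothetical 16-bit data bus
print(peak_bytes_per_second(clock, width, double_data_rate=False))  # 200000000
print(peak_bytes_per_second(clock, width, double_data_rate=True))   # 400000000
```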
2.3.1 NVRAM (non-volatile RAM)
NVRAM achieves non-volatility either by using SRAM with a backup power supply or by using an NVM (such as EEPROM) to save and restore the SRAM contents. NVRAM is read and written exactly like SRAM, the information written to it is not lost when power is removed, and it does not need the special erase and program operations that EEPROM and Flash require. NVRAM is mostly used to store system parameter information.
2.3.2 DPRAM (dual-port RAM)
A DPRAM can be accessed through two ports at the same time. It has two fully independent sets of data buses, address buses, and read/write control lines, and is typically used for data exchange between two processors. When one side writes data, the other side can learn of it by polling or through an interrupt and then read the data. Because the arbitration logic for simultaneous access by the two CPUs is integrated inside the DPRAM, the circuit the hardware engineer has to design is relatively simple.
The advantages of DPRAM are fast communication, good real-time behavior, and a simple interface, and the CPU on either side can initiate a data transfer. Beyond dual-port RAM, IDT and other chip manufacturers have introduced multi-port RAM, which lets three or more CPUs exchange data.
2.3.3 CAM (content-addressable memory)
A CAM is a memory addressed by content; it is a special kind of RAM storage array. Its main mechanism is to automatically compare an input data item against all data items stored in the CAM at the same time, determine whether the input matches any stored item, and output the match information for that item. In other words, the input to a CAM is the data to look up, and the output is the data's address and a match flag. If there is a match (that is, the data is found), the address of the data is output. For data retrieval, a CAM offers advantages that software cannot match and can greatly improve system performance.
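The lookup interface of a CAM (data in, address and match flag out) can be sketched in software. This is only a functional model with hypothetical names (`Cam`, `write`, `search`): real CAM hardware compares the key against every entry in parallel, whereas the loop below checks entries one at a time, which is exactly the cost the hardware avoids.

```python
class Cam:
    """Functional model of a content-addressable memory."""

    def __init__(self, depth: int):
        self.entries = [None] * depth   # None marks an unused entry

    def write(self, address: int, value: int) -> None:
        """Store a data item at a given address, as with ordinary RAM."""
        self.entries[address] = value

    def search(self, key: int):
        """Return (match_flag, address) for the first entry equal to key."""
        for address, value in enumerate(self.entries):
            if value is not None and value == key:
                return True, address
        return False, None


cam = Cam(depth=4)
cam.write(2, 0xBEEF)
print(cam.search(0xBEEF))   # (True, 2)
print(cam.search(0x1234))   # (False, None)
```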
2.3.4 FIFO (first-in, first-out queue)
A FIFO memory is characterized by first-in, first-out access and is used for data buffering. Like a DPRAM, a FIFO has two access ports, but the two ports are not equivalent: at any given moment, one side can only be set as the input and the other side as the output.
If a FIFO holds n bytes in total, we can only read them by looping n times over the same address; an offset address cannot be specified. For a FIFO holding n data items, after the loop has read m times, the next read automatically returns item m+1, which is determined by the nature of the FIFO itself.
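The read behavior described above can be modeled as a single data register whose value changes on every read. The names `FifoPort`, `DATA_REG`, and `drain` are hypothetical and used only for this sketch.

```python
from collections import deque


class FifoPort:
    """Model of the output side of a FIFO exposed at one fixed address."""

    DATA_REG = 0x00   # hypothetical address of the single data register

    def __init__(self, items):
        self._queue = deque(items)

    def read(self, address: int) -> int:
        """Each read of the data register returns the next queued item."""
        if address != self.DATA_REG:
            raise ValueError("a FIFO has one data address; offsets are not addressable")
        return self._queue.popleft()


def drain(fifo: FifoPort, count: int) -> list:
    """Read `count` items by reading the same address `count` times."""
    return [fifo.read(FifoPort.DATA_REG) for _ in range(count)]


fifo = FifoPort([10, 20, 30, 40])
print(drain(fifo, 2))                 # [10, 20]
print(fifo.read(FifoPort.DATA_REG))   # 30, the item after the two already read
```

In driver code this is why a FIFO is read in a loop on one register rather than copied with an incrementing address.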