Memory mapping (Linux device drivers)


Part I: the mmap system call, which maps device memory directly into the address space of a user process.
Part II: direct access to user-space pages from the kernel. Some drivers need this capability (user-space memory is mapped into the kernel with the get_user_pages method).
Part III: direct memory access (DMA) I/O operations, which give peripherals direct access to system memory.




Memory Management in Linux

Address Types
Linux is a virtual memory system, which means that the addresses used by user programs are not equivalent to the physical addresses used by the hardware.


Virtual memory introduces an indirect layer.

Linux systems handle multiple types of addresses, and each type of address has its own semantics.


However, kernel code does not always make it clear which type of address is being used in which situation.

The following is a list of address types used by Linux:
User virtual address: the ordinary addresses seen by user-space programs.
Physical address: used between the processor and the system's memory.
Bus address: used between the peripheral bus and memory.
Kernel logical address: makes up the normal address space of the kernel; it maps some (or possibly all) of main memory and is often treated as if it were a physical address.
Kernel virtual address: the mapping from a kernel virtual address to a physical address need not be linear or one-to-one; this is what distinguishes it from the logical address space.



(The memory returned by kmalloc has a kernel logical address.)


All logical addresses are kernel virtual addresses, but many kernel virtual addresses are not logical addresses.


Memory allocated by vmalloc has a kernel virtual address; the kmap function also returns kernel virtual addresses.


Physical Addresses and Pages

High and low memory
The boundary between low memory and high memory is set when the kernel is configured; it is typically placed at just under 1GB.


This boundary is unrelated to the 640KB limit of early PCs, and it is not imposed by the hardware.
It is set by the kernel itself, which splits the 32-bit address space into kernel space and user space.



Memory Mapping and page structure
For historical reasons, the kernel uses logical addresses to refer to pages of physical memory.

However, logical addresses are not available for high memory, so kernel functions that deal with memory tend to use pointers to the page structure instead.

(defined in <linux/mm.h>)

This data structure holds everything the kernel needs to know about physical memory;
there is one page structure for each physical page in the system.


The kernel maintains one or more arrays of page structures to track the physical memory in the system.

On some systems, there is a single array called mem_map.

There are functions and macros for converting between page structure pointers and virtual addresses:
/* Translates a kernel logical address into the corresponding page structure pointer. Because it requires a logical address, it does not work with addresses returned by vmalloc or with high memory. */
struct page *virt_to_page (void *kaddr);
/* Returns the page structure pointer for the given page frame number */
struct page *pfn_to_page (int pfn);
/* Returns the kernel virtual address of the page, if one exists. For high memory, the address exists only while the page is mapped. */
void *page_address (struct page *page);

#include <linux/highmem.h>
/* kmap returns a kernel virtual address for any page in the system */
void *kmap (struct page *page);
void kunmap (struct page *page);

kmap returns a kernel virtual address for any page in the system.

For low memory, it simply returns the logical address of the page;

for high memory, kmap creates a special mapping in a dedicated area of the kernel address space.

Mappings created with kmap must be released with kunmap.




Page tables
Page tables convert virtual addresses into the corresponding physical addresses.


Virtual Memory Areas (VMAs)
The virtual memory area (VMA) is the kernel data structure used to manage distinct regions of a process's address space.

A VMA represents a homogeneous region in the virtual memory of a process:

a contiguous range of virtual addresses that have the same permission flags and are backed by the same object.

It corresponds loosely to a "memory object with its own properties".


The memory map of a process includes (at least):
The executable code (text) area of the program
Multiple data areas
One area for each active memory mapping


Looking at the /proc/<pid>/maps file shows the memory areas of a process.


/proc/self always refers to the current process.
Each line is represented in the following form:
start-end Perm Offset major:minor inode image
start-end: the virtual addresses at the beginning and end of the memory area.
perm: a bit mask of read, write, and execute permissions for the memory area; it describes how the process is allowed to access the area.
offset: where the memory area begins in the file it is mapped from.
major, minor: the major and minor device numbers of the device holding the mapped file.

For device mappings, the major and minor numbers refer to the disk partition holding the device special file that was opened by the user, rather than to the device itself.
inode: the inode number of the mapped file.
image: the name of the mapped file (usually an executable image).


The vm_area_struct structure
When a user-space process calls mmap to map device memory into its address space, the system responds by creating a new VMA to represent the mapping.
A driver that supports mmap needs to help that process by completing the initialization of the VMA.

The kernel maintains lists and trees of VMAs, and a number of fields of vm_area_struct are used to maintain this organization.
Therefore, a driver cannot create a VMA at will, or it risks breaking that organization.




Main members of VMA:
unsigned long vm_start;
unsigned long vm_end;
struct file *vm_file;
unsigned long vm_pgoff;
unsigned long vm_flags;
struct vm_operations_struct *vm_ops;
void *vm_private_data;


The vm_operations_struct structure:
These operations are invoked to handle the memory needs of the process.
void (*open) (struct vm_area_struct *vma);
void (*close) (struct vm_area_struct *vma);
/*
The nopage method is called when a process accesses a page that belongs to a valid VMA but is not currently in memory. After reading the physical page in (from secondary storage, if need be), the function returns a pointer to the page structure for that page. If the area defines no nopage method, the kernel maps the zero page instead.
*/
struct page * (*nopage) (struct vm_area_struct *vma, unsigned long address, int *type);
/* Allows the kernel to pre-fault pages into memory before user space accesses them. */
int (*populate) (struct vm_area_struct *vma, unsigned long address, unsigned long len, pgprot_t prot, unsigned long pgoff, int nonblock);


Memory-map handling
Every process in the system (except for a few kernel-space helper threads) has a struct mm_struct (defined in <linux/sched.h>), which contains the process's list of virtual memory areas, its page tables, and a great deal of other memory-management information, along with a semaphore (mmap_sem) and a spin lock (page_table_lock).


A pointer to this structure can be found in the task structure.


When a driver needs to access it, the usual approach is to use current->mm.
Several processes can share a single memory-management structure; this is how Linux implements threads.


The mmap device operation
Memory mapping is one of the most attractive features of modern Unix systems.
For drivers, memory mapping gives user programs direct access to device memory.
Mapping a device means associating a range of user-space addresses with device memory: whenever the program reads or writes within that address range, it is actually accessing the device.


Not every device lends itself to the mmap abstraction, however; serial ports and other stream-oriented devices cannot be mapped.
Another limitation of mmap is that mapping granularity is PAGE_SIZE.
The kernel can manage virtual addresses only at the page-table level, so a mapped area must be a multiple of PAGE_SIZE in size, and the starting physical address must also be a multiple of PAGE_SIZE.


Most PCI peripherals map their control registers to memory addresses.




The mmap method is part of the file_operations structure and is invoked when the mmap system call is made.
The system call is declared as follows:
mmap (caddr_t addr, size_t len, int prot, int flags, int fd, off_t offset);
The file_operations method is declared as:
int (*mmap) (struct file *filp, struct vm_area_struct *vma);


To implement mmap, the driver needs to build suitable page tables for the address range and, if necessary, replace vma->vm_ops with a new set of operations.

There are two ways of building the page tables: all at once with the remap_pfn_range function, or one page at a time with the nopage VMA method.


Using remap_pfn_range
remap_pfn_range and io_remap_page_range build the page tables that map a range of physical addresses.
Prototypes:
/* pfn refers to actual system RAM */
int remap_pfn_range (struct vm_area_struct *vma, unsigned long virt_addr, unsigned long pfn, unsigned long size, pgprot_t prot);
/* phys_addr refers to I/O memory */
int io_remap_page_range (struct vm_area_struct *vma, unsigned long virt_addr, unsigned long phys_addr, unsigned long size, pgprot_t prot);


Using nopage to map memory (returning page structure pointers)
When an application wants to change the addresses a mapped region is bound to, it uses the mremap system call; this is where nopage mappings are most useful.


If the VMA shrinks, the kernel quietly flushes the unneeded pages without notifying the driver.
If the VMA grows, the driver eventually finds out when nopage is called to set up mappings for the new pages.
Therefore, if you want to support the mremap system call, you must implement the nopage method.

The nopage function prototype:
struct page* (*nopage) (struct vm_area_struct *vma, unsigned long address, int *type);
The associated nopage function is called when the user accesses a page in the VMA that is not present in memory.
The address argument contains the virtual address that caused the fault.


The nopage function must locate and return a pointer to the page structure for the page the user needs.


It must also call the get_page macro to increment the usage count of the page it returns:

get_page (struct page *pageptr);
The kernel maintains this count for every page; when the count drops to 0, the kernel puts the page on the free list. When a VMA is unmapped, the kernel decrements the usage count of every page in the area.
In a device driver, the value stored in *type should always be VM_FAULT_MINOR.


Normally, the nopage method returns a pointer to a page structure.


PCI memory, however, is mapped above the top of system memory, so there are no entries for those addresses in the system memory map, and no page structure pointer can be returned.

In such cases, you must use remap_pfn_range instead.

If the nopage method is left NULL, the kernel code that handles page faults maps the zero page to the offending virtual address.
The zero page is a copy-on-write page that reads as zero; it is used, for example, to map the BSS segment.
If a process calls mremap to extend a mapped region and the driver has not implemented nopage, the process ends up with a region full of zero-filled memory instead of a segmentation fault.
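Putting these rules together, a minimal nopage method might look like the sketch below. It is a hedged illustration only: simple_vma_to_page is a hypothetical stand-in for however the driver finds the page backing a given offset, and the code cannot run outside a kernel module.

```c
/* Hypothetical nopage method: one page table entry per fault. */
static struct page *simple_nopage(struct vm_area_struct *vma,
                                  unsigned long address, int *type)
{
    /* Offset of the faulting address within the backing object. */
    unsigned long offset = address - vma->vm_start
                           + (vma->vm_pgoff << PAGE_SHIFT);
    struct page *page = simple_vma_to_page(vma, offset); /* hypothetical lookup */

    if (!page)
        return NOPAGE_SIGBUS;    /* fault lands outside the backing object */

    get_page(page);              /* the kernel drops the count at unmap time */
    if (type)
        *type = VM_FAULT_MINOR;  /* always minor in a device driver */
    return page;
}
```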




Remapping a specific I/O region
Rather than mapping the whole address range, a typical driver maps only the small range of addresses that applies to its peripheral device.



One limitation of remap_pfn_range is that it can access only reserved pages and physical addresses above the top of physical memory.



In Linux, a page of physical memory is marked as "reserved" to indicate that it is not available for memory management.

Reserved pages are locked in memory and are the only pages that can be safely mapped to user space.

This limitation is a basic requirement for system stability.
remap_pfn_range will not allow you to remap conventional addresses, including those obtained by calling get_free_page.


It can, however, remap high PCI buffers and ISA memory.


Remapping RAM with the nopage method
The way to map real RAM to user space is to use vm_ops->nopage to handle page faults one at a time.

(Remapping kernel virtual addresses:)
A true kernel virtual address is an address returned by a function such as vmalloc, that is, a virtual address mapped in the kernel page tables.




Performing direct I/O
For block devices and network devices, the higher-level code in the kernel sets up and makes use of direct I/O; driver-level code need not even know that direct I/O is being performed.



In the 2.6 kernel, the key to implementing direct I/O is a function called get_user_pages, defined in <linux/mm.h> with the following prototype:
int get_user_pages (struct task_struct *tsk, struct mm_struct *mm, unsigned long start, int len, int write, int force, struct page **pages, struct vm_area_struct **vmas);
tsk: a pointer to the task performing the I/O;
mm: a pointer to the memory-management structure describing the address space to be mapped;
start: the address of the user-space buffer;
len: the length of the buffer, in pages;
write: nonzero if write access to the mapped pages is needed;
force: tells get_user_pages to override the protections on the given pages;
pages (output): on success, an array of pointers to the page structures describing the user-space buffer;
vmas (output): pointers to the corresponding VMAs.



get_user_pages is a low-level memory-management function with a suitably complex interface.

The mmap reader/writer semaphore must be taken in read mode before it is called,
as in:
down_read (&current->mm->mmap_sem);
result = get_user_pages (current, current->mm, ...);
up_read (&current->mm->mmap_sem);
The return value is the number of pages actually mapped.
On success, the caller has an array of pages pointing to the user-space buffer, and those pages are locked into memory.

To operate on the buffer directly, kernel-space code must turn each page structure pointer into a kernel virtual address with kmap or kmap_atomic. However, devices that perform direct I/O usually use DMA operations, so the driver instead builds a scatter/gather list from the array of page structure pointers.
Once the direct I/O operation is complete, the user pages must be released.
If the contents of those pages were changed, the kernel must be told before they are released.

Each changed page must be marked dirty with the following function:
void SetPageDirty (struct page *page);
(User-space memory is normally not marked reserved.)
Whether or not the pages were changed, they must then be released from the page cache:
void page_cache_release (struct page *page);
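The cleanup path after a direct I/O operation, following the rules above, might look like this sketch. It is a hedged illustration for a 2.6-era kernel module, not runnable on its own: pages and nr_pages come from an earlier get_user_pages call, and the write flag is the driver's own record of the transfer direction.

```c
/* Hypothetical cleanup after direct I/O on a user buffer. */
static void release_user_pages(struct page **pages, int nr_pages, int write)
{
    int i;

    for (i = 0; i < nr_pages; i++) {
        /* If the device wrote into the buffer, mark the page dirty so
         * the kernel writes it back to permanent storage; user-space
         * pages are normally not reserved, but check to be safe. */
        if (write && !PageReserved(pages[i]))
            SetPageDirty(pages[i]);
        /* Release the reference taken by get_user_pages. */
        page_cache_release(pages[i]);
    }
}
```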


Asynchronous I/O
Asynchronous I/O allows user space to initiate operations without waiting for them to complete.


Block and network drivers are fully asynchronous at all times;
only character drivers need to indicate explicitly that they support asynchronous I/O.


Implementations of asynchronous I/O always involve direct I/O operations.
There are three file_operations methods for implementing asynchronous I/O:
ssize_t (*aio_read) (struct kiocb *iocb, char *buffer, size_t count, loff_t offset);
ssize_t (*aio_write) (struct kiocb *iocb, char *buffer, size_t count, loff_t offset);
int (*aio_fsync) (struct kiocb *iocb, int datasync);
The aio_fsync operation is of interest only to filesystems.


The purpose of aio_read and aio_write is to initiate the read or write operation; when these functions return, the operation may or may not yet be complete.



If you support asynchronous I/O, you must be aware that the kernel can sometimes create "synchronous IOCBs".


The synchronous flag is carried in the iocb, and the driver should query it with:
int is_sync_kiocb (struct kiocb *iocb);
If this function returns nonzero, the driver must execute the operation synchronously.

Otherwise, the driver must notify the kernel when the operation completes:
int aio_complete (struct kiocb *iocb, long res, long res2);
Once aio_complete has been called, the iocb and the user buffer may no longer be accessed.
