From: http://lwn.net/Articles/447405/
The problem of allocating large chunks of physically-contiguous memory has often been discussed in these pages. Virtual memory, by its nature, tends to scatter pages throughout the system; the kernel does not have to be running for long before free pages which happen to be next to each other become scarce. For years, the way kernel developers have dealt with this problem has been to avoid dependencies on large contiguous allocations whenever possible. Kernel code which tries to allocate more than two physically-contiguous pages is rare.
Recently the need for large contiguous allocations has been growing. One source of demand is huge pages, and the transparent huge pages feature in particular. Another is an old story with a new twist: hardware which is unable to perform scatter/gather DMA. Any device which can only do DMA to a physically contiguous area requires (in the absence of an I/O memory management unit) a physically contiguous buffer to work with. This requirement is often a sign of relatively low-end (stupid) hardware; one could hope that such hardware would become scarce over time. What we are seeing, though, are devices which have managed to gain capabilities while retaining the contiguous DMA requirement. For example, there are video capture engines which can grab full high-definition data, perform a number of transformations on it, but still need a contiguous buffer for the result. The advent of high-definition video has aggravated the problem; those physically-contiguous buffers are now quite a bit bigger and harder to allocate than they were before.
Almost one year ago, LWN looked at the Contiguous Memory Allocator (CMA) patches which were meant to be an answer to this problem. This patch set followed the venerable tradition of reserving a chunk of memory at boot time for the sole purpose of satisfying large allocation requests. Over the years, this technique has been used by the "bigphysarea" patch, or simply by booting the kernel with a mem= parameter that left a range of physical memory unused. The out-of-tree Android pmem driver also allocates memory chunks from a reserved range. This approach certainly works; nearly 20 years of experience verify that. The downside is that the reserved memory is not available for any other use; if the device is not using the memory, it simply sits idle. That kind of waste tends to be unpopular with kernel developers and users alike.
For that reason and more, the CMA patches were never merged. The problem has not gone away, though, and neither have the developers who are working on it. The latest version of the CMA patch set looks quite a bit different; while some issues still need to be resolved, this patch set looks like it may have a much better chance of getting into the mainline.
The CMA allocator can still work with a reserved region of memory, but that is clearly not the intended mode of operation. Instead, the new CMA tries to maintain regions of memory where contiguous chunks can be created when the need arises. To that end, CMA relies on the "migration type" mechanism built deeply into the memory management code.
Within each zone, blocks of pages are marked as being for use by pages which are (or are not) movable or reclaimable. Movable pages are, primarily, page cache or anonymous memory pages; they are accessed via page tables and the page cache radix tree. The contents of such pages can be moved somewhere else as long as the tables and tree are updated accordingly. Reclaimable pages, instead, might possibly be given back to the kernel on demand; they hold data structures like the inode cache. Unmovable pages are usually those for which the kernel has direct pointers; memory obtained from kmalloc() cannot normally be moved without breaking things, for example.
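To make the distinction a little more concrete, here is a much-simplified sketch of how an allocation request ends up in one of these categories. It is illustrative only, not the kernel's actual code; the real logic is driven by the __GFP_MOVABLE and __GFP_RECLAIMABLE bits in an allocation's GFP flags, and the names below are invented for the example.

    /*
     * Illustrative sketch only; the kernel derives the migrate type
     * from the __GFP_MOVABLE and __GFP_RECLAIMABLE flags.  The names
     * here are made up for the example.
     */
    #define SKETCH_GFP_MOVABLE      0x01    /* page can be migrated later  */
    #define SKETCH_GFP_RECLAIMABLE  0x02    /* page can be freed on demand */

    enum sketch_migratetype {
        SKETCH_UNMOVABLE,       /* e.g. kmalloc() memory                */
        SKETCH_RECLAIMABLE,     /* e.g. inode and dentry caches         */
        SKETCH_MOVABLE,         /* page cache and anonymous pages       */
    };

    static enum sketch_migratetype sketch_gfp_to_migratetype(unsigned int flags)
    {
        if (flags & SKETCH_GFP_MOVABLE)
            return SKETCH_MOVABLE;
        if (flags & SKETCH_GFP_RECLAIMABLE)
            return SKETCH_RECLAIMABLE;
        return SKETCH_UNMOVABLE;
    }

The point is simply that the caller declares, at allocation time, whether the memory can later be moved or reclaimed; the allocator uses that hint to group like pages together.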
The memory management subsystem tries to keep movable pages together. If the goal is to free a larger chunk by moving pages out of the way, it only takes a single nailed-down page to ruin the whole effort. By grouping movable pages, the kernel hopes to be able to free larger ranges on demand without running into such snags. The memory compaction code relies on these ranges of movable pages to be able to do its job.
CMA extends this mechanism by adding a new "CMA" migration type; it works much like the "movable" type, but with a couple of differences. The "CMA" type is sticky; pages which are marked as being for CMA should never have their migration type changed by the kernel. The memory allocator will never allocate unmovable pages from a CMA area, and, for any other use, it only allocates CMA pages when alternatives are not available. So, with luck, the areas of memory which are marked for use by CMA should contain only movable pages, and they should have a relatively high number of free pages.
In other words, memory which is marked for use by CMA remains available to the rest of the system with the one restriction that it can only contain movable pages. When a driver comes along with a need for a contiguous range of memory, the CMA allocator can go to one of its special ranges and try to shove enough pages out of the way to create a contiguous buffer of the appropriate size. If the pages contained in that area are truly movable (the real world can get in the way sometimes), it should be possible to give that driver the buffer it needs. When that buffer is not needed, though, the memory can be used for other purposes.
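The allocation-side policy described above can be summarized in a short, purely illustrative sketch; the patch set itself implements this by teaching the buddy allocator's fallback logic about the new migration type, and nothing below is its actual code.

    /*
     * Purely illustrative: the rules a CMA-aware page allocator is
     * meant to follow for its special page blocks, written out as a
     * predicate rather than taken from the patch set.
     */
    static int may_allocate_from_cma_block(int allocation_is_movable,
                                           int normal_pages_available)
    {
        /*
         * Unmovable (or reclaimable) data must never land in a CMA
         * area: a single pinned page there could block a future
         * contiguous allocation.
         */
        if (!allocation_is_movable)
            return 0;
        /*
         * Movable data may use CMA pages, but only when nothing else
         * is available, so CMA areas stay as empty as possible.
         */
        return !normal_pages_available;
    }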
One might wonder why this mechanism is needed, given that memory compaction can already create large physically-contiguous chunks for transparent huge pages. That code works: your editor's system, as of this writing, has about 25% of its memory allocated as huge pages. The answer is that DMA buffers present some different requirements than huge pages. They may be larger, for example; transparent huge pages are 2MB on most architectures, while DMA buffers can be 10MB or more. There may be a need to place DMA buffers in specific ranges of memory if the underlying hardware is weird enough; CMA developer Marek Szyprowski seems to have some weird hardware indeed. Finally, a 2MB huge page must also have 2MB alignment, while the alignment requirements for DMA buffers are normally much more relaxed. The CMA allocator can grab just the required amount of memory (without rounding the size up to the next power of two as is done in the buddy allocator) without worrying about overly stringent alignment demands.
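A quick back-of-the-envelope calculation shows why that rounding matters. The short user-space program below (illustrative only, assuming 4KB pages) works out what a 10MB buffer would cost if it had to come from a single power-of-two buddy allocation:

    /*
     * Illustrative user-space arithmetic, assuming 4KB pages: how much
     * memory a power-of-two ("buddy") allocation would need in order
     * to cover a 10MB DMA buffer.
     */
    #include <stdio.h>

    int main(void)
    {
        const unsigned long page_size = 4096;
        const unsigned long want = 10UL * 1024 * 1024;      /* 10MB buffer */
        unsigned long pages = (want + page_size - 1) / page_size;
        unsigned int order = 0;

        while ((1UL << order) < pages)
            order++;                    /* smallest power of two >= pages */

        printf("10MB is %lu pages; a buddy allocation needs order %u "
               "= %lu pages = %lu MB, aligned to that size\n",
               pages, order, 1UL << order,
               (1UL << order) * page_size / (1024 * 1024));
        return 0;
    }

The answer is 2560 pages rounded up to 4096 (order 12): a 16MB block with 16MB alignment to hold a 10MB buffer. CMA, by contrast, can carve exactly 2560 pages out of one of its areas.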
The CMA patch set provides a set of functions for setting up regions of memory and creating "contexts" for specific ranges. Then there are simple cm_alloc() and cm_free() functions for obtaining and releasing buffers. It is expected, though, that device drivers will never invoke CMA directly; instead, awareness of CMA will be built into the DMA support functions. When a driver calls a function like dma_alloc_coherent(), CMA will be used automatically to satisfy the request. In most situations, it should "just work."
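From the driver's point of view, then, nothing new needs to be learned; the ordinary coherent-DMA calls are all that is required. A minimal sketch might look like the following, where the 8MB size and all the example_* names are invented for the illustration; only the dma_alloc_coherent()/dma_free_coherent() calls are the real DMA API.

    /*
     * Driver-side sketch: the driver uses the ordinary DMA API; on a
     * system where CMA is wired into it, the physically-contiguous
     * buffer comes out of a CMA area behind the scenes.  The size and
     * the example_* names are made up for this illustration.
     */
    #include <linux/dma-mapping.h>

    #define EXAMPLE_CAPTURE_BUF_SIZE    (8 * 1024 * 1024)

    static void *example_capture_buf;
    static dma_addr_t example_capture_dma;

    static int example_alloc_capture_buffer(struct device *dev)
    {
        example_capture_buf = dma_alloc_coherent(dev,
                                                 EXAMPLE_CAPTURE_BUF_SIZE,
                                                 &example_capture_dma,
                                                 GFP_KERNEL);
        if (!example_capture_buf)
            return -ENOMEM;
        /*
         * example_capture_dma now holds the bus address of a
         * physically contiguous 8MB buffer for the device.
         */
        return 0;
    }

    static void example_free_capture_buffer(struct device *dev)
    {
        dma_free_coherent(dev, EXAMPLE_CAPTURE_BUF_SIZE,
                          example_capture_buf, example_capture_dma);
    }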
One of the remaining concerns about CMA has to do with how the special regions are set up in the first place. The current scheme expects that some special call will be made in the system's board file; it is a very ARM-like approach. The intent is to get rid of board files, so something else will have to be found. Moving this information to the device tree is not really an option either, as Arnd Bergmann pointed out; it is really a policy decision, not a description of the hardware. Arnd is pushing for some sort of reasonable default setup that works on most systems; quirks for systems with special challenges can be added later.
The end result is that there is likely to be at least one more iteration of this patch set before it gets into the mainline. But CMA addresses a real need which has been met, thus far, with out-of-tree hacks of varying kludginess. This code has the potential to make physically-contiguous allocations far more reliable while minimizing the impact on the rest of the system. It seems worth having.