Introduction to Linux Kernel Engineering: Memory Management (III)

Source: Internet
Author: User

User-side kernel memory parameter tuning: /proc/sys/vm/ (the available parameters depend on the kernel version)

Swap related

swap_token_timeout

This file contains the valid hold time of the swap-out protection token. The Linux VM has a token-based thrashing control mechanism and uses the token to prevent unnecessary page faults in a thrashing situation. The unit of the value is seconds. The value can be useful for tuning thrashing behavior. This tunable was removed in 2.6.20 when the algorithm was improved.

swappiness

swappiness is a parameter that sets the kernel's balance between reclaiming pages from the page cache and swapping out process memory. The default value is 60. If you want the kernel to swap out more process memory and thus cache more file contents, increase the value. If you would like the kernel to swap less, decrease it.

page-cluster

page-cluster controls the number of pages that are written to swap in a single attempt, that is, the swap I/O size. It is a logarithmic value: setting it to zero means "1 page", setting it to 1 means "2 pages", setting it to 2 means "4 pages", and so on. The default value is three (eight pages at a time). There may be some small benefit in tuning this to a different value if your workload is swap-intensive.
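
The logarithmic encoding can be made concrete with a short Python sketch. The function name pages_per_swap_io is hypothetical and exists only for illustration; it does nothing more than raise 2 to the page-cluster value.

```python
def pages_per_swap_io(page_cluster: int) -> int:
    """Return how many pages one swap I/O covers for a given page-cluster value."""
    if page_cluster < 0:
        raise ValueError("page-cluster must be non-negative")
    # page-cluster is a base-2 logarithm: 0 -> 1 page, 1 -> 2 pages, 2 -> 4 pages
    return 1 << page_cluster


# Print the mapping for the values discussed in the text.
for value in range(4):
    print(value, pages_per_swap_io(value))
```

Each increment of page-cluster doubles the swap I/O size, so small changes to the tunable have a large effect.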

File cache related

vfs_cache_pressure

Controls the tendency of the kernel to reclaim the memory that is used for caching directory and inode objects. At the default value of vfs_cache_pressure = 100 the kernel will attempt to reclaim dentries and inodes at a "fair" rate with respect to pagecache and swapcache reclaim. Decreasing vfs_cache_pressure causes the kernel to prefer to retain dentry and inode caches. Increasing vfs_cache_pressure beyond 100 causes the kernel to prefer to reclaim dentries and inodes.

nr_pdflush_threads

The count of currently running pdflush threads. This is a read-only value.

min_free_kbytes

This is used to force the Linux VM to keep a minimum number of kilobytes free. The VM uses this number to compute a pages_min value for each lowmem zone in the system. Each lowmem zone gets a number of reserved free pages in proportion to its size.
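
A rough Python sketch of the proportional split described above may help. It assumes 4 KiB pages and uses made-up zone sizes; the function name split_min_free is hypothetical, and the real kernel computation has additional details that are not modeled here.

```python
from typing import Dict

PAGE_SIZE_KB = 4  # assumption: 4 KiB pages


def split_min_free(min_free_kbytes: int, lowmem_zone_pages: Dict[str, int]) -> Dict[str, int]:
    """Split the global reserve across lowmem zones in proportion to zone size."""
    total_pages = sum(lowmem_zone_pages.values())
    if total_pages == 0:
        raise ValueError("no lowmem pages to distribute the reserve over")
    # Convert the tunable from kilobytes to pages.
    pages_min_total = min_free_kbytes // PAGE_SIZE_KB
    # Each zone receives a share proportional to its size (integer division).
    return {
        zone: pages_min_total * pages // total_pages
        for zone, pages in lowmem_zone_pages.items()
    }


# Illustrative 1 GiB machine: a small DMA zone and a large Normal zone.
zones = {"DMA": 4096, "Normal": 258048}
print(split_min_free(4096, zones))
```

With these illustrative numbers the 1024 reserved pages are divided so that the large Normal zone holds almost all of the reserve and the small DMA zone only a few pages.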

dirty_background_ratio

dirty_background_ratio is the percentage of memory that dirty (modified) pages may occupy before pdflush starts writing them back in the background. Raising this ratio lets dirty pages stay in memory longer.

dirty_expire_centisecs

dirty_expire_centisecs controls how long a dirty page may stay in memory before it is considered old enough that it must be written back. The value is expressed in hundredths of a second.

dirty_ratio

Contains, as a percentage of total system memory, the number of pages at which a process that is generating disk writes will itself start writing out dirty data.
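
To see how dirty_background_ratio and dirty_ratio relate, the following Python sketch converts both percentages into absolute thresholds. It is a simplified model that applies the percentages to total memory as the text describes; the kernel's own accounting of which memory counts is more involved. The function name and the sample ratios are illustrative assumptions, not defaults.

```python
def dirty_thresholds_kb(total_mem_kb: int, dirty_background_ratio: int, dirty_ratio: int):
    """Return (background, foreground) dirty-page thresholds in kilobytes."""
    # Above this amount of dirty data, background writeback is started.
    background_kb = total_mem_kb * dirty_background_ratio // 100
    # Above this amount, the writing process itself starts writing out data.
    foreground_kb = total_mem_kb * dirty_ratio // 100
    return background_kb, foreground_kb


# Illustrative values: 1 GiB of memory, ratios of 10 and 20 percent.
background, foreground = dirty_thresholds_kb(1048576, 10, 20)
print(background, foreground)
```

Keeping dirty_background_ratio below dirty_ratio gives background writeback a chance to run before writers are forced to flush data themselves.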

dirty_writeback_centisecs

dirty_writeback_centisecs is the interval at which the pdflush thread wakes up periodically. Each time it wakes up, pdflush writes modified data back to disk. The value is expressed in hundredths of a second.

drop_caches

Writing to this file causes the kernel to drop clean caches, dentries and inodes from memory, causing that memory to become free. To free pagecache:

echo 1 > /proc/sys/vm/drop_caches

To free dentries and inodes:

echo 2 > /proc/sys/vm/drop_caches

To free pagecache, dentries and inodes:

echo 3 > /proc/sys/vm/drop_caches

As this is a non-destructive operation and dirty objects are not freeable, the user should run "sync" first in order to make sure all cached objects are freed. This tunable was added in 2.6.16.
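
The three values above behave like two bit flags combined. The Python sketch below builds the value to write from the caches you want to free; the constant and function names are hypothetical and chosen only for this illustration, and the script only prints the command rather than running it.

```python
FREE_PAGECACHE = 1            # matches "echo 1": free pagecache
FREE_DENTRIES_AND_INODES = 2  # matches "echo 2": free dentries and inodes


def drop_caches_value(pagecache: bool, dentries_and_inodes: bool) -> int:
    """Combine the requested cache types into the value written to drop_caches."""
    value = 0
    if pagecache:
        value |= FREE_PAGECACHE
    if dentries_and_inodes:
        value |= FREE_DENTRIES_AND_INODES
    if value == 0:
        raise ValueError("select at least one cache type to drop")
    return value


# Print the command for freeing both cache types (it is not executed here).
print("sync; echo %d > /proc/sys/vm/drop_caches" % drop_caches_value(True, True))
```

Writing to the file requires root privileges, so the sketch leaves the actual write to the administrator.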

laptop_mode

In laptop mode the kernel handles I/O more intelligently and tries to keep the disk in a low-power state. Laptop mode groups many I/O operations together and performs them in one go, with a 10-minute inactivity period between bursts of disk I/O, which significantly reduces the number of disk spin-ups. To make such long inactivity periods possible, the kernel completes as many I/O tasks as it can while the disk is active: during one active period it performs a large amount of read-ahead and flushes all dirty buffers.

Memory allocation related

percpu_pagelist_fraction

This is the fraction of pages at most (high mark pcp->high) in each zone that are allocated for each per-CPU page list. The minimum value is 8, which means that no more than 1/8th of the pages in each zone may be allocated to any single per_cpu_pagelist. This entry only changes the value of the hot per-CPU pagelists. The user can specify a number like 100 to allocate 1/100th of each zone to each per-CPU page list. The batch value of each per-CPU pagelist is also updated as a result: it is set to pcp->high / 4, with an upper limit of (PAGE_SHIFT * 8). The initial value is zero. The kernel does not use this value at boot time to set the high water marks for each per-CPU page list.
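
The relationship between the fraction, the high mark and the batch value can be sketched in Python as follows. This is only a model of the rules stated above, assuming 4 KiB pages (PAGE_SHIFT = 12) and an illustrative zone size; the function name pcp_high_and_batch is hypothetical.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages


def pcp_high_and_batch(zone_pages: int, percpu_pagelist_fraction: int):
    """Return (high, batch) for a per-CPU page list, following the rules in the text."""
    if percpu_pagelist_fraction < 8:
        raise ValueError("the minimum value of percpu_pagelist_fraction is 8")
    # Each per-CPU list may hold at most 1/fraction of the zone.
    high = zone_pages // percpu_pagelist_fraction
    # batch is high / 4, capped at PAGE_SHIFT * 8.
    batch = min(high // 4, PAGE_SHIFT * 8)
    return high, batch


# Illustrative zone of 262144 pages (1 GiB with 4 KiB pages), fraction 100.
print(pcp_high_and_batch(262144, 100))
```

For large zones the cap on batch takes effect, so raising the high mark does not raise the batch size without bound.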

overcommit_memory

Controls overcommit of system memory, possibly allowing processes to allocate (but not use) more memory than is actually available.

0 - Heuristic overcommit handling. Obvious overcommits of address space are refused. Used for a typical system. It ensures a seriously wild allocation fails while allowing overcommit to reduce swap usage. Root is allowed to allocate slightly more memory in this mode. This is the default.

1 - Always overcommit. Appropriate for some scientific applications.

2 - Don't overcommit. The total address space commit for the system is not permitted to exceed swap plus a configurable percentage (default is 50) of physical RAM. Depending on the percentage you use, in most situations this means a process will not be killed while attempting to use already-allocated memory but will receive errors on memory allocation as appropriate.

overcommit_ratio

Percentage of physical memory size to include in overcommit calculations.

Memory allocation limit = swapspace + physmem * (overcommit_ratio / 100)

swapspace = total size of all swap areas

physmem = size of physical memory in system
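
The formula can be worked through with a short Python sketch. The function names and the sample sizes are assumptions made for illustration, and the admission check is a simplified view of overcommit_memory mode 2 that ignores any further kernel accounting details.

```python
def commit_limit_kb(swapspace_kb: int, physmem_kb: int, overcommit_ratio: int = 50) -> int:
    """Memory allocation limit = swapspace + physmem * (overcommit_ratio / 100)."""
    return swapspace_kb + physmem_kb * overcommit_ratio // 100


def allocation_allowed(committed_kb: int, request_kb: int, limit_kb: int) -> bool:
    """Simplified mode-2 check: refuse requests that would exceed the limit."""
    return committed_kb + request_kb <= limit_kb


# Illustrative machine: 2 GiB of swap, 8 GiB of RAM, overcommit_ratio = 50.
limit = commit_limit_kb(2 * 1024 * 1024, 8 * 1024 * 1024, 50)
print(limit)
print(allocation_allowed(6000000, 200000, limit))
print(allocation_allowed(6000000, 400000, limit))
```

In this example the limit is 6 GiB, so a request that pushes the committed total past that figure is refused at allocation time instead of triggering a kill later.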

max_map_count

This file contains the maximum number of memory map areas a process may have. Memory map areas are used as a side effect of calling malloc, directly by mmap and mprotect, and also when loading shared libraries. While most applications need less than a thousand maps, certain programs, particularly malloc debuggers, may consume lots of them, e.g., up to one or two maps per allocation. The default value is 65536.

mmap_min_addr

This file indicates the amount of address space which a user process will be restricted from mmapping. Since kernel null dereference bugs could accidentally operate based on the information in the first couple of pages of memory, userspace processes should not be allowed to write to them. By default this value is set to 0 and no protections will be enforced by the security module. Setting this value to something like 64k will allow the vast majority of applications to work correctly and provide defense in depth against potential kernel bugs.

lowmem_reserve_ratio

Ratio of total pages to free pages for each memory zone.

legacy_va_layout

If non-zero, this sysctl disables the new 32-bit mmap layout, and the kernel will use the legacy (2.4) layout for all processes.

Other

block_dump

block_dump enables block I/O debugging when set to a non-zero value. If you want to find out which process caused the disk to spin up (see /proc/sys/vm/laptop_mode), you can gather information by setting this flag. When the flag is set, Linux reports all disk read and write operations that take place, as well as all block dirtyings done to files. This makes it possible to work out why a disk needs to spin up, and can help increase battery life further. The block_dump output is written to the kernel log, and you can retrieve it with "dmesg". When you use block_dump and your kernel logging level also includes kernel debugging messages, you will probably want to turn off klogd, otherwise the block_dump output itself will be logged, causing disk activity that would not normally be there.

hugepages_treat_as_movable

When a non-zero value is written to this tunable, future allocations for the huge page pool will use ZONE_MOVABLE. Despite huge pages being non-movable, we do not introduce additional external fragmentation of note, as huge pages are always the largest contiguous block we care about. Huge pages are not movable, so they are not allocated from ZONE_MOVABLE by default. However, as ZONE_MOVABLE will always have pages that can be migrated or reclaimed, it can be used to satisfy hugepage allocations even when the system has been running a long time. This allows an administrator to resize the hugepage pool at runtime depending on the size of ZONE_MOVABLE.

hugetlb_shm_group

hugetlb_shm_group contains the group ID that is allowed to create SysV shared memory segments using hugetlb pages.

nr_hugepages

nr_hugepages configures the number of hugetlb pages reserved for the system.

numa_zonelist_order

This sysctl is only for NUMA. "Where the memory is allocated from" is controlled by zonelists. In the non-NUMA case, a zonelist for GFP_KERNEL is ordered as follows: ZONE_NORMAL -> ZONE_DMA. This means that a memory allocation request for GFP_KERNEL will get memory from ZONE_DMA only when ZONE_NORMAL is not available. In the NUMA case, you can think of the following two types of order. Assume a 2-node NUMA system; below is the zonelist of Node(0)'s GFP_KERNEL:

(A) Node(0) ZONE_NORMAL -> Node(0) ZONE_DMA -> Node(1) ZONE_NORMAL

(B) Node(0) ZONE_NORMAL -> Node(1) ZONE_NORMAL -> Node(0) ZONE_DMA

Type (A) offers the best locality for processes on Node(0), but ZONE_DMA will be used before ZONE_NORMAL is exhausted. This increases the possibility of out-of-memory (OOM) in ZONE_DMA, because ZONE_DMA tends to be small. Type (B) cannot offer the best locality but is more robust against OOM of the DMA zone.

Type (A) is called "node" order. Type (B) is called "zone" order. "Node order" orders the zonelists by node, then by zone within each node. Specify "[Nn]ode" for node order. "Zone order" orders the zonelists by zone type, then by node within each zone. Specify "[Zz]one" for zone order. Specify "[Dd]efault" to request automatic configuration. Autoconfiguration will select "node" order in the following cases:

(1) if the DMA zone does not exist, or

(2) if the DMA zone comprises greater than 50% of the available memory, or

(3) if any node's DMA zone comprises greater than 60% of its local memory and the amount of local memory is big enough.

Otherwise, "zone" order will be selected. The default order is recommended unless it is causing problems for your system or application.
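
The two orderings can be reproduced with a small Python sketch for the 2-node example above. It is a toy model: the function name build_zonelist is hypothetical, only ZONE_NORMAL and ZONE_DMA are considered, and remote nodes are simply sorted by node number, which is an assumption made for this illustration.

```python
ZONE_PRIORITY = ["ZONE_NORMAL", "ZONE_DMA"]  # preferred zone first, as in the text


def build_zonelist(node_zones, local_node, order):
    """Build a GFP_KERNEL-style fallback list in "node" or "zone" order."""
    # Local node first, then the remaining nodes (assumption: sorted by node id).
    nodes = [local_node] + sorted(n for n in node_zones if n != local_node)
    if order == "node":
        # Order by node, then by zone within each node.
        return [(n, z) for n in nodes for z in ZONE_PRIORITY if z in node_zones[n]]
    if order == "zone":
        # Order by zone type, then by node within each zone.
        return [(n, z) for z in ZONE_PRIORITY for n in nodes if z in node_zones[n]]
    raise ValueError("order must be 'node' or 'zone'")


# Node 0 has both zones, node 1 has only ZONE_NORMAL, as in the example.
layout = {0: {"ZONE_NORMAL", "ZONE_DMA"}, 1: {"ZONE_NORMAL"}}
print(build_zonelist(layout, 0, "node"))
print(build_zonelist(layout, 0, "zone"))
```

The only difference between the two results is where Node(0)'s ZONE_DMA lands: second in node order, last in zone order, which is why zone order protects the small DMA zone at the cost of locality.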

panic_on_oom

This enables or disables the panic-on-out-of-memory feature. If this is set to 1, the kernel panics when out-of-memory happens. If this is set to 0, the kernel will kill some rogue process by calling oom_kill(). Usually, the oom_killer can kill rogue processes and the system will survive. If you want to panic the system rather than kill rogue processes, set this to 1. The default value is 0.

stat_interval

With this tunable you can configure the VM statistics update interval. The default value is 1. This tunable first appeared in the 2.6.22 kernel.

vdso_enabled

When this flag is set, the kernel maps a vDSO page into newly created processes and passes its address down to glibc upon exec(). This feature is enabled by default. The vDSO is a virtual DSO (dynamic shared object) exposed by the kernel at some address in every process's memory. Its purpose is to speed up system calls. The mapping address used to be fixed (0xffffe000), but starting with 2.6.18 it is randomized (besides the security implications, this also helps debuggers).

Related system call APIs

Related kernel APIs

Linux performance tools

Performance monitoring tools

Performance testing tools

Performance optimization tools

Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.
