This article is mostly reproduced and assembled from other sources; I simply think this knowledge is better kept together.
How a process uses system resources
Most processes request memory through glibc, but glibc is itself a user-space library that ultimately calls the operating system's memory-management interfaces. In most cases glibc is transparent to both the user and the operating system, so directly observing the memory usage the operating system records for a process is very helpful. However, glibc's own implementation has its quirks, so in special cases an analysis of a process's memory usage must also take glibc into account. Usage of other operating-system resources can be viewed directly through the proc filesystem.
Types of system resources a process requires: memory
A process needs memory, but it does not necessarily need physical memory. A process can request 1 GB of memory and the kernel will gladly approve it, but the kernel only hands over the corresponding physical memory when the process actually uses it: at that point a page fault occurs, and the kernel's fault-handling code gives the process real physical memory. It is like keeping money in a bank: most of the time your assets are just a number, and only when you withdraw cash does the bank actually have to produce cash to pay you (because it promised to earlier). Normally the bank's cash on hand is far smaller than the total assets of its depositors (asset bubbles work the same way), and when everyone runs on the bank at once, it goes bankrupt. Operating-system memory is the same: when processes demand that the kernel make good on its promises and the kernel cannot fully deliver, the kernel has no choice but to fail.
The kinds of memory a process uses: memory for holding data (heap), memory for executing the process (stack), physical memory shared with other processes (such as shared libraries and shared memory), the size of the process's virtual address space (which depends on whether the system is 32-bit or 64-bit: 2 to that power), the physical memory the process actually uses (RSS), and the memory holding the code the process executes (TRS). In general, the physical memory a process actually uses consists of three parts: data, stack, and executable code; for most processes the data part accounts for most of the memory.
Entry points for analyzing a process
Each process has a parent process, a process group, and a session (in general a process group belongs to a session, and a process belongs to a process group).
Each process has its own thread group (and possibly coroutines).
The number of page faults that occur can be used to diagnose memory-intensive behavior or excessive memory usage.
The time spent running in kernel mode and user mode, together with the task's cumulative wait time, shows how heavily the process depends on system calls (perhaps you need to move the work into the kernel or switch to non-blocking calls).
The scheduling policy, the CPU the process runs on, and the process's priority can be used to allocate resources "unfairly" in the process's favor when system resources are scarce.
The number of pages swapped out and the sizes of the various memory segments in use illustrate how heavily the process uses memory.
The process's capability (cap) sets show whether the process has more permissions than it needs.
I. /proc/PID/statm
/proc/PID/statm reports the memory usage of the process, measured in pages; the values are a snapshot of the process's current state.
/proc/1 # cat statm
550 70 62 451 0 97 0
Output interpretation
The meaning of each field (in the example above) is listed below, together with the corresponding field of /proc/1/status:
Size (pages) = 550: the size of the task's virtual address space (VmSize/4)
Resident (pages) = 70: the size of the physical memory the application is currently using (VmRSS/4)
Shared (pages) = 62: the number of shared pages
Trs (pages) = 451: the size of the program's executable code in virtual memory (VmExe/4)
Lrs (pages) = 0: the size of the libraries mapped into the task's virtual memory space (VmLib/4)
Drs (pages) = 97: the size of the program's data segment and user-mode stack ((VmData + VmStk)/4)
dt (pages) = 0: the number of dirty pages
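As an illustration (my own, not part of the original article), the following minimal C sketch reads /proc/self/statm and prints the seven fields in the order documented above; the page size is queried with sysconf rather than assumed to be 4 KB.

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    long size, resident, shared, trs, lrs, drs, dt;
    FILE *f = fopen("/proc/self/statm", "r");
    if (!f)
        return 1;
    /* the seven fields appear in the order documented above */
    if (fscanf(f, "%ld %ld %ld %ld %ld %ld %ld",
               &size, &resident, &shared, &trs, &lrs, &drs, &dt) != 7) {
        fclose(f);
        return 1;
    }
    fclose(f);
    printf("page size: %ld bytes\n", sysconf(_SC_PAGESIZE));
    printf("size=%ld resident=%ld shared=%ld trs=%ld lrs=%ld drs=%ld dt=%ld (pages)\n",
           size, resident, shared, trs, lrs, drs, dt);
    return 0;
}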
II. /proc/PID/stat
/proc/PID/stat contains the status information of the process; the time-related fields accumulate from the start of the process up to the current moment.
/proc/1 # cat stat
1 (linuxrc) S 0 0 0 0 -1 8388864 50 633 20 4 2 357 72 342 16 0 1 0 22 2252800 70 4294967295 32768 1879936 3199270704 3199269552 1113432 0 0 0 674311 3221479524 0 0 0 0 0 0
Each parameter means:
Parameter interpretation
pid=1: ID of the process (including lightweight processes, i.e. threads)
comm=linuxrc: the name of the application or command
task_state=S: the state of the task. R: running, S: sleeping (TASK_INTERRUPTIBLE), D: disk sleep (TASK_UNINTERRUPTIBLE), T: stopped, t: tracing stop, Z: zombie, X: dead
ppid=0: parent process ID
pgid=0: process group ID
sid=0: ID of the session the task belongs to
tty_nr=0: the device number of the task's controlling tty; int(tty_nr/256) is the major device number and the remainder is the minor device number
tty_pgrp=-1: the process group ID of the terminal's foreground process group, i.e. the PID of the foreground task (such as a shell) currently running on the task's terminal
task->flags=8388864: process flag bits, from which the task's attributes can be read
min_flt=50: the number of minor page faults, i.e. faults that did not require copying data in from disk
cmin_flt=633: the cumulative number of minor page faults of all waited-for children of this task
maj_flt=20: the number of major page faults, i.e. faults that required copying data in from disk
cmaj_flt=4: the cumulative number of major page faults of all waited-for children of this task
When a process takes a page fault, it traps into kernel mode and performs the following steps:
1. Check whether the virtual address being accessed is legal
2. Find or allocate a physical page
3. Fill the physical page (read it from disk, zero it directly, or do nothing)
4. Establish the mapping (virtual address to physical address)
5. Re-execute the instruction that caused the page fault
If step 3 needs to read the disk, the fault counts as a majflt; otherwise it is a minflt.
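To watch minor faults being generated by first-touch accesses, a sketch along the following lines (my own illustration, not from the article) can be used; it compares the ru_minflt/ru_majflt counters returned by getrusage() before and after writing to freshly malloc'd memory.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>

int main(void)
{
    struct rusage before, after;
    size_t len = 8 * 1024 * 1024;           /* 8 MB */

    getrusage(RUSAGE_SELF, &before);
    char *p = malloc(len);                  /* only virtual memory so far */
    if (!p)
        return 1;
    memset(p, 1, len);                      /* first touch: the page faults happen here */
    getrusage(RUSAGE_SELF, &after);

    printf("minor faults: %ld, major faults: %ld\n",
           after.ru_minflt - before.ru_minflt,
           after.ru_majflt - before.ru_majflt);
    free(p);
    return 0;
}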
utime=2: the time the task has run in user mode, in jiffies
stime=357: the time the task has run in kernel mode, in jiffies
cutime=72: the cumulative user-mode run time of all waited-for children of this task, in jiffies
cstime=342: the cumulative kernel-mode run time of all waited-for children of this task, in jiffies
priority=16: the dynamic priority of the task
nice=0: the static priority (nice value) of the task
num_threads=1: the number of threads in the task's thread group
it_real_value=0: the time in jiffies before the next SIGALRM is sent to the process by its interval timer
start_time=22: the time at which the task started, in jiffies
vsize=2252800: the size of the task's virtual address space, in bytes
rss=70: the number of pages of physical memory the task currently resides in; these pages may hold code, data, and stack
rlim=4294967295 (0xffffffff): the maximum amount of physical memory the task may occupy, in bytes
start_code=32768 (0x8000): the start address of the task's code segment in its virtual address space (determined by the linker)
end_code=1879936: the end address of the task's code segment in its virtual address space
start_stack=3199270704 (0xbeb0ff30): the start address of the task's stack in its virtual address space
kstkesp=3199269552: the current value of ESP (the 32-bit stack pointer), as recorded in the process's kernel stack page
kstkeip=1113432 (0x10fd58): the current value of EIP (the 32-bit instruction pointer), pointing to the instruction to be executed
pendingsig=0: bitmap of pending signals, recording ordinary signals sent to the process
block_sig=0: bitmap of blocked signals
sigign=0: bitmap of ignored signals
sigcatch=674311: bitmap of caught signals
wchan=3221479524: if the process is sleeping, this is the address of the kernel function in which it is waiting
nswap=0: the number of pages swapped out
cnswap=0: the cumulative number of pages swapped out by all child processes
exit_signal=0: the signal sent to the parent process when this process exits
task_cpu=0: the CPU the task last ran on
task_rt_priority=0: the real-time priority (for real-time processes)
task_policy=0: the scheduling policy of the process; 0 = non-real-time process, 1 = FIFO real-time process, 2 = RR real-time process
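For completeness, here is a small sketch of mine (not the article's code) that pulls the first fifteen fields out of /proc/self/stat, in the order documented above, and prints the fault counters and the user/kernel times. Note that a comm value containing spaces would need more careful parsing than this.

#include <stdio.h>

int main(void)
{
    char comm[64], state;
    int pid, ppid, pgrp, session, tty_nr, tpgid;
    unsigned long flags, minflt, cminflt, majflt, cmajflt, utime, stime;

    FILE *f = fopen("/proc/self/stat", "r");
    if (!f)
        return 1;
    /* field order: pid comm state ppid pgrp session tty_nr tpgid flags
       minflt cminflt majflt cmajflt utime stime ... */
    if (fscanf(f, "%d %63s %c %d %d %d %d %d %lu %lu %lu %lu %lu %lu %lu",
               &pid, comm, &state, &ppid, &pgrp, &session, &tty_nr, &tpgid,
               &flags, &minflt, &cminflt, &majflt, &cmajflt, &utime, &stime) != 15) {
        fclose(f);
        return 1;
    }
    fclose(f);

    printf("pid=%d comm=%s state=%c min_flt=%lu maj_flt=%lu utime=%lu stime=%lu\n",
           pid, comm, state, minflt, majflt, utime, stime);
    return 0;
}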
III. /proc/PID/status
/proc/PID/status presents the process's status information in a human-readable form; most of the values are a snapshot of the process's current state.
/proc/286 # cat status
Name:   mmtest
State:  R (running)
SleepAVG:       0%
Tgid:   286
Pid:    286
PPid:   243
TracerPid:      0
Uid:    0       0       0       0
Gid:    0       0       0       0
FDSize: 32
Groups:
VmPeak:     1464 kB
VmSize:     1464 kB
VmLck:         0 kB
VmHWM:       344 kB
VmRSS:       344 kB
VmData:       20 kB
VmStk:        84 kB
VmExe:         4 kB
VmLib:      1300 kB
VmPTE:         6 kB
Threads:        1
SigQ:   0/256
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000000
SigCgt: 0000000000000000
CapInh: 0000000000000000
CapPrm: 00000000fffffeff
CapEff: 00000000fffffeff
Output interpretation
Parameter interpretation
Name: the name of the application or command
State: the state of the task: running / sleeping / zombie, etc.
SleepAVG: the average sleep ratio of the task; interactive tasks sleep often and for long periods, so their sleep_avg is correspondingly larger and the priority computed for them is correspondingly higher
Tgid=286: thread group ID
Pid=286: task ID
PPid=243: parent process ID
TracerPid=0: the PID of the process tracing this process (0 if it is not being traced)
Uid: uid, euid, suid, fsuid
Gid: gid, egid, sgid, fsgid
FDSize=32: the number of file descriptor slots currently allocated (files->fds)
Groups: supplementary group list
VmPeak: the peak size of the process's virtual address space
VmHWM: the peak resident set size, i.e. the high-water mark of the physical memory in use (file-backed plus anonymous mappings)
VmSize (kB): the size of the task's virtual address space (total_vm - reserved_vm), where total_vm is the size of the process address space and reserved_vm is the number of physical pages reserved for or otherwise special to the process
VmLck (kB) = 0: the size of the physical memory the task has locked; locked memory cannot be swapped out to disk (locked_vm)
VmRSS (kB) = 344: the size of the physical memory the application is currently using; this is the RSS value reported by the ps command
VmData (kB) = 20: the size of the program's data segment (virtual memory) holding its data (total_vm - shared_vm - stack_vm)
VmStk (kB) = 84: the size of the task's user-mode stack (stack_vm)
VmExe (kB) = 4: the size of the executable virtual memory owned by the program, i.e. its code segment, excluding the libraries used by the task (end_code - start_code)
VmLib (kB) = 1300: the size of the libraries mapped into the task's virtual memory space (exec_lib)
VmPTE = 6 kB: the size of all the process's page tables, in kB
Threads = 1: the number of tasks sharing this signal descriptor; in a POSIX multi-threaded application, all threads in the thread group use the same signal descriptor
SigQ: the number of signals queued for processing, and the limit of that queue
SigPnd: bitmap of signals pending for this thread
ShdPnd: bitmap of signals pending for the whole thread group (shared pending)
SigBlk: bitmap of blocked signals
SigIgn: bitmap of ignored signals
SigCgt: bitmap of caught signals
CapInh: inheritable capabilities, i.e. the capabilities that can be inherited across programs executed by the current process
CapPrm: permitted capabilities, the capabilities the process is allowed to use; this may include capabilities not in CapEff, which the process has temporarily dropped. CapEff is a subset of CapPrm, and dropping unneeded capabilities improves security
CapEff: effective capabilities, the capabilities currently in effect for the process
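A simple way to watch the memory-related fields for the current process is to dump every line of /proc/self/status that starts with "Vm"; the following sketch (my own illustration) does just that.

#include <stdio.h>
#include <string.h>

int main(void)
{
    char line[256];
    FILE *f = fopen("/proc/self/status", "r");
    if (!f)
        return 1;
    while (fgets(line, sizeof(line), f)) {
        /* print the memory-related fields: VmPeak, VmSize, VmRSS, VmData, ... */
        if (strncmp(line, "Vm", 2) == 0)
            fputs(line, stdout);
    }
    fclose(f);
    return 0;
}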
IV. /proc/loadavg
This file gives load information aggregated over all CPUs; per-CPU information is not available here.
/proc # cat loadavg
1.0 1.00 0.93 2/19 301
The meaning of each value is:
Parameter interpretation
lavg_1 (1.0): the 1-minute load average
lavg_5 (1.00): the 5-minute load average
lavg_15 (0.93): the 15-minute load average
nr_running (2): the number of runnable tasks in the run queue at sampling time, corresponding to procs_running in /proc/stat
nr_threads (19): the total number of tasks in the system at sampling time (excluding tasks that have already exited)
last_pid (301): the PID most recently assigned, counting lightweight processes, i.e. threads
For example, if the 1-minute load average were 4.61 on a machine with two CPUs, the average load per CPU would be 4.61/2 ≈ 2.3.
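For reference, here is a minimal sketch (mine, not the article's) that parses /proc/loadavg into the five fields described above.

#include <stdio.h>

int main(void)
{
    double lavg1, lavg5, lavg15;
    int nr_running, nr_threads, last_pid;

    FILE *f = fopen("/proc/loadavg", "r");
    if (!f)
        return 1;
    /* format: "lavg_1 lavg_5 lavg_15 nr_running/nr_threads last_pid" */
    if (fscanf(f, "%lf %lf %lf %d/%d %d",
               &lavg1, &lavg5, &lavg15,
               &nr_running, &nr_threads, &last_pid) != 6) {
        fclose(f);
        return 1;
    }
    fclose(f);
    printf("load: %.2f %.2f %.2f, running/total: %d/%d, last pid: %d\n",
           lavg1, lavg5, lavg15, nr_running, nr_threads, last_pid);
    return 0;
}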
V. /proc/286/smaps
This file reflects the size of each linear memory region (VMA) of the process.
/proc/286 # cat smaps
00008000-00009000 r-xp 00000000 00:0c 1695459    /memtest/mmtest
Size:             4 kB
Rss:              4 kB
Shared_Clean:     0 kB
Shared_Dirty:     0 kB
Private_Clean:    4 kB
Private_Dirty:    0 kB
00010000-00011000 rw-p 00000000 00:0c 1695459    /memtest/mmtest
Size:             4 kB
Rss:              4 kB
Shared_Clean:     0 kB
Shared_Dirty:     0 kB
Private_Clean:    0 kB
Private_Dirty:    4 kB
00011000-00012000 rwxp 00011000 00:00 0          [heap]
Size:             4 kB
Rss:              0 kB
Shared_Clean:     0 kB
Shared_Dirty:     0 kB
Private_Clean:    0 kB
Private_Dirty:    0 kB
40000000-40019000 r-xp 00000000 00:0c 2413396    /lib/ld-2.3.2.so
Size:           100 kB
Rss:             96 kB
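To get a per-process total out of these per-mapping figures, one can simply sum the Rss: lines of smaps. The sketch below (my own, assuming the field layout shown above) totals Rss for the current process.

#include <stdio.h>

int main(void)
{
    char line[512];
    long kb, total_rss = 0;

    FILE *f = fopen("/proc/self/smaps", "r");
    if (!f)
        return 1;
    while (fgets(line, sizeof(line), f)) {
        /* each mapping has an "Rss: <n> kB" line; add them all up */
        if (sscanf(line, "Rss: %ld", &kb) == 1)
            total_rss += kb;
    }
    fclose(f);
    printf("total RSS: %ld kB\n", total_rss);
    return 0;
}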
How user-space processes use the kernel's memory-management interfaces
From the operating system's perspective, a process allocates memory in two ways, corresponding to two system calls: brk and mmap (leaving shared memory aside).
1. brk moves _edata, the pointer to the highest address of the data segment (.data), towards higher addresses (see the glibc section below).
2. mmap finds a free piece of virtual memory in the process's virtual address space (between the heap and the stack, in the region used for file mappings).
Both methods allocate only virtual memory; no physical memory is allocated. On the first access to the allocated virtual address range a page fault occurs, the operating system allocates physical memory, and the mapping between virtual memory and physical memory is established.
The standard C library provides the malloc/free functions for allocating and releasing memory; they are implemented on top of the brk, mmap, and munmap system calls.
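The two system calls can also be exercised directly. The sketch below (my illustration, not the article's code) grows the heap with sbrk() and maps an anonymous region with mmap(); in both cases only virtual memory is reserved until the pages are first written.

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void)
{
    /* 1. brk/sbrk: push the end of the data segment up by 30 KB */
    void *old_brk = sbrk(0);
    if (sbrk(30 * 1024) == (void *)-1)
        return 1;
    printf("heap grew from %p to %p\n", old_brk, sbrk(0));

    /* 2. mmap: map 200 KB of anonymous memory between heap and stack */
    size_t len = 200 * 1024;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return 1;

    /* physical pages are only assigned when the memory is touched */
    memset(p, 0, len);

    munmap(p, len);
    return 0;
}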
glibc's memory management and the predefined memory regions of a process
http://techpubs.sgi.com/library/tpl/cgi-bin/getdoc.cgi?coll=0650&db=man&raw=1&fname=/usr/share/catman/p_man/cat3/standard/_rt_symbol_table_size.z
How glibc allocates memory
The following example illustrates the principle of memory allocation. By default glibc uses ptmalloc; later allocators such as jemalloc were developed as improvements on ptmalloc.
Principle
When malloc requests less than 128 KB of memory, brk is used to allocate it, pushing _edata towards higher addresses. (Only virtual address space is allocated, with no corresponding physical memory, so the memory is not initialized; on the first read or write of the data a page fault occurs, the kernel allocates the corresponding physical memory, and the mapping from the virtual address space is established.) For example:
1. When the process starts, the initial layout of its (virtual) memory space is as shown in figure 1.
Memory-mapped files (for example libc-2.2.93.so and other data files) live between the heap and the stack, but are omitted here for simplicity. The _edata pointer (defined inside glibc) points to the highest address of the data segment.
2. After the process calls A = malloc(30K), the memory space is as shown in figure 2:
malloc invokes the brk system call and pushes the _edata pointer 30K towards higher addresses, completing the virtual memory allocation.
You may ask: is simply moving _edata up by 30K enough to complete the memory allocation?
In fact, moving _edata up by 30K only completes the allocation of virtual addresses; A's memory still has no physical pages behind it. Only when the process first reads or writes A's memory does a page fault occur, at which point the kernel allocates the physical pages corresponding to A. In other words, if malloc allocates a block that is never accessed, no corresponding physical pages are ever allocated.
3. After the process calls B = malloc(40K), the memory space is as shown in figure 3.
In the second case, when malloc requests more than 128 KB of memory, mmap is used: a free region is found between the heap and the stack (this memory is independent of the heap and is zero-initialized). For example:
4. After the process calls C = malloc(200K), the memory space is as shown in figure 4:
By default, if the requested size is greater than 128 KB (adjustable via the M_MMAP_THRESHOLD option), malloc does not push the _edata pointer; instead it uses the mmap system call to allocate a piece of virtual memory between the heap and the stack (a small demonstration follows this walkthrough).
This is done mainly because:
memory allocated with brk can only be returned to the kernel after the memory above it (at higher addresses) has been freed (for example, A cannot be returned before B is freed; this is how memory fragmentation arises, and when trimming happens is described below), whereas memory allocated with mmap can be released on its own. There are of course other advantages and disadvantages; interested readers can look at the malloc code inside glibc.
5. After the process calls D = malloc(100K), the memory space is as shown in figure 5.
6. After the process calls free(C), C's virtual memory and physical memory are released together.
7. After the process calls free(B), as shown in figure 7, B's virtual memory and physical memory are not released, because there is only one _edata pointer: if it were pushed back down, what would happen to D's memory? Of course B's memory can be reused; if another 40K request arrives at this point, malloc will very likely hand B's memory back out.
8. After the process calls free(D), as shown in figure 8, B and D are joined together into a single 140K piece of free memory.
9. By default, a memory trim operation is performed when the free memory at the top of the heap (the highest addresses) exceeds 128 KB (adjustable via the M_TRIM_THRESHOLD option). During the free in the previous step, more than 128 KB of free memory was found at the top of the heap, so the heap was trimmed as shown in figure 9.
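The threshold behaviour described above can be observed with a toy program such as the following (a sketch of mine, not the article's test code): a 30 KB request normally stays on the brk heap, while a 200 KB request exceeds the default M_MMAP_THRESHOLD of 128 KB and is usually served by mmap, which is visible from the addresses returned.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    void *heap_end = sbrk(0);

    char *a = malloc(30 * 1024);    /* below 128 KB: usually served from the brk heap */
    char *c = malloc(200 * 1024);   /* above 128 KB: usually served by mmap */

    printf("current brk    : %p\n", heap_end);
    printf("a (30 KB)  at  : %p\n", (void *)a);
    printf("c (200 KB) at  : %p\n", (void *)c);

    /* freeing the mmap'd block returns it to the kernel immediately;
       freeing the brk block only marks it reusable inside malloc */
    free(c);
    free(a);
    return 0;
}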
Experiment
Having understood the principle of memory allocation, let's look at a phenomenon:
1. During a stress test, the performance of the system under test is found to be unsatisfactory. Concretely: the process consumes about 20% CPU in kernel (system) mode and about 10% in user mode, while the system is about 70% idle.
2. Using the command ps -o majflt,minflt -C program, we see that majflt grows by 0 per second while minflt grows by more than 10000 per second.
Preliminary analysis
majflt stands for major fault, and minflt stands for minor fault. These two values count the page faults that have occurred since the process started. When a process takes a page fault, it traps into kernel mode and performs the following steps:
- Check whether the virtual address being accessed is legal
- Find or allocate a physical page
- Fill the physical page (read it from disk, zero it directly, or do nothing)
- Establish the mapping (virtual address to physical address)
- Re-execute the instruction that caused the page fault
If the third step needs to read the disk, the fault counts as a majflt; otherwise it is a minflt.
With minflt so high for this process, more than 10,000 times per second, one has to suspect that it is closely related to the process's kernel-mode CPU consumption.
Analyzing the code
Reading the code, we find this: each request uses malloc to allocate 2 MB of memory and frees it when the request ends. Looking at the logs, the allocation statement takes about 10 us, while the average processing time of one request is about 1000 us. So the cause has been found!
Although the allocation statement accounts for only a tiny fraction of the request's processing time, it is precisely this statement that severely hurts performance. To explain why, one first needs to understand the principles of memory allocation described above.
The truth
With the memory allocation principle explained, the reason the tested module's kernel-mode CPU consumption is so high becomes clear: every request mallocs a 2 MB block, and by default malloc serves such a request by calling mmap; when the request ends, munmap is called to free the memory. Suppose each request touches 6 physical pages: then each request produces 6 page faults, and under a load of 2000 requests per second that is more than 10,000 page faults per second. These faults do not need to read the disk, so they are minflt; page faults are handled in kernel mode, which is why the process's kernel-mode CPU consumption is so high. The faults are scattered throughout the handling of each request, so the time spent in the allocation statement itself (10 us) is tiny relative to the whole request (1000 us).
Solutions
One option is to change dynamic allocation to static allocation, or to malloc the buffer once per thread at startup and keep it in the thread's data. However, because of the peculiarities of this module, static allocation or allocation at startup is not feasible. In addition, the default stack size limit under Linux is 10 MB, so allocating several MB on the stack carries risk.
The other option is to stop malloc from using mmap to allocate memory, and to disable memory trimming.
At the start of the process, add the following two lines of code:
mallopt(M_MMAP_MAX, 0);         /* stop malloc from allocating memory with mmap */
mallopt(M_TRIM_THRESHOLD, -1);  /* disable memory trimming */
Effect: after adding these two lines, observing with ps under steady load shows that majflt and minflt both stay at 0, and the process's kernel-mode CPU consumption dropped from 20% to 10%.
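Put together, a minimal sketch of the fix (my own reconstruction; the module's real code is not shown in the article) looks like this: the two mallopt() calls at the start of the process, followed by the per-request malloc/free pattern that previously caused the faults.

#include <stdlib.h>
#include <string.h>
#include <malloc.h>

int main(void)
{
    /* at process start: keep malloc on the brk heap and never trim it */
    mallopt(M_MMAP_MAX, 0);         /* stop malloc from using mmap */
    mallopt(M_TRIM_THRESHOLD, -1);  /* never return heap memory to the kernel */

    /* the per-request pattern that previously caused several page faults each time;
       now the 2 MB block is reused from the heap, so the faults happen only once */
    for (int i = 0; i < 1000; i++) {
        char *buf = malloc(2 * 1024 * 1024);
        if (!buf)
            return 1;
        memset(buf, 0, 2 * 1024 * 1024);    /* simulate using the buffer */
        free(buf);
    }
    return 0;
}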
Summary
You can use the command ps -o majflt,minflt -C program to view a process's majflt and minflt values; both are cumulative counters that accumulate from the start of the process. It is worth paying close attention to these two values when stress-testing programs with high performance requirements.
If a process uses mmap to map a large data file into its virtual address space, we need to pay particular attention to majflt, because compared with minflt, majflt is fatal to performance: a random disk read takes on the order of several milliseconds, whereas minflt only affects performance when it happens in very large numbers.
Other memory allocation implementations
The malloc in glibc is not the only memory-management implementation available; there are also Bionic's dlmalloc, Google's tcmalloc, and jemalloc, which is widely regarded as the strongest. The core idea of jemalloc is to divide memory pools into three levels: each thread has its own small memory pool, above that sit some larger memory pools, and at the top are huge memory pools. Tcmalloc, by contrast, manages a set of memory pools, and each thread develops an affinity for one of them. As a result, jemalloc is suited to workloads with a fixed number of threads, while tcmalloc is suited to workloads whose threads change frequently.
From the series: Introduction to Linux Kernel Engineering -- how user-space processes use kernel resources