Linux memory leakage detection

Source: Internet
Author: User

In real projects, memory leaks are among the hardest problems to track down (kernel panics and the like are another). Memory leaks fall into two categories, user space and kernel space, and we will look at each level separately.
Memory leaks in user space are relatively easy to inspect and fix, and there are plenty of methods and tools for locating them. Let's take a look.
1. View memory information
cat /proc/meminfo, free, cat /proc/slabinfo, etc.
2. View process status information
top, ps, cat /proc/<pid>/maps, cat /proc/<pid>/status, ls /proc/<pid>/fd, etc.
When locating a problem, we usually start by checking the state of the running processes with ps in a shell. On an embedded system, ps may display less information.

 
 
  root@hos-machine:~# ps -uaxw
  USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
  root         1  0.0  0.1 119872  3328 ?        Ss   Aug10   0:24 /sbin/init splash
  root         2  0.0  0.0      0     0 ?        S    Aug10   0:00 [kthreadd]
  root         3  0.0  0.0      0     0 ?        S    Aug10   0:44 [ksoftirqd/0]
  root         5  0.0  0.0      0     0 ?        S<   Aug10   0:00 [kworker/0:0H]
  root         7  0.0  0.0      0     0 ?        S    Aug10   3:50 [rcu_sched]
  root         8  0.0  0.0      0     0 ?        S    Aug10   0:00 [rcu_bh]
  root         9  0.0  0.0      0     0 ?        S    Aug10   0:12 [migration/0]
  root        10  0.0  0.0      0     0 ?        S    Aug10   0:01 [watchdog/0]
  root        11  0.0  0.0      0     0 ?        S    Aug10   0:01 [watchdog/1]
  root        12  0.0  0.0      0     0 ?        S    Aug10   0:12 [migration/1]
  root        13  0.0  0.0      0     0 ?        S    Aug10   1:18 [ksoftirqd/1]
  root        15  0.0  0.0      0     0 ?        S<   Aug10   0:00 [kworker/1:0H]
  root        16  0.0  0.0      0     0 ?        S    Aug10   0:01 [watchdog/2]
  root        17  0.0  0.0      0     0 ?        S    Aug10   0:12 [migration/2]
  root        18  0.0  0.0      0     0 ?        S    Aug10   1:19 [ksoftirqd/2]
  root        20  0.0  0.0      0     0 ?        S<   Aug10   0:00 [kworker/2:0H]
  root        21  0.0  0.0      0     0 ?        S    Aug10   0:01 [watchdog/3]
  root        22  0.0  0.0      0     0 ?        S    Aug10   0:13 [migration/3]
  root        23  0.0  0.0      0     0 ?        S    Aug10   0:41 [ksoftirqd/3]
  root        25  0.0  0.0      0     0 ?        S<   Aug10   0:00 [kworker/3:0H]
  root        26  0.0  0.0      0     0 ?        S    Aug10   0:00 [kdevtmpfs]
  root        27  0.0  0.0      0     0 ?        S<   Aug10   0:00 [netns]
  root       329  0.0  0.0      0     0 ?        S<   Aug10   0:00 [ext4-rsv-conver]
  root       339  0.0  0.0      0     0 ?        S<   Aug10   0:05 [kworker/1:1H]
  root       343  0.0  0.0      0     0 ?        S<   Aug10   0:11 [kworker/3:1H]
  root       368  0.0  0.0  39076  1172 ?        Ss   Aug10   0:10 /lib/systemd-journald
  root       373  0.0  0.0      0     0 ?        S    Aug10   0:00 [kauditd]
  root       403  0.0  0.0  45772    48 ?        Ss   Aug10   0:01 /lib/systemd-udevd
  root       444  0.0  0.0      0     0 ?        S<   Aug10   0:09 [kworker/2:1H]
  systemd+   778  0.0  0.0 102384   516 ?        Ssl  Aug10   0:04 /lib/systemd-timesyncd
  root       963  0.0  0.0 191264     8 ?        Ssl  Aug10   0:00 /usr/bin/vmhgfs-fuse -o subtype=vmhgfs-fuse,allow_other /mnt/hgfs
  root       987  9.6  0.0 917024     0 ?        Ssl  Aug10 416:08 /usr/sbin/vmware-vmblock-fuse -o subtype=vmware-vmblock,default_permi
  root      1007  0.2  0.1 162728  3084 ?        Sl   Aug10  10:14 /usr/sbin/vmtoolsd
  root      1036  0.0  0.0  56880   844 ?        S    Aug10   0:00 /usr/lib/vmware-vgauth/VGAuthService -s
  root      1094  0.0  0.0 203216   388 ?        Sl   Aug10   1:48 ./ManagementAgentHost
  root      1100  0.0  0.0  28660   136 ?        Ss   Aug10   0:02 /lib/systemd-logind
  message+  1101  0.0  0.1  44388  2608 ?        Ss   Aug10   0:21 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile
  root      1110  0.0  0.0 173476   232 ?        Ssl  Aug10   0:54 /usr/sbin/thermald --no-daemon --dbus-enable
  root      1115  0.0  0.0   4400    28 ?        Ss   Aug10   0:14 /usr/sbin/acpid
  root      1117  0.0  0.0  36076   568 ?        Ss   Aug10   0:01 /usr/sbin/cron -f
  root      1133  0.0  0.0 337316   976 ?        Ssl  Aug10   0:00 /usr/sbin/ModemManager
  root      1135  0.0  0.2 634036  5340 ?        Ssl  Aug10   0:19 /usr/lib/snapd
  root      1137  0.0  0.0 282944   392 ?        Ssl  Aug10   0:06 /usr/lib/accountsservice/accounts-daemon
  syslog    1139  0.0  0.0 256396   352 ?        Ssl  Aug10   0:04 /usr/sbin/rsyslogd -n
  avahi     1145  0.0  0.0  44900  1092 ?        Ss   Aug10   0:11 avahi-daemon: running [hos-machine.local]
This is the detailed process listing of an Ubuntu system. It lets us compare VSZ and RSS directly: VSZ is the size of the virtual address space the process has requested, while RSS is the physical memory the process actually occupies.
Generally, if a process leaks memory, its VSZ keeps growing and its physical memory use grows along with it. If that is the case, check whether every malloc has a matching free. Using the process ID, we can view the detailed Vm* information for the process. Example:
 
 
  root@hos-machine:~# cat /proc/1298/status
  Name:   sshd
  State:  S (sleeping)
  Tgid:   1298
  Ngid:   0
  Pid:    1298
  PPid:   1
  TracerPid:      0
  Uid:    0       0       0       0
  Gid:    0       0       0       0
  FDSize: 128
  Groups:
  NStgid: 1298
  NSpid:  1298
  NSpgid: 1298
  NSsid:  1298
  VmPeak:    65620 kB
  VmSize:    65520 kB
  VmLck:         0 kB
  VmPin:         0 kB
  VmHWM:      5480 kB
  VmRSS:      5452 kB
  VmData:      580 kB
  VmStk:       136 kB
  VmExe:       764 kB
  VmLib:      8316 kB
  VmPTE:       148 kB
  VmPMD:        12 kB
  VmSwap:        0 kB
  HugetlbPages:          0 kB
  Threads:        1
  SigQ:   0/7814
  SigPnd: 0000000000000000
  ShdPnd: 0000000000000000
  SigBlk: 0000000000000000
  SigIgn: 0000000000001000
  SigCgt: 0000000180014005
  CapInh: 0000000000000000
  CapPrm: 0000003fffffffff
  CapEff: 0000003fffffffff
  CapBnd: 0000003fffffffff
  CapAmb: 0000000000000000
  Seccomp:        0
  Cpus_allowed:   ffffffff,ffffffff
  Cpus_allowed_list:      0-63
  Mems_allowed:   00000000,00000001
  Mems_allowed_list:      0
  voluntary_ctxt_switches:        1307
  nonvoluntary_ctxt_switches:     203
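To watch for the growth described above, the Vm* lines can be sampled programmatically. The following is a minimal Python sketch, not a definitive tool: the helper names (parse_vm_fields, read_vm_fields, grows_steadily, watch_rss) are made up for illustration, and the rule "every sample is larger than the previous one" is an assumed heuristic rather than a proven leak test.

```python
import re
import time

def parse_vm_fields(status_text):
    """Return {name: kB} for the Vm* lines of /proc/<pid>/status text."""
    fields = {}
    for line in status_text.splitlines():
        match = re.match(r"(Vm\w+):\s+(\d+)\s+kB", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields

def read_vm_fields(pid):
    """Read and parse the status file of a live process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vm_fields(f.read())

def grows_steadily(samples):
    """Assumed heuristic: True if each sample exceeds the previous one."""
    return len(samples) > 1 and all(b > a for a, b in zip(samples, samples[1:]))

def watch_rss(pid, count=5, interval=1.0):
    """Sample VmRSS `count` times and report whether it only ever grew."""
    samples = []
    for _ in range(count):
        samples.append(read_vm_fields(pid)["VmRSS"])
        time.sleep(interval)
    return grows_steadily(samples)

# Demo on two lines taken from the listing above.
sample = "VmSize:\t   65520 kB\nVmRSS:\t    5452 kB\n"
print(parse_vm_fields(sample))  # {'VmSize': 65520, 'VmRSS': 5452}
```

On a live system, something like watch_rss(1298, count=10, interval=60) would sample the sshd process from the example. Kernel threads have no Vm* lines, so the lookup of VmRSS would fail for them.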
To see how many files this process has open:
ls -l /proc/1298/fd/* | wc -l
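The same count can be taken from a script by listing the fd directory. This is a small sketch under the assumption that the caller may read the directory (root, or the owner of the process); count_open_fds, fd_targets and the proc_root parameter are illustrative names, not part of any existing tool.

```python
import os

def count_open_fds(pid, proc_root="/proc"):
    """Count the entries in <proc_root>/<pid>/fd, one per open descriptor."""
    return len(os.listdir(os.path.join(proc_root, str(pid), "fd")))

def fd_targets(pid, proc_root="/proc"):
    """Map each descriptor number to the file, socket or pipe it points at."""
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    targets = {}
    for name in os.listdir(fd_dir):
        try:
            targets[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return targets
```

A count that keeps rising between calls, for example count_open_fds(1298) taken a minute apart, points at a descriptor leak rather than a memory leak, and fd_targets shows which files or sockets are piling up.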
To view the detailed memory mapping information of a process:

cat /proc/7393/maps
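Each line of the maps file has the form "start-end perms offset dev inode pathname", so the size of every mapping can be summed per backing file. The sketch below assumes that layout; the function name mapping_sizes, the "[anon]" label for unnamed mappings and the sample lines are all made up for illustration.

```python
from collections import defaultdict

def mapping_sizes(maps_text):
    """Sum mapped bytes per backing name from /proc/<pid>/maps text."""
    totals = defaultdict(int)
    for line in maps_text.splitlines():
        parts = line.split(None, 5)
        if len(parts) < 5:
            continue  # not a mapping line
        start, end = (int(x, 16) for x in parts[0].split("-"))
        # Mappings without a pathname are anonymous memory.
        name = parts[5].strip() if len(parts) == 6 else "[anon]"
        totals[name] += end - start
    return dict(totals)

# Hypothetical sample, not taken from a real process.
sample = """\
00400000-0040c000 r-xp 00000000 08:01 131 /bin/cat
0060b000-0060c000 r--p 0000b000 08:01 131 /bin/cat
01a2f000-01a50000 rw-p 00000000 00:00 0 [heap]
7f0000000000-7f0000021000 rw-p 00000000 00:00 0
"""
for name, size in sorted(mapping_sizes(sample).items(), key=lambda kv: -kv[1]):
    print(f"{size // 1024:8d} kB  {name}")
```

For a process that leaks through malloc, the totals for [heap] and for anonymous mappings are the ones that usually keep growing between two runs of this function.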
Now let's look at the meminfo fields. The annotations below come from Documentation/filesystems/proc.txt in the kernel source:

  MemTotal: Total usable RAM (i.e. physical RAM minus a few reserved bits and the kernel binary code)
  MemFree: The sum of LowFree + HighFree
  Buffers: Relatively temporary storage for raw disk blocks; shouldn't get tremendously large (20 MB or so)
  Cached: In-memory cache for files read from the disk (the pagecache). Doesn't include SwapCached
  SwapCached: Memory that once was swapped out, is swapped back in but still also is in the swapfile (if memory is needed it
      doesn't need to be swapped out AGAIN because it is already in the swapfile. This saves I/O)
  Active: Memory that has been used more recently and usually not reclaimed unless absolutely necessary.
  Inactive: Memory which has been less recently used. It is more eligible to be reclaimed for other purposes
  HighTotal:
  HighFree: Highmem is all memory above ~860 MB of physical memory. Highmem areas are for use by userspace programs, or
      for the pagecache. The kernel must use tricks to access this memory, making it slower to access than lowmem.
  LowTotal:
  LowFree: Lowmem is memory which can be used for everything that highmem can be used for, but it is also available for
      the kernel's use for its own data structures. Among many other things, it is where everything from the Slab is
      allocated. Bad things happen when you're out of lowmem.
  SwapTotal: Total amount of swap space available
  SwapFree: Memory which has been evicted from RAM, and is temporarily on the disk
  Dirty: Memory which is waiting to get written back to the disk
  Writeback: Memory which is actively being written back to the disk
  AnonPages: Non-file backed pages mapped into userspace page tables
  AnonHugePages: Non-file backed huge pages mapped into userspace page tables
  Mapped: Files which have been mmaped, such as libraries
  Slab: In-kernel data structures cache
  SReclaimable: Part of Slab, that might be reclaimed, such as caches
  SUnreclaim: Part of Slab, that cannot be reclaimed on memory pressure
  PageTables: Amount of memory dedicated to the lowest level of page tables.
  NFS_Unstable: NFS pages sent to the server, but not yet committed to stable storage
  Bounce: Memory used for block device "bounce buffers"
  WritebackTmp: Memory used by FUSE for temporary writeback buffers
  CommitLimit: Based on the overcommit ratio ('vm.overcommit_ratio'), this is the total amount of memory currently available
      to be allocated on the system. This limit is only adhered to if strict overcommit accounting is enabled (mode 2 in
      'vm.overcommit_memory').
      The CommitLimit is calculated with the following formula: CommitLimit = ('vm.overcommit_ratio' * Physical RAM) + Swap
      For example, on a system with 1G of physical RAM and 7G of swap with a 'vm.overcommit_ratio' of 30 it would
      yield a CommitLimit of 7.3G.
      For more details, see the memory overcommit documentation in vm/overcommit-accounting.
  Committed_AS: The amount of memory presently allocated on the system. The committed memory is a sum of all of the memory which
      has been allocated by processes, even if it has not been "used" by them as of yet. A process which malloc()'s 1G
      of memory, but only touches 300M of it will only show up as using 300M of memory even if it has the address space
      allocated for the entire 1G. This 1G is memory which has been "committed" to by the VM and can be used at any time
      by the allocating application. With strict overcommit enabled on the system (mode 2 in 'vm.overcommit_memory'),
      allocations which would exceed the CommitLimit (detailed above) will not be permitted. This is useful if one needs
      to guarantee that processes will not fail due to lack of memory once that memory has been successfully allocated.
  VmallocTotal: Total size of vmalloc memory area
  VmallocUsed: Amount of vmalloc area which is used
  VmallocChunk: Largest contiguous block of vmalloc area which is free
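The CommitLimit example in the annotations can be checked with a few lines of arithmetic. This sketch only restates the quoted formula, treating vm.overcommit_ratio as a percentage; the function name commit_limit_kb and the kB units are choices made for illustration.

```python
def commit_limit_kb(ram_kb, swap_kb, overcommit_ratio):
    """CommitLimit = (overcommit_ratio percent of physical RAM) + swap."""
    return ram_kb * overcommit_ratio // 100 + swap_kb

GIB_KB = 1024 * 1024  # kB in one GiB

# 1G of RAM, 7G of swap, vm.overcommit_ratio = 30, as in the quoted example.
limit = commit_limit_kb(1 * GIB_KB, 7 * GIB_KB, 30)
print(f"{limit / GIB_KB:.1f}G")  # prints 7.3G
```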
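We only need to pay attention to a few of these items: Buffers, Cached, Slab, Active and AnonPages.

Active = Active(anon) + Active(file) (the same holds for Inactive)
AnonPages: Non-file backed pages mapped into userspace page tables
The annotations above explain the difference between Buffers and Cached clearly: Buffers is temporary storage for raw disk blocks, while Cached is the page cache for files read from disk.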
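To follow just those items over time, the meminfo text can be parsed into a dictionary. The sketch below is one possible approach; parse_meminfo, watched_fields and read_meminfo are hypothetical helper names, and the sample numbers are invented.

```python
WATCHED = ("Buffers", "Cached", "Slab", "Active", "AnonPages")

def parse_meminfo(text):
    """Return {field: value} from /proc/meminfo text (values mostly in kB)."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        tokens = rest.split()
        if sep and tokens and tokens[0].isdigit():
            info[name.strip()] = int(tokens[0])
    return info

def watched_fields(info):
    """Keep only the fields singled out in the text above."""
    return {name: info[name] for name in WATCHED if name in info}

def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live file (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())

# Invented sample values, for illustration only.
sample = """\
Buffers:           20000 kB
Cached:           300000 kB
Active:           150000 kB
Active(anon):      90000 kB
Active(file):      60000 kB
AnonPages:         85000 kB
Slab:              40000 kB
"""
info = parse_meminfo(sample)
print(watched_fields(info))
print(info["Active"] == info["Active(anon)"] + info["Active(file)"])
```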
Sometimes the system gets into trouble without any memory leak at all. For example, the cache and buffers may occupy too much memory, or too many files may be open, and waiting for the system to reclaim them automatically takes a very long time.
The meminfo file under /proc shows the current memory usage of the system. The memory that can be made available is roughly MemFree + Buffers + Cached. When MemFree runs low, the kernel uses the write-back mechanism (the pdflush thread) to write cached and buffered data back to backing storage, which releases the memory for use by processes. The cache can also be released explicitly by hand:
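The MemFree + Buffers + Cached estimate is easy to compute from the meminfo text. This is a rough sketch of that sum only; the function name estimate_available_kb and the sample numbers are invented, and kernels that report a MemAvailable line give a more careful estimate than this one.

```python
import re

def estimate_available_kb(meminfo_text):
    """MemFree + Buffers + Cached, in kB, as described in the text above."""
    total = 0
    for name in ("MemFree", "Buffers", "Cached"):
        # ^ with MULTILINE keeps "Cached" from matching "SwapCached".
        match = re.search(rf"^{name}:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
        if match is None:
            raise KeyError(name)
        total += int(match.group(1))
    return total

# Invented sample values, for illustration only.
sample = """\
MemTotal:        2000000 kB
MemFree:          100000 kB
Buffers:           20000 kB
Cached:           300000 kB
SwapCached:            5 kB
"""
print(estimate_available_kb(sample), "kB")  # prints 420000 kB
```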

  drop_caches
  Writing to this will cause the kernel to drop clean caches, dentries and inodes from memory, causing that memory to become free.
  To free pagecache:
      echo 1 > /proc/sys/vm/drop_caches
  To free dentries and inodes:
      echo 2 > /proc/sys/vm/drop_caches
  To free pagecache, dentries and inodes:
      echo 3 > /proc/sys/vm/drop_caches
  As this is a non-destructive operation and dirty objects are not freeable, the user should run 'sync' first.
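The sequence "sync first, then write 1, 2 or 3" can be wrapped in a small helper. This is a sketch that assumes it runs as root on Linux; the function name drop_caches and its path parameter (there so the helper can be tried against an ordinary file) are illustrative.

```python
import os

def drop_caches(mode=3, path="/proc/sys/vm/drop_caches"):
    """Flush dirty data, then ask the kernel to drop clean caches.

    mode 1: pagecache, mode 2: dentries and inodes, mode 3: both.
    """
    if mode not in (1, 2, 3):
        raise ValueError("mode must be 1, 2 or 3")
    os.sync()  # dirty objects are not freeable, so write them back first
    with open(path, "w") as f:
        f.write(f"{mode}\n")
```

Calling drop_caches(1) as root corresponds to sync followed by echo 1 > /proc/sys/vm/drop_caches. If Cached and Buffers fall back after the call while memory use stays high, the remaining usage is not cache and deserves a closer look.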
User-space memory leaks can also be detected with mtrace, which is very simple to use and was covered in a previous article. Well-known tools such as valgrind, dmalloc and memwatch are available as well, each with its own characteristics.

Locating a kernel memory leak is more complicated. First determine whether the kernel is leaking at all, then narrow it down to the specific operations that trigger the leak, and finally inspect the suspicious modules. Kernel memory is mostly allocated with kmalloc, that is, through the slab/slub/slob allocators, so if Slab keeps growing in meminfo, the leak is most likely in the kernel. We can view slab information in more detail with:
cat /proc/slabinfo
If slabtop is available, it is even more convenient. With it you can basically determine whether the kernel is leaking memory and which objects are involved.
  cat /proc/slabinfo
  slabinfo - version: 2.1
  # name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
  fuse_request           0      0    288   28    2 : tunables    0    0    0 : slabdata      0      0      0
  fuse_inode             0      0    448   18    2 : tunables    0    0    0 : slabdata      0      0      0
  fat_inode_cache        0      0    424   19    2 : tunables    0    0    0 : slabdata      0      0      0
  fat_cache              0      0     24  170    1 : tunables    0    0    0 : slabdata      0      0      0
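To see which slab caches hold the most memory, or which ones grew between two snapshots, the slabinfo text can be ranked by num_objs * objsize. The sketch below assumes the version 2.1 column layout shown above; slab_usage and slab_growth are made-up helper names and the sample rows are invented.

```python
def slab_usage(slabinfo_text):
    """Return {cache name: num_objs * objsize in bytes}."""
    usage = {}
    for line in slabinfo_text.splitlines():
        if not line.strip() or line.startswith(("slabinfo", "#")):
            continue  # skip the version line and the column header
        fields = line.split()
        num_objs, objsize = int(fields[2]), int(fields[3])
        usage[fields[0]] = num_objs * objsize
    return usage

def slab_growth(before_text, after_text):
    """List (name, bytes grown) for caches that got bigger, largest first."""
    before, after = slab_usage(before_text), slab_usage(after_text)
    grown = [(name, size - before.get(name, 0))
             for name, size in after.items() if size > before.get(name, 0)]
    return sorted(grown, key=lambda item: item[1], reverse=True)

# Invented snapshots, for illustration only.
first = """\
slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab>
kmalloc-64 1000 1024 64 64 1 : tunables 0 0 0 : slabdata 16 16 0
dentry 500 630 192 21 1 : tunables 0 0 0 : slabdata 30 30 0
"""
second = first.replace("kmalloc-64 1000 1024", "kmalloc-64 4000 4096")
print(slab_growth(first, second))  # [('kmalloc-64', 196608)]
```

A cache that shows up in slab_growth on every comparison, with no matching workload change, is a reasonable first suspect for a kernel leak.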
The kernel configuration also provides options for automatic memory leak checking (for example CONFIG_DEBUG_KMEMLEAK), which can be enabled for tracing and debugging.
We will not go into further depth here.






