DPDK memory management (1): Initialization


1 Preface

DPDK uses hugetlbfs to reduce CPU TLB misses and improve performance.

2 Initialization

DPDK memory initialization mainly organizes the hugepages configured through hugetlbfs according to whether their mapped physical addresses are contiguous and which NUMA socket they belong to, so that they can be managed conveniently later.

2.1 eal_hugepage_info_init()

eal_hugepage_info_init() obtains information about the configured hugetlbfs mounts and saves it in the struct internal_config data structure.

The main work is as follows:

1. Read the subdirectories of /sys/kernel/mm/hugepages; a directory whose name contains the "hugepages-" prefix belongs to hugetlbfs, and the configured page size is parsed out of the directory name. For example:

# ls -ltr /sys/kernel/mm/hugepages/
total 0
drwxr-xr-x 2 root root 0 hugepages-2048kB

2. Read /proc/mounts to find the mount point of hugetlbfs. For example:

# cat /proc/mounts
rootfs / rootfs rw 0 0
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
udev /dev devtmpfs rw,relatime,size=1016836k,nr_inodes=254209,mode=755 0 0
devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /run tmpfs rw,nosuid,noexec,relatime,size=205128k,mode=755 0 0
/dev/disk/by-uuid/fd1dbca3-ac30-4bac-b93a-0d89b0fd152c / ext4 rw,relatime,errors=remount-ro,user_xattr,barrier=1,data=ordered 0 0
none /sys/fs/fuse/connections fusectl rw,relatime 0 0
none /sys/kernel/debug debugfs rw,relatime 0 0
none /sys/kernel/security securityfs rw,relatime 0 0
none /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k 0 0
none /run/shm tmpfs rw,nosuid,nodev,relatime 0 0
none /media/sf_F_DRIVE vboxsf rw,nodev,relatime 0 0
gvfs-fuse-daemon /home/chuanxinji/.gvfs fuse.gvfs-fuse-daemon rw,nosuid,nodev,relatime,user_id=1000,group_id=1000 0 0
/dev/sr0 /media/VBOXADDITIONS_4.3.10_93012 iso9660 ro,nosuid,nodev,relatime,uid=1000,gid=1000,iocharset=utf8,mode=0400,dmode=0500 0 0
none /mnt/huge hugetlbfs rw,relatime 0 0

3. Get the number of hugepages by reading /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages:

# cat /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
64

4. Open the mount point directory as a file and take a lock on its file descriptor; the lock appears intended to let cooperating DPDK processes detect that the hugepage directory is still in use.

All the information obtained above is saved in internal_config.hugepage_info[MAX_HUGEPAGES_SIZE]. The hugepage_info data structure is as follows:

struct hugepage_info {
    size_t hugepage_sz;   /**< size of a huge page */
    const char *hugedir;  /**< dir where hugetlbfs is mounted */
    uint32_t num_pages[RTE_MAX_NUMA_NODES];
                          /**< number of hugepages of that size on each socket */
    int lock_descriptor;  /**< file descriptor for hugepage dir */
};

The specific values are as follows:

hpi->hugepage_sz = 2M;
hpi->hugedir = /mnt/huge;
hpi->num_pages[0] = 64;  // which socket each page belongs to is not yet known, so all pages are counted on socket 0 for now
hpi->lock_descriptor = open(hpi->hugedir, O_RDONLY);

5. Sort internal_config.hugepage_info[MAX_HUGEPAGES_SIZE] by memory page size.

2.2 rte_eal_config_create()

rte_eal_config_create() initializes rte_config.mem_config. When the DPDK program is run as root, /var/run/.rte_config is mmap'd and rte_config.mem_config points to a region of sizeof(struct rte_mem_config) inside that mapping:

rte_config.mem_config = start address of the mmap'd /var/run/.rte_config file;

struct rte_config {
    uint32_t master_lcore;  /**< Id of the master lcore */

    ......

    struct rte_mem_config *mem_config;
} __attribute__((__packed__));

The data structure of struct rte_mem_config is as follows:

struct rte_mem_config {
    volatile uint32_t magic;  /**< Magic number - Sanity check. */

    /* memory topology */
    uint32_t nchannel;        /**< Number of channels (0 if unknown). */
    uint32_t nrank;           /**< Number of ranks (0 if unknown). */

    /*
     * current lock nest order
     *  - qlock->mlock (ring/hash/lpm)
     *  - mplock->qlock->mlock (mempool)
     * Notice:
     *  *ALWAYS* obtain qlock first if having to obtain both qlock and mlock
     */
    rte_rwlock_t mlock;   /**< only used by memzone LIB for thread-safe. */
    rte_rwlock_t qlock;   /**< used for tailq operation for thread safe. */
    rte_rwlock_t mplock;  /**< only used by mempool LIB for thread-safe. */

    uint32_t memzone_idx; /**< Index of memzone */

    /* memory segments and zones */
    struct rte_memseg memseg[RTE_MAX_MEMSEG];    /**< Physmem descriptors. */
    struct rte_memzone memzone[RTE_MAX_MEMZONE]; /**< Memzone descriptors. */

    /* Runtime Physmem descriptors. */
    struct rte_memseg free_memseg[RTE_MAX_MEMSEG];

    struct rte_tailq_head tailq_head[RTE_MAX_TAILQ]; /**< Tailqs for objects */

    /* Heaps of Malloc per socket */
    struct malloc_heap malloc_heaps[RTE_MAX_NUMA_NODES];
} __attribute__((__packed__));

 

2.3 rte_eal_hugepage_init()

rte_eal_hugepage_init() creates one rtemap_xx file per hugepage configured through hugetlbfs (64 in this article) under the /mnt/huge directory and mmaps each file, arranging the mappings so that pages whose physical addresses are contiguous also end up contiguous in virtual address space.

The details are as follows:

1. Create an array of nr_hugepages struct hugepage_file entries, one per memory page. The struct hugepage_file data structure is as follows:

struct hugepage_file {
    void *orig_va;      /**< virtual addr of first mmap() */
    void *final_va;     /**< virtual addr of 2nd mmap() */
    uint64_t physaddr;  /**< physical addr */
    size_t size;        /**< the page size */
    int socket_id;      /**< NUMA socket ID */
    int file_id;        /**< the '%d' in HUGEFILE_FMT */
    int memseg_id;      /**< the memory segment to which page belongs */
#ifdef RTE_EAL_SINGLE_FILE_SEGMENTS
    int repeated;       /**< number of times the page size is repeated */
#endif
    char filepath[MAX_HUGEPAGE_PATH]; /**< path to backing file on filesystem */
};

 

2. Create one rtemap_xx file per memory page under the mount point directory, as shown below, and mmap a region of hugepage_sz for each file. For each page:

hugepage_file->orig_va = the address returned by the mmap of the rtemap_xx file;
hugepage_file->file_id = the order in which rtemap_xx was created, i.e. the value of xx;
hugepage_file->filepath = /mnt/huge/rtemap_xx;
hugepage_file->size = hugepage_sz, i.e. 2M;

# ls -lR /mnt/huge/
total 131072
-rwxr-xr-x 1 root root 2097152 Nov  5 rtemap_2
-rwxr-xr-x 1 root root 2097152 Nov  5 rtemap_1
-rwxr-xr-x 1 root root 2097152 Nov  5 rtemap_0
-rwxr-xr-x 1 root root 2097152 Nov  5 rtemap_8
-rwxr-xr-x 1 root root 2097152 Nov  5 rtemap_7
-rwxr-xr-x 1 root root 2097152 Nov  5 rtemap_6

......

-rwxr-xr-x 1 root root 2097152 Nov  5 rtemap_60
-rwxr-xr-x 1 root root 2097152 Nov  5 rtemap_59
-rwxr-xr-x 1 root root 2097152 Nov  5 rtemap_58
-rwxr-xr-x 1 root root 2097152 Nov  5 rtemap_63
-rwxr-xr-x 1 root root 2097152 Nov  5 rtemap_62
-rwxr-xr-x 1 root root 2097152 Nov  5 rtemap_61

3. Read /proc/self/pagemap to obtain the mapping between the process's virtual and physical addresses. For the virtual address obtained by the mmap of each rtemap_xx file in the previous step, divide it by the operating system page size (4K) to get an index into /proc/self/pagemap; the 8-byte entry at that index contains the physical page frame number (PFN). Multiplying the PFN by the OS page size (4K) and adding the virtual address's in-page offset yields the physical address, which is stored in the corresponding hugepage_file->physaddr:

physaddr = ((page & 0x7fffffffffffffULL) * page_size) + ((unsigned long)virtaddr % page_size);

 

4. Read /proc/self/numa_maps to find out on which NUMA socket (i.e. which CPU) the virtual address mmap'd for each rtemap_xx file resides. The socket id is saved in the corresponding hugepage_file->socket_id.

# cat /proc/self/numa_maps
00400000 default file=/bin/cat mapped=7 mapmax=2 N0=7
0060a000 default file=/bin/cat anon=1 dirty=1 N0=1
0060b000 default file=/bin/cat anon=1 dirty=1 N0=1
025c1000 default heap anon=3 dirty=3 active=0 N0=3
7fdf0222c000 default file=/usr/lib/locale-archive mapped=10 mapmax=61 N0=10
7fdf0290f000 default file=/lib/x86_64-linux-gnu/libc-2.15.so mapped=82 mapmax=128 N0=82
7fdf02ac4000 default file=/lib/x86_64-linux-gnu/libc-2.15.so
7fdf02cc3000 default file=/lib/x86_64-linux-gnu/libc-2.15.so anon=4 dirty=4 N0=4
7fdf02cc7000 default file=/lib/x86_64-linux-gnu/libc-2.15.so anon=2 dirty=2 N0=2
7fdf02cc9000 default anon=3 dirty=3 active=1 N0=3
7fdf02cce000 default file=/lib/x86_64-linux-gnu/ld-2.15.so mapped=27 mapmax=122 N0=27
7fdf02ed7000 default anon=3 dirty=3 N0=3
7fdf02eee000 default anon=2 dirty=2 N0=2
7fdf02ef0000 default file=/lib/x86_64-linux-gnu/ld-2.15.so anon=1 dirty=1 N0=1
7fdf02ef1000 default file=/lib/x86_64-linux-gnu/ld-2.15.so anon=2 dirty=2 N0=2
7fff09be1000 default stack anon=3 dirty=3 N0=3
7fff09cc2000 default

5. Sort the hugepage_file array in ascending order of physical address.

6. Based on the sorted order, find the runs of contiguous physical addresses and mmap the /mnt/huge/rtemap_xx files a second time, so that pages which are physically contiguous are also virtually contiguous after the second mmap. The virtual address obtained by the second mmap is saved in the corresponding hugepage_file->final_va.

7. munmap() the addresses obtained by the first mmap of each rtemap_xx file in step 2.

8. Count the hugepages on each socket and save the result in internal_config.hugepage_info[0].num_pages[socket].

9. calc_num_pages_per_socket() decides how many pages should be taken from each socket to satisfy the amount of memory requested by the user.

10. mmap a region of nr_hugepages * sizeof(struct hugepage_file) from the /var/run/.rte_hugepage_info file and copy the entire hugepage_file array created in step 1 into that memory.

11. Record the physically contiguous ranges of the hugepage_file array in rte_config.mem_config->memseg[]; hugepage_file->memseg_id is the index of the memseg[] entry that contains the page's physical address.

2.4 rte_eal_memzone_init()

rte_eal_memzone_init() initializes rte_config.mem_config->free_memseg[] and rte_config.mem_config->memzone[], where free_memseg[] records the memseg[] entries that are still free.

 
