The RPS/RFS principle in the Linux kernel network stack


The previous chapter explained the mechanism of network soft interrupts and their optimization as a whole, but did not fully explain RPS and RFS, describing only their overall function and original intent. This article goes deeper into the RPS and RFS mechanisms.

1.1.1 The bottleneck of the built-in irqbalance

Simple interrupt load balancing, such as that performed by the system's irqbalance process, can be counterproductive: irqbalance does not recognize network flows, it only sees individual packets and knows nothing about a packet's tuple information.

Every processor in a multiprocessor system has its own hardware cache. If one CPU modifies data in its cache, it must check whether that data is also present in the caches of the other CPUs; if so, those CPUs must be notified to update their caches. This is CPU cache coherence. To reduce the frequency of these cache refreshes, packets with the same characteristics (i.e. packets of the same flow) should be assigned to the same CPU for processing.

In addition, TCP is sensitive to out-of-order segments: reordering triggers retransmission. If a Linux host acts purely as a router, and different segments of an incoming TCP stream are processed by different CPUs and then forwarded out of one NIC, the CPUs must synchronize, which is very troublesome. Without synchronization, a later segment may well be sent out first by one CPU; the receiver then sees out-of-order packets and requests retransmission. So it is better to let a single CPU process the segments of a flow serially.

This requires a solution that answers three questions:

  • Which CPU is consuming this network data

  • Which CPU is processing the interrupts

  • Which CPU is running the soft interrupts

These three problems are what RPS/RFS need to solve.

1.1.2 Data Structures

Many of Linux's low-level secrets, including its object-oriented design, are hidden in its data structures, so let us look at those data structures first.

The NIC's hardware receive queue, struct netdev_rx_queue, is defined in include/linux/netdevice.h:

struct netdev_rx_queue {
#ifdef CONFIG_RPS
    struct rps_map __rcu *rps_map;
    struct rps_dev_flow_table __rcu *rps_flow_table;
#endif
    struct kobject kobj;
    struct net_device *dev;
    struct xdp_rxq_info xdp_rxq;
} ____cacheline_aligned_in_smp;
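As a minimal sketch (a fragment, not a complete function), this is roughly how get_rps_cpu() in net/core/dev.c reaches the RPS map: the map hangs off the packet's receive queue and is dereferenced under RCU. Queue-index validation is omitted here for brevity.

/* Simplified fragment: locate the RX queue for this skb and read its map. */
struct netdev_rx_queue *rxqueue = dev->_rx + skb_get_rx_queue(skb);
struct rps_map *map = rcu_dereference(rxqueue->rps_map);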

The rps_map member of this receive-queue instance holds the parsed result: its cpus array records the CPUs, taken from the configuration file, that participate in packet distribution, and its len member is the length of that array. The parsing function is store_rps_map, defined in net/core/net-sysfs.c.

/*
 * This structure holds an RPS map which can be of variable length. The
 * map is an array of CPUs.
 */
struct rps_map {
    unsigned int len;
    struct rcu_head rcu;
    u16 cpus[0];
};
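A short sketch of how a target CPU is picked from an rps_map by packet hash, modeled on the map lookup inside get_rps_cpu(); reciprocal_scale() (from linux/kernel.h) maps a 32-bit hash uniformly into [0, len):

/* Sketch: hash -> CPU selection from the configured map. */
static u16 pick_cpu_from_map(const struct rps_map *map, u32 hash)
{
    return map->cpus[reciprocal_scale(hash, map->len)];
}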

The device flow table, struct rps_dev_flow_table, is defined in include/linux/netdevice.h. Its mask member gives the size of the flows array of struct rps_dev_flow entries, i.e. the number of flow-table entries, as specified by the configuration file /sys/class/net/<dev>/queues/rx-<n>/rps_flow_cnt. When that file is written, the kernel reads the value and initializes the rps_flow_table member of the NIC hardware receive queue (the initialization function is store_rps_dev_flow_table_cnt).

/*
 * The rps_dev_flow_table structure contains a table of flow mappings.
 */
struct rps_dev_flow_table {
    unsigned int mask;
    struct rcu_head rcu;
    struct rps_dev_flow flows[0];
};

An instance of struct rps_dev_flow mainly records two things: the CPU that last processed a packet of this flow, and the tail index (last_qtail) of the input_pkt_queue in that CPU's private softnet_data object at the time of the last enqueue.

/*
 * The rps_dev_flow structure contains the mapping of a flow to a CPU, the
 * tail pointer for this CPU's input queue at the time of last enqueue, and
 * a hardware filter index.
 */
struct rps_dev_flow {
    u16 cpu;
    u16 filter;
    unsigned int last_qtail;
};
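A sketch of how the per-device flow entry for a packet hash is looked up, as done inside get_rps_cpu(); the mask keeps the index within flows[], whose size is a power of two:

/* Sketch fragment: flow-table lookup by packet hash. */
struct rps_dev_flow *rflow = &flow_table->flows[hash & flow_table->mask];
u16 tcpu = rflow->cpu;    /* CPU that last handled this flow */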

The rps_sock_flow_table below is the global socket flow table. It records, for each flow, the CPU on which the application expects the flow to be processed, i.e. the CPU currently consuming the flow's messages. The table is updated from the recvmsg/sendmsg paths (inet_accept(), inet_recvmsg(), inet_sendmsg(), inet_sendpage() and tcp_splice_read()), which ultimately call rps_record_sock_flow() to update the ents array (a sketch of that function follows the structure below).

The mask member stores the size of the ents array, as specified by the configuration file /proc/sys/net/core/rps_sock_flow_entries.

rps_record_sock_flow() is defined in include/linux/netdevice.h. Every time a user program reads a packet, the table is updated, ensuring that the recorded CPU number stays current.

/*
 * The rps_sock_flow_table contains mappings of flows to the last CPU
 * on which they were processed by the application (set in recvmsg).
 * Each entry is a 32bit value. Upper part is the high-order bits
 * of the flow hash, lower part is the CPU number.
 * rps_cpu_mask is used to partition the space, depending on number of
 * possible cpus: rps_cpu_mask = roundup_pow_of_two(nr_cpu_ids) - 1
 * For example, if 64 CPUs are possible, rps_cpu_mask = 0x3f,
 * meaning we use 32-6=26 bits for the hash.
 */
struct rps_sock_flow_table {
    u32 mask;
    u32 ents[0] ____cacheline_aligned_in_smp;
};
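For reference, here is rps_record_sock_flow() as found in recent kernels (lightly reformatted): each entry packs the high bits of the flow hash together with the current CPU number, and is only rewritten when it changes.

static inline void rps_record_sock_flow(struct rps_sock_flow_table *table,
                                        u32 hash)
{
    if (table && hash) {
        unsigned int index = hash & table->mask;
        u32 val = hash & ~rps_cpu_mask;

        /* We only give a hint, preemption can change CPU under us */
        val |= raw_smp_processor_id();

        if (table->ents[index] != val)
            table->ents[index] = val;
    }
}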

1.1.3 RPS

The work of RPS and RFS is performed in soft-interrupt context, because processing at this stage is independent of any process and decoupled from the underlying hardware. This is what makes load balancing of the network stack's soft interrupts possible.

The overall flow of RPS is as follows: the packet is added to the receive (backlog) queue of another CPU; that CPU then runs process_backlog in its soft interrupt, which dequeues every packet in the queue and calls __netif_receive_skb() to do the follow-up work. A condensed sketch of this hand-off follows.
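This fragment is condensed from netif_receive_skb_internal() in net/core/dev.c: when RPS picks a remote CPU, the skb goes to that CPU's backlog; otherwise it is processed locally (error handling and RCU locking omitted).

/* Sketch: hand the skb to the CPU chosen by RPS, or process it here. */
struct rps_dev_flow voidflow, *rflow = &voidflow;
int cpu = get_rps_cpu(skb->dev, skb, &rflow);

if (cpu >= 0)
    ret = enqueue_to_backlog(skb, cpu, &rflow->last_qtail);
else
    ret = __netif_receive_skb(skb);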

1.1.3.1 RPS Configuration

Linux uses a configuration file to specify which CPU cores participate in packet distribution; the file lives at /sys/class/net/<dev>/queues/rx-<n>/rps_cpus. For example, writing the bitmask f to a queue's rps_cpus file (echo f > /sys/class/net/eth0/queues/rx-0/rps_cpus, with eth0 and rx-0 purely illustrative) enables CPUs 0-3 for that queue.

After the configuration file is set, the kernel reads its contents and, from the parsed result, generates the list of CPUs that participate in packet distribution, so that a hash-to-CPU mapping can be established when a packet is received. The parsing function is store_rps_map, and the result is stored in rps_map.

1.1.3.2 RPS Details

RPS selects a CPU based on the packet's hash value. The hash may be computed by the NIC or in software, and the calculation depends on the protocol; for a TCP packet, for example, it is computed over the four-tuple: source IP, source port, destination IP and destination port. The function that actually selects the target CPU is get_rps_cpu, defined in net/core/dev.c, which obtains the core number from the CPU list:

static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
                       struct rps_dev_flow **rflowp)

RPS simply distributes packets by their hash value; it does not steer a packet to the CPU on which the application consuming the flow is running.

So when is the function get_rps_cpu called?
