Foreword
0.1. This article does not cover specific implementations, source code, or code profiling.
0.2. This article does not argue over implementation details between Linux or Cisco IOS versions.
0.3. Corrections to any errors in this article are welcome.
Cisco is undoubtedly the leader in the networking field, while Linux is the most dynamic operating-system kernel. Linux can implement almost every networking feature, but there is certainly room for optimization. This article first looks at Cisco, then analyzes the corresponding Netfilter features from several angles, and finally proposes an ip_conntrack optimization.
0.4. My daughter was born yesterday, and while she was not crying I sorted out this document. I have been tired these past few days, but I am still working on it.
1. Similarities and Differences of Design
Netfilter is a well-designed framework. It is a framework because it provides only the most basic underlying support and does not concern itself much with what is built on top of it. That underlying support consists of five hook points (a minimal hook-registration sketch follows the list):
PREROUTING: immediately after a packet enters the network layer, before it is routed
FORWARD: after routing, once it is confirmed that the packet is to be forwarded
INPUT: after routing, once it is confirmed that the packet is to be delivered locally
OUTPUT: when a locally generated packet is sent (see Appendix 4 for the placement of this hook)
POSTROUTING: immediately before the packet leaves the box
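To make the idea of "unconditional checkpoints" concrete, here is a minimal sketch of attaching code to one of them, assuming a 2.6-era kernel; the hook prototype and registration API differ between kernel versions, so treat this as an illustration rather than a copy-paste module.

#include <linux/module.h>
#include <linux/init.h>
#include <linux/netfilter.h>
#include <linux/netfilter_ipv4.h>

/* every packet that enters the IP layer passes through this, unconditionally */
static unsigned int my_prerouting_hook(unsigned int hooknum,
                                       struct sk_buff *skb,
                                       const struct net_device *in,
                                       const struct net_device *out,
                                       int (*okfn)(struct sk_buff *))
{
        return NF_ACCEPT;               /* or NF_DROP, NF_STOLEN, ... */
}

static struct nf_hook_ops my_ops = {
        .hook     = my_prerouting_hook,
        .pf       = PF_INET,
        .hooknum  = NF_INET_PRE_ROUTING,   /* one of the five checkpoints */
        .priority = NF_IP_PRI_FIRST,
};

static int __init my_init(void)
{
        return nf_register_hook(&my_ops);
}

static void __exit my_exit(void)
{
        nf_unregister_hook(&my_ops);
}

module_init(my_init);
module_exit(my_exit);
MODULE_LICENSE("GPL");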
1) Hook Point Design:
Netfilter's hook points are fixed "checkpoints" embedded in the network protocol stack. They are compiled into the stack itself, and the checks at these checkpoints are performed unconditionally.
Cisco's ACL is also well designed, but its idea is completely different from Netfilter's. An ACL is not embedded in the protocol stack; it is an "external list" that carries the policy. These lists take effect only when they are bound to a specific interface, and when binding you must also specify the direction of the traffic to be checked. In other words, an ACL is a purely external policy that can be attached dynamically to any point where packets need to be checked.
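As a reminder of what "binding to an interface plus a direction" looks like, here is a minimal IOS-style snippet; the interface names and ACL numbers are illustrative.

interface Ethernet0
 ip access-group 101 in
!
interface Ethernet1
 ip access-group 102 out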
2. Similarities and Differences in Data Flow (forwarding only)
1) For Cisco, the data packet path is as follows:
2) For Linux netfilter, the data packet passing path is as follows:
3. Efficiency and Flexibility
3.1. Filter position
From the data-flow diagrams we can see that Netfilter's packet filtering takes place at the network layer, which is actually quite late. From a security standpoint, many attacks, especially DoS attacks against routers and servers, have already done their damage by that point. An effective defense is to discard packets as early as possible, which is exactly Cisco's policy: "drop packets as early as possible." As the figure above shows, Cisco filters before routing.
3.2. Filter rule entries
Because Netfilter is a global filtering framework embedded in the protocol stack, sitting above the interfaces, it is hard for it to distinguish "which packets should match which policies." A Cisco ACL, on the other hand, is configured on a specific interface and for a specific direction. By distinguishing interface and direction, each packet ends up being matched against only "a part, rather than all" of the policy. For example, a packet arriving on ether0 is matched only against the ACL bound to ether0 in the inbound direction.
3.3. NAT location
Netfilter's NAT occurs before and after the filter, while Cisco places NAT differently with respect to filtering, and this greatly affects how filter policies are written on each. On a system using Netfilter you must write filter rules against the post-DNAT destination address (or the pre-SNAT source address); on Cisco you write them against the addresses as they appear before DNAT (or after SNAT).
3.4. Configuration flexibility
3.4.1. Cisco's ACL configuration is flexible; even the "bind to interface" and "specify direction" information is kept external to the rule itself, which fits the KISS principle of the UNIX philosophy. The price is that it demands more of the engineer, who must not only think about the match fields but also plan the interfaces.
3.4.2. Netfilter's design is more integrated: interface and direction are just two more entries among the "match items." An engineer only needs to know the IP or transport-layer information to write a rule; if the interface does not matter, it simply need not be specified, and indeed many iptables rules use neither the -i nor the -o option.
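A small illustration of that difference, with purely illustrative addresses: the same policy can be written with or without naming an interface.

# A rule written purely from IP/transport information; no interface is named.
iptables -A FORWARD -p tcp -d 192.168.2.10 --dport 23 -j DROP

# Interface and direction are just two more optional match items (-i / -o).
iptables -A FORWARD -i eth0 -o eth1 -p tcp -d 192.168.2.10 --dport 23 -j DROP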
4. Netfilter Optimization
4.1. Firewall policy search optimization
4.1.1. Summary
Traditionally, Netfilter arranges all rules linearly in configuration order, and every packet must traverse all of them, which hurts efficiency badly: as the number of rules grows, throughput degrades linearly. It would be better if a packet only had to match a subset of the rules. In other words, we need to classify the rules, first map each incoming packet to one class with an efficient algorithm, and then match the packet only against the rules inside that class.
The classification rests on a simple fact from analytic geometry: a point on a line segment divides the whole segment into three parts: the part less than it, the point equal to it, and the part greater than it.
Any match item can therefore be reduced to a "key value," and as long as the key space has some order that allows sorting, a single key divides that space into three parts: greater than, equal to, and less than. The same holds in one-dimensional and in n-dimensional space, only more precisely; here n is the number of match fields we have chosen. To make the following discussion easier to follow, two pictures are given first. The first shows traditional firewall rule matching, which is flat:
The next figure shows the optimized firewall rule matching, which is divided into dimensions:
In the end only the regions enclosed by the dotted lines have rules attached; only packets that "fall into" those regions need to match any rules, and everything else simply follows the default policy. Of course, a packet cannot fall into two regions. Only the source and destination IP addresses are considered here; adding the layer-4 protocol and port information makes the match more precise. As long as the algorithm that matches a packet to a "class" is clever enough, the work no longer grows with the number of rules, and that algorithm is exactly what we discuss next.
4.1.2. Cisco's optimization policy
Anyone who has used Cisco for a while knows the concept of the Turbo ACL. Its purpose is to "stop using the rules to match packets, and instead use the packet's information to look up the rules." When an ACL is installed, it is pre-sorted into lookup tables; when a packet arrives, its fields are used to search those pre-built tables.
For Cisco's technical details it is best to go straight to the documentation on its official website, which gives the most direct description. One nice thing about Cisco's documentation is that it includes worked scenarios. The analysis below uses the example above and is essentially based on one document: Turbo ACL.
The Turbo ACL defines a series of matching domains, as shown below:
Green marks layer-3 information, red marks layer-4 information, and pink marks combined layer-3 + layer-4 information. For each matching domain there is a table, called a "value table":
The index exists for lookup and management convenience, while the value column is filled with the value that a rule expects in that matching domain. The ACE bitmap records which ACEs the table entry matches. Since there are eight matching domains in total, there are eight such tables. To make this concrete, consider the following four ACL rules:
access-list 101 deny   tcp 192.168.1.0 0.0.0.255 192.168.2.0 0.0.0.255 eq telnet
access-list 101 permit tcp 192.168.1.0 0.0.0.255 192.168.2.0 0.0.0.255 eq www
access-list 101 deny   tcp 192.168.1.0 0.0.0.255 192.168.3.0 0.0.0.255 eq www
access-list 101 deny   icmp 192.168.1.0 0.0.0.255 200.200.200.0 0.0.0.255
After these rules are entered, the matching-domain tables look like this:
For brevity, only the value table for "source address 1" is shown:
So far we have described all the static data structures; what follows is the dynamic operation, i.e., the algorithm itself. Cisco's rule-matching algorithm is hierarchical and can be computed in parallel, which makes it extremely efficient. It has two steps:
1) Look up the packet's field values in every matching domain's value table to obtain a bitmap per domain
This step can run in parallel: for example, the value tables of "source address 1" and "source address 2" can be searched on two processors at the same time, maximizing CPU utilization and producing both bitmaps as quickly as possible. The scheme does not dictate which search algorithm to use; that depends on how the matching-domain values are inserted into the value tables when an ACL is added. Moreover, it is an undisputed fact that dynamic inserts are rare and inserts are usually static, so insertion performance hardly matters; the key factor is lookup. The efficiency of this lookup is crucial: with a good O(1) algorithm, the time spent matching a packet does not grow with the number of rules at all, and even an O(n) lookup turns n rule-matching operations into a·n far cheaper table lookups, where a is often a small constant less than 1.
2) AND the bitmaps of all matching domains together
Taking the intersection of the per-domain results finally yields one or a few ACE entries. This bitmap technique is Cisco's usual space-for-time trade, and the legendary 256-way tree uses the same trick. A small sketch of the two steps follows.
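The following plain-C sketch shows the two steps described above. It assumes at most 64 ACEs so a bitmap fits in one 64-bit word; the table layout and names are illustrative, not Cisco's actual data structures, and a real implementation would return a "don't care" bitmap for values absent from a table rather than 0.

#include <stdint.h>

#define MAX_ENTRIES 16

/* one record of a value table: a field value plus the bitmap of the ACEs
 * whose pattern covers that value */
struct value_entry {
    uint32_t value;
    uint64_t ace_bitmap;
};

struct value_table {
    int n;
    struct value_entry entries[MAX_ENTRIES];
};

/* look up one matching domain (linear search for simplicity) */
static uint64_t lookup(const struct value_table *t, uint32_t value)
{
    for (int i = 0; i < t->n; i++)
        if (t->entries[i].value == value)
            return t->entries[i].ace_bitmap;
    return 0;       /* simplification: unknown value matches nothing */
}

/* step 1: one bitmap per matching domain (Cisco can compute these in
 * parallel); step 2: AND them; the lowest set bit is the first ACE in
 * configuration order.  Returns the ACE index, or -1 for "use the policy". */
int classify(const struct value_table *tables, int ndomains,
             const uint32_t *fields)
{
    uint64_t result = ~0ULL;

    for (int d = 0; d < ndomains; d++) {
        result &= lookup(&tables[d], fields[d]);
        if (!result)
            return -1;          /* no ACE can possibly match any more */
    }
    for (int i = 0; i < 64; i++)
        if (result & (1ULL << i))
            return i;
    return -1;
}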
The flowchart of the operation is as follows:
As a scenario analysis, consider the arrival of a packet whose matching-domain values are:
Source address 1: 192.168
Source address 2: 1.1
Destination address 1: 200.200
Destination address 2: 200.1
Layer-4 protocol: 0001 (ICMP)
Assuming only the example ACL above is configured, the operation for this packet proceeds as in the following flowchart:
The final result is 0001, that is, only the last rule matches.
This concludes the discussion of the Turbo ACL. Next we look at whether Linux Netfilter has a comparable optimization.
4.1.3. Netfilter's filter optimization policy
Netfilter has a project called nf-hipac.
Its code is extremely complex, its documentation is extremely scarce, and its features are more limited than iptables'. Besides, Linux rarely faces truly massive rule sets, so hipac never became widely used; still, it is worth analyzing from a theoretical point of view. Although the hipac code looks frightening, it is not hard to skim, and in the end we find that its implementation is essentially the same as the Turbo ACL: packets are first classified by their matching-domain values, and it uses almost the same (in fact more) matching domains. Unlike the Turbo ACL it does not use bitmaps, perhaps because Linux cannot afford to trade that much space for time...
Actually, hipac does not use bitmaps because it does not need them at all:
Cisco looks up the bitmap of every matching domain's value table concurrently and obtains the final result by ANDing them together, whereas hipac works serially rather than in parallel. Hipac also keeps a value table per matching domain, but because the matching domains are arranged in a fixed order, such as source address - destination address - protocol - source port - destination port, the value tables are chained together in that same order, as shown below:
An entry that matches in one dimension's table points on to the next dimension's table (destination address, then protocol, and so on), and the concrete rules are attached to the value table of the final matching domain. Hipac does not keep the original rule list and locate rules through bitmaps; instead it links each rule directly to the place where it belongs. A hipac flowchart looks like this (a conceptual sketch of the serial lookup follows):
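The sketch below is only a conceptual model of the chained, dimension-by-dimension lookup just described, assuming interval-style nodes per dimension; nf-hipac's real data structures and names differ.

#include <stdint.h>
#include <stddef.h>

struct rule;                       /* opaque: the rule plus its target */
struct dim_table;

/* one node classifies the packet on a single dimension and points either to
 * the next dimension's table or, at the last dimension, to attached rules */
struct dim_node {
    uint32_t lo, hi;               /* interval of this dimension's key space */
    struct dim_table *next_dim;    /* non-NULL for inner dimensions */
    struct rule *rules;            /* non-NULL only at the final dimension */
};

struct dim_table {
    int n;
    struct dim_node nodes[];
};

/* keys[] holds the packet's value for each dimension, in the fixed order
 * source addr -> dest addr -> protocol -> source port -> dest port */
struct rule *hipac_lookup(struct dim_table *t, const uint32_t *keys, int depth)
{
    while (t) {
        struct dim_node *hit = NULL;

        for (int i = 0; i < t->n; i++) {
            if (keys[depth] >= t->nodes[i].lo && keys[depth] <= t->nodes[i].hi) {
                hit = &t->nodes[i];
                break;
            }
        }
        if (!hit)
            return NULL;           /* falls outside every rule: use the policy */
        if (hit->rules)
            return hit->rules;     /* reached the last dimension */
        t = hit->next_dim;         /* descend serially into the next dimension */
        depth++;
    }
    return NULL;
}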
4.1.4. Comparison between Cisco and Netfilter
They use the same kind of lookup idea, but the concrete operation differs greatly: Cisco processes the dimensions fully in parallel, while Netfilter goes serially from start to finish. Pictured intuitively, the whole operation is like locating a point in a multi-dimensional space, and there are two ways to do it:
1). n dimensions are pushed forward at the same time, and finally the intersection areas of their paths are found;
2). First match in the first dimension and then match the second dimension...
Cisco uses the first method and Netfilter the second. I think there are two reasons why Netfilter does not go parallel: first, Netfilter lives inside Linux, and Linux is a general-purpose operating system in which the protocol stack is only one of many subsystems, so introducing a parallel mechanism just for the stack would inevitably unbalance the system; second, a Linux box rarely carries thousands of firewall rules, and the optimization above pays off mainly when the rule count is large. Incidentally, Netfilter's nf-hipac lookup reminds me of Linux page-table walks and of the trie lookup used by the routing table.
4.2. ip_conntrack Optimization
Netfilter's ip_conntrack module implements connection tracking, but I have always felt that its handling of IP fragments is a weak point: ip_conntrack reassembles fragments first. The reasoning is that IP itself is connectionless and a non-first fragment carries no layer-4 information, so to obtain the layer-4 information one has to wait for all fragments to arrive before processing. That is a valid argument, and a well-stated one. Personally, though, I think it can be optimized further by solving the problem at a different level: just as we identify which flow a packet belongs to by keeping only the five elements of the flow, we can keep a "source IP / destination IP / protocol / layer-3 ID" tuple for an IP datagram, which uniquely identifies that datagram (for the reason, see the appendix).
We then only need to match these four elements against an incoming fragment to determine which IP datagram it belongs to, and therefore which flow it belongs to. Of course, we only keep these four elements for fragmented datagrams. But IP does not guarantee ordering: what if a non-first fragment arrives before the first one? The answer is simple: it can only wait. Once the first fragment arrives and the four elements plus the layer-4 information are known, the queued fragments are processed. A sketch of this idea follows.
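Here is a minimal plain-C sketch of the proposal; the structure and function names are invented for illustration and this is not existing kernel code.

#include <stdint.h>

/* the four elements that uniquely identify one IP datagram */
struct frag_key {
    uint32_t saddr, daddr;
    uint8_t  protocol;
    uint16_t ip_id;                 /* layer-3 Identification field */
};

struct frag_track {
    struct frag_key key;
    void  *flow;                    /* flow/conntrack learned from fragment 0 */
    int    have_first;              /* has the first fragment arrived yet? */
};

#define FRAG_TABLE_SIZE 64
static struct frag_track table[FRAG_TABLE_SIZE];

static unsigned int hash_key(const struct frag_key *k)
{
    return (k->saddr ^ k->daddr ^ k->protocol ^ k->ip_id) % FRAG_TABLE_SIZE;
}

/*
 * Called for every fragment.  If the first fragment of this datagram has
 * already been seen, its layer-4 information -- and therefore its flow -- is
 * known, so the fragment can be attributed to the flow without reassembly.
 * Otherwise the caller must queue the fragment until fragment 0 arrives.
 */
void *lookup_flow_for_fragment(const struct frag_key *k)
{
    struct frag_track *t = &table[hash_key(k)];

    if (t->have_first &&
        t->key.saddr == k->saddr && t->key.daddr == k->daddr &&
        t->key.protocol == k->protocol && t->key.ip_id == k->ip_id)
        return t->flow;
    return NULL;                    /* first fragment not seen yet: wait */
}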
Here I can only offer a flowchart; this document is being written in the hospital, and my family estimates the baby is about to arrive... the code will have to come later. If any reader finds the idea interesting, please try it out and submit a patch to the kernel's netfilter group. Thank you:
In short, nf-hipac uses a serial (it could be parallelized) multi-dimensional tree lookup derived from packet-classification algorithms: it no longer walks the rules to match a packet, but uses the packet to look up the rules. As dimensions and constraints are added, the positioning becomes more precise. However long a single rule lookup takes, it is far less than the time spent matching the rules one by one, and in the end only a subset of the rules ever needs to be considered.
4.3. A stateful firewall based on the optimized ip_conntrack
Once ip_conntrack is optimized, it is no longer dragged down by IP fragmentation (in truth it never was hopeless). Building a stateful firewall on ip_conntrack is then not difficult: when the first packet of a flow passes through the filter table, ip_conntrack records the matched policy, concretely the target, and subsequent packets of the flow simply execute that target directly.
So what is the difference between such a firewall and hipac? This firewall matches the flow in PREROUTING and matches the rules in the filter table only for the first packet, while hipac matches rules in the filter table for every packet. With a huge number of connections, flow matching will certainly slow down; with a huge number of rules, however, hipac does not slow down, and that is its advantage, the same advantage the Cisco Turbo ACL has. A sketch of the cached-verdict idea follows.
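A tiny C sketch of the "cache the verdict on the flow" idea, under the assumption of a hypothetical conn_entry structure and an assumed helper that performs the full rule matching; none of these names exist in the kernel.

/* assumed helper: the full (linear or hipac-style) rule matching */
enum verdict { VERDICT_NONE = 0, VERDICT_ACCEPT, VERDICT_DROP };
enum verdict match_against_ruleset(const void *pkt);

struct conn_entry {
    /* ... the usual 5-tuple, state, timers ... */
    enum verdict cached;    /* filled in when the flow's first packet hits the filter */
};

enum verdict filter_packet(struct conn_entry *ct, const void *pkt)
{
    if (ct->cached != VERDICT_NONE)
        return ct->cached;                      /* fast path: skip rule matching */

    ct->cached = match_against_ruleset(pkt);    /* slow path: first packet only */
    return ct->cached;
}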
5. A Detail: how does a firewall handle IP fragments?
5.1. Where is the problem?
RFC 1858 describes two kinds of IP fragmentation attacks.
1) The tiny-fragment attack (splitting the TCP header)
This type of attack is easy to understand. Here is the first fragment of an IP datagram:
Then let's take a look at the second one:
The fragment-offset field of the second fragment points into the middle of the TCP header, so the attacker hopes the firewall cannot recognize the fragments' layer-4 information and they slip past its checks. The essence of the attack is to split a complete TCP header across two fragments! In fact, for a long time the Cisco ACL matched only the layer-3 information of a datagram fragment, and if the rule was permit the fragment was simply let through. RFC 1858 offers a remedy: enforce a minimum size for the TCP payload of the first fragment; that is the RFC's recommendation (though Cisco does not necessarily follow it).
2) The overlapping-fragment attack (it depends on the reassembly algorithm)
Compared with 1), this is an indirect attack. Look at the first IP fragment:
Then look at the second:
The first fragment is perfectly normal, while the second is not. If the destination host's fragment-reassembly algorithm is flawed, the data in the second fragment overwrites the TCP header carried in the first. Because the filter rules cannot obtain layer-4 information from a non-first fragment, such data can easily slip through the firewall and mount an attack. The standards never pinned down exactly how fragments must be reassembled, and that ambiguity is the root cause of this attack.
5.2. Cisco's handling
Cisco handles IP fragments in a uniform way: it hands the decision to the configuring engineer, who decides whether fragments are allowed through. The flowchart below shows how a single packet is matched against an ACE:
Cisco engineers must state explicitly which fragments are not allowed through; that said, newer versions of Cisco IOS do restrict the attack fragments described in RFC 1858 on their own.
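For illustration, the "fragments" keyword is how an engineer states that decision explicitly; the addresses below are made up, the keyword is the point.

! Non-initial fragments carry no layer-4 header; decide their fate explicitly.
access-list 101 deny   ip any any fragments
access-list 101 permit tcp any 192.168.2.0 0.0.0.255 eq www
access-list 101 deny   ip any any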
5.3. Netfilter's handling
Netfilter simply forbids the attack fragments described in RFC 1858. No flowchart is needed; a piece of code says it all:
static int
tcp_match(const struct sk_buff *skb,
          const struct net_device *in,
          const struct net_device *out,
          const void *matchinfo,
          int offset,
          int *hotdrop)
{
        struct tcphdr tcph;
        const struct ipt_tcp *tcpinfo = matchinfo;

        if (offset) {
                if (offset == 1) {
                        duprintf("Dropping evil TCP offset=1 frag.\n");
                        *hotdrop = 1;   /* this packet must not pass */
                }
                return 0;
        }
        ...
}
In addition, for non-first IP fragments Netfilter treats every match above the network layer as satisfied: for example, all TCP/UDP port matches are considered to match, and the target is then executed directly. This differs from Cisco's policy, as can be seen from the flowcharts.
5.4. Comparison
Both Cisco and Netfilter split match items into two categories: implicit matches, which cover only layer-3 information, and explicit matches, which cover higher-layer information.
In Netfilter the implicit match needs no registration, while explicit matches must be registered; how Cisco handles this internally is unknown, but the distinction there amounts to the same thing: either something extra is required or it is not. In both Netfilter's filter table and the Cisco ACL, the explicit matches are evaluated only on top of a successful implicit match. Compare Cisco's ACL-processing flowchart with the following Linux kernel Netfilter pseudocode:
do {
        /* first test the implicit match; "offset" tells whether this is an
         * IP fragment */
        if (ip_packet_match(ip, indev, outdev, &e->ip, offset)) {
                /* then test every explicit match: if any one fails, jump to
                 * not_match, otherwise execute the target.  Since nearly all
                 * registered matches simply return "match" for non-first
                 * fragments, a fragment effectively only has to pass the
                 * implicit match. */
        } else {
not_match:
                /* fall through to the next rule */
        }
} while (there are more rules);
6. Summary
Looking at the network designs of Linux and Cisco IOS from a fairly high level, IOS's advantage is that nearly all of its energy goes into networking. The kernel mechanism of IOS is actually much simpler than Linux's, but it relies on a good overall design that makes almost everything configurable: in IOS every policy is configuration, and even though there is a default configuration file, that too is just configuration.
Linux does things completely differently: its network policy is really a combination of Netfilter and hard-coded behaviour. In the Linux kernel's network code you can see a great many comments, most of them added by Alan Cox, and many of the added statements are there "to follow RFC xxxx...". Of course this hard-coded behaviour can still be configured, for example with the sysctl tool (as shown below), but not through one unified tool; you cannot, for instance, enable ip_forward with the ip command...
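A small example of that split, with standard tools only: forwarding is hard-coded kernel behaviour toggled through sysctl/procfs rather than through a unified configuration command.

# ip_forward is toggled through sysctl/procfs, not through the ip command:
sysctl -w net.ipv4.ip_forward=1
# equivalently:
echo 1 > /proc/sys/net/ipv4/ip_forward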
I know that Netfilter can implement almost all of Cisco IOS's features, and that it can be optimized in ways similar to IOS; that is precisely the strength of the Netfilter framework. But looking the same from the outside is not the point; what matters is understanding that the essence is quite different. Besides, Linux does not have to chase Cisco IOS: even if it became better than Cisco, I believe most people would still buy Cisco, because technology is only a small part of market competition. It is like the people who work on making Linux compatible with Windows: is that really necessary? Windows has a registry; must Linux have something like a registry too? Everything is fine as it is. If we drag a purely technical discussion up to the level of whole products, it starts to look stubborn and pedantic.
The Netfilter framework is well designed, and every detail is worth pondering. Use it, understand it, modify it, optimize it, improve it, and use it again...
This is a good learning process. You can also try it.
Appendix 0. Where does Netfilter belong?
0.1. Netfilter is a framework. It is independent of the Linux kernel and has its own website: http://www.netfilter.org/
0.2. Netfilter is almost infinitely extensible. Linux uses only a small part of it; most of its content stands by in the form of pluggable modules.
0.3. Netfilter is integrated into the Linux kernel, but its policy extensions live in a space of their own. The "mechanism" we keep talking about is nothing more than the five hook points. On netfilter.org you can see the large number of policies it integrates, the most familiar being iptables; the hipac mentioned above is another Netfilter extension. That alone shows how powerful Netfilter is: the kernel provides only the hook points, and if you think something is not good enough, you can implement a better one yourself. In fact, not that much of Netfilter is integrated into the Linux kernel.
1. Figure 1: Kernel path of data packets
To provide a panoramic view of Netfilter in the Linux kernel, a diagram is provided, which details each part of Netfilter.
2. Reasons for using four elements in ip_conntrack optimization:
The IP layer itself provides four elements that unambiguously identify an IP datagram. In fact every layer of TCP/IP carries some tracking information for that layer's PDU in its protocol header. Because the IP layer is packet-based, its tracking information fully identifies one IP datagram, and all fragments of that datagram share the same tracking information. Understanding this is simple and direct: it is just like the way TCP/UDP port numbers, together with the lower layer's tracking information, mark a flow, where a flow represents a session made up of many packets. RFC 791 (the core IP protocol RFC) spells this out clearly, and TCP/IP Illustrated says the same:
"The ID field uniquely identifies each datagram sent by the host. Generally, the value of each sent packet is increased by 1 ." "RFC 791 [Postel 1981a] considers that the identification field should be selected by the upper layer of the IP address to send the datagram. Suppose there are two consecutive IP datagram, one of which is generated by TCP and the other is generated by UDP, then they may have the same ID field. Although this can also work as usual (processed by the restructuring algorithm), in most systems derived from Berkeley, each time an IP datagram is sent, the IP layer must add the value of a kernel variable to 1, regardless of the layer from which the data is sent to the IP. The initial value of the kernel variable is set based on the system boot time ." TCP/IP details (Volume 1)
3. conntrack-tools
First, let me state that this is not a Linux bug! Sometimes your iptables rules have been flushed, yet packet address translation is still happening. This is caused by the ip_conntrack cache. It is not really a problem, though: what a tool causes, a tool can cure. That tool is conntrack-tools, which can delete any ip_conntrack cache entry at any time. 1. download; 2. install; 3. man conntrack.
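A few typical invocations of the tool (the source address is of course illustrative):

conntrack -L               # list the current connection-tracking table
conntrack -D -s 1.2.3.4    # delete the cached entries for one source address
conntrack -F               # flush the whole table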
4. Output position design for Hook points of Netfilter
The OUTPUT hook point in Netfilter is special. By common sense, OUTPUT ought to sit before routing, which would also follow the "filter as early as possible" principle; yet we find that Netfilter's OUTPUT chain sits after routing. What difference does that make?
4.1. Placing the OUTPUT chain before routing emphasizes "a packet that will be filtered out never needs to be routed."
4.2. Placing the filter after routing trades that for "a packet with no route can be dropped by routing itself, before it is ever filtered."
4.3. The skb's output function is a callback that is set according to the routing result and the routing policy, so it is best to place the OUTPUT chain after the route lookup.
In this way, you can simply write the IP sending function as follows:
int __ip_local_out(struct sk_buff *skb)
{
        struct iphdr *iph = ip_hdr(skb);

        iph->tot_len = htons(skb->len);
        ip_send_check(iph);
        return nf_hook(PF_INET, NF_INET_LOCAL_OUT, skb, NULL, skb->dst->dev,
                       dst_output);
}

static inline int dst_output(struct sk_buff *skb)
{
        return skb->dst->output(skb);
}
Note that the dst above is initialized from the result of the route lookup. Hanging DNAT on the OUTPUT chain is no problem: after DNAT changes the destination address, the packet is re-routed and the dst field is re-initialized, so the new output function is picked up as well. Netfilter places OUTPUT after routing, and implements DNAT on OUTPUT with a re-route, precisely so that skb's dst field is already determined; if dst were still undetermined, the nf_hook call above would be hard to write.
To sum up, the send routine at the Linux IP layer is determined by the result of the route lookup, so the send function is known only after routing, and only then can the hook be invoked with that function as its continuation.
5. Cisco IOS/H3C VRP/GNU Linux
I first touched H3C devices in 2004 and used Cisco afterwards. About two years later, when I first saw a Linux shell prompt, I thought it was Cisco... The operating interfaces of IOS and VRP are similar; both run on core network devices and focus on core routing and firewalling. Their configuration may be hard, but it has to be flexible enough to satisfy every need. VRP borrows heavily from Cisco even though its kernel is BSD-derived, whereas Linux is a general-purpose operating system for which core networking is not its main application.
6. Classic anti-IP-spoofing configuration based on RFC 2827
(1) Any data packet that enters the network cannot use the address inside the network as the source address.
(2) Any data packet entering the network must use the internal address of the network as the destination address.
(3) Any data packet that leaves the network must use the internal address of the network as the source address.
(4) Any data packet that leaves the network cannot take the address inside the network as the destination address.
(5) Any data packet that enters or leaves the network cannot use as its source or destination address a private or reserved address listed in RFC 1918 (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16) or the loopback range 127.0.0.0/8.
(6) Block any source-routed packet and any packet with IP options set.
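A minimal iptables sketch of rules (1)-(5) above, assuming the protected network is 192.168.1.0/24 and eth0 is the external interface; the addresses and interface names are illustrative, and rule (6) would need additional matches not shown here.

IN_NET=192.168.1.0/24

# (1) nothing entering from outside may claim an internal source address
iptables -A FORWARD -i eth0 -s $IN_NET -j DROP
# (2) anything entering from outside must be addressed to the internal network
iptables -A FORWARD -i eth0 ! -d $IN_NET -j DROP
# (3) anything leaving must carry an internal source address
iptables -A FORWARD -o eth0 ! -s $IN_NET -j DROP
# (4) anything leaving must not be addressed to the internal network
iptables -A FORWARD -o eth0 -d $IN_NET -j DROP
# (5) example: block one RFC 1918 range and loopback as external source addresses
iptables -A FORWARD -i eth0 -s 10.0.0.0/8 -j DROP
iptables -A FORWARD -i eth0 -s 127.0.0.0/8 -j DROP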