Chapter 19: APO Connections and the Network v-node

Source: Internet
Author: User

A lot of preparation is needed before the underlying network implementation can be written. Besides revising the previous chapters, three more chapters may be needed: this one, the implementation of the file number management class, and the implementation of the local memory management class. I hope the final network code will surprise me: if the IP/TCP/UDP/ICMP implementation, including all the network servers (HTTP, DNS, FTP, and so on), comes to about 300 lines of code, or even a little over 300, I would be very happy. In what follows, the first layer of network programming is the socket layer, the second is referred to as the TCP layer, and the third as the IP layer.

This chapter will be slow to write, because it means studying TCP very carefully: a complex, huge, solemn, rigorous, funny, hair-splitting, awe-inspiring protocol.

I. Connections in APO

What is a connection? This simple question took me a lot of time. Simple really means complex: a great deal of content condensed into one word. Socket, plug socket, telephone number, port: we use analogies like these to describe how a communication connection works, but do they fully describe a connection? A client socket port does correspond to exactly one client connection. A server socket port, however, can carry many connections at once, so the port alone obviously cannot identify one. Using both IP addresses together with both socket ports to define a connection obviously works, but that takes 32 bytes, which is too much to locate a connection quickly. So I checked the material online and, good heavens, a connection really is identified by a five-tuple or even a seven-tuple! The five-tuple is: source IP address, destination IP address, protocol number, source port, destination port. The seven-tuple is: source IP address, destination IP address, protocol number, source port, destination port, type of service, and interface index.
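
To make the point about server ports concrete, here is a minimal sketch of a five-tuple used as a connection key. The class name `FiveTuple`, the sample addresses, and the `connections` dictionary are all hypothetical illustrations, not part of APO.

```python
from typing import NamedTuple

class FiveTuple(NamedTuple):
    """The five elements listed above, in the same order."""
    src_ip: str
    dst_ip: str
    protocol: int
    src_port: int
    dst_port: int

TCP = 6  # IANA protocol number for TCP

# Two different clients talking to the same server port (80).
connections = {
    FiveTuple("10.0.0.5", "192.168.1.1", TCP, 40000, 80): "connection A",
    FiveTuple("10.0.0.6", "192.168.1.1", TCP, 40000, 80): "connection B",
}

# The server port alone matches both entries, so it cannot identify one.
on_port_80 = [key for key in connections if key.dst_port == 80]
print(len(on_port_80))
```

The full tuple is unambiguous, but every lookup has to compare or hash all five fields, which is the cost the text complains about.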

1. Representing a connection

On the client side, a connection can be represented by a 16-bit client socket port. But how do we know whether a connection belongs to a client or a server? We need a port table. It does not have to be large, because one service port corresponds to many connections; in APO the table is only 8K. What goes into a table entry? That presumably depends on the operating system. I am not sure; other systems may not use a port table at all. In any case, when a packet arrives we must decide which connection it belongs to, and the natural idea is to tie the connection to the file number sockfd. An APO port table entry is one word. For a client port, the high 8 bits are status flags and the low 24 bits are the file number sockfd. For a service port, the high 8 bits are status flags and the low 24 bits are 0. An unused port entry is all zeros. In practice we also have to maintain a port bitmap (for allocating and releasing ports) and a port lookup table (whose entries hold port values).

To process an incoming packet, extract the destination socket port from the IP packet and find that port value at item i of the port lookup table (this usually takes about 30 ns). Then read item i of the port table to get the file number; if it is 0, extract the 24-bit server file number field from the IP packet instead.

If the host's MAC port runs at 1 Gbps, the minimum gap between packets is 160 ns, and a 512-bit APO packet takes another 512 ns, so the shortest time available to process one IP packet at the IP layer is only 672 ns. In that time the IP layer must handle memory management, reassembly of IP packets into IP datagrams, erroneous packets, correct delivery to the TCP layer (the most complex part: connections, file number management, port management, and so on), ICMP packets, and more. That still gives APO a processing rate of more than 1 million packets per second, so supporting thousands of connections should not be much of a problem.
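
The per-packet time budget can be checked with a few lines of arithmetic. This is only a worked calculation using the figures quoted in the text (1 Gbps, a 160 ns gap, a 512-bit packet); the constant and function names are my own.

```python
LINK_RATE_BPS = 1_000_000_000   # 1 Gbps MAC port, as in the text
GAP_NS = 160                    # minimum gap between packets, as in the text
PACKET_BITS = 512               # APO packet size quoted in the text

def packet_time_ns(bits: int, rate_bps: int) -> float:
    """Time on the wire for `bits` at `rate_bps`, in nanoseconds."""
    return bits * 1e9 / rate_bps

budget_ns = GAP_NS + packet_time_ns(PACKET_BITS, LINK_RATE_BPS)
packets_per_second = 1e9 / budget_ns

print(budget_ns)                  # 672.0
print(round(packets_per_second))  # about 1.49 million
```

So "more than 1 million packets per second" corresponds to roughly 1.49 million back-to-back 512-bit packets at line rate.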

With a five-tuple, isn't the destination always the local host? Wouldn't a three-tuple do? Well, unlike APO, where a host has exactly one IP address, their hosts can have several IP addresses. Fine, but a five-tuple is too many bytes and comparing it directly is too expensive (thousands of connections would exhaust the resources of an Intel-class host). One idea is to take 8 bits from each element and form a 32-bit hash (a simple hashing scheme that uses only four elements, which is long-winded enough already). Such a hash is not necessarily unique, so collision handling has to be added. It makes my head spin! Thousands of connections may be fine, but faced with 100,000 connections the host can take a rest. They presumably tie the connection to the file number sockfd as well; with operating systems that complicated I cannot go into the details, so I can only guess.
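
A minimal sketch of the 8-bits-per-element idea follows. The text does not say which four elements or which 8 bits are used, so this assumes IPv4 addresses, the low byte of each address and port, and a hypothetical function name `tuple_hash32`. The second call is chosen to show why collision handling is needed.

```python
import ipaddress

def tuple_hash32(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> int:
    """Fold the low 8 bits of four connection elements into one 32-bit key."""
    s = int(ipaddress.IPv4Address(src_ip)) & 0xFF
    d = int(ipaddress.IPv4Address(dst_ip)) & 0xFF
    return (s << 24) | (d << 16) | ((src_port & 0xFF) << 8) | (dst_port & 0xFF)

# Two different clients: the addresses and ports differ only in their high bits.
a = tuple_hash32("10.0.0.5", "192.168.1.1", 40000, 80)
b = tuple_hash32("172.16.9.5", "192.168.1.1", 40256, 80)

print(hex(a), hex(b), a == b)   # 0x5014050 0x5014050 True
```

Both connections hash to the same key, so a table indexed by this hash still has to compare the full tuple on every hit.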

Since we treat a socket as a file, why not simply use the file number sockfd to represent a connection? That is exactly what APO does. On the client side this needs no explanation. On the server side, different file numbers represent connections from different clients to the same server, and the connection itself is described by the in-memory v-node that corresponds to the file number. The client, however, does not know which file number its connection has on the server, so the IP packet carries a 24-bit server file number field. The server sets this field, and the client carries it along unchanged in every datagram it sends. When a client datagram reaches the bottom of the server's network stack, the stack can extract the 24-bit server file number field and quickly locate the corresponding in-memory v-node, which holds the description of the connection. The bottom layer normally extracts the packet's destination socket port and looks it up in the socket port table: for a client port it gets the file number immediately; if the file number is 0 (a service port), it extracts the 24-bit server file number field from the IP header. Of course, after reaching the v-node a further confirmation is needed, but that is a single comparison taking less than 10 ns.
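
The lookup described above can be sketched as follows. This is an illustration under assumptions, not the APO implementation: a Python dictionary stands in for the port lookup table plus port table, and `FLAG_IN_USE`, `Packet`, `make_entry`, and `locate_sockfd` are hypothetical names. Only the word layout (high 8 bits flags, low 24 bits sockfd) comes from the text.

```python
from dataclasses import dataclass
from typing import Dict, Optional

FD_MASK = 0x00FFFFFF     # low 24 bits: file number sockfd
FLAG_SHIFT = 24          # high 8 bits: status flags
FLAG_IN_USE = 0x01       # hypothetical flag value, not taken from the text

def make_entry(flags: int, sockfd: int) -> int:
    """Pack status flags and a 24-bit file number into one table word."""
    return ((flags & 0xFF) << FLAG_SHIFT) | (sockfd & FD_MASK)

@dataclass
class Packet:
    dst_port: int
    server_fd_field: int   # 24-bit server file number field carried in the packet

def locate_sockfd(port_table: Dict[int, int], pkt: Packet) -> Optional[int]:
    """Return the file number of the connection this packet belongs to."""
    entry = port_table.get(pkt.dst_port, 0)
    if entry == 0:
        return None                        # unused port: entry is all zeros
    sockfd = entry & FD_MASK
    if sockfd != 0:
        return sockfd                      # client port: entry holds the sockfd
    return pkt.server_fd_field & FD_MASK   # service port: use the packet's field

port_table = {
    40000: make_entry(FLAG_IN_USE, 7),     # client port bound to sockfd 7
    80: make_entry(FLAG_IN_USE, 0),        # service port: low 24 bits are 0
}
print(locate_sockfd(port_table, Packet(40000, 0)))    # 7
print(locate_sockfd(port_table, Packet(80, 1234)))    # 1234
```

The final confirmation step mentioned in the text (comparing the packet against the v-node contents) is left out here.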

2. Establishing the connection

Signaling must be reliable. Why does TCP use a three-way handshake to connect? To make signaling reliable, every signaling packet needs an ACK confirming that the receiver got it. If no ACK arrives, the packet is retransmitted after a timeout; APO retransmits at most 2 times. Under Linux the default number of retries is 5, with the retry interval starting at 1 s and doubling each time: the five retries go out after 1 s, 2 s, 4 s, 8 s, and 16 s, 31 s in total, and only 32 s after the fifth retry is it known to have timed out as well. So it takes 63 s in total before TCP drops the connection. We know that a 512-bit signaling packet takes less than 1 µs to send; even if 100 intermediate routing switches on the path add some store-and-forward delay, going from A to B will not take 1 ms, or 2 ms for the round trip. Why, then, does the retry interval need to be more than 1 s? Isn't that a waste of time? An opening for hackers? A completely superfluous step? You may say: if the server has received a SYN and sent back a SYN-ACK, and the client then suddenly drops off the line (or, more likely, deliberately withholds its ACK), the server gets no ACK from the client and should give it one minute and five retransmissions. But remember that the host processes 1 million packets per second, which is 60 million packets in one minute; if those are all connection-request signaling packets, the host's resources are exhausted. This is how malicious people invented the SYN flood attack: after sending a SYN to the server they go offline, the server waits 63 s before dropping the connection, and the attacker can fill the server's SYN queue so that normal connection requests cannot be processed. So, to correct a small error, we pay a far greater cost in complexity and time.
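
The 63 s figure is just the sum of a doubling series, which a short calculation confirms. The function name `syn_ack_timeline` is my own; the retry count and first interval are the Linux defaults quoted in the text.

```python
def syn_ack_timeline(retries: int = 5, first_interval_s: int = 1):
    """Return (retry intervals, time of last retry, time the connection is dropped)."""
    intervals = [first_interval_s * 2 ** i for i in range(retries)]
    last_retry_at = sum(intervals)
    final_wait = first_interval_s * 2 ** retries   # wait after the last retry
    return intervals, last_retry_at, last_retry_at + final_wait

intervals, last_retry_at, dropped_at = syn_ack_timeline()
print(intervals, last_retry_at, dropped_at)   # [1, 2, 4, 8, 16] 31 63
```

For comparison, the whole APO schedule described in this chapter finishes within 1 s.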

To deal with SYN floods, Linux provides a parameter called tcp_syncookies: when the SYN queue is full, TCP builds a special sequence number (the cookie) from the source address and port, the destination address and port, and a timestamp, and sends it back. An attacker will not respond; a normal client will send the SYN cookie back, and the server can then establish the connection from the cookie alone, even though the request is not in the SYN queue. The price is that a host able to handle 1 million packets per second may drop to 10,000 packets per second. Linux has many other complicated ways of dealing with this, which I will not go into. I cannot figure out why the products of complicated minds are always so full of twists and turns; a protocol should not be designed around garbage hosts.
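
The idea behind a SYN cookie can be illustrated with a deliberately simplified sketch. This is not the Linux algorithm (which also encodes the MSS and uses its own bit layout); it only shows how a server can validate a returning client without storing any state. `SECRET`, `SLOT_SECONDS`, and the function names are assumptions made for the example.

```python
import hashlib
import hmac
import struct

SECRET = b"example-secret"   # hypothetical server-side secret
SLOT_SECONDS = 64            # hypothetical length of one timestamp slot

def _cookie_for_slot(src_ip, src_port, dst_ip, dst_port, slot: int) -> int:
    """Keyed hash of the address/port pair and time slot, cut to 32 bits."""
    msg = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}@{slot}".encode()
    digest = hmac.new(SECRET, msg, hashlib.sha256).digest()
    return struct.unpack("!I", digest[:4])[0]

def make_cookie(src_ip, src_port, dst_ip, dst_port, now: float) -> int:
    """Value the server would send back as its initial sequence number."""
    return _cookie_for_slot(src_ip, src_port, dst_ip, dst_port,
                            int(now) // SLOT_SECONDS)

def check_cookie(cookie, src_ip, src_port, dst_ip, dst_port, now: float) -> bool:
    """Accept a cookie issued in the current or the previous time slot."""
    slot = int(now) // SLOT_SECONDS
    return cookie in (
        _cookie_for_slot(src_ip, src_port, dst_ip, dst_port, slot),
        _cookie_for_slot(src_ip, src_port, dst_ip, dst_port, slot - 1),
    )

cookie = make_cookie("10.0.0.5", 40000, "192.168.1.1", 80, now=1000.0)
print(check_cookie(cookie, "10.0.0.5", 40000, "192.168.1.1", 80, now=1030.0))
```

Every returning ACK costs the server a hash computation, which is one reason the text expects throughput to fall when cookies are in use.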

MSS stands for Maximum Segment Size: the largest piece of a datagram that one TCP packet can carry. A datagram is split into multiple packets smaller than the MTU (this is also called datagram fragmentation). To get the best transmission efficiency, TCP usually negotiates the MSS of both sides when the connection is established. In practice, implementations usually derive it from the frame size (subtracting the 18-byte MAC header, the 20-byte IP header, and the 20-byte TCP segment header), so the MSS is often 1518 − 58 = 1460 bytes. The MTU is usually about 1500 bytes, while a TCP signaling packet is only 64 bytes: what a waste. Three handshakes, plus probing the MTU, negotiating the MSS, and measuring the round-trip time: how many exchanges does it take to create one connection? How many resources are wasted? I used to regard TCP as rigorous and great, like a big stone mountain full of mystery; why does it now look like a garbage heap? Riddled with holes, patched and battered, so funny! It is not even at primary-school level, yet it is complicated beyond compare; it took even me several days of reading to roughly understand it. What a contradictory world.
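
The MSS arithmetic, and what it means for splitting a datagram, can be written out as a small calculation. The header sizes are the ones given in the text; `segments_needed` is a hypothetical helper name.

```python
FRAME_BYTES = 1518    # maximum Ethernet frame, as used in the text
MAC_OVERHEAD = 18     # MAC header bytes, as in the text
IP_HEADER = 20        # IP header without options
TCP_HEADER = 20       # TCP header without options

mss = FRAME_BYTES - MAC_OVERHEAD - IP_HEADER - TCP_HEADER

def segments_needed(datagram_bytes: int, mss_bytes: int = mss) -> int:
    """Number of TCP segments needed to carry a datagram (ceiling division)."""
    return -(-datagram_bytes // mss_bytes)

print(mss)                     # 1460
print(segments_needed(4000))   # 3
```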

In APO, TCP is a duplex connection and UDP is a simple half-duplex connection. An APO connection is not set up with the usual TCP three-way handshake but with ICMP probe packets, using only two handshakes: request and acknowledgment. When the client's connection-request ICMP probe packet arrives at the server, it already contains the MTU, timestamp, IP address stamp, and so on of the routing switches along the route (16 bytes in all). The server therefore knows the client's geographic location, path information, IP address, and one-way transit time, and can filter on them: is the source IP address, or the link address of a routing switch, on a blacklist? If the request is safe, the server creates a file number for the connection, saves the connection description in the in-memory v-node, and sends the ICMP probe packet back, stamped with the server file number field, the segment size MSS, the allowed datagram size, the one-way transit time, the ACK flag, and other marks. On an APO network the source address cannot be spoofed unless you hijack the subnet address of a routing switch. Even then, the server stipulates that a subnet may send it no more than 1000 connection requests per second and a single host no more than 10 per second. A hacker hoping for a SYN flood attack will find it impossible, bearing in mind that the server allows thousands of connections. APO has abolished broadcast, and going through a multicast server is equivalent to sending packets from a single machine, so that is useless too. Hijacking the subnet address of a routing switch and sending 1000 connection requests per second for n seconds in a row is equally useless: APO's retransmission window is 1 s, so only the last batch of 1000 connections remains, and any connection that sends no packet is canceled.
Hackers had better stay off the blacklist, otherwise however many packets they send will simply be discarded. So they resort to a trick, squatting on a connection without using it: a host connects normally and sends a junk GET request every 10 minutes to keep the connection alive. That does not work either: the server grants only 50 connections per host, so the hackers would need to hijack more than 300,000 hosts. If the hosts run the APO operating system, getting root on that many of them is a pipe dream. An APO signaling packet is sent at most 3 times, with retransmissions at 200 ms and 600 ms and a deadline of 1 s; if no ACK (client side) or data packet (server side) has arrived by then, the connection is released. An APO ICMP probe packet is at most 45E long and can be stamped by at most 90 routing switches; if the round trip passes through more than 90, some of the stamps are dropped. With converged routing switches this does not happen, since a round trip passes through no more than 50 routing switches.
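
The admission limits quoted above (1000 requests per second per subnet, 10 per second per host, 50 connections per host) can be sketched as a simple counter. This is an assumption-laden illustration: it uses fixed one-second windows, string identifiers for subnets and hosts, and a hypothetical class name `ConnectionAdmission`; the text does not say how APO actually counts.

```python
from collections import defaultdict

SUBNET_LIMIT_PER_S = 1000   # connection requests per subnet per second
HOST_LIMIT_PER_S = 10       # connection requests per host per second
MAX_CONNS_PER_HOST = 50     # established connections per host

class ConnectionAdmission:
    def __init__(self):
        self._window = None                  # the one-second window being counted
        self._per_subnet = defaultdict(int)
        self._per_host = defaultdict(int)
        self.open_conns = defaultdict(int)   # maintained by the caller

    def allow(self, subnet: str, host: str, now: float) -> bool:
        """Decide whether one connection request may be processed."""
        second = int(now)
        if second != self._window:           # new second: reset the counters
            self._window = second
            self._per_subnet.clear()
            self._per_host.clear()
        if self.open_conns[host] >= MAX_CONNS_PER_HOST:
            return False
        if self._per_host[host] >= HOST_LIMIT_PER_S:
            return False
        if self._per_subnet[subnet] >= SUBNET_LIMIT_PER_S:
            return False
        self._per_host[host] += 1
        self._per_subnet[subnet] += 1
        return True

adm = ConnectionAdmission()
results = [adm.allow("net-a", "host-1", 5.0 + i * 0.01) for i in range(12)]
print(results.count(True))   # 10
```

A request that is refused costs the server only a few counter lookups, which is the point of filtering before any connection state is created.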

The client sends the connection-request (SYN) ICMP probe packet and enters the SYN_SENT state. The server receives the connection-request probe packet, sends back the probe packet marked ACK+SYN, and enters the SYN_RECEIVED state; if it receives an IP packet from that client within 1 s, it enters the ESTABLISHED state. Otherwise, if no packet has been received after 2 retransmissions within 1 s, the connection is released. When the client receives the acknowledging ICMP probe packet, it confirms the connection, enters the ESTABLISHED state, calculates the round-trip time, determines the MSS, initializes the header used for sending datagrams, and sends its first IP packet. If the client receives an ICMP error message such as destination unreachable, it releases the connection and returns an error to the application layer. On a timeout, the connection-request probe packet is retransmitted at most 2 times; if the server's acknowledging probe packet has still not arrived within 1 s, the connection is released and an error is returned to the application layer.
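
The two-handshake flow can be sketched as a small state machine. The state names follow the text; the retransmission times of 200 ms and 600 ms and the 1 s deadline are the figures given earlier in this chapter. The `Endpoint` class, its method names, and the tick-driven timing are assumptions made for illustration, and no packets are actually sent.

```python
from enum import Enum, auto

class State(Enum):
    CLOSED = auto()
    SYN_SENT = auto()        # client waiting for the acknowledging probe packet
    SYN_RECEIVED = auto()    # server waiting for the client's first IP packet
    ESTABLISHED = auto()
    RELEASED = auto()

RETRANSMIT_AT_MS = (200, 600)   # at most 2 retransmissions
DEADLINE_MS = 1000              # give up after 1 s

class Endpoint:
    def __init__(self, waiting_state: State):
        self.waiting_state = waiting_state
        self.state = State.CLOSED
        self.sends = 0
        self.started_ms = 0

    def start(self, now_ms: int) -> None:
        """Send the probe packet for the first time and start waiting."""
        self.state = self.waiting_state
        self.started_ms = now_ms
        self.sends = 1

    def tick(self, now_ms: int) -> None:
        """Advance timers: retransmit when due, release at the deadline."""
        if self.state is not self.waiting_state:
            return
        elapsed = now_ms - self.started_ms
        if elapsed >= DEADLINE_MS:
            self.state = State.RELEASED
            return
        while (self.sends - 1 < len(RETRANSMIT_AT_MS)
               and elapsed >= RETRANSMIT_AT_MS[self.sends - 1]):
            self.sends += 1          # one retransmission of the probe packet

    def reply_received(self) -> None:
        if self.state is self.waiting_state:
            self.state = State.ESTABLISHED

    def icmp_error(self) -> None:
        if self.state is self.waiting_state:
            self.state = State.RELEASED

client = Endpoint(State.SYN_SENT)
client.start(0)
client.tick(250)          # first retransmission has gone out
client.reply_received()   # acknowledging probe packet arrives
print(client.state.name, client.sends)   # ESTABLISHED 2
```

A server endpoint is created the same way with `Endpoint(State.SYN_RECEIVED)`, where `reply_received` stands for the arrival of the client's first IP packet.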

3. TCP communication after the connection is established
