Thoughts on transport layer protocols, prompted by HTTP 2.0

Source: Internet
Author: User
Tags: http2

0. I do not know the history of HTTP...
1. Lately the mailing lists I subscribe to have had HTTP 2.0 all over the front page. I do not follow HTTP closely, but I browse it when I am bored, and in my view HTTP 2.0 brings two major improvements:
First, a frame layer is added.
The advantage of the frame layer is that stream information is carried on each frame, so the server no longer has to process requests in the order the client submitted them. In addition, you no longer have to carry HTTP over TCP; in fact, that is what the standard itself says up front.
Second, HTTP header content can be exchanged incrementally.
Much of the information in HTTP headers is parameter negotiation that used to be carried, as key/value pairs, on every single request. HTTP 2.0 changes this: only new or changed content is carried. Since HTTP is based on a request/response session anyway, why not store the negotiated parameters in the session, as SSL/TLS does? Even a connectionless protocol family like IPsec transmits only an SPI index each time.
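The incremental header exchange described above can be sketched in a few lines. This is an illustration of the idea only, not the real HPACK wire format; the function names and table layout are my own:

```python
# Sketch of incremental header exchange in the spirit of HTTP/2's HPACK.
# Both ends keep a synchronized table; only new or changed pairs travel.

def encode_headers(headers, table):
    """Emit only headers that are new or changed since the last request."""
    delta = {k: v for k, v in headers.items() if table.get(k) != v}
    table.update(delta)          # sender updates its copy of the shared table
    return delta

def decode_headers(delta, table):
    """Merge the delta into the receiver's table to recover full headers."""
    table.update(delta)
    return dict(table)

# First request carries everything; the second carries only what changed.
sender, receiver = {}, {}
d1 = encode_headers({":method": "GET", ":path": "/", "user-agent": "demo"}, sender)
full1 = decode_headers(d1, receiver)
d2 = encode_headers({":method": "GET", ":path": "/style.css", "user-agent": "demo"}, sender)
full2 = decode_headers(d2, receiver)
print(len(d1), len(d2))  # 3 1 - the second request sends one pair, not three
```

The second request transmits only the changed `:path`, yet the receiver still reconstructs the full header set from its table.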
2. There is no need to design a new protocol at the transport layer. TCP and UDP are enough; if you want to extend anything, extend it on top of UDP.
3. Several misunderstandings about TCP and UDP. Before transforming any protocol, let us first clear up some misunderstandings. That is, you need to know why you would use UDP instead of TCP, or vice versa, rather than simply assuming TCP is better than UDP:
Misunderstanding 1: UDP is more efficient. Many people believe UDP is more efficient because it requires no ACKs; most likely this was learned from a textbook, or from a teacher. But look at what you are about to do to make UDP reliable and in-order: unless you have invented an ordered-arrival mechanism that needs no ACKs at all, your own ACK scheme may well be less efficient than TCP's.
Misunderstanding 2: TCP is more robust and attack-resistant. This is a thorough misunderstanding. What in the TCP header guarantees security? The checksum? In most cases the IP layer blocks the attack first; strip away the firewall and the IP-level checks, and imagine what an unprotected TCP would face! The sequence number? It is a 32-bit number: to make the peer accept a forged TCP segment, you only need to construct one whose sequence number falls within a plausible window and whose five-tuple matches. In fact, the TCP specification has never included a security mechanism, although you can indeed carry PKI-related material in TCP options.
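A back-of-the-envelope calculation shows how thin this "protection" is. The window size below is an assumed typical value, not a measurement:

```python
# How hard is blind TCP segment forgery once the attacker knows the
# five-tuple? A forged segment is accepted if its sequence number lands
# inside the receive window. Figures are illustrative assumptions.

SEQ_SPACE = 2 ** 32          # TCP sequence numbers are 32-bit
WINDOW = 65535               # a typical unscaled receive window

p_hit = WINDOW / SEQ_SPACE                 # success odds of one blind guess
guesses_needed = -(-SEQ_SPACE // WINDOW)   # ceiling: segments to sweep the space

print(f"per-guess success probability: {p_hit:.8f}")
print(f"segments needed to sweep the whole space: {guesses_needed}")
```

Roughly 65 thousand blind segments cover the entire sequence space; on a fast link that is a fraction of a second of traffic, which is why the sequence number is not a security mechanism.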
Misunderstanding 3: Neither TCP nor UDP is secure. This is actually a pseudo-proposition: security was never their problem in the first place. TCP and UDP are just transport layer protocols that provide end-to-end communication and application multiplexing on top of IP. Security is not their responsibility, nor is it the IP layer's. In the strict layered model, identity authentication is the responsibility of the session layer, and data integrity is the responsibility of the presentation layer.
Misunderstanding 4: TCP's inefficiency lies in its handshake, slow start, and other negotiation and feedback mechanisms. Feedback-based slow start and fast retransmit/fast recovery are both reasonable; their purpose is to probe the transmission capacity of the end-to-end path. TCP has no explicit NAK mechanism: packet loss is treated as congestion, and repeatedly receiving the same ACK is treated as loss. This feedback loop is already the most efficient approach available to TCP. A more efficient scheme would send source-quench signals directly from the point of loss or congestion, but that would undermine the simplicity principle. Some people invented connection reuse to reduce the handshake overhead of short TCP connections, namely the TCP Fast Open mechanism, which lets data ride in the SYN packet. I do not think this is a good approach: why do the same two end hosts need to open transient connections so frequently? Why not open one persistent connection for that period and attach the transient application requests to it? Amortizing the handshake overhead over a persistent connection is the correct solution. The handshake exists to negotiate parameters; that cost cannot simply be waved away. A free lunch is usually leftovers, and isn't a reusable connection exactly the leftovers of an earlier one?
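The feedback mechanisms mentioned above can be made concrete with a toy model of slow start. This is a deliberately simplified sketch (assumed initial window and threshold, segments rather than bytes), not a faithful model of any real TCP stack:

```python
# Toy slow start: the congestion window doubles each RTT until it reaches
# ssthresh, then grows linearly (congestion avoidance). All figures are
# illustrative assumptions, counted in segments.

def slow_start(cwnd=1, ssthresh=16, rtts=8):
    history = []
    for _ in range(rtts):
        history.append(cwnd)
        if cwnd < ssthresh:
            cwnd *= 2        # exponential probe of path capacity
        else:
            cwnd += 1        # linear growth once the threshold is reached
    return history

print(slow_start())  # [1, 2, 4, 8, 16, 17, 18, 19]
```

The exponential phase is exactly the "probing the transmission capacity of the path" described above; it is feedback (ACK clocking), not a fixed schedule, that drives each doubling.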
HTTP 1.1 supports persistent connections, but not well enough: each response must be returned in the order the requests were sent. If two requests are barely related, the processing delay of the first needlessly drags down the second. How do we explain this? I personally think this is an incorrect way of using the transport layer protocol, not a flaw in protocol design. The transport layer provides a multiplexing mechanism to the layer above it as a whole, and "the layer above" does not have to mean the application layer. Who told you to stuff two GET requests into one TCP connection? Fine, use two TCP connections. Use many TCP connections! But a problem still remains. Where? Let's look at the application model.
4. In the prehistoric era there were no such applications. There were only two application models: file download and terminal login. Look at the famous prehistoric protocols, FTP, Telnet... At that time TCP was fused with IP, not yet split into two layers, and there was no need to separate them: whether for file download or login, data strictly needed to arrive in order, so the network of that era really was a network of accurate, in-order delivery. By the third version, though, they broke up. As for why, the polite explanation is the single-responsibility principle of the layered model: each layer handles only one class of processing. The blunter but easier explanation is demand: a class of traffic appeared that did not need in-order delivery but did need datagram boundaries, that is, a marker for where a packet begins and ends. Many control packets are of this type: they need no persistent connection, carry their own verification mechanisms, and require no feedback from the network. Clearly the fused TCP-IP could not meet these needs, so IP was split off as the common substrate, and TCP-IP became TCP/IP, or more accurately (TCP+UDP)/IP.
By the principle of first occupation, later applications would use either TCP or UDP, and there are also many that sit directly on IP without needing multiplexing, such as many routing protocols. As applications developed rapidly, TCP and UDP managed to cope, right up until HTTP 2.0. Even where efficiency problems existed, workarounds were generally available. But HTTP 2.0 opens a brand new era: everything is different from the past.
An application may need to request many resources from a single host. Under the old understanding of sessions, that means establishing multiple sessions; for efficiency, however, sharing one TCP connection is recommended. What HTTP 2.0 changes is the new frame layer: each HTTP message is encapsulated in frames, and every frame carries a stream label. This means HTTP responses no longer have to be processed in request order, since the stream number identifies which request a response belongs to. But the problem still exists, just one layer down, at TCP. The application server no longer has to serve the queue in request order, but the TCP connection is strictly ordered: when a packet is lost, all subsequent data is blocked until the lost packet arrives. This is TCP's eternal head-of-line blocking problem; anyone who has played a Zuma game has felt it. In a wireless environment with unstable signal, the stalling of a single TCP stream causes comprehensive, serious problems. What HTTP 2.0 over TCP achieves is that TCP blocking replaces application-server blocking. How can this be solved?
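The difference between one strictly ordered pipe and independently ordered streams can be shown with a toy delivery model. Everything here (packet layout, stream names, which packet is lost) is an illustrative assumption; out-of-order buffering is omitted for brevity:

```python
# Six packets from three streams share one connection; the packet with
# global sequence 4 (stream s1, stream-seq 2) is lost in transit.
# Packet = (global_seq, stream, stream_seq).
packets = [(1, "s1", 1), (2, "s2", 1), (3, "s3", 1),
           (4, "s1", 2), (5, "s1", 3), (6, "s2", 2)]
received = [p for p in packets if p[0] != 4]   # global seq 4 never arrives

def tcp_deliver(received):
    """One strictly ordered byte stream: stop at the first gap (TCP)."""
    out, expect = [], 1
    for g, s, ss in sorted(received):
        if g != expect:
            break                       # head-of-line: everything after waits
        out.append((s, ss)); expect += 1
    return out

def per_stream_deliver(received):
    """Each stream ordered independently: a gap only stalls its own stream."""
    out, expect = [], {}
    for g, s, ss in sorted(received):
        if ss == expect.get(s, 1):
            out.append((s, ss)); expect[s] = ss + 1
    return out

print(tcp_deliver(received))        # only the three packets before the gap
print(per_stream_deliver(received)) # s2 and s3 keep flowing; only s1 stalls
```

With TCP semantics the single lost packet freezes streams s2 and s3 as well; with per-stream ordering, s2's second packet is still delivered and only the losing stream waits.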
5. A session-transport layer. Suppose there were an end-to-end protocol in which each connection occupies one five-tuple and, on that connection, either provides in-order, reliable delivery of bounded frames, or multiplexes many streams, each stream arriving reliably and in order. Wouldn't that solve HTTP 2.0's problem? It would also solve the problem of port resource consumption: there would be no need to worry about running out of ports.
The OSI model has a session layer; the TCP/IP model does not. In fact this layer is necessary. If HTTP 2.0 were hosted on a session layer rather than directly on the transport layer, the transport layer could be lightweight UDP, with the session layer ensuring that each session is processed in order. Hosted directly on the transport layer, all sessions of a web application share one TCP connection and are shackled to each other by TCP's total ordering.
6. Implementation ideas and measures. I do not know whether you are familiar with OpenVPN's Reliable layer; it is a living example. After some personal modification, it is easy to implement multi-stream multiplexing over UDP. The protocol encapsulation diagram is as follows:
Stream1 over Reliable layer | Stream2 over Reliable layer | Stream3 over Reliable layer
---------------------------------------------------------------------------------------
UDP (I used a pair of OpenSSL BIOs)
---------------------------------------------------------------------------------------
If an intermediate packet of stream2 fails to arrive, only stream2's subsequent packets are blocked from being delivered to the application; stream1 and stream3 packets arriving after stream2's missing packet can still be delivered to the application layer as soon as they are in order within their own streams. The stream header is very simple: for convenience it has two fields, one identifying the session ID and one identifying the length. The length is not strictly required, because the Reliable layer carries a length of its own. Following this idea, there is also an extended protocol that provides in-order semantics per boundary-marked record, in which case the length field in the header becomes mandatory. The extended meaning is: I only guarantee in-order arrival for the data covered by the length indicated in the header. Each packet still needs a sequence number, but only sequence numbers within the record's length range participate in the in-order semantics: if a sequence number falls within the range, it is processed in order; otherwise it is delivered to the application directly. In my tests I applied no performance optimization at all; I simply made it work, since I believe there is always a way to optimize later. Besides, simulating packet loss with BIOs is not very accurate.
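The per-stream delivery rule described above amounts to a small reassembly buffer keyed by session ID. The sketch below is my own minimal rendering of that idea, not OpenVPN's actual Reliable layer; class and field names are assumptions:

```python
# Minimal per-session reassembler: packets carry (session_id, seq, payload);
# each session delivers its payloads in seq order, buffering gaps without
# blocking any other session.

from collections import defaultdict

class Mux:
    def __init__(self):
        self.expect = defaultdict(lambda: 1)   # next expected seq per session
        self.pending = defaultdict(dict)       # out-of-order buffer per session

    def receive(self, sid, seq, payload):
        """Return payloads now deliverable, in order, for this session."""
        self.pending[sid][seq] = payload
        out = []
        while self.expect[sid] in self.pending[sid]:
            out.append(self.pending[sid].pop(self.expect[sid]))
            self.expect[sid] += 1
        return out

mux = Mux()
print(mux.receive("s2", 2, "b"))   # [] - seq 1 still missing, only s2 stalls
print(mux.receive("s1", 1, "x"))   # ['x'] - s1 is unaffected by s2's gap
print(mux.receive("s2", 1, "a"))   # ['a', 'b'] - gap filled, both released
```

A hole in one session's sequence space holds back only that session's buffer; every other session drains the moment its own packets are contiguous.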
The reason I use OpenVPN's Reliable layer is not that it is especially good, but that I am familiar with it. As I understand world history, the wheel was invented only once, in Mesopotamia; it is better to borrow than to reinvent. The whole process was one of cropping OpenVPN down, and the final C files are:
buffer.c error.c interval.c list.c mbuf.c memcmp.c otime.c packet_id.c reliable.c schedule.c session_id.c ssl.c
The corresponding .h files and the Makefile also need modification. I simply added a multiplexing mechanism between bio_read/bio_write and the Reliable layer. BIO really is a good thing; it works well for simulating UDP.
7. Dawn. I have drifted a little from HTTP 2.0; in fact most of this article is not about HTTP 2.0 at all, but entirely about TCP. It really is inappropriate for HTTP 2.0 to run on TCP, because TCP is at once too simple and too complex. It is simple in that it remains what it was decades ago, suitable only for file transfer and persistent remote login. It is complex in that, to serve requirements beyond those two, TCP has grown a variety of complicated feedback-based algorithms. Mind you, any system with feedback is complex: a negative feedback system is what distinguishes animals from plants! So as application logic grows ever more complex, as with today's various web applications, whether TCP remains a good horse to ride is questionable. HTTP 2.0 was born to cope with complex web applications, and TCP must not hold it back. In general, TCP is too heavy, and HTTP 2.0 is itself not light. Many TCP optimizations are not aimed at the web at all. For example, TCP's goal is to keep a full bandwidth-delay product's worth of data in flight in the pipe, which conflicts with the end-to-end flow-control mechanism: if the receiver can only accept N segments, should the sender send only N? Remember that those N segments spend a while in flight on the network, so the sender should really keep bandwidth times delay worth of data outstanding. But if the delay is long, can you guarantee that the receiver's window stays unchanged in the meantime? In complex web application environments the receive window changes from moment to moment, so feedback is required... For file transfer there is no such problem, because the whole purpose of file transfer is to fill the pipe and achieve maximum throughput.
Well, having talked about why TCP is unsuitable for HTTP 2.0, does my idea above fix things? I dare not claim so, but I think it solves at least a few problems. The first is that you no longer need to establish multiple connections (HTTP 2.0 actually solves this already, but this article is not limited to HTTP 2.0). Remember that toward the same target, each IP address on your machine can use at most 65535 ports; that is not a limitation of your machine but of the protocol. In the age when bandwidth was scarce and payloads were small (so the payload ratio was very low), a 16-bit port field in the protocol header saved real bandwidth compared with a 32-bit field! Perhaps you will object that multiplexing many session IDs on one port is the same trick, and that the session ID space is also finite? Yes, it is not inexhaustible. But the key point is that this approach need not repeat the architectural misstep of IPv6!
The question is whether you scale horizontally or vertically. I have to use a reality at the IP layer to illustrate the problem, so down to the IP layer we go. Take IPv4: its address space has always been considered limited, and people constantly worry it will run out. So one of IPv6's ideas was very simple: enlarge the address space, jumping straight to 128 bits, with promotional lines about every square meter of the earth having its own addresses. The tedium is no less than the Depression-era American campaign promise of "a chicken in every pot"! IPv6 does solve the problem this way, but the protocol stack is not compatible: the entire IP layer must be rewritten, and during the rewrite the communication between the layers above and below would have to be cut off. But it cannot be cut off, because IP has essentially become the single road all communication must travel. This is not like roadworks, where you can shift traffic from the left lane to the right; the IP layer is the entire two-way road itself.
Is there a way out? Yes: the idea of LISP. Wouldn't it be better to split an address into two layers, the outer identifying the location and the inner identifying the device? Like IPv6's promotion, you could say that every location can have 2 to the 32nd power addresses; isn't that even more attractive? After all, I never said how big a "location" is!! Unlike IPv6, implementing this is very simple: you only need a compatibility layer for a smooth transition. Nodes that support LISP handle the two-layer address; those that do not simply strip the outer layer and pass the packet on. Note: what I am describing here is not standard LISP, but an improvement of IPv4 based on the LISP idea.
Back to the TCP/UDP issue: wouldn't adopting a similar idea be good? With a 32-bit session ID, a single UDP port can host 2 to the 32nd power sessions! That fully satisfies today's complex application requirements: applications connect to the same UDP endpoint but send different business data, and each business transaction gets an independent session that does not affect the others.
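The tiny header proposed in section 6, a 32-bit session ID plus a 16-bit length, can be packed in a handful of lines. The field sizes follow the article's proposal, but the exact byte layout here is my own assumption:

```python
# Sketch of the proposed stream header: 4-byte session ID + 2-byte payload
# length, in network byte order, followed by the payload itself.

import struct

HDR = struct.Struct("!IH")   # !: network order, I: uint32 sid, H: uint16 len

def pack_frame(session_id: int, payload: bytes) -> bytes:
    return HDR.pack(session_id, len(payload)) + payload

def unpack_frame(frame: bytes):
    session_id, length = HDR.unpack_from(frame)
    return session_id, frame[HDR.size:HDR.size + length]

frame = pack_frame(0xDEADBEEF, b"GET /index.html")
sid, payload = unpack_frame(frame)
print(hex(sid), payload)   # 0xdeadbeef b'GET /index.html'
```

Six bytes of header buy 2^32 independent sessions on one UDP port, which is the horizontal-scaling trade the article argues for.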
