A summary of TCP/IP and HTTP

Last Update:2016-04-01 Source: Internet

Author: User


Networking knowledge divides roughly into two areas: the TCP protocol and the HTTP protocol.

1. Differences between TCP and UDP

Connection: TCP is connection-oriented and must establish a connection before data is sent; UDP is connectionless.
TCP is byte-stream oriented; UDP is datagram oriented.
TCP provides reliable, ordered, duplicate-free data transmission; UDP simply wraps the application data in a datagram, hands it to the IP layer, and provides no mechanism such as timeout retransmission.
If a segment cannot be delivered to the peer, the TCP socket eventually returns an error.
TCP is full-duplex and strictly one-to-one; UDP supports one-to-one, one-to-many, many-to-one, and many-to-many communication.
Four reliability mechanisms of TCP
- Acknowledgment (including the three-way handshake and the four-way close)
- Timeout retransmission
- Ordering: the data is a byte stream in which every byte has its own sequence number, so segments can be put back in order and duplicates discarded
- Sliding window for flow control, together with congestion control: slow start, congestion avoidance, fast retransmit, and fast recovery

(Flow control is point-to-point: it keeps the sender from overrunning the receiver's capacity to accept data. Congestion control is global: it keeps the routers and links in the network from being overloaded.)

Because TCP must establish a connection before communicating and does more work per segment, it has more overhead and generally transfers data more slowly than UDP. UDP's simpler feature set also means a smaller header: 8 bytes, against at least 20 bytes for TCP.

UDP does not need to maintain a connection and is typically used for short application messages and control messages.

2. TCP three-way handshake and four-way close: state names and meanings, and the role of TIME_WAIT

MSS is the maximum segment size: the largest amount of data TCP will carry in a single segment.
TIME_WAIT state
The endpoint that performs the active close stays in this state for twice the maximum segment lifetime (MSL), often referred to as 2MSL.
It serves two purposes: to terminate the full-duplex TCP connection reliably, and to let old duplicate segments expire in the network.
- Reliably terminating the full-duplex TCP connection
  Suppose the server sends its FIN, the client receives it, and the client's final ACK is lost. Without TIME_WAIT the client would already consider the connection closed, so when the server times out and retransmits its FIN, the client would answer with an RST and the server would see an error instead of a clean close. Staying in TIME_WAIT ensures that the client can still re-send the ACK when the retransmitted FIN arrives. The lost ACK can take up to one MSL to travel toward the server, and the retransmitted FIN up to one MSL to travel back, so the state must last 2MSL.
- Letting old duplicate segments expire in the network
  Suppose the connection between A and B is closed and, shortly afterwards, a new connection is established with the same addresses and ports. If a delayed duplicate segment from the old connection then reached the peer, it could be mistaken for data belonging to the new connection and cause a receive error. Because TIME_WAIT lasts 2MSL and a segment survives at most one MSL, every segment of the old connection has been discarded before the same connection can be re-established.
Why connection setup is a three-way handshake, not a two-way handshake
The third step prevents a stale connection request from establishing a connection. For example, a client's SYN is delayed in the network but not dropped; the client retransmits, completes the three-way handshake, and later closes the connection. The delayed SYN then arrives at the server, which replies with SYN+ACK. With a two-way handshake the server would consider the connection established at this point and hold resources for a client that never asked for it. With a three-way handshake the client does not acknowledge this SYN+ACK (in practice it answers with an RST), so the stale connection is never established.
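The sequence-number bookkeeping behind this argument can be sketched in a few lines of Python. This is a toy model, not a TCP implementation: the function names, the dictionary layout, and the `pending_isn` parameter are all assumptions made for illustration.

```python
def handshake(client_isn, server_isn):
    """Return the three segments of a normal handshake as plain dicts."""
    syn = {"flags": "SYN", "seq": client_isn}
    # The server acknowledges the client's SYN and sends its own SYN in one segment.
    syn_ack = {"flags": "SYN+ACK", "seq": server_isn, "ack": syn["seq"] + 1}
    # The third step: the client acknowledges the server's SYN.
    ack = {"flags": "ACK", "seq": client_isn + 1, "ack": syn_ack["seq"] + 1}
    return [syn, syn_ack, ack]


def client_completes(syn_ack, pending_isn):
    """The client sends the final ACK only for a SYN it is still waiting on.

    pending_isn is the initial sequence number of the client's outstanding
    SYN, or None if the client has no connection attempt in progress.
    """
    return pending_isn is not None and syn_ack["ack"] == pending_isn + 1
```

For a fresh attempt, `client_completes(handshake(100, 300)[1], 100)` is true. For a SYN+ACK triggered by a stale SYN, the client has nothing pending (`pending_isn=None`), so the third step never happens and the server never reaches the established state.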
Why setup takes three steps but teardown takes four
During setup, the server's ACK of the client's SYN and the server's own SYN can travel in a single segment, because both are just synchronization. Teardown is different because TCP is full-duplex: the client's FIN only means that the client has finished sending, while the server may still have data to send. So the server first acknowledges the client's FIN, finishes sending whatever remains in its send buffer, and only then sends its own FIN. In addition, the client's FIN reaches the server application as an EOF that is queued in the server's receive buffer behind any unread data, so the application can react to it only after it has read the data ahead of it.

3. TCP Congestion Control

Slow start: the congestion window starts at 1 segment and grows exponentially, doubling every round trip.
Congestion avoidance: once the window reaches the congestion threshold (ssthresh), it grows linearly. If congestion is detected through a timeout, the threshold is set to half of the current window, the congestion window is reset to 1, and slow start begins again.
Fast retransmit: the receiver must send a duplicate ACK immediately when it receives an out-of-order segment, rather than waiting to piggyback the ACK on its own data. When the sender receives three duplicate ACKs in a row, it retransmits the missing segment without waiting for the retransmission timer to expire.
Fast recovery: when the sender receives three duplicate ACKs, the congestion threshold is halved, but slow start is not performed; the congestion window is set to the new threshold and the congestion avoidance algorithm takes over.
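A small simulation may make the four rules easier to compare. It is a simplified sketch that counts the window in whole segments per round trip; the function name, the parameters, and the floor of 2 on the threshold are assumptions for illustration, and real TCP stacks are considerably more detailed.

```python
def simulate_cwnd(rounds, ssthresh=16, timeout_at=(), triple_dup_ack_at=()):
    """Return the congestion window (in segments) at the start of each round trip."""
    cwnd = 1
    history = []
    for rtt in range(rounds):
        history.append(cwnd)
        if rtt in timeout_at:
            # Timeout: halve the threshold and restart slow start from 1.
            ssthresh = max(cwnd // 2, 2)
            cwnd = 1
        elif rtt in triple_dup_ack_at:
            # Fast retransmit + fast recovery: halve the threshold and
            # continue in congestion avoidance instead of slow start.
            ssthresh = max(cwnd // 2, 2)
            cwnd = ssthresh
        elif cwnd < ssthresh:
            # Slow start: exponential growth, capped at the threshold.
            cwnd = min(cwnd * 2, ssthresh)
        else:
            # Congestion avoidance: linear growth.
            cwnd += 1
    return history


print(simulate_cwnd(8, ssthresh=8))                          # no loss
print(simulate_cwnd(8, ssthresh=8, timeout_at=(5,)))         # timeout in round 5
print(simulate_cwnd(8, ssthresh=8, triple_dup_ack_at=(5,)))  # 3 duplicate ACKs in round 5
```

With a timeout the window falls back to 1 and climbs exponentially again; with three duplicate ACKs it drops only to the halved threshold and continues linearly.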

4. HTTP message structure

An HTTP request consists of a request line, request headers, and a request body.
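To make the three parts concrete, here is a minimal sketch that splits a raw request into them. The function name `parse_request` and the sample request (including the host `example.com`) are made up for illustration; it ignores details such as repeated or folded headers that a real parser must handle.

```python
def parse_request(raw: bytes):
    """Split a raw HTTP request into request line, headers, and body."""
    # A blank line (CRLF CRLF) separates the head from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Request line: METHOD SP Request-URI SP HTTP-Version
    method, uri, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, uri, version, headers, body


raw = (b"POST /items HTTP/1.1\r\n"
       b"Host: example.com\r\n"
       b"Content-Length: 5\r\n"
       b"\r\n"
       b"hello")
print(parse_request(raw))
```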

Request methods:
- GET
  Retrieve the resource identified by the Request-URI (data is returned)
- POST
  Append new data to the resource identified by the Request-URI (add data)
- HEAD
  Retrieve only the response headers for the resource identified by the Request-URI (like GET, but no body is returned)
- PUT
  Ask the server to store a resource and use the Request-URI as its identifier (modify data)
- DELETE
  Ask the server to delete the resource identified by the Request-URI (delete data)
- TRACE
  Ask the server to echo back the request it received; mainly used for testing and diagnostics
- CONNECT
  Reserved for future use
- OPTIONS
  Query the server's capabilities, or the options and requirements associated with a resource
Request headers (a selection)
- Accept-Charset
  This request header specifies the character sets the client accepts, e.g. Accept-Charset: iso-8859-1, gb2312.
- Connection: keep-alive
  Without keep-alive, every client/server exchange is a single request and response on its own TCP connection: the connection is established for the request and closed once the response is finished. With keep-alive the connection is reused: it stays open for the next request to the same server, which avoids setting up a new connection each time. The connection is only kept for a limited time, though. For example, a page's HTML may reference an image, but in the HTML that is just a URL in the src attribute. If the connection is not kept alive, the browser fetches the HTML and then has to establish a new connection to fetch the image. The same happens on pages that pull in many JS and CSS files.
  In HTTP/1.0 keep-alive is off by default; add "Connection: keep-alive" to the request headers to enable it. In HTTP/1.1 it is on by default and is turned off by sending "Connection: close". Most browsers now use HTTP/1.1 and therefore request keep-alive connections by default, so whether a connection is actually kept alive depends on the server configuration.
- Advantages of Keep-alive:
  - Less CPU and memory usage (due to the reduced number of simultaneous open connections)
  - Allow HTTP pipelining of requests and replies
  - Reduce network congestion (TCP connections are reduced)
  - Reduced latency for subsequent requests (no more handshake)
  - Reporting errors without shutting down the TCP connection
- Referer: tells the web server which page (URL) the current request's URL was obtained from or clicked on.
- End of message under keep-alive: because the connection stays open, the receiver needs another way to tell that a transfer is complete. The Content-Length field works, since it gives the length of the body, but for dynamic pages whose data is generated while it is being sent, the value cannot be computed in advance, and chunked encoding is needed instead.
- Transfer-Encoding: chunked: the server sends "Transfer-Encoding: chunked" instead of Content-Length. Chunked encoding splits the data into pieces sent one after another: the body is a series of chunks, each prefixed with its length, terminated by a chunk of length 0.
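The chunk framing can be illustrated with a short encoder and decoder. This is a minimal sketch: the function names are assumptions, each chunk length is written in hexadecimal as the chunked format requires, and optional trailers after the final chunk are not handled.

```python
def encode_chunked(pieces):
    """Frame each piece as '<hex length>CRLF<data>CRLF', ending with a 0-length chunk."""
    out = bytearray()
    for piece in pieces:
        if piece:  # an empty piece would be mistaken for the terminator
            out += b"%x\r\n" % len(piece) + piece + b"\r\n"
    out += b"0\r\n\r\n"
    return bytes(out)


def decode_chunked(data: bytes) -> bytes:
    """Reassemble the body from a chunked-encoded byte string."""
    body = bytearray()
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        # The size line is hexadecimal; ignore any chunk extension after ';'.
        size = int(data[pos:line_end].split(b";")[0].decode("ascii"), 16)
        pos = line_end + 2
        if size == 0:
            break  # the zero-length chunk marks the end of the body
        body += data[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
    return bytes(body)


wire = encode_chunked([b"Wiki", b"pedia"])
print(wire)
print(decode_chunked(wire))
```

The sender never needs to know the total length up front, which is exactly what a dynamically generated page requires.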

5. Meaning of HTTP status codes

1xx: Informational – the request has been received and processing continues.

2xx: Success – the request has been successfully received, understood, and accepted.

3xx: Redirection – further action must be taken to complete the request.

4xx: Client Error – the request contains a syntax error or cannot be fulfilled.

5xx: Server Error – the server failed to fulfill a valid request.

Common status codes and their descriptions are listed below.

200 OK: The client request succeeded.

400 Bad Request: The client request has a syntax error and cannot be understood by the server.

401 Unauthorized: The request is not authorized. This status code must be used together with the WWW-Authenticate header field.

403 Forbidden: The server received the request but refuses to serve it.

404 Not Found: The requested resource does not exist, for example because a wrong URL was entered.

500 Internal Server Error: An unexpected error occurred on the server.

503 Service Unavailable: The server is currently unable to process the request and may recover after a period of time.

An example status line: HTTP/1.1 200 OK (CRLF).

6. Differences between HTTP/1.1 and HTTP/1.0

Connection reuse, i.e. the Connection header field described above.
More request headers and response headers were added.
The Host request header was added. Because HTTP/1.0 has no Host request header, a browser cannot state explicitly which site on the server it wants, so several virtual web sites cannot share one IP address and port number on a web server. With the Host header in HTTP/1.1, the browser names the site it is requesting, which makes it possible to host multiple virtual web sites with different hostnames on the same IP address and port number of a single web server.
HTTP/1.1 also provides request and response headers for mechanisms such as authentication, state management, and caching.

7. How HTTP handles long connections

An HTTP short connection is the case where Connection is close: the connection is not used again after the response. When Connection is keep-alive, it is a long connection.

HTTP short and long connections are, underneath, TCP short and long connections.

A TCP short connection is closed after one read/write exchange. Its advantage is simple management: every connection that exists is in use, and no extra control mechanism is needed.

A long connection is not closed after a read and write; the same connection is used for the next exchange. Closely related is the TCP keepalive feature, which is designed mainly for servers that want to know whether a client has crashed. If the server receives nothing from the client for 2 hours (the usual default), it sends a probe segment, and then acts on whether or not the client responds.
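As a rough illustration, TCP keepalive is switched on per socket with the SO_KEEPALIVE option. The helper below is a sketch: its name and default values are assumptions (7200 seconds matches the 2 hours mentioned above; 75 seconds and 9 probes mirror common Linux defaults), and the three tuning options are not available on every platform, so they are applied only where Python's `socket` module exposes them.

```python
import socket


def enable_keepalive(sock, idle=7200, interval=75, count=9):
    """Turn on TCP keepalive probes for an existing TCP socket."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Platform-specific knobs: idle time before the first probe, interval
    # between probes, and number of unanswered probes before giving up.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", count)):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)


s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(s)
print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
s.close()
```

Note that this is the transport-level keepalive probe, which is separate from the HTTP Connection: keep-alive header.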

8. How cookies and sessions work

First, understand that HTTP is stateless.

The HTTP protocol is stateless: the protocol has no memory of earlier transactions, and the server does not know what state the client is in. In other words, opening a page on a server has no connection with the pages you opened on that server before. HTTP is a stateless protocol running over a connection-oriented transport; stateless does not mean that HTTP cannot keep TCP connections open, nor does it mean that HTTP uses UDP (which is connectionless).

Because HTTP is stateless, neither the client nor the server needs to record what the other side has done. In some cases, however, the other side's state has to be maintained, which is why cookies and sessions exist.

A cookie is a way of maintaining state on the client. It is implemented as an extension to HTTP: the server hands the browser a cookie by adding a special field (Set-Cookie) to the response header. On later requests, if the requested resource is within the cookie's scope and the cookie has not expired, the browser sends the cookie back to the server.

Cookies are kept either in memory or on the hard disk. If no lifetime is set, the cookie is kept in memory and disappears when the browser is closed; this is called a session cookie. If a lifetime is set, the cookie is saved to disk and is still available the next time the browser is opened.

Cookies saved on the hard disk can be shared between browser processes. For cookies kept in memory, browsers behave differently. In IE, a window opened with Ctrl+N shares them, while windows opened in other ways do not. In Mozilla Firefox 0.8, all processes and tabs share the same cookies.

A session is a way of maintaining state on the server. When the server needs a session for a request, it first checks whether the request carries a session ID. If it does, a session was created earlier and the server looks it up by that ID. If it does not, the server creates a session, assigns it a session ID, and returns the ID to the client in the response header to be saved. The session ID can be stored on the client in a cookie.
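The session lookup described above can be sketched with the standard library's cookie parser. Everything named here is hypothetical: the cookie name `session_id`, the in-memory `sessions` dictionary, and the `handle_request` function stand in for whatever a real server framework provides.

```python
import uuid
from http.cookies import SimpleCookie

sessions = {}  # session ID -> per-client state kept on the server


def handle_request(cookie_header):
    """Return (session_id, set_cookie_value_or_None) for one request.

    cookie_header is the value of the request's Cookie header, or None.
    """
    cookie = SimpleCookie(cookie_header or "")
    morsel = cookie.get("session_id")
    if morsel is not None and morsel.value in sessions:
        # The client presented a known ID: reuse the existing session.
        return morsel.value, None
    # No (valid) ID: create a session and tell the client to store its ID.
    session_id = uuid.uuid4().hex
    sessions[session_id] = {}
    out = SimpleCookie()
    out["session_id"] = session_id
    out["session_id"]["path"] = "/"
    return session_id, out["session_id"].OutputString()


sid, set_cookie = handle_request(None)          # first visit: Set-Cookie is issued
print(set_cookie)
print(handle_request("session_id=" + sid))      # later visit: same session, no new cookie
```

Because no lifetime is set on the cookie in this sketch, it behaves as a session cookie and disappears when the browser is closed.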

9. select, poll, and epoll

select, poll, and epoll are all I/O multiplexing mechanisms: one process monitors multiple descriptors, and as soon as a descriptor becomes ready the program is notified so it can perform the corresponding read or write. All three are essentially synchronous I/O.

select

The select function monitors three sets of file descriptors: readfds, writefds, and exceptfds. A process that calls select blocks until a descriptor is ready or the timeout expires. When select returns, the caller walks the fd sets to find the ready descriptors.

The advantage of select is that it is supported on almost every platform.
select has three disadvantages.
1. The default limit on the number of descriptors is small (1024 or 2048). It can be raised, but that requires recompiling the kernel, and even then the cost of the kernel scanning the descriptors grows linearly with their number.
2. Every call to select copies all the descriptors from user space to kernel space, and the kernel's results have to be copied back to user space as well.
3. On every call the kernel has to walk through every descriptor passed in from user space, which also adds considerable overhead.
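The calling pattern of select described above looks roughly like this in Python, whose `select.select` wraps the system call. The helper names `wait_readable` and `demo` are assumptions for illustration, and a connected socket pair stands in for real network connections.

```python
import select
import socket


def wait_readable(socks, timeout=1.0):
    """Block until at least one socket is readable or the timeout expires."""
    # The whole list is handed to the kernel on every call, and the kernel
    # scans all of it: this is the per-call cost discussed above.
    readable, _, _ = select.select(socks, [], [], timeout)
    return readable


def demo():
    a, b = socket.socketpair()
    try:
        nothing_ready = wait_readable([a, b], timeout=0) == []
        b.sendall(b"ping")  # makes a readable
        only_a_ready = wait_readable([a, b], timeout=1.0) == [a]
        return nothing_ready, only_a_ready, a.recv(16)
    finally:
        a.close()
        b.close()


print(demo())
```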
poll

poll works much like select: it copies the array passed in by the user into kernel space and then queries the device state of each descriptor. If a device is ready, it records an entry for it and continues the traversal; if no device is ready after all descriptors have been checked, the current process is suspended until a device becomes ready or the timeout expires, after which it traverses the descriptors again. This involves many pointless passes.

poll removes the limit on the number of connections because its descriptors are stored in a linked list rather than in a fixed-size array.

epoll

epoll addresses all the problems of select described above, so it is very efficient.

There is no limit on the number of file descriptors. Internally, the registered descriptors are kept in a red-black tree.
Descriptors are registered once with epoll_ctl() instead of being copied between user space and kernel space on every call as with select and poll; epoll_wait() hands back only the descriptors that are ready.
epoll adds every registered descriptor to its wait queue with a callback attached. When a descriptor becomes ready, the callback runs, puts the descriptor on a ready list, and wakes any process sleeping in epoll_wait(). epoll_wait() then only checks whether the ready list is empty. Like select, the process still goes from sleeping to running, but with select it is woken and has to rescan repeatedly, whereas epoll wakes it only when a descriptor is actually ready, and epoll_wait() only inspects the ready list instead of traversing every descriptor.

Two working modes of epoll

epoll supports LT (level-triggered) and ET (edge-triggered) modes.

In LT mode, epoll_wait() returns a socket for as long as it is readable or writable. In ET mode, epoll_wait() returns the socket only when its state changes, that is, when it goes from unreadable/unwritable to readable/writable.

select and poll are both level-triggered: they keep reporting a descriptor as long as it is readable or writable. epoll's edge-triggered mode can cause a problem. When an edge occurs, epoll_wait() returns the socket and the program reads from it; but if that read does not consume all the data, there is no further transition from unreadable to readable, so no new event is reported and the remaining data may sit unread. In ET mode the socket should therefore be non-blocking, and the program should keep reading until the read returns EAGAIN.
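The "read until EAGAIN" rule can be sketched as a drain loop. This shows only the reading side, not the epoll registration itself; the function name `drain` is an assumption, and in Python an EAGAIN/EWOULDBLOCK result from a non-blocking socket surfaces as `BlockingIOError`.

```python
import socket


def drain(sock, bufsize=4096):
    """Read everything currently available from a non-blocking socket."""
    chunks = []
    while True:
        try:
            data = sock.recv(bufsize)
        except BlockingIOError:
            break  # EAGAIN: the buffer is empty, wait for the next edge
        if not data:
            break  # the peer closed the connection
        chunks.append(data)
    return b"".join(chunks)


a, b = socket.socketpair()
a.setblocking(False)           # edge-triggered handlers need non-blocking sockets
b.sendall(b"x" * 10000)
received = drain(a, bufsize=1024)
print(len(received))
a.close()
b.close()
```

Reading a single 1024-byte buffer here would leave most of the data behind, and in edge-triggered mode no further event would announce it.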

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
