A simple understanding of HTTP and TCP
TCP is a transport-layer protocol, while HTTP is an application-layer protocol, so the two are not directly comparable: HTTP runs on top of TCP. When a browser needs webpage data from a server, it sends an HTTP request, and HTTP sets up a connection channel to the server through TCP. In the original HTTP/1.0 model, the TCP connection is closed as soon as the data for that request has been delivered. Because this process is very short, HTTP connections are described as short connections, and HTTP is stateless. Stateless means the protocol itself keeps no memory of earlier requests: each request the browser sends is handled on its own, and in the HTTP/1.0 model each one even travels over a new connection instead of a reused one. If a single connection were kept open, the server could hold it and remember some information in memory. Instead, once each request finishes, the connection is closed and the related resources are released, so no state survives between requests.
As HTML pages grew more complex and began embedding many images, opening a new TCP connection for every image became inefficient. Keep-Alive was introduced to solve this. Since HTTP/1.1, persistent connections are enabled by default. Put simply, once a webpage has been opened, the TCP connection that carried the HTTP data between the client and the server is not closed. If the client requests another resource from the same server, it continues to use the established connection. Keep-Alive does not keep the connection open forever: it has a retention (idle) time that can be configured in the server software (for example, Apache). Because the connection is still closed once that time is reached, it is still regarded as a connection that is closed after use. Later, sessions, cookies and related techniques were used to maintain state for users. Even when the same connection is reused, however, HTTP itself remains stateless.
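The persistence rule described above can be written as a small decision function. This is a minimal sketch in Java (the language of the JSP snippets later in this article). It assumes the connection-management rules of RFC 7230, section 6.3, and covers HTTP/1.0 and HTTP/1.1 only. The class and method names (`KeepAliveRule`, `isPersistent`) are hypothetical. Real servers also apply their own idle timeout on top of this rule.

```java
import java.util.Arrays;
import java.util.Locale;

/** Sketch of the HTTP/1.0 vs HTTP/1.1 persistence default (names are hypothetical). */
public final class KeepAliveRule {

    /** True if the comma-separated Connection header lists the token (case-insensitive). */
    static boolean hasToken(String connectionHeader, String token) {
        if (connectionHeader == null) {
            return false;
        }
        return Arrays.stream(connectionHeader.split(","))
                .map(t -> t.trim().toLowerCase(Locale.ROOT))
                .anyMatch(t -> t.equals(token));
    }

    /**
     * Decides whether the TCP connection stays open after this message.
     * HTTP/1.1: persistent unless "Connection: close" is sent.
     * HTTP/1.0: closed unless "Connection: keep-alive" is sent.
     */
    public static boolean isPersistent(String httpVersion, String connectionHeader) {
        if (hasToken(connectionHeader, "close")) {
            return false;
        }
        if ("HTTP/1.1".equals(httpVersion)) {
            return true;
        }
        return hasToken(connectionHeader, "keep-alive");
    }

    public static void main(String[] args) {
        System.out.println(isPersistent("HTTP/1.1", null));         // true
        System.out.println(isPersistent("HTTP/1.1", "close"));      // false
        System.out.println(isPersistent("HTTP/1.0", null));         // false
        System.out.println(isPersistent("HTTP/1.0", "Keep-Alive")); // true
    }
}
```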
For a long time I found one point confusing: why is HTTP a stateless short connection while TCP is a stateful long connection? Isn't HTTP built on TCP, so how can it be a short connection? The answer is that HTTP closes the TCP connection after each request completes, which is what makes it a short connection. When we use TCP directly through socket programming, our code controls when the connection is opened and when it is closed. As long as the code does not close it, the connection stays open between the client and server processes, and the associated state is kept.
C# provides a Socket class. A socket is an encapsulation of the TCP/IP protocols: it is not itself a protocol but an API (a calling interface). Sockets simply make it easier for programmers to use the TCP/IP protocol stack. As an abstraction over TCP/IP, they provide the most basic function interfaces we know, such as create, listen, connect, accept, send, read, and write.
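Here is a minimal loopback sketch of these socket operations in Java, using `java.net.ServerSocket` and `java.net.Socket`. C#'s `System.Net.Sockets.Socket` exposes comparable Bind/Listen/Accept/Connect/Send/Receive calls. The class name `SocketBasics` is hypothetical. The example talks only to itself on the loopback address, so it needs no external server. The connection stays open for as long as the code keeps the `Socket` open, which is the program-controlled long connection described earlier.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Minimal loopback echo showing the basic socket operations (class name is hypothetical). */
public final class SocketBasics {

    static BufferedReader reader(Socket s) throws IOException {
        return new BufferedReader(new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
    }

    static PrintWriter writer(Socket s) throws IOException {
        // autoFlush = true: println pushes the bytes into the TCP connection immediately
        return new PrintWriter(new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
    }

    /** Sends one line to a local echo server over TCP and returns the echoed line. */
    public static String roundTrip(String message) throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // create + bind + listen: port 0 lets the OS pick a free port
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            Thread echo = new Thread(() -> {
                // accept: block until a client connects, then echo one line back
                try (Socket peer = server.accept();
                     BufferedReader in = reader(peer);
                     PrintWriter out = writer(peer)) {
                    out.println(in.readLine());
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            echo.start();

            // connect: open the client side of the TCP connection
            try (Socket client = new Socket(loopback, server.getLocalPort());
                 PrintWriter out = writer(client);
                 BufferedReader in = reader(client)) {
                out.println(message);         // send / write
                String reply = in.readLine(); // read
                echo.join();
                return reply;
            } // the connection stays open until this block closes it
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("hello over TCP")); // prints "hello over TCP"
    }
}
```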
A more vivid description: HTTP is a car, which provides a specific form for packaging and presenting data; a socket is the engine, which provides the ability to communicate over the network. When programming in C#, you can simply pick the ready-made HTTP car to interact with the server. Sometimes, however, the environment or other custom requirements force you to use TCP directly. In that case you use socket programming and process the received data yourself. This is like building your own truck around an existing engine to talk to the server.
Both HTTP/1.0 and HTTP/1.1 use TCP as the underlying transport protocol. The HTTP client first opens a TCP connection to the server. Once the connection is established, the browser process and the server process access TCP through their sockets. As mentioned above, the client socket is the "door" between the client process and the TCP connection, and the server socket is the "door" between the server process and the same TCP connection. The client sends HTTP request messages into its socket and receives HTTP response messages from it. Likewise, the server receives HTTP request messages from its socket and sends HTTP response messages into it. Once the client or server has sent a message into its socket, the message is entirely under TCP's control. TCP provides a reliable data transfer service for HTTP. As long as the connection is not broken, every HTTP request message sent by the client eventually reaches the server intact, and every HTTP response message sent by the server eventually reaches the client intact.
When C# code connects to a remote database, it also uses TCP: connection.Open() opens a TCP connection and connection.Close() closes it. (With ADO.NET connection pooling, which SqlClient enables by default, Close() returns the connection to the pool instead of tearing down the TCP connection.) FTP also runs on TCP, but its control connection stays open for the whole session, so transferring large files is faster. Which approach is better depends on the scenario. On the server side, a program that uses persistent connections can control how many clients are connected at the same time and prevent too many simultaneous connections. The short-connection approach cannot limit the number of simultaneous connections in the same way. This is actually an advantage when handling a large number of requests at once; however, if there are too many connection requests, the server may stop working.
A web service does not need to hold a connection. It can support at least tens of thousands to 100,000 requests per second. Each request is released as soon as it finishes and consumes no memory afterward, and there is generally no limit on the number of simultaneous connections, which is an advantage. A message queue, by contrast, needs an established connection, and supporting thousands of connections is very difficult because each connection occupies some memory even when it is not transferring data. Some servers also impose limits, such as SQL Server database servers: generally, up to 16 database servers can be connected at the same time.
HTTP normally uses the well-known port 80, which is almost never blocked, so HTTP traffic passes through the firewall on nearly all machines. With socket programming you must specify a particular port, and that port may well be blocked in some environments, so the traffic cannot get through the firewall. IIS listens on port 80. Whenever someone tries to establish a connection on this port, IIS responds and establishes the connection, and in the short-connection model described above all of these connections are short connections. Therefore, all your requests to websites on the server are delivered to the website programs through port 80, and the responses are sent back to the client browser through the same port.
HTTP is an object-oriented application-layer protocol. Because it is simple and fast, it is well suited to distributed hypermedia information systems. It was proposed in 1990 and has been continuously improved and extended over years of use and development. At the time this description was written, the WWW used the sixth revision of HTTP/1.0, HTTP/1.1 was being standardized, and proposals for HTTP-NG (Next Generation of HTTP) had been put forward.
The main features of HTTP are as follows:
1. Supports the client/server model.
2. Simple and fast: when a client requests a service from the server, it only needs to send the request method and path. Common request methods include GET, HEAD, and POST, and each method specifies a different type of interaction between the client and the server. Because HTTP is simple, HTTP server programs are small, so communication is fast.
3. Flexible: HTTP allows any type of data object to be transferred. The type of the object being transferred is indicated by Content-Type.
4. Connectionless: each connection handles only one request. Once the server has processed the client's request and the client has received the response, the connection is closed. This approach saves transmission time.
5. Stateless: HTTP is a stateless protocol, meaning the protocol has no memory of previous transactions. If later processing needs earlier information, that information must be retransmitted, which may increase the amount of data sent per connection. On the other hand, when the server does not need earlier information, it responds faster.
I. HTTP protocol details: URLs
HTTP (Hypertext Transfer Protocol) is a stateless, application-layer protocol based on a request/response model. It usually runs over TCP connections, and HTTP/1.1 provides a persistent connection mechanism. Most web development today consists of web applications built on HTTP.
The format of an HTTP URL (a URL is a special type of URI that contains enough information to locate a resource) is as follows:
http://host[":"port][abs_path]
http indicates that the network resource is located through the HTTP protocol; host is a valid Internet host domain name or IP address; port specifies a port number, and if it is omitted the default port 80 is used; abs_path specifies the URI of the requested resource. If abs_path is not given in the URL, it must be sent as "/" when used as the Request-URI. The browser usually does this automatically.
1. Enter www.guet.edu.cn
The browser automatically converts to: http://www.guet.edu.cn/
2. http://192.168.0.116:8080/index.jsp
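The URL rules above can be sketched with `java.net.URI`: the default port 80, an empty abs_path sent as "/", and the browser adding `http://`. The `HttpUrlParts` class and its `Parts` record are hypothetical names. The sketch assumes Java 16+ for records and only handles `http` URLs. Any query string is kept as part of the request path.

```java
import java.net.URI;

/** Splits an http URL into host, port and abs_path (names are hypothetical). */
public final class HttpUrlParts {

    public record Parts(String host, int port, String absPath) {}

    public static Parts parse(String url) {
        // Like a browser, assume http:// when the user typed only a host name
        String full = url.contains("://") ? url : "http://" + url;
        URI uri = URI.create(full);
        if (!"http".equalsIgnoreCase(uri.getScheme())) {
            throw new IllegalArgumentException("not an http URL: " + url);
        }
        int port = uri.getPort() == -1 ? 80 : uri.getPort();   // missing port -> 80
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";                                         // empty abs_path -> "/"
        }
        if (uri.getRawQuery() != null) {
            path = path + "?" + uri.getRawQuery();
        }
        return new Parts(uri.getHost(), port, path);
    }
}
```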
II. HTTP protocol details: requests
An HTTP request consists of three parts: the request line, the message headers, and the request body.
1. The request line starts with a method token, followed by the Request-URI and the protocol version, separated by spaces. The format is: Method Request-URI HTTP-Version CRLF
Method is the request method, Request-URI is a Uniform Resource Identifier, HTTP-Version is the HTTP protocol version of the request, and CRLF is a carriage return followed by a line feed. CR and LF characters are not allowed on their own, except in the terminating CRLF.
There are several request methods (all method names are upper case):
GET: requests the resource identified by the Request-URI
POST: appends new data to the resource identified by the Request-URI
HEAD: requests the response message headers for the resource identified by the Request-URI
PUT: asks the server to store a resource, using the Request-URI as its identifier
DELETE: asks the server to delete the resource identified by the Request-URI
TRACE: asks the server to echo back the request it received, for testing or diagnosis
CONNECT: reserved for use with a proxy that can switch to being a tunnel (e.g. SSL tunneling)
OPTIONS: queries the server's capabilities, or the options and requirements related to a resource
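A request line can be checked against the format `Method SP Request-URI SP HTTP-Version CRLF` with a few string operations. This is a minimal sketch, not a full RFC parser. The `RequestLine` class is hypothetical. It accepts only the eight methods listed above (real servers may also support extension methods), and it rejects a bare CR or LF. It assumes Java 16+ for the record.

```java
import java.util.Set;

/** Validates and splits an HTTP request line (class name is hypothetical). */
public final class RequestLine {

    public record Parsed(String method, String requestUri, String version) {}

    // The methods listed above; method names are case-sensitive upper case
    private static final Set<String> METHODS =
            Set.of("GET", "POST", "HEAD", "PUT", "DELETE", "TRACE", "CONNECT", "OPTIONS");

    public static Parsed parse(String line) {
        if (!line.endsWith("\r\n")) {
            throw new IllegalArgumentException("request line must end with CRLF");
        }
        String content = line.substring(0, line.length() - 2);
        if (content.indexOf('\r') >= 0 || content.indexOf('\n') >= 0) {
            throw new IllegalArgumentException("bare CR or LF is not allowed");
        }
        // limit -1 keeps empty fields, so doubled spaces are rejected below
        String[] parts = content.split(" ", -1);
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected: Method SP Request-URI SP HTTP-Version");
        }
        if (!METHODS.contains(parts[0])) {
            throw new IllegalArgumentException("unknown method: " + parts[0]);
        }
        if (!parts[2].startsWith("HTTP/")) {
            throw new IllegalArgumentException("bad HTTP-Version: " + parts[2]);
        }
        return new Parsed(parts[0], parts[1], parts[2]);
    }

    /** True if parse() accepts the line. */
    public static boolean isValid(String line) {
        try {
            parse(line);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```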
GET method: when you type a URL into the browser's address bar to visit a webpage, the browser uses GET to fetch the resource from the server. For example: GET /form.html HTTP/1.1 (CRLF)
The POST method asks the server to accept the data attached to the request. It is often used to submit forms.
Eg: POST /reg.jsp HTTP/ (CRLF)
Accept: image/gif, image/x-xbit, ... (CRLF)
Host: www.guet.edu.cn (CRLF)
Content-Length: 22 (CRLF)
Connection: Keep-Alive (CRLF)
Cache-Control: no-cache (CRLF)
(CRLF) // This CRLF marks the end of the message headers; everything above it is the header section
User=jeffrey&pwd=1234 // the submitted data follows this line
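A POST message like the one above can be assembled in code. This sketch uses a hypothetical `PostRequestBuilder` class and makes three choices of its own. It assumes HTTP/1.1, because the version in the sample is cut off. It adds a `Content-Type: application/x-www-form-urlencoded` header that the sample does not show. It computes `Content-Length` from the encoded body bytes instead of hard-coding it; for the body `User=jeffrey&pwd=1234` that count is 21.

```java
import java.nio.charset.StandardCharsets;

/** Builds a form POST request as raw text (class name is hypothetical). */
public final class PostRequestBuilder {

    private static final String CRLF = "\r\n";

    /** Content-Length counts the body's bytes, not its characters. */
    public static String build(String path, String host, String formBody) {
        int length = formBody.getBytes(StandardCharsets.UTF_8).length;
        return "POST " + path + " HTTP/1.1" + CRLF
                + "Host: " + host + CRLF
                + "Content-Type: application/x-www-form-urlencoded" + CRLF // assumed, not in the sample
                + "Content-Length: " + length + CRLF
                + "Connection: Keep-Alive" + CRLF
                + "Cache-Control: no-cache" + CRLF
                + CRLF       // empty line: end of the message headers
                + formBody;  // the request body follows the empty line
    }

    public static void main(String[] args) {
        System.out.print(build("/reg.jsp", "www.guet.edu.cn", "User=jeffrey&pwd=1234"));
    }
}
```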
The HEAD method is almost the same as GET. In the response to a HEAD request, the HTTP headers contain the same information as for a GET request, but no body is sent. With this method you can get information about the resource identified by the Request-URI without transferring the entire resource content. It is often used to test the validity, accessibility, and recent modification of hyperlinks.
2. Request headers (described later)
3. Request body (omitted)
III. HTTP protocol details: responses
After receiving and interpreting the request message, the server returns an HTTP response message.
An HTTP response consists of three parts: the status line, the message headers, and the response body.
1. The status line format is as follows:
HTTP-Version Status-Code Reason-Phrase CRLF
HTTP-Version is the server's HTTP protocol version, Status-Code is the response status code returned by the server, and Reason-Phrase is a text description of the status code.
The status code consists of three digits. The first digit defines the category of the response and has five possible values:
1xx: Informational: the request has been received and processing continues
2xx: Success: the request has been successfully received, understood, and accepted
3xx: Redirection: further action is needed to complete the request
4xx: Client error: the request contains a syntax error or cannot be fulfilled
5xx: Server error: the server failed to fulfill an apparently valid request
Common status codes, reason phrases, and their meanings:
200 OK // The client request succeeded
400 Bad Request // The request has a syntax error and cannot be understood by the server
401 Unauthorized // The request is not authorized; this status code must be used together with the WWW-Authenticate header field
403 Forbidden // The server received the request but refuses to serve it
404 Not Found // The requested resource does not exist, e.g. because an incorrect URL was entered
500 Internal Server Error // An unexpected error occurred on the server
503 Service Unavailable // The server cannot handle the request at the moment and may recover after some time
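Reading a status line and classifying the code by its first digit can be sketched as follows. The `StatusLine` class is hypothetical and assumes Java 16+ (records and switch expressions). It treats everything after the second space as the Reason-Phrase, since phrases such as `Not Found` contain spaces.

```java
/** Parses an HTTP status line and classifies the status code (class name is hypothetical). */
public final class StatusLine {

    public record Parsed(String version, int code, String reason) {}

    /** Parses "HTTP-Version SP Status-Code SP Reason-Phrase", with or without the trailing CRLF. */
    public static Parsed parse(String line) {
        String s = line.endsWith("\r\n") ? line.substring(0, line.length() - 2) : line;
        // limit 3: the Reason-Phrase itself may contain spaces ("Not Found")
        String[] parts = s.split(" ", 3);
        if (parts.length < 2 || !parts[0].startsWith("HTTP/") || !parts[1].matches("\\d{3}")) {
            throw new IllegalArgumentException("not a status line: " + line);
        }
        String reason = parts.length == 3 ? parts[2] : "";
        return new Parsed(parts[0], Integer.parseInt(parts[1]), reason);
    }

    /** The first digit of the status code selects one of the five classes. */
    public static String category(int code) {
        return switch (code / 100) {
            case 1 -> "Informational";
            case 2 -> "Success";
            case 3 -> "Redirection";
            case 4 -> "Client Error";
            case 5 -> "Server Error";
            default -> throw new IllegalArgumentException("no such status class: " + code);
        };
    }

    public static void main(String[] args) {
        Parsed p = parse("HTTP/1.0 404 Not Found");
        System.out.println(p.code() + " " + category(p.code()) + ": " + p.reason()); // 404 Client Error: Not Found
    }
}
```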
Eg: HTTP/1.1 200 OK (CRLF)
2. Response headers (described later)
3. The response body is the content of the resource returned by the server.
IV. HTTP protocol details: message headers
An HTTP message is either a request from client to server or a response from server to client. Both request and response messages consist of a start line (the request line for a request, the status line for a response), message headers (optional), an empty line (a line containing only CRLF), and a message body (optional).
HTTP message headers include general headers, request headers, response headers, and entity headers.
Each header field consists of a name, a colon (":"), optional whitespace, and the value. Header field names are case-insensitive.
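Because header names are case-insensitive, a map keyed with `String.CASE_INSENSITIVE_ORDER` is a convenient way to store them. This minimal sketch uses a hypothetical `HeaderParser` class. It splits each line at the first colon and trims the value. It also joins repeated fields with ", ", which is an assumption: that is valid for list-valued headers such as Accept, but not for Set-Cookie.

```java
import java.util.Map;
import java.util.TreeMap;

/** Parses a header block into a case-insensitive map (class name is hypothetical). */
public final class HeaderParser {

    /** Parses "Name: value" lines up to the first empty line; lookups ignore name case. */
    public static Map<String, String> parse(String headerBlock) {
        Map<String, String> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        for (String line : headerBlock.split("\r\n")) {
            if (line.isEmpty()) {
                break; // the empty line (CRLF alone) ends the header section
            }
            int colon = line.indexOf(':');
            if (colon <= 0) {
                throw new IllegalArgumentException("malformed header line: " + line);
            }
            String name = line.substring(0, colon);
            String value = line.substring(colon + 1).trim();
            // assumption: repeated fields are combined into one comma-separated value
            headers.merge(name, value, (first, next) -> first + ", " + next);
        }
        return headers;
    }

    public static void main(String[] args) {
        Map<String, String> h = parse("HOST: www.guet.edu.cn\r\nConnection: Keep-Alive\r\n\r\n");
        System.out.println(h.get("host")); // www.guet.edu.cn
    }
}
```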
1. General headers
General headers are a small set of header fields that apply to all request and response messages. They apply only to the message being transmitted, not to the entity being transferred.
Cache-Control specifies caching directives. Caching directives are unidirectional: a directive appearing in a response does not imply it will appear in the request. They are also independent: the caching directives of one message do not affect how caches process another message. The similar header field used in HTTP/1.0 is Pragma.
Request caching directives include: no-cache (a cached copy must not be used without first revalidating it with the origin server), no-store, max-age, max-stale, min-fresh, and only-if-cached.
Response caching directives include: public, private, no-cache, no-store, no-transform, must-revalidate, proxy-revalidate, max-age, and s-maxage.
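The directive lists above are comma-separated, with optional `=value` arguments such as `max-age=3600`. Here is a simple sketch that splits them into a map. The `CacheControl` class is hypothetical. The sketch assumes arguments never contain commas, so quoted lists like `private="a, b"` are not handled correctly.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Splits a Cache-Control value into directives (class name is hypothetical). */
public final class CacheControl {

    /** Maps each directive name (lower-cased) to its argument, or "" when it has none. */
    public static Map<String, String> parse(String headerValue) {
        Map<String, String> directives = new LinkedHashMap<>();
        for (String part : headerValue.split(",")) {
            String d = part.trim();
            if (d.isEmpty()) {
                continue;
            }
            int eq = d.indexOf('=');
            String name = (eq < 0 ? d : d.substring(0, eq)).trim().toLowerCase(Locale.ROOT);
            String arg = eq < 0 ? "" : d.substring(eq + 1).trim();
            // strip surrounding quotes from quoted-string arguments
            if (arg.length() >= 2 && arg.startsWith("\"") && arg.endsWith("\"")) {
                arg = arg.substring(1, arg.length() - 1);
            }
            directives.put(name, arg);
        }
        return directives;
    }

    public static void main(String[] args) {
        Map<String, String> cc = parse("no-cache, max-age=3600");
        System.out.println(cc.containsKey("no-cache")); // true
        System.out.println(cc.get("max-age"));          // 3600
    }
}
```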
For example, to instruct the IE browser (client) not to cache a page, a server-side JSP program can write: response.setHeader("Cache-Control", "no-cache");
// response.setHeader("Pragma", "no-cache"); has the same effect as the code above for HTTP/1.0 clients; usually both are set
This code sets the general header field Cache-Control: no-cache in the response message being sent.
The Date general header field indicates the date and time at which the message was generated.
The Connection general header field lets the sender specify options for the connection. For example, it can specify that the connection is persistent, or specify the "close" option, which notifies the other side that the connection will be closed after the response is complete.
2. Request headers
Request headers let the client pass additional information about the request, and about the client itself, to the server.
Common request headers
The Accept request header field specifies which types of information the client accepts. Eg: Accept: image/gif means the client wants to accept resources in GIF image format; Accept: text/html means the client wants to accept HTML text.
The Accept-Charset request header field specifies the character sets the client accepts. Eg: Accept-Charset: iso-8859-1, gb2312. If this field is absent from the request message, any character set is assumed to be acceptable.
The Accept-Encoding request header field is similar to Accept, but specifies acceptable content encodings. Eg: Accept-Encoding: gzip, deflate. If the request message does not include this field, the server assumes the client accepts any content encoding.
The Accept-Language request header field is similar to Accept, but specifies natural languages. Eg: Accept-Language: zh-cn. If the request message does not include this header field, the server assumes the client accepts all languages.
The Authorization request header field is used to prove that the client is entitled to view a resource. When a browser accesses a page and the server responds with 401 (Unauthorized), the browser can send a request containing the Authorization header field, asking the server to verify it.
Host (this header field is required in every HTTP/1.1 request)
The Host request header field specifies the Internet host and port number of the requested resource. It is usually taken from the HTTP URL. For example:
We enter http://www.guet.edu.cn/index.html in the browser.
The request message sent by the browser then contains the Host request header field:
Host: www.guet.edu.cn
Here the default port 80 is used. If a port number is specified, the field becomes: Host: www.guet.edu.cn:port
When we log on to an online forum, we often see a welcome message that lists the name and version of our operating system and browser, which amazes many people. In fact, the server application gets this information from the User-Agent request header field, which lets the client tell the server its operating system, browser, and other attributes. This header field is not mandatory, however. If we wrote our own browser and did not send User-Agent, the server would not be able to learn this information.
Example of request headers:
GET /form.html HTTP/1.1 (CRLF)
Accept: image/gif, image/x-xbitmap, image/jpeg, application/x-shockwave-flash, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, */* (CRLF)
Accept-Language: zh-cn (CRLF)
Accept-Encoding: gzip, deflate (CRLF)
If-Modified-Since: Wed, 05 Jan 2007 11:21:25 GMT (CRLF)
If-None-Match: W/"80b1a4c018f3c41:8317" (CRLF)
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0) (CRLF)
Host: www.guet.edu.cn (CRLF)
Connection: Keep-Alive (CRLF)
3. Response headers
Response headers let the server pass additional response information that cannot go in the status line, as well as information about the server and about further access to the resource identified by the Request-URI.
Common response headers
The Location response header field redirects the recipient to a new location. It is often used when a domain name changes.
The Server response header field contains information about the software the server uses to handle requests. It is the counterpart of the User-Agent request header field.
For example, the response in the telnet experiment below contains: Server: Microsoft-IIS/5.0
The WWW-Authenticate response header field must be included in every 401 (Unauthorized) response message and tells the client which authentication scheme to use. After receiving the 401 response, the client sends a request containing the Authorization header field for the server to verify.
Eg: WWW-Authenticate: Basic realm="Basic Auth Test!" // the server uses the Basic authentication scheme for the requested resource
4. Entity headers
Both request and response messages can carry an entity. An entity consists of entity header fields and an entity body, but the two do not have to be sent together: the entity header fields can be sent alone. Entity headers define metadata about the entity body (eg: whether there is an entity body) and about the resource identified by the request.
Common entity headers
The Content-Encoding entity header field is used as a modifier of the media type. Its value indicates which additional content codings have been applied to the entity body, and therefore which decoding mechanism must be applied to obtain the media type referenced by the Content-Type header field. Content-Encoding is mainly used to record how the body was compressed, eg: Content-Encoding: gzip
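To see what `Content-Encoding: gzip` means in practice, a body can be compressed and decompressed with the JDK's `java.util.zip` streams. This is a sketch; the class and method names (`GzipContentEncoding`, `encode`, `decode`) are hypothetical. It shows that the receiver must undo the content coding before the bytes match the media type named by Content-Type.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Applies and removes the gzip content coding (names are hypothetical). */
public final class GzipContentEncoding {

    /** What a server does to the body before sending Content-Encoding: gzip. */
    public static byte[] encode(byte[] body) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(body);
        } // closing the stream writes the gzip trailer
        return buffer.toByteArray();
    }

    /** What the client does so the bytes again match the Content-Type media type. */
    public static byte[] decode(byte[] encoded) throws IOException {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(encoded))) {
            return gzip.readAllBytes(); // Java 9+
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] html = "<html><body>hello</body></html>".getBytes(StandardCharsets.UTF_8);
        byte[] wire = encode(html); // bytes as sent with Content-Encoding: gzip
        System.out.println(new String(decode(wire), StandardCharsets.UTF_8));
    }
}
```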
The Content-Language entity header field describes the natural language of the resource. If this field is absent, the entity content is assumed to be intended for readers of all languages. Eg: Content-Language: da
The Content-Length entity header field specifies the length of the entity body in bytes, written as a decimal number.
The Content-Type entity header field specifies the media type of the entity body sent to the recipient. Eg:
Content-Type: text/html; charset=ISO-8859-1
Content-Type: text/html; charset=GB2312
The Last-Modified entity header field indicates the date and time at which the resource was last modified.
The Expires entity header field gives the date and time after which the response expires. When a previously visited page is requested again, it can be loaded directly from the cache, which shortens the response time and reduces the server load. To make a proxy server or browser refresh its cache after a certain time, we can use the Expires entity header field to specify when the page expires. Eg: Expires: Thu, 15 Sep 2006 16:23:12 GMT
HTTP/1.1 clients and caches must treat other invalid date formats (including 0) as already expired. Eg: to prevent the browser from caching a page, we can also use the Expires entity header field and set it to 0. In a JSP program: response.setDateHeader("Expires", 0);
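HTTP dates use a fixed English format such as `Sun, 06 Nov 1994 08:49:37 GMT`. The sketch below formats such a date with `java.time`. It also applies the rule just described: a value that cannot be parsed, such as `0`, counts as already expired. The `ExpiresHeader` class is hypothetical. It only handles the RFC 1123 date format, not the older RFC 850 or asctime formats that HTTP/1.1 recipients are also expected to accept.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

/** Formats HTTP dates and checks Expires values (class name is hypothetical). */
public final class ExpiresHeader {

    // RFC 1123 form used in HTTP headers: English names, two-digit day, always GMT
    private static final DateTimeFormatter HTTP_DATE =
            DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
                    .withZone(ZoneOffset.UTC);

    public static String format(Instant instant) {
        return HTTP_DATE.format(instant);
    }

    /** True if the Expires value is not in the future, or cannot be parsed (e.g. "0"). */
    public static boolean isExpired(String expiresValue, Instant now) {
        try {
            Instant expires = ZonedDateTime.parse(expiresValue.trim(), HTTP_DATE).toInstant();
            return !expires.isAfter(now);
        } catch (DateTimeParseException e) {
            return true; // invalid dates count as already expired
        }
    }

    public static void main(String[] args) {
        System.out.println(format(Instant.parse("1994-11-06T08:49:37Z"))); // Sun, 06 Nov 1994 08:49:37 GMT
        System.out.println(isExpired("0", Instant.now()));                 // true
    }
}
```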
V. Using telnet to observe HTTP communication
Purpose and principle of the experiment:
With the MS telnet tool, you can type an HTTP request by hand and send it to the server. After the server receives, interprets, and accepts the request, it returns a response, which is displayed in the telnet window. This gives a hands-on feel for how HTTP communication works.
1. Start telnet
1.1 Start telnet
Run --> cmd --> telnet
1.2 Turn on telnet local echo
2. Connect to the server and send a request
2.1 open www.guet.edu.cn 80 // note that the port number cannot be omitted
HEAD /index.asp HTTP/1.0
/* To fetch the content of the guet.edu.cn homepage instead, change the request method and enter the following */
open www.guet.edu.cn 80
GET /index.asp HTTP/1.0 // request the resource content
2.2 open www.sina.com.cn 80 // or type telnet www.sina.com.cn 80 directly at the command prompt
HEAD /index.asp HTTP/1.0
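The telnet session above can be reproduced with a plain TCP socket: write the request text, including the empty line that ends the headers, then read the status line. This is a sketch with hypothetical names (`RawHttpClient`, `buildRequest`, `fetchStatusLine`). `example.com` is only a placeholder host. The request adds a `Host` header, which HTTP/1.0 does not require but many virtual-hosted servers need.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Hand-written HTTP over a raw TCP socket, like the telnet experiment (names are hypothetical). */
public final class RawHttpClient {

    /** Builds "Method SP path SP version CRLF", an optional Host line, and the empty line. */
    public static String buildRequest(String method, String path, String version, String host) {
        StringBuilder sb = new StringBuilder();
        sb.append(method).append(' ').append(path).append(' ').append(version).append("\r\n");
        if (host != null) {
            sb.append("Host: ").append(host).append("\r\n");
        }
        sb.append("\r\n"); // the empty line tells the server the headers are complete
        return sb.toString();
    }

    /** Opens a TCP connection, sends the request, and returns the first response line. */
    public static String fetchStatusLine(String host, int port, String request) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            OutputStream out = socket.getOutputStream();
            out.write(request.getBytes(StandardCharsets.US_ASCII));
            out.flush();
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.ISO_8859_1));
            return in.readLine(); // the status line, e.g. "HTTP/1.1 200 OK"
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder host; pass another host name as the first argument.
        String host = args.length > 0 ? args[0] : "example.com";
        String request = buildRequest("HEAD", "/", "HTTP/1.0", host);
        System.out.println(fetchStatusLine(host, 80, request));
    }
}
```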
3. Experiment results:
3.1 The response to request 2.1 is:
HTTP/1.1 200 OK // request successful
Server: Microsoft-IIS/5.0 // web server
Date: Thu, 08 Mar 2007 07:17:51 GMT
Expires: Thu, 08 Mar 2007 07:16:51 GMT
Set-Cookie: ASPSESSIONIDQAQBQQQB=BEJCDGKADEDJKLKKAJEOIMMH; path=/
// Resource content omitted
3.2 The response to request 2.2 is:
HTTP/1.0 404 Not Found // request failed
Date: Thu, 08 Mar 2007 07:50:50 GMT
Server: Apache/2.0.54 (Unix)
Last-Modified: Thu, 30 Nov 2006 11:35:41 GMT
X-Cache: MISS from zjm152-78.sina.com.cn
Via: 1.0 zjm152-78.sina.com.cn:80 (squid/2.6.STABLES-20061207)
X-Cache: MISS from th-143.sina.com.cn
Connection to host lost.
Press any key to continue...
4. Notes: 1. If you make a typing error, the request will not succeed.
2. Header field names are case-insensitive.
3. For more on HTTP, see RFC 2616, available at http://www.ietf.org/rfc.
4. Anyone developing back-end programs must master the HTTP protocol.
5. Press Enter twice after the request line: the empty line tells the server that the request headers are complete.
VI. HTTP-related technical supplements
Higher-level protocols include the File Transfer Protocol (FTP), the Simple Mail Transfer Protocol (SMTP), the Domain Name System (DNS), the Network News Transfer Protocol (NNTP), and HTTP.
There are three kinds of intermediaries: proxies, gateways, and tunnels. A proxy accepts requests in the absolute form of the URI, rewrites all or part of the message, and forwards the formatted request to the server identified by the URI. A gateway is a receiving agent that acts as a layer above some other server and, if necessary, translates requests into the underlying server's protocol. A tunnel acts as a relay point between two connections without changing the messages. Tunnels are often used when communication must pass through an intermediary (such as a firewall) or when the intermediary cannot recognize the content of the messages.
Proxy: an intermediary program that can act as both a server and a client, making requests on behalf of other clients. Requests are passed on to other servers, possibly after translation. A proxy must interpret a request, and rewrite it if necessary, before forwarding it. Proxies are often used as client-side portals through firewalls. A proxy can also serve as a helper application that handles, through the protocol, requests the user agent cannot complete.
Gateway: a server that acts as an intermediary for other servers. Unlike a proxy, a gateway receives requests as if it were the origin server for the requested resource; the requesting client may not be aware that it is dealing with a gateway.
Gateways are often used as server-side portals through firewalls. A gateway can also act as a protocol translator for accessing resources stored on non-HTTP systems.
Tunnel: an intermediary program that acts as a relay between two connections. Once active, a tunnel is not considered part of the HTTP communication, even though it may have been initiated by an HTTP request. The tunnel disappears when both ends of the relayed connection are closed. Tunnels are used when a portal is necessary or when the intermediary cannot interpret the relayed communication.
2. Advantages of protocol analysis: an HTTP analyzer for detecting network attacks
Analyzing and processing high-level protocols in a modular way is likely to be the future direction of intrusion detection.
The common ports used by HTTP and its proxies (80, 3128, and 8080) are specified with the port tag in the network section.
3. HTTP Content-Length restriction vulnerability leading to DoS attacks
When using POST, the client sets Content-Length to declare the length of the data to be sent, for example Content-Length: 999999999. Until the transfer completes, the memory reserved for it is not released. Attackers can exploit this by sending junk data to the web server until its memory is exhausted. This attack leaves almost no trace.
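One common mitigation is to validate the declared length before reserving any buffer for the body. This is a minimal sketch under assumed names: a hypothetical `ContentLengthGuard` class and a caller-chosen `maxBytes` limit. It rejects non-numeric or oversized values. A real server would also need read timeouts, because a client can declare a small length and then send very slowly.

```java
/** Validates a declared Content-Length before any buffer is reserved (names are hypothetical). */
public final class ContentLengthGuard {

    /** True only for a plain decimal length that does not exceed maxBytes. */
    public static boolean isAcceptable(String headerValue, long maxBytes) {
        if (headerValue == null) {
            return false;
        }
        String v = headerValue.trim();
        // 18 digits always fit in a long, so parseLong below cannot overflow
        if (v.isEmpty() || v.length() > 18) {
            return false;
        }
        for (int i = 0; i < v.length(); i++) {
            char c = v.charAt(i);
            if (c < '0' || c > '9') {
                return false; // rejects signs, inner spaces, and non-ASCII digits
            }
        }
        return Long.parseLong(v) <= maxBytes;
    }

    public static void main(String[] args) {
        long limit = 1_048_576; // assumed 1 MiB cap on form bodies
        System.out.println(isAcceptable("22", limit));        // true
        System.out.println(isAcceptable("999999999", limit)); // false
    }
}
```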
4. Ideas for DoS attacks that exploit characteristics of HTTP
The server is busy processing the attacker's forged TCP connection requests and has no time for clients' normal requests (after all, normal requests make up only a small share). From a normal client's point of view, the server has stopped responding. This is called a SYN Flood attack on the server.
Smurf and TearDrop attacks rely on ICMP flooding and IP fragmentation, respectively. This article instead uses "normal connections" to produce a DoS attack.
Port 19 was used early on for Chargen attacks, i.e. Chargen Denial of Service. The method is to set up a UDP connection between two Chargen servers so that they have to process too much data and go down. Killing a web server this way therefore requires two conditions: 1. a Chargen service; 2. an available HTTP service.
Method: the attacker spoofs the source IP address and sends connection requests (Connect) to N Chargen servers. After receiving the connection, each Chargen server sends a character stream of 72 bytes per second to the server (in practice it is faster, depending on network conditions).
5. HTTP fingerprinting
HTTP fingerprinting works on the same principle as TCP/IP stack fingerprinting: it records the small differences in how different servers implement HTTP. HTTP fingerprinting is much more complex than TCP/IP stack fingerprinting. Customizing an HTTP server's configuration files, or adding plug-ins or components, can easily change its HTTP responses, which makes identification difficult. Changing TCP/IP stack behavior, by contrast, requires modifying the kernel, so the stack is easy to identify.
It is very easy to make a server return a different banner. For an open-source HTTP server such as Apache, users can modify the banner in the source code and restart the HTTP service for the change to take effect. For closed-source HTTP servers such as Microsoft's IIS or Netscape's, the banner can be modified in the DLL file that stores it. Articles on this already exist, so I will not go into details here. Such modifications are, of course, quite effective. Another way to obscure the banner is to use plug-ins.
Common test requests:
1: HEAD / HTTP/1.0 sends a basic HTTP request
2: DELETE / HTTP/1.0 sends a request that is not allowed, such as DELETE
3: GET / HTTP/3.0 sends a request with an invalid HTTP version
4: GET / JUNK/1.0 sends a malformed HTTP request
The HTTP fingerprinting tool httprint combines fuzzy logic with statistical techniques to determine the type of HTTP server effectively. It can also be used to collect and analyze the signatures produced by different HTTP servers.
6. Other: to improve performance, modern browsers also support concurrent access. They open multiple connections when browsing a web page so that the many images on the page can be fetched quickly, which completes the transfer of the whole page faster.
HTTP/1.1 provides this persistent connection method, while the proposed next-generation protocol, HTTP-NG, aimed to add support for session control, rich content negotiation and other mechanisms to provide more efficient connections.