From: http://blog.csdn.net/gueter/article/details/1524447
HTTP is an object-oriented protocol belonging to the application layer. Because it is simple and fast, it is well suited to distributed hypermedia information systems. It was proposed in 1990 and has been continuously improved and extended through years of use and development. The WWW currently uses the sixth version of HTTP/1.0, HTTP/1.1 is being standardized, and proposals for HTTP-NG (Next Generation of HTTP) have been put forward.
The main features of HTTP can be summarized as follows:
1. Client/server model.
2. Simple and fast: to request a service, a client only needs to send the request method and path. Common request methods include GET, HEAD, and POST; each method specifies a different type of interaction between client and server. Because HTTP is simple, HTTP server programs are small, so communication is fast.
3. Flexible: HTTP allows any type of data object to be transmitted. The type being transferred is indicated by Content-Type.
4. Connectionless: each connection handles only one request. Once the server has processed the client's request and received the client's acknowledgement, it closes the connection. This saves transmission time.
5. Stateless: HTTP has no memory of previous transactions. If later processing needs earlier information, that information must be retransmitted, which can increase the amount of data sent per connection. On the other hand, when the server does not need earlier information, it responds faster.
I. HTTP Protocol Explained: URL
HTTP (Hypertext Transfer Protocol) is a stateless, application-layer protocol based on a request/response model. It usually runs over TCP connections, and HTTP/1.1 adds a persistent-connection mechanism. Most web development consists of web applications built on HTTP.
An HTTP URL (a URL is a special kind of URI that contains enough information to locate a resource) has the following format:
http://host[":"port][abs_path]
"http" means the network resource is located via HTTP; host is a valid Internet host domain name or IP address; port is a port number, and if it is omitted the default port 80 is used; abs_path gives the URI of the requested resource. If the URL has no abs_path, it must be given as "/" when used as the Request-URI; the browser normally does this automatically.
Examples:
1. Entering www.guet.edu.cn: the browser automatically converts it to http://www.guet.edu.cn/
2. http://192.168.0.116:8080/index.jsp
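The defaulting rules above (port 80 when the port is omitted, "/" when abs_path is empty) can be sketched in Java with the standard java.net.URI class. The class name HttpUrl is hypothetical, and prepending "http://" to a bare host name only imitates what a browser does; it is not part of the URL syntax.

```java
import java.net.URI;

// Hypothetical helper: split an http URL into host, port and abs_path.
final class HttpUrl {
    final String host;
    final int port;
    final String absPath;

    private HttpUrl(String host, int port, String absPath) {
        this.host = host;
        this.port = port;
        this.absPath = absPath;
    }

    static HttpUrl parse(String input) {
        // Imitate the browser: a bare host name gets the http scheme prepended.
        String withScheme = input.contains("://") ? input : "http://" + input;
        URI uri = URI.create(withScheme);
        if (uri.getHost() == null) {
            throw new IllegalArgumentException("no host in: " + input);
        }
        // URI.getPort() returns -1 when the URL does not name a port.
        int port = uri.getPort() == -1 ? 80 : uri.getPort();
        String path = uri.getRawPath();
        // An empty abs_path must be sent as "/" in the Request-URI.
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        if (uri.getRawQuery() != null) {
            path = path + "?" + uri.getRawQuery();
        }
        return new HttpUrl(uri.getHost(), port, path);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"www.guet.edu.cn", "http://192.168.0.116:8080/index.jsp"}) {
            HttpUrl u = parse(s);
            System.out.println(u.host + " | " + u.port + " | " + u.absPath);
        }
    }
}
```

For the two examples above this prints "www.guet.edu.cn | 80 | /" and "192.168.0.116 | 8080 | /index.jsp".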
II. HTTP Protocol Explained: Requests
An HTTP request consists of three parts: request line, message header, and request body.
1. Request line: it begins with a method token, followed by the Request-URI and the protocol version, separated by spaces. The format is:
Method Request-URI HTTP-Version CRLF
Method is the request method; Request-URI is a Uniform Resource Identifier; HTTP-Version is the HTTP version used by the request; CRLF is a carriage return followed by a line feed (apart from the terminating CRLF, standalone CR or LF characters are not allowed).
There are several request methods (all method names are uppercase):
GET      requests the resource identified by Request-URI
POST     appends new data to the resource identified by Request-URI
HEAD     requests the response message headers of the resource identified by Request-URI
PUT      asks the server to store a resource, using Request-URI as its identifier
DELETE   asks the server to delete the resource identified by Request-URI
TRACE    asks the server to echo back the request it received; mainly used for testing or diagnosis
CONNECT  reserved for use with a proxy that can switch to being a tunnel
OPTIONS  asks about the server's capabilities, or about the options and requirements related to a resource
Example of the GET method: when you type a URL into the browser's address bar to visit a page, the browser uses GET to fetch the resource from the server, e.g.:
GET /form.html HTTP/1.1 (CRLF)
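A strict parser for the request-line grammar above makes the rules concrete: three parts separated by single spaces, a CRLF terminator, and an uppercase method. This is a minimal Java sketch; the class name RequestLine is hypothetical, and accepting only the eight methods listed is a simplification (RFC 2616 also permits extension methods).

```java
import java.util.Set;

// Hypothetical strict parser for: Method SP Request-URI SP HTTP-Version CRLF
final class RequestLine {
    // Only the eight methods listed above are accepted; names are case-sensitive.
    static final Set<String> METHODS =
        Set.of("GET", "POST", "HEAD", "PUT", "DELETE", "TRACE", "CONNECT", "OPTIONS");

    final String method;
    final String requestUri;
    final String version;

    private RequestLine(String method, String requestUri, String version) {
        this.method = method;
        this.requestUri = requestUri;
        this.version = version;
    }

    static RequestLine parse(String line) {
        if (!line.endsWith("\r\n")) {
            throw new IllegalArgumentException("request line must end with CRLF");
        }
        String content = line.substring(0, line.length() - 2);
        // A lone CR or LF before the terminating CRLF is not allowed.
        if (content.indexOf('\r') >= 0 || content.indexOf('\n') >= 0) {
            throw new IllegalArgumentException("stray CR or LF");
        }
        // Exactly two single spaces separate the three parts.
        String[] parts = content.split(" ", -1);
        if (parts.length != 3 || parts[1].isEmpty()) {
            throw new IllegalArgumentException("expected Method SP Request-URI SP HTTP-Version");
        }
        if (!METHODS.contains(parts[0])) {
            throw new IllegalArgumentException("unknown method: " + parts[0]);
        }
        if (!parts[2].matches("HTTP/\\d+\\.\\d+")) {
            throw new IllegalArgumentException("bad HTTP-Version: " + parts[2]);
        }
        return new RequestLine(parts[0], parts[1], parts[2]);
    }

    static boolean isValid(String line) {
        try {
            parse(line);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        RequestLine r = parse("GET /form.html HTTP/1.1\r\n");
        System.out.println(r.method + " " + r.requestUri + " " + r.version);
        System.out.println(isValid("get /form.html HTTP/1.1\r\n")); // false: lowercase method
    }
}
```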
The POST method asks the server to accept the data attached to the request; it is often used to submit forms. Example:
POST /reg.jsp HTTP/ (CRLF)
Accept: image/gif, image/x-xbit, ... (CRLF)
...
Host: www.guet.edu.cn (CRLF)
Content-Length: 21 (CRLF)
Connection: Keep-Alive (CRLF)
Cache-Control: no-cache (CRLF)
(CRLF)   // this CRLF marks the end of the message headers; everything before it is header
user=jeffrey&pwd=1234   // the submitted data follows this line
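Content-Length must equal the number of bytes in the body; for the body user=jeffrey&pwd=1234 that is 21. The Java sketch below assembles such a request and computes the length instead of typing it by hand. The class name, the choice of HTTP/1.1, and the Content-Type: application/x-www-form-urlencoded header (the usual type for HTML form data) are assumptions, not part of the example above.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical builder for a form POST like the example above.
final class PostRequestBuilder {
    static String build(String path, String host, String formBody) {
        // Content-Length counts bytes of the encoded body, not Java chars.
        int length = formBody.getBytes(StandardCharsets.UTF_8).length;
        Map<String, String> headers = new LinkedHashMap<>(); // keeps insertion order
        headers.put("Host", host);
        headers.put("Content-Type", "application/x-www-form-urlencoded");
        headers.put("Content-Length", Integer.toString(length));
        headers.put("Connection", "Keep-Alive");
        headers.put("Cache-Control", "no-cache");

        StringBuilder sb = new StringBuilder();
        sb.append("POST ").append(path).append(" HTTP/1.1\r\n");
        for (Map.Entry<String, String> h : headers.entrySet()) {
            sb.append(h.getKey()).append(": ").append(h.getValue()).append("\r\n");
        }
        sb.append("\r\n");   // empty line: end of the message headers
        sb.append(formBody); // request body
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(build("/reg.jsp", "www.guet.edu.cn", "user=jeffrey&pwd=1234"));
    }
}
```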
The HEAD method is almost the same as GET: the HTTP headers in the response to a HEAD request contain the same information as those for a GET request. With this method you can obtain information about the resource identified by Request-URI without transferring the entire resource content. It is often used to test the validity, accessibility, and recent modification of hyperlinks.
2. Request headers (described later)
3. Request body (omitted)
III. HTTP Protocol Explained: Responses
After receiving and interpreting the request message, the server returns an HTTP Response Message.
An HTTP response consists of three parts: the status line, the message headers, and the response body.
1. Status line, in the format:
HTTP-Version Status-Code Reason-Phrase CRLF
HTTP-Version is the HTTP version of the server; Status-Code is the response status code returned by the server; Reason-Phrase is a textual description of the status code.
The status code consists of three digits. The first digit defines the class of response and has five possible values:
1xx (Informational): the request has been received and processing continues
2xx (Success): the request was successfully received, understood, and accepted
3xx (Redirection): further action must be taken to complete the request
4xx (Client Error): the request contains a syntax error or cannot be fulfilled
5xx (Server Error): the server failed to fulfill an apparently valid request
Common status codes, reason phrases, and descriptions:
200 OK                     // the client request succeeded
400 Bad Request            // the request has a syntax error and cannot be understood by the server
401 Unauthorized           // the request is not authorized; this status code must be used together with the WWW-Authenticate header field
403 Forbidden              // the server received the request but refuses to serve it
404 Not Found              // the requested resource does not exist, e.g. a wrong URL was entered
500 Internal Server Error  // the server encountered an unexpected error
503 Service Unavailable    // the server cannot handle the request right now; it may recover after a while
Example: HTTP/1.1 200 OK (CRLF)
2. Response headers (described later)
3. The response body is the content of the resource returned by the server.
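The status-line format and the five classes keyed on the first digit can be sketched in Java as follows. StatusLine is a hypothetical name; the main point the sketch shows is that the line must be split at most twice, because a Reason-Phrase such as "Not Found" contains spaces.

```java
// Hypothetical parser for: HTTP-Version SP Status-Code SP Reason-Phrase CRLF
final class StatusLine {
    final String version;
    final int code;
    final String reason;

    private StatusLine(String version, int code, String reason) {
        this.version = version;
        this.code = code;
        this.reason = reason;
    }

    static StatusLine parse(String line) {
        if (line.endsWith("\r\n")) {
            line = line.substring(0, line.length() - 2);
        }
        // Split at most twice: the Reason-Phrase may itself contain spaces.
        String[] parts = line.split(" ", 3);
        if (parts.length < 2 || !parts[0].startsWith("HTTP/") || !parts[1].matches("\\d{3}")) {
            throw new IllegalArgumentException("bad status line: " + line);
        }
        return new StatusLine(parts[0], Integer.parseInt(parts[1]), parts.length == 3 ? parts[2] : "");
    }

    // The first digit of the status code selects one of the five classes.
    String category() {
        switch (code / 100) {
            case 1: return "Informational";
            case 2: return "Success";
            case 3: return "Redirection";
            case 4: return "Client Error";
            case 5: return "Server Error";
            default: return "Unknown";
        }
    }

    public static void main(String[] args) {
        StatusLine s = parse("HTTP/1.0 404 Not Found\r\n");
        System.out.println(s.code + " " + s.reason + " -> " + s.category()); // 404 Not Found -> Client Error
    }
}
```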
IV. HTTP Protocol Explained: Message Headers
An HTTP message is either a request from client to server or a response from server to client. Both request and response messages consist of a start line (the request line for a request message, the status line for a response message), message headers (optional), an empty line (a line containing only CRLF), and a message body (optional).
HTTP message headers include general headers, request headers, response headers, and entity headers. Each header field consists of a name, a colon (":"), a space, and the field value. Header field names are case-insensitive.
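The message layout above (start line, header fields, empty line, optional body) and the case-insensitive header names can be captured in a short Java sketch. HttpMessage is a hypothetical name; the sketch assumes the whole message is already in memory as a string, and it joins repeated fields with commas, which RFC 2616 allows only for fields whose value is a comma-separated list.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical parser for an HTTP message held entirely in a string.
final class HttpMessage {
    final String startLine;
    // Header names are case-insensitive, so look them up in a case-insensitive map.
    final Map<String, String> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
    final String body;

    HttpMessage(String raw) {
        // The first empty line (CRLF CRLF) ends the header section.
        int end = raw.indexOf("\r\n\r\n");
        String head = end >= 0 ? raw.substring(0, end) : raw;
        body = end >= 0 ? raw.substring(end + 4) : "";
        String[] lines = head.split("\r\n");
        startLine = lines[0];
        for (int i = 1; i < lines.length; i++) {
            // Each field is name ":" value; whitespace around the value is dropped.
            int colon = lines[i].indexOf(':');
            if (colon <= 0) {
                throw new IllegalArgumentException("bad header line: " + lines[i]);
            }
            String name = lines[i].substring(0, colon).trim();
            String value = lines[i].substring(colon + 1).trim();
            headers.merge(name, value, (a, b) -> a + ", " + b);
        }
    }

    public static void main(String[] args) {
        HttpMessage m = new HttpMessage(
            "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nContent-Length: 5\r\n\r\nhello");
        System.out.println(m.startLine);
        System.out.println(m.headers.get("content-type")); // text/html
        System.out.println(m.body);                         // hello
    }
}
```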
1. General headers
A few header fields apply to all request and response messages but not to the transferred entity; they apply only to the message being transferred.
Example: Cache-Control specifies caching directives. Cache directives are one-directional (a directive that appears in a response does not necessarily appear in the request) and independent (the cache directive in one message does not affect the caching of another message). The similar header field used in HTTP/1.0 is Pragma.
Cache directives for requests: no-cache (the request or response message must not be cached), no-store, max-age, max-stale, min-fresh, only-if-cached.
Cache directives for responses: public, private, no-cache, no-store, no-transform, must-revalidate, proxy-revalidate, max-age, s-maxage.
For example, to tell IE (the client) not to cache a page, the server-side JSP program can contain:
response.setHeader("Cache-Control", "no-cache");
// response.setHeader("Pragma", "no-cache"); has the same effect as the line above; usually both are set
This sets the general header field in the outgoing response: Cache-Control: no-cache
Date: this general header field indicates the date and time at which the message was generated.
Connection: this general header field allows the sender to specify options for the connection, for example that the connection should be persistent, or the "close" option, which tells the server to close the connection once the response is complete.
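Because Cache-Control directives are comma-separated tokens, some with an argument (max-age=60), a cache or client has to split the header before acting on it. A minimal Java sketch, assuming no directive argument contains a comma; the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: split a Cache-Control value into directive -> argument (null if none).
final class CacheControl {
    static Map<String, String> parse(String value) {
        Map<String, String> directives = new LinkedHashMap<>();
        // Simplification: splits on every comma, so quoted arguments containing commas break.
        for (String part : value.split(",")) {
            String d = part.trim();
            if (d.isEmpty()) {
                continue;
            }
            int eq = d.indexOf('=');
            // Directive names are case-insensitive; normalize them to lower case.
            String name = (eq < 0 ? d : d.substring(0, eq)).trim().toLowerCase(Locale.ROOT);
            String arg = eq < 0 ? null : d.substring(eq + 1).trim().replace("\"", "");
            directives.put(name, arg);
        }
        return directives;
    }

    // max-age in seconds, or -1 if the directive is absent or has no argument.
    static long maxAge(String value) {
        String v = parse(value).get("max-age");
        return v == null ? -1 : Long.parseLong(v);
    }

    public static void main(String[] args) {
        System.out.println(parse("no-cache"));              // {no-cache=null}
        System.out.println(parse("private, max-age=60"));   // {private=null, max-age=60}
        System.out.println(maxAge("public, max-age=3600")); // 3600
    }
}
```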
2. Request headers
Request headers allow the client to pass additional information about the request, and about the client itself, to the server. Common request headers:
Accept: specifies which types of information the client accepts. E.g. Accept: image/gif means the client wants resources in GIF image format; Accept: text/html means the client wants HTML text.
Accept-Charset: specifies the character sets the client accepts. E.g. Accept-Charset: iso-8859-1,gb2312. If this field is absent from the request, any character set is assumed acceptable.
Accept-Encoding: similar to Accept, but specifies acceptable content encodings. E.g. Accept-Encoding: gzip, deflate. If it is absent, the server assumes the client accepts all content encodings.
Accept-Language: similar to Accept, but specifies natural languages. E.g. Accept-Language: zh-cn. If it is absent, the server assumes the client accepts all languages.
Authorization: mainly used to prove that the client is entitled to view a resource. When a browser accesses a page and the server responds with 401 (Unauthorized), the browser can send a request containing an Authorization header field asking the server to verify it.
Host (required in HTTP/1.1 requests): specifies the Internet host and port number of the requested resource.
It is usually taken from the HTTP URL. For example, if you enter http://www.guet.edu.cn/index.html in the browser, the request message the browser sends will contain the Host request header field:
Host: www.guet.edu.cn
This uses the default port 80; if a port number is specified, it becomes Host: www.guet.edu.cn:<port number>
User-Agent: when we log in to an online forum, we often see a welcome message listing the name and version of our operating system and browser, which many people find amazing. In fact, the server application gets this information from the User-Agent request header field, which lets the client tell the server its operating system, browser, and other attributes. This header field is not required, however: if we wrote our own browser and did not send User-Agent, the server would not be able to learn this information.
Example of request headers:
GET /form.html HTTP/1.1 (CRLF)
Accept: image/gif, image/x-xbitmap, image/jpeg, application/x-shockwave-flash, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, */* (CRLF)
Accept-Language: zh-cn (CRLF)
Accept-Encoding: gzip, deflate (CRLF)
If-Modified-Since: Wed, 05 Jan 2007 11:21:25 GMT (CRLF)
If-None-Match: W/"80b1a4c018f3c41:8317" (CRLF)
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0) (CRLF)
Host: www.guet.edu.cn (CRLF)
Connection: Keep-Alive (CRLF)
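The Host rule (host name alone for the default port 80, host:port otherwise) and the optional User-Agent can be shown in a Java sketch that builds a request like the example above. GetRequest and its methods are hypothetical names, and the Accept and Accept-Language values are only illustrative.

```java
import java.net.URI;

// Hypothetical helper that builds a GET request like the example above.
final class GetRequest {
    // Host carries the host name, plus ":port" only when the URL names a non-default port.
    static String hostHeader(URI url) {
        int port = url.getPort();
        return (port == -1 || port == 80) ? url.getHost() : url.getHost() + ":" + port;
    }

    static String build(String url, String userAgent) {
        URI uri = URI.create(url);
        String path = uri.getRawPath() == null || uri.getRawPath().isEmpty() ? "/" : uri.getRawPath();
        if (uri.getRawQuery() != null) {
            path = path + "?" + uri.getRawQuery();
        }
        StringBuilder sb = new StringBuilder();
        sb.append("GET ").append(path).append(" HTTP/1.1\r\n");
        sb.append("Host: ").append(hostHeader(uri)).append("\r\n");
        sb.append("Accept: */*\r\n");
        sb.append("Accept-Language: zh-cn\r\n");
        // User-Agent is optional; leave it out when no value is given.
        if (userAgent != null) {
            sb.append("User-Agent: ").append(userAgent).append("\r\n");
        }
        sb.append("Connection: Keep-Alive\r\n");
        sb.append("\r\n"); // empty line ends the headers
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(build("http://www.guet.edu.cn:8080/index.html", null));
    }
}
```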
3. Response headers
Response headers allow the server to pass additional response information that cannot go in the status line, as well as information about the server and about further access to the resource identified by Request-URI. Common response headers:
Location: used to redirect the recipient to a new location. It is often used when a domain name changes.
Server: contains information about the software the server uses to handle the request; it corresponds to the User-Agent request header field. Example: Server: Apache-Coyote/1.1
WWW-Authenticate: must be included in every 401 (Unauthorized) response message. After receiving it, the client can send a request with an Authorization header field asking the server to verify it. E.g.: WWW-Authenticate: Basic realm="Basic Auth Test!" // shows that the server uses the basic authentication mechanism for the requested resource
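In the Basic scheme named in the WWW-Authenticate example, the client answers with "Authorization: Basic" followed by the Base64 encoding of user:password. A Java sketch using java.util.Base64; the class name is hypothetical, encoding the credentials as UTF-8 is an assumption (RFC 2617 does not fix a character set), and Base64 is only an encoding, so the credentials are effectively sent in clear text.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for the Basic authentication scheme.
final class BasicAuth {
    private static final Pattern REALM =
        Pattern.compile("realm\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Value for the Authorization request header: "Basic " + base64(user ":" password)
    static String authorization(String user, String password) {
        String credentials = user + ":" + password;
        return "Basic " + Base64.getEncoder().encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    // Extract the realm from a WWW-Authenticate challenge, or null if there is none.
    static String realm(String challenge) {
        Matcher m = REALM.matcher(challenge);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(authorization("Aladdin", "open sesame")); // Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
        System.out.println(realm("Basic realm=\"Basic Auth Test!\"")); // Basic Auth Test!
    }
}
```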
4. Entity headers
Both request and response messages can carry an entity. An entity consists of entity header fields and an entity body, but the two need not be sent together; entity header fields can be sent alone. Entity headers define metadata about the entity body (e.g. whether there is an entity body) and about the resource identified by the request. Common entity headers:
Content-Encoding: used as a modifier of the media type. Its value indicates which additional content encodings have been applied to the entity body, and therefore which decoding mechanisms must be applied to obtain the media type referenced by the Content-Type header field. Content-Encoding is mainly used to record the compression method of a document. E.g.: Content-Encoding: gzip
Content-Language: describes the natural language of the resource. If it is absent, the entity content is assumed to be intended for readers of all languages. E.g.: Content-Language: da
Content-Length: specifies the length of the entity body in bytes, as a decimal number.
Content-Type: indicates the media type of the entity body sent to the recipient. E.g.:
Content-Type: text/html; charset=ISO-8859-1
Content-Type: text/html; charset=GB2312
Last-Modified: indicates the date and time at which the resource was last modified.
Expires: gives the date and time after which the response is considered stale. Caching lets a proxy server or browser load a previously visited page directly from its cache, which shortens response time and reduces server load; to make them refresh the cached copy after a certain time, we can use the Expires entity header field to specify when the page expires.
For example: Expires: Thu, 15 Sep 2006 16:23:12 GMT
HTTP/1.1 clients and caches must treat other invalid date formats (including 0) as already expired. So to prevent the browser from caching a page, we can also use the Expires entity header field to mark the page as already expired; in JSP:
response.setDateHeader("Expires", 0); // time 0, i.e. 1 January 1970, a date in the past
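The HTTP date format and the rule that an invalid value such as 0 counts as already expired can be sketched with java.time. HttpDates is a hypothetical name; parsing relies on the JDK's DateTimeFormatter.RFC_1123_DATE_TIME, which also rejects a weekday that does not match the date (Thu, 15 Sep 2006 in the example above was actually a Friday, so this sketch would treat that value as expired). Applied to the first Telnet response in section V (Date 07:17:51, Expires 07:16:51), the page is already expired when it is sent.

```java
import java.time.DateTimeException;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper for HTTP dates and the Expires rule.
final class HttpDates {
    // Fixed-width RFC 1123 form used in HTTP headers, e.g. "Thu, 08 Mar 2007 07:16:51 GMT".
    private static final DateTimeFormatter FORMAT =
        DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
            .withZone(ZoneOffset.UTC);

    static String format(Instant t) {
        return FORMAT.format(t);
    }

    // Any Expires value that cannot be parsed (including "0") counts as already expired.
    static boolean isExpired(String expires, Instant now) {
        try {
            Instant t = ZonedDateTime.parse(expires.trim(), DateTimeFormatter.RFC_1123_DATE_TIME).toInstant();
            return !now.isBefore(t);
        } catch (DateTimeException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2007-03-08T07:17:51Z");
        System.out.println(format(now));                                     // Thu, 08 Mar 2007 07:17:51 GMT
        System.out.println(isExpired("Thu, 08 Mar 2007 07:16:51 GMT", now)); // true
        System.out.println(isExpired("0", now));                             // true
    }
}
```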
V. Observing HTTP Communication with Telnet
Purpose and principle of the experiment: use the Microsoft Telnet tool to type an HTTP request by hand and send it to the server. After the server receives, interprets, and accepts the request, it returns a response, which is displayed in the Telnet window. This gives a hands-on feel for how HTTP communication works.
Experiment steps:
1. Start Telnet
1.1 Start Telnet: Run --> cmd --> telnet
1.2 Turn on Telnet local echo: set localecho
2. Connect to the server and send a request
2.1 open www.guet.edu.cn 80   // note that the port number cannot be omitted
HEAD /index.asp HTTP/1.0
Host: www.guet.edu.cn
(press Enter twice after the last header; the empty line ends the request)
/* We can change the request method to request the content of the www.guet.edu.cn home page; enter the following: */
open www.guet.edu.cn 80
GET /index.asp HTTP/1.0   // request the resource content
Host: www.guet.edu.cn
2.2 open www.sina.com.cn 80   // or type telnet www.sina.com.cn 80 directly at the command prompt
HEAD /index.asp HTTP/1.0
Host: www.sina.com.cn
3. Experiment results:
3.1 The response to the request in 2.1:
HTTP/1.1 200 OK   // request succeeded
Server: Microsoft-IIS/5.0   // web server
Date: Thu, 08 Mar 2007 07:17:51 GMT
Connection: Keep-Alive
Content-Length: 23330
Content-Type: text/html
Expires: Thu, 08 Mar 2007 07:16:51 GMT
Set-Cookie: aspsessionidqaqbqqqb=bejcdgkadedjklkkajeoimmh; path=/
Cache-Control: private
// Resource content omitted
3.2 The response to the request in 2.2:
HTTP/1.0 404 Not Found   // request failed
Date: Thu, 08 Mar 2007 07:50:50 GMT
Server: Apache/2.0.54 <Unix>
Last-Modified: Thu, 30 Nov 2006 11:35:41 GMT
ETag: "usage"
Accept-Ranges: bytes
X-Powered-By: mod_xlayout_ary/0.0.1vhs.markii.remix
Vary: Accept-Encoding
Content-Type: text/html
X-Cache: Miss from zjm152-78.sina.com.cn: 1.0 zjm152-78.sina.com.cn:80 <squid/2.6.stables-20061207>
X-Cache: Miss from th-143.sina.com.cn
Connection: close
Lost connection to the host
Press any key to continue...
4. Notes:
1. If anything is typed incorrectly, the request will not succeed.
2. Header field names are case-insensitive.
3. For more about HTTP, see RFC 2616, available at http://www.ietf.org/rfc.
4. Anyone developing back-end programs must master the HTTP protocol.
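The Telnet session can also be reproduced in a few lines of Java: open a TCP connection, write the same text, and read until the server closes the connection (which, for an HTTP/1.0 request without Keep-Alive, it normally does after responding). A minimal sketch; RawHttp is a hypothetical name, the 10-second timeouts are an arbitrary choice, and main only contacts a server when a host name is passed on the command line.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the manual Telnet session.
final class RawHttp {
    // The text typed in step 2: request line, Host header, then an empty line.
    static String headRequest(String host, String path) {
        return "HEAD " + path + " HTTP/1.0\r\nHost: " + host + "\r\n\r\n";
    }

    static String send(String host, int port, String request) throws IOException {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), 10_000);
            socket.setSoTimeout(10_000); // do not wait forever for a reply
            OutputStream out = socket.getOutputStream();
            out.write(request.getBytes(StandardCharsets.US_ASCII));
            out.flush();
            // Read until the server closes the connection, as Telnet does.
            byte[] reply = socket.getInputStream().readAllBytes();
            return new String(reply, StandardCharsets.ISO_8859_1);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            // No host given: just show what would be sent.
            System.out.print(headRequest("www.guet.edu.cn", "/index.asp"));
            return;
        }
        System.out.print(send(args[0], 80, headRequest(args[0], "/")));
    }
}
```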
VI. Supplementary Notes on HTTP-Related Technologies
1. Basics
Higher-level protocols include FTP (File Transfer Protocol), SMTP (e-mail transfer), DNS (Domain Name System), NNTP (Network News Transfer Protocol), and HTTP.
There are three kinds of intermediaries: proxy, gateway, and tunnel. A proxy accepts requests in the absolute form of the URI, rewrites all or part of the message, and sends the formatted request to the server identified by the URI. A gateway is a receiving agent that acts as the layer in front of some other servers and, if necessary, translates requests into the protocol of the underlying server. A tunnel acts as a relay point between two connections without changing the messages; tunnels are often used when communication has to pass through an intermediary (such as a firewall), or when the intermediary cannot understand the content of the messages.
Proxy: an intermediate program that can act as both a server and a client, making requests on behalf of other clients. Requests are passed on to other servers, possibly after translation. A proxy must interpret a request before forwarding it and, where possible, rewrite it. Proxies are often used as portals through which clients pass a firewall, and as helper applications that handle, via the protocol, requests a user agent cannot complete.
Gateway: a server that acts as an intermediary for other servers. Unlike a proxy, a gateway accepts requests as if it were the origin server for the requested resource; the client sending the request is unaware that it is talking to a gateway. Gateways are often used as portals for servers behind firewalls, and as protocol translators to access resources stored on non-HTTP systems.
Tunnel: an intermediary program acting as a relay between two connections. Once active, a tunnel is not considered part of the HTTP communication, although it may have been initiated by an HTTP request. The tunnel disappears when both ends of the relayed connection are closed. Tunnels are often used when a portal must exist or when the intermediary cannot interpret the relayed communication.
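The difference between talking to an origin server, to a proxy, and to a proxy that opens a tunnel shows up in the request line: an origin server gets only abs_path, a proxy gets the absolute URI, and a tunnel through a proxy can be requested with the CONNECT method. A Java sketch; ProxyForm and its method names are hypothetical, and port 443 in the usage is just the conventional HTTPS port.

```java
import java.net.URI;

// Hypothetical helper contrasting the three request-line forms.
final class ProxyForm {
    // Request line sent directly to the origin server: only abs_path (plus query).
    static String originRequestLine(String url) {
        URI u = URI.create(url);
        String path = (u.getRawPath() == null || u.getRawPath().isEmpty()) ? "/" : u.getRawPath();
        if (u.getRawQuery() != null) {
            path = path + "?" + u.getRawQuery();
        }
        return "GET " + path + " HTTP/1.1";
    }

    // Request line sent to a proxy: the absolute URI, so the proxy knows which server to contact.
    static String proxyRequestLine(String url) {
        return "GET " + URI.create(url) + " HTTP/1.1";
    }

    // Request line asking a proxy to open a tunnel to host:port.
    static String connectRequestLine(String host, int port) {
        return "CONNECT " + host + ":" + port + " HTTP/1.1";
    }

    public static void main(String[] args) {
        String url = "http://www.guet.edu.cn/index.html";
        System.out.println(originRequestLine(url)); // GET /index.html HTTP/1.1
        System.out.println(proxyRequestLine(url));  // GET http://www.guet.edu.cn/index.html HTTP/1.1
        System.out.println(connectRequestLine("www.guet.edu.cn", 443));
    }
}
```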
2. Advantages of protocol analysis: an HTTP analyzer detects network attacks by analyzing and processing high-level protocols in a modular way, which will be the direction of future intrusion detection. The common ports of HTTP and its proxies, 80, 3128, and 8080, are specified with the port tag in the network section.
3. When using the POST method, the client can set Content-Length to declare the length of the data to be transmitted, e.g. Content-Length: 999999999. The server does not release the memory reserved for the request until the transfer completes, so an attacker can exploit this flaw by continuously sending junk data to the web server until its memory is exhausted. This kind of attack leaves almost no trace. (See http://www.cnpaf.net/Class/HTTP/0532918532667330.html)
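One common defence against the oversized Content-Length attack is to check the declared length against an upper bound before reserving any memory, and to read at most that many bytes. A Java sketch; the 1 MiB limit and the class name are hypothetical, and a real server would also need a read timeout so a client cannot hold the connection open by sending slowly.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical body reader that enforces an upper bound on Content-Length.
final class BoundedBody {
    // Hypothetical limit; pick one that suits the application.
    static final long MAX_BODY_BYTES = 1L << 20;

    static byte[] read(String contentLength, InputStream in) throws IOException {
        long declared;
        try {
            declared = Long.parseLong(contentLength.trim());
        } catch (NumberFormatException e) {
            throw new IOException("invalid Content-Length: " + contentLength);
        }
        // Refuse oversized or negative lengths before allocating a buffer.
        if (declared < 0 || declared > MAX_BODY_BYTES) {
            throw new IOException("Content-Length " + declared + " rejected");
        }
        byte[] body = in.readNBytes((int) declared);
        if (body.length != declared) {
            throw new IOException("body shorter than Content-Length");
        }
        return body;
    }

    // Convenience wrapper for trying the check against an in-memory body.
    static boolean accepts(String contentLength, String data) {
        try {
            read(contentLength, new ByteArrayInputStream(data.getBytes(StandardCharsets.US_ASCII)));
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("21", "user=jeffrey&pwd=1234"));        // true
        System.out.println(accepts("999999999", "user=jeffrey&pwd=1234")); // false
    }
}
```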
4. Some ideas about DoS attacks that exploit characteristics of HTTP
If the server is busy handling an attacker's forged TCP connection requests, it has no time to attend to clients' normal requests (after all, legitimate clients send requests at a very low rate); from a normal client's point of view, the server has stopped responding. This is called a SYN flood attack on the server. Smurf floods a target with ICMP packets, and Teardrop attacks with IP fragments. This article instead uses "normal connections" to produce a DoS attack. Port 19 was used early on for chargen attacks (Chargen_Denial_of_Service): a UDP connection is set up between two chargen servers, so that they have to process too much data and go down. Taking down a web server this way therefore requires two conditions: 1. a chargen service; 2. an HTTP service. Method: the attacker forges the source IP address (that of the web server) and sends connect requests to N chargen servers. After receiving a connection, each chargen server returns a character stream of 72 bytes per second (in practice faster, depending on network conditions) to the server.
5. HTTP fingerprinting
HTTP fingerprinting works on the same principle as other fingerprinting: record the subtle differences in how different servers implement HTTP. It is much more complex than TCP/IP stack fingerprinting, because customizing HTTP server configuration files or adding plug-ins or components can easily change HTTP responses, which makes identification difficult; customizing TCP/IP stack behavior, by contrast, requires modifications at the kernel level, so the stack is easy to identify.
Making a server return a different banner is very easy. For an open-source HTTP server such as Apache, users can modify the banner in the source code and restart the HTTP service for it to take effect. For closed-source HTTP servers such as Microsoft IIS or Netscape, the banner can be modified in the DLL file that stores it; related articles discuss this, so I will not go into detail here. Such modifications are, of course, still very effective. Another way to obscure the banner is to use plug-ins.
Common test requests:
1: HEAD / HTTP/1.0      sends a basic HTTP request
2: DELETE / HTTP/1.0    sends a request that is not allowed, such as DELETE
3: GET / HTTP/3.0       sends a request with an invalid HTTP version
4: GET / JUNK/1.0       sends a malformed HTTP request
The HTTP fingerprinting tool httprint uses statistical principles and fuzzy logic to determine the type of HTTP server effectively; it can be used to collect and analyze the signatures produced by different HTTP servers.
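A fingerprinting tool sends probes like the four above and compares the status lines and headers each server returns. The Java sketch below only builds those probes and extracts two simple features (the status code and the Server banner); it is not how httprint works internally, and the class and method names are hypothetical.

```java
import java.util.List;

// Hypothetical helper: the four probes and two simple response features.
final class FingerprintProbes {
    // Each probe is followed by the empty line that ends the headers.
    static List<String> probes() {
        return List.of(
            "HEAD / HTTP/1.0\r\n\r\n",   // basic request
            "DELETE / HTTP/1.0\r\n\r\n", // method the server should refuse
            "GET / HTTP/3.0\r\n\r\n",    // version the server does not support
            "GET / JUNK/1.0\r\n\r\n");   // malformed protocol name
    }

    // The Server banner is the easiest signal to read, and also the easiest to fake.
    static String serverBanner(String rawResponse) {
        String[] lines = rawResponse.split("\r\n");
        for (int i = 1; i < lines.length; i++) { // line 0 is the status line
            if (lines[i].isEmpty()) {
                break; // end of headers
            }
            int colon = lines[i].indexOf(':');
            if (colon > 0 && lines[i].substring(0, colon).trim().equalsIgnoreCase("Server")) {
                return lines[i].substring(colon + 1).trim();
            }
        }
        return null;
    }

    // The status code each probe provokes is one component of the fingerprint.
    static String statusCode(String rawResponse) {
        String[] parts = rawResponse.split(" ", 3);
        return parts.length >= 2 ? parts[1] : null;
    }

    public static void main(String[] args) {
        probes().forEach(p -> System.out.print(p.replace("\r\n", "\\r\\n") + "\n"));
        String reply = "HTTP/1.0 404 Not Found\r\nServer: Apache/2.0.54 <Unix>\r\n\r\n";
        System.out.println(statusCode(reply) + " " + serverBanner(reply));
    }
}
```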
6. Others: to improve performance, modern browsers also support concurrent access: several connections are opened while a web page is being loaded so that the many images on it can be fetched quickly, which completes the transfer of the whole page faster. HTTP/1.1 provides this kind of persistent connection, and the next-generation protocol HTTP-NG adds support for session control, rich content negotiation, and other mechanisms to provide more efficient connections.