Introduction to the HTTP Protocol
HTTP is an object-oriented, application-layer protocol whose simplicity and speed make it well suited to distributed hypermedia information systems. It was proposed in 1990 and has been continuously refined and extended through years of use. At the time of writing, the WWW used the sixth revision of HTTP/1.0, standardization of HTTP/1.1 was in progress, and HTTP-NG (Next Generation of HTTP) had been proposed.
The main features of the HTTP protocol can be summarized as follows:
1. Supports the client/server model.
2. Simple and fast: When a client requests a service from the server, it only needs to send the request method and path. Commonly used request methods are GET, HEAD, and POST. Each method specifies a different type of interaction between the client and the server. Because the protocol is simple, HTTP server programs are small and communication is fast.
3. Flexible: HTTP allows any type of data object to be transferred. The type being transferred is marked by Content-Type.
4. Connectionless: Connectionless here means that each connection handles only one request. Once the server has processed the client's request and received the client's acknowledgement, the connection is closed. This saves transmission time.
5. Stateless: HTTP is a stateless protocol. Stateless means that the protocol has no memory of earlier transactions. If later processing needs earlier information, that information must be retransmitted, which may increase the amount of data transferred per connection. On the other hand, the server responds faster when it does not need the earlier information.
I. HTTP Protocol in Detail: URLs
HTTP (Hypertext Transfer Protocol) is a stateless, application-layer protocol based on the request/response model. It usually runs over a TCP connection, and HTTP 1.1 provides a persistent-connection mechanism. The vast majority of web development consists of web applications built on top of HTTP.
An HTTP URL (a URL is a special type of URI that contains enough information to locate a resource) has the following format:
http://host[":"port][abs_path]
Here http means that the network resource is located through the HTTP protocol; host is a valid Internet host domain name or IP address; port specifies a port number, and if it is empty the default port 80 is used; abs_path specifies the URI of the requested resource. If abs_path is not given in the URL, it must be given as "/" when the URL is used as the request URI; the browser usually does this automatically.
For example:
1. Input: www.guet.edu.cn; the browser automatically converts it to: http://www.guet.edu.cn/
2. http://192.168.0.116:8080/index.jsp
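The defaulting rules described above (port 80 when no port is given, "/" when no abs_path is given) can be sketched in a few lines of Python. This is only an illustration using the standard urllib.parse module; the function name split_http_url is made up for this example, and the sketch ignores any query string.

```python
from urllib.parse import urlsplit

DEFAULT_HTTP_PORT = 80  # default port for the http scheme


def split_http_url(url: str) -> tuple[str, int, str]:
    """Return (host, port, abs_path) for an http URL, applying the defaults."""
    parts = urlsplit(url)
    if parts.scheme != "http" or parts.hostname is None:
        raise ValueError(f"not an http URL: {url!r}")
    # An omitted port falls back to 80; an omitted path becomes "/".
    port = parts.port if parts.port is not None else DEFAULT_HTTP_PORT
    abs_path = parts.path or "/"
    return parts.hostname, port, abs_path


print(split_http_url("http://www.guet.edu.cn"))
print(split_http_url("http://192.168.0.116:8080/index.jsp"))
```

The two calls use the two example URLs from the text, so the first shows both defaults being applied and the second shows an explicit port and path being kept.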
II. HTTP Protocol in Detail: Requests
An HTTP request consists of three parts: the request line, the message headers, and the request body.
1. The request line begins with a method token, followed by the request URI and the protocol version, each separated by a space.
The format is as follows: Method Request-URI HTTP-Version CRLF
Here Method is the request method; Request-URI is a uniform resource identifier; HTTP-Version is the HTTP protocol version of the request; CRLF is a carriage return followed by a line feed (apart from the terminating CRLF, no standalone CR or LF character is allowed).
There are several request methods (all uppercase), interpreted as follows:
GET requests the resource identified by Request-URI
POST appends new data to the resource identified by Request-URI
HEAD requests the response message headers for the resource identified by Request-URI
PUT asks the server to store a resource, using Request-URI as its identifier
DELETE asks the server to delete the resource identified by Request-URI
TRACE asks the server to echo back the request it received, primarily for testing or diagnostics
CONNECT is reserved for future use
OPTIONS queries the server's capabilities, or the options and requirements associated with a resource
Application examples:
GET method: When you access a web page by entering a URL in the browser's address bar, the browser uses the GET method to fetch the resource from the server, e.g.: GET /form.html HTTP/1.1 (CRLF)
POST method: The POST method asks the server to accept the data attached to the request, and is often used to submit forms, e.g.:
POST /reg.jsp HTTP/ (CRLF)
Accept: image/gif, image/x-xbit, ... (CRLF)
...
Host: www.guet.edu.cn (CRLF)
Content-Length: 22 (CRLF)
Connection: Keep-Alive (CRLF)
Cache-Control: no-cache (CRLF)
(CRLF) // this CRLF marks the end of the message headers; everything before it is message headers
user=jeffrey&pwd=1234 // this line is the submitted data
HEAD method: The HEAD method is almost identical to GET. The HTTP headers in the response to a HEAD request carry the same information as those returned for a GET request. With this method, information about the resource identified by Request-URI can be obtained without transferring the entire resource content. It is commonly used to test whether a hyperlink is valid, whether it is accessible, and whether it has been updated recently.
2. Request headers are described later.
3. Request body (omitted).
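As a rough illustration of the three-part layout described in this section (request line, headers, empty line, body), the following Python sketch serializes a request by hand. The helper name build_request is invented for this example, and HTTP/1.1 is an assumed version string; Content-Length is computed from the body rather than copied from the example above.

```python
CRLF = "\r\n"


def build_request(method: str, request_uri: str, version: str,
                  headers: dict[str, str], body: bytes = b"") -> bytes:
    """Serialize a request as: request line, headers, empty line, body."""
    if any(ch in request_uri for ch in " \r\n"):
        raise ValueError("Request-URI must not contain spaces, CR, or LF")
    # Request line: Method SP Request-URI SP HTTP-Version
    lines = [f"{method.upper()} {request_uri} {version}"]
    # Each header field: name ":" SP value
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # The empty line (a bare CRLF) ends the header section.
    head = CRLF.join(lines) + CRLF + CRLF
    return head.encode("ascii") + body


form = b"user=jeffrey&pwd=1234"
raw = build_request(
    "POST", "/reg.jsp", "HTTP/1.1",  # the version is an assumption
    {"Host": "www.guet.edu.cn", "Content-Length": str(len(form))},
    form,
)
print(raw.decode("ascii"))
```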
III. HTTP Protocol in Detail: Responses
After receiving and interpreting a request message, the server returns an HTTP response message.
An HTTP response also consists of three parts: the status line, the message headers, and the response body.
1. The status line has the following format:
HTTP-Version Status-Code Reason-Phrase CRLF
Here HTTP-Version is the HTTP protocol version of the server; Status-Code is the response status code sent back by the server; Reason-Phrase is a textual description of the status code.
The status code consists of three digits. The first digit defines the class of the response and has five possible values:
1xx: Informational - the request has been received and processing continues
2xx: Success - the request has been successfully received, understood, and accepted
3xx: Redirection - further action must be taken to complete the request
4xx: Client Error - the request has a syntax error or cannot be fulfilled
5xx: Server Error - the server failed to fulfil a valid request
Common status codes, reason phrases, and explanations:
200 OK // the client request succeeded
400 Bad Request // the client request has a syntax error and cannot be understood by the server
401 Unauthorized // the request is unauthorized; this status code must be used together with the WWW-Authenticate header field
403 Forbidden // the server received the request but refuses to serve it
404 Not Found // the requested resource does not exist, e.g. a wrong URL was entered
500 Internal Server Error // an unexpected error occurred on the server
503 Service Unavailable // the server cannot currently process the client's request, and may recover after some time
e.g.: HTTP/1.1 200 OK (CRLF)
2. Response headers are described later.
3. The response body is the content of the resource returned by the server.
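The status-line format and the five response classes can be made concrete with a small parser. This is a minimal sketch, not a full implementation; the names STATUS_CLASSES and parse_status_line are invented for this example.

```python
STATUS_CLASSES = {
    "1": "Informational",
    "2": "Success",
    "3": "Redirection",
    "4": "Client Error",
    "5": "Server Error",
}


def parse_status_line(line: str) -> tuple[str, int, str, str]:
    """Split 'HTTP-Version Status-Code Reason-Phrase' and classify the code."""
    parts = line.rstrip("\r\n").split(" ", 2)
    if len(parts) < 2:
        raise ValueError(f"malformed status line: {line!r}")
    version, code = parts[0], parts[1]
    reason = parts[2] if len(parts) == 3 else ""
    # The status code is exactly three digits; the first digit gives the class.
    valid = len(code) == 3 and code.isascii() and code.isdigit()
    if not valid or code[0] not in STATUS_CLASSES:
        raise ValueError(f"invalid status code: {code!r}")
    return version, int(code), reason, STATUS_CLASSES[code[0]]


print(parse_status_line("HTTP/1.1 200 OK\r\n"))
print(parse_status_line("HTTP/1.0 404 Not Found"))
```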
IV. HTTP Protocol in Detail: Message Headers
HTTP messages consist of requests from client to server and responses from server to client. Both request and response messages are composed of a start line (the request line for a request message, the status line for a response message), message headers (optional), an empty line (a line containing only CRLF), and a message body (optional).
HTTP message headers include general headers, request headers, response headers, and entity headers. Each header field consists of a name + ":" + a space + a value, and header field names are case-insensitive.
1. General headers
A small number of general header fields apply to all request and response messages. They do not describe the entity being transferred, only the message being transmitted.
For example:
Cache-Control is used to specify caching directives. Caching directives are unidirectional (a directive that appears in a response need not appear in the request) and independent (a directive in one message does not affect how the caching mechanism handles another message). HTTP 1.0 uses the similar Pragma header field.
Caching directives in a request include: no-cache (indicates that the request or response message must not be cached), no-store, max-age, max-stale, min-fresh, only-if-cached.
Caching directives in a response include: public, private, no-cache, no-store, no-transform, must-revalidate, proxy-revalidate, max-age, s-maxage.
e.g.: To tell the IE browser (the client) not to cache a page, the server-side JSP program can be written as follows:
response.setHeader("Cache-Control", "no-cache");
// response.setHeader("Pragma", "no-cache"); has the same effect as the line above; the two are usually used together
This code sets the following general header field in the response message that is sent: Cache-Control: no-cache
The Date general header field indicates the date and time at which the message was generated.
The Connection general header field allows options to be specified for a particular connection, for example whether the connection is persistent, or the "close" option, which tells the server to close the connection after the response is complete.
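To show how a Cache-Control value breaks down into individual directives, here is a small Python sketch. The function name parse_cache_control is made up for this example, and the parsing is deliberately simplified: it splits on commas and so does not handle quoted values that themselves contain commas.

```python
from typing import Optional


def parse_cache_control(value: str) -> dict[str, Optional[str]]:
    """Map each directive name (lowercased) to its argument, or None."""
    directives: dict[str, Optional[str]] = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        # Directives are either bare tokens (no-cache) or name=value (max-age=600).
        name, sep, arg = item.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


print(parse_cache_control("private, max-age=600"))
print(parse_cache_control("no-cache"))
```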
2. Request headers
Request headers allow the client to pass additional information about the request, and about the client itself, to the server.
Common request headers
Accept
The Accept request header field specifies which types of content the client accepts. e.g.: Accept: image/gif indicates that the client wants to receive resources in GIF image format; Accept: text/html indicates that the client wants to receive HTML text.
Accept-Charset
The Accept-Charset request header field specifies the character sets the client accepts. e.g.: Accept-Charset: iso-8859-1, gb2312. If this field is not set in the request message, any character set is accepted by default.
Accept-Encoding
The Accept-Encoding request header field is similar to Accept, but specifies the acceptable content encodings. e.g.: Accept-Encoding: gzip, deflate. If this field is not set in the request message, the server assumes that the client accepts all content encodings.
Accept-Language
The Accept-Language request header field is similar to Accept, but specifies a natural language. e.g.: Accept-Language: zh-cn. If this field is not set in the request message, the server assumes that the client accepts all languages.
Authorization
The Authorization request header field is mainly used to prove that the client has permission to view a resource. When a browser accesses a page and receives a 401 (Unauthorized) response code from the server, it can send a request containing the Authorization request header field, asking the server to verify it.
Host (this header field is required when sending a request)
The Host request header field specifies the Internet host and port number of the requested resource; it is usually extracted from the HTTP URL. e.g.: if we enter http://www.guet.edu.cn/index.html in the browser, the request message sent by the browser includes the Host request header field, as follows:
Host: www.guet.edu.cn
Here the default port number 80 is used. If a port number is specified, it becomes: Host: www.guet.edu.cn:(specified port number)
User-Agent
When we visit forums online, we often see a welcome message that lists the name and version of our operating system and of the browser we are using, which many people find surprising. In fact, the server application obtains this information from the User-Agent request header field. The User-Agent request header field lets the client tell the server its operating system, browser, and other properties. This header field is not required, however: if we write our own browser and do not send the User-Agent request header field, the server has no way of obtaining this information.
An example of request headers:
GET /form.html HTTP/1.1 (CRLF)
Accept: image/gif, image/x-xbitmap, image/jpeg, application/x-shockwave-flash, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, */* (CRLF)
Accept-Language: zh-cn (CRLF)
Accept-Encoding: gzip, deflate (CRLF)
If-Modified-Since: Wed, 05 Jan 11:21:25 GMT (CRLF)
If-None-Match: W/"80b1a4c018f3c41:8317" (CRLF)
User-Agent: Mozilla/4.0 (compatible; MSIE6.0; Windows NT 5.0) (CRLF)
Host: www.guet.edu.cn (CRLF)
Connection: Keep-Alive (CRLF)
(CRLF)
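A header block like the one above can be split into a request line and a set of name/value pairs. The Python sketch below is one simple way to do it, assuming the whole header section is already in memory; the function name parse_request_head is invented for this example. Header names are lowercased on the way in because they are case-insensitive.

```python
def parse_request_head(raw: bytes) -> tuple[str, dict[str, str]]:
    """Return the request line and a dict of headers keyed by lowercase name."""
    # Everything before the first empty line (CRLF CRLF) is the header section.
    head, _, _body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    request_line = lines[0]
    headers: dict[str, str] = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed header line: {line!r}")
        headers[name.strip().lower()] = value.strip()
    return request_line, headers


sample = (b"GET /form.html HTTP/1.1\r\n"
          b"ACCEPT-LANGUAGE: zh-cn\r\n"
          b"Host: www.guet.edu.cn\r\n"
          b"Connection: Keep-Alive\r\n"
          b"\r\n")
request_line, headers = parse_request_head(sample)
print(request_line)
print(headers["accept-language"], headers["host"])
```

Splitting on the first ":" only keeps a Host value that carries a port number intact.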
3. Response headers
Response headers allow the server to pass additional response information that cannot be placed in the status line, as well as information about the server and about further access to the resource identified by Request-URI.
Common response headers
Location
The Location response header field redirects the recipient to a new location. It is commonly used when a domain name changes.
Server
The Server response header field contains information about the software the server uses to process the request. It corresponds to the User-Agent request header field. Below is an example of the Server response header field:
Server: Apache-Coyote/1.1
WWW-Authenticate
The WWW-Authenticate response header field must be included in a 401 (Unauthorized) response message. When the client receives a 401 response message and sends a request with the Authorization header field asking the server to verify it, the server's response headers contain this header field.
e.g.: WWW-Authenticate: Basic realm="Basic Auth test!" // shows that the server uses the Basic authentication mechanism for the requested resource.
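For the Basic mechanism mentioned in the example, the client answers a 401 challenge with an Authorization header whose value is "Basic " followed by the base64 encoding of "username:password". The sketch below shows that encoding step only; the function name basic_authorization is invented, and the credentials are the illustrative ones from the POST example earlier in this article.

```python
import base64


def basic_authorization(username: str, password: str) -> str:
    """Build the value of an Authorization header for the Basic scheme."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return f"Basic {token}"


header_value = basic_authorization("jeffrey", "1234")
print("Authorization:", header_value)
```

Note that base64 is an encoding, not encryption, so Basic credentials are readable by anyone who can see the request.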
4. Entity headers
Both request and response messages can carry an entity. An entity consists of entity header fields and an entity body, but this does not mean that the two must be sent together; the entity header fields can be sent on their own. Entity headers define meta-information about the entity body (e.g. whether an entity body is present) and about the resource identified by the request.
Common entity headers
Content-Encoding
The Content-Encoding entity header field is used as a modifier of the media type. Its value indicates which additional content codings have been applied to the entity body, and therefore which decoding mechanisms must be applied to obtain the media type referenced in the Content-Type header field. Content-Encoding is used to record how a document was compressed, e.g.: Content-Encoding: gzip
Content-Language
The Content-Language entity header field describes the natural language used by the resource. If this field is not set, the entity content is assumed to be intended for readers of all languages. e.g.: Content-Language: da
Content-Length
The Content-Length entity header field indicates the length of the entity body, expressed as a decimal number of bytes.
Content-Type
The Content-Type entity header field indicates the media type of the entity body sent to the recipient. e.g.:
Content-Type: text/html;charset=ISO-8859-1
Content-Type: text/html;charset=GB2312
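Because Content-Length counts bytes rather than characters, its value depends on the charset named in Content-Type. The following Python sketch illustrates this; the helper name entity_headers is invented for this example, and text/html is just the media type used in the examples above.

```python
def entity_headers(text: str, charset: str,
                   media_type: str = "text/html") -> tuple[dict[str, str], bytes]:
    """Encode the body with the given charset and derive the entity headers."""
    body = text.encode(charset)
    headers = {
        "Content-Type": f"{media_type};charset={charset}",
        # Length in bytes of the encoded body, not the number of characters.
        "Content-Length": str(len(body)),
    }
    return headers, body


gb_headers, _ = entity_headers("你好", "gb2312")
utf8_headers, _ = entity_headers("你好", "utf-8")
print(gb_headers)
print(utf8_headers)
```

The same two-character string yields different Content-Length values under the two charsets.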
Last-Modified
The Last-Modified entity header field indicates the date and time at which the resource was last modified.
Expires
The Expires entity header field gives the date and time at which the response expires. So that a proxy server or browser refreshes its cache after a period of time (when a previously visited page is visited again it is loaded directly from the cache, which shortens response time and reduces server load), we can use the Expires entity header field to specify when the page expires. e.g.: Expires: Thu, 15 Sep 2006 16:23:12 GMT
HTTP 1.1 clients and caches must treat other invalid date formats (including 0) as already expired. e.g.: to stop the browser from caching a page, we can also use the Expires entity header field and set it to 0. In a JSP program this is written as follows: response.setDateHeader("Expires", 0);
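The rule that an invalid Expires value (including 0) counts as already expired can be sketched as follows. This is an illustration built on Python's standard email.utils date helpers, not the logic of any particular browser or cache; the function names expires_header and is_expired are invented for this example.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def expires_header(when: datetime) -> str:
    """Format a datetime as an HTTP date such as '... 16:23:12 GMT'."""
    return format_datetime(when.astimezone(timezone.utc), usegmt=True)


def is_expired(expires_value: str, now: datetime) -> bool:
    """Treat unparseable values such as '0' as already expired."""
    try:
        expires = parsedate_to_datetime(expires_value)
    except (TypeError, ValueError):
        return True
    if expires.tzinfo is None:
        # No usable zone information: assume GMT, as HTTP dates require.
        expires = expires.replace(tzinfo=timezone.utc)
    return expires <= now


now = datetime(2007, 1, 1, tzinfo=timezone.utc)  # fixed clock for the demo
print(is_expired("0", now))
print(is_expired("Thu, 15 Sep 2006 16:23:12 GMT", now))
print(expires_header(now))
```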
V. Using Telnet to Observe the HTTP Communication Process
Purpose and principle of the experiment:
Using Microsoft's Telnet tool, we send a request to the server by typing the HTTP request message by hand. The server receives, interprets, and accepts the request and returns a response, which is displayed in the Telnet window, giving a more concrete feel for how HTTP communication works.
Experimental steps:
1. Start Telnet
1.1 Open Telnet: Run --> cmd --> telnet
1.2 Turn on Telnet's local echo: set localecho
2. Connect to the server and send the request
2.1 open www.guet.edu.cn 80 // note that the port number cannot be omitted
HEAD /index.asp HTTP/1.0
Host: www.guet.edu.cn
/* We can change the request method to request the content of the Guilin Electronic home page; enter the message as follows */
open www.guet.edu.cn 80
GET /index.asp HTTP/1.0 // request the content of the resource
Host: www.guet.edu.cn
2.2 open www.sina.com.cn 80 // or enter telnet www.sina.com.cn 80 directly at the command prompt
HEAD /index.asp HTTP/1.0
Host: www.sina.com.cn
3. Experimental results:
3.1 The response to the request in 2.1:
HTTP/1.1 200 OK // the request succeeded
Server: Microsoft-IIS/5.0 // web server
Date: Thu, 08 Mar 2007 07:17:51 GMT
Connection: Keep-Alive
Content-Length: 23330
Content-Type: text/html
Expires: Thu, 08 Mar 07:16:51 GMT
Set-Cookie: aspsessionidqaqbqqqb=bejcdgkadedjklkkajeoimmh; path=/
Cache-Control: private
(resource content omitted)
3.2 The response to the request in 2.2:
HTTP/1.0 404 Not Found // the request failed
Date: Thu, Mar 07:50:50 GMT
Server: Apache/2.0.54 <Unix>
Last-Modified: Thu, 2006 11:35:41 GMT
ETag: "6277a-415-e7c76980"
Accept-Ranges: bytes
X-Powered-By: mod_xlayout_jh/0.0.1vhs.markii.remix
Vary: Accept-Encoding
Content-Type: text/html
X-Cache: miss from zjm152-78.sina.com.cn
Via: 1.0 zjm152-78.sina.com.cn:80<squid/2.6.stables-20061207>
X-Cache: miss from th-143.sina.com.cn
Connection: close
Connection to host lost.
Press any key to continue ...
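The same experiment can be reproduced without Telnet by opening a TCP socket and writing the request by hand. The Python sketch below is an illustration only: the function names are invented for this example, it assumes the server closes the connection after an HTTP/1.0 response, and nothing is sent over the network until head() is actually called.

```python
import socket


def build_head_request(host: str, path: str = "/") -> bytes:
    """The same two lines typed into Telnet, ended by an empty line."""
    return f"HEAD {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")


def split_response_head(raw: bytes) -> tuple[str, list[tuple[str, str]]]:
    """Return the status line and the headers in the order they arrived."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    status_line, *header_lines = head.split("\r\n")
    headers = []
    for line in header_lines:
        name, _, value = line.partition(":")
        headers.append((name.strip(), value.strip()))
    return status_line, headers


def head(host: str, port: int = 80, path: str = "/", timeout: float = 10.0):
    """Connect, send a HEAD request, and read until the server closes."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(build_head_request(host, path))
        chunks = []
        while chunk := conn.recv(4096):
            chunks.append(chunk)
    return split_response_head(b"".join(chunks))
```

For instance, head("www.guet.edu.cn", 80, "/index.asp") would correspond to step 2.1, provided that host is reachable.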
4. Notes:
1. If an input error occurs, the request will not succeed.
2. Header field names are not case-sensitive.
3. For a deeper understanding of the HTTP protocol, see RFC 2616, which can be found at http://www.ietf.org/rfc.
4. Anyone developing back-end programs must master the HTTP protocol.
VI. Supplementary Notes on HTTP-Related Technology
1. Basics:
High-level protocols include the File Transfer Protocol (FTP), the Simple Mail Transfer Protocol (SMTP), the Domain Name System (DNS), the Network News Transfer Protocol (NNTP), and HTTP.
There are three kinds of intermediaries: proxies, gateways, and tunnels. A proxy accepts requests in the absolute-URI form, rewrites all or part of the message, and forwards the reformatted request to the server identified by the URI. A gateway is a receiving agent that acts as a layer above some other servers and, if necessary, translates the request into the underlying server's protocol. A tunnel acts as a relay point between two connections without changing the messages. Tunnels are often used when communication must pass through an intermediary (for example, a firewall) or when the intermediary cannot understand the content of the messages.
Proxy: An intermediary program that acts as both a server and a client, making requests on behalf of other clients. Requests are either serviced internally or passed on, possibly after translation, to other servers. A proxy must interpret a request message and, if necessary, rewrite it before forwarding it. Proxies are often used as client-side portals through firewalls, and can also serve as helper applications that handle requests the user agent cannot complete over the protocol itself.
Gateway: A server that acts as an intermediary for other servers. Unlike a proxy, a gateway receives requests as if it were the origin server for the requested resource; the requesting client is unaware that it is communicating with a gateway. Gateways are often used as server-side portals through firewalls, and can also serve as protocol translators for accessing resources stored in non-HTTP systems.
Tunnel: An intermediary program that acts as a relay between two connections. Once active, a tunnel is not considered part of the HTTP communication, although it may have been initiated by an HTTP request. The tunnel ceases to exist when both ends of the relayed connection are closed. Tunnels are often used when a portal must exist and the intermediary cannot interpret the relayed traffic.
2. Advantages of protocol analysis: an HTTP analyzer detects network attacks
Analyzing and processing high-level protocols in a modular way will be the direction of future intrusion detection. The common ports 80, 3128, and 8080 used by HTTP and its proxies are specified in the network section with the port tag.
3. HTTP Content-Length limit vulnerability leading to denial-of-service attacks
When the POST method is used, Content-Length can be set to define the length of the data to be transferred, for example Content-Length: 999999999. The memory is not released until the transfer is complete. An attacker can exploit this flaw to keep sending junk data to the web server until the server runs out of memory. This method of attack leaves no trace. http://www.cnpaf.net/Class/HTTP/0532918532667330.html
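One common defence against the oversized Content-Length problem described above is to check the declared length before reading or buffering any body data. The sketch below is an assumption-laden illustration, not the behaviour of any specific server: the limit MAX_BODY_BYTES and the function name check_content_length are invented, and headers are assumed to be stored under lowercase names.

```python
from typing import Optional

MAX_BODY_BYTES = 1_048_576  # assumed limit of 1 MiB; tune per application


def check_content_length(headers: dict[str, str],
                         limit: int = MAX_BODY_BYTES) -> Optional[int]:
    """Return an error status code, or None if the body may be read."""
    value = headers.get("content-length")
    if value is None:
        return 411  # Length Required
    if not (value.isascii() and value.isdigit()):
        return 400  # Bad Request: not a plain decimal number
    if int(value) > limit:
        return 413  # Request Entity Too Large: reject before allocating
    return None


print(check_content_length({"content-length": "999999999"}))
print(check_content_length({"content-length": "22"}))
```

Rejecting the request at this point means no memory is reserved for a body the server never intends to accept.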
4. Some ideas for denial-of-service attacks that exploit features of the HTTP protocol
The server is kept busy processing forged TCP connection requests and has no time to attend to normal client requests (which, after all, make up only a very small share). From a normal client's point of view, the server has stopped responding. We call this situation a SYN flood attack on the server. Smurf, Teardrop, and similar attacks use ICMP packet floods and IP fragments. This article describes a way to produce a denial of service using "normal connections". Port 19 was already used in the early days for chargen attacks, i.e. Chargen_Denial_of_Service, but those attacks used a UDP connection between two chargen servers to make the servers go down by handling too much information. Two conditions must hold to take down a web server: 1. a chargen service exists; 2. an HTTP service exists. Method: the attacker forges the source IP and sends connection requests (connect) to N chargen servers; on receiving a connection, chargen returns a character stream of 72 bytes per second (in practice faster, depending on the network) to the server.
5. HTTP fingerprinting
The principle of HTTP fingerprinting is largely the same: record the small differences in how different servers implement the HTTP protocol, and use them for identification. HTTP fingerprinting is much more complex than TCP/IP stack fingerprinting, because customizing an HTTP server's configuration file or adding plug-ins or components makes it easy to change the HTTP response information, which makes identification difficult. Customizing the behavior of a TCP/IP stack, by contrast, requires modifying the kernel, so the stack is easy to identify.
Making a server return different banner information is very simple. For an open-source HTTP server such as Apache, users can modify the banner information in the source code and then restart the HTTP service for it to take effect. For HTTP servers whose source code is not public, such as Microsoft IIS or Netscape, the banner can be modified in the DLL file that holds it; related articles discuss this, so it is not repeated here. Of course, such modifications work quite well. Another way to obscure banner information is to use a plug-in.
Common test requests:
1: HEAD / HTTP/1.0 sends a basic HTTP request
2: DELETE / HTTP/1.0 sends a request that is not allowed, such as a DELETE request
3: GET / HTTP/3.0 sends a request with an invalid HTTP protocol version
4: GET / JUNK/1.0 sends a request that does not conform to the protocol specification
The HTTP fingerprinting tool httprint can effectively determine the type of an HTTP server by using statistical principles combined with fuzzy logic techniques. It can be used to collect and analyze the signatures generated by different HTTP servers.
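To make the idea of "small differences in implementation" concrete, the sketch below reduces a raw response to a crude signature: the status code plus the order of the header names. This is only an illustration of the general principle and is not how httprint works; the names PROBES and response_signature and the two sample responses are invented for this example.

```python
# The four test requests listed above, written out as raw bytes.
PROBES = {
    "basic": b"HEAD / HTTP/1.0\r\n\r\n",
    "disallowed_method": b"DELETE / HTTP/1.0\r\n\r\n",
    "bad_version": b"GET / HTTP/3.0\r\n\r\n",
    "bad_protocol": b"GET / JUNK/1.0\r\n\r\n",
}


def response_signature(raw: bytes) -> tuple[str, tuple[str, ...]]:
    """Reduce a raw response to (status code, header names in arrival order)."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    status_line, *header_lines = head.split("\r\n")
    parts = status_line.split(" ", 2)
    status = parts[1] if len(parts) > 1 else ""
    names = tuple(line.partition(":")[0].strip()
                  for line in header_lines if ":" in line)
    return status, names


# Two made-up responses that carry the same headers in a different order.
server_a = b"HTTP/1.1 200 OK\r\nDate: x\r\nServer: y\r\n\r\n"
server_b = b"HTTP/1.1 200 OK\r\nServer: y\r\nDate: x\r\n\r\n"
print(response_signature(server_a))
print(response_signature(server_b))
```

Even with identical banner text, the two signatures differ, which is why changing the banner alone does not hide a server's identity.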
6. Other: To improve performance for the user, modern browsers also support concurrent access: while browsing a web page they establish multiple connections at once to fetch the many icons on the page quickly, so that the whole page finishes transferring sooner. HTTP 1.1 provides persistent connections, while the next-generation HTTP protocol, HTTP-NG, adds support for session control, rich content negotiation, and more, in order to provide more efficient connections.