Reposted from: http://blog.csdn.net/gueter/archive/2007/03/08/1524447.aspx
Introduction
HTTP is an object-oriented, application-layer protocol. Because it is simple and fast, it is well suited to distributed hypermedia information systems. It was proposed in 1990 and has been continuously improved and extended through years of use. The WWW currently uses the sixth edition of HTTP/1.0, standardization of HTTP/1.1 is in progress, and HTTP-NG (Next Generation of HTTP) has been proposed.
The main features of the HTTP protocol can be summarized as follows:
1. Supports the client/server model.
2. Simple and fast: when a client requests a service from a server, it only needs to send the request method and the path. Commonly used request methods are GET, HEAD, and POST. Each method specifies a different type of interaction between the client and the server. Because the protocol is simple, an HTTP server program is small and communication is fast.
3. Flexible: HTTP allows any type of data object to be transferred. The type being transferred is marked by Content-Type.
4. Connectionless: each connection handles only one request. Once the server has processed the client's request and received the client's acknowledgment, the connection is closed. This saves transmission time.
5. Stateless: HTTP is a stateless protocol, meaning that the protocol keeps no memory of earlier transactions. If later processing needs earlier information, that information must be retransmitted, which may increase the amount of data sent per connection. On the other hand, the server responds faster when it does not need prior information.
I. The HTTP protocol in detail: URLs
HTTP (Hypertext Transfer Protocol) is a stateless, application-layer protocol based on the request/response model. It usually runs over a TCP connection, and HTTP/1.1 provides a persistent connection mechanism. The vast majority of Web development produces Web applications built on top of HTTP.
An HTTP URL is a special type of URI that contains enough information to locate a resource. Its format is as follows:
http://host[":"port][abs_path]
Here http indicates that the network resource is to be located through the HTTP protocol; host is a legal Internet host domain name or IP address; port specifies a port number, and if it is empty the default port 80 is used; abs_path specifies the URI of the requested resource. If abs_path is not given in the URL, it must be given as "/" when the URL is used as the request URI, which the browser usually does automatically.
Examples:
1. Input: www.guet.edu.cn
The browser automatically converts it to: http://www.guet.edu.cn/
2. http://192.168.0.116:8080/index.jsp
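The URL rules above (default port 80, empty abs_path sent as "/") can be sketched in a few lines of Python. This is only an illustration: the helper name split_http_url and the constant DEFAULT_HTTP_PORT are made up for this example, and the parsing itself is delegated to the standard library's urllib.parse.urlsplit.

```python
from urllib.parse import urlsplit

DEFAULT_HTTP_PORT = 80  # used when the URL does not name a port


def split_http_url(url):
    """Return (host, port, abs_path) for an http:// URL."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        raise ValueError(f"not an http URL: {url!r}")
    port = parts.port if parts.port is not None else DEFAULT_HTTP_PORT
    # An empty abs_path must be sent as "/" in the request line.
    path = parts.path or "/"
    return parts.hostname, port, path


print(split_http_url("http://www.guet.edu.cn"))
print(split_http_url("http://192.168.0.116:8080/index.jsp"))
```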
II. The HTTP protocol in detail: requests
An HTTP request consists of three parts: the request line, the message headers, and the request body.
1. The request line begins with a method token, followed by the request URI and the protocol version, each separated by a space, in the following format: Method Request-URI HTTP-Version CRLF
Here Method is the request method, Request-URI is a uniform resource identifier, HTTP-Version is the HTTP protocol version of the request, and CRLF is a carriage return followed by a line feed (apart from this terminating CRLF, no bare CR or LF character is allowed).
Request method names are all uppercase. The methods are interpreted as follows:
GET requests the resource identified by the Request-URI
POST appends new data to the resource identified by the Request-URI
HEAD requests the response message headers for the resource identified by the Request-URI
PUT asks the server to store a resource and use the Request-URI as its identifier
DELETE asks the server to delete the resource identified by the Request-URI
TRACE asks the server to echo back the request it received, mainly for testing or diagnostics
CONNECT is reserved for future use
OPTIONS queries the server's capabilities, or the options and requirements associated with a resource
Application examples:
GET method: when a Web page is accessed by typing its URL into the browser's address bar, the browser uses the GET method to obtain the resource from the server, eg: GET /form.html HTTP/1.1 (CRLF)
The POST method asks the server to accept the data attached to the request and is often used to submit forms.
eg: POST /reg.jsp HTTP/ (CRLF)
Accept: image/gif,image/x-xbit,... (CRLF)
...
Host: www.guet.edu.cn (CRLF)
Content-Length: 21 (CRLF)
Connection: Keep-Alive (CRLF)
Cache-Control: no-cache (CRLF)
(CRLF) // this CRLF marks the end of the message headers; everything before it is message headers
user=jeffrey&pwd=1234 // this line is the submitted data
The HEAD method is almost identical to GET: the headers returned for a HEAD request contain the same information as those returned for a GET request. With this method you can obtain information about the resource identified by the Request-URI without transferring the entire resource content. It is often used to test whether a hyperlink is valid, whether it is accessible, and whether it has been updated recently.
2. Request headers are described later.
3. The request body (omitted).
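To make the three-part layout of a request concrete, here is a minimal Python sketch that assembles the request line, the headers, the empty line, and the body into one message. The function name build_request and the default version "HTTP/1.1" are assumptions made for this illustration; Content-Length is computed from the body rather than typed by hand.

```python
def build_request(method, request_uri, headers, body=b"", version="HTTP/1.1"):
    """Assemble request line + headers + empty line + body as bytes."""
    lines = [f"{method} {request_uri} {version}"]
    fields = dict(headers)
    if body:
        # Content-Length is the size of the body in bytes.
        fields["Content-Length"] = str(len(body))
    lines += [f"{name}: {value}" for name, value in fields.items()]
    # Every line ends with CRLF; the extra CRLF ends the header section.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


message = build_request(
    "POST",
    "/reg.jsp",
    {"Host": "www.guet.edu.cn", "Connection": "Keep-Alive"},
    body=b"user=jeffrey&pwd=1234",
)
print(message.decode("ascii"))
```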
III. The HTTP protocol in detail: responses
After receiving and interpreting the request message, the server returns an HTTP response message.
An HTTP response also consists of three parts: the status line, the message headers, and the response body.
1. The status line has the following format:
HTTP-Version Status-Code Reason-Phrase CRLF
Here HTTP-Version is the server's HTTP protocol version, Status-Code is the response status code sent back by the server, and Reason-Phrase is a textual description of the status code.
The status code consists of three digits. The first digit defines the class of the response and has five possible values:
1xx: Informational. The request has been received and processing continues
2xx: Success. The request has been successfully received, understood, and accepted
3xx: Redirection. Further action must be taken to complete the request
4xx: Client error. The request has a syntax error or cannot be fulfilled
5xx: Server error. The server failed to fulfill a valid request
Common status codes with their reason phrases and descriptions:
200 OK // the client request succeeded
400 Bad Request // the client request has a syntax error and cannot be understood by the server
401 Unauthorized // the request is not authorized; this status code must be used together with the WWW-Authenticate header field
403 Forbidden // the server received the request but refuses to serve it
404 Not Found // the requested resource does not exist, eg: a wrong URL was entered
500 Internal Server Error // the server encountered an unexpected error
503 Service Unavailable // the server currently cannot process the client's request and may recover after a while
eg: HTTP/1.1 200 OK (CRLF)
2. Response headers are described later.
3. The response body is the content of the resource returned by the server.
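The status line format and the five status classes can be illustrated with a short Python sketch. The names parse_status_line, status_class, and CLASSES are invented for this example; the standard reason phrases are looked up through http.HTTPStatus from the standard library.

```python
from http import HTTPStatus

# First digit of the status code -> class of the response.
CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def parse_status_line(line):
    """Split 'HTTP-Version Status-Code Reason-Phrase' into its three parts."""
    version, code, reason = line.rstrip("\r\n").split(" ", 2)
    return version, int(code), reason


def status_class(code):
    """Return the class name that the first digit of the code stands for."""
    return CLASSES[code // 100]


version, code, reason = parse_status_line("HTTP/1.0 404 Not Found\r\n")
print(version, code, reason, "->", status_class(code))
print(HTTPStatus(503).phrase)  # standard reason phrase for 503
```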
IV. The HTTP protocol in detail: message headers
HTTP messages consist of requests from the client to the server and responses from the server to the client. Both request and response messages consist of a start line (the request line for a request, the status line for a response), message headers (optional), an empty line (a line containing only CRLF), and a message body (optional).
HTTP message headers include general headers, request headers, response headers, and entity headers.
Each header field consists of name + ":" + space + value. Header field names are case-insensitive.
1. General headers
A few header fields are general: they apply to both request and response messages, and they describe the message being transmitted rather than the entity it carries.
eg:
Cache-Control specifies caching directives. Cache directives are one-way (a directive that appears in a response does not necessarily appear in the request) and independent (the directives of one message do not affect how caching is handled for another message). HTTP/1.0 uses a similar header field, Pragma.
Cache directives in a request include: no-cache (indicates that the request or response message must not be cached), no-store, max-age, max-stale, min-fresh, only-if-cached.
Cache directives in a response include: public, private, no-cache, no-store, no-transform, must-revalidate, proxy-revalidate, max-age, s-maxage.
eg: to tell the IE browser (the client) not to cache a page, a server-side JSP program can be written as follows: response.setHeader("Cache-Control", "no-cache");
// response.setHeader("Pragma", "no-cache"); is equivalent to the line above, and the two are usually used together
This code sets the general header field in the response message that is sent: Cache-Control: no-cache
The Date general header field gives the date and time at which the message was generated.
The Connection general header field lets the sender specify options for the connection. For example, it can specify that the connection is persistent, or specify the "close" option to tell the server to close the connection once the response is complete.
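As a rough illustration of how a recipient might read the Cache-Control directives listed above, the following Python sketch splits a header value into directives. The function name parse_cache_control is made up for this example, and the sketch deliberately ignores one corner case: quoted arguments that themselves contain commas are not handled.

```python
def parse_cache_control(value):
    """Return a dict of directive -> argument (None when there is no argument)."""
    directives = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        name, sep, arg = item.partition("=")
        # Directive names are compared case-insensitively, so store them lower-cased.
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


print(parse_cache_control("no-cache"))
print(parse_cache_control("public, max-age=3600"))
```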
2. Request headers
Request headers allow the client to pass additional information about the request, and about the client itself, to the server.
Common request headers
Accept
The Accept request header field specifies which types of information the client accepts. eg: Accept: image/gif indicates that the client wants to receive resources in GIF format; Accept: text/html indicates that the client wants to receive HTML text.
Accept-Charset
The Accept-Charset request header field specifies the character sets the client accepts. eg: Accept-Charset: iso-8859-1,gb2312. If this field is not set in the request message, any character set is assumed to be acceptable.
Accept-Encoding
The Accept-Encoding request header field is similar to Accept, but it specifies the acceptable content encodings. eg: Accept-Encoding: gzip,deflate. If this field is not set in the request message, the server assumes that the client accepts all content encodings.
Accept-Language
The Accept-Language request header field is similar to Accept, but it specifies a natural language. eg: Accept-Language: zh-cn. If this field is not set in the request message, the server assumes that the client accepts all languages.
Authorization
The Authorization request header field is mainly used to prove that the client has permission to view a resource. When a browser accesses a page and receives response code 401 (Unauthorized) from the server, it can send a request containing the Authorization request header field to ask the server to authenticate it.
Host (this header field is required when sending a request)
The Host request header field specifies the Internet host and port number of the requested resource. It is usually extracted from the HTTP URL, eg:
If we enter this in the browser: http://www.guet.edu.cn/index.html
the request message sent by the browser contains the Host request header field, as follows:
Host: www.guet.edu.cn
The default port number 80 is used here. If a port number is specified, it becomes: Host: www.guet.edu.cn:<specified port number>
User-Agent
When you log in to an online forum, you often see a welcome message that lists the name and version of your operating system and of the browser you are using. This puzzles many people, but the server application simply reads this information from the User-Agent request header field. The User-Agent request header field lets the client tell the server about its operating system, browser, and other properties. This header field is not required, however: if we write our own browser and do not send User-Agent, the server has no way of learning this information.
Request header example:
GET /form.html HTTP/1.1 (CRLF)
Accept: image/gif,image/x-xbitmap,image/jpeg,application/x-shockwave-flash,application/vnd.ms-excel,application/vnd.ms-powerpoint,application/msword,*/* (CRLF)
Accept-Language: zh-cn (CRLF)
Accept-Encoding: gzip,deflate (CRLF)
If-Modified-Since: Wed,05 2007 11:21:25 GMT (CRLF)
If-None-Match: W/"80b1a4c018f3c41:8317" (CRLF)
User-Agent: Mozilla/4.0 (compatible; MSIE6.0; Windows NT 5.0) (CRLF)
Host: www.guet.edu.cn (CRLF)
Connection: Keep-Alive (CRLF)
(CRLF)
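Because header field names are case-insensitive and the header section ends at the first empty line, a receiver can read a block like the one above roughly as follows. This is a simplified Python sketch: the function name parse_headers is invented here, repeated fields simply overwrite each other, and folded (multi-line) header values are not handled.

```python
def parse_headers(block):
    """Parse 'Name: value' lines into a dict keyed by lower-cased field name."""
    headers = {}
    for line in block.split("\r\n"):
        if not line:
            break  # the empty line marks the end of the headers
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


raw = (
    "Accept-Language: zh-cn\r\n"
    "ACCEPT-ENCODING: gzip,deflate\r\n"
    "Host:www.guet.edu.cn\r\n"
    "Connection: Keep-Alive\r\n"
    "\r\n"
    "this text is body data and is not parsed"
)
headers = parse_headers(raw)
print(headers["host"])
print(headers["accept-encoding"])
```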
3. Response headers
Response headers allow the server to pass additional response information that cannot be placed in the status line, as well as information about the server and about further access to the resource identified by the Request-URI.
Common response headers
Location
The Location response header field redirects the recipient to a new location. It is often used when a domain name changes.
Server
The Server response header field contains information about the software the server uses to process the request. It corresponds to the User-Agent request header field. Below is an example of the Server response header field:
Server: Apache-Coyote/1.1
WWW-Authenticate
The WWW-Authenticate response header field must be included in a 401 (Unauthorized) response message. When the client receives a 401 response and then sends a request with the Authorization header field asking the server to authenticate it, the server's response headers contain this field.
eg: WWW-Authenticate: Basic realm="Basic Auth test!" // shows that the server uses the Basic authentication mechanism for the requested resource.
4. Entity headers
Both request and response messages can carry an entity. An entity consists of entity header fields and an entity body, but this does not mean the two must be sent together: it is possible to send only the entity header fields. Entity headers define metainformation about the entity body (or, when there is no entity body, about the resource identified by the request).
Common entity headers
Content-Encoding
The Content-Encoding entity header field is used as a modifier to the media type. Its value indicates the additional content encodings that have been applied to the entity body, and therefore which decoding mechanisms must be applied to obtain the media type referenced by the Content-Type header field. Content-Encoding is mainly used to record the compression method of a document, eg: Content-Encoding: gzip
Content-Language
The Content-Language entity header field describes the natural language used by the resource. If this field is not set, the entity content is assumed to be intended for readers of all languages. eg: Content-Language: da
Content-Length
The Content-Length entity header field indicates the length of the entity body, expressed as a decimal number of bytes.
Content-Type
The Content-Type entity header field indicates the media type of the entity body sent to the recipient. eg:
Content-Type: text/html;charset=ISO-8859-1
Content-Type: text/html;charset=GB2312
Last-Modified
The Last-Modified entity header field indicates the date and time at which the resource was last modified.
Expires
The Expires entity header field gives the date and time at which the response expires. So that a proxy server or browser refreshes its cache after a period of time (when a previously visited page is accessed again it is loaded directly from the cache, which shortens response time and reduces server load), we can use the Expires entity header field to specify when the page expires. eg: Expires: Thu,15 Sep 2006 16:23:12 GMT
HTTP/1.1 clients and caches must treat other, invalid date formats (including 0) as already expired. eg: to keep the browser from caching a page, we can also use the Expires entity header field set to 0. In JSP the code is: response.setDateHeader("Expires", 0);
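The rule that an unparseable Expires value (including 0) counts as already expired can be sketched on the client side like this. The function name is_expired and its now parameter are assumptions made for the example; date parsing is delegated to email.utils.parsedate_to_datetime from the Python standard library, which signals a bad date with TypeError or ValueError depending on the Python version.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_expired(expires_value, now=None):
    """Return True if the Expires value is in the past or is not a valid date."""
    now = now or datetime.now(timezone.utc)
    try:
        expires = parsedate_to_datetime(expires_value)
    except (TypeError, ValueError):
        return True  # invalid formats such as "0" are treated as expired
    if expires.tzinfo is None:
        # No usable zone information: assume GMT, as HTTP dates are in GMT.
        expires = expires.replace(tzinfo=timezone.utc)
    return expires <= now


reference = datetime(2007, 3, 8, tzinfo=timezone.utc)
print(is_expired("Thu, 15 Sep 2006 16:23:12 GMT", now=reference))
print(is_expired("0", now=reference))
```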
V. Using Telnet to observe the HTTP communication process
Purpose and principle of the experiment:
Using the MS Telnet tool, we send a request to a server by typing the HTTP request by hand. The server receives, interprets, and accepts the request and returns a response, which is displayed in the Telnet window. This gives a concrete feel for how HTTP communication works.
Experiment steps:
1. Open Telnet
1.1 Open Telnet
Run --> cmd --> telnet
1.2 Turn on Telnet's echo feature
set localecho
2. Connect to the server and send a request
2.1 open www.guet.edu.cn 80 // note: the port number cannot be omitted
HEAD /index.asp HTTP/1.0
Host: www.guet.edu.cn
/* We can change the request method to request the content of the Guilin Electronic home page; the input is as follows */
open www.guet.edu.cn 80
GET /index.asp HTTP/1.0 // request the resource content
Host: www.guet.edu.cn
2.2 open www.sina.com.cn 80 // or enter telnet www.sina.com.cn 80 directly at the command prompt
HEAD /index.asp HTTP/1.0
Host: www.sina.com.cn
3. Experiment results:
3.1 The response to request 2.1 is:
HTTP/1.1 200 OK // request succeeded
Server: Microsoft-IIS/5.0 // Web server
Date: Thu,08 Mar 2007 07:17:51 GMT
Connection: Keep-Alive
Content-Length: 23330
Content-Type: text/html
Expires: Thu,08 Mar 2007 07:16:51 GMT
Set-Cookie: ASPSESSIONIDQAQBQQQB=BEJCDGKADEDJKLKKAJEOIMMH; path=/
Cache-Control: private
(resource content omitted)
3.2 The response to request 2.2 is:
HTTP/1.0 404 Not Found // request failed
Date: Thu, Mar 2007 07:50:50 GMT
Server: Apache/2.0.54 <Unix>
Last-Modified: Thu, Nov 2006 11:35:41 GMT
ETag: "6277a-415-e7c76980"
Accept-Ranges: bytes
X-Powered-By: mod_xlayout_jh/0.0.1vhs.markii.remix
Vary: Accept-Encoding
Content-Type: text/html
X-Cache: MISS from zjm152-78.sina.com.cn
Via: 1.0 zjm152-78.sina.com.cn:80<squid/2.6.stables-20061207>
X-Cache: MISS from th-143.sina.com.cn
Connection: close
Lost the connection to the host
Press any key to continue ...
4. Notes: 1. Any typing error will cause the request to fail.
2. Header field names are case-insensitive.
3. For a deeper understanding of the HTTP protocol, see RFC 2616, which can be found at http://www.ietf.org/rfc.
4. Anyone developing back-end programs must master the HTTP protocol.
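The Telnet experiment can also be reproduced with a short script. The following Python sketch opens a plain TCP connection and sends exactly the text that was typed by hand in step 2. The function names build_request_text and send_raw_request are invented for this illustration, and the hosts used in the experiment may no longer answer the way they did in 2007.

```python
import socket


def build_request_text(method, path, host, version="HTTP/1.0"):
    """Return the lines typed in the Telnet session, ended by an empty line."""
    return f"{method} {path} {version}\r\nHost: {host}\r\n\r\n"


def send_raw_request(host, port, request_text, timeout=5.0):
    """Open a TCP connection, send the text verbatim, read until the server closes."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request_text.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("iso-8859-1")


# Example (needs network access), equivalent to step 2.1:
# text = build_request_text("HEAD", "/index.asp", "www.guet.edu.cn")
# print(send_raw_request("www.guet.edu.cn", 80, text))
```

Reading until the server closes the connection works here because the request uses HTTP/1.0 without Keep-Alive; with a persistent connection the loop would wait until the timeout.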
VI. Supplementary notes on HTTP-related technology
1. Basics:
High-level protocols include the File Transfer Protocol (FTP), the Simple Mail Transfer Protocol (SMTP), the Domain Name System (DNS), the Network News Transfer Protocol (NNTP), and HTTP.
There are three kinds of intermediary: proxies, gateways, and tunnels. A proxy accepts requests with the URI in absolute form, rewrites all or part of the message, and forwards the reformatted request to the server identified by the URI. A gateway is a receiving agent that acts as a layer above some other server and, if necessary, translates requests into the underlying server's protocol. A tunnel acts as a relay point between two connections without changing the messages. Tunnels are used when communication must pass through an intermediary (such as a firewall) or when the intermediary cannot understand the content of the messages.
Proxy: an intermediary program that acts as both a server and a client in order to make requests on behalf of other clients. Requests are serviced internally or passed on, possibly after translation, to other servers. A proxy must interpret a request message and, if necessary, rewrite it before forwarding it. Proxies are often used as client-side portals through firewalls, and as helper applications that handle requests over protocols the user agent does not implement.
Gateway: a server that acts as an intermediary for other servers. Unlike a proxy, a gateway receives requests as if it were the origin server for the requested resource; the requesting client may not be aware that it is communicating with a gateway.
Gateways are often used as server-side portals through firewalls, and as protocol translators for accessing resources stored in non-HTTP systems.
Tunnel: an intermediary program that acts as a relay between two connections. Once active, a tunnel is not considered a party to the HTTP communication, even though it may have been initiated by an HTTP request. The tunnel ceases to exist when both ends of the relayed connection are closed. Tunnels are often used when a portal is necessary and the intermediary cannot, or should not, interpret the relayed communication.
2. Advantages of protocol analysis: detecting network attacks with an HTTP analyzer
Analyzing and processing high-level protocols in a modular way will be the direction of future intrusion detection.
The common ports of HTTP and its proxies (80, 3128, and 8080) are specified with the port tag in the network section.
3. Denial-of-service attacks caused by the HTTP Content-Length limit vulnerability
When the POST method is used, Content-Length can be set to declare the length of the data to be transmitted, for example Content-Length: 999999999. Until the transfer completes, the memory is not released. An attacker can exploit this flaw by continuously sending garbage data to the Web server until the server runs out of memory. This kind of attack leaves almost no traces.
Http://www.cnpaf.net/Class/HTTP/0532918532667330.html
4. Some ideas for denial-of-service attacks that exploit characteristics of the HTTP protocol
The server is kept busy handling the attacker's forged TCP connection requests and has no time for legitimate clients' requests (which, after all, make up only a very small fraction of the total). From a legitimate client's point of view the server has stopped responding. We say the server is under a SYN flood attack.
Smurf, Teardrop, and similar attacks use ICMP flooding and IP fragmentation. This article instead uses "normal connections" to produce a denial of service.
Port 19 was used early on for chargen attacks (Chargen_Denial_of_Service). Those attacks, however, set up a UDP connection between two chargen servers so that the servers process too much traffic and go down. Taking down a Web server therefore requires two conditions: 1. a chargen service is available; 2. an HTTP service is available.
Method: the attacker forges the source IP and sends connection requests (connect) to N chargen servers. After accepting the connection, each chargen server returns a character stream of 72 bytes per second (in practice faster, depending on network conditions) to the server.
5. HTTP fingerprinting
The principle of HTTP fingerprinting is broadly the same as that of other fingerprinting techniques: record the small differences in how different servers implement the HTTP protocol. HTTP fingerprinting is much more complex than TCP/IP stack fingerprinting, because customizing an HTTP server's configuration file or adding plug-ins or components easily changes its HTTP responses, which makes identification difficult. Customizing the behavior of a TCP/IP stack, by contrast, requires modifying the kernel, so the stack is easy to identify.
Making a server return different banner information is simple. With an open-source HTTP server such as Apache, users can modify the banner in the source code and restart the HTTP service for it to take effect. For HTTP servers that are not open source, such as Microsoft's IIS or Netscape, the banner can be modified in the DLL file that stores it; related articles discuss this, so it is not covered here. Such modifications work quite well. Another way to obscure the banner is to use plug-ins.
Common test requests:
1: HEAD / HTTP/1.0 sends a basic HTTP request
2: DELETE / HTTP/1.0 sends a request that is not allowed, such as a DELETE request
3: GET / HTTP/3.0 sends a request with an illegal HTTP version
4: GET / JUNK/1.0 sends a request with a malformed protocol specification
The HTTP fingerprinting tool httprint uses statistical principles combined with fuzzy logic to determine the type of an HTTP server effectively. It can be used to collect and analyze the signatures generated by different HTTP servers.
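To show what a fingerprinting tool compares, here is a small Python sketch that prepares the four test requests listed above and extracts the two most visible parts of a response: the status line and the Server banner. It is not how httprint works internally; the names PROBES, probe_messages, and summarize_response are made up, and the sample response at the end is invented for demonstration.

```python
# The four test request lines from the list above, with a short label each.
PROBES = [
    ("basic request", "HEAD / HTTP/1.0"),
    ("method that is usually not allowed", "DELETE / HTTP/1.0"),
    ("illegal HTTP version", "GET / HTTP/3.0"),
    ("malformed protocol name", "GET / JUNK/1.0"),
]


def probe_messages():
    """Return label -> raw request bytes (request line plus the empty line)."""
    return {label: (line + "\r\n\r\n").encode("ascii") for label, line in PROBES}


def summarize_response(raw):
    """Return (status line, Server banner or None) from raw response bytes."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")
    banner = None
    for line in lines[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "server":
            banner = value.strip()
    return lines[0], banner


# Invented sample response, only to show the output shape.
sample = b"HTTP/1.1 405 Method Not Allowed\r\nServer: Apache/2.0.54\r\n\r\n"
print(summarize_response(sample))
```

Different servers tend to answer the unusual probes with different status codes, header order, and wording, and those differences are what a fingerprint records.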
6. Other: to improve performance for the user, modern browsers also support concurrent access: while browsing a Web page they open several connections at once to fetch the page's many icons quickly, so that the whole page is transferred sooner.
HTTP/1.1 provides persistent connections, and the next-generation HTTP protocol, HTTP-NG, adds support for session control, rich content negotiation, and more, in order to provide more efficient connections.