HTTP stands for Hypertext Transfer Protocol. It is the protocol used to transfer hypertext from World Wide Web (WWW) servers to the local browser.
HTTP runs on top of TCP/IP and transmits data such as HTML files, image files, and query results.
HTTP is an application-layer protocol whose simplicity and speed make it well suited to distributed hypermedia information systems. It was proposed in 1990 and has been continuously improved and extended over years of use. HTTP/1.0 was followed by HTTP/1.1, which has since been standardized and is the version used in the examples in this article; a successor, HTTP-NG (Next Generation of HTTP), was also proposed.
HTTP follows a client-server architecture. The browser, acting as the HTTP client, sends every request to the HTTP server (that is, the Web server) by URL. The Web server sends a response message back to the client based on the request it received.
Main Features
1. Simple and fast: when a client requests a service from the server, it only needs to send the request method and the path. The commonly used request methods are GET, HEAD, and POST. Each method specifies a different type of interaction between the client and the server. Because the protocol is simple, HTTP server programs are small and communication is fast.
2. Flexible: HTTP allows data objects of any type to be transferred. The type being transferred is indicated by Content-Type.
3. Connectionless: connectionless means that each connection handles only one request. Once the server has processed the client's request and received the client's acknowledgement, the connection is closed. This saves transmission time.
4. Stateless: HTTP is a stateless protocol. Stateless means that the protocol keeps no memory of earlier transactions. If later processing needs earlier information, that information must be retransmitted, which can increase the amount of data sent per connection. On the other hand, the server responds faster when it does not need the earlier information.
5. Supports both the B/S (browser/server) and C/S (client/server) models.
HTTP uses Uniform Resource Identifiers (URIs) to transfer data and establish connections. A URL is a special type of URI that contains enough information to locate a resource.
A URL (Uniform Resource Locator) identifies the address of a resource on the Internet. The following URL is used as an example to introduce the parts of a typical URL:
http://www.aspxfans.com:8080/news/index.asp?boardID=5&ID=24618&page=1#name
As the URL above shows, a complete URL includes the following parts:
1. Protocol: the protocol part of this URL is "http:", which means that the Web page uses the HTTP protocol. Several protocols can be used on the Internet, such as HTTP and FTP; this example uses HTTP. The "//" after "http:" is a delimiter.
2. Domain name: the domain name part of this URL is "www.aspxfans.com". An IP address can also be used in place of the domain name.
3. Port: the port follows the domain name, with ":" as the delimiter between them. The port is optional; if it is omitted, the default port is used.
4. Virtual directory: the part from the first "/" after the domain name to the last "/". The virtual directory is also optional. In this example it is "/news/".
5. File name: the part from the last "/" after the domain name up to "?". If there is no "?", it runs from the last "/" up to "#"; if there is neither "?" nor "#", it runs from the last "/" to the end of the URL. In this example the file name is "index.asp". The file name is also optional; if it is omitted, the default file name is used.
6. Anchor: the part from "#" to the end. In this example the anchor is "name". The anchor is also optional.
7. Parameters: the part between "?" and "#", also called the search part or the query part. In this example the parameters are "boardID=5&ID=24618&page=1". There can be more than one parameter, with "&" as the delimiter between them.
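The breakdown above can be reproduced with Python's standard urllib.parse module. This is only a minimal sketch: the helper name split_url and the dictionary keys are chosen here for illustration and mirror the seven parts listed above.

```python
from urllib.parse import urlsplit, parse_qsl


def split_url(url):
    """Split a URL into the seven parts described in the list above."""
    parts = urlsplit(url)
    # Everything up to the last "/" is the virtual directory; the rest is the file name.
    directory, _, filename = parts.path.rpartition("/")
    return {
        "protocol": parts.scheme,
        "domain": parts.hostname,
        "port": parts.port,                       # None when the port is omitted
        "directory": directory + "/",
        "filename": filename,
        "parameters": parse_qsl(parts.query),     # list of (name, value) pairs
        "anchor": parts.fragment,
    }


info = split_url(
    "http://www.aspxfans.com:8080/news/index.asp?boardID=5&ID=24618&page=1#name"
)
for key, value in info.items():
    print(key, "=", value)
```

Note that urlsplit does not fill in the default port or the default file name; those are decided by the client and the server.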
(Original: http://blog.csdn.net/ergouge/article/details/8185219)
The difference between URI and URL
A URI (Uniform Resource Identifier) is used to uniquely identify a resource.
Every resource available on the Web, such as HTML documents, images, video clips, and programs, is located by a URI.
A URI generally consists of three parts:
① the naming mechanism used to access the resource
② the name of the host that stores the resource
③ the name of the resource itself, represented by a path, with the emphasis on the resource.
A URL is a string used on the Internet to describe an information resource. URLs are mainly used in WWW client and server programs, notably the famous Mosaic browser.
URLs describe all kinds of information resources, including files, server addresses, and directories, in a uniform format. A URL generally consists of three parts:
① the protocol (or service type)
② the IP address of the host that holds the resource (sometimes including a port number)
③ the specific address of the resource on the host, such as the directory and file name.
A URI defines a uniform resource identifier as an abstract, high-level concept, whereas URLs and URNs are concrete ways of identifying a resource. URLs and URNs are both kinds of URI. In general, every URL is a URI, but not every URI is a URL, because URIs also include another subclass, the Uniform Resource Name (URN), which names a resource without specifying how to locate it. The mailto, news, and isbn URIs are commonly cited as examples of URNs.
In Java, a URI instance can be absolute or relative, as long as it conforms to URI syntax. The URL class not only conforms to that syntax but also contains the information needed to locate the resource, so it cannot be relative.
In the Java class library, the URI class has no methods for accessing the resource; its only job is parsing.
The URL class, by contrast, can open a stream to the resource.
An HTTP request message sent by the client to the server consists of four parts: the request line, the request headers, a blank line, and the request data. (Note that GET and POST requests differ in the request data.)
Part one: the request line, which states the request type, the resource to access, and the HTTP version used.
GET /562f25980001b1b106000338.jpg HTTP/1.1
Host: img.mukewang.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36
Accept: image/webp,image/*,*/*;q=0.8
Referer: http://www.imooc.com/
Accept-Encoding: gzip, deflate, sdch
Accept-Language: zh-CN,zh;q=0.8
GET is the request type, /562f25980001b1b106000338.jpg is the resource to access, and the end of the line shows that HTTP version 1.1 is used.
Part two: the request headers, the section immediately after the request line (the first line), which give the server additional information to use.
The request headers start at the second line. Host indicates the destination of the request. User-Agent, which both server-side and client-side scripts can read, is an important basis for browser-detection logic. It is defined by your browser and sent automatically with every request.
Part three: the blank line. The blank line after the request headers is required.
It must be present even if the request data in part four is empty.
Part four: the request data, also called the body, which can carry any additional data. (The request data in this example is empty.)
POST request example, using a request captured with Charles:
POST / HTTP/1.1
Host: www.wrox.com
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022)
Content-Type: application/x-www-form-urlencoded
Content-Length: 40
Connection: Keep-Alive

name=Professional%20Ajax&publisher=Wiley
Part one: the request line. The first line shows that this is a POST request using HTTP version 1.1.
Part two: the request headers, lines two through six.
Part three: the blank line, line seven.
Part four: the request data, line eight.
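The four-part structure can be made concrete by splitting a raw request by hand. The sketch below is illustrative only: parse_request is a hypothetical helper, the User-Agent header is left out of RAW_REQUEST for brevity, and a real server would use a full HTTP parser rather than string splitting.

```python
def parse_request(raw: bytes):
    """Split a raw HTTP request into request line, headers, and body."""
    # The first blank line (CRLF CRLF) separates the headers from the request data.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Part one: the request line holds the method, the target, and the version.
    method, target, version = lines[0].split(" ", 2)
    # Part two: one "Name: value" header per line.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers, body


RAW_REQUEST = (
    b"POST / HTTP/1.1\r\n"
    b"Host: www.wrox.com\r\n"
    b"Content-Type: application/x-www-form-urlencoded\r\n"
    b"Content-Length: 40\r\n"
    b"Connection: Keep-Alive\r\n"
    b"\r\n"
    b"name=Professional%20Ajax&publisher=Wiley"
)

method, target, version, headers, body = parse_request(RAW_REQUEST)
print(method, target, version)
print(headers["content-length"], len(body))
```

The Content-Length header and the actual length of the body agree, which is how the receiver knows where the request data ends.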
In general, after receiving and processing a request from the client, the server returns an HTTP response message.
An HTTP response also consists of four parts: the status line, the message headers, a blank line, and the response body.
Part one: the status line, which consists of the HTTP protocol version, the status code, and the status message.
HTTP/1.1 200 OK
Date: Fri, 22 May 2009 06:07:21 GMT
Content-Type: text/html; charset=UTF-8

<html> <head></head> <body> <!--body goes here--> </body></html>
The first line is the status line: HTTP/1.1 indicates that the HTTP version is 1.1, the status code is 200, and the status message is OK.
Part two: the message headers, which give additional information for the client to use.
The second and third lines are message headers.
Date: the date and time the response was generated. Content-Type: specifies the MIME type HTML (text/html) and the encoding UTF-8.
Part three: the blank line, which is required after the message headers.
Part four: the response body. The HTML portion following the blank line is the response body.
HTTP Status Codes
The status code consists of three digits. The first digit defines the category of the response, and there are five categories:
1xx: Informational - the request has been received and processing continues
2xx: Success - the request has been successfully received, understood, and accepted
3xx: Redirection - further action must be taken to complete the request
4xx: Client Error - the request contains a syntax error or cannot be fulfilled
5xx: Server Error - the server failed to fulfil a valid request
Common Status Codes:
200 OK //the client request succeeded
400 Bad Request //the client request has a syntax error and cannot be understood by the server
401 Unauthorized //the request is not authorized; this status code must be used together with the WWW-Authenticate header field
403 Forbidden //the server received the request but refuses to provide the service
404 Not Found //the requested resource does not exist, e.g. a wrong URL was entered
500 Internal Server Error //the server encountered an unexpected error
503 Service Unavailable //the server currently cannot process the client's request; it may return to normal after some time
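As a small illustration of how the first digit determines the category, the sketch below combines a lookup on the leading digit with the reason phrases that Python's standard http.HTTPStatus enum provides. The CATEGORIES table and the describe helper are names made up for this example.

```python
from http import HTTPStatus

# Category names keyed by the first digit of the status code.
CATEGORIES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def describe(code: int) -> str:
    """Return 'code phrase (category)' for a three-digit status code."""
    category = CATEGORIES.get(code // 100, "Unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # The code is not one of the registered values known to the enum.
        phrase = "Unknown"
    return f"{code} {phrase} ({category})"


for code in (200, 400, 401, 403, 404, 500, 503):
    print(describe(code))
```

A client that meets an unfamiliar code can still act on its category, which is why the first digit matters more than the exact value.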
More status codes: http://www.runoob.com/http/http-status-codes.html
HTTP Request Methods
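According to the HTTP standard, an HTTP request can use one of several request methods.
HTTP/1.0 defines three request methods: GET, POST, and HEAD.
HTTP/1.1 adds five new request methods: OPTIONS, PUT, DELETE, TRACE, and CONNECT.
GET: requests the specified page and returns the entity body.
HEAD: like a GET request, except that the response has no body; used to retrieve the headers.
POST: submits data to the specified resource for processing (for example, submitting a form or uploading a file). The data is carried in the request body. A POST request may create a new resource and/or modify an existing one.
PUT: the data sent from the client to the server replaces the content of the specified document.
DELETE: asks the server to delete the specified page.
CONNECT: reserved in HTTP/1.1 for proxy servers that can switch the connection to a tunnel.
OPTIONS: lets the client query the server's capabilities.
TRACE: echoes the request as received by the server; used mainly for testing or diagnostics.
How HTTP Works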
HTTP defines how Web clients request Web pages from a Web server and how the server delivers Web pages to clients. HTTP uses a request/response model. The client sends the server a request message containing the request method, URL, protocol version, request headers, and request data. The server responds with a status line, containing the protocol version and a success or error code, followed by server information, response headers, and response data.
The steps of an HTTP request/response are as follows:
1. The client connects to the Web server
An HTTP client, usually a browser, establishes a TCP socket connection to the Web server's HTTP port (80 by default). For example, http://www.oakcms.cn.
2. The client sends the HTTP request
Over the TCP socket, the client sends the Web server a text request message consisting of four parts: a request line, request headers, a blank line, and the request data.
3. The server accepts the request and returns the HTTP response
The Web server parses the request and locates the requested resource. The server writes a copy of the resource to the TCP socket, where the client reads it. A response consists of four parts: a status line, response headers, a blank line, and the response data.
4. The TCP connection is released
If the connection mode is close, the server actively closes the TCP connection and the client closes it passively, releasing the TCP connection. If the connection mode is keep-alive, the connection is kept open for a period of time and can continue to receive requests.
5. The client browser parses the HTML content
The client browser first parses the status line and checks the status code to see whether the request succeeded. It then parses each response header; the headers tell it that what follows is an HTML document of a given number of bytes and which character set the document uses. The browser reads the HTML in the response data, formats it according to HTML syntax, and displays it in the browser window.
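The steps above can be traced with plain sockets. To keep the sketch self-contained and independent of any real site, it starts a throwaway local server with Python's standard http.server module; HelloHandler, fetch, and demo are hypothetical names used only for this illustration, and the client is deliberately minimal (no redirects, no chunked encoding, connection mode close only).

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Tiny stand-in Web server that always returns the same HTML page."""

    def do_GET(self):
        body = b"<html><body>hello</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=UTF-8")
        self.send_header("Content-Length", str(len(body)))
        self.send_header("Connection", "close")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch(host, port, path="/"):
    # Step 1: establish a TCP connection to the server's HTTP port.
    with socket.create_connection((host, port), timeout=5) as sock:
        # Step 2: send request line, headers, and the mandatory blank line.
        request = (
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}:{port}\r\n"
            "Connection: close\r\n"
            "\r\n"
        )
        sock.sendall(request.encode("ascii"))
        # Steps 3 and 4: read the response until the server closes the connection.
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    # Step 5: parse the status line, then the headers, then take the body.
    head, _, body = b"".join(chunks).partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    _version, code, _reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return int(code), headers, body


def demo():
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: pick any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch("127.0.0.1", server.server_address[1])
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

Because the request asks for connection mode close, the client can simply read until the socket reports end of data; with keep-alive it would have to rely on Content-Length instead.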
For example, typing a URL into the browser's address bar and pressing Enter triggers the following process:
1. The browser asks a DNS server to resolve the domain name in the URL to an IP address.
2. Once the IP address is resolved, the browser establishes a TCP connection with the server using that IP address and the default port 80.
3. The browser issues an HTTP request to read the file (the file corresponding to the latter part of the URL); the request message may be sent to the server as the data of the third segment of the TCP three-way handshake.
4. The server responds to the browser's request and sends the corresponding HTML text to the browser.
5. The TCP connection is released.
6. The browser parses the HTML text and displays the content.
Differences between GET and POST requests
GET request
GET /books/?sex=man&name=Professional HTTP/1.1
Host: www.wrox.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.6) Gecko/20050225 Firefox/1.0.1
Connection: Keep-Alive

Note that the last line is a blank line.
POST request
POST / HTTP/1.1
Host: www.wrox.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.6) Gecko/20050225 Firefox/1.0.1
Content-Type: application/x-www-form-urlencoded
Content-Length: 40
Connection: Keep-Alive

name=Professional%20Ajax&publisher=Wiley
1. How the data is submitted. With GET, the request data is appended to the URL (that is, it is placed in the request line of the HTTP header), with "?" separating the URL from the data and "&" joining multiple parameters.
For example: Login.action?name=hyddd&password=idontknow&verify=%e4%bd%a0%e5%a5%bd.
Letters and digits are sent as they are, and a space is converted to "+". Chinese and other characters are percent-encoded (URL-encoded): each byte of the character's encoding is written as %XX, where XX is the value of the byte in hexadecimal. For example, %E4%BD%A0%E5%A5%BD is the UTF-8 encoding of 你好.
With POST, the submitted data is placed in the body of the HTTP message. In the POST example above, the last line (name=Professional%20Ajax&publisher=Wiley) is the data actually transferred.
As a result, data submitted with GET is shown in the address bar, whereas with POST the address bar does not change.
2. Size of the transmitted data. First of all, the HTTP protocol does not restrict the size of the transmitted data, and the HTTP specification does not limit the length of a URL.
The main limits met in real development are:
GET: specific browsers and servers limit the URL length. For example, IE limits URLs to 2083 bytes (2K + 35). Other browsers, such as Netscape and Firefox, have no length limit in theory; their limit depends on what the operating system supports.
Therefore, for a GET submission, the transmitted data is limited by the URL length.
POST: in theory the data size is not limited, because the data is not transmitted in the URL. In practice, however, each Web server sets its own limit on the size of POST data; Apache and IIS6 each have their own configuration.
3. Security. POST is more secure than GET. For example, when data is submitted with GET, the user name and password appear in plaintext in the URL. Because (1) the login page may be cached by the browser and (2) other people can view the browser's history, others may obtain your account and password. In addition, submitting data with GET may also enable Cross-site request forgery attacks.
4. HTTP GET, HTTP POST, and SOAP all run over HTTP
(1) GET: the request parameters are appended to the URL as a sequence of key/value pairs (the query string).
The length of the query string is limited by the Web browser and the Web server (IE, for example, supports up to 2048 characters), so it is not suitable for transferring large data sets. It is also insecure.
(2) POST: the request parameters are transmitted in a different part of the HTTP message (the entity body). This part is used to transfer form information, so Content-Type must be set to application/x-www-form-urlencoded. POST was designed to support user fields on Web forms, and its parameters are also transmitted as key/value pairs.
However, it does not support complex data types, because POST does not define semantics or rules for transferring data structures.
(3) SOAP: a specialized use of HTTP POST that follows a special XML message format.
Content-Type is set to text/xml, and any data can be expressed as XML.
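The encoding rules for GET query strings and application/x-www-form-urlencoded POST bodies described above can be checked with Python's standard urllib.parse module. This is a small sketch that reuses the parameter values from the examples in the text; the variable names are chosen here for illustration.

```python
from urllib.parse import quote, quote_plus, unquote, urlencode

# Non-ASCII characters are percent-encoded byte by byte (UTF-8 by default).
encoded = quote("你好")
print(encoded)

# Decoding is case-insensitive with respect to the hexadecimal digits.
decoded = unquote("%e4%bd%a0%e5%a5%bd")
print(decoded)

# A space becomes "+" in form encoding and "%20" in plain percent-encoding.
form_style = quote_plus("Professional Ajax")
path_style = quote("Professional Ajax")
print(form_style, path_style)

# urlencode joins key/value pairs with "&", as in a GET query string or a
# form-encoded POST body.
query = urlencode({"name": "hyddd", "password": "idontknow", "verify": "你好"})
url = "Login.action?" + query
print(url)
```

Percent-encoding is a reversible escaping scheme, not encryption: anyone who sees the URL can decode it.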
HTTP defines a number of ways to interact with the server. The four most basic are GET, POST, PUT, and DELETE. A URL describes a resource on the network, and the HTTP methods GET, POST, PUT, and DELETE correspond to querying, updating, creating, and deleting that resource. GET and POST are the most common. GET is typically used to fetch or query resource information, and POST is typically used to update resource information.
Let's look at the differences between GET and POST.
Data submitted with GET is placed after the URL, with "?" separating the URL from the data and "&" joining the parameters, as in editposts.aspx?name=test1&id=123456. The POST method puts the submitted data in the body of the HTTP message.
The size of data submitted with GET is limited (because browsers limit the length of the URL), whereas data submitted with POST has no such limit.
With GET, the server reads the variable values with Request.QueryString; with POST, it reads them with Request.Form.
Submitting data with GET raises security problems. On a login page, for example, the user name and password appear in the URL when the data is submitted with GET; if the page can be cached or someone else can access the machine, the user's account and password can be obtained from the history.
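Request.QueryString and Request.Form belong to ASP.NET. As a rough Python analogue (an assumption made for illustration, not the article's code), the sketch below reads GET parameters from the URL and POST parameters from the message body; query_string and form are hypothetical helper names.

```python
from urllib.parse import parse_qs, urlsplit


def query_string(url: str) -> dict:
    """GET: the parameters live in the part of the URL after '?'."""
    return parse_qs(urlsplit(url).query)


def form(body: str) -> dict:
    """POST: the same key/value format, but carried in the message body."""
    return parse_qs(body)


get_params = query_string("editposts.aspx?name=test1&id=123456")
post_params = form("name=Professional%20Ajax&publisher=Wiley")
print(get_params)
print(post_params)
```

Each value is a list because the same key may legally appear more than once in a query string or a form body.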
The following describes the idea behind a program written to reconstruct HTTP traffic from captured packets, and how each protocol is identified. (The text description still needs tidying up.)
We saw that the first line of an HTTP request message starts with GET. This is one of the HTTP request methods, alongside POST, HEAD, and others. GET and POST are the best known; in servlet programming, for example, the doGet and doPost methods correspond to the two ways of submitting an HTTP request.
The first line of an HTTP response message starts with the protocol version, such as HTTP/1.1, which is the version in common use. We can use these prefixes to decide whether the data carried in a TCP segment is HTTP.
There are many ways to implement this. I used one of the clumsiest: check whether the frame carries an IP packet, then whether it is a TCP segment, then whether it is an HTTP message, and finally print the content of the HTTP message. Before the program starts, we need to define the packet formats of a few important protocols.
Let's look at how the logic just described is implemented.
1. Determine whether the frame carries an IP packet. Recall that the Ethernet encapsulation format is defined in RFC 894 and consists of the destination address (6 bytes), the source address (6 bytes), the type (2 bytes), the data, and the CRC (4 bytes). We only need to look at the type field in the header: 0x0800 means the data is an IP datagram, 0x0806 means an ARP request/reply, and 0x8035 means a RARP request/reply. So comparing the type with 0x0800 is enough.
2. Determine whether it is a TCP segment. Similarly, check whether the Protocol field in the IP header is 6 (0x06), the value assigned to TCP.
3. Determine whether it is an HTTP message. Based on the HTTP message format explained above, we only need to check whether the data begins with "GET", "POST", or "HTTP/1.1".
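The three checks can be sketched in a few lines of Python. This is not the author's program, which is not shown here; extract_http and build_frame are hypothetical names, the sketch assumes IPv4 over Ethernet with no VLAN tag, and build_frame only fabricates a synthetic frame (checksums left at zero, no trailing CRC) so that the example can run without a capture library.

```python
import struct

ETHERTYPE_IP = 0x0800     # Ethernet type field for an IP datagram
IPPROTO_TCP = 6           # IP Protocol field value for TCP
HTTP_PREFIXES = (b"GET", b"POST", b"HTTP/1.1")


def extract_http(frame: bytes):
    """Return the TCP payload if the frame looks like HTTP, otherwise None."""
    if len(frame) < 14 + 20:
        return None
    # 1. Ethernet header: destination (6), source (6), type (2).
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETHERTYPE_IP:
        return None
    # 2. IP header: the Protocol field is the single byte at offset 9.
    ip = frame[14:]
    ip_header_len = (ip[0] & 0x0F) * 4
    (total_len,) = struct.unpack("!H", ip[2:4])
    if ip[9] != IPPROTO_TCP:
        return None
    tcp = ip[ip_header_len:total_len]   # total_len also drops padding or CRC
    if len(tcp) < 20:
        return None
    # 3. Skip the TCP header (data offset is the high nibble of byte 12)
    #    and test how the payload begins.
    payload = tcp[(tcp[12] >> 4) * 4:]
    return payload if payload.startswith(HTTP_PREFIXES) else None


def build_frame(payload: bytes, ethertype=ETHERTYPE_IP, proto=IPPROTO_TCP):
    """Build a synthetic Ethernet/IPv4/TCP frame for demonstration only."""
    eth = b"\x00" * 6 + b"\x11" * 6 + struct.pack("!H", ethertype)
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40 + len(payload), 0, 0,
                     64, proto, 0, bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))
    tcp = struct.pack("!HHIIBBHHH", 12345, 80, 0, 0, 5 << 4, 0x18, 65535, 0, 0)
    return eth + ip + tcp + payload


request = b"GET / HTTP/1.1\r\nHost: www.wrox.com\r\n\r\n"
print(extract_http(build_frame(request)))
print(extract_http(build_frame(request, ethertype=0x0806)))  # ARP type: rejected
```

A prefix test like this only recognizes segments that begin an HTTP message; later segments of a long body would need connection tracking to be attributed correctly.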
The program does not really reconstruct everything the HTTP protocol does. To parse HTTP request and response data properly, it should determine the data format from Content-Type and decode it accordingly, and it should also handle Chinese characters, among other things. Finally, the source code of the whole program is attached; comments and suggestions are welcome and will be humbly accepted. PS: please forgive any gaps in these notes.
This simple study of the HTTP protocol is a little dated, but still good.