HTTP protocol (Introduction)

Last Update:2017-03-20 Source: Internet

Author: User

Tags http post soap


Introduction to HTTP

HTTP is short for Hypertext Transfer Protocol. It is used to transfer hypertext from a World Wide Web (WWW) server to the local browser.

HTTP is a communication protocol that runs on top of TCP/IP and transmits data (HTML files, image files, query results, and so on).

HTTP is a stateless, application-layer protocol based on the request/response pattern.

HTTP defines how files are transferred between the server and the client. Once communication follows this specification, HTML and other files can be sent from the server to the client browser.

Key Features

1. Simple and fast: When a client requests a service from the server, it only needs to send the request method and path. Commonly used request methods are GET, HEAD, and POST. Each method specifies a different type of interaction between the client and the server. Because the protocol is simple, an HTTP server program is small and communication is fast.

2. Flexible: HTTP allows any type of data object to be transferred. The type being transmitted is marked by Content-Type.

3. Connectionless: Connectionless here means that each connection handles only one request. After the server has processed the client's request and received the client's acknowledgment, the connection is closed. This saves transmission time.

4. Stateless: HTTP is a stateless protocol, meaning that the protocol keeps no memory of earlier transactions. If later processing needs earlier information, that information must be retransmitted, which may increase the amount of data transferred per connection. On the other hand, the server responds faster when it does not need the earlier information.

5. Supports both the browser/server (B/S) and client/server (C/S) models.

HTTP URLs

HTTP uses Uniform Resource Identifiers (URIs) to transfer data and establish connections. A URL is a special type of URI that contains enough information to locate a resource.

A URL (Uniform Resource Locator) identifies the address of a resource on the Internet. The following URL is used as an example to introduce the parts of a typical URL:

http://www.aspxfans.com:8080/news/index.asp?boardID=5&ID=24618&page=1#name

As the URL above shows, a complete URL includes the following parts:
1. Protocol part: The protocol part of this URL is "http:", which means that the page is served over HTTP. Several protocols can be used on the Internet, such as HTTP and FTP; this example uses HTTP. The "//" after "http:" is a delimiter.

2. Domain name part: The domain name part of this URL is "www.aspxfans.com". An IP address can also be used in place of the domain name.

3. Port part: The port follows the domain name, separated from it by ":". In this example the port is 8080. The port is optional; if it is omitted, the default port is used.

4. Virtual directory part: From the first "/" after the domain name to the last "/" is the virtual directory part. The virtual directory is also optional. The virtual directory in this example is "/news/".

5. File name part: From the last "/" after the domain name to "?" is the file name part. If there is no "?", it runs from the last "/" to "#". If there is neither "?" nor "#", it runs from the last "/" to the end of the URL. The file name in this example is "index.asp". The file name is also optional; if it is omitted, the default file name is used.

6. Anchor part: From "#" to the end is the anchor part. The anchor in this example is "name". The anchor is also optional.

7. Parameter part: The part between "?" and "#" is the parameter part, also called the search part or query part. In this example it is "boardID=5&ID=24618&page=1". Multiple parameters are allowed, separated by "&".
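
The parts listed above can be pulled out of the example URL with Python's standard urllib.parse module. This is only a sketch: the function name split_url and the dictionary keys are made up for illustration, and splitting the path at its last "/" is this article's directory/file convention rather than something the URL standard defines.

```python
from urllib.parse import urlsplit, parse_qsl


def split_url(url):
    """Break a URL into the seven parts described in the article."""
    parts = urlsplit(url)
    # Everything up to the last "/" is treated as the virtual directory,
    # the remainder as the file name (the article's convention).
    directory, _, filename = parts.path.rpartition("/")
    return {
        "protocol": parts.scheme,
        "domain": parts.hostname,
        "port": parts.port,               # None when the URL omits the port
        "directory": directory + "/",
        "filename": filename,
        "parameters": parse_qsl(parts.query),
        "anchor": parts.fragment,
    }


info = split_url(
    "http://www.aspxfans.com:8080/news/index.asp?boardID=5&ID=24618&page=1#name"
)
for key, value in info.items():
    print(f"{key}: {value}")
```

Note that the anchor is handled by the browser; it is normally not sent to the server as part of the request.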

HTTP request messages

An HTTP request that the client sends to the server is a request message in the following format:

A request line, request headers, a blank line, and request data: four parts in all.

There are several request methods (all uppercase). Each is interpreted as follows:
GET: requests the resource identified by the Request-URI
POST: appends new data to the resource identified by the Request-URI
HEAD: requests the response message headers for the resource identified by the Request-URI
PUT: asks the server to store a resource and use the Request-URI as its identifier
DELETE: asks the server to delete the resource identified by the Request-URI
TRACE: asks the server to echo back the request it received, mainly for testing or diagnostics
CONNECT: reserved in HTTP/1.1 for proxies that can switch to acting as a tunnel (for example, for HTTPS)
OPTIONS: asks about the server's capabilities, or about the options and requirements associated with a resource

GET request example, using a request captured with Charles:

GET /562f25980001b1b106000338.jpg HTTP/1.1
Host: img.mukewang.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36
Accept: image/webp,image/*,*/*;q=0.8
Referer: http://www.imooc.com/
Accept-Encoding: gzip, deflate, sdch
Accept-Language: zh-CN,zh;q=0.8

Part 1: the request line, which states the request type, the resource to access, and the HTTP version used.

GET is the request type, /562f25980001b1b106000338.jpg is the resource to access, and the last part of the line states that HTTP version 1.1 is used.

Part 2: the request headers, the section that follows the request line (the first line) and gives the server additional information to use.

The request headers start on the second line. Host indicates the destination of the request. User-Agent is available to both server-side and client-side scripts and is an important basis for browser detection logic. It is defined by the browser and sent automatically with each request.

Part 3: a blank line. The blank line after the request headers is required.

Even if the request data in part 4 is empty, the blank line must be present.

Part 4: the request data, also called the body, which can carry any additional data.

The request data in this example is empty.
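
To make the four-part layout concrete, the following Python sketch assembles a request message by hand. The helper name build_request is hypothetical, and adding Content-Length automatically for a non-empty body is a choice made for this sketch, not something the text above prescribes.

```python
def build_request(method, target, headers, body=b""):
    """Assemble request line, headers, blank line, and body into one message."""
    lines = [f"{method} {target} HTTP/1.1"]           # part 1: request line
    fields = dict(headers)
    if body:
        # A server needs to know where the body ends.
        fields["Content-Length"] = str(len(body))
    lines += [f"{name}: {value}" for name, value in fields.items()]  # part 2
    head = "\r\n".join(lines) + "\r\n\r\n"            # part 3: the blank line
    return head.encode("ascii") + body                # part 4: request data


raw = build_request(
    "GET",
    "/562f25980001b1b106000338.jpg",
    {"Host": "img.mukewang.com", "Accept": "image/webp,image/*,*/*;q=0.8"},
)
print(raw.decode("ascii"))
```

Even with an empty body, the message still ends with the blank line (a bare CRLF) that closes the header section.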

HTTP response messages

In general, after receiving and processing a request from the client, the server returns an HTTP response message.

An HTTP response also has four parts: the status line, the message headers, a blank line, and the response body.

The status code consists of three digits. The first digit defines the class of the response, and there are five possible values:
1XX: Informational. The request has been received and processing continues
2XX: Success. The request has been successfully received, understood, and accepted
3XX: Redirection. Further action is required to complete the request
4XX: Client error. The request has a syntax error or cannot be fulfilled
5XX: Server error. The server failed to fulfill a valid request
Common status codes, status texts, and descriptions:
200 OK //the client request succeeded
400 Bad Request //the client request has a syntax error and cannot be understood by the server
401 Unauthorized //the request is not authorized; this status code must be used with the WWW-Authenticate header field
403 Forbidden //the server received the request but refuses to serve it
404 Not Found //the requested resource does not exist, e.g. a wrong URL was entered
500 Internal Server Error //the server encountered an unexpected error
503 Service Unavailable //the server currently cannot process the client's request and may recover after some time
e.g.: HTTP/1.1 200 OK (CRLF)
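
As a small illustration of how the first digit determines the class, the sketch below maps a status code to its class and looks up the standard reason phrase in Python's http.HTTPStatus enum. The CLASSES table and the describe function are names invented for this example.

```python
from http import HTTPStatus

# The first digit of the status code selects one of five classes.
CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe(code):
    """Return e.g. '404 Not Found (client error)' for a known status code."""
    status_class = CLASSES[code // 100]
    phrase = HTTPStatus(code).phrase   # raises ValueError for unknown codes
    return f"{code} {phrase} ({status_class})"


for code in (200, 400, 401, 403, 404, 500, 503):
    print(describe(code))
```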

Example

HTTP/1.1 200 OK
Date: Fri, 22 May 2009 06:07:21 GMT
Content-Type: text/html; charset=UTF-8

<html>
      <head></head>
      <body>
            <!--body goes here-->
      </body>
</html>

Part 1: the status line, which consists of three parts: the HTTP protocol version, the status code, and the status message.

The first line is the status line: HTTP/1.1 indicates that the HTTP version is 1.1, the status code is 200, and the status message is OK.

Part 2: the message headers, which give additional information for the client to use.

The second and third lines are message headers.
Date: the date and time the response was generated. Content-Type: specifies the MIME type HTML (text/html) and the encoding UTF-8.

Part 3: a blank line. The blank line after the message headers is required.

Part 4: the response body, the text that the server returns to the client.

The HTML after the blank line is the response body.
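
The same four parts can be recovered from the raw bytes of a response. The following Python sketch is a simplified parser for illustration only: the name parse_response is hypothetical, and it assumes a complete, well-formed response with CRLF line endings (a real client would also handle chunked bodies, repeated headers, and so on).

```python
def parse_response(raw):
    """Split a raw HTTP response into status line fields, headers, and body."""
    # The first blank line (CRLF CRLF) separates the head from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()   # names are case-insensitive
    return version, int(code), reason, headers, body


raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Date: Fri, 22 May 2009 06:07:21 GMT\r\n"
    b"Content-Type: text/html; charset=UTF-8\r\n"
    b"\r\n"
    b"<html><head></head><body><!--body goes here--></body></html>"
)
version, code, reason, headers, body = parse_response(raw)
print(version, code, reason)
print(headers["content-type"])
```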

How HTTP Works

HTTP defines how web clients request pages from a web server and how the server delivers pages to clients. It uses a request/response model. The client sends the server a request message that contains the request method, URL, protocol version, request headers, and request data. The server responds with a status line (the protocol version and a success or error code), followed by server information, response headers, and the response data.

The steps of an HTTP request/response are as follows:

1. The client connects to the web server

An HTTP client, typically a browser, establishes a TCP socket connection to the HTTP port of the web server (80 by default). For example, http://www.oakcms.cn.

2. The client sends the HTTP request

Over the TCP socket, the client sends the web server a text request message consisting of four parts: a request line, request headers, a blank line, and request data.

3. The server accepts the request and returns an HTTP response

The web server parses the request and locates the requested resource. It writes a copy of the resource to the TCP socket, from which the client reads it. A response consists of four parts: a status line, response headers, a blank line, and response data.

4. The TCP connection is released

If the connection mode is close, the server actively closes the TCP connection and the client closes it passively, releasing the connection. If the connection mode is keep-alive, the connection is kept open for a period of time, during which further requests can be received.

5. The client browser parses the HTML content

The client browser first parses the status line to see the status code, which indicates whether the request succeeded. It then parses each response header; the headers tell it that what follows is an HTML document of a certain number of bytes and which character set the document uses. The browser reads the HTML in the response data, formats it according to HTML syntax, and displays it in the browser window.
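
The five steps can be tried end to end without touching the network. The sketch below starts a throwaway server from Python's standard http.server module on the loopback address and talks to it over a raw TCP socket. Everything here is an assumption made for the demonstration: HelloHandler, fetch, and demo are hypothetical names, the server returns a fixed page, and the client uses "Connection: close" so that it can simply read until the server closes the connection.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that returns a tiny HTML page for any GET."""

    def do_GET(self):
        body = b"<html><body>hello</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=UTF-8")
        self.send_header("Content-Length", str(len(body)))
        self.send_header("Connection", "close")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch(host, port, path="/"):
    # Step 1: open a TCP connection to the server's HTTP port.
    with socket.create_connection((host, port), timeout=5) as sock:
        # Step 2: send the request line, headers, and the blank line.
        request = (
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}:{port}\r\n"
            "Connection: close\r\n"
            "\r\n"
        )
        sock.sendall(request.encode("ascii"))
        # Steps 3 and 4: read the response until the server closes the connection.
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    # Step 5: separate the status line from the body, as a browser would.
    head, _, body = b"".join(chunks).partition(b"\r\n\r\n")
    status_line = head.split(b"\r\n")[0].decode("iso-8859-1")
    return status_line, body


def demo():
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: any free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch("127.0.0.1", port)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    status_line, body = demo()
    print(status_line)
    print(body.decode("utf-8"))
```

With keep-alive instead of close, the client could not rely on end-of-stream and would have to use Content-Length to know where the body ends.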

For example, typing a URL in the browser address bar and pressing Enter triggers the following process:

1. The browser asks a DNS server to resolve the domain name in the URL to an IP address.

2. Using the resolved IP address and the default port 80, the browser establishes a TCP connection with the server.

3. The browser issues an HTTP request to read the file (the file that corresponds to the part of the URL after the domain name). The request message can be sent to the server as the data of the third segment of the TCP three-way handshake.

4. The server responds to the browser's request and sends the corresponding HTML text to the browser.

5. The TCP connection is released.

6. The browser renders the HTML text and displays the content.

Differences between GET and POST requests

GET request

GET /books/?sex=man&name=Professional HTTP/1.1
Host: www.wrox.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.6) Gecko/20050225 Firefox/1.0.1
Connection: Keep-Alive

Note that the last line is a blank line.

POST request

POST / HTTP/1.1
Host: www.wrox.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.6) Gecko/20050225 Firefox/1.0.1
Content-Type: application/x-www-form-urlencoded
Content-Length: 40
Connection: Keep-Alive

name=Professional%20Ajax&publisher=Wiley

1. Submitting data: With GET, the request data is appended to the URL (that is, the data is placed in the header section of the HTTP message, as part of the request line). A ? separates the URL from the data, and multiple parameters are joined with &. For example: login.action?name=hyddd&password=idontknow&verify=%E4%BD%A0%E5%A5%BD. English letters and digits are sent as is, a space is converted to +, and Chinese or other characters are percent-encoded (URL encoding, not Base64) into sequences such as %E4%BD%A0%E5%A5%BD, where the XX in %XX is the hexadecimal value of one byte of the character.

With POST, the submitted data is placed in the body of the HTTP message. In the POST example above, the trailing name=Professional%20Ajax&publisher=Wiley is the data actually transferred.

As a result, data submitted with GET is displayed in the address bar, while with POST the address bar does not change.
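
The encoding rules in point 1 can be checked with Python's urllib.parse. In this sketch the parameter values are illustrative (the verify value 你好 is the text whose UTF-8 bytes give %E4%BD%A0%E5%A5%BD), and get_url and post_body are hypothetical helper names. The same encoded string ends up in the URL for GET and in the message body for POST.

```python
from urllib.parse import quote, quote_plus, urlencode

params = {"name": "hyddd", "password": "idontknow", "verify": "你好"}


def get_url(path, params):
    """GET: the encoded parameters follow '?' in the URL."""
    return f"{path}?{urlencode(params)}"


def post_body(params):
    """POST: the same encoded parameters travel in the message body."""
    return urlencode(params).encode("ascii")


print(get_url("login.action", params))
# login.action?name=hyddd&password=idontknow&verify=%E4%BD%A0%E5%A5%BD

print(post_body(params))

# A space becomes "+" in form encoding, or "%20" with plain percent-encoding.
print(quote_plus("Professional Ajax"))  # Professional+Ajax
print(quote("Professional Ajax"))       # Professional%20Ajax
```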

2. Size of the transmitted data: First of all, the HTTP specification does not restrict the size of the transmitted data, nor does it limit the length of the URL.

The main limitations in practice are:

GET: Specific browsers and servers limit URL length. For example, IE limits URLs to 2083 bytes (2K+35). For other browsers, such as Netscape and Firefox, there is theoretically no length limit; the limit depends on operating system support.

Therefore, with GET the transmitted data is limited by the URL length.

POST: In theory the data is not limited, because it is not transmitted in the URL. In practice, however, web servers limit the size of POST data; Apache and IIS6 each have their own configuration for this.

3. Security

POST is more secure than GET. For example, when data is submitted with GET, the user name and password appear in plaintext in the URL. Because (1) the login page may be cached by the browser and (2) other people can view the browser's history, others may obtain your account and password. In addition, submitting data with GET may also expose the site to cross-site request forgery attacks.

4. HTTP GET, HTTP POST, and SOAP all run over HTTP

(1) GET: The request parameters are appended to the URL as a sequence of key/value pairs (the query string).
The length of the query string is limited by the web browser and web server (IE, for example, supports up to 2048 characters), so it is not suitable for transporting large data sets. It is also insecure.

(2) POST: The request parameters are transmitted in a separate part of the HTTP message (called the entity body), which is used to carry form information, so Content-Type must be set to application/x-www-form-urlencoded. POST was designed to support user fields on web forms, and its parameters are also transmitted as key/value pairs.
However, it does not support complex data types, because POST does not define the semantics and rules for transferring data structures.

(3) SOAP: A specialized use of HTTP POST that follows a particular XML message format.
Content-Type is set to text/xml. Any data can be expressed as XML.

HTTP defines a number of methods for interacting with the server. The four most basic are GET, POST, PUT, and DELETE. A URL describes a resource on the network, and GET, POST, PUT, and DELETE in HTTP correspond to four operations on that resource: retrieve, update, create, and delete. GET and POST are the most common. GET is typically used to retrieve or query resource information, and POST is typically used to update resource information.

Let's look at the differences between GET and POST:

Data submitted with GET is placed after the URL, with ? separating the URL from the data and & joining the parameters, as in editposts.aspx?name=test1&id=123456. POST places the submitted data in the body of the HTTP message.
The size of data submitted with GET is limited (because browsers limit URL length), while the POST method itself places no limit on the submitted data.
With GET, the server reads variable values through Request.QueryString; with POST, it reads them through Request.Form.
Submitting data with GET raises security problems. On a login page, for example, data submitted with GET puts the user name and password in the URL; if the page can be cached or someone else can access the machine, the user's account and password can be read from the history.
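
Request.QueryString and Request.Form are ASP/ASP.NET objects. As a rough Python analogue of what a server does behind them, the sketch below reads parameters from the query string for GET and from the body for POST. The function read_params is a hypothetical name, and the sketch assumes the body is application/x-www-form-urlencoded and that each parameter appears once.

```python
from urllib.parse import parse_qs, urlsplit


def read_params(method, target, body=b""):
    """Return the submitted parameters for a GET or POST request."""
    if method == "GET":
        encoded = urlsplit(target).query      # data after "?" in the URL
    elif method == "POST":
        encoded = body.decode("ascii")        # data in the message body
    else:
        raise ValueError(f"unsupported method: {method}")
    # parse_qs returns a list per key; keep the first value of each.
    return {name: values[0] for name, values in parse_qs(encoded).items()}


print(read_params("GET", "/editposts.aspx?name=test1&id=123456"))
print(read_params("POST", "/editposts.aspx", b"name=test1&id=123456"))
```

Both calls yield the same dictionary; only the place where the data travels differs.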


This article is an English version of an article originally published in Chinese on aliyun.com and is provided for information purposes only.
