TCP/IP (7): Mastering the HTTP Protocol

Source: Internet
Author: User

Preface

The previous post briefly introduced HTTP, an application-layer protocol. This article studies HTTP in detail, since it is a protocol you cannot avoid in web development. My own field is big data, but learning a bit more never hurts.

The National Day holiday lasts seven days, and many people are planning how to spend it. I would like to go out and have fun too, but there is no shortcut: only effort opens a way forward. Keep going!

1. HTTP Protocol Overview

1.1 Introduction to HTTP

1) Protocol: the set of rules that two computers in a communication network must follow in order to communicate. The Hypertext Transfer Protocol (HTTP) is a communication protocol that allows Hypertext Markup Language (HTML) documents to be transferred from a Web server to a client's browser.

2) HTTP is short for HyperText Transfer Protocol. It is used to transfer hypertext from a World Wide Web (WWW) server to the local browser.

3) HTTP runs on top of TCP/IP and is used to transfer data (HTML files, image files, query results, and so on).

4) HTTP is an object-oriented protocol in the application layer. Because it is simple and fast, it is well suited to distributed hypermedia information systems. It was proposed in 1990 and has been continuously improved and extended through years of use and development.

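When this description was first written, the WWW used the sixth revision of HTTP/1.0, HTTP/1.1 was still being standardized, and HTTP-NG (Next Generation of HTTP) had been proposed. HTTP/1.1 has since been standardized, and its successors HTTP/2 and HTTP/3 are now widely deployed.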

5) HTTP works on a client-server architecture. The browser, acting as the HTTP client, sends all requests via URLs to the HTTP server, that is, the Web server. The Web server sends a response message back to the client based on the request it received.

    

1.2 Features of HTTP

1) Simple and fast: when a client requests a service from the server, it only needs to send the request method and the path. Common request methods include GET, HEAD, and POST. Each method specifies a different type of interaction between client and server. Because HTTP is simple, HTTP server programs are small, so communication is fast.

2) Flexible: HTTP allows data objects of any type to be transferred. The type being transferred is indicated by the Content-Type header.

3) Connectionless: each connection is limited to handling a single request. Once the server has processed the client's request and received the client's acknowledgment, it closes the connection. The idea is to save transmission time. (This describes HTTP/1.0; as noted later, HTTP/1.1 keeps connections open by default.)

4) Stateless: HTTP is a stateless protocol, meaning it has no memory of earlier transactions. If later processing needs earlier information, that information must be sent again, which can increase the amount of data transferred per connection. On the other hand, when the server does not need earlier information, it can respond faster.

5) Supports both B/S (browser/server) and C/S (client/server) modes.

2. URL and URI

2.1 URL

The previous post already briefly explained what a URL is.

HTTP uses Uniform Resource Identifiers (URIs) to identify resources on the network, transfer data, and establish connections. A URL is a special kind of URI that contains enough information to locate a resource.

A URL (Uniform Resource Locator) is the address used on the Internet to identify a resource. Take the following URL as an example:

http://www.zyh.com:8080/woss/index.html?username=10086&password=123456#name

As the URL above shows, a complete URL includes the following parts:
1) Protocol: the protocol part of the URL is "http:", meaning the page uses HTTP. Many protocols are used on the Internet, such as HTTP and FTP; this example uses HTTP. The "//" after "http:" is a delimiter.

2) Domain name: the domain name part of the URL is "www.zyh.com". An IP address can also be used in place of a domain name.

3) Port: the port follows the domain name, separated from it by ":". The port is optional; if it is omitted, the default port (80 for HTTP) is used. In this example the port is 8080.

4) Virtual directory: everything from the first "/" after the domain name up to the last "/" is the virtual directory. It is optional. In this example the virtual directory is "/woss/".

5) File name: everything from the last "/" after the domain name up to "?" is the file name. If there is no "?", it runs from the last "/" up to "#"; if there is neither "?" nor "#", it runs from the last "/" to the end of the URL. In this example the file name is "index.html". The file name is optional; if it is omitted, the default file name is used.

6) Anchor: everything from "#" to the end is the anchor. In this example the anchor is "name". The anchor is optional.

7) Parameters: the part between "?" and "#" is the parameter part, also called the search part or query part. In this example it is "username=10086&password=123456". Multiple parameters are allowed, separated by "&".
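To see these parts in code, here is a small sketch that applies Python's standard `urllib.parse` module to the example URL. `urlsplit` reports the virtual directory and file name together as one `path`, so the sketch adds a hypothetical helper, `split_path`, to separate them; it also falls back to port 80 when the port is omitted.

```python
from urllib.parse import urlsplit, parse_qs

URL = "http://www.zyh.com:8080/woss/index.html?username=10086&password=123456#name"

def split_path(path: str) -> tuple[str, str]:
    """Hypothetical helper: split a URL path into (virtual directory, file name)."""
    directory, _, filename = path.rpartition("/")
    return directory + "/", filename

parts = urlsplit(URL)
directory, filename = split_path(parts.path)

print("protocol: ", parts.scheme)           # http
print("domain:   ", parts.hostname)         # www.zyh.com
print("port:     ", parts.port or 80)       # 8080 (80 would be used if omitted)
print("directory:", directory)              # /woss/
print("file name:", filename)               # index.html
print("query:    ", parse_qs(parts.query))  # {'username': ['10086'], 'password': ['123456']}
print("anchor:   ", parts.fragment)         # name
```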

2.2 The difference between URI and URL

1) A URI (Uniform Resource Identifier) is used to uniquely identify a resource.
Every resource available on the Web, such as HTML documents, images, video clips, and programs, is located by a URI.
A URI generally consists of three parts:
the naming scheme used to access the resource
the name of the host that stores the resource
the name of the resource itself, given as a path, with the emphasis on the resource

2) A URL (Uniform Resource Locator) is a specific kind of URI: it not only identifies a resource but also says how to locate it.
URLs are strings used on the Internet to describe information resources. They are used mainly by WWW client and server programs, notably the famous Mosaic browser.
URLs describe many kinds of information resources, including files, server addresses, and directories, in a uniform format. A URL generally consists of three parts:
the protocol (or service scheme)
the IP address of the host holding the resource (sometimes with a port number)
the specific address of the resource on the host, such as the directory and file name

3. Workflow

An HTTP operation is called a transaction, and it proceeds in four steps:
1) First, the client and the server establish a connection. HTTP work begins as soon as you click a hyperlink.
2) Once the connection is established, the client sends a request to the server. The request consists of a Uniform Resource Identifier (URL) and the protocol version number, followed by MIME-style information including request modifiers, client information, and possibly a body.
3) On receiving the request, the server responds with a status line containing the protocol version and a success or error code, followed by MIME-style information including server information, entity information, and possibly a body.
4) The client receives the information returned by the server, the browser displays it on the user's screen, and the client then disconnects from the server.
If an error occurs in any of these steps, the error information is returned to the client and displayed. For the user, all of this is handled by HTTP itself; the user just clicks and waits for the information to appear.
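The four steps of a transaction can be sketched with plain TCP sockets in Python. To keep the example self-contained, it starts a throwaway local server (the hypothetical `HelloHandler`, built on the standard `http.server` module) on a free port instead of contacting a real website; `http_transaction` is likewise a name made up for this sketch.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical demo server: answers every GET with a tiny HTML page."""
    def do_GET(self):
        body = b"<html><body>Hello</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):  # keep the demo quiet
        pass

def http_transaction(host: str, port: int, path: str = "/") -> bytes:
    # Step 1: establish a TCP connection with the server.
    with socket.create_connection((host, port), timeout=5) as sock:
        # Step 2: send the request line, the headers and the blank line.
        request = (
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}:{port}\r\n"
            "Connection: close\r\n"
            "\r\n"
        )
        sock.sendall(request.encode("ascii"))
        # Step 3: read the server's response until it closes the connection.
        chunks = []
        while chunk := sock.recv(4096):
            chunks.append(chunk)
    # Step 4: leaving the `with` block closed the socket, i.e. we disconnected.
    return b"".join(chunks)

server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: let the OS pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

response = http_transaction("127.0.0.1", server.server_address[1])
print(response.decode("utf-8").split("\r\n")[0])  # HTTP/1.0 200 OK
server.shutdown()
server.server_close()
```

Note that `http.server` answers with HTTP/1.0 by default, which is why the status line reports that version even though the request was sent as HTTP/1.1.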

Let's look at the process more closely.

We open a browser, enter a URL in the address bar, and then see the page.

In fact, after we enter the URL, the browser sends a request to the Web server. The Web server receives and processes the request, generates a corresponding response, and sends it back to the browser. The browser then parses the HTML in the response, and that is how we see the Web page.

    

Our request may also pass through a proxy server before it finally reaches the Web server.

     

A proxy server is a relay point for network traffic. Its functions include:

Faster access: most proxy servers can cache content.

Bypassing network restrictions (circumventing firewalls).

Hiding the client's identity.

Note:

HTTP runs on top of TCP, a transport-layer protocol, and TCP is an end-to-end, connection-oriented protocol. "End-to-end" can be understood as process-to-process communication. So before HTTP can transfer anything, a TCP connection must be established, which requires the so-called "three-way handshake".

In the TCP three-way handshake, the client sends a SYN segment, the server replies with SYN-ACK, and the client answers with ACK.
Once the handshake completes, the TCP connection is established and HTTP data can be transmitted. An important concept here is the persistent connection: HTTP does not close the TCP connection as soon as a transfer completes, so later requests can reuse it. In HTTP/1.1 this is the default behavior (controlled by the Connection header).
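A rough way to observe a persistent connection is to send two requests through one `http.client.HTTPConnection` and have a local test server report the client's TCP source port for each request. The handler name and paths are hypothetical; setting `protocol_version = "HTTP/1.1"` is what tells Python's `http.server` to keep the connection open.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class EchoPortHandler(BaseHTTPRequestHandler):
    """Hypothetical demo server: replies with the client's TCP source port."""
    protocol_version = "HTTP/1.1"  # keep the connection open between requests

    def do_GET(self):
        body = str(self.client_address[1]).encode("ascii")
        self.send_response(200)
        # Content-Length lets the client know where this response ends on a reused connection.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass

server = HTTPServer(("127.0.0.1", 0), EchoPortHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
ports = []
for path in ("/first", "/second"):
    conn.request("GET", path)           # http.client speaks HTTP/1.1
    resp = conn.getresponse()
    ports.append(resp.read().decode())  # read the whole body before reusing the connection
conn.close()
server.shutdown()
server.server_close()

# Same source port for both requests: they travelled over one TCP connection.
print(ports[0] == ports[1])  # True
```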

4. The HTTP Request Message

4.1 Request message format

The HTTP request that a client sends to the server is called a request message, and it has a fixed format.

     

A request message consists of four parts:

the request line, the request headers, a blank line, and the request data (body).

    

In the first line, Method is the request method, such as "POST" or "GET"; Path-to-resource is the requested resource (URL); and HTTP/version-number is the version of the HTTP protocol.

When the "GET" method is used, the body is empty.
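The four-part layout can be expressed as a tiny Python function that joins the parts with CRLF (`\r\n`) line endings, which is what HTTP/1.1 uses on the wire. `build_request` and the host `www.example.com` are illustrative assumptions, not part of any library.

```python
def build_request(method: str, path: str, headers: dict[str, str],
                  body: bytes = b"", version: str = "HTTP/1.1") -> bytes:
    """Assemble the four parts of an HTTP request message (a sketch, not a full client)."""
    request_line = f"{method} {path} {version}\r\n"                                 # part 1
    header_lines = "".join(f"{name}: {value}\r\n" for name, value in headers.items())  # part 2
    blank_line = "\r\n"                             # part 3: required even when the body is empty
    return (request_line + header_lines + blank_line).encode("ascii") + body        # part 4

msg = build_request("GET", "/index.html", {"Host": "www.example.com"})
print(msg)  # b'GET /index.html HTTP/1.1\r\nHost: www.example.com\r\n\r\n'
```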

4.2 A GET request message

When visiting Sohu's website, I captured the request message with Firebug.

  

Part 1: the request line, which states the request method, the resource to access, and the HTTP version used.

GET http://www.sohu.com HTTP/1.1 is the request line. Its three fields, separated by spaces, are the request method, the requested URL, and the HTTP version.

Part 2: the request headers, the lines immediately following the request line (the first line), which give the server additional information.

1) Host: the host name, www.sohu.com

2) User-Agent: identifies the client software making the request; here it is Firefox.

3) Accept: the types of data the client can accept

4) Accept-Language: the languages the user prefers, in order of preference; here Chinese first, then English

5) Accept-Encoding: tells the server which compression formats the client can accept

6) Cookie: a mechanism for keeping user information. The server records the information and the browser also stores a copy, which it sends back to the server in this header.

7) Connection: the connection mode. There are two kinds, non-persistent and persistent. With a non-persistent connection, each request/response pair uses its own TCP connection, which is closed after the response; the next request has to open a new TCP connection. A persistent connection does the opposite and keeps the TCP connection open for further requests.

8) Upgrade-Insecure-Requests: signals that the browser supports automatically upgrading HTTP requests to HTTPS, so that pages containing many HTTP resources can switch to HTTPS without errors. In short, it serves as a bridge between HTTP and HTTPS.

Part 3: a blank line. The blank line after the request headers is required,
even if the request data (part 4) is empty.

Part 4: the request data, also called the body, which can carry any additional data.
With a GET request the request data is empty.

Because a GET request usually carries no request data, nothing follows the blank line (line 9 in the capture). To send data with GET, it is normally appended to the URL after the path and sent that way.
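Going the other way, the sketch below splits a raw GET request into the four parts described above. The header values are made up to resemble a Firefox request (they are not the actual Firebug capture), the request target is written in the short `/` form, and the toy `parse_request` function ignores details a real parser must handle, such as case-insensitive header names.

```python
RAW_REQUEST = (
    "GET / HTTP/1.1\r\n"
    "Host: www.sohu.com\r\n"
    "User-Agent: Mozilla/5.0 (Windows NT 10.0; rv:56.0) Gecko/20100101 Firefox/56.0\r\n"
    "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n"
    "Accept-Language: zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3\r\n"
    "Accept-Encoding: gzip, deflate\r\n"
    "Cookie: session=abc123\r\n"
    "Connection: keep-alive\r\n"
    "Upgrade-Insecure-Requests: 1\r\n"
    "\r\n"
)

def parse_request(raw: str):
    """Toy parser: split a request message into its four parts."""
    head, _, body = raw.partition("\r\n\r\n")          # part 3: the blank line separates head and body
    request_line, *header_lines = head.split("\r\n")
    method, target, version = request_line.split(" ")  # part 1: the request line
    headers = {}
    for line in header_lines:                          # part 2: "Name: value" header lines
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return method, target, version, headers, body      # part 4: the body (empty for this GET)

method, target, version, headers, body = parse_request(RAW_REQUEST)
print(method, target, version)  # GET / HTTP/1.1
print(headers["Host"])          # www.sohu.com
print(repr(body))               # ''
```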

4.3 A POST request message

  

Part 1: the request line (the first line): a POST request using HTTP/1.1.
Part 2: the request headers, lines 2 through 6.
Part 3: the blank line, line 7.
Part 4: the request data, line 8.
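A POST message of this shape can be assembled in Python with `urllib.parse.urlencode` for the form body. The host, path, and function name are hypothetical, and the field values are borrowed from the login example used in this article; note that the `Content-Length` header must match the byte length of the body.

```python
from urllib.parse import urlencode

def build_form_post(host: str, path: str, fields: dict[str, str]) -> bytes:
    """Sketch: build a form-encoded POST request message."""
    body = urlencode(fields).encode("ascii")                 # key=value pairs joined by '&'
    head = (
        f"POST {path} HTTP/1.1\r\n"                          # part 1: request line
        f"Host: {host}\r\n"                                  # part 2: request headers
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"                   # tells the server where the body ends
        "\r\n"                                               # part 3: blank line
    )
    return head.encode("ascii") + body                       # part 4: request data

msg = build_form_post("www.example.com", "/login", {"name": "hyddd", "password": "idontknow"})
print(msg.decode("ascii"))
```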

5. HTTP Requests in Detail

5.1 HTTP request methods

The HTTP standard defines several request methods.
HTTP/1.0 defines three: GET, POST, and HEAD.

GET: requests the specified page and returns the entity body.
POST: submits data to the specified resource for processing (for example, submitting a form or uploading a file). The data is included in the request body. A POST request may result in the creation of new resources and/or the modification of existing ones.
HEAD: similar to GET, except that the response contains no body; it is used to retrieve the headers.

HTTP/1.1 added five new request methods: OPTIONS, PUT, DELETE, TRACE, and CONNECT.

PUT: replaces the content of the specified document with the data sent by the client.
DELETE: asks the server to delete the specified page.
CONNECT: reserved in HTTP/1.1 for proxy servers that can switch the connection into a tunnel.
OPTIONS: lets the client query the server's capabilities.
TRACE: echoes back the request received by the server, mainly for testing or diagnostics.
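The difference between GET and HEAD is easy to check against a local test server: both return the same status and headers, but HEAD carries no body. This sketch uses Python's `http.server` and `http.client`; `PageHandler` and `PAGE` are hypothetical names.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html><body>Hello</body></html>"

class PageHandler(BaseHTTPRequestHandler):
    """Hypothetical demo server supporting GET and HEAD."""
    def send_page_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()

    def do_GET(self):
        self.send_page_headers()
        self.wfile.write(PAGE)    # GET: headers plus the entity body

    def do_HEAD(self):
        self.send_page_headers()  # HEAD: the same headers, but no body

    def log_message(self, format, *args):
        pass

server = HTTPServer(("127.0.0.1", 0), PageHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

results = {}
for method in ("GET", "HEAD"):
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request(method, "/")
    resp = conn.getresponse()
    results[method] = (resp.status, resp.getheader("Content-Length"), resp.read())
    conn.close()

server.shutdown()
server.server_close()
print(results["GET"])   # (200, '31', b'<html><body>Hello</body></html>')
print(results["HEAD"])  # (200, '31', b'')
```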

5.2 Differences between GET and POST requests

HTTP defines a number of ways to interact with the server; the four most basic are GET, POST, PUT, and DELETE. A URL describes a resource on the network, and HTTP's GET, POST, PUT, and DELETE correspond to the four operations of querying, modifying, adding, and deleting that resource.

The most common are GET and POST. GET is typically used to retrieve or query resource information, and POST is typically used to update it.

1) How data is submitted: with GET, the request data is appended to the URL (that is, it travels in the request line at the start of the HTTP message), with "?" separating the URL from the data and "&" joining multiple parameters.

For example: Login.action?name=hyddd&password=idontknow&verify=%E4%BD%A0%E5%A5%BD. Letters and digits are sent as-is, and spaces are converted to "+".

Chinese and other characters are percent-encoded (not Base64-encoded): each byte of the character's encoding becomes %XX, where XX is that byte's value in hexadecimal. With UTF-8, "你好" becomes %E4%BD%A0%E5%A5%BD, as in the example above.

POST: the submitted data is placed in the body of the HTTP message. In the earlier POST example, the data is the line after the blank line.

2) Size of the transmitted data: first of all, the HTTP protocol itself limits neither the size of the transmitted data nor the length of the URL. In practice, the main limits are:

GET: particular browsers and servers limit URL length. For example, IE limits URLs to 2083 characters (2K+35). Other browsers, such as Netscape and Firefox, have no theoretical limit; their limit depends on the operating system.

Therefore, for a GET submission, the amount of data is limited by the URL length.

POST: in theory the data size is unlimited, since it is not sent in the URL. In practice, Web servers limit the size of POST data through configuration; Apache and IIS 6 each have their own settings.

3) Security: POST is more secure than GET in this respect. For example, when data is submitted via GET, the user name and password appear in plain text in the URL, so (1) the login page may be cached by the browser, and (2) anyone who looks at the browser history can obtain your account and password.

In addition, submitting data via GET can make cross-site request forgery (CSRF) attacks easier.

4) HTTP GET, HTTP POST, and SOAP all run over HTTP:

GET: request parameters are appended to the URL as a sequence of key/value pairs (the query string).
The length of the query string is limited by Web browsers and Web servers (IE supports at most 2048 characters), so GET is unsuitable for transferring large datasets; it is also insecure.

POST: request parameters are sent in a separate part of the HTTP message (the entity body), which is used to carry form data, so Content-Type is typically set to application/x-www-form-urlencoded.

POST was designed to support user fields in Web forms, and its parameters are also sent as key/value pairs. However, it does not support complex data types, because POST does not define semantics or rules for transferring data structures.

SOAP: a specialized form of HTTP POST that follows a special XML message format, with Content-Type set to text/xml; any data can be expressed as XML.

To summarize the differences between GET and POST described above:

GET puts the submitted data after the URL, with "?" separating the URL from the data and "&" joining the parameters, e.g. Login.action?name=hyddd&password=idontknow&verify=%E4%BD%A0%E5%A5%BD. POST puts the submitted data in the body of the HTTP message.
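Python's `urllib.parse` reproduces the encoding in the example query string. The sketch assumes UTF-8, which is `urlencode`'s default and matches the bytes behind `%E4%BD%A0%E5%A5%BD`; it also shows a space turning into `+`.

```python
from urllib.parse import urlencode, quote_plus, unquote_plus

params = {"name": "hyddd", "password": "idontknow", "verify": "你好"}
query = urlencode(params)  # UTF-8 by default; each non-ASCII byte becomes %XX
print("Login.action?" + query)
# Login.action?name=hyddd&password=idontknow&verify=%E4%BD%A0%E5%A5%BD

print(quote_plus("hello world"))           # hello+world  (a space becomes '+')
print("你好".encode("utf-8").hex(" "))     # e4 bd a0 e5 a5 bd  (the bytes behind each %XX)
print(unquote_plus("%E4%BD%A0%E5%A5%BD"))  # 你好
```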

GET submissions have a size limit (because browsers limit URL length), whereas the protocol places no limit on data submitted with POST.

On the server side (for example in ASP or ASP.NET), GET values are read with Request.QueryString and POST values with Request.Form.

Submitting data with GET raises security problems. On a login page, for example, the user name and password appear in the URL; if the page is cached or someone else can use the machine, the account and password can be recovered from the browser history.

5.3 Opening a Web page requires the browser to send multiple requests

1) When you enter the URL http://www.cnblogs.com in the browser, the browser sends a request for the HTML at http://www.cnblogs.com, and the server sends the response back to the browser.
2) The browser parses the HTML in the response and finds that it references many other files, such as images, CSS files, and JS files.
3) The browser automatically sends further requests to fetch those images, CSS files, and JS files.
4) Once all the files have been downloaded successfully, the Web page is displayed.
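Step 2 can be approximated with Python's built-in `html.parser`: scan the HTML for `img` and `script` sources and stylesheet `link`s, and resolve them against the page URL. The sample HTML and file paths are invented for illustration, and a real browser discovers many more kinds of subresources than this hypothetical `SubresourceFinder` class does.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class SubresourceFinder(HTMLParser):
    """Collect absolute URLs of images, scripts and stylesheets referenced by a page."""
    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and attrs.get("src"):
            self.urls.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "link" and attrs.get("rel") == "stylesheet" and attrs.get("href"):
            self.urls.append(urljoin(self.base_url, attrs["href"]))

page = """
<html><head>
  <link rel="stylesheet" href="/css/site.css">
  <script src="/js/app.js"></script>
</head><body>
  <img src="images/logo.png">
</body></html>
"""

finder = SubresourceFinder("http://www.cnblogs.com/")
finder.feed(page)
for url in finder.urls:
    print(url)  # each of these would trigger one more GET request
```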

6. The HTTP Response Message

6.1 Response message format

In general, after receiving and processing a request from the client, the server returns an HTTP response message with the following structure.

  

An HTTP response also consists of four parts: the status line, the message headers, a blank line, and the response body.

6.2 A response message example

Part 1: the status line, which consists of three parts: the HTTP protocol version, the status code, and the status message.

The first line is the status line: HTTP/1.1 means the HTTP version is 1.1, the status code is 200, and the status message is OK.

Part 2: the message headers, which describe additional information for the client to use.

Lines 2 and 3 are the message headers. Date: the date and time the response was generated. Content-Type: specifies the MIME type, here HTML (text/html), with UTF-8 encoding.

Part 3: a blank line. The blank line after the message headers is required.

Part 4: the response body, the text that the server returns to the client.

The HTML after the blank line is the response body.
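On the server side, the four parts of a response can be assembled much like a request. This hypothetical `build_response` function produces a status line, the `Date` and `Content-Type` headers from the example, a blank line, and an HTML body. It also adds a `Content-Length` header, which the example above does not show but which tells the client where the body ends.

```python
from email.utils import formatdate

def build_response(status: int, reason: str, html: str) -> bytes:
    """Sketch: assemble the four parts of an HTTP response message."""
    body = html.encode("utf-8")
    head = (
        f"HTTP/1.1 {status} {reason}\r\n"       # part 1: status line
        f"Date: {formatdate(usegmt=True)}\r\n"  # part 2: message headers
        "Content-Type: text/html; charset=UTF-8\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"                                  # part 3: blank line
    )
    return head.encode("ascii") + body          # part 4: response body

resp = build_response(200, "OK", "<html><body>Hello</body></html>")
print(resp.decode("utf-8"))
```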

6.3 Response status codes

A status code consists of three digits, and the first digit defines the category of the response. There are five categories:
1xx (Informational): the request has been received and processing continues
2xx (Success): the request was successfully received, understood, and accepted
3xx (Redirection): further action is needed to complete the request
4xx (Client error): the request has a syntax error or cannot be fulfilled
5xx (Server error): the server failed to fulfill an apparently valid request

The common status codes are:

200 OK: the client request succeeded
400 Bad Request: the request has a syntax error and cannot be understood by the server
401 Unauthorized: the request is not authorized; this status code must be used together with the WWW-Authenticate header field
403 Forbidden: the server received the request but refuses to provide the service
404 Not Found: the requested resource does not exist, e.g. a wrong URL was entered
500 Internal Server Error: an unexpected error occurred on the server
503 Service Unavailable: the server cannot currently handle the request and may return to normal after some time
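Because the first digit encodes the category, classifying a status code comes down to integer division by 100. The sketch below also looks up the standard reason phrase with Python's `http.HTTPStatus`; the `describe` helper and its category labels are assumptions made for this example.

```python
from http import HTTPStatus

CATEGORIES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}

def describe(code: int) -> str:
    """Classify a status code by its first digit and add its standard reason phrase."""
    category = CATEGORIES.get(code // 100, "Unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:  # a code the standard library does not know
        phrase = "(non-standard)"
    return f"{code} {phrase}: {category}"

for code in (200, 400, 401, 403, 404, 500, 503):
    print(describe(code))
```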

7. How HTTP Works

The previous sections covered HTTP in detail; now let's take an overall look at how it works.

HTTP defines how Web clients request Web pages from a Web server and how the server delivers Web pages to clients. HTTP uses a request/response model. The client sends a request message to the server,

which contains the request method, the URL, the protocol version, the request headers, and the request data. The server responds with a status line, which includes the protocol version and a success or error code, followed by server information, response headers, and the response data.

Steps of an HTTP request/response:

1) The client connects to the Web server

An HTTP client, usually a browser, establishes a TCP socket connection with the Web server's HTTP port (80 by default), for example http://www.oakcms.cn.

2) Send HTTP request

Over the TCP socket, the client sends a text request message to the Web server. It consists of four parts: the request line, the request headers, a blank line, and the request data.

3) The server accepts the request and returns the HTTP response

The Web server parses the request and locates the requested resource. It writes a copy of the resource to the TCP socket, where the client reads it. A response consists of four parts: the status line, the response headers, a blank line, and the response data.

4) Release the TCP connection

If the connection mode is close, the server actively closes the TCP connection and the client closes its side passively, releasing the connection. If the connection mode is keep-alive, the connection stays open for a period of time and can continue to receive requests.

5) The client browser parses the HTML content

The browser first parses the status line and checks the status code to see whether the request succeeded. It then parses each response header; among other things, the headers indicate how many bytes the HTML document has and which character set it uses.

The browser reads the response data (the HTML), formats it according to HTML syntax, and displays it in the browser window.
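Step 5 (reading the status line, then the headers, then decoding the body with the declared character set) might look like the sketch below. The response bytes are a made-up sample, falling back to ISO-8859-1 when no charset is declared is a simplification, and `parse_response` is a hypothetical helper, not a browser's actual algorithm.

```python
BODY = "<p>你好, HTTP</p>".encode("utf-8")
RAW_RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html; charset=UTF-8\r\n"
    + f"Content-Length: {len(BODY)}\r\n".encode("ascii")
    + b"\r\n"
    + BODY
)

def parse_response(raw: bytes):
    """Sketch: status line first, then headers, then decode the body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, reason = status_line.split(" ", 2)  # the reason phrase may contain spaces
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    # Content-Length says how many body bytes belong to this response.
    body = body[: int(headers.get("content-length", len(body)))]
    # The charset parameter of Content-Type says how to turn bytes into text.
    charset = "iso-8859-1"
    for param in headers.get("content-type", "").split(";")[1:]:
        key, _, val = param.strip().partition("=")
        if key.lower() == "charset":
            charset = val
    return int(code), reason, headers, body.decode(charset)

status, reason, headers, text = parse_response(RAW_RESPONSE)
print(status, reason)  # 200 OK
print(text)            # <p>你好, HTTP</p>
```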

For example, typing a URL into the browser's address bar and pressing Enter goes through the following process:

The browser asks a DNS server to resolve the domain name in the URL into an IP address;

Once it has the IP address, the browser establishes a TCP connection with the server at that address on the default port 80;

The browser sends an HTTP request to read the file that the URL refers to; the request is sent to the server as the data of the third message of the TCP three-way handshake;

The server responds to the browser's request and sends the corresponding HTML text to the browser;

The TCP connection is released;

The browser receives the HTML text and displays its content.
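The first two steps (DNS resolution, then connecting to port 80) can be sketched with `socket.getaddrinfo`, which asks the system resolver for a host's addresses. The `resolve` helper is hypothetical, and the example uses `localhost` so that it runs offline; with a real domain such as www.cnblogs.com the same call would perform an actual DNS query.

```python
import socket
from urllib.parse import urlsplit

def resolve(url: str) -> tuple[str, int]:
    """Return the (IP address, port) a client would connect to for an http:// URL."""
    parts = urlsplit(url)
    port = parts.port or 80  # HTTP's default port
    # getaddrinfo consults the system resolver (hosts file, then DNS).
    infos = socket.getaddrinfo(parts.hostname, port, type=socket.SOCK_STREAM)
    family, type_, proto, canonname, sockaddr = infos[0]
    return sockaddr[0], sockaddr[1]

ip, port = resolve("http://localhost/index.html")
print(ip, port)
```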


That's all for this post. It turned out to be quite long; if you found it useful, please click "Recommend"!
