1 HTTP Overview
1.1 HTTP Introduction
HTTP stands for Hypertext Transfer Protocol. It is the protocol used to transfer hypertext from a WWW server to a local browser. It is designed to make the browser more efficient and to reduce network traffic. Besides ensuring that hypertext documents are transmitted correctly and quickly, it also determines which part of a document is transmitted and which content is displayed first (for example, text before graphics).
HTTP is an application-layer protocol built on requests and responses, and it follows the standard client-server model.
1.2 HTTP location in the network model
TCP/IP architecture and the corresponding OSI layers
TCP/IP | OSI
Application layer | Application layer, Presentation layer, Session layer
Host-to-host layer (TCP) (also known as transport layer) | Transport layer
Network layer (IP) (also known as internet layer) | Network layer
Network interface layer (also known as link layer) | Data link layer, Physical layer
OSI layers, their functions, and the corresponding TCP/IP protocols
OSI layer | Function | TCP/IP protocol family
Application layer | File transfer, email, file service, virtual terminal | TFTP, HTTP, SNMP, FTP, SMTP, DNS, Telnet, etc.
Presentation layer | Translation, encryption, compression | No protocol
Session layer | Dialog control, setting up synchronization points | No protocol
Transport layer | Port addressing, segmentation and reassembly, flow control, error control | TCP, UDP
Network layer | Logical addressing, route selection | IP, ICMP, OSPF, EIGRP, IGMP
Data link layer | Framing, physical addressing, flow control, error control, access control | SLIP, CSLIP, PPP, MTU
Physical layer | Network topology, bit transfer, bit synchronization | ISO 2110, IEEE 802, IEEE 802.2
1.3 Version Introduction
The HTTP protocol has evolved through several versions, most of which are backwards compatible. The use of HTTP version numbers is described in RFC 2145. The client states the protocol version it uses at the beginning of the request, and the server uses the same or an earlier protocol version in its response. Each version is described below:
HTTP/0.9
Obsolete. Only one request method, GET, is accepted; no version number is specified in the communication, and request headers are not supported. Because this version does not support the POST method, clients cannot pass much information to the server.
HTTP/1.0
The first HTTP version that specifies the version number in the communication. It is still widely used, especially in proxy servers.
HTTP/1.1
The current version. Persistent connections are used by default, and the protocol works well with proxy servers. It also supports pipelining, which sends multiple requests without waiting for each response, to reduce line load and increase transfer speed.
Compared with HTTP/1.0, the improvements in HTTP/1.1 are mainly in:
• Cache handling
• Bandwidth optimization and use of network connections
• Management of error notifications
• Sending messages over the network
• Maintenance of Internet addresses
• Security and integrity
HTTP 0.9 and 1.0 use non-persistent connections: each TCP connection carries only one web object and is closed after each request-response exchange. HTTP 1.1 uses persistent connections, so a new connection does not have to be created for each web object and one connection can transfer multiple objects. This significantly reduces request latency, because the client does not have to repeat the TCP handshake after the first request has created the connection.
HTTP 1.1 also provides bandwidth optimizations. For example, it introduces chunked transfer encoding, which lets content be streamed over a persistent connection instead of being buffered and sent in full. HTTP pipelining lets the client further reduce latency by sending multiple requests before the response to the previous one has been received.
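To make chunked transfer encoding more concrete, here is a minimal Python sketch of the wire format: each chunk is its size in hexadecimal, a CRLF, the data, and another CRLF, and a zero-size chunk ends the body. The function names are illustrative, and the decoder assumes a well-formed body without trailers.

```python
def encode_chunked(parts):
    """Encode an iterable of byte strings as an HTTP/1.1 chunked body."""
    out = bytearray()
    for part in parts:
        if not part:
            continue  # a zero-length chunk would terminate the body early
        out += f"{len(part):X}\r\n".encode("ascii") + part + b"\r\n"
    out += b"0\r\n\r\n"  # last chunk (size 0) followed by an empty trailer section
    return bytes(out)


def decode_chunked(data):
    """Decode a well-formed chunked body; chunk extensions are ignored."""
    body = bytearray()
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        size = int(data[pos:line_end].split(b";")[0], 16)
        pos = line_end + 2
        if size == 0:
            break
        body += data[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
    return bytes(body)


encoded = encode_chunked([b"Hello, ", b"world"])
print(encoded)
print(decode_chunked(encoded))
```

Because every chunk announces its own length, the sender can start transmitting before it knows the total size of the content.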
2 HTTP Basic Concepts and Properties
2.1 Client and Server Introduction
HTTP is a standard for requests and responses between a client (the user side) and a server (the web side). Using a web browser, crawler, or other tool, the client initiates an HTTP request to a specified port on the server (port 80 by default). We call this client the user agent. Resources such as HTML files and images are stored on the responding server. We call this responding server the origin server.
HTTP communication is the process in which the client sends a request to the server and the server responds once it has received the request.
2.2 HTTP Request Method
2.2.1 HTTP Request Method overview
The HTTP/1.1 protocol defines eight methods that indicate the different ways of operating on the resource identified by the Request-URI:
1) OPTIONS: Returns the HTTP request methods that the server supports for a specific resource. You can also test the server's capabilities by sending a request for '*' to the web server.
2) HEAD: Asks the server for the same response as a GET request, except that the response body is not returned. This method lets you obtain the meta information contained in the response headers without transferring the entire response content.
3) GET: Makes a request for a specific resource. Note: the GET method should not be used for operations that produce "side effects", for example in web apps. One reason is that GET requests may be issued freely by web spiders and similar tools.
4) POST: Submits data to the specified resource for processing (such as submitting a form or uploading a file). The data is included in the request body. A POST request may result in the creation of new resources or the modification of existing ones.
5) PUT: Uploads the latest content to the specified resource location.
6) DELETE: Asks the server to delete the resource identified by the Request-URI.
7) TRACE: Echoes the request received by the server, primarily for testing or diagnostics.
8) CONNECT: Reserved in HTTP/1.1 for proxy servers that can switch the connection to a tunnel. Typically used for SSL-encrypted server links.
Method names are case-sensitive. When a request targets a resource that does not support the request method, the server should return status code 405 (Method Not Allowed); when the server does not recognize or support the request method at all, it should return status code 501 (Not Implemented).
An HTTP server should implement at least the GET and HEAD methods; the other methods are optional. Every method that is implemented should conform to the semantics described above. Beyond these methods, a particular HTTP server can also define custom extension methods.
Of the 8 methods mentioned above, GET and POST are the most commonly used.
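The 405 versus 501 rule above can be sketched as a small Python function. This is only an illustration under simple assumptions: the function name, the set of known methods, and the per-resource allow list are hypothetical, and a real server would have many more checks.

```python
# The eight methods listed above; names are case-sensitive.
KNOWN_METHODS = {"OPTIONS", "HEAD", "GET", "POST", "PUT", "DELETE", "TRACE", "CONNECT"}


def status_for(method, allowed_for_resource):
    """Pick a status code for a request method on one resource."""
    if method not in KNOWN_METHODS:
        return 501  # Not Implemented: the server does not recognize the method
    if method not in allowed_for_resource:
        return 405  # Method Not Allowed: known method, but not for this resource
    return 200


allowed = {"GET", "HEAD"}
print(status_for("GET", allowed))     # allowed on this resource
print(status_for("DELETE", allowed))  # known, but not allowed here
print(status_for("get", allowed))     # lowercase is a different, unknown name
```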
2.2.2 HTTP Common Request method
Unless the user specifies otherwise, the browser sends a GET request to the server by default. For example, entering an address directly in the browser or clicking a hyperlink both produce GET requests. If you want the request to be sent as POST, you can change the form's submission method.
2.2.2.1 GET
If the request method is GET, data can be sent to the server as part of the requested URL, with multiple items separated by &, for example:
GET /mail/1.html?name=abc&password=xyz HTTP/1.1
Characteristic of GET: because the data is appended to the URL, its size is limited by the URL length that browsers and servers accept (the limit traditionally quoted is about 1 KB).
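As a rough illustration of how GET parameters travel in the URL, the following Python sketch builds the request line from the example above and then reads the parameters back out, using only the standard urllib.parse module.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Build the query string from key-value pairs; pairs are joined with "&".
params = {"name": "abc", "password": "xyz"}
request_line = f"GET /mail/1.html?{urlencode(params)} HTTP/1.1"
print(request_line)

# On the receiving side, split the request target and decode the query string.
target = request_line.split(" ")[1]
query = parse_qs(urlsplit(target).query)
print(query)
```

Note that urlencode also percent-encodes characters that are not safe in a URL, which a hand-built string would miss.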
2.2.2.2 POST
If the request method is POST, data can be sent to the server in the entity body of the request, for example:
POST /servlet/paramsservlet HTTP/1.1
Host:
Content-Type: application/x-www-form-urlencoded
Content-Length: 21

name=abc&password=xyz
Characteristic of POST: the protocol places no limit on the amount of data transmitted, although servers commonly configure a maximum.
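The sketch below assembles a POST request like the one above in Python and computes Content-Length from the encoded body rather than hard-coding it. The helper name and the host example.com are placeholders chosen for illustration; the source leaves the Host value blank.

```python
from urllib.parse import urlencode


def build_post_request(path, host, fields):
    """Return the raw bytes of a form-encoded POST request."""
    body = urlencode(fields).encode("ascii")
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"  # blank line separates the headers from the body
    )
    return head.encode("ascii") + body


# "example.com" is a placeholder host, not taken from the original example.
raw = build_post_request(
    "/servlet/paramsservlet", "example.com", {"name": "abc", "password": "xyz"}
)
print(raw.decode("ascii"))
```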
2.3 HTTP Message Description
2.3.1 HTTP request Message
2.3.1.1 HTTP request message structure
The HTTP request message consists of 3 parts (Request line + request header + request body):
The following is an actual request message:
① is the request method. GET and POST are the most common HTTP methods; the others include DELETE, HEAD, OPTIONS, PUT, and TRACE. However, most current browsers support only GET and POST for form submission. Spring 3.0 provides a HiddenHttpMethodFilter that lets you specify these other HTTP methods through a "_method" form parameter (the form itself is still submitted via POST). Once HiddenHttpMethodFilter is configured on the server, Spring simulates the HTTP method named by the _method parameter, so handler methods can be mapped using these HTTP methods.
② is the URL path of the request; together with the Host header it forms the complete request URL. ③ is the protocol name and version number.
④ is the HTTP message header. It contains several attributes in the form "attribute name: attribute value", from which the server obtains information about the client.
⑤ is the message body. It encodes the values of the components in a page form into a formatted string of key-value pairs such as param1=value1&param2=value2, which carries the data of multiple request parameters. Request parameters can be passed not only in the message body but also in the request URL, in a form similar to "/chapter15/user.html?param1=value1&param2=value2".
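The three-part structure described above (request line, headers, body) can be sketched in Python as a small parser. This is a simplified illustration: the function name and the sample request are made up for the example, and repeated headers and other edge cases are not handled.

```python
def parse_request(raw):
    """Split a raw HTTP request into request line, headers, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")  # a blank line ends the headers
    lines = head.decode("iso-8859-1").split("\r\n")
    method, target, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers, body


# Hypothetical sample request; example.com is a placeholder host.
raw = (
    b"POST /chapter15/user.html HTTP/1.1\r\n"
    b"Host: example.com\r\n"
    b"Content-Type: application/x-www-form-urlencoded\r\n"
    b"Content-Length: 27\r\n"
    b"\r\n"
    b"param1=value1&param2=value2"
)
method, target, version, headers, body = parse_request(raw)
print(method, target, version)
print(headers)
print(body)
```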
2.3.1.2 HTTP request headers in detail
Common message headers in HTTP requests:
1) Accept: text/html,image/*
The client tells the server which content types it supports. Accept: */* means the client accepts everything.
2) Accept-Charset: ISO-8859-1
Tells the server which character encoding the client accepts.
3) Accept-Encoding: gzip,compress
Tells the server which compression formats the client supports.
4) Accept-Language: en-us,zh-cn
The language environment of the client.
5) Host: www.it315.org:80
The client uses this header to tell the server which host name it wants to access.
6) If-Modified-Since: Tue, Jul 18:23:51 GMT
The client uses this header to tell the server the time of its cached copy of the resource.
7) Referer: http://www.it315.org/index.jsp
Tells the server which resource the client came from when it made the request; commonly used for hotlink protection.
8) User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)
Tells the server about the client's software environment.
9) Cookie
The client uses this header to bring data to the server.
10) Connection: close/Keep-Alive
Whether to keep the connection open after the request is complete.
11) Date: Tue, Jul 18:23:51 GMT
The time at which the client accesses the server.
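As an illustration of how a server might use If-Modified-Since, the Python sketch below compares the header with the resource's last-modified time and answers 304 (Not Modified) when the cached copy is still current. The function name and the dates are illustrative values, not taken from a real exchange.

```python
from email.utils import parsedate_to_datetime


def conditional_status(last_modified, if_modified_since):
    """Return 304 if the client's cached copy is still current, else 200."""
    if if_modified_since is None:
        return 200  # no cached copy announced: send the full response
    try:
        cached = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: ignore the header
    current = parsedate_to_datetime(last_modified)
    return 304 if current <= cached else 200


# Illustrative dates only.
unchanged = conditional_status(
    "Tue, 11 Jul 2000 18:23:51 GMT", "Tue, 11 Jul 2000 18:23:51 GMT"
)
changed = conditional_status(
    "Wed, 12 Jul 2000 08:00:00 GMT", "Tue, 11 Jul 2000 18:23:51 GMT"
)
print(unchanged, changed)
```

With a 304 response the server sends no body, so the browser reuses its cached copy and the transfer is saved.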
2.3.2 HTTP Response Message
2.3.2.1 HTTP response message structure
An HTTP response message is also made up of three parts (response line + response headers + response body):
The following is an actual HTTP response message:
① is the message protocol and version;
② is the status code and status description;
③ is the response message header, which is also composed of several attributes;
④ is the response body, that is, the content we actually want.
2.3.2.2 HTTP response headers in detail
1) Location: http://www.nihao.org/index.jsp
Used together with the 302 status code to tell the client where to go next.
2) Server: apache tomcat
The server uses this header to tell the browser the type of the server.
3) Content-Encoding: gzip
The server uses this header to tell the browser the compression format of the data.
4) Content-Length: 80
The server uses this header to tell the browser the length of the returned data.
5) Content-Language: zh-cn
The language environment of the returned data.
6) Content-Type: text/html; charset=gb2312
The server uses this header to tell the browser the type of the returned data.
7) Last-Modified: Tue, Jul 18:23:51 GMT
The server uses this header to tell the browser when the resource was last modified, which the browser uses for caching.
8) Refresh: 1;url=http://www.it315.org
The server uses this header to tell the browser after how many seconds to refresh, and optionally which URL to load.
9) Content-Disposition: attachment; filename=aaa.zip
The server uses this header to tell the browser to open the data as a download.
10) Set-Cookie: ss=q0=5lb_nq; Path=/search
The server uses this header to send cookie data to the browser.
11) Transfer-Encoding
The server uses this header to tell the browser the transfer format of the data.
12) ETag
A cache-related header that allows cached content to be updated in real time.
13) Expires: -1
Controls how long the browser caches the data; -1 or 0 means do not cache. This is one of three header fields that can disable caching.
14) Cache-Control: no-cache
15) Pragma: no-cache
The server uses headers 14 and 15 to tell the browser not to cache the data.
16) Connection: close/Keep-Alive
Whether to keep the connection open or close it.
17) Date: Tue, Jul 2000 18:23:51 GMT
The time at which the server writes back the data.
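The three cache-disabling headers in the list above (Expires, Cache-Control, Pragma) can be checked with a few lines of Python. This is a deliberately simplified sketch: the function name is hypothetical, and real cache logic considers many more directives than the ones named in the text.

```python
def caching_disabled(headers):
    """Return True if any of the three no-cache headers from the text is set."""
    h = {name.lower(): value.strip().lower() for name, value in headers.items()}
    if h.get("expires") in ("-1", "0"):
        return True
    if "no-cache" in h.get("cache-control", ""):
        return True
    if "no-cache" in h.get("pragma", ""):
        return True
    return False


print(caching_disabled({"Expires": "-1"}))
print(caching_disabled({"Cache-Control": "no-cache"}))
print(caching_disabled({"Content-Type": "text/html"}))
```

Servers often send all three headers together, since older browsers and proxies did not all honor the same one.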
2.4 Server-side Response Status Code
2.4.1 HTTP response Status Code introduction
An HTTP status code is a 3-digit numeric code that represents the HTTP response status of a web server. If you browse the web often, you will have run into errors such as 503 and 404.
Some of the common status codes are:
1) 200: the server successfully returned the web page
2) 404: the requested page does not exist
3) 503: the service is not available
2.4.2 Common response Status Code introduction
• 1xx (Provisional response)
Status codes that represent a provisional response and require the requestor to continue the operation.
100 Continue: The requestor should continue with the request. The server returns this code to indicate that it has received the first part of the request and is waiting for the rest.
101 Switching Protocols: The requestor has asked the server to switch protocols, and the server has confirmed and is ready to switch.
• 2xx (Success)
Status codes indicating that the request was processed successfully.
200 OK: The server has successfully processed the request. Typically, this means that the server provided the requested web page.
201 Created: The request was successful and the server created a new resource.
202 Accepted: The server has accepted the request but has not yet processed it.
203 Non-Authoritative Information: The server has successfully processed the request, but the information returned may come from another source.
204 No Content: The server successfully processed the request but did not return any content.
205 Reset Content: The server successfully processed the request but did not return any content. Unlike 204, this response asks the requestor to reset the document view.
206 Partial Content: The server successfully processed a partial GET request.
• 3xx (Redirection)
Indicates that further action is required to complete the request. Typically, these status codes are used for redirection.
300 Multiple Choices: The server can perform several different operations for the request. The server can select one based on the requestor (user agent) or provide a list of options for the requestor to choose from.
301 Moved Permanently: The requested page has been permanently moved to a new location. When the server returns this response (to a GET or HEAD request), the requestor is automatically forwarded to the new location.
302 Found (temporarily moved): The server currently responds to the request with a page from a different location, but the requestor should continue to use the original location for future requests.
303 See Other: The server returns this code when the requestor should use a separate GET request to a different location to retrieve the response.
304 Not Modified: The requested page has not been modified since the last request. When the server returns this response, the page content is not returned.
305 Use Proxy: The requestor can access the requested page only through a proxy. When the server returns this response, it also indicates which proxy the requestor should use.
307 Temporary Redirect: The server currently responds to the request with a page from a different location, but the requestor should continue to use the original location for future requests.
• 4xx (Request error)
These status codes indicate that the request probably contains an error that prevents the server from processing it.
400 Bad Request: The server does not understand the syntax of the request.
401 Unauthorized: The request requires authentication. The server may return this response for pages that require login.
403 Forbidden: The server refuses the request.
404 Not Found: The server could not find the requested page.
405 Method Not Allowed: The method specified in the request is not allowed.
406 Not Acceptable: The requested page cannot be returned with the content characteristics requested.
407 Proxy Authentication Required: Similar to 401 (Unauthorized), but specifies that the requestor must authenticate with the proxy.
408 Request Timeout: The server timed out while waiting for the request.
409 Conflict: The server encountered a conflict while completing the request. The server must include information about the conflict in the response.
410 Gone: The server returns this response if the requested resource has been permanently deleted.
411 Length Required: The server does not accept requests without a valid Content-Length header field.
412 Precondition Failed: The server does not meet one of the preconditions that the requestor set in the request.
413 Request Entity Too Large: The server cannot process the request because the request entity is larger than the server is able to handle.
414 Request-URI Too Long: The URI of the request (usually the URL) is too long for the server to process.
415 Unsupported Media Type: The format of the request is not supported by the requested page.
416 Requested Range Not Satisfiable: The server returns this status code if the page cannot provide the requested range.
417 Expectation Failed: The server does not meet the requirements of the Expect request header field.
• 5xx (Server error)
These status codes indicate that the server encountered an internal error while trying to process the request. The error usually lies with the server itself, not with the request.
500 Internal Server Error: The server encountered an error and could not complete the request.
501 Not Implemented: The server does not have the capability to complete the request. For example, this code may be returned when the server does not recognize the request method.
502 Bad Gateway: The server, acting as a gateway or proxy, received an invalid response from the upstream server.
503 Service Unavailable: The server is currently unavailable (due to overload or maintenance downtime). Typically, this is only a temporary state.
504 Gateway Timeout: The server, acting as a gateway or proxy, did not receive a response from the upstream server in time.
505 HTTP Version Not Supported: The server does not support the HTTP protocol version used in the request.
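The five status code classes can be derived from the first digit of the code. The Python sketch below does this and looks up the standard reason phrase with http.HTTPStatus from the standard library; the describe function and the category labels are illustrative names following the headings above.

```python
from http import HTTPStatus

# Category labels follow the class headings used in this section.
CLASSES = {
    1: "provisional response",
    2: "success",
    3: "redirection",
    4: "request error",
    5: "server error",
}


def describe(code):
    """Return a one-line description such as '404 Not Found (request error)'."""
    category = CLASSES.get(code // 100, "unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        phrase = "Unknown"  # code is not registered in the standard library
    return f"{code} {phrase} ({category})"


for code in (200, 302, 404, 503):
    print(describe(code))
```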
3 HTTP Communication Flow
The HTTP client-to-server communication process is as follows:
• Automatic URL (Uniform Resource Locator) parsing
An HTTP URL contains enough information to find a resource and has the following basic format: http://host[":" port][abs_path]. Here http indicates that the network resource is located through the HTTP protocol; host is a valid Internet host domain name or IP address; port specifies a port number and defaults to 80; abs_path specifies the URI (Uniform Resource Identifier) of the requested resource. If abs_path is not given in the URL, it must be given as "/" when the URL is used as a request URI, which the browser usually does for us.
For example, if you enter www.baidu.com, the browser automatically converts it to http://www.baidu.com/
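The defaulting rules just described (scheme http, port 80, path "/") can be sketched with urllib.parse from the Python standard library. The function name is hypothetical, and a real browser applies many more rules than this.

```python
from urllib.parse import urlsplit


def normalize_http_url(text):
    """Return (host, port, path) with the HTTP defaults filled in."""
    if "://" not in text:
        text = "http://" + text  # assume HTTP when no scheme is typed
    parts = urlsplit(text)
    host = parts.hostname
    port = parts.port or 80   # default HTTP port
    path = parts.path or "/"  # an empty abs_path becomes "/"
    return host, port, path


print(normalize_http_url("www.baidu.com"))
print(normalize_http_url("http://www.it315.org:8080/index.jsp"))
```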
• Get the IP address and establish a TCP connection
After you enter "http://www.xxx.com/" in the browser's address bar and submit it, the browser first looks in the local DNS cache and, if an entry exists, uses that IP address directly. If not, it asks the gateway DNS server to look it up, and the IP address found is returned to the browser.
Once the IP address is obtained, the client initiates a TCP connection to the server with a three-way handshake, and sends the HTTP request to the server after the connection is established.
• The client sends an HTTP request to the server
Once the TCP connection is established, the client sends a request command to the server, followed by additional information in the form of headers. The client then sends a blank line to tell the server that it has finished sending the headers.
• The server answers and sends data to the client
After the client makes a request, the server sends a response back to the client, such as:
HTTP/1.1 200 OK
The first part of the response is the protocol version number and the response status code. Just as the client sends information about itself along with the request, the server sends information about itself and about the requested document along with the response.
After the server has sent the headers to the client, it sends a blank line to indicate the end of the headers, and then sends the actual data requested by the user in the format described by the Content-Type response header.
• The server closes the TCP connection
In general, once the server has sent the requested data to the client, it closes the TCP connection. However, if the client or server adds this line to its headers:
Connection: keep-alive
then the TCP connection remains open after the data is sent, so the client can continue to send requests over the same connection. Keeping the connection open saves the time needed to establish a new connection for each request and also saves network bandwidth.
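The rule for whether a connection stays open can be sketched as follows, combining this section with the version notes in 1.3: HTTP/1.1 connections are persistent unless one side sends Connection: close, while earlier versions close by default unless Connection: keep-alive is sent. The function name is hypothetical and the sketch ignores proxies.

```python
def connection_persists(version, headers):
    """Decide whether the TCP connection stays open after the response."""
    value = ""
    for name, header_value in headers.items():
        if name.lower() == "connection":
            value = header_value
    tokens = {t.strip().lower() for t in value.split(",") if t.strip()}
    if "close" in tokens:
        return False
    if version == "HTTP/1.1":
        return True  # persistent by default in HTTP/1.1
    return "keep-alive" in tokens  # earlier versions must opt in


print(connection_persists("HTTP/1.1", {}))
print(connection_persists("HTTP/1.1", {"Connection": "close"}))
print(connection_persists("HTTP/1.0", {"Connection": "keep-alive"}))
```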
4 Comparison of HTTP with other network Protocols
4.1 HTTP vs. TCP/UDP
TCP (Transmission Control Protocol) provides a connection-oriented, reliable byte-stream service. Before the client and the server can exchange data, a TCP connection must be established between the two parties. TCP provides timeout retransmission, discarding of duplicate data, data verification, flow control, and other functions to ensure that data is delivered from one end to the other. Ideally, once a TCP connection is established, it is maintained until either side actively closes it. Either the server or the client can initiate the request to close the TCP connection.
Packets sent over TCP carry sequence numbers, and the receiver acknowledges each packet it receives. If no acknowledgement arrives, the sender automatically retransmits after a timeout, so the biggest advantage of TCP is reliability. Web pages (HTTP), mail (SMTP), remote login (Telnet), and file transfer (FTP) generally use TCP.
UDP (User Datagram Protocol) is a simple, connectionless, datagram-oriented transport-layer protocol. UDP does not provide reliability: it simply hands the datagrams that the application passes to it over to the IP layer, without guaranteeing that they will reach their destination. Because UDP does not establish a connection between client and server before transmitting a datagram and has no mechanism such as timeout retransmission, transmission is fast.
UDP is a message-oriented protocol. Communication does not require a connection, so the transmission is inherently unreliable. UDP is generally used for multipoint communication and real-time data services such as voice broadcast, video, QQ, TFTP (Trivial File Transfer Protocol), SNMP (Simple Network Management Protocol), RTP (Real-time Transport Protocol), RIP (Routing Information Protocol), services that report stock market or aviation information, and DNS (domain name resolution). These applications value speed and smoothness.
HTTP is an application-layer protocol built on top of TCP. An HTTP connection works in a request-response manner: a connection must first be established, and the server can reply with data only after the client has sent a request. When the request is finished, the connection is actively released. The process from establishing a connection to closing it is called "one connection". Because HTTP, without persistent connections, releases the connection after each request, an HTTP connection is a "short connection", and the client program must keep sending requests to the server to maintain its online status. The usual practice is that, even when no data needs to be fetched right away, the client sends a "keep-alive" request to the server at regular intervals, and the server replies to indicate that it knows the client is "online". If the server receives no request from the client for a long time, it considers the client "offline"; if the client receives no reply from the server for a long time, it considers the network disconnected.
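The online/offline bookkeeping described above can be sketched on the server side as follows. This is only an illustration: the class name, the client identifier, and the 30-second timeout are assumptions, and timestamps are passed in explicitly so the example stays deterministic.

```python
class PresenceTracker:
    """Track the last keep-alive request seen from each client."""

    def __init__(self, timeout_seconds):
        self.timeout = timeout_seconds
        self.last_seen = {}

    def heartbeat(self, client_id, now):
        """Record that a keep-alive request arrived at time `now` (seconds)."""
        self.last_seen[client_id] = now

    def is_online(self, client_id, now):
        """A client is online if it was heard from within the timeout."""
        seen = self.last_seen.get(client_id)
        return seen is not None and now - seen <= self.timeout


tracker = PresenceTracker(timeout_seconds=30)  # 30 s is an arbitrary choice
tracker.heartbeat("client-a", now=100.0)
print(tracker.is_online("client-a", now=120.0))  # within the timeout
print(tracker.is_online("client-a", now=131.0))  # silent for too long
```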
4.2 HTTP vs. FTP
1. HTTP is used to browse websites, whereas FTP is used to access and transfer files. FTP transfers are typically used for bulk uploads and site maintenance, while HTTP transfers mostly deliver files such as movies, pictures, and music to end users.
2. Clients: the usual HTTP client is the browser, whereas FTP can be used from the command line or from a graphical client of the user's choice.
3. Headers: HTTP headers carry metadata such as the last-modified date, the encoding, and the server name and version. FTP has nothing equivalent.
4. FTP appeared roughly 10 years earlier than HTTP.
5. Data format: FTP can transmit data in ASCII or binary mode, whereas HTTP transmits only in binary.
6. Pipelining: HTTP supports pipelining, which means the client can issue the next request before the previous one has been processed. This removes some of the client-server round-trip latency when several resources are requested. FTP has no such support.
7. Dynamic ports: one of FTP's biggest problems is that it uses two connections. The first carries control commands, and a second TCP connection is opened whenever data is received or sent. HTTP, by contrast, uses a single connection for transfers in both directions.
8. Persistent connections: in an HTTP session, the client can keep a single connection open and use it for any number of transfers. FTP creates a new connection each time data is needed. Repeatedly creating connections hurts the experience, because each new connection requires a handshake between both sides, which takes time.
9. Compression: HTTP lets clients and servers negotiate a compression algorithm. Gzip is the most influential of these, and FTP has no comparably sophisticated mechanism.
10. Proxy support: a major feature of HTTP is proxy support, which is built into the protocol. FTP does not support it.
11. Where FTP stands out is that the protocol works directly at the file level. For example, FTP has a command for listing a directory on the remote server, which HTTP does not.
12. Speed: the result depends on the situation. For a single transfer of one static file, FTP is faster; when transferring many files, HTTP is faster.
Sources of FTP's speed advantage:
1) No metadata is added to the data sent; only the raw binary file is transmitted.
2) No chunked-encoding overhead.
Sources of HTTP's speed advantage:
1) Reuse of existing persistent connections, which gives better TCP performance.
2) Pipelining support makes it faster to request multiple files from the same server.
3) Automatic compression means less data is transmitted.
4) There is no command/response mechanism, which minimizes round-trip delay.
HTTP is suitable for uploading small files, and FTP is suitable for uploading large files. With a plain HTTP upload, the whole file typically has to be loaded into memory, whereas FTP has an inherent advantage in that it does not need to. If you know the file is relatively small, you can upload it over HTTP; if the file is large, it is best to use FTP. Even on a local area network, files larger than 500 MB are generally not suitable for HTTP upload.
If you want HTTP to support large files, there are several options:
a) Increase the server memory, because the uploaded files are cached in server memory. (unverified)
b) Split the HTTP upload into blocks, that is, upload the file piece by piece over HTTP and then merge all the blocks.
c) Develop an upload plugin using Flash, Silverlight, or ActiveX.
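Option b) above, block-wise upload, comes down to splitting the data into numbered blocks on the client and merging them in order on the server. The Python sketch below shows only that split and merge logic in memory; the function names and block size are illustrative, and the actual HTTP transfer of each block is left out.

```python
def split_blocks(data, block_size):
    """Cut the data into consecutive blocks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def merge_blocks(indexed_blocks):
    """Reassemble (index, block) pairs, which may arrive out of order."""
    return b"".join(block for _, block in sorted(indexed_blocks))


payload = bytes(range(10))
blocks = split_blocks(payload, 4)
print([len(b) for b in blocks])

# Simulate the blocks arriving in reverse order on the server.
arrived = list(reversed(list(enumerate(blocks))))
print(merge_blocks(arrived) == payload)
```

Sending the block index with each piece lets the server merge correctly even when uploads complete out of order or a single block has to be retried.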
4.3 HTTP vs. HTTPS
HTTPS (Secure Hypertext Transfer Protocol) is a secure communication channel: a protocol that builds on HTTP and uses SSL encryption to transmit information.
HTTPS was developed on the basis of HTTP and is used to exchange information between client computers and servers. It uses Secure Sockets Layer (SSL) for the exchange; put simply, it is the secure version of HTTP.
HTTPS was developed by Netscape and built into its browser to encrypt and decrypt the data sent and the results returned over the network. HTTPS applies the Netscape Secure Sockets Layer (SSL) as a sub-layer beneath the HTTP application layer. (HTTPS uses port 443, rather than port 80 as HTTP does, to communicate over TCP/IP.) Early SSL used a 40-bit key for the RC4 stream encryption algorithm, which at the time was considered adequate for encrypting business information. HTTPS and SSL support the use of digital certificates so that, if necessary, the user can confirm who the sender is.
The differences between HTTPS and HTTP can be summarized as follows:
1) HTTPS requires applying to a CA (certificate authority) for a certificate. Free certificates have generally been few, so a fee is usually required.
2) HTTP is the Hypertext Transfer Protocol and transmits information in plaintext; HTTPS is a secure transfer protocol encrypted with SSL.
3) HTTP and HTTPS connect in completely different ways and use different ports: the former uses 80, the latter 443.
4) An HTTP connection is simple and stateless.
5) HTTPS is a network protocol built from SSL + HTTP that supports encrypted transmission and identity authentication, and it is more secure than HTTP.