Chapter 2, Section 2: The Web and HTTP
In this chapter we discuss five important applications: the Web, file transfer, e-mail, directory services, and peer-to-peer. This section covers the Web and its application-layer protocol, HTTP.
Outline
- Introduction to the Web
- HTTP Overview
- Non-persistent and persistent connections
  - HTTP request/response steps
  - Non-persistent connections
  - Serial TCP connections, parallel TCP connections, and response times for non-persistent connections
  - Persistent connections
- HTTP Request Protocol
  - Request message
  - GET request method
  - POST request method
  - Response format
  - Response status codes
- Cookies
- Web caching
## Introduction to the Web
- The Web (World Wide Web), invented by Tim Berners-Lee, is made up of web pages that are linked to one another.
- A web page consists of multiple objects, such as HTML files, JPEG images, video files, and dynamic scripts. Most web pages consist of a base HTML file that references the other objects.
- Objects are addressed by URL (Uniform Resource Locator).
- The format is scheme://host:port/path, for example http://www.somecompany.com/somePic/pic.png, where http is the protocol name (scheme), www.somecompany.com is the host name, and /somePic/pic.png is the path name. When the port is omitted, as here, the protocol's default port (80 for HTTP) is used.
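The URL components above can be pulled apart with Python's standard `urllib.parse.urlsplit`. The `split_url` helper and its table of default ports are illustrative assumptions for this sketch, not part of HTTP itself; the helper fills in the scheme's default port when the URL omits one.

```python
from urllib.parse import urlsplit

# Well-known default ports, used when the URL does not give one explicitly
DEFAULT_PORTS = {"http": 80, "https": 443}

def split_url(url):
    """Split a URL into (scheme, host, port, path)."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return parts.scheme, parts.hostname, port, parts.path

print(split_url("http://www.somecompany.com/somePic/pic.png"))
```

For the example URL this yields the scheme `http`, host `www.somecompany.com`, the default port 80, and the path `/somePic/pic.png`.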
## HTTP Overview
- HTTP (HyperText Transfer Protocol) is used to transfer hypertext from a World Wide Web server to a local browser.
- HTTP defines how Web clients request Web pages from Web servers and how servers transfer Web pages to clients.
- HTTP follows the client-server architecture. The browser acts as the HTTP client and sends requests, identified by URLs, to the HTTP server (the Web server). The Web server sends a response message back to the client based on each request it receives.
- HTTP runs over TCP/IP to transfer data (HTML files, image files, query results, etc.). Once the TCP connection is established, the browser and server processes access TCP through the socket interface.
- HTTP is an object-oriented application-layer protocol; its simplicity and speed make it well suited to distributed hypermedia information systems.
- HTTP is a stateless protocol: the server sends requested files to clients without storing any information about them.
"HTTP Features"
- Simple and fast: when a client requests a service from the server, it only needs to send the request method and path. Commonly used request methods are GET, HEAD, and POST, and each specifies a different type of interaction between client and server. Because HTTP is simple, HTTP server programs are small, and communication is fast.
- Flexible: HTTP can transfer any type of data object; the type being transferred is indicated by the Content-Type header.
- Connectionless (non-persistent connection): each connection carries only one request. After the server has processed the client's request and received the client's acknowledgment, the connection is closed. This saves transmission time.
- Stateless: the server sends the requested file to the client without storing any information about the client, so it has no memory of previous transactions. If later processing needs earlier information, that information must be retransmitted, which may increase the amount of data sent per connection. On the other hand, when the server does not need earlier information, it responds faster.
## Non-persistent and Persistent Connections
"Steps for HTTP request/Response"
HTTP uses a request/response model. The client sends the server a request message containing the request method, URL, protocol version, request headers, and request data. The server responds with a status line (protocol version and a success or error code), followed by server information, response headers, and the response data.
- The client connects to the Web server: an HTTP client process, usually a browser, establishes a TCP socket connection to the Web server's HTTP port (80 by default).
- Send the HTTP request message: through the TCP socket, the client sends a text request message to the Web server. A request message consists of four parts: a request line, request headers, a blank line, and the request data.
- The server accepts the request and returns an HTTP response: the Web server parses the request and locates the requested resource, then writes a copy of the resource to the TCP socket, where the client reads it. A response consists of four parts: a status line, response headers, a blank line, and the response data.
- Release the TCP connection: if the connection mode is close, the server actively closes the TCP connection and the client passively closes its end, releasing the connection. If the connection mode is keep-alive, the connection stays open for some time, during which further requests can be received.
- The client browser parses the HTML content: it first parses the status line to check whether the status code indicates success. It then parses each response header; the headers indicate, among other things, the length of the HTML document in bytes and its character set. Finally, the browser reads the HTML response data, formats it according to HTML syntax, and displays it in the browser window.
- When client/server interaction runs over TCP, an application that sends each request/response pair over a separate TCP connection uses non-persistent connections; an application that sends all request/response pairs over the same TCP connection uses persistent connections.
- HTTP/1.0 uses non-persistent connections; HTTP/1.1 uses persistent connections by default.
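The request/response steps above can be traced with a minimal sketch built only on Python's standard library. It is not how a browser is implemented: `HelloHandler` and `fetch` are hypothetical names, and a local `http.server` instance on 127.0.0.1 stands in for the Web server (on an OS-chosen port rather than 80). The request carries `Connection: close`, and Python's `BaseHTTPRequestHandler` answers with HTTP/1.0 by default, so the server releases the connection after one response, which is the non-persistent case.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in Web server that returns a tiny HTML page."""

    def do_GET(self):
        body = b"<html><body>hello</body></html>"
        self.send_response(200)  # status line: HTTP/1.0 200 OK
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()  # the blank line that ends the headers
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet

def fetch(host, port, path):
    # Step 1: open a TCP connection to the server's HTTP port
    with socket.create_connection((host, port)) as sock:
        # Step 2: request line, header lines, then a blank line (CRLF CRLF)
        request = (
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Connection: close\r\n"
            "\r\n"
        )
        sock.sendall(request.encode("ascii"))
        # Steps 3-4: read the response until the server closes the connection
        chunks = []
        while data := sock.recv(4096):
            chunks.append(data)
    raw = b"".join(chunks)
    # Step 5: the status line and headers come before the first blank line
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, status, _reason = status_line.split(" ", 2)
    headers = dict(line.split(": ", 1) for line in header_lines)
    return int(status), headers, body

server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: let the OS pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
status, headers, body = fetch("127.0.0.1", server.server_address[1], "/index.html")
print(status, headers["Content-Type"], body)
server.shutdown()
server.server_close()
```

Reading until `recv` returns an empty result only works because the server closes the connection; with a persistent connection the client would instead use `Content-Length` to know where the body ends.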
"Non-persistent connection"
- Example: with non-persistent connections, opening a web page that consists of one HTML file and 10 inline image objects requires 11 TCP connections to transfer the files from the server to the client.
- Assume the URL of the base HTML file is www.yesky.com/sompath/index.html. The steps are:
  1. The HTTP client establishes a TCP connection to the HTTP server on the host www.yesky.com.
  2. The HTTP client sends an HTTP request message containing the path /sompath/index.html.
  3. The HTTP server receives the request message, retrieves the object /sompath/index.html from memory or disk, encapsulates it in a response message, and sends the response.
  4. The HTTP server tells TCP to close the TCP connection (TCP does not actually terminate the connection until the client has received the response message).
  5. The HTTP client receives the response message, and the TCP connection terminates. The message indicates that the encapsulated object is an HTML file. The client extracts the file, parses it, and finds references to 10 JPEG objects.
  6. Steps 1-4 are repeated for each referenced JPEG object.
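Step 5 (finding the references) and the resulting connection count can be sketched with Python's `html.parser`. The base HTML below is a hypothetical stand-in for /sompath/index.html, `ImageRefCollector` and `connections_needed` are illustrative names, and the sketch only counts `<img src=...>` references; a real browser also follows stylesheets, scripts, and other object types.

```python
from html.parser import HTMLParser

class ImageRefCollector(HTMLParser):
    """Collect the src attribute of every <img> tag in a base HTML file."""

    def __init__(self):
        super().__init__()
        self.refs = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.refs.append(value)

def connections_needed(base_html):
    """Non-persistent HTTP: one TCP connection for the base file plus one per referenced object."""
    collector = ImageRefCollector()
    collector.feed(base_html)
    collector.close()
    return 1 + len(collector.refs), collector.refs

# Hypothetical base file with 10 inline JPEG images
page = "<html><body>" + "".join(
    f'<img src="/sompath/pic{i}.jpg">' for i in range(10)
) + "</body></html>"
count, refs = connections_needed(page)
print(count, refs[:2])
```

For this page the count is 11, matching the example: one connection for the HTML file and one for each of the 10 images.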
"Serial TCP connections, parallel TCP connections, and response times for non-persistent connections"
- The 10 JPEG objects can be fetched over serial TCP connections, one after another, or over parallel TCP connections, which shortens the response time.
- Requesting the base HTML file takes 2 RTTs (round-trip times) plus the time for the server to transmit the HTML file.
- RTT: the time it takes a small packet to travel from the client to the server and back to the client.
- Establishing a TCP connection between the browser and the Web server requires a "three-way handshake" (see the details of the TCP three-way handshake and connection release).
- The first two segments of the handshake take one RTT and set up the TCP connection; the client sends the HTTP request for the file together with the third segment, and the response arrives about one RTT later.
- Disadvantages of non-persistent connections:
  - For each connection, TCP buffers must be allocated and TCP variables maintained on both the client and the server. This can place a significant burden on a Web server that may be serving requests from hundreds of different clients at once.
  - Each object incurs a delay of 2 RTTs: one to set up the TCP connection and one to request and receive the object.
  - Each object suffers TCP slow start, because every new TCP connection begins in the slow-start phase.
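Under the simplification used above (2 RTTs per object), the response times can be estimated with a few lines of arithmetic. This is a rough model that ignores transmission time and slow start; the function names, the 100 ms RTT, and the limit of 5 parallel connections are hypothetical values chosen for illustration.

```python
import math

def nonpersistent_serial(n_objects, rtt_ms):
    """Base HTML file plus n objects, one TCP connection at a time.

    Each object costs 2 RTTs: one for the TCP handshake, one for request/response.
    Transmission time and slow start are ignored.
    """
    return 2 * rtt_ms * (1 + n_objects)

def nonpersistent_parallel(n_objects, rtt_ms, max_parallel):
    """Base file first (2 RTTs), then objects in batches of up to max_parallel connections."""
    batches = math.ceil(n_objects / max_parallel)
    return 2 * rtt_ms + batches * 2 * rtt_ms

print(nonpersistent_serial(10, 100))
print(nonpersistent_parallel(10, 100, 5))
```

With these numbers, fetching the base file plus 10 images serially takes 22 RTTs (2200 ms), while 5 parallel connections cut that to 6 RTTs (600 ms).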
"Persistent Connection" (refer to HTTP protocol for persistent connection , non -persistent connection)
- Persistent connection: the server keeps the TCP connection open after sending the response. Subsequent request and response messages between the same client and server are sent over the same connection, so no new TCP connection has to be established.
- Without pipelining: the client sends a new request only after it has received the response to the previous one.
  - Each referenced object incurs 1 RTT of delay, compared with 2 RTTs per object for non-persistent connections.
- With pipelining: the HTTP client sends a request as soon as it encounters a reference, so requests for all referenced objects can be sent back to back. After receiving these requests, the server can likewise send the responses one after another.
  - The server spends less time idle waiting for requests, and all referenced objects together experience only about 1 RTT of delay.
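A simple delay model makes the difference between the two persistent variants concrete: the base file costs 2 RTTs (handshake plus request), then referenced objects cost 1 RTT each without pipelining, or about 1 RTT in total with pipelining. The model ignores transmission time and slow start, and the function names, 100 ms RTT, and 10 referenced objects are hypothetical.

```python
def persistent_no_pipelining(n_objects, rtt_ms):
    """2 RTTs for the base file (handshake + request), then 1 RTT per referenced object."""
    return 2 * rtt_ms + n_objects * rtt_ms

def persistent_pipelining(n_objects, rtt_ms):
    """2 RTTs for the base file, then all requests sent back to back: about 1 RTT in total."""
    return 2 * rtt_ms + (rtt_ms if n_objects > 0 else 0)

print(persistent_no_pipelining(10, 100))
print(persistent_pipelining(10, 100))
```

For a page with 10 referenced objects this gives 12 RTTs (1200 ms) without pipelining and about 3 RTTs (300 ms) with pipelining, versus 22 RTTs by the 2-RTT-per-object rule for serial non-persistent connections.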
## HTTP Request Protocol
HTTP has two types of messages: request messages and response messages.
"Request Message"
An HTTP request message consists of four parts: a request line, request headers, a blank line, and the request data.
- The first line of an HTTP request message is the request line. It consists of three fields separated by spaces: the method field, the URL field, and the HTTP version field.
- HTTP request methods include GET, POST, HEAD, PUT, DELETE, OPTIONS, TRACE, and CONNECT.
- The lines that follow are the request header lines:
- Request headers consist of keyword/value pairs, one pair per line, with the keyword and value separated by a colon (:). Request headers give the server information about the client and its request. Typical request headers are:
  - User-Agent: the type of browser that generated the request.
  - Accept: the list of content types the client can handle.
  - Host: the host name being requested, which allows multiple domain names to share the same IP address (virtual hosting).
  - Connection: tells the server whether a persistent connection is wanted.
- Blank line: the last request header is followed by a blank line (a carriage return plus a line feed), telling the server that no more request headers follow.
- Carriage return: CR; line feed: LF; space: SP.
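The layout described above (request line with SP-separated fields, header lines, a CRLF blank line, then the request data) can be assembled as a plain string. `build_request` is a hypothetical helper for illustration, and the header values are example values, not ones a particular browser sends.

```python
def build_request(method, path, headers, body=""):
    """Assemble request line, header lines, blank line, and request data."""
    CRLF = "\r\n"  # carriage return (CR) + line feed (LF)
    request_line = f"{method} {path} HTTP/1.1"  # three fields separated by SP
    header_lines = [f"{name}: {value}" for name, value in headers.items()]
    # Every line ends in CRLF; one extra CRLF forms the blank line before the data
    return CRLF.join([request_line, *header_lines]) + CRLF + CRLF + body

msg = build_request("GET", "/sompath/index.html", {
    "Host": "www.yesky.com",
    "User-Agent": "Mozilla/5.0",
    "Accept": "text/html",
    "Connection": "close",
})
print(repr(msg))
```

A GET request has no request data, so the message ends right after the blank line.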
"Get Request Method"
GET is the most common request method. It is used when the client reads a document from the server, for example when you click a link on a web page or type a URL into the browser's address bar. GET asks the server to place the resource identified by the URL in the data portion of the response message and send it back to the client. With GET, request parameters and their values are appended to the URL: a question mark (?) marks the end of the URL path and the start of the parameters, and the length of the parameters is limited. For example, /index.jsp?id=100&op=bind. Because data passed by GET appears directly in the address, the result of a request can be shared with a friend as a link. (from: https://www.cnblogs.com/rainydayfmb/p/5319318.html) The following example shows the request headers sent when opening an image on this blog:
    GET /images/upup.gif HTTP/1.1
    Host: static.cnblogs.com
    Connection: Keep-Alive
    If-Modified-Since: Sun, … Dec … GMT
    User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36
    Accept: image/webp,image/apng,image/*,*/*;q=0.8
    Referer: https://www.cnblogs.com/bundles/blog-common.css?v=px31qvjoe47mnazi9jusfk-ajuzmnpxa9peternr1qw1
    Accept-Encoding: gzip, deflate, br
    Accept-Language: zh-CN,zh;q=0.9,en;q=0.8,zh-TW;q=0.7
- The part of the address after "?" is the request data sent with GET. It is clearly visible in the address bar, and the parameters are separated by "&". This is obviously not a good way to transmit private data. In addition, browsers limit URL length differently, and some recognize only about 1024 characters, so GET is not appropriate for transmitting large amounts of data.
- GET is the default HTTP request method.
- A GET request has no request body.
- The amount of data it can carry is limited.
- GET request data is exposed in the browser's address bar.
- Common actions that produce GET requests:
  - Entering a URL directly in the browser's address bar always produces a GET request.
  - Clicking a hyperlink on a page also produces a GET request.
  - Submitting a form produces a GET request by default, but the form can be set to use POST.
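The query-string convention described above can be reproduced with Python's standard `urllib.parse`; the parameters are the `/index.jsp?id=100&op=bind` example from the text. This sketch only shows how such a URL is formed and read back, not how any particular server handles it.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

params = {"id": 100, "op": "bind"}
url = "/index.jsp?" + urlencode(params)  # '?' ends the path, '&' separates name=value pairs
print(url)

# Anyone who sees the URL (server, proxy, browser history) can read the parameters back
query = urlsplit(url).query
print(parse_qs(query))
```

This is why GET data is visible in the address bar: the parameters are simply part of the URL string.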
- Request headers:
  1. Host: the domain name of the requested Web server.
  2. User-Agent: details of the browser the HTTP client is running. From this header, the Web server can determine the type of browser that sent the request.
  3. Accept: the content types the client can receive; their order indicates the client's order of preference.
  4. Accept-Language: the languages the client browser prefers for the returned information.
  5. Accept-Encoding: the content compression encodings the client browser supports for the server's response. It tells the server that it may compress the output before sending it, to save bandwidth.
  6. Accept-Charset: the character encodings the client browser can accept.
  7. If-Modified-Since: the time the cached copy of the page was last updated, which allows a cache to verify whether its object is still up to date.
  8. Content-Type: the content type of the data submitted in this request, usually needed only for POST. Two common values are:
     (1) "application/x-www-form-urlencoded": the default encoding for form data submitted to the server. It is inefficient for sending large amounts of text, text containing non-ASCII characters, or binary data.
     (2) "multipart/form-data": the encoding to use for file uploads; it can carry both text and binary data.
     In short, use "application/x-www-form-urlencoded" for ordinary form data and "multipart/form-data" when submitting files.
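To make the efficiency remark about "application/x-www-form-urlencoded" concrete: in this encoding every byte outside a small safe set (letters, digits, and a few punctuation marks) becomes a three-character %XX escape. The sketch uses Python's `urllib.parse.quote_plus`, which applies form-style encoding, on a hypothetical two-character Chinese form value.

```python
from urllib.parse import quote_plus

value = "张三"                           # hypothetical form value: 2 Chinese characters
utf8_len = len(value.encode("utf-8"))    # 6 bytes if sent raw (3 bytes per character)
encoded = quote_plus(value)              # each of those bytes becomes %XX
form_len = len(encoded)
print(encoded, utf8_len, form_len)
```

The encoded form is three times the size of the raw UTF-8 bytes, which is why "multipart/form-data", which can carry raw bytes, is preferred for file uploads and other binary data.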
"Post Request Method"
The POST method lets the client send more information to the server. POST places the request parameters in the request data (body) as name/value pairs, so it can carry large amounts of data: HTTP itself places no limit on the size of POST data, although servers may impose one, and the data does not appear in the URL. The following example shows the request headers sent when logging in to a mailbox:
    POST /contacts/[email protected] HTTP/1.1
    Host: mail.163.com
    Connection: keep-alive
    Content-Length: 36
    Origin: https://mail.163.com
    User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36
    Content-Type: application/x-www-form-urlencoded
    Accept: */*
    Referer: https://mail.163.com/js6/main.jsp?sid=rarahhmczfpasrsjryccjovqsypslzyl&df=mail163_letter
    Accept-Encoding: gzip, deflate, br
    Accept-Language: zh-CN,zh;q=0.9,en;q=0.8,zh-TW;q=0.7
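A POST request like the one above can be assembled by hand to show where the form data goes and how Content-Length is computed. The host, path, and field names below are hypothetical (they are not taken from the 163.com request), and `build_post` is an illustrative helper, not part of any library.

```python
from urllib.parse import urlencode

def build_post(host, path, form):
    """POST with an application/x-www-form-urlencoded body and a matching Content-Length."""
    body = urlencode(form).encode("ascii")  # name=value pairs joined by '&'
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"  # length of the body in bytes
        "\r\n"                              # blank line separating headers from data
    )
    return head.encode("ascii") + body

msg = build_post("example.com", "/login", {"user": "alice", "remember": "1"})
print(msg.decode("ascii"))
```

Unlike GET, the parameters travel after the blank line in the request data, so they do not appear in the request line or the address bar.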
"Response format"
In general, the server will return an HTTP response message after receiving and processing a request from the client.
The HTTP response is also made up of four parts: the status line, the message header, the blank line, and the response body.
    HTTP/1.1 200 OK
    Connection: close
    Date: Thu, … 1998 12:00:15 GMT
    Server: Apache/1.3.0 (Unix)
    Last-Modified: Mon, … Jun 1998 …
    Content-Length: 6821
    Content-Type: text/html

    (data data data ...)
- Part one: the status line, which consists of the HTTP version, the status code, and the status message.
  - In the example, the first line is the status line: HTTP/1.1 is the HTTP version, 200 is the status code, and OK is the status message.
- Part two: the message headers, which carry additional information for the client.
  - Date: the date and time the response was generated.
  - Content-Type: the MIME type of the body, here HTML (text/html).
- Part three: a blank line; a blank line after the message headers is required.
- Part four: the response body, the text the server returns to the client.
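The four parts of a response can be separated with a short parser. `parse_response` is a hypothetical helper written for this sketch; it assumes a well-formed response whose status line includes a reason phrase, and it lowercases header names because HTTP header names are case-insensitive. The sample mirrors the example above with a short placeholder body.

```python
def parse_response(raw):
    """Split a raw HTTP response into (version, status, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")            # part 3: the blank line
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, status, reason = status_line.split(" ", 2)    # part 1: status line
    headers = {}
    for line in header_lines:                              # part 2: message headers
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(status), reason, headers, body    # part 4: response body

raw = (b"HTTP/1.1 200 OK\r\n"
       b"Connection: close\r\n"
       b"Server: Apache/1.3.0 (Unix)\r\n"
       b"Content-Length: 6\r\n"
       b"Content-Type: text/html\r\n"
       b"\r\n"
       b"<html>")
version, status, reason, headers, body = parse_response(raw)
print(version, status, reason, headers["content-type"], body)
```

Splitting each header at the first colon only keeps values such as times (`12:00:15`) intact.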
"Response Status Code"
Status Code classification:
The status code consists of three digits; the first digit defines the category of the response. There are five categories:
- 1xx: Informational: the request has been received and processing continues.
- 2xx: Success: the request has been successfully received, understood, and accepted.
- 3xx: Redirection: further action must be taken to complete the request.
- 4xx: Client error: the request has a syntax error or cannot be fulfilled.
- 5xx: Server error: the server failed to fulfill an apparently valid request.
Common Status Codes:
- 200 OK: the client request succeeded.
- 400 Bad Request: the request has a syntax error and cannot be understood by the server.
- 401 Unauthorized: the request is not authorized; this status code must be used together with the WWW-Authenticate header field.
- 403 Forbidden: the server received the request but refuses to serve it.
- 404 Not Found: the requested resource does not exist, for example because a wrong URL was entered.
- 500 Internal Server Error: an unexpected error occurred on the server.
- 503 Service Unavailable: the server cannot currently handle the client's request; it may recover after some time.
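The first-digit rule can be expressed directly in code, and the reason phrases can be cross-checked against Python's standard `http.HTTPStatus` enumeration. `status_class` and `STATUS_CLASSES` are hypothetical names for this sketch.

```python
from http import HTTPStatus

# Category named by the first digit of the status code
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}

def status_class(code):
    """Return the category of a 3-digit HTTP status code."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return STATUS_CLASSES[code // 100]

for code in (200, 400, 401, 403, 404, 500, 503):
    print(f"{code} {HTTPStatus(code).phrase} ({status_class(code)})")
```

A client that does not recognize a specific code can still fall back on its category, which is why the first digit carries the most important information.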
More status codes: http://www.runoob.com/http/http-status-codes.html
"The difference between GET and POST requests" (ref.: 29884033)
## Cookies
- Because HTTP is stateless, the server cannot know anything about the client, which makes some applications hard to implement, such as online shopping, which needs to track the client's state. Cookies were introduced to solve this problem.
- A cookie is a mechanism for storing data in the remote browser in order to track and identify users: a cookie is a small piece of data stored on the client, and the browser (the client) exchanges cookies with the server through HTTP.
- Cookies allow a site to track users.
- Cookie technology has four components:
- A cookie header line in an HTTP response message
- A cookie header line in the HTTP request message
- A cookie file is kept in the client system and managed by the user's browser
- A back-end database located on a Web site
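The four components can be seen working together in a small simulation that uses Python's `http.cookies.SimpleCookie`. Everything else is hypothetical: the cookie name `user_id`, the starting ID 1678, the in-memory dictionary standing in for the site's back-end database, and the `server_handle` function; no network traffic is involved.

```python
import itertools
from http.cookies import SimpleCookie

backend_db = {}                    # component 4: the site's back-end database (in memory here)
_new_ids = itertools.count(1678)   # hypothetical ID generator

def server_handle(cookie_header=None):
    """Return (Set-Cookie header or None, this user's state)."""
    jar = SimpleCookie()
    if cookie_header:
        jar.load(cookie_header)    # component 2: Cookie header line in the request
    if "user_id" in jar and jar["user_id"].value in backend_db:
        return None, backend_db[jar["user_id"].value]
    uid = str(next(_new_ids))
    backend_db[uid] = {"cart": []}
    reply = SimpleCookie()
    reply["user_id"] = uid
    return reply.output(), backend_db[uid]  # component 1: Set-Cookie header line in the response

# First visit: no cookie yet, so the server assigns an ID
set_cookie, state = server_handle()
print(set_cookie)

# Component 3: the browser stores the cookie (its "cookie file")
cookie_file = SimpleCookie()
cookie_file.load(set_cookie.split(":", 1)[1].strip())
state["cart"].append("book")

# Later visit: the browser sends the cookie back and the server recognizes the user
cookie_header = "; ".join(f"{k}={m.value}" for k, m in cookie_file.items())
set_cookie_again, state_again = server_handle(cookie_header)
print(set_cookie_again, state_again)
```

The server itself stays stateless per request; the user's state lives in the back-end database, keyed by the ID that the browser presents in each request.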
## Web Cache
- A Web cache, also called a proxy server, is a network entity that satisfies HTTP requests on behalf of the origin Web server.
- A Web cache has its own disk storage and keeps copies of recently requested objects there. The user's browser can be configured so that all of the user's HTTP requests are first directed to the Web cache.
- A proxy server is both a server and a client: when it receives and responds to requests from a client, it acts as a server; when it sends requests to the origin server and receives responses, it acts as a client.
- Reasons for using Web caching:
- The web cache can greatly reduce the response time of client requests;
- A Web cache can substantially reduce the traffic on an institution's access link to the Internet, which reduces bandwidth costs.
- Through content distribution networks (CDNs), Web caches play an increasingly important role in the Internet.
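The hit/miss behavior that gives a Web cache its benefits can be sketched as a small class. This is a toy under stated assumptions: `WebCache` and `fake_origin` are hypothetical, objects never expire, and the freshness checks a real proxy performs (for example with If-Modified-Since) are omitted.

```python
class WebCache:
    """Toy proxy cache: a server to its clients, a client to the origin server."""

    def __init__(self, fetch_from_origin):
        self.fetch_from_origin = fetch_from_origin  # hypothetical origin-fetch callable
        self.store = {}  # URL -> cached copy of the object
        self.hits = 0
        self.misses = 0

    def get(self, url):
        if url in self.store:  # cache hit: answer locally, no traffic to the origin
            self.hits += 1
            return self.store[url]
        self.misses += 1       # cache miss: act as a client to the origin server
        obj = self.fetch_from_origin(url)
        self.store[url] = obj  # keep a copy for later requests
        return obj

origin_requests = []

def fake_origin(url):
    origin_requests.append(url)
    return f"<contents of {url}>"

cache = WebCache(fake_origin)
for url in ["/a.html", "/logo.png", "/a.html", "/a.html"]:
    cache.get(url)
print(cache.hits, cache.misses, origin_requests)
```

Only the two misses reach the origin server; the repeated requests for /a.html are answered from the cache, which is the source of both the shorter response time and the reduced access-link traffic.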