Chapter 12: HTTP Protocol

Source: Internet
Author: User

12.1 HTTP Protocol Introduction

HTTP (Hypertext Transfer Protocol) is one of the most widely used network protocols on the Internet, and all WWW documents must comply with it. HTTP was originally designed to provide a way to publish and receive HTML pages. In 1960 the American Ted Nelson conceived a way of processing text information with a computer, which he called hypertext. This idea became the foundation of the HTTP standard.

Hypertext is text that contains hyperlinks, which let the reader jump from one document to another.


HTTP is a stateless protocol:

The server cannot, by itself, tell that successive requests come from the same visitor. Cookies and sessions were introduced to solve this problem by tracking and saving the user's state across requests.
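As a rough illustration of how a cookie can carry a session identifier between otherwise unrelated requests, here is a minimal sketch using Python's standard http.cookies module. The cookie name session_id, the in-memory SESSIONS store, and the handle_request function are assumptions made for this example, not part of HTTP itself.

```python
from http.cookies import SimpleCookie
import secrets

# Hypothetical in-memory session store: session id -> per-user state.
SESSIONS = {}

def handle_request(cookie_header):
    """Return (set_cookie_header_or_None, session_dict) for one request."""
    cookie = SimpleCookie(cookie_header or "")
    morsel = cookie.get("session_id")
    if morsel is not None and morsel.value in SESSIONS:
        # Known visitor: the cookie links this request to earlier ones.
        return None, SESSIONS[morsel.value]
    # New visitor: create a session and ask the client to store its id.
    session_id = secrets.token_hex(16)
    SESSIONS[session_id] = {"visits": 0}
    reply = SimpleCookie()
    reply["session_id"] = session_id
    reply["session_id"]["path"] = "/"
    return reply.output(header="Set-Cookie:"), SESSIONS[session_id]

# The first request carries no cookie, so the server issues one.
set_cookie, session = handle_request(None)
session["visits"] += 1
# The client echoes the cookie back on its next request.
cookie_value = set_cookie.split(":", 1)[1].split(";", 1)[0].strip()
_, same_session = handle_request(cookie_value)
same_session["visits"] += 1
print(same_session["visits"])  # 2
```

The protocol stays stateless in this sketch: all of the state lives in the server-side store, and the cookie only tells the server which entry to look up.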


12.2 HTTP Technology Architecture

HTTP is a standard for requests and responses between a client and a server, normally carried over TCP. The client is the end user and the server is the Web site.

Using a Web browser, crawler, or other tool, the client initiates an HTTP request to a specified port on the server (the default port is 80).

This client is referred to as the user agent.

The responding server, which stores resources such as HTML files and images, is known as the origin server.


There may be multiple intermediaries, such as proxies, gateways, or tunnels, between the user agent and the origin server. Although TCP/IP is the most popular protocol suite on the Internet, HTTP does not require it or the layers it provides. In fact, HTTP can be implemented on top of any other Internet protocol, or on other networks. HTTP only assumes that the underlying protocol provides reliable transport, so any protocol that offers this guarantee can be used.


Typically, an HTTP client initiates a request by establishing a TCP connection to a specified port on the server. The HTTP server listens on that port for requests sent by clients. Once a request is received, the server sends back a status line, such as "HTTP/1.1 200 OK", followed by the response message, whose body may be the requested file, an error message, or some other information. HTTP uses TCP rather than UDP because opening a Web page involves transferring a lot of data, and TCP provides transmission control that delivers the data in order and corrects errors.
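The exchange described above can be sketched in a few lines of Python. The helper names build_request and parse_status_line and the host www.example.com are assumptions for illustration; the sketch only builds and parses the text and does not open a network connection.

```python
def build_request(host, path="/", method="GET"):
    """Serialize a minimal HTTP/1.1 request as bytes ready for a TCP socket."""
    lines = [
        f"{method} {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
        "",  # the blank line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

def parse_status_line(line):
    """Split a status line such as 'HTTP/1.1 200 OK' into its three parts."""
    parts = line.rstrip("\r\n").split(" ", 2)
    reason = parts[2] if len(parts) > 2 else ""
    return parts[0], int(parts[1]), reason

request = build_request("www.example.com")
print(request)
print(parse_status_line("HTTP/1.1 200 OK\r\n"))
```

To talk to a real server, the bytes returned by build_request could be written to a socket obtained from socket.create_connection((host, 80)), and the first line read back could be passed to parse_status_line.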


Resources requested over HTTP or HTTPS are identified by Uniform Resource Identifiers (URIs).


12.3 HTTP Protocol Features

HTTP is the transfer protocol used to deliver hypertext from a WWW server to a local browser. It makes the browser more efficient and reduces network traffic. It not only ensures that hypertext documents are transmitted correctly and quickly, but also determines which part of a document is transmitted and which part is displayed first (for example, text before graphics).


HTTP is the application-layer protocol used between a client browser or other program and a Web server. Hypertext information is stored on Web servers on the Internet, and the client retrieves the hypertext it wants to access over HTTP. HTTP defines commands and transfer information that can be used not only for Web access but also for communication between other Internet/intranet applications, which makes it possible to integrate hypermedia access to a variety of application resources.


The website address entered in the browser's address bar is called a URL (Uniform Resource Locator). Just as every household has a street address, every Web page has an Internet address. When you enter a URL in the address bar or click a hyperlink, the URL determines the address to browse to. The browser uses HTTP to fetch the page's code from the Web server and renders it as a Web page.


12.4 HTTP Protocol Versions

HTTP has evolved through many versions, most of which are backwards compatible. The use of HTTP version numbers is described in RFC 2145. The client states the protocol version it uses at the start of the request, and the server uses the same or an earlier version in its response.

The main versions of HTTP are:

HTTP/0.9: The original version, with very simple functionality. Only one request method, GET, is accepted; no version number is specified in the communication, and request headers are not supported. Because this version does not support the POST method, clients cannot pass much information to the server.

HTTP/1.0: The first version to specify the version number in the communication. It is still widely used, especially by proxy servers. MIME is supported.

HTTP/1.1: Adds caching features and persistent connections (enabled by default), works well with proxy servers, and supports pipelining, that is, sending multiple requests without waiting for each response, to reduce line load and improve transfer speed.

HTTP/2: Dramatically improves Web performance and reduces network latency; it is typically used over HTTPS.


Compared with HTTP/1.0, the improvements in HTTP/1.1 are mainly in:

a) Cache handling

b) Bandwidth optimization and use of network connections

c) Management of error notifications

d) Sending messages over the network

e) Maintenance of Internet addresses

f) Security and integrity


12.5 Terminology

HTML: HyperText Markup Language


URI: Uniform Resource Identifier. A URI is a globally scoped (including but not limited to the Internet) way of uniquely identifying a resource, either by its access path or by its name. "Uniform" here refers to the uniform format of the identifier.


URL: Uniform Resource Locator, a subset of URIs that describes the location of a resource on the Internet in a uniform format (protocol://host:port/path/to/file)

URL Basic Syntax:

<scheme>://<user>:<password>@

Params: parameters. For example, in http://www.idfsoft.com/bbs/index.html;gender=f, gender=f is a parameter

Query: the query string passed to the requested page, for example a page backed by a relational database. In http://www.idfsoft.com/bbs/item.php?username=tom&title=abc, the URL indicates that the entry to be queried has username=tom and title=abc

Frag: identifies a location within a larger page rather than the beginning of the page. Put plainly, it is an anchor
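The components named above can be pulled apart with urlparse from Python's standard urllib.parse module. The URL below is a made-up example that combines the pieces from this section (the user name, password, port, and fragment are assumptions for illustration).

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical URL combining the components discussed in this section.
url = ("http://tom:secret@www.idfsoft.com:8080"
       "/bbs/item.php;gender=f?username=tom&title=abc#reply")
parts = urlparse(url)

print(parts.scheme)    # http
print(parts.username)  # tom
print(parts.password)  # secret
print(parts.hostname)  # www.idfsoft.com
print(parts.port)      # 8080
print(parts.path)      # /bbs/item.php
print(parts.params)    # gender=f
print(parts.query)     # username=tom&title=abc
print(parts.fragment)  # reply

# The query string can be decoded further into name/value pairs.
print(parse_qs(parts.query))  # {'username': ['tom'], 'title': ['abc']}
```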


URN: Uniform Resource Name, also a subset of URIs


MIME: Multipurpose Internet Mail Extensions

MIME can encode non-text data as text before transmission; the receiver decodes it back to the original format in the reverse way and can then call the appropriate program to open the file
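One common way to turn binary data into text is Base64, one of the transfer encodings MIME defines. The sketch below, which uses only Python's standard library, shows the round trip; the sample bytes and the file name logo.png are assumptions for illustration.

```python
import base64
import mimetypes

# A few bytes of binary data that are not valid text (the PNG signature).
binary = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])

# Sender: re-encode the binary data as ASCII text.
encoded = base64.b64encode(binary).decode("ascii")
print(encoded)  # iVBORw0KGgo=

# Receiver: reverse the encoding to recover the original bytes.
decoded = base64.b64decode(encoded)
print(decoded == binary)  # True

# The MIME type tells the receiver which kind of program should open the data.
print(mimetypes.guess_type("logo.png")[0])  # image/png
```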


HTTP transaction: one request together with its response is called an HTTP transaction


Dynamic Web pages: contain both static content and dynamic content (dynamic content has to be executed)

What is stored on the server is not an HTML document but a script written in a programming language. The script takes parameters and runs once on the server; when it finishes it generates an HTML document, which is sent to the client


Web resources:

Static files: .jpg, .gif, .html, .txt, .js, .css, .mp3, .avi

Dynamic files: .php, .jsp


PV: Page View, the number of pages opened

UV: Unique Visitor, the number of distinct visitors, often counted by distinct IP address


12.6 HTTP Protocol Messages

HTTP uses a request/response model. The client sends the server a request that contains the request method, URL, and protocol version, followed by a MIME-like message containing request modifiers, client information, and possibly body content. The server responds with a status line that includes the protocol version and a success or error code, followed by a MIME-like message containing server information, entity metainformation, and possibly entity body content.


HTTP has two kinds of messages, request messages and response messages. Their syntax is as follows:

Request message syntax:

<method> <request-URL> <version>

Response message syntax:

<version> <status> <reason-phrase>

The first line of a message is often called the start line. The lines that follow it are header fields; each header field consists of a name and a value separated by a colon.

In addition, a response message usually carries a block of content called the body, that is, the content returned to the client.
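The message layout described above (start line, header fields, blank line, body) can be illustrated with a small parser. This is a simplified sketch: the function name parse_message is an assumption, and repeated header names simply overwrite each other here.

```python
def parse_message(raw):
    """Split a raw HTTP message into (start_line, headers, body)."""
    # The first blank line separates the header section from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalize them.
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body

raw_response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
)
start_line, headers, body = parse_message(raw_response)
print(start_line)                 # HTTP/1.1 200 OK
print(headers["content-length"])  # 5
print(body)                       # b'hello'
```

The same function works for request messages, since only the start line differs between the two kinds of message.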


Method: the request method, which indicates the action the client wants the server to perform on the resource. The common methods are:

GET: Gets a resource from the server

HEAD: Gets only the response headers for a document from the server, without the response body. HEAD is very efficient when we only need to check the state of a page

POST: Sends data to the server for processing. Typically the server provides a form, and the data the client fills in is placed in the entity body and submitted to the server

PUT: Stores the body of the request on the server. Put plainly, it uploads data

DELETE: Requests that the specified document be deleted from the server

TRACE: Traces the intermediate proxy servers that the request passes through on its way to the server

OPTIONS: Asks the server to return the request methods supported for the specified resource

Version: the HTTP protocol version, formatted as HTTP/<major>.<minor>

Status: the response status code, which indicates what happened while the request was processed. The common status codes are:

1xx: 100-101, informational

100: The server has received the initial part of the request and has not rejected it; the client should continue sending the rest of the request. The reason phrase is "Continue"

101: The server is switching protocols: it will switch to another protocol as the client requested. The reason phrase is "Switching Protocols"

2xx: 200-206, success

200: The request succeeded. All requested data is sent in the entity body of the response message. The reason phrase is "OK"

201: The request succeeded and a new resource was created. The reason phrase is "Created"

202: The request has been accepted for processing, but processing is not complete. The reason phrase is "Accepted"

203: The document was returned normally, but some of the response headers may be incorrect because a copy of the document is being used. The reason phrase is "Non-Authoritative Information"

204: No new document; the browser should continue to display the original document. The reason phrase is "No Content"

205: No new document, but the browser should reset what it displays. Used to force the browser to clear form input. The reason phrase is "Reset Content"

206: The client sent a GET request with a Range header, and the server fulfilled it. The reason phrase is "Partial Content"

3xx: 300-305, redirection

301: Permanent redirect. The reason phrase is "Moved Permanently"

The resource that the requested URL points to has been moved; the response message indicates its new location, and the client needs to request the resource from that location

302: Temporary redirect. In effect the server says that the resource is, for now, available at another location and should be fetched from there. The reason phrase is "Found"

Similar to 301, but the response message indicates that the resource is at a temporary new location

304: The client issued a conditional request, and the server found that the copy the client has cached has not changed, so the client should use the copy in its cache. The reason phrase is "Not Modified"

4xx: 400-415, client error

400: The request contains a syntax error and cannot be understood by the server. The reason phrase is "Bad Request"

401: An account and password must be supplied for authentication before the resource can be accessed. The reason phrase is "Unauthorized"

403: The request is forbidden. The reason phrase is "Forbidden"

404: The server cannot find the resource requested by the client. The reason phrase is "Not Found"

5xx: 500-505, server error

500: Internal server error. The reason phrase is "Internal Server Error"

502: The proxy server received an invalid response from the backend server. The reason phrase is "Bad Gateway"

503: The server is currently unable to process the client's request but may be able to after some time. The reason phrase is "Service Unavailable"

Reason-phrase: a short explanation of the status code: what succeeded or what failed, for example whether fetching or uploading a file succeeded or failed.
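Python's standard http.HTTPStatus enumeration carries the standard reason phrase for each code, which makes it easy to check the pairs listed above. The CLASSES table and the describe function in this sketch are assumptions for illustration.

```python
from http import HTTPStatus

# Names of the five classes, keyed by the first digit of the code.
CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}

def describe(code):
    """Return (class name, reason phrase) for a numeric status code."""
    return CLASSES[code // 100], HTTPStatus(code).phrase

for code in (100, 200, 206, 301, 304, 404, 503):
    print(code, *describe(code))
```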

Headers: attributes that describe a request or response

A request or response message may contain any number of headers;

Each header consists of a header name, followed by a colon, followed by optional whitespace, followed by a value


Format: Name: value


Classification of headers:

General headers: can be used in both request and response messages. Common ones are:

Date: The time the message was created

Connection: Connection status, such as keep-alive or close

Via: Shows the intermediate nodes through which the message passed

Cache-Control: Controls caching behavior

Request headers: can only be used in request messages. Common ones are:

Accept: Tells the server which media types the client can accept

Accept-Charset: Tells the server which character sets the client can accept

Accept-Encoding: Tells the server which content encodings the client can accept, such as gzip

Accept-Language: Tells the server which languages the client can accept

Client-IP: The client's IP address

Host: The name and port number of the server being requested

Referer: The URL of the resource from which the current request originated

User-Agent: The client agent


Conditional request headers:

Expect: The behavior the client expects from the server

If-Modified-Since: Asks whether the requested resource has been modified since the specified time

If-Unmodified-Since: Asks whether the requested resource has not been modified since the specified time

If-None-Match: Asks whether the ETag of the document in the local cache does not match the ETag of the document on the server

If-Match: Asks whether the ETag of the document in the local cache matches the ETag of the document on the server
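To make the conditional headers more concrete, the following simplified sketch shows how a server might choose between 200 and 304 for a GET request. The function name conditional_status and the sample ETag values are assumptions, and details such as weak ETag comparison are left out.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def conditional_status(request_headers, etag, last_modified):
    """Return 304 if the client's cached copy is still valid, else 200."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # The ETag comparison takes precedence over the date check.
        cached_tags = [tag.strip() for tag in if_none_match.split(",")]
        return 304 if etag in cached_tags or "*" in cached_tags else 200
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        since = parsedate_to_datetime(if_modified_since)
        return 304 if last_modified <= since else 200
    return 200

modified = datetime(2017, 5, 1, 12, 0, 0, tzinfo=timezone.utc)
stamp = format_datetime(modified, usegmt=True)  # HTTP date format

print(conditional_status({"If-None-Match": '"v1"'}, '"v1"', modified))      # 304
print(conditional_status({"If-Modified-Since": stamp}, '"v2"', modified))  # 304
print(conditional_status({}, '"v2"', modified))                            # 200
```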

Security request headers:

Authorization: Sends authentication information, such as an account and password, to the server

Cookie/Cookie2: The client sends a cookie to the server

Proxy request headers:

Proxy-Authorization: Authenticates to a proxy server

Response headers: can only be used in response messages

Informational:

Age: How long the response has existed since it was generated

Server: Name and version of the server software

Negotiation headers: used when a resource has multiple representations

Accept-Ranges: The range types the server can accept

Vary: The list of other headers the server looks at when selecting a representation

Security response headers:

Set-Cookie: Sets a cookie on the client

Set-Cookie2: Sets a Cookie2 on the client

WWW-Authenticate: The authentication challenge sent from the server to the client

Entity headers: describe the entity

Allow: Lists the request methods that can be used on this entity

Location: Tells the client where the entity is really located

Content-Encoding: The encoding applied to the content

Content-Language: The language of the content

Content-Length: The length of the body

Content-Location: Where the entity is really located

Content-Type: The media type of the body

Caching-related:

ETag: The entity tag

Expires: The time at which the entity expires

Last-Modified: The time of the last modification

Extension headers

Entity-body: the data attached to a request or returned in a response; may be empty


Sample request message:

GET / HTTP/1.1
Host: www.baidu.com
Connection: keep-alive

Sample response message:

HTTP/1.1 200 OK
X-Powered-By: PHP/5.2.17
Vary: Accept-Encoding,Cookie,User-Agent
Cache-Control: max-age=3, must-revalidate
Content-Encoding: gzip
Content-Length: 6931


12.7 HTTP Ecosystem

Common protocol viewing and analysis tools:

tcpdump

tshark

Wireshark


Common HTTP server programs:

httpd (Apache)

nginx

lighttpd

Application servers: can handle dynamic files

IIS

Tomcat, Jetty, JBoss, Resin

WebSphere, WebLogic, OC4J


Common HTTP stress testing tools:

ab:

Syntax: ab [options] URL

-n: Total number of requests

-c: Number of simulated concurrent requests

-k: Test in persistent connection (keep-alive) mode

webbench

http_load

JMeter

LoadRunner

tcpcopy


ulimit -n #: Adjusts the number of files the current user can have open simultaneously
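The limit that ulimit -n controls can also be inspected from inside a program. The sketch below uses Python's standard resource module, which is only available on Unix-like systems; it reads the limit and does not change it.

```python
import resource

# RLIMIT_NOFILE is the per-process limit on open file descriptors,
# the same value that `ulimit -n` reports in the shell.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft limit:", soft)
print("hard limit:", hard)
```

A server that expects many concurrent connections needs this limit to be high enough, because every open connection consumes one file descriptor.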


Web server resource path mapping methods:

DocRoot

Alias

Virtual host DocRoot

User home directory DocRoot
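As a rough sketch of the first two mapping methods, the function below translates a URL path into a file system path using one document root and one alias. The directories in DOCROOT and ALIASES and the function name map_path are assumptions for illustration, not the configuration of any particular server.

```python
import posixpath

# Hypothetical configuration: one document root plus one alias.
DOCROOT = "/var/www/html"
ALIASES = {"/images/": "/data/static/images/"}

def map_path(url_path):
    """Translate a URL path into a file system path."""
    # Normalize first so that ".." cannot climb above the root.
    clean = posixpath.normpath("/" + url_path.lstrip("/"))
    for prefix, target in ALIASES.items():
        if (clean + "/").startswith(prefix):
            return posixpath.join(target, clean[len(prefix):])
    return posixpath.join(DOCROOT, clean.lstrip("/"))

print(map_path("/bbs/index.html"))   # /var/www/html/bbs/index.html
print(map_path("/images/logo.png"))  # /data/static/images/logo.png
print(map_path("/../etc/passwd"))    # /var/www/html/etc/passwd
```

A virtual host or per-user document root would follow the same pattern, with the root directory chosen from the Host header or the user name before the path is joined.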


Concurrent access response models (Web I/O), assuming here that each process has only one thread:

Single-process I/O: starts one process to handle requests and handles only one at a time; multiple requests are answered serially

Multi-process I/O: starts multiple processes in parallel, each responding to one request

Multiplexed I/O: one process responds to multiple requests

Multithreaded model: one process spawns multiple threads, and each thread responds to one user request

Event-driven

Multiplexed multi-process I/O: starts multiple (m) processes, each responding to n requests
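The event-driven form of multiplexed I/O can be sketched with Python's standard selectors module: a single thread watches several connections and answers whichever one becomes readable. In this sketch, local socket pairs stand in for client connections, and the function name serve_once and the echo reply are assumptions for illustration.

```python
import selectors
import socket

def serve_once(server_socks):
    """Answer one message on every socket using a single thread."""
    sel = selectors.DefaultSelector()
    for sock in server_socks:
        sock.setblocking(False)
        sel.register(sock, selectors.EVENT_READ)
    pending = len(server_socks)
    while pending:
        # select() blocks until at least one socket is readable.
        events = sel.select(timeout=1.0)
        if not events:
            break  # nothing arrived in time; give up
        for key, _ in events:
            conn = key.fileobj
            data = conn.recv(1024)
            conn.sendall(b"echo:" + data)
            sel.unregister(conn)
            pending -= 1
    sel.close()

# Three connected socket pairs stand in for three client connections.
pairs = [socket.socketpair() for _ in range(3)]
for i, (client, _) in enumerate(pairs):
    client.sendall(f"request {i}".encode())

serve_once([server for _, server in pairs])

replies = [client.recv(1024) for client, _ in pairs]
print(replies)
for client, server in pairs:
    client.close()
    server.close()
```

The multiplexed multi-process model would run m copies of such a loop, one per process, each handling its own share of the connections.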


12.8 HTTPS

HTTPS is HTTP layered on top of SSL or TLS; an HTTPS server listens on port tcp/443


The simplified process of an SSL session is as follows:

(1) The client sends the encryption methods it supports and requests a certificate from the server

(2) The server sends its certificate and the selected encryption method to the client

(3) The client obtains the certificate and verifies it

If the client trusts the CA that issued the certificate, it:

a) Verifies the legitimacy of the certificate's source: decrypts the digital signature on the certificate with the CA's public key

b) Verifies the legitimacy of the certificate's contents: integrity check

c) Checks the validity period of the certificate

d) Checks whether the certificate has been revoked

e) Checks that the name of the certificate's owner matches the target host being accessed

(4) The client generates a temporary session key (a symmetric key), encrypts it with the server's public key, and sends it to the server, completing the key exchange

(5) The server uses this key to encrypt the resource requested by the user and sends the response to the client

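Note: an SSL session is established on the basis of the IP address, before any HTTP Host header is seen, so without the Server Name Indication (SNI) extension only one HTTPS virtual host can be served on a single IP address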
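On the client side, most of the checks in step (3) are performed by the TLS library rather than by application code. The sketch below uses Python's standard ssl module to show which checks a default client context enables. The function fetch_peer_certificate is an assumption for illustration and is only defined here, not called, because it needs network access. Revocation checking (step d) is not enabled by default in this context.

```python
import socket
import ssl

# The default client context loads the system's trusted CA certificates.
context = ssl.create_default_context()

# The certificate chain must validate against a trusted CA.
print(context.verify_mode == ssl.CERT_REQUIRED)  # True
# The certificate's name must match the host being accessed.
print(context.check_hostname)                    # True

def fetch_peer_certificate(host, port=443):
    """Connect, run the handshake, and return the server certificate."""
    with socket.create_connection((host, port)) as sock:
        # server_hostname is used for the name check and is sent as SNI.
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()
```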


Main operations of a Web server:

Establish the connection: accept or reject the client's connection request;

Receive the request: read the HTTP request message from the network;

Process the request: parse the request message and take the corresponding action;

Access the resource: access the resource named in the request message;

Build the response: generate an HTTP response message with the correct headers;

Send the response: send the generated response message to the client;

Log the transaction: once the HTTP transaction is complete, record it in the log file
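The middle steps of this list (process the request, access the resource, build the response, log the transaction) can be sketched as one function. The RESOURCES table, the logger name access, and the function name handle are assumptions for illustration; accepting the connection and sending the bytes are left to the caller.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("access")

# Hypothetical in-memory resources standing in for files under a docroot.
RESOURCES = {"/index.html": b"<h1>It works</h1>"}

def handle(raw_request):
    """Process one request and return the raw response bytes."""
    # Process the request: parse the start line.
    start_line = raw_request.split(b"\r\n", 1)[0].decode("ascii")
    method, path, version = start_line.split(" ")
    # Access the resource.
    body = RESOURCES.get(path)
    if method != "GET":
        status, body = "405 Method Not Allowed", b""
    elif body is None:
        status, body = "404 Not Found", b""
    else:
        status = "200 OK"
    # Build the response with the correct headers.
    head = (
        f"HTTP/1.1 {status}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Connection: close\r\n\r\n"
    )
    # Log the completed transaction.
    log.info('"%s" %s %d', start_line, status.split()[0], len(body))
    return head.encode("ascii") + body

print(handle(b"GET /index.html HTTP/1.1\r\nHost: localhost\r\n\r\n"))
print(handle(b"GET /missing HTTP/1.1\r\nHost: localhost\r\n\r\n"))
```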

This article is from the "Home" blog, please make sure to keep this source http://itchentao.blog.51cto.com/5168625/1931364
