The HTTP protocol that I understand

Source: Internet
Author: User

Objective

Most developers are familiar with HTTP. It comes up constantly at work, especially for mobile and front-end developers who fetch data from a server: almost every network request is built on HTTP, and the combination of RESTful APIs and JSON is particularly mainstream. But if you were asked to talk specifically about the history behind HTTP, its principles, the interaction process, how it differs from HTTPS, identity authentication, and Web attack and defense techniques, could you do it? I only have a smattering of knowledge myself. I often read articles on these topics, but usually only when a concept turns out to be unclear during a project, and I have never deliberately summarized them into knowledge points that I could explain in full. In other words, I have not really mastered this material. Writing it down is a good way to fix that, and this article is my practice.

Mind map

Before writing, I laid out the content of this article as a mind map. I find that expressing an article this way makes its logic particularly clear and makes each knowledge point easier to summarize.

HTTP history

Origin of development

In March 1989 the Internet still belonged to a small minority, and HTTP was born in that early period. Dr. Tim Berners-Lee of CERN (the European Organization for Nuclear Research) proposed a vision that would let researchers in distant places share knowledge. The basic idea was to use hypertext, which links multiple documents to one another, to build a mutually accessible WWW (World Wide Web). The protocol has since gone from HTTP 1.0 to HTTP 1.1 and now HTTP 2.0, although the mainstream version is still HTTP 1.1. HTTP is also the most widely used protocol on the Internet today, and all WWW documents must follow it. Its original design goal was to provide a way to publish and receive HTML pages.

TCP/IP

HTTP is built on the TCP/IP network model, where it belongs to the application layer. In the five-layer view of that model (physical, data link, network, transport, application), the application layer is the top layer, and the layer directly below it is the transport layer, where TCP lives.

Logically, the sender and the receiver communicate layer to layer as peers. On the sending side, each layer wraps the data from the layer above with its own header (encapsulation) and hands it down, until it is transmitted over the physical link. On the receiving side, each layer strips its header and hands the payload up, until the original message is restored and processed by the application. The reply travels back the same way, and the connection is full duplex. Before any data is exchanged, TCP establishes the connection with the three-way handshake, and when communication ends it tears the connection down with the four-way handshake.

TCP/IP is described here because understanding HTTP requires knowing how the communication actually happens and what environment it depends on. Seen vertically, every message passes through all five layers. Seen horizontally, the client and the server appear to talk only at the application layer, because the lower layers are encapsulated for convenience.

HTTP 1.1

Because HTTP 1.1 is currently the mainstream version, the analysis below is based on it.

HTTP request protocol in detail

A typical HTTP 1.1 request message can be divided into three parts: the request line, the headers, and the message body.

Request Line

The request line contains three items: the HTTP request method, the requested URL, and the HTTP protocol version. They are separated by spaces, and the line ends with a carriage return plus line feed (CRLF). The HTTP request methods are listed below; GET and POST are the most common.

    • OPTIONS
    • GET
    • HEAD
    • POST
    • PUT
    • DELETE
    • TRACE
    • CONNECT
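
As a rough illustration of the request line format described above, the following Python sketch assembles a raw HTTP 1.1 request and splits its first line back into the three items. The function names build_request and parse_request_line, the path /index.html, and the host example.com are made up for this example; a real client would normally leave this work to a library.

```python
CRLF = "\r\n"


def build_request(method, target, host, body=b""):
    """Assemble a raw HTTP/1.1 request: request line, headers, blank line, body."""
    lines = [f"{method} {target} HTTP/1.1", f"Host: {host}"]
    if body:
        lines.append(f"Content-Length: {len(body)}")
    head = CRLF.join(lines) + CRLF + CRLF   # an empty line ends the header section
    return head.encode("ascii") + body


def parse_request_line(raw):
    """Split the first line of a raw request into (method, target, version)."""
    first_line = raw.split(b"\r\n", 1)[0].decode("ascii")
    method, target, version = first_line.split(" ")
    return method, target, version


raw = build_request("GET", "/index.html", "example.com")
print(parse_request_line(raw))   # ('GET', '/index.html', 'HTTP/1.1')
```

The three items come back in the same order in which the request line carries them: method, URL, protocol version.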

Request Header

The headers fall into three groups: general header fields, request header fields, and entity header fields. General and entity header fields are defined the same way in response messages.

General header fields

Field name            Description
Cache-Control         Controls caching behavior
Connection            HTTP 1.1 uses persistent connections (keep-alive) by default; send "close" in this field to turn persistence off
Date                  The date and time at which the message was generated
Pragma                Implementation-specific directives, kept mainly for HTTP 1.0 compatibility (for example no-cache)
Trailer               Lists the header fields that appear in the trailer of a message sent with chunked transfer encoding
Transfer-Encoding     The encoding applied to the message body so that it can be transferred reliably (for example chunked)
Upgrade               The newer version or other protocol that the sender would like to "upgrade" to
Via                   The intermediaries (proxies, gateways) that the message has passed through
Warning               Additional information about the status or transformation of the message

Request header fields

Field name            Description
Accept                Media types that the client can accept
Accept-Charset        Character sets that the client can accept
Accept-Encoding       Content encodings that the client can accept
Authorization         Credentials that authorize the client to the server
Expect                Server behavior that the client requires for this request
From                  E-mail address of the user behind the client
Host                  Host name and port number of the server being requested
If-Match              The server returns an ETag in a response header, and the client later sends that value in If-Match. The server processes the request only if the ETag still matches; otherwise it does not process the request.
If-Modified-Since     When the client requests a resource file, it adds If-Modified-Since with the last modification time it knows for that file. The server compares this time with the last modification time it has stored. If the file has not changed, it returns status code 304, telling the client to use its cached copy. Otherwise it returns the resource content.
If-None-Match         The server returns an ETag in a response header, and the client later sends that value in If-None-Match. If the ETag still matches, the resource has not changed and the server returns status code 304, telling the client to use its cached copy. Otherwise it returns the resource content.
If-Range              Used together with the Range header field. The server returns an ETag in a response header, and the client later sends that value in If-Range. If the ETag still matches, the server returns status code 206 with the bytes specified by Range. If not, it returns status code 200 with the whole entity.
If-Unmodified-Since   When the client requests a resource file, it adds If-Unmodified-Since with the last modification time it knows for that file. The server compares this time with the last modification time it has stored. If the file has not changed, it returns the resource content. Otherwise it returns status code 412.
Max-Forwards          Used with the TRACE and OPTIONS methods to limit the number of proxies or gateways on the path to the server
Proxy-Authorization   Credentials that authorize the client to a proxy
Range                 Requests a byte range from the server. Offsets start at 0 and both ends are inclusive: Range: bytes=0-500 requests the 1st through 501st bytes; Range: bytes=100- requests everything from the 101st byte to the end of the file; Range: bytes=-500 requests the last 500 bytes. Several ranges can be given at once (Range: bytes=500-600,601-999). Not every server supports byte ranges: one that does returns status code 206, one that does not returns 200, so the client has to check the status code. The field is used for resumable downloads (requesting the content after a break point) and for multithreaded downloads, in which each thread fetches one part of the file and the threads together complete the whole file.
Referer               The source of the request: a search engine, a link on another page, and so on. Servers sometimes use it for hotlink protection, rejecting every request whose source is outside an allowed set.
TE                    Transfer encodings that the client can accept
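
To make the three Range forms concrete, here is a minimal Python sketch that turns a single-range header value into inclusive byte offsets for a file of known size. The function name resolve_range is invented for this example, and the sketch deliberately handles only one range per header and does not check that the range lies inside the file; a real server must handle both.

```python
def resolve_range(header, size):
    """Return (first, last) inclusive byte offsets for a single-range value.

    header is the value after "Range:", for example "bytes=0-500";
    size is the total length of the file in bytes.
    """
    unit, _, spec = header.partition("=")
    if unit.strip() != "bytes" or "," in spec:
        raise ValueError("only a single bytes range is handled in this sketch")
    start, _, end = spec.strip().partition("-")
    if start == "":                        # suffix form: bytes=-500
        length = int(end)
        return max(size - length, 0), size - 1
    first = int(start)
    last = int(end) if end else size - 1   # open-ended form: bytes=100-
    return first, min(last, size - 1)


print(resolve_range("bytes=0-500", 1000))   # (0, 500)
print(resolve_range("bytes=100-", 1000))    # (100, 999)
print(resolve_range("bytes=-500", 1000))    # (500, 999)
```

A multithreaded downloader could use the same arithmetic in reverse, handing each thread its own (first, last) pair.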

Entity header fields

Field name            Description
Allow                 Methods supported by the requested resource, for example PUT
Content-Encoding      Encoding applied to the entity content
Content-Language      Language of the entity content
Content-Length        Size of the entity body in bytes
Content-Location      An alternative location for the resource that corresponds to the entity
Content-MD5           MD5 digest of the entity content, used as an integrity check (the 128-bit MD5 value, Base64-encoded)
Content-Range         Position of a partial entity body within the full entity body
Content-Type          Media type of the entity
Expires               When the entity expires
Last-Modified         When the entity was last modified

HTTP response protocol in detail

An HTTP 1.1 response message can likewise be divided into three parts: the status line, the headers, and the message body.

Status line

The status line contains three items: the HTTP protocol version, the status code, and the reason phrase. They are separated by spaces, and the line ends with a carriage return plus line feed (CRLF).

The status code consists of three digits. The first digit defines the class of the response, and there are five classes:

Status code   Class           Description
1xx           Informational   The request was received and processing continues
2xx           Success         The request was successfully received and processed
3xx           Redirection     Further action is needed to complete the request
4xx           Client error    The request is malformed or cannot be processed
5xx           Server error    The server failed to complete a valid request

The status codes and their reason phrases are listed in detail below.

Status code   Reason phrase                     Description
100           Continue                          Continue
101           Switching Protocols               Switching protocols
200           OK                                Success
201           Created                           Created
202           Accepted                          Accepted
203           Non-Authoritative Information     Non-authoritative information
204           No Content                        No content
205           Reset Content                     Reset content
206           Partial Content                   Partial content
300           Multiple Choices                  Multiple choices
301           Moved Permanently                 Moved permanently
302           Found                             Found
303           See Other                         See other
304           Not Modified                      Not modified
305           Use Proxy                         Use a proxy
307           Temporary Redirect                Temporary redirect
400           Bad Request                       Bad request
401           Unauthorized                      Unauthorized
402           Payment Required                  Payment required
403           Forbidden                         Forbidden
404           Not Found                         Not found
405           Method Not Allowed                Method not allowed
406           Not Acceptable                    Not acceptable
407           Proxy Authentication Required     Proxy authentication required
408           Request Timeout                   Request timed out
409           Conflict                          Conflict
410           Gone                              No longer exists
411           Length Required                   Length required
412           Precondition Failed               Precondition failed
413           Request Entity Too Large          Request entity too large
414           Request-URI Too Long              Request URI too long
415           Unsupported Media Type            Unsupported media type
416           Requested Range Not Satisfiable   Requested range cannot be satisfied
417           Expectation Failed                Expectation failed
500           Internal Server Error             Internal server error
501           Not Implemented                   Not implemented by the server
502           Bad Gateway                       Bad gateway
503           Service Unavailable               Service unavailable
504           Gateway Timeout                   Gateway timed out
505           HTTP Version Not Supported        HTTP protocol version not supported
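
The rule that the first digit decides the class can be shown in a few lines of Python. This is only a sketch: the dictionary CLASSES and the function describe are names chosen for this example, while HTTPStatus comes from the Python standard library and supplies the reason phrases.

```python
from http import HTTPStatus

# Class names keyed by the first digit of the status code.
CLASSES = {1: "Informational", 2: "Success", 3: "Redirection",
           4: "Client error", 5: "Server error"}


def describe(code):
    """Return (class name, reason phrase) for a three-digit status code."""
    status_class = CLASSES[code // 100]
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:               # a code the standard library does not list
        phrase = "Unknown"
    return status_class, phrase


print(describe(404))   # ('Client error', 'Not Found')
print(describe(302))   # ('Redirection', 'Found')
```

A client that meets an unfamiliar code can still fall back on the class, which is why describe keeps working for codes that have no known reason phrase.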

Response header fields

Field name            Description
Accept-Ranges         Tells the client whether the server accepts range requests
Age                   Estimated time in seconds (non-negative) since the response was generated at the origin server
ETag                  Entity tag
Location              The URI to redirect to
Proxy-Authenticate    The authentication scheme and parameters that apply to the proxy
Retry-After           Tells the client to try again after the given time when the service is temporarily unavailable
Server                Information about the software the server uses to handle the request
Vary                  Tells downstream caches whether a cached response may be used or the request must go to the origin server
WWW-Authenticate      The authentication scheme that the client should use for the requested entity
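
The ETag response header works together with the If-None-Match request header. The Python sketch below models that exchange without any network code. Everything in it is an assumption made for illustration: the names make_etag and conditional_get, and the choice to derive the ETag from a SHA-256 digest of the body (servers are free to compute ETags in other ways).

```python
import hashlib


def make_etag(body):
    """Derive an ETag from the body bytes (one possible scheme among many)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def conditional_get(body, request_headers):
    """Return (status, headers, payload) for a GET on a resource with this body."""
    etag = make_etag(body)
    if request_headers.get("If-None-Match") == etag:
        return 304, {"ETag": etag}, b""    # the client's cached copy is still valid
    return 200, {"ETag": etag}, body


page = b"<h1>hello</h1>"
status, headers, _ = conditional_get(page, {})                 # first visit
status2, _, payload2 = conditional_get(page, {"If-None-Match": headers["ETag"]})
print(status, status2, len(payload2))   # 200 304 0
```

The second request costs only headers, which is the saving that the 304 status code exists to provide.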

Interaction process

The overall communication is a request/response process: the client sends a request and the server returns a response. HTTP 1.1 is stateless, so each request is handled on its own without memory of earlier ones, and although connections are persistent by default, the responses on one connection come back one at a time in request order. HTTP 2.0 addresses these problems: a single connection is multiplexed, so several requests can be sent in parallel and several files can be delivered and answered at once, and messages are carried as binary frames rather than text. In practice it is also deployed over encrypted connections, which gives it stronger security than typical HTTP 1.1 use. See the relevant documentation for the details.

URLs and URIs

The difference between the terms URL and URI is worth mentioning. A URI (Uniform Resource Identifier) identifies a network resource, and nothing more. A URL (Uniform Resource Locator) identifies a resource on the WWW and also gives the address at which it can be accessed.

HTTPS

Communication process

Specific steps:

Step 1: The client starts the SSL communication by sending a ClientHello message. The message contains the SSL version the client supports and a list of cipher suites (the encryption algorithms to use, the key lengths, and so on).

Step 2: If the server can do SSL communication, it answers with a ServerHello message. Like the client's message, it contains the SSL version and a cipher suite. The server's cipher suite is chosen from the list that the client sent.

Step 3: The server then sends a Certificate message, which contains its public key certificate.

Step 4: Finally the server sends a ServerHelloDone message to tell the client that the initial phase of the SSL handshake negotiation is over.

Step 5: After this first phase, the client responds with a ClientKeyExchange message. It contains a random secret called the pre-master secret, which is used for encrypting the communication. The message is encrypted with the public key from step 3.

Step 6: The client then sends a ChangeCipherSpec message, which tells the server that everything after this message will be encrypted with keys derived from the pre-master secret.

Step 7: The client sends a Finished message. It contains a check value computed over all handshake messages exchanged so far. Whether the handshake succeeds depends on whether the server can decrypt this message correctly.

Step 8: The server likewise sends a ChangeCipherSpec message.

Step 9: The server likewise sends a Finished message.

Step 10: Once the Finished messages have been exchanged, the SSL connection is established and all further communication is protected by SSL. Application-layer communication starts here: the client sends the HTTP request.

Step 11: Application-layer communication continues: the server sends the HTTP response.

Step 12: Finally the client disconnects by sending a close_notify message. Some details are omitted here; after this step a TCP FIN is sent to close the TCP connection.
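
Application code rarely sees these messages one by one, because a library runs the handshake. The Python sketch below shows roughly where the handshake sits in client code, using the standard ssl module. The function names make_client_context and tls_summary are invented here, and the library negotiates TLS, the successor to SSL, whose newer versions exchange a somewhat different set of messages than the steps listed above.

```python
import socket
import ssl


def make_client_context():
    """Client context that verifies the server certificate and host name."""
    return ssl.create_default_context()


def tls_summary(host, port=443, timeout=5.0):
    """Run the handshake against host and report what was negotiated."""
    context = make_client_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # wrap_socket performs the whole handshake before it returns.
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return {"version": tls.version(), "cipher": tls.cipher()[0]}


ctx = make_client_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)   # True True
# With network access, tls_summary("example.com") would report the negotiated
# protocol version and cipher suite; it is not called here.
```

The two True values show that the default context insists on a valid certificate whose name matches the host, which corresponds to the client checking the certificate it receives in step 3.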

Encryption algorithm

Common cryptographic algorithms fall into three categories: symmetric encryption algorithms, asymmetric encryption algorithms, and hash algorithms.

Symmetric encryption

A symmetric encryption algorithm uses the same key to encrypt and to decrypt. Its advantages are fast encryption and decryption and, with a long key, resistance to cracking. Its weakness is key management. If two users exchange data with symmetric encryption, they need at least 2 keys and must exchange them; with n users, the whole enterprise needs n × (n-1) keys, and generating and distributing them becomes a nightmare for the enterprise's IT department. The security of symmetric encryption also depends on how well the key is kept. It is unrealistic to expect everyone in an enterprise to keep a secret, and keys do leak: if an intruder obtains the key a user uses, the intruder can read every document encrypted with that key. And if a single key is shared across the whole enterprise, confidentiality of the enterprise's documents is out of the question.

Common symmetric encryption algorithms: DES, 3DES, DESX, Blowfish, IDEA, RC4, RC5, RC6, and AES.

Asymmetric encryption

An asymmetric encryption algorithm uses different keys to encrypt and to decrypt, and is also known as public/private key encryption. If two users want to exchange encrypted data, they exchange public keys; each encrypts with the other's public key, and the other decrypts with their own private key. An enterprise with n users needs to generate n key pairs and distribute n public keys. Because public keys can be made public, each user only has to look after their own private key, so key distribution becomes very simple. And because each private key is unique to its user, other users can use the sender's public key to verify that a message really came from that sender, and the sender cannot deny having sent it. The disadvantage of asymmetric encryption is that it is much slower than symmetric encryption, in some extreme cases as much as 1000 times slower.

Common asymmetric encryption algorithms: RSA, ECC (for mobile devices), Diffie-Hellman, ElGamal, DSA (for digital signatures).
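
The difference in key management is easy to see with a little arithmetic. The Python sketch below follows the counting used in the text, n × (n-1) keys for the symmetric case; if each pair of users shared a single key instead, that count would halve. The function names are chosen for this example only.

```python
def symmetric_keys(n):
    """Keys needed when every user holds a separate key for every other user."""
    return n * (n - 1)


def asymmetric_key_pairs(n):
    """Key pairs needed when every user has one public/private pair."""
    return n


for users in (2, 10, 1000):
    print(users, symmetric_keys(users), asymmetric_key_pairs(users))
# 2 2 2
# 10 90 10
# 1000 999000 1000
```

The symmetric count grows with the square of the number of users, while the asymmetric count grows linearly.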

Hash algorithm

What makes a hash algorithm special is that it is one-way: it produces a fixed-length hash value from the target data, but the data cannot be recovered from the hash value. Hash algorithms are therefore commonly used for irreversible password storage, integrity checks, and so on.

Common hash algorithms: MD2, MD4, MD5, HAVAL, SHA, SHA-1. HMAC (for example HMAC-MD5 and HMAC-SHA1) combines a hash algorithm with a secret key.
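
As a small illustration of an integrity check, the Python sketch below computes a plain hash and an HMAC with the standard hashlib and hmac modules. It uses SHA-256, which is not in the list above but ships with Python. The message, the key, and the function name verify are invented for this example.

```python
import hashlib
import hmac

message = b"amount=100&to=alice"
key = b"shared-secret"          # hypothetical key, for illustration only

digest = hashlib.sha256(message).hexdigest()               # plain hash, no key
tag = hmac.new(key, message, hashlib.sha256).hexdigest()   # keyed hash


def verify(key, message, tag):
    """Recompute the HMAC and compare it with the received tag."""
    expected = hmac.new(key, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)   # constant-time comparison


print(len(digest))                                # 64 hex characters = 256 bits
print(verify(key, message, tag))                  # True
print(verify(key, b"amount=900&to=alice", tag))   # False
```

Anyone can recompute the plain digest, so it only detects accidental changes; the HMAC tag can only be produced by someone who holds the key.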

Digital certificates and digital signatures

A digital certificate is a certificate issued by an authoritative CA; it cannot be forged and is used to verify the identity of the sender. A digital signature alone does not tell receiver B whose public key it is checking against, so a man-in-the-middle could substitute their own. To solve this, sender A applies to an authoritative CA for a digital certificate, which contains A's identifying information and A's public key. A then sends the body, the digital certificate, and the digital signature that A generated to B. Man-in-the-middle M can no longer tamper with the body and pass it on to B, because M does not have the CA's private key and so cannot produce a certificate at will. Of course, M could apply to the same CA for a certificate of their own and send the modified body with M's certificate and M's signature instead. But when B receives the data, B checks whether the information in the certificate matches the party it is communicating with, finds that the certificate identifies M rather than A, recognizes the risk of substitution, and can choose to break off the communication.

Why can't a certificate made by a CA be forged? The certificate also contains the CA's digital signature over the certificate. The receiver uses the CA's public key to decrypt that signature and the same digest algorithm to check that the certificate is legitimate. Producing a certificate requires the CA's private key, so a certificate issued by the CA cannot be forged (leakage of the CA's private key is outside the scope of this discussion).

Digital certificates are built on asymmetric encryption and digital signatures. Because they cannot be forged they are widely applied, and HTTPS uses them to guarantee that the public key the server sends during the handshake is trustworthy.

A digital signature is an application of an asymmetric encryption algorithm together with a digest algorithm. It ensures that data is not tampered with in transit and that the sender cannot be impersonated. The sender computes a digest of the body with a digest algorithm, encrypts the digest with the private key (the encrypted digest is the digital signature), and sends the body, the digital signature, and the public key to the receiver. The receiver first uses the public key to decrypt the digital signature, obtaining the sender's digest of the body, then computes the digest of the received body with the same digest algorithm and compares the two. If they match, the body has not been tampered with.
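
The sign-then-verify flow can be traced with a toy version of RSA written in plain Python. This is strictly a teaching sketch under loud assumptions: the key is built from two small, publicly known primes, there is no padding scheme, and the names digest_int, sign, and verify are invented here. Real signatures should come from a vetted cryptography library.

```python
import hashlib

# Toy parameters: two Mersenne primes. Real RSA keys are far larger and real
# signatures add a padding scheme; never use this for actual security.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q                               # modulus, part of both keys
E = 65537                               # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))       # private exponent (needs Python 3.8+)


def digest_int(body):
    """Digest of the body, reduced to an integer smaller than N."""
    return int.from_bytes(hashlib.sha256(body).digest(), "big") % N


def sign(body):
    """Sender: 'encrypt' the digest with the private key."""
    return pow(digest_int(body), D, N)


def verify(body, signature):
    """Receiver: 'decrypt' the signature with the public key, compare digests."""
    return pow(signature, E, N) == digest_int(body)


body = b"transfer 100 to Bob"
signature = sign(body)
print(verify(body, signature))                     # True
print(verify(b"transfer 900 to Bob", signature))   # False
```

The receiver needs only N and E, the public key, which is exactly the value that a digital certificate vouches for.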

Identity verification

The computer itself cannot determine who is sitting in front of the monitor, and it certainly cannot confirm who is at the other end of the network. So to find out who is accessing the server, the client on the other side has to say who it is. But even if the person accessing the server claims to be Xiaoming, whether that is true is unknown. To confirm that Xiaoming really is entitled to access the system, the server has to check "information that only the user knows" and "information that only the user has". The following kinds of authentication serve this purpose.

    • Basic authentication: Basic authentication is a very simple authentication method in HTTP. Because it is so simple it is not very secure, but it is still very common. When a client requests data from an HTTP server that requires authentication and has not yet been authenticated, the server returns a 401 status code asking the client for a user name and password. After the user enters them, the user name and password are Base64-encoded (an encoding, not encryption), attached to the request headers, and sent to the server again. The server decides from the credentials in the request header whether authentication succeeds and responds accordingly.
    • Digest authentication: Digest authentication is designed to address many of the flaws of Basic authentication. The user's password is still the key element of the whole authentication process.
    • SSL client authentication: With user ID and password authentication, anyone who supplies the correct values is treated as the user. If the user ID and password are stolen, a third party can impersonate the user. SSL client authentication avoids this. It authenticates the client through an HTTPS client certificate, so the server can confirm that the access comes from a registered client.
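
To see why Basic authentication offers so little protection on its own, the Python sketch below builds the Authorization header value and then decodes it again without any key. The function names and the sample credentials are invented for this example.

```python
import base64


def basic_auth_header(username, password):
    """Build the Authorization header value for Basic authentication."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def decode_basic_auth(value):
    """Recover (username, password) from the header value; no key is needed."""
    scheme, _, token = value.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic credential")
    decoded = base64.b64decode(token).decode("utf-8")
    username, _, password = decoded.partition(":")
    return username, password


header = basic_auth_header("xiaoming", "s3cret")
print(header)
print(decode_basic_auth(header))   # ('xiaoming', 's3cret')
```

Anyone who can read the request can run the second function, which is why Basic authentication is normally combined with HTTPS.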

Web attack and defense technology

The common Web attack techniques are as follows:

1. XSS (cross-site scripting): The attacker embeds a malicious script in a Web page, or changes HTML element attributes, to carry out the attack. The main cause is that the developer writes user-supplied variables straight into the HTML, where the browser interprets them as JavaScript. Parameters of a GET request are usually passed through the URL, so an attacker can put the script in the URL and use it to obtain information. Solution: filter special characters.
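
One common way to apply special character filtering is to escape user input before it is written into the page. The Python sketch below does this with html.escape from the standard library; the function name render_greeting and the page fragment are made up for the example.

```python
import html


def render_greeting(name):
    """Escape user input before placing it inside HTML."""
    return "<p>Hello, " + html.escape(name, quote=True) + "</p>"


print(render_greeting("<script>alert(1)</script>"))
# <p>Hello, &lt;script&gt;alert(1)&lt;/script&gt;</p>
```

The browser shows the escaped text as literal characters instead of running it as a script.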

2. SQL injection: The attacker inserts SQL into a Web form submission, or into the query string of a page request, and tricks the server into executing a malicious SQL command. For example, select * from Test where username = "Wuxu" or 1=1 lets the attacker log in directly without the password. Solutions:

    • Filter special characters, and do not build SQL statements by concatenating strings.
    • Precompile SQL statements, for example with Java's PreparedStatement.
    • Turn off detailed error messages. An attacker may be probing for information about the database, so hiding error details matters.
    • Encrypt the data on the client, so that the parameters are no longer passed in the form the attack relies on.
    • Restrict the permissions of the database account, for example select only and no insert, to stop an attacker from running something like select * from test; drop table test.
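
The advice about avoiding string concatenation and using precompiled statements can be tried out with Python's built-in sqlite3 module. The sketch below is an assumption-laden miniature: the users table, its single row, and the two login functions are invented for the example, and passwords are stored in plain text only to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('Wuxu', 'secret')")


def login_unsafe(username, password):
    """Vulnerable: user input is spliced into the SQL text."""
    sql = ("SELECT COUNT(*) FROM users WHERE username = '" + username +
           "' AND password = '" + password + "'")
    return conn.execute(sql).fetchone()[0] > 0


def login_safe(username, password):
    """Placeholders keep the input as data, never as SQL."""
    sql = "SELECT COUNT(*) FROM users WHERE username = ? AND password = ?"
    return conn.execute(sql, (username, password)).fetchone()[0] > 0


attack = "Wuxu' OR 1=1 --"
print(login_unsafe(attack, "anything"))   # True: the password check is bypassed
print(login_safe(attack, "anything"))     # False
```

In the unsafe version the injected text ends the string literal and comments out the password check, while the placeholder version treats the whole input as a user name that simply does not exist.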

3. OS command injection: Systems provide functions that execute commands, which is convenient for certain features. When such a function is used carelessly, with a variable passed in without regard to security, a malicious command gets executed and the attacker exploits it. The main cause is that the server builds the system command by string concatenation. For example, with a = "a.txt; rm -rf *", the call system("rm -rf {$a}") will cost the service dearly. Solutions:

    • Use system commands as little as possible in the program, and do not take the arguments of a command from outside.
    • Filter special characters in the arguments.
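
When a command really has to receive outside input, passing the arguments as a list instead of one concatenated string keeps a shell from ever interpreting them. The Python sketch below demonstrates this with a harmless child process that only prints its arguments; the function name show_arguments is invented, and the child script stands in for a real command.

```python
import subprocess
import sys


def show_arguments(filename):
    """Pass filename to a child process as one argument, with no shell involved."""
    # The child is a tiny Python script standing in for a real command.
    child = [sys.executable, "-c", "import sys; print(sys.argv[1:])", filename]
    result = subprocess.run(child, capture_output=True, text=True, check=True)
    return result.stdout.strip()


print(show_arguments("a.txt; rm -rf *"))   # ['a.txt; rm -rf *']
```

The semicolon and the second command arrive as part of a single file name, so nothing after the semicolon is executed.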

4. HTTP header injection: The attacker inserts line breaks into a value that the server places in a response header, which lets the attacker add header fields of their own.

5. Mail header injection: It lets a malicious attacker inject arbitrary mail header fields such as BCC, CC, and Subject, so that the attacker can send spam through the victim's mail server. The attack exploits flaws in how the mail system handles the data passed to it. Solutions: (1) filter user-submitted data with regular expressions, for example by searching the input string for \r or \n; (2) never trust user input; (3) use external components and libraries.
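
The regular expression check for \r and \n can be as small as the Python sketch below. The names CRLF_PATTERN and safe_header_value are invented for this example, and rejecting the value outright is only one possible policy; stripping the characters is another.

```python
import re

CRLF_PATTERN = re.compile(r"[\r\n]")


def safe_header_value(value):
    """Return value unchanged, or raise if it contains a line break."""
    if CRLF_PATTERN.search(value):
        raise ValueError("line break in header value")
    return value


print(safe_header_value("Weekly report"))          # Weekly report
try:
    safe_header_value("Hello\r\nBcc: victim@example.com")
except ValueError as error:
    print("rejected:", error)                      # rejected: line break in header value
```

Without the check, the text after the line break would be read as a separate Bcc header.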

6. Directory traversal: A security vulnerability in HTTP services that lets an attacker access restricted directories and execute commands outside the root directory of the Web server.
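
A typical defense is to resolve the requested path and refuse anything that ends up outside the Web root. The Python sketch below shows one way to do that with pathlib; the function name resolve_inside and the root /var/www are assumptions made for the example.

```python
from pathlib import Path


def resolve_inside(root, requested):
    """Return the resolved path, or None if it escapes root."""
    base = Path(root).resolve()
    target = (base / requested).resolve()   # collapses ".." and follows symlinks
    if target == base or base in target.parents:
        return target
    return None


print(resolve_inside("/var/www", "img/logo.png") is not None)   # True
print(resolve_inside("/var/www", "../../etc/passwd"))            # None
```

The check runs on the resolved path rather than on the raw string, so it does not depend on spotting every way of writing "..".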

7. Remote file inclusion: The principle is to inject a script or piece of code that the user controls and have the server execute it. For example, with PHP's include($filename), if $filename is passed in by the user, the user can supply a malicious script and harm the service. Solution: when using a file inclusion function, do not pass the file in dynamically; use a fixed file name. If the name must be dynamic, make sure the variable cannot be controlled by the user.

8. Session hijacking: The attacker obtains a user's session ID and uses it to act as the target account, which means the attacker is using the account's valid session. The first step is obtaining a legitimate session ID in order to pose as the legitimate user, so the session ID must not be leaked. Put simply, the session ID that uniquely identifies the user after login is stolen, and the attacker then operates as a logged-in user. Attackers mainly steal it by network sniffing or XSS. Against sniffing, encrypt the traffic with SSL, that is, use HTTPS, so that messages cannot be intercepted. Against XSS, apply the defense given in item 1, and also set HttpOnly: with the cookie's HttpOnly attribute set to true, client-side scripts cannot read the cookie, which effectively keeps XSS from stealing it. Token verification can be added as well. Finally, turn off transparent session IDs. A transparent session ID is one passed in the URL, used when the browser's HTTP requests do not carry the session ID in a cookie.
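
Setting HttpOnly is a matter of one cookie attribute. The Python sketch below builds a Set-Cookie header value with the standard http.cookies module. The cookie name SESSIONID and the function name session_cookie_header are assumptions for this example, and the Secure attribute is added so that the cookie is only sent over HTTPS.

```python
import secrets
from http.cookies import SimpleCookie


def session_cookie_header():
    """Build a Set-Cookie header value for a fresh, random session ID."""
    cookie = SimpleCookie()
    cookie["SESSIONID"] = secrets.token_urlsafe(32)   # hard-to-guess ID
    cookie["SESSIONID"]["httponly"] = True   # hidden from client-side scripts
    cookie["SESSIONID"]["secure"] = True     # sent over HTTPS only
    cookie["SESSIONID"]["path"] = "/"
    return cookie["SESSIONID"].OutputString()


print(session_cookie_header())
```

The printed value contains the HttpOnly and Secure attributes next to the random ID; the ID itself differs on every run.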

9. Session fixation: A kind of session hijacking. The difference is that the attacker first uses some means to set the target user's session ID to a value the attacker knows and then waits; when the user logs in carrying that session ID, the attacker uses the same ID to take over the session. Solution: have the server issue a new session ID after login so that it differs from the one used before login. The defenses against session hijacking also apply to session fixation.
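
Issuing a new session ID at login can be sketched in a few lines of Python. The in-memory dictionary sessions and the functions new_session and login are stand-ins invented for this example; a real application would use its framework's session store.

```python
import secrets

sessions = {}     # session ID -> session data (in-memory stand-in for a store)


def new_session():
    """Create an anonymous session, as a server does on a first visit."""
    sid = secrets.token_urlsafe(32)
    sessions[sid] = {"user": None}
    return sid


def login(old_sid, user):
    """Discard the pre-login session ID and bind the user to a fresh one."""
    data = sessions.pop(old_sid, {"user": None})
    data["user"] = user
    new_sid = secrets.token_urlsafe(32)
    sessions[new_sid] = data
    return new_sid


before = new_session()            # the ID an attacker may have planted
after = login(before, "xiaoming")
print(before != after, before in sessions)   # True False
```

Because the old ID is removed at login, an ID that the attacker fixed in advance no longer leads to the authenticated session.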

10. CSRF (cross-site request forgery): In essence, the attacker borrows your identity and sends a malicious request in your name.

Overall, writing this article has given me a deeper understanding of HTTP. The writing process also helped me associate and connect individual knowledge points: points accumulate into lines, lines form surfaces, and eventually a tree of knowledge grows.
