A simple understanding of HTTP basics
Before looking at the HTTP protocol, it helps to understand the TCP/IP reference model, which is divided into four layers: the application layer, the transport layer, the network layer, and the link layer (data link layer).
Application layer: Provides the services needed by different applications, such as the Web.
Transport layer: Provides end-to-end communication between application layer entities; with TCP, this ensures in-order delivery of packets and data integrity.
Network layer: Handles the packets flowing over the network; its protocols deal with the logical transmission of packets across the whole network.
Link layer: Handles the hardware side of the network connection and the data exchange over it.
The TCP/IP communication flow is as follows:
HTTP data is encapsulated at each layer:
Protocols and services closely related to HTTP: IP, TCP, DNS
IP is responsible for delivering packets. This relies on both the IP address and the MAC address: communication between IP hosts ultimately depends on MAC addresses, and the ARP protocol is used to resolve an IP address to a MAC address.
TCP provides a reliable byte stream service: it splits large chunks of data into small segments for easier transmission and confirms that they are delivered to the destination.
DNS is responsible for resolving domain names to IP addresses.
URI (Uniform Resource Identifier) and URL (Uniform Resource Locator)
URI: A string that identifies an Internet resource by name. Composition: host name (with port number) + relative path + identifier
URL: A concise representation of the location of a resource available on the Internet and of the method used to access it; it is the standard address of a resource on the Internet. Composition: protocol + host name (with port number) + relative path
Difference: A URI identifies a resource on the Internet, while a URL is a subset of URI that gives both the location of the resource and how to access it. Further reading: Differences between URLs and URIs
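To make the composition above concrete, here is a minimal sketch that splits a URL into its parts with Python's standard urllib.parse module. The example URL and the helper name describe_url are made up for illustration and are not from the original notes.

```python
from urllib.parse import urlsplit

def describe_url(url: str) -> dict:
    """Split a URL into the parts named in the text above."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,      # protocol / access method
        "host": parts.hostname,      # host name, None if there is none
        "port": parts.port,          # port number, None if omitted
        "path": parts.path,          # path on the server
        "query": parts.query,        # query string, without the '?'
        "fragment": parts.fragment,  # fragment identifier, without the '#'
    }

# Hypothetical URL used only for this example
info = describe_url("http://www.example.com:8080/docs/index.html?lang=en#intro")
print(info)
```

A URI that only names a resource, such as urn:isbn:0451450523, has a scheme but no host, which is one way to see that it is a URI without being a URL.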
Cookies
HTTP communicates between the client and the server through an exchange of requests and responses. It is a stateless protocol: it keeps no communication state between one request/response exchange and the next, so a request cannot be processed based on the previous one. Cookies were introduced to make it possible to keep state.
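As a rough sketch of how cookies carry state: the server sends a Set-Cookie header in its response, and the client sends the name=value pair back in a Cookie header on later requests. The example below uses Python's standard http.cookies module; the cookie name sid and its value are invented for illustration.

```python
from http.cookies import SimpleCookie

def cookie_header_from_set_cookie(set_cookie_value: str) -> str:
    """Turn a server's Set-Cookie value into the Cookie header value a client sends back."""
    jar = SimpleCookie()
    jar.load(set_cookie_value)  # parses name=value plus attributes such as Path
    # Only the name=value pairs go back to the server; attributes stay on the client.
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())

# Hypothetical response header value from a server
set_cookie = "sid=abc123; Path=/; HttpOnly"
print("Cookie:", cookie_header_from_set_cookie(set_cookie))
```

The server can then look up the state it stored under that identifier, even though each HTTP request on its own is stateless.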
Persistent connections
In the initial version of HTTP, the TCP connection was closed after every HTTP request. In the early days, when only small amounts of text were transferred, this was hardly noticeable. But as time went on, more and more content had to be transferred and the content kept getting bigger, so opening and closing a connection for every request greatly increased the communication overhead. Fortunately, HTTP/1.1 and some HTTP/1.0 implementations provide persistent connections: as long as neither side explicitly asks to disconnect, the TCP connection is kept open. While the TCP connection stays open, multiple HTTP requests can be made over it to transfer the required content.
HTTP/1.1 uses persistent connections by default. The HTTP header carries a Connection: keep-alive field, and its value can be viewed together with the rest of the HTTP request information in the Network panel of the browser's developer tools:
How to turn off persistent connections: set the Connection field to close in the response header.
Persistent connections also make HTTP pipelining possible: multiple requests can be sent one after another without waiting for each response in turn.
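The following is a small sketch, not a benchmark, of two requests sharing one TCP connection. It starts a throwaway HTTP/1.1 server on the loopback interface with Python's http.server and sends two GET requests through a single http.client.HTTPConnection. The handler, the paths /first and /second, and the function name are all assumptions made for this example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets http.server keep the connection open

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        # Content-Length tells the client where the body ends on a kept-open connection
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

def fetch_twice_on_one_connection() -> list:
    """Send two GET requests over a single TCP connection to a local test server."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
    results = []
    try:
        for path in ("/first", "/second"):
            conn.request("GET", path)
            resp = conn.getresponse()
            resp.read()  # drain the body before reusing the connection
            # The client's local port identifies the TCP connection in use
            results.append((resp.status, conn.sock.getsockname()[1]))
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
    return results
```

Calling fetch_twice_on_one_connection() returns one (status, local port) pair per request; the local port staying the same indicates that the second request reused the first request's connection.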
Content structure of the HTTP request
HTTP exchanges information in units called HTTP messages. The structure of an HTTP message is shown in the following figure:
Apart from the empty line (CR+LF) that separates them, a message is roughly divided into the message header and the message body. The header contains the request line (request method, URI, HTTP version) or the status line (HTTP version, response status code, reason phrase), the header fields (conditions and attributes of the request or response), and other headers (not defined by the HTTP specification).
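The structure described above can be sketched with a few lines of Python that split a raw message at the empty line. This is a simplified illustration, not a full parser (it ignores repeated header fields and chunked bodies), and the sample request is invented.

```python
def parse_http_message(raw: bytes):
    """Split a raw HTTP message into start line, header fields, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")  # the empty line separates header and body
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line = lines[0]  # request line or status line
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # field names are case-insensitive
    return start_line, headers, body

# Hypothetical request message
raw_request = (
    b"POST /form HTTP/1.1\r\n"
    b"Host: www.example.com\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
)
start_line, headers, body = parse_http_message(raw_request)
method, target, version = start_line.split(" ")
print(method, target, version, headers, body)
```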
Header fields
Header fields give the client and the server the information they need to process requests and responses. They fall into four types: request header fields (used in request messages), response header fields (used in response messages), general header fields (used in both requests and responses), and entity header fields (used for the entity part of a message).
HTTP/1.1 header field list
General header fields
Header field         Description
Cache-Control        Controls caching behavior
Connection           Hop-by-hop headers; connection management
Date                 Date and time the message was created
Pragma               Message directives
Trailer              List of the header fields at the end of the message
Transfer-Encoding    Transfer coding applied to the message body
Upgrade              Upgrade to another protocol
Via                  Information about proxy servers
Warning              Error notification
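Request header fields
Header field           Description
Accept                 Media types the user agent can handle
Accept-Charset         Preferred character sets
Accept-Encoding        Preferred content encodings
Accept-Language        Preferred (natural) languages
Authorization          Web authentication credentials
Expect                 Expects specific behavior from the server
From                   Email address of the user
Host                   Server that holds the requested resource
If-Match               Compares entity tags (ETag)
If-Modified-Since      Compares the modification time of the resource
If-None-Match          Compares entity tags (the opposite of If-Match)
If-Range               Sends a byte range request for the entity if the resource has not been updated
If-Unmodified-Since    Compares the modification time of the resource (the opposite of If-Modified-Since)
Max-Forwards           Maximum number of hops
Proxy-Authorization    Authentication information the proxy server requires from the client
Range                  Byte range request for the entity
Referer                Where the URI in the request was originally obtained
TE                     Priority of transfer encodings
User-Agent             Information about the HTTP client program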
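Response header fields
Header field          Description
Accept-Ranges         Whether byte range requests are accepted
Age                   Estimated time elapsed since the resource was created
ETag                  Matching information for the resource
Location              Redirects the client to the specified URI
Proxy-Authenticate    Authentication information from the proxy server to the client
Retry-After           When the client should send the request again
Server                Information about the installed HTTP server
Vary                  Cache management information for proxy servers
WWW-Authenticate      Authentication information from the server to the client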
Entity header fields
In addition, header fields defined in other RFCs, such as Cookie, Set-Cookie, and Content-Disposition, are also used frequently.
Transfer encoding
When HTTP transfers data, it can send the data as is, or encode it in transit to improve the transfer rate. Encoding the data in transit makes it possible to handle a large number of requests efficiently. Common content encodings are:
· gzip (GNU zip)
· compress (the standard UNIX compression)
· deflate (zlib)
· identity (no encoding)
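A minimal sketch of the idea, using Python's standard gzip module: the server compresses the body only when the client's Accept-Encoding header offers gzip, and otherwise sends it unchanged (identity). The function name encode_body is made up for this example, and the header parsing is simplified (it ignores q values).

```python
import gzip

def encode_body(body: bytes, accept_encoding: str):
    """Return (encoded_body, content_encoding) for a response body."""
    offered = [token.split(";")[0].strip().lower() for token in accept_encoding.split(",")]
    if "gzip" in offered:
        return gzip.compress(body), "gzip"
    return body, "identity"  # identity: no encoding applied

original = b"hello http " * 200
encoded, encoding = encode_body(original, "gzip, deflate")
print(encoding, len(original), "->", len(encoded))
```

The client reverses the step with gzip.decompress(encoded) after reading the Content-Encoding header of the response.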
Multipart object collections
HTTP adopts multipart object collections, which allow a single message body to contain multiple entities of different types. They are typically used when uploading files or images, and are specified through the Content-Type header field. Several common forms are as follows:
Text: Represents textual information in a standardized way; text messages may use various character sets and formats
Multipart: Joins multiple parts of a message body into a single message; the parts may contain different types of data
Application: Used to transfer application data or binary data
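Here is a rough sketch of what a multipart body looks like for a file upload. It builds a multipart/form-data body by hand so the boundary lines are visible; real clients normally let an HTTP library do this. The function name, field names, and file contents are hypothetical.

```python
import uuid

def build_multipart(fields: dict, file_field: str, filename: str,
                    file_bytes: bytes, file_type: str):
    """Build a multipart/form-data body; returns (content_type, body_bytes)."""
    boundary = uuid.uuid4().hex  # random string that should not occur in the data
    chunks = []
    for name, value in fields.items():
        part_head = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n'
            "\r\n"
        )
        chunks.append(part_head.encode("utf-8") + value.encode("utf-8") + b"\r\n")
    file_head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        f"Content-Type: {file_type}\r\n"  # each part carries its own type
        "\r\n"
    )
    chunks.append(file_head.encode("utf-8") + file_bytes + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing boundary
    return f"multipart/form-data; boundary={boundary}", b"".join(chunks)

content_type, body = build_multipart({"title": "cat"}, "photo", "cat.png", b"\x89PNG", "image/png")
print(content_type)
print(body)
```

The returned content_type string is what goes into the request's Content-Type header, so the receiver knows which boundary separates the parts.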
Range requests
A range request fetches only part of an entity, so the range to download must be specified. For example, to fetch bytes 300 through 3000 of a resource, set Range: bytes=300-3000. To fetch bytes 300 through 3000 and everything from byte 5000 to the end, set Range: bytes=300-3000,5000-
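As a sketch of how a server might interpret such a header, the function below resolves a Range value into inclusive byte offsets for a resource of a given size. The name parse_range and the 10000-byte size used in the usage example are assumptions for illustration; error handling is minimal.

```python
def parse_range(header_value: str, size: int):
    """Resolve a Range header value into inclusive (start, end) byte offsets."""
    unit, _, spec = header_value.partition("=")
    if unit.strip() != "bytes":
        raise ValueError("only the bytes unit is handled in this sketch")
    ranges = []
    for item in spec.split(","):
        first, _, last = item.strip().partition("-")
        if first == "":  # "-500": the last 500 bytes
            start, end = max(size - int(last), 0), size - 1
        else:
            start = int(first)
            end = int(last) if last else size - 1  # "5000-": through the end
            end = min(end, size - 1)  # clamp to the resource size
        if start <= end:  # drop ranges that start past the end
            ranges.append((start, end))
    return ranges

print(parse_range("bytes=300-3000,5000-", 10000))
```

A server that can satisfy the range answers with 206 Partial Content and only those bytes.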
Content negotiation
Content negotiation means that the client and the server negotiate over the content of the response so that the client gets the most suitable resource. The decision is based on criteria such as the language, character set, and encoding of the response resource. The following header fields are involved:
· Accept
· Accept-Charset
· Accept-Encoding
· Accept-Language
· Content-Language
There are three types of content negotiation:
Server-driven negotiation: The server uses the request header fields as a reference, makes the decision on its side, and returns the corresponding resource.
Agent-driven negotiation: The client makes the choice, either manually by the user from a list of options shown in the browser, or automatically by a JavaScript script on the Web page.
Transparent negotiation: A combination of server-driven and agent-driven negotiation. When a cache is supplied with the list of available representations of a response and fully understands the dimensions along which they differ, it can perform server-driven negotiation on behalf of the origin server for subsequent requests for that resource.
For more details, see: content negotiation
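Server-driven negotiation can be sketched as follows: the server reads the client's Accept-Language header, orders the languages by their q weight, and returns the first one it actually has. The function names and the list of available languages are hypothetical, and matching is simplified to exact tags (no prefix matching such as en-US to en).

```python
def parse_quality_list(header_value: str):
    """Parse 'zh-CN, en;q=0.8' into [(value, q), ...] sorted by q, highest first."""
    items = []
    for part in header_value.split(","):
        value, *params = [p.strip() for p in part.split(";")]
        q = 1.0  # default weight when q is omitted
        for param in params:
            if param.startswith("q="):
                q = float(param[2:])
        if value:
            items.append((value.lower(), q))
    # sorted() is stable, so equal weights keep their order from the header
    return sorted(items, key=lambda item: item[1], reverse=True)

def choose_language(accept_language: str, available: list, default: str) -> str:
    """Pick the first available language in the client's order of preference."""
    supported = {lang.lower(): lang for lang in available}
    for value, q in parse_quality_list(accept_language):
        if q > 0 and value in supported:
            return supported[value]
    return default

print(choose_language("fr;q=0.5, zh-CN, en;q=0.8", ["en", "zh-CN"], "en"))
```

The server would then typically report its choice in the Content-Language header of the response.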
HTTP methods and status codes
HTTP methods
HTTP also includes methods, which tell the requested resource what kind of action is expected of it. Of these, GET and POST are the most commonly used, and everyone is surely familiar with them ~
Methods supported by HTTP/1.1 and HTTP/1.0
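HTTP status codes
An HTTP status code indicates the result of the client's HTTP request. From the status code, the user can tell whether the request was processed normally and, if not, what kind of problem occurred. The status code classes are listed below:
Class    Meaning
1XX      Informational status codes
2XX      Success status codes
3XX      Redirection status codes
4XX      Client error status codes
5XX      Server error status codes
Some common status codes are:
200 OK                       The request was processed normally
204 No Content               The server received and successfully processed the request, but the response contains no entity body
206 Partial Content          The client made a range request and the server successfully executed the GET request for that range
301 Moved Permanently        Permanent redirect
302 Found                    Temporary redirect
303 See Other                The requested resource exists at another URI, and the GET method should be used to retrieve it
304 Not Modified             The client sent a conditional request; the server allows access to the resource, but the condition was not met
401 Unauthorized             The request requires authentication information that passes HTTP authentication
403 Forbidden                Access to the requested resource was refused by the server
404 Not Found                The server could not find the requested resource
500 Internal Server Error    An error occurred while the server was executing the request
503 Service Unavailable      The server is currently unable to handle requests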
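The first digit of a status code gives its class, which is easy to show in code. This sketch uses Python's standard http.HTTPStatus enum for the reason phrases; the CLASSES table and the function name are written for this example.

```python
from http import HTTPStatus

CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}

def describe_status(code: int) -> str:
    """Return 'code reason-phrase (class)' for a status code."""
    status_class = CLASSES.get(code // 100, "unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:  # code is not in the standard library's table
        phrase = "Unknown"
    return f"{code} {phrase} ({status_class})"

for code in (200, 304, 404, 503):
    print(describe_status(code))
```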
HTTP proxies and caches
Proxy
A proxy is an application with a forwarding function: it forwards the client's request to the server and forwards the server's response back to the client. The proxy does not change the request URI; it passes the request straight on to the server that holds the resource.
Several proxy servers can be chained during HTTP communication; each one appends its host information to the Via header field to mark the hops the message has passed through.
Cache
A cache is a copy of a resource stored on a proxy server or on the client's local disk. Caching reduces the number of requests to the origin server, which saves traffic and communication time and makes for a better interactive experience.
If the requested resource is already cached, the cache server returns it directly to the client, or the client reads it directly from local disk. A cached copy can be given a validity period; once it expires, the client or cache server requests the resource from the origin server again.
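The hit/expire behavior described above can be sketched with a toy in-memory cache. This is only an illustration under simple assumptions: the validity period is a number of seconds supplied by a hypothetical fetch_from_origin callable, and revalidation with conditional requests is left out.

```python
import time

class CacheEntry:
    def __init__(self, body: bytes, max_age: int, stored_at: float):
        self.body = body
        self.max_age = max_age  # validity period in seconds
        self.stored_at = stored_at

    def is_fresh(self, now: float) -> bool:
        return now - self.stored_at < self.max_age

class SimpleCache:
    """Toy cache: serve fresh copies locally, otherwise go back to the origin."""

    def __init__(self, fetch_from_origin):
        # fetch_from_origin(url) -> (body, max_age); a hypothetical callable
        self.fetch_from_origin = fetch_from_origin
        self.entries = {}

    def get(self, url: str, now=None) -> bytes:
        now = time.time() if now is None else now
        entry = self.entries.get(url)
        if entry is not None and entry.is_fresh(now):
            return entry.body  # cache hit: no request to the origin
        body, max_age = self.fetch_from_origin(url)  # miss or expired copy
        self.entries[url] = CacheEntry(body, max_age, now)
        return body
```

Passing now explicitly makes the expiry easy to try out: a second get within the validity period returns the stored copy, while a get after it triggers a new fetch.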
HTTP security upgrade: HTTPS
Having covered some of the strengths of HTTP, let's take a look at its weaknesses:
· Communication is in plaintext (unencrypted), so the content may be eavesdropped on
· The identity of the communicating party is not verified, so requests and responses may be spoofed
· The integrity of a message cannot be proven, so it may have been tampered with
Communication content can be eavesdropped on at any point on the Internet.
Because of the way TCP/IP works, communication content can be observed on any link along the route. Even encrypted communication can be observed; the difference is that once it is encrypted, an observer may be unable to decipher the meaning of the message, although the encrypted message itself is still visible.
In general, eavesdropping means collecting the packets flowing over the network, which can be done with packet capture and sniffing tools. This is how accounts get stolen on some public Wi-Fi networks.
To deal with plaintext transmission, the message body (the transmitted content) can be encrypted.
To verify identity, certificates can be installed locally, authentication information can be stored, and so on.
To ensure message integrity, hash value checks (MD5/SHA-1), digital signatures, and similar methods can be used.
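A hash value check can be sketched in a few lines with Python's hashlib. Note that this example swaps in SHA-256 rather than the MD5/SHA-1 named above, because collision attacks are known for both of those; the function names are made up for illustration.

```python
import hashlib
import hmac

def digest(data: bytes) -> str:
    """Hex digest that the sender publishes alongside the content."""
    return hashlib.sha256(data).hexdigest()

def verify(data: bytes, expected_hex: str) -> bool:
    """Check received data against the published digest."""
    # compare_digest avoids leaking, through timing, where the strings differ
    return hmac.compare_digest(digest(data), expected_hex)

published = digest(b"hello")
print(verify(b"hello", published), verify(b"hellp", published))
```

A check like this only helps if the digest itself reaches the receiver over a channel the attacker cannot alter, which is part of what digital signatures and HTTPS address.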
From HTTP to HTTPS
HTTP has no encryption mechanism of its own, but it can be combined with SSL (Secure Sockets Layer) or TLS (Transport Layer Security). SSL establishes a secure communication line, and HTTP communication can then happily take place over that line. HTTP combined with SSL is called HTTPS (or HTTP over SSL), but that alone is not yet complete HTTPS.
Complete HTTPS = HTTP + encryption + authentication + integrity protection
A complete HTTPS request
1. The client starts SSL communication by sending a Client Hello message, which contains the SSL version the client supports, the list of cipher suites, and so on.
2. If the server is able to communicate over SSL, it answers with a Server Hello message.
3. The server sends a Certificate message containing its public key certificate.
4. The server sends a Server Hello Done message to tell the client that the initial phase of the SSL handshake negotiation is over.
5. After the first part of the SSL handshake, the client responds with a Client Key Exchange message, which contains a random secret string used for encrypting the communication.
6. The client sends a Change Cipher Spec message, indicating that communication after this message will be encrypted with a key derived from the random secret string in the previous step.
7. The client sends a Finished message, which contains an overall checksum of all messages exchanged so far.
8. The server sends a Change Cipher Spec message.
9. The server sends a Finished message.
10. Once the Finished messages have been exchanged, the SSL connection is established.
11. Application layer protocol communication (HTTP) takes place.
12. The client disconnects by sending a close_notify message.
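In application code the handshake steps above are normally carried out by a TLS library rather than written by hand. As a small sketch, this is how a client-side configuration might look with Python's standard ssl module; the function name is made up, and the minimum version chosen here is an assumption for the example, not something the notes prescribe.

```python
import ssl

def make_client_context() -> ssl.SSLContext:
    """Client-side TLS settings: verify the server certificate and its host name."""
    context = ssl.create_default_context()  # loads the system's trusted CA certificates
    context.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse older SSL/TLS versions
    return context

context = make_client_context()
print(context.verify_mode, context.check_hostname, context.minimum_version)
```

The handshake itself runs when a connected socket is wrapped with context.wrap_socket(sock, server_hostname=...), at which point the certificate from step 3 is checked against the trusted CAs and the host name.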
WebSocket and HTTP/2.0
WebSocket
WebSocket provides full-duplex communication between the Web client and the server. Once the Web server and the client have established a connection using the WebSocket protocol, all subsequent communication uses this dedicated protocol.
WebSocket supports server push: the server can send data to the client directly without waiting for a request from the client. Because WebSocket keeps the connection open and its header information is small, the amount of traffic is reduced accordingly.
To set up WebSocket communication, the HTTP Upgrade header field mentioned above is used to tell the server that the communication protocol is changing. Once the handshake succeeds and the WebSocket connection is established, communication no longer uses HTTP data frames but WebSocket's own data frames.
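One detail of the handshake that is not covered in the notes above but is defined in RFC 6455: the client sends a random Sec-WebSocket-Key header with its Upgrade request, and the server proves it understood the request by returning a Sec-WebSocket-Accept value derived from that key. A minimal sketch of the derivation (the function name is made up; the sample key is the one used in the RFC):

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the WebSocket handshake
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def websocket_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for a client's Sec-WebSocket-Key."""
    sha1 = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(sha1).decode("ascii")

# Sample key from RFC 6455
print(websocket_accept("dGhlIHNhbXBsZSBub25jZQ=="))
```

The server places this value in its 101 Switching Protocols response together with Upgrade: websocket and Connection: Upgrade.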
HTTP/2.0
Core strengths/features
Multiplexing: Multiple requests are completed concurrently over a single TCP connection (with HTTP/1.1 pipelining, the responses to multiple requests block one another; HTTP/2.0 solves this and also supports prioritization and flow control)
Header compression: Packet headers are compressed to reduce the amount of traffic
Server push: The server can push resources to the client sooner
Semantic improvements: Data is transferred in a binary format
HTTP/2.0 references: English version, Chinese version
Web attack techniques
Active attacks that target the server: typical examples are SQL injection and OS command injection. In SQL injection, the attacker accesses the Web application directly and injects SQL code that the server then executes against the database, in order to obtain the desired data or tamper with the database (the vulnerability lies in how the SQL statement is generated). OS command injection aims to execute illegitimate operating system commands on the server.
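The point about how the SQL statement is generated can be shown with a small, self-contained sketch using Python's sqlite3 module and an in-memory database. The table, the data, and the function names are invented for this example.

```python
import sqlite3

def setup() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn

def find_unsafe(conn, name):
    # Vulnerable: user input is pasted into the SQL text itself
    return conn.execute(f"SELECT secret FROM users WHERE name = '{name}'").fetchall()

def find_safe(conn, name):
    # Parameterized: the input is passed as data and never parsed as SQL
    return conn.execute("SELECT secret FROM users WHERE name = ?", (name,)).fetchall()

conn = setup()
payload = "' OR '1'='1"
print(find_unsafe(conn, payload))  # the injected condition matches every row
print(find_safe(conn, payload))    # no user has this literal name
```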
Passive attacks that target the server follow this pattern:
1. The attacker lures a user into triggering a trap set in advance, which starts an HTTP request carrying the embedded attack code
2. The HTTP request containing the attack code is sent to the server and run
3. After the attack code runs, the vulnerable Web application becomes a springboard for the attacker, resulting in the theft of personal information and so on (everything I learned in my network security class has gone back to the teacher... when I first saw these I was completely baffled...)
Active attacks that target the client: a typical example is cross-site scripting (XSS), in which a Web site with a security vulnerability is used to run illegitimate HTML tags or JavaScript code in the user's browser, making it possible to obtain the user's personal information and so on.
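One common defense against XSS, sketched here with Python's standard html module, is to escape user-supplied text before placing it in a page so the browser displays it instead of running it. The render_comment function is a made-up example, not part of any framework.

```python
import html

def render_comment(comment: str) -> str:
    """Embed user-supplied text in HTML with special characters escaped."""
    return f"<p>{html.escape(comment)}</p>"

print(render_comment('<script>alert("x")</script>'))
```

Escaping like this covers text placed between tags; values inserted into attributes, URLs, or scripts need context-specific handling.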
Other attacks include HTTP header injection, mail header injection, directory traversal, remote file inclusion vulnerabilities, and so on.
Security vulnerabilities caused by configuration or design
Forced browsing: among the files placed in public directories on the Web server, files that were never meant to be public are browsed, disclosing personal information or internal file information
Improper error messages: error messages thrown by the application expose details of the system, giving an attacker a way in
Open redirect: a feature that redirects to any given URL lets an attacker lure users to a malicious Web site
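For the open redirect case, a possible mitigation is to accept only redirect targets that stay on the same site or point to an allow-listed host. The sketch below uses urllib.parse; ALLOWED_HOSTS, the function name, and the example URLs are assumptions for illustration.

```python
from urllib.parse import urlsplit

ALLOWED_HOSTS = {"www.example.com"}  # hypothetical allow-list for this sketch

def safe_redirect_target(target: str, fallback: str = "/") -> str:
    """Return target only if it stays on this site or on an allow-listed host."""
    if "\\" in target or target.startswith("//"):
        return fallback  # browsers may interpret these as pointing to another host
    parts = urlsplit(target)
    if not parts.scheme and not parts.netloc and target.startswith("/"):
        return target  # same-site path
    if parts.scheme in ("http", "https") and parts.hostname in ALLOWED_HOSTS:
        return target
    return fallback

print(safe_redirect_target("/account"))
print(safe_redirect_target("https://evil.example.net/"))
```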
Security vulnerabilities caused by careless session management
Session hijacking: the attacker obtains a user's session ID by some means and uses it to impersonate that user.
Some ways an attacker could obtain a session ID:
· Guessing the session ID when it is generated by a non-standard method
· Stealing the session ID through eavesdropping or an XSS attack
· Forcing a known session ID on the user through a session fixation attack
Session fixation: the rough pattern is that the attacker visits the site to get an unauthenticated session ID, then sets a trap that forces the user to authenticate with this session ID. Once the user triggers the trap and completes authentication, the attacker can use the same session ID to access the site as that user.
Cross-site request forgery: the attacker sets a trap that forces an authenticated user to perform unintended updates to personal information, settings, or other state.
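Two of the session ID problems above have simple countermeasures that can be sketched in code: generate IDs from a cryptographically secure random source so they cannot be guessed, and issue a new ID at login so an ID planted before authentication becomes useless. The SessionStore class below is a toy written for this example, not a real framework API.

```python
import secrets

class SessionStore:
    """Toy in-memory session store (hypothetical, for illustration only)."""

    def __init__(self):
        self.sessions = {}

    def new_session(self) -> str:
        session_id = secrets.token_urlsafe(32)  # 32 random bytes, no guessable pattern
        self.sessions[session_id] = {"user": None}
        return session_id

    def login(self, old_session_id: str, user: str) -> str:
        """Authenticate and issue a fresh ID so a pre-planted ID stops working."""
        data = self.sessions.pop(old_session_id, {"user": None})  # invalidate the old ID
        data["user"] = user
        new_id = secrets.token_urlsafe(32)
        self.sessions[new_id] = data
        return new_id

store = SessionStore()
planted = store.new_session()         # the ID an attacker tries to fix on the victim
fresh = store.login(planted, "alice")
print(planted in store.sessions, store.sessions[fresh]["user"])
```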
Other security vulnerabilities
Password cracking: working out the password to get past authentication, either by trying passwords over the network or by decrypting an encrypted password. Methods include dictionary attacks, rainbow tables, obtaining the key, and exploiting weaknesses in the encryption algorithm.
Clickjacking: also known as UI redressing; usually a transparent layer element is used as the trap.
DoS attacks: attacks that bring the server's services to a halt, either by flooding it with requests until resources are exhausted and the service stops, or by attacking a security vulnerability to stop the service.
Backdoor programs: debugging programs left in by developers, programs planted by developers for their own benefit, and so on.
"Graphic http" Reading notes