HTTP/1.1 Key Points
HTTP protocol concepts
HTTP (Hypertext Transfer Protocol) is the protocol used between a client and a server. The two sides communicate through requests and responses, and the protocol is stateless: it does not itself keep any record of earlier requests and responses.
HTTP request methods (* = common)
* GET: retrieve a resource from the server
* POST: send data to the server and receive the corresponding response data
PUT: upload a file to the server
HEAD: like GET, but only the response headers are returned
DELETE: delete a file on the server
OPTIONS: ask which methods the specified resource supports
TRACE: have the request echoed back so the client can see how it was modified along the way (mainly relevant with proxy servers)
CONNECT: ask a proxy server to set up a tunnel to the destination
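As a rough illustration of how GET, HEAD and POST differ, the sketch below starts a toy server on localhost and sends it one request per method. The handler class, the "/" path and the bodies are invented for this example; only the standard library is used.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Toy handler; the bodies returned here are made up for the sketch."""

    def _send(self, body, include_body=True):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        if include_body:
            self.wfile.write(body)

    def do_GET(self):
        self._send(b"resource")

    def do_HEAD(self):
        # Same headers as GET, but no body is written.
        self._send(b"resource", include_body=False)

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        data = self.rfile.read(length)
        self._send(b"received " + data)

    def log_message(self, *args):
        pass  # keep the demo quiet


def run_demo():
    # Port 0 asks the OS for any free port.
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    results = {}
    try:
        for method, body in (("GET", None), ("HEAD", None), ("POST", b"name=demo")):
            conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
            conn.request(method, "/", body=body)
            resp = conn.getresponse()
            results[method] = (resp.status, resp.read())
            conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return results
```

Calling run_demo() returns a dictionary keyed by method name; the HEAD entry carries the same status as GET but an empty body.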
HTTP/1.1 persistent connections
The TCP connection is kept open and reused for multiple requests and responses instead of being re-established each time, which avoids repeated connection setup and improves transfer speed and efficiency.
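One way to observe connection reuse is to have a local test server report the client's TCP source port: if every response shows the same port, all requests travelled over one connection. This is a minimal sketch; the handler and function names are invented for the example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class KeepAliveHandler(BaseHTTPRequestHandler):
    # Declaring HTTP/1.1 lets http.server keep the connection open.
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        # Reply with the client's TCP source port so reuse is visible.
        body = str(self.client_address[1]).encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def client_ports_on_one_connection(n=3):
    server = HTTPServer(("127.0.0.1", 0), KeepAliveHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
    ports = []
    try:
        for _ in range(n):
            conn.request("GET", "/")
            # The response must be read fully before the next request is sent.
            ports.append(conn.getresponse().read())
    finally:
        conn.close()       # closing the client side ends the kept-alive connection
        server.shutdown()
        server.server_close()
    return ports
```

If the server closed the connection after each response, http.client would silently reconnect and the reported ports would differ.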
Cookies and sessions
Cookie principle: state is managed on the client by carrying cookie information in request and response messages. On the first visit, the server sends a Set-Cookie header field in its response and the client stores the cookie locally. On later requests to the same server the client sends the cookie back automatically. Cookies from different servers are stored separately under their own domain names.
Session principle: login information contains the user's personal data, so to keep it from being stolen the session mechanism stores the user information on the server and generates a unique, hard-to-guess key (the session ID). The ID is returned to the client and kept in a cookie. When the client visits again, the server uses the session ID to identify the user and returns the corresponding page. (A session is time-limited; its lifetime is decided by the server.)
Many sites that do not require login now also assign each visitor a specific ID based on information such as the IP address, mainly as an anti-crawling measure.
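The round trip described above can be sketched with the standard library's http.cookies module: parse a Set-Cookie value, then build the Cookie header a client would send back. The cookie name, value and domain below are hypothetical.

```python
from http.cookies import SimpleCookie


def parse_set_cookie(set_cookie_value):
    """Parse the value of a Set-Cookie header into a SimpleCookie jar."""
    jar = SimpleCookie()
    jar.load(set_cookie_value)
    return jar


def cookie_header_from_jar(jar):
    """Build the Cookie request header value the client sends on later visits."""
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())


# Hypothetical value, as a server might send it on the first visit.
jar = parse_set_cookie("sessionid=abc123; Domain=example.com; Path=/")
print(jar["sessionid"]["domain"])    # attribute stored with the cookie
print(cookie_header_from_jar(jar))   # only name=value pairs go back to the server
```

Note that attributes such as Domain and Path stay on the client; they decide when the cookie is sent, but are not themselves sent back.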
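A server-side session store could look roughly like the sketch below. It is an in-memory illustration only; the class name, the 30-minute lifetime and the shape of the user data are assumptions, not something the notes specify.

```python
import secrets
import time

SESSION_TTL_SECONDS = 1800  # assumed lifetime; the real value is the server's choice


class SessionStore:
    """Keeps user data on the server, keyed by a random session ID."""

    def __init__(self, ttl=SESSION_TTL_SECONDS):
        self._ttl = ttl
        self._sessions = {}  # session ID -> (user data, expiry timestamp)

    def create(self, user_data, now=None):
        now = time.time() if now is None else now
        session_id = secrets.token_hex(16)  # hard-to-guess ID sent to the client
        self._sessions[session_id] = (user_data, now + self._ttl)
        return session_id

    def lookup(self, session_id, now=None):
        """Return the user data for a valid session, or None if unknown or expired."""
        now = time.time() if now is None else now
        entry = self._sessions.get(session_id)
        if entry is None:
            return None
        user_data, expires_at = entry
        if now >= expires_at:
            del self._sessions[session_id]  # expired sessions are discarded
            return None
        return user_data
```

Only the session ID travels in the cookie; the personal data itself never leaves the server.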
HTTP status codes
The status code is returned with the response and indicates the outcome of the request.
1xx: the request is being processed
2xx: the request was processed normally
Common:
200: the request was processed normally and the response is returned normally
204: the request was processed normally, but there is no content to return
206: the request was processed normally and the response contains only part of the resource (the request specified a range in its headers)
3xx: the request is redirected
Common:
301: the URL of the resource has changed (permanent redirect)
302: the URL of the resource has changed (temporary redirect)
303: similar to 302, except that the client must use GET to fetch the resource
304: the resource was found, but it has not changed since the version named in the conditional request, so no body is returned (not actually a redirect)
4xx: the server cannot process the request (client error)
Common:
400: bad request (syntax error in the request message)
403: access to the resource is forbidden
404: the requested resource does not exist
5xx: server error
Common:
500: internal server error
503: the server is overloaded or down for maintenance
Status codes as seen by a crawler: 301 and 302 often mean that a CAPTCHA or a login is required, 403 often means the IP address has been blocked, and, when the server is otherwise working normally, 500 is commonly caused by the robots protocol.
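The grouping by first digit can be expressed in a few lines. This sketch uses the standard library's http.HTTPStatus for the reason phrases; the category wording and the function name are chosen for this example.

```python
from http import HTTPStatus

# Category labels follow the list above; the wording is this example's own.
STATUS_CLASSES = {
    1: "request is being processed",
    2: "request processed normally",
    3: "redirect",
    4: "client error",
    5: "server error",
}


def describe(code):
    """Return 'code phrase: category' for a numeric HTTP status code."""
    category = STATUS_CLASSES.get(code // 100, "unknown class")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        phrase = "(no standard phrase known)"
    return f"{code} {phrase}: {category}"


for code in (200, 206, 301, 304, 403, 404, 500, 503):
    print(describe(code))
```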
HTTP header fields
Request header fields
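Request header fields are name/value pairs sent along with the request line. As a small sketch, urllib.request.Request can hold them without sending anything over the network; the URL and the header values here are placeholders.

```python
from urllib.request import Request

# Placeholder URL and header values; nothing is sent over the network here.
req = Request(
    "http://example.com/",
    headers={
        "User-Agent": "demo-client/0.1",
        "Accept": "text/html",
    },
)

# urllib normalizes header names to "Capitalized-lowercase" form internally.
for name, value in req.header_items():
    print(f"{name}: {value}")
```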
HTTPS
HTTPS = HTTP + encryption + authentication + integrity protection
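The authentication part can be seen in how a client is configured before it connects. In this minimal sketch, Python's default TLS context is inspected without opening any connection; encryption and integrity protection are then negotiated by TLS during the handshake. The function name is invented for the example.

```python
import ssl


def describe_default_tls_context():
    """Report the authentication-related settings of Python's default client context."""
    ctx = ssl.create_default_context()
    return {
        # The server must present a certificate that chains to a trusted CA.
        "verifies_certificate": ctx.verify_mode == ssl.CERT_REQUIRED,
        # The certificate must match the host name being contacted.
        "checks_hostname": ctx.check_hostname,
    }


print(describe_default_tls_context())
```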
HTML + CSS + JavaScript
HTML: HyperText Markup Language; the browser parses and renders it to display the web page
CSS: specifies how HTML is presented, using HTML tags to select the elements a style applies to
JavaScript: a scripting language used in web pages, mainly to make HTML dynamic
DOM: an API for manipulating HTML that treats the elements of an HTML document as objects; it is most commonly used from JavaScript
Ajax: lets the page send data to or read data from the server without reloading. It combines XHTML and CSS for standards-based presentation, the DOM for dynamic display and interaction, XML and XSLT for data exchange and processing, XMLHttpRequest for asynchronous data retrieval, and JavaScript to tie it all together
Principle: an intermediate layer (the Ajax engine) sits between the user and the server so that user actions and server responses become asynchronous; only the requests the Ajax engine issues when it needs data go to the server.
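The DOM idea of treating elements as objects is not tied to JavaScript. As a sketch, Python's xml.dom.minidom exposes the same kind of interface for a well-formed (XHTML-style) snippet; the markup and the function name are made up for the example.

```python
from xml.dom import minidom

# Made-up, well-formed snippet; minidom needs XML-style markup.
PAGE = "<html><body><p id='greeting'>Hello</p></body></html>"


def edit_page(source=PAGE):
    doc = minidom.parseString(source)

    # Each element is an object with properties and methods.
    p = doc.getElementsByTagName("p")[0]
    p.firstChild.data = "Hello, DOM"       # change the text node
    p.setAttribute("class", "highlight")   # add an attribute

    # Create a new element and attach it to the tree.
    extra = doc.createElement("p")
    extra.appendChild(doc.createTextNode("Added later"))
    p.parentNode.appendChild(extra)
    return doc


print(edit_page().documentElement.toxml())
```

In a browser the same operations would be written in JavaScript against the live page, which is what makes the display dynamic.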
Web Basics Review (Notes)