HTTP: The Definitive Guide, notes on chapters 1~6

Source: Internet
Author: User

I read this classic book a few days ago and have basically finished it. It contains a lot of useful information; the following are my notes on the first six chapters.

Chapter I Overview of HTTP
    1. HTTP is an application-layer protocol, TCP sits in the transport layer, and IP sits in the network layer
    2. TCP provides error-free data transfer, in-order delivery, and an unsegmented data stream (data of any size can be sent at any time)
    3. Before an HTTP client can send a message to a server, it must establish a TCP/IP connection using the server's IP address and port number
    4. The hostname in the URL can be converted to an IP address via DNS (Domain Name Service); when the URL gives no port, HTTP defaults to port 80
    5. Steps for the browser to get HTML resources:
      1. Parse out host name
      2. DNS Conversion to IP address
      3. Parse out port number
      4. Establish a TCP connection
      5. Sending HTTP request messages
      6. Receiving HTTP response messages
      7. Close the connection, display the document
    6. The HTTP version in common use is 1.1; HTTP-NG (also known as HTTP/2.0 in the book) is a prototype successor that focuses on significant performance optimizations (Chapter 10)
    7. Other important components of the Web:
      1. Proxies: HTTP intermediaries that sit between clients and servers (Chapter 6)
      2. Caches: keep copies of frequently used pages closer to the client (Chapter 7)
      3. Gateways: special web servers that connect to other applications (Chapter 8)
      4. Tunnels: special proxies that blindly forward HTTP communications
      5. Agents: semi-intelligent web clients that make automated HTTP requests
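
The browser steps in item 5 can be sketched with Python's standard library. This is only an illustrative sketch, not how a real browser is built: the function names (parse_target, build_request, fetch) are made up for this example, and it assumes a plain http:// URL with no redirects or TLS.

```python
import socket
from urllib.parse import urlsplit


def parse_target(url):
    """Steps 1 and 3: pull the hostname, port and path out of the URL."""
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port or 80          # HTTP defaults to port 80
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return host, port, path


def build_request(host, port, path):
    """Step 5 (message only): a minimal HTTP/1.1 GET request."""
    host_header = host if port == 80 else f"{host}:{port}"
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host_header}\r\n"
        "Connection: close\r\n"      # ask the server to close when done
        "\r\n"
    ).encode("ascii")


def fetch(url):
    """Steps 1-7 end to end; needs network access, so it is not called here."""
    host, port, path = parse_target(url)
    ip = socket.gethostbyname(host)                       # step 2: DNS lookup
    with socket.create_connection((ip, port), timeout=10) as conn:  # step 4
        conn.sendall(build_request(host, port, path))     # step 5
        chunks = []
        while True:                                       # step 6
            data = conn.recv(4096)
            if not data:                                  # server closed
                break
            chunks.append(data)
    return b"".join(chunks)                               # step 7: closed


print(parse_target("http://www.example.com/index.html"))
```

Calling fetch("http://www.example.com/") would return the raw response bytes (start line, headers and body) if the host is reachable.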

Chapter II URLs and resources
    1. A URL (uniform resource locator) gives the location of a resource; it is what the browser uses to find information, and it identifies the resource by where it is located
    2. A URI is a uniform resource identifier. URIs comprise URLs and URNs; a URN identifies a resource by name, regardless of where it is located
    3. HTTP only deals with the URL subset of URIs, so the two are sometimes not distinguished and "URI" usually refers to a URL
    4. A URL is divided into the following three parts:
      1. Scheme (http:): tells web clients how to access the resource; here it states that the HTTP protocol is to be used
      2. Hostname: the location of the server
      3. Path: the resource path, that is, the specific resource on the server that is being requested
    5. URL syntax: scheme://user:password@host:port/path;params?query#fragment
      1. params: parameters that some schemes need in order to access the resource; they are separated from the path by ";"
      2. query is sent to the server; fragment is handled by the client on its own
    6. Relative URLs: a relative URL beginning with ./ is resolved against a base URL, from which it inherits scheme://hostname and the directory part of the path
    7. Automatic URL expansion
      1. Hostname expansion: for example, typing "baidu" is expanded to www.baidu.com
      2. History expansion: the browser stores the URLs the user has visited and, as a URL is typed, offers completions by matching the input against that history
    8. URL character set and encoding mechanism:
      1. Character set: URLs use the US-ASCII character set, a 7-bit code that covers most keys on an English typewriter plus a few nonprinting control characters for text formatting and hardware signaling
      2. Encoding mechanism: to get around the limits of the safe character set, unsafe characters are written in an escape notation: a percent sign (%) followed by two hexadecimal digits giving the character's ASCII code
      3. Character restrictions: some characters are reserved for special meanings and should not be used for anything else, such as % / . # ? ; : $ + @ & =
    9. Some common schemes: http, https, mailto, ftp, rtsp, rtspu, file, news, telnet
    10. A URL gives the location of a resource and stops working if the resource is moved. URNs are meant to solve this problem.
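
The URL syntax, relative URL resolution and percent-escaping described in this chapter can be tried out with Python's urllib.parse. The sample URLs and values below are invented for illustration only.

```python
from urllib.parse import quote, unquote, urljoin, urlparse

# A made-up URL that exercises every component of the general syntax.
url = "http://user:secret@www.example.com:8080/tools/list.html;type=d?item=12#top"
parts = urlparse(url)

print(parts.scheme, parts.username, parts.password, parts.hostname, parts.port)
print(parts.path, parts.params, parts.query, parts.fragment)

# A relative URL is resolved against a base URL.
base = "http://www.example.com/tools/index.html"
resolved = urljoin(base, "./hammers.html")
print(resolved)

# Unsafe characters are escaped as "%" plus two hexadecimal digits.
escaped = quote("my file.html")      # the space becomes %20
restored = unquote("%7Ejoe")         # %7E is the ASCII code for "~"
print(escaped, restored)
```

Note that urlparse only splits the string; the fragment is never sent to the server, which matches the note that the client handles it on its own.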

Chapter III HTTP Messages
    1. An HTTP message is a block of data that is sent between HTTP applications.
      1. Terminology: "inbound" and "outbound" describe the direction of a transaction. Messages travel inbound to the origin server and, when the work is done, travel outbound back to the user agent
      2. All messages flow downstream: the sender of any message is upstream of its receiver.
    2. Components of the message:
      1. HTTP messages are simple, formatted chunks of data, each containing a request from the client or a response from the server
      2. They consist of three parts: the starting line that describes the message, the header block containing the attribute, and optionally the body part that contains the data
    3. The syntax of the message:
      1. All messages can be divided into two categories: Request message, Response message
      2. Request Message Format:
        Start line: <method> <request-URL> <version>
        Header block: ...
        Entity body: ...
      3. Response message format:
        Start line: <version> <status-code> <reason-phrase>
        Header block: ...
        Entity body: ...
  1. Start line: every HTTP message begins with a start line. The start line of a request message says what to do; the start line of a response message says what happened
      1. The start line of a request message is called the request line. It carries the method, which tells the server what to do; common methods are GET, POST, DELETE, HEAD, PUT, TRACE, and OPTIONS
      2. Status code: tells the client what happened
        1. 100~199: 100~101 defined, informational
        2. 200~299: 200~206 defined, success
        3. 300~399: 300~305 defined, redirection
        4. 400~499: 400~415 defined, client error
        5. 500~599: 500~505 defined, server error
      3. Reason phrase: paired with the status code; it is a human-readable version of the status code
      4. Version number: takes the form HTTP/x.y and declares the highest HTTP version the sender supports
  2. Header: List of key-value pairs
      1. General Header
      2. Request Header: Provide request information
      3. Response Header: Provide response information
      4. Entity header: Describes the length and content of the body, or the resource itself
      5. Extension header: New header not defined in the specification
  3. Entity body: the entity body is the payload of the HTTP message, that is, the content to be transported. It can carry many types of data: images, video, HTML, software applications, credit card transactions, email, etc.
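
The three parts of a message (start line, header block, entity body) can be pulled apart with a few lines of Python. This is a simplified sketch: parse_message is a name invented here, the two sample messages are made up, and repeated headers or folded header lines are not handled.

```python
def parse_message(raw: bytes):
    """Split a raw HTTP message into (start-line fields, headers, body)."""
    # A blank line (CRLF CRLF) separates the header block from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # Request:  method, request-URL, version
    # Response: version, status code, reason phrase (may contain spaces)
    start_line = lines[0].split(" ", 2)

    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


request = (b"GET /index.html HTTP/1.1\r\n"
           b"Host: www.example.com\r\n"
           b"Accept: text/html\r\n"
           b"\r\n")
response = (b"HTTP/1.1 404 Not Found\r\n"
            b"Content-Type: text/plain\r\n"
            b"Content-Length: 3\r\n"
            b"\r\n"
            b"Hi!")

print(parse_message(request))
print(parse_message(response))
```

Splitting the start line at most twice keeps a multi-word reason phrase such as "Not Found" in one piece.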
  4. Method:
      1. Safe methods: GET and HEAD are called safe methods; HTTP requests that use them should not cause any action on the server
      2. GET: Typically used to ask a server to send a resource
      3. HEAD: Similar to GET, but the server returns only the headers in the response and no entity body. Uses:
        1. Find out about a resource (for example, its type) without fetching it
        2. See if an object exists by looking at the status code in the response
        3. Test whether the resource has been modified by looking at the headers
      4. PUT: Writes a document to the server. For example, some publishing systems let users create web pages and install them on the web server using PUT
      5. POST: Designed to send input data to the server; in practice it is usually used to support HTML forms, sending the filled-in form data to the server
      6. TRACE: A request may be modified as it passes through intermediaries. With TRACE, the final server in the chain sends back a TRACE response whose body contains the request message exactly as it arrived, so the client can see how the intermediate applications along the path changed the original request
      7. OPTIONS: Asks the web server to report the capabilities it supports
      8. DELETE: Asks the server to delete the resource specified by the request URL, but the client application has no guarantee that the delete is actually carried out
      9. Extension methods
  5. Status code:
    1. 1XX: Informational status codes, introduced in HTTP/1.1
        1. 100 Continue: An initial part of the request was received and the client should continue
        2. 101 Switching Protocols: The server is changing protocols, as requested by the client, to one listed in the Upgrade header
    2. 2XX: Success status codes
        1. 200 OK: The request is fine
        2. 201 Created: For requests that create server objects (for example, PUT)
        3. 202 Accepted: The request has been accepted, but the server has not yet performed any action on it
        4. 203 Non-Authoritative Information: The entity headers contain information that came not from the origin server but from a copy of the resource
        5. 204 No Content: The response message contains a start line and headers but no entity body
        6. 205 Reset Content: Mainly for browsers; tells the browser to clear all form elements on the current page
        7. 206 Partial Content: A partial or range request was successfully executed
    3. 3XX: Redirect Status code
        1. 300 Multiple Choices: The client requested a URL that actually refers to multiple resources
        2. 301 Moved Permanently: The requested URL has been moved; the Location header of the response contains the URL where the resource now resides
        3. 302 Found: Similar to 301, but the client should use the URL given in the Location header only temporarily; future requests should still use the old URL
        4. 303 See Other: Tells the client that the resource should be fetched using a different URL
        5. 304 Not Modified: The resource has not been modified; the client's local copy is up to date
        6. 305 Use Proxy: The resource must be accessed through a proxy
        7. 307 Temporary Redirect: Similar to 302; the client should use the URL in the Location header temporarily and keep using the old URL for future requests
    4. 4XX: Client Error status code
        1. 400 Bad Request: Tells the client that it sent a malformed request
        2. 401 Unauthorized: The client needs to authenticate itself before gaining access to the resource
        3. 402 Payment Required: Reserved for future use
        4. 403 Forbidden: The request was refused by the server; the body of the response may describe the reason
        5. 404 Not Found: The server could not find the requested URL
        6. 405 Method Not Allowed: The requested URL does not support this method
        7. 406 Not Acceptable: The server has no representation of the resource that matches what the client said it is willing to accept
        8. 407 Proxy Authentication Required: Similar to 401, but used by proxy servers that require authentication for a resource
        9. 408 Request Timeout: The client took too long to complete its request
        10. 409 Conflict: The request may cause a conflict on the resource
        11. 410 Gone: Like 404, except that the server once held the resource
        12. 411 Length Required: The server requires a Content-Length header in the request message
        13. 412 Precondition Failed: The client made a conditional request and one of the conditions failed
        14. 413 Request Entity Too Large: The entity body sent by the client is too large
        15. 414 Request URI Too Long: The request URL sent is too long
        16. 415 Unsupported Media Type: The server cannot understand or support the content type of the entity sent by the client
        17. 416 Requested Range Not Satisfiable: The request message asked for a range of the resource, and that range was invalid or could not be satisfied
        18. 417 Expectation Failed: The Expect request header contained an expectation that the server could not meet
    5. 5XX: Server error status codes
        1. 500 Internal Server Error: The server encountered an error that prevented it from servicing the request
        2. 501 Not Implemented: The client made a request that is beyond the server's capabilities
        3. 502 Bad Gateway: A server acting as a proxy or gateway received a bogus response from the next link in the response chain, for example because it could not connect to its parent gateway
        4. 503 Service Unavailable: The server is temporarily unable to service the request
        5. 504 Gateway Timeout: Similar to 408, except that the response comes from a gateway or proxy that timed out waiting for another server
        6. 505 HTTP Version Not Supported: The server received a request in a protocol version that it cannot or will not support
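
Because the first digit of a status code determines its class, a client can classify any code, even one it does not recognize, with integer division. A small sketch follows; status_class and STATUS_CLASSES are names made up for this example, while http.HTTPStatus is part of Python's standard library.

```python
from http import HTTPStatus

# The five classes from the notes, keyed by the first digit of the code.
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code: int) -> str:
    """Return the class of a status code based on its hundreds digit."""
    return STATUS_CLASSES.get(code // 100, "unknown")


for code in (100, 206, 304, 404, 503):
    # HTTPStatus supplies the standard reason phrase for known codes.
    print(code, HTTPStatus(code).phrase, "->", status_class(code))
```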
  6. Header: Together with the method determines what the client and server can do
      1. General Header: The most basic information related to a message
        1. Connection: Lets the client and server specify options about the request/response connection
        2. Date: Provides a date and time stamp telling when the message was created
        3. MIME-Version: Gives the MIME version used by the sender
        4. General caching headers: Cache-Control carries caching directives along with the message
        5. ...
      2. Request Header
        1. Client-IP
        2. From
        3. Host
        4. Referer
        5. ...
        6. Accept headers: Tell the server which media types, character sets, encodings, languages, etc. the client is willing to receive
        7. Conditional request headers: Let the client put restrictions on the request, for example Expect
        8. Security request headers: Support challenge/response authentication for requests
        9. Proxy request headers
      3. Response Header: Provides some additional information to the client
        1. Negotiation Header
        2. Security Response Header
      4. Entity Header: Provides a wealth of information about the entity and its content
        1. Content Header: Provides specific information about the entity's content
        2. Entity Cache Header: Describes how or when to cache
      5. Extension header

Chapter IV Connection Management
    1. TCP/IP is the layered set of packet-switched network protocols used by computers and network devices around the world.
    2. TCP gives HTTP a reliable bit pipe: bytes stuffed into one end of a TCP connection come out the other end correctly and in the original order. TCP sends its data in small chunks carried by IP packets. When HTTP transmits a message, it streams the message data in order through an open TCP connection. TCP takes that data stream, chops it into small chunks called segments, wraps each segment in an IP packet, and sends the packets across the Internet.
    3. TCP uses port numbers to keep all of these connections straight. A TCP connection is identified by four values: source IP address, source port number, destination IP address, and destination port number. No two different connections share all four values.
    4. Socket API call sequence:
      1. Server:
        1. Create a socket (socket)
        2. Bind it to a port (bind)
        3. Listen for connections (listen)
        4. Wait for a connection (accept)
        5. Read and process the request (read)
        6. Write back the response (write)
        7. Close the connection (close)
      2. Client:
        1. Create a socket (socket)
        2. Connect to the server's IP:port (connect); this establishes the TCP connection to the server
        3. Once connected, send the request (write)
        4. Receive and process the response (read)
        5. Close the connection (close)
    5. HTTP latency comes mainly from TCP network delays. The main causes of delay in an HTTP transaction are:
      1. The client must determine the web server's IP address and port number from the URI; without a cached answer, DNS resolution may take tens of seconds.
      2. The client sends a TCP connection request to the server and waits for the server's reply; this setup delay applies to every new TCP connection and can take up to a second or two
      3. Once the connection is established, the client sends the request and the server processes it. It takes time for the request message to cross the Internet and for the server to process it.
      4. The web server then writes back the HTTP response, which also takes time
      5. In addition, the size of these TCP network delays depends on hardware speed, network and server load, the size of the request and response messages, and the distance between client and server. The technical intricacies of the TCP protocol add to the delay as well
    6. TCP connection handshake steps
      1. To request a new TCP connection, the client sends the server a small TCP packet with the SYN flag set, which marks it as a connection request
      2. If the server accepts the connection, it sends back a TCP packet with both the SYN and ACK flags set, indicating that the request was accepted
      3. The client sends an acknowledgment back to the server, letting it know the connection was established successfully
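
The server-side and client-side socket call sequences listed in this chapter can be exercised on the loopback interface. This is a minimal sketch under a few assumptions: the names serve_once and run_demo are invented here, the server handles exactly one connection, and a single recv is assumed to be enough to read the tiny request. The TCP handshake itself happens inside connect and accept.

```python
import socket
import threading


def serve_once(listener):
    """Server side: accept, read, write, close (for one connection)."""
    conn, _addr = listener.accept()              # wait for a connection
    with conn:                                   # close on exit
        conn.recv(4096)                          # read the (tiny) request
        body = b"hello"
        conn.sendall(b"HTTP/1.1 200 OK\r\n"
                     b"Content-Length: 5\r\n"
                     b"Connection: close\r\n"
                     b"\r\n" + body)             # write the response


def run_demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # create
    listener.bind(("127.0.0.1", 0))              # bind; port 0 = OS picks one
    listener.listen(1)                           # listen
    port = listener.getsockname()[1]
    server_thread = threading.Thread(target=serve_once, args=(listener,))
    server_thread.start()

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)    # create
    client.connect(("127.0.0.1", port))          # connect (SYN, SYN+ACK, ACK)
    client.sendall(b"GET / HTTP/1.1\r\nHost: localhost\r\n\r\n")  # write
    chunks = []
    while True:                                  # read until the server closes
        data = client.recv(4096)
        if not data:
            break
        chunks.append(data)
    client.close()                               # close

    server_thread.join()
    listener.close()
    return b"".join(chunks)


reply = run_demo()
print(reply.decode("ascii"))
```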

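7. Delayed acknowledgments: The receiver of each TCP segment returns a small acknowledgment packet to the sender when the segment arrives intact. If the sender does not receive the acknowledgment within a specified time, it concludes the packet was destroyed or corrupted and resends the data. Because acknowledgments are small, TCP may hold one back briefly so that it can piggyback on an outgoing data packet; this is the delayed acknowledgment.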

8. Handling HTTP connections (not read in full)

      1. The Connection header field can carry three different types of tokens
        1. HTTP header field names, listing headers relevant only to this connection
        2. Arbitrary token values, describing nonstandard options for this connection
        3. The value close, indicating that the persistent connection will be closed when the operation is done
      2. Serial transaction delays
        1. Drawback: some browsers cannot display anything until enough objects are loaded, because they do not know an object's size until it arrives and need the size to decide where to place it. Until then the screen stays blank and the user experience suffers.
        2. Techniques that address this: parallel connections (multiple TCP connections), persistent connections (reusing TCP connections), pipelined connections (concurrent requests over a shared TCP connection), and multiplexed connections (still experimental)

9. Parallel connections: HTTP allows clients to open multiple connections and perform multiple HTTP transactions in parallel

      1. Parallel connections may make pages load faster: the requests overlap, and so do the connection delays.
      2. Parallel connections are not always faster: when the client's network bandwidth is scarce, objects loaded in parallel compete for it, and a large number of connections consumes a lot of memory
      3. Parallel connections may "feel" faster

10. Persistent connections: HTTP/1.1 allows HTTP devices to keep the TCP connection open after a transaction ends and to reuse the existing connection for future HTTP requests. Nonpersistent connections are closed after each transaction; persistent connections stay open across transactions until either the client or the server decides to close them. ...
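
Connection reuse can be observed with Python's standard http.client and http.server modules. The sketch below is only an illustration: the Handler class and fetch_two_on_one_connection are names made up here, and the server is a throwaway one bound to the loopback interface. It sends two requests over one HTTPConnection and checks that the underlying socket object stays the same.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class Handler(BaseHTTPRequestHandler):
    # Speaking HTTP/1.1 lets http.server keep the connection open.
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        body = self.path.encode("ascii")         # echo the request path
        self.send_response(200)
        # Content-Length tells the client where this response ends,
        # which a persistent connection needs.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass                                     # keep the demo quiet


def fetch_two_on_one_connection():
    server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()

    conn = http.client.HTTPConnection("127.0.0.1",
                                      server.server_address[1], timeout=5)
    bodies, sockets = [], []
    for path in ("/first", "/second"):
        conn.request("GET", path)
        response = conn.getresponse()
        bodies.append(response.read())           # read fully before reuse
        sockets.append(conn.sock)                # socket used for this request
    conn.close()

    server.shutdown()
    server.server_close()
    return bodies, sockets[0] is sockets[1]


bodies, reused = fetch_two_on_one_connection()
print(bodies, "same TCP connection:", reused)
```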

11. Pipelined connections: Multiple requests can be queued before the responses arrive. While the first request is crossing the network to the server, the second and third requests can already be sent, which cuts network round-trip time and improves performance. Pipelining has the following restrictions:

      1. HTTP clients should not pipeline until they are sure the connection is persistent
      2. HTTP responses must be returned in the same order as the requests
      3. HTTP clients must be prepared for the connection to close at any time and be ready to resend any pipelined requests that did not finish
      4. HTTP clients should not pipeline requests that have side effects, such as POST

12. Closing connections:

      1. Content-Length and truncation: if the actual length of the entity does not match Content-Length, the receiver should question the correctness of the length; a caching proxy should not cache the response
      2. Idempotency: a transaction is idempotent if it yields the same result whether it is executed once or many times, as with GET, HEAD, PUT, DELETE, TRACE, and OPTIONS. Clients should not pipeline nonidempotent requests such as POST, because a premature connection close can lead to indeterminate results; to send a nonidempotent request, the client should wait for the response status of the previous request.
      3. Graceful connection close: a TCP connection is bidirectional. Calling close() on a socket closes both the input and output channels of the connection, which is a full close. Calling shutdown() closes the input or the output channel individually, which is a half close. Simple HTTP applications can use only full closes. When a client or server needs to close a connection unexpectedly, it should "gracefully shut down the transport connection."
      4. Closing input versus output: closing the output channel of a connection is the safer choice; the peer at the other end is notified that the stream has ended once all the data has been read from its buffer. Closing the input channel is dangerous unless the peer is known to have nothing more to send, because most operating systems treat data arriving on a closed input channel as a serious error.
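
The difference between a half close and a full close can be seen with a connected pair of sockets. In this sketch (half_close_demo is a name invented for the example, and socket.socketpair stands in for a real client/server TCP connection) the "client" closes only its output channel, the "server" sees end of stream, and the reply still gets through.

```python
import socket


def half_close_demo():
    client, server = socket.socketpair()   # two already-connected sockets

    client.sendall(b"last request")
    client.shutdown(socket.SHUT_WR)        # half close: output channel only

    received = b""
    while True:
        chunk = server.recv(1024)
        if not chunk:                      # empty read = end of stream
            break
        received += chunk

    server.sendall(b"final response")      # client's input is still open
    server.close()                         # full close: both channels

    reply = b""
    while True:
        chunk = client.recv(1024)
        if not chunk:
            break
        reply += chunk
    client.close()
    return received, reply


print(half_close_demo())
```

Closing the output side first and then waiting for the peer to finish is the pattern behind a graceful close.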

Chapter V Web Servers
All web servers, regardless of type or size, accept HTTP requests for resources and send content back to the clients.
  1. The actual web server will do the following things
      1. Establish a connection: receive a client connection and close it if you do not want to establish a connection with the client
      2. Accept request: Reads an HTTP request message from the network
      3. Process request: Interpret the request message and take action
      4. Accessing resources: Accessing the resources specified in the message
      5. Build response: Create an HTTP response message with the correct header
      6. Send response: Return the response to the client
      7. Log transaction: record the completed transaction in a log
  2. Accept client Connections
      1. Handling new connections: when a client requests a TCP connection, the web server accepts it and determines which client is at the other end by extracting the IP address from the TCP connection. Once the connection is established, the server watches it for incoming data. The web server is free to reject or immediately close any connection.
      2. Client hostname identification: the web server can be configured to use reverse DNS to convert the client IP address into a client hostname, but because the lookup can be slow, many web servers disable or restrict this feature.
      3. Determining the client user through ident: the server uses the ident protocol to ask the client for the username behind the new connection and parses the client's reply containing that username. This can work well inside an organization, but for many reasons it does not work well across the public Internet
  3. Receiving Request messages
      1. Parsing the request message: parse the request line to find the request method, the specified URI, and the version number; read the headers; then read the request body, if any, whose length is given by the Content-Length header
      2. Internal representation of messages: the server stores the request message in internal data structures, for example placing the headers in a fast lookup table so that the value of a particular header can be accessed quickly.
      3. Connection input/output processing architectures: because requests can arrive at any time, the web server constantly watches for new web requests
        1. Single-threaded web servers: process one request at a time until it completes, then move on to the next connection. All other connections are ignored in the meantime, which causes serious performance problems.
        2. Multiprocess and multithreaded web servers: threads/processes can be created on demand or in advance
        3. Multiplexed I/O servers: all connections are watched for activity at once. When a connection changes state, a small amount of processing is done on it; afterward the connection goes back on the open connection list to wait for its next state change.
        4. Multiplexed multithreaded web servers: combine multithreading with multiplexing to take advantage of multiple CPUs on the computer platform
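
The multiplexed I/O architecture can be sketched with Python's selectors module: one thread watches the listening socket and every open connection, and does a small amount of work whenever one of them becomes readable. This is a toy illustration, not an HTTP server: multiplexed_echo is a name made up here, the "processing" is just upper-casing the bytes received, and each connection is closed after one exchange.

```python
import selectors
import socket


def multiplexed_echo(messages):
    """Serve one client per message from a single thread."""
    sel = selectors.DefaultSelector()
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))              # port 0 = OS picks a port
    listener.listen()
    listener.setblocking(False)
    sel.register(listener, selectors.EVENT_READ)

    # Clients connect and send up front; the kernel queues the connections.
    clients = []
    for message in messages:
        client = socket.create_connection(listener.getsockname())
        client.sendall(message)
        clients.append(client)

    served = 0
    while served < len(messages):
        # Block until the listener or some connection changes state.
        for key, _events in sel.select(timeout=5):
            if key.fileobj is listener:          # new connection is waiting
                conn, _addr = listener.accept()
                conn.setblocking(False)
                sel.register(conn, selectors.EVENT_READ)
            else:                                # a connection has data
                conn = key.fileobj
                data = conn.recv(1024)
                conn.sendall(data.upper())       # the small bit of processing
                sel.unregister(conn)
                conn.close()
                served += 1

    replies = [client.recv(1024) for client in clients]
    for client in clients:
        client.close()
    sel.unregister(listener)
    listener.close()
    sel.close()
    return replies


print(multiplexed_echo([b"a", b"bc", b"def"]))
```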
  4. Processing requests: Once the Web server receives the request, it can process the request based on the method, resource, header, and optional body part
  5. Mapping and access to resources: Web servers are resource servers responsible for sending pre-created content, such as HTML pages, JPEG images, and dynamic content generated by resource generators running on the server
    1. Docroot: the web server's file system has a dedicated folder for web content, called the document root (docroot). The web server takes the URI from the request message and appends it to the document root. In Apache, the document root is set by adding a line such as DocumentRoot /usr/local/httpd/files to the httpd.conf file. The server must not let relative URLs back out of the docroot; for example, http://www.yf403.cn/../ is not allowed
        1. Virtually hosted docroots: configure a VirtualHost block for each virtual website; each virtual server block contains its own DocumentRoot
        2. User home directory docroots
    2. Directory listings: the web server can receive requests for directory URLs, where the path resolves to a directory. Most servers look in the directory for a file named index.html to represent the directory. In Apache, the DirectoryIndex directive configures the set of filenames used as default directory files, and the directive "Options -Indexes" disables the automatic generation of directory index files
    3. Mapping of dynamic Content resources: The Web server can also map URIs to dynamic resources and map to programs that dynamically generate content on demand.
        1. Apache lets you map URI pathname components into executable program directories. The following directive says that for every URI beginning with /cgi-bin/ the corresponding program should be found in the directory /usr/local/etc/httpd/cgi-programs/: ScriptAlias /cgi-bin/ /usr/local/etc/httpd/cgi-programs/
        2. Apache also lets you mark executable files by a special file extension, so that executable scripts can be placed in any directory. The following Apache configuration directive specifies that all web resources ending in .cgi should be executed: AddHandler cgi-script .cgi
        3. Modern application servers have more powerful and efficient server-side dynamic content support, including Microsoft's ASP and Java servlets
    4. Server-side includes (SSI): if a resource is flagged as containing server-side includes, the server processes its contents before sending them to the client, scanning for special patterns, which may be variable names or embedded scripts. This is one way to create dynamic content
    5. Access control
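
The docroot mapping described in item 1 above (append the request URI to the document root, and refuse anything that backs out of it) can be sketched as follows. This is a simplified illustration rather than what Apache actually does: map_to_docroot is a name invented here, the request paths are made up, and symbolic links are not considered. The docroot value is the one from the DocumentRoot example in the notes.

```python
import posixpath
from urllib.parse import unquote, urlsplit

DOCROOT = "/usr/local/httpd/files"


def map_to_docroot(request_uri, docroot=DOCROOT):
    """Return the file path for a request URI, or None if it escapes docroot."""
    # Drop any query string and undo %-escapes such as %2e (".").
    path = unquote(urlsplit(request_uri).path)
    # Append the URI path to the docroot, then collapse "." and ".." parts.
    candidate = posixpath.normpath(posixpath.join(docroot, path.lstrip("/")))
    # Anything that no longer sits under the docroot is rejected.
    if candidate != docroot and not candidate.startswith(docroot + "/"):
        return None
    return candidate


print(map_to_docroot("/specials/saw-blade.gif"))
print(map_to_docroot("/../../etc/passwd"))
```

Normalizing the path before the prefix check is what catches ".." components, including ones hidden behind percent-escapes.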
  6. Building responses: once the server has identified the resource, it performs the action described by the request method and returns the response message
      1. Response entity: a Content-Type header describes the MIME type of the body; a Content-Length header describes the size of the body; then comes the actual body content
      2. MIME typing: there are several ways to determine the MIME type:
        1. File extension
        2. Magic typing: scan the contents of each resource and match them against a table of known patterns to determine the MIME type
        3. Explicit typing: configure the server to force particular files or directories to a MIME type, regardless of file extension or contents
        4. Type negotiation: configure the web server to negotiate with the user to decide which format to use
      3. Redirection: the return code is 3XX, and the Location response header contains the URI of the content's new or preferred location. Redirects are useful for:
        1. Permanently moved resources (301): the response carries the new URL, so the client can update bookmarks and similar records
        2. Temporarily moved resources (303 and 307): the resource has been temporarily moved or renamed and the client is redirected to the new URL; because the move is temporary, bookmarks are not updated
        3. URL augmentation (303/307): the server uses a redirect to rewrite the URL, and the new URL carries state information
        4. Load balancing (303/307): an overloaded server redirects requests to a less heavily loaded server
        5. Server affinity (303/307): the server can redirect the client to a server that holds information about that client
        6. Canonicalizing directory names: when the requested directory URI lacks a trailing slash, most servers redirect the client to the URI with the slash added
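
Building a response entity as described in this item (a Content-Type chosen from the file extension, a Content-Length that matches the body, then the body itself) might look like the sketch below. build_response is a name made up for this example, and the fallback to application/octet-stream for unknown extensions is an assumption of the sketch, not something stated in the notes.

```python
import mimetypes


def build_response(filename, body):
    """Assemble a minimal 200 response for the given file name and body."""
    # MIME typing by file extension, using Python's built-in type table.
    content_type, _encoding = mimetypes.guess_type(filename)
    if content_type is None:
        content_type = "application/octet-stream"   # assumed fallback
    head = (
        "HTTP/1.1 200 OK\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(body)}\r\n"          # must match the body size
        "\r\n"
    )
    return head.encode("ascii") + body


print(build_response("index.html", b"<p>hi</p>").decode("ascii"))
```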
  7. Sending responses: for a nonpersistent connection, the server closes its end of the connection once the complete message has been sent; for a persistent connection, the Content-Length header must be computed correctly, or the client will not know where the response ends.
  8. Logging: Describing a transaction that has been performed in a log file

Chapter VI Proxies
Web proxy servers are intermediaries on the network. They sit between clients and servers, passing HTTP messages back and forth between the two ends. A web proxy is both server and client: toward the client it acts as a server, accepting request messages and returning response messages; toward the server it acts as a client, sending web request messages and receiving web response messages.
  1. Private and shared proxies: a proxy dedicated to a single client is called a private proxy; a proxy shared among many clients is called a public proxy
      1. Public proxies: most proxies are of this kind; they are easier to administer
      2. Private proxies: not very common
  2. Proxies versus gateways:
      1. A proxy connects two or more applications that speak the same protocol
      2. A gateway connects two or more endpoints that speak different protocols, acting as a "protocol converter"
      3. In practice, proxies often end up doing some protocol conversion too
  3. What proxies are for: improving security, enhancing performance, and saving money. A proxy server can see and touch all the HTTP traffic passing through it, so it can monitor and modify that traffic. Some example uses:
      1. Child filter: filters out content unsuitable for children
      2. Document access controller: implements a uniform access control policy across web servers and web resources and creates an audit trail
      3. Security firewall: restricts which application-level protocols may flow into or out of an organization at a single secure point, and can inspect the traffic to eliminate viruses
      4. Web cache: maintains local copies of popular documents and serves them on demand, reducing slow and costly Internet traffic
      5. Surrogate (reverse proxy): masquerades as a web server, receiving real web server requests and then, unlike a web server, communicating with other servers to locate the requested content on demand. Surrogates can improve the performance of slow web servers for common content, in which case they are known as server accelerators
      6. Content router: directs requests to particular web servers based on Internet traffic conditions and content type
      7. Transcoder: modifies the body format of content before delivering it to the client; this transparent conversion between data representations is called transcoding
      8. Anonymizer: automatically removes identifying characteristics from HTTP messages, providing heightened privacy and anonymity
  4. Where proxy servers are deployed
      1. Egress proxy: fixed at the exit point of a local network to control the traffic flowing between the local network and the Internet
      2. Access (ingress) proxy: placed at an ISP access point to handle the aggregate requests from its customers
      3. Reverse proxy (surrogate): deployed at the edge of the network in front of web servers. It handles the requests sent to the server and asks the server for resources only when necessary, which can improve performance
      4. Network exchange proxy: placed at Internet peering exchange points between networks to relieve congestion at those junctions
  5. Proxy hierarchies
      1. Static hierarchy: each proxy always forwards to the same parent
      2. Dynamic hierarchy: the proxy chooses a parent for each request, based on factors such as:
        1. Load balancing
        2. Geographic proximity routing
        3. Protocol/type routing
        4. Subscription-based routing
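The four parent-selection factors above can be sketched as a single routing function. This is only an illustrative sketch: the host names, the region keys, the file-extension test, and the `choose_parent` function itself are assumptions made for the example, not something taken from the book.

```python
from urllib.parse import urlsplit

# Hypothetical parent proxies and publishers, invented for this sketch.
PARENTS_BY_REGION = {
    "eu": "proxy-eu.example.net:8080",
    "us": "proxy-us.example.net:8080",
}
IMAGE_PARENT = "img-cache.example.net:8080"
FAST_PARENT = "premium-cache.example.net:8080"
SUBSCRIBERS = {"www.paying-publisher.example"}


def choose_parent(url, client_region, loads):
    """Pick the next-hop parent proxy for one request.

    loads maps a parent address to its current load (lower is better).
    """
    parts = urlsplit(url)
    # Subscription-based routing: subscribed publishers get the faster cache.
    if parts.hostname in SUBSCRIBERS:
        return FAST_PARENT
    # Protocol/type routing: image URLs go to a dedicated cache.
    if parts.path.lower().endswith((".png", ".jpg", ".gif")):
        return IMAGE_PARENT
    # Geographic proximity routing: prefer the parent in the client's region.
    if client_region in PARENTS_BY_REGION:
        return PARENTS_BY_REGION[client_region]
    # Load balancing: otherwise fall back to the least loaded parent.
    return min(loads, key=loads.get)
```

The order of the checks is a design choice of this sketch; a real proxy would weigh these factors according to its own configuration.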
  6. How a proxy gets its traffic:
      1. Modify the client: configure the browser's proxy settings manually or automatically
      2. Modify the network: switching and routing devices watch for HTTP traffic, intercept it, and steer it to a proxy without the client's knowledge; such a proxy is called an intercepting proxy
      3. Modify the DNS namespace: a surrogate placed in front of the web server takes over its name and IP address; this is arranged by manually editing the DNS tables or by using DNS servers that determine the appropriate proxy or server
      4. Modify the web server: some web servers can be configured to send an HTTP 305 redirect to the client, redirecting the client's requests to a proxy
  7. Client proxy settings (note: network, DNS, and server configuration are covered in Chapter 20)
      1. Manual configuration: the proxy is set by hand, for example in the Internet Options of IE; only one proxy server can be set
      2. Preconfiguration: the proxy settings are set up in advance, for example with a PAC file (a small JavaScript program that computes the proxy settings); configured in much the same way as item 1
      3. Proxy auto-configuration (PAC): the client is given the URI of a proxy auto-configuration file written in JavaScript, then fetches and runs it to decide whether to use a proxy and which one
      4. WPAD (Web Proxy Autodiscovery Protocol) proxy discovery: automatically detects which configuration server the auto-configuration file should be downloaded from. The protocol's algorithm tries a series of discovery mechanisms, escalating from one to the next, until it finds the right PAC file for the browser. WPAD tries the discovery techniques in this order: DHCP, SLP, DNS well-known hostnames, DNS SRV records, DNS service URIs in TXT records. A client that implements WPAD needs to:
        1. Use WPAD to find the URI of the PAC file
        2. Fetch the PAC file from that URI
        3. Execute the PAC file to determine the proxy server
        4. Use that proxy server for requests
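Steps 3 and 4 above can be illustrated with a Python stand-in for a PAC file. Real PAC files are JavaScript and define `FindProxyForURL(url, host)`, which returns a string such as `PROXY host:port; DIRECT`. The Python function names, the host names, and the routing policy below are assumptions made for this sketch, and the discovery and download steps (1 and 2) are left out because they need a network.

```python
from fnmatch import fnmatch
from urllib.parse import urlsplit


def find_proxy_for_url(url, host):
    """Python stand-in for a PAC file's FindProxyForURL(url, host)."""
    # Hypothetical policy: internal or unqualified hosts go direct.
    if fnmatch(host, "*.intranet.example") or "." not in host:
        return "DIRECT"
    # Hypothetical policy: FTP URLs use a dedicated proxy.
    if url.startswith("ftp:"):
        return "PROXY ftp-proxy.example.net:8080"
    # Everything else: try the proxy first, then fall back to direct.
    return "PROXY proxy.example.net:8080; DIRECT"


def parse_pac_result(result):
    """Split a PAC result string into an ordered list of (kind, address)."""
    choices = []
    for entry in result.split(";"):
        fields = entry.split()
        if not fields:
            continue
        kind = fields[0].upper()
        address = fields[1] if len(fields) > 1 else None
        choices.append((kind, address))
    return choices


def proxies_for(url):
    """Run the PAC logic for one URL and return the choices to try in order."""
    host = urlsplit(url).hostname or ""
    return parse_pac_result(find_proxy_for_url(url, host))
```

A client would walk the returned list in order, so a semicolon-separated result gives it a fallback when the first proxy is unreachable.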
  8. Tricky points about proxy requests
    1. Proxy URIs differ from server URIs: when a client sends a request to a web server, the request line contains only a partial URI (no scheme, host, or port), but when it sends a request to a proxy, the request line contains the full URI
        1. This stems from the original HTTP design: clients talked directly to a single server, virtual hosting did not exist, there were no provisions for proxies, and a single server knew its own hostname and port
        2. So a partial URI is sent to servers and a full URI to proxies: a client with no proxy configured sends a partial URI, and a client with a proxy configured sends the full URI
    2. The same problem as virtual hosting: the scheme/host/port information is missing
        1. Explicit proxies solve it by requiring the full URI in the request message
        2. Virtually hosted web servers require a Host header that carries the host and port information
    3. Intercepting proxies receive partial URIs: the client does not always know that it is talking to a proxy, because some proxies are invisible to it. Client traffic may pass through a surrogate or an intercepting proxy, and in neither case does the client send the full URI
        1. Surrogate (reverse proxy): it takes over the origin server's hostname or IP address, so the client cannot tell it apart from the web server and sends a partial URI
        2. Intercepting proxy: it intercepts requests sent from the client to the server and forwards them, so it receives the partial URI that was meant for the web server
    4. A proxy should be able to handle both proxy requests and server requests
        1. If a full URI is provided, the proxy should use it
        2. If a partial URI is provided along with a Host header, the proxy should use the Host header to determine the name and port number of the origin server
        3. If a partial URI is provided and there is no Host header, the origin server has to be determined some other way
    5. Modification of URIs during forwarding: ...
    6. Client auto-expansion and hostname resolution of URIs: users often leave off the "www." prefix or the ".com" suffix, and the browser expands the name automatically
    7. URI resolution without a proxy: the user enters 'aaa' -> DNS lookup of host 'aaa' fails -> the browser automatically expands it to www.aaa.com -> DNS lookup of host 'www.aaa.com' returns IP addresses -> the browser tries to connect until a connection is established
    8. URI resolution with an explicit proxy: the user enters 'aaa' -> the browser uses DNS to look up the proxy server's address -> it gets the proxy's IP address -> the browser tries to connect to the proxy until it succeeds and sends the name as typed; the browser itself no longer auto-expands it to www.aaa.com, so any expansion is left to the proxy
    9. URI resolution with an intercepting proxy: the client is not configured to use a proxy, similar to H
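The rules in item 4 above (use the full URI, otherwise the Host header, otherwise some other source) can be sketched as a small function. The function name `origin_server`, the lower-case header dictionary, and the `fallback` parameter are assumptions made for this example; `fallback` stands for whatever other information the proxy has, such as a surrogate's configured server address.

```python
from urllib.parse import urlsplit


def origin_server(request_target, headers, fallback=None):
    """Return (host, port) of the origin server for one request.

    request_target: the URI from the request line, full or partial.
    headers: dict of request headers with lower-case names.
    fallback: (host, port) known some other way, or None.
    """
    parts = urlsplit(request_target)
    # Rule 1: a full URI carries scheme, host, and port itself.
    if parts.scheme and parts.hostname:
        default_port = 443 if parts.scheme == "https" else 80
        return parts.hostname, parts.port or default_port
    # Rule 2: partial URI, so take host and port from the Host header.
    host_header = headers.get("host")
    if host_header:
        host_parts = urlsplit("//" + host_header)
        return host_parts.hostname, host_parts.port or 80
    # Rule 3: partial URI and no Host header, so use other knowledge.
    if fallback is not None:
        return fallback
    raise ValueError("cannot determine the origin server")
```

For example, `origin_server("/index.html", {"host": "www.example.com"})` resolves through the Host header, while a request line of `http://www.example.com:8080/index.html` is resolved from the URI alone.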
  9. Tracing messages
      1. Via header: lists information about each intermediate node on the message's path, including the protocol name (optional), protocol version, node name, and a comment (optional)
      2. Via request and response paths: the response normally travels back along the request path, so its Via header lists the nodes in the reverse order
      3. Via and gateways: the Via header records the protocol conversions performed by gateways
      4. Server and Via headers: the Server header describes the origin server's software; a proxy should not modify it and should add a Via entry instead
      5. Via privacy and security: the Via string should avoid exposing exact hostnames and port numbers, which could be exploited maliciously; hostnames can be replaced by pseudonyms and consecutive entries can be combined
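As a small illustration of how a Via header grows along the path, the sketch below appends one entry per hop. The function name `add_via`, the node names, and the comment text are hypothetical; the node name is a pseudonym rather than a real hostname, in line with the privacy point above.

```python
def add_via(headers, protocol_version, node_name, comment=None):
    """Return a copy of headers with this node appended to the Via header.

    Each entry has the form '<version> <node-name> (<optional comment>)'.
    """
    entry = f"{protocol_version} {node_name}"
    if comment:
        entry += f" ({comment})"
    updated = dict(headers)
    existing = updated.get("Via")
    # Entries are comma-separated and kept in the order the hops were visited.
    updated["Via"] = f"{existing}, {entry}" if existing else entry
    return updated


# Two hops: the second one also adds a comment.
hop1 = add_via({}, "1.1", "proxy-a")
hop2 = add_via(hop1, "1.0", "cache-b", "ExampleCache")
```

After the second hop, the header value is `1.1 proxy-a, 1.0 cache-b (ExampleCache)`.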
  10. TRACE method: lets the user follow a request message through the chain of proxies, see which proxies it passed through, and see how each proxy modified the request. The Max-Forwards header limits the number of forwards: each hop that forwards the message reduces the value by one, and a hop that receives the message with Max-Forwards equal to 0 sends the TRACE message back to the client instead of forwarding it further.
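The Max-Forwards rule can be sketched as the decision a single hop makes when a TRACE request arrives. The function name `handle_trace` and the `("forward" | "reflect", headers)` return shape are assumptions made for this example.

```python
def handle_trace(headers):
    """Decide what one hop does with an incoming TRACE request.

    Returns ("reflect", headers) when the message must be sent back to the
    client, or ("forward", new_headers) when it should go to the next hop.
    """
    value = headers.get("Max-Forwards")
    if value is None:
        # No limit was set, so the request simply continues.
        return "forward", dict(headers)
    remaining = int(value)
    if remaining == 0:
        # The limit is used up: reply to the client, do not forward.
        return "reflect", dict(headers)
    forwarded = dict(headers)
    # Forwarding consumes one hop from the budget.
    forwarded["Max-Forwards"] = str(remaining - 1)
    return "forward", forwarded


# A request sent with Max-Forwards: 2 is forwarded by two hops
# and reflected by the third.
step1 = handle_trace({"Max-Forwards": "2"})
step2 = handle_trace(step1[1])
step3 = handle_trace(step2[1])
```

Setting a small Max-Forwards value is therefore a way to probe one specific hop in the chain.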
  11. Proxy authentication: blocks requests for content until the user provides valid access credentials to the proxy
      1. When a request for restricted content reaches the proxy server, the proxy returns a 407 (Proxy Authentication Required) status code together with a Proxy-Authenticate header field that describes how to provide the credentials
      2. When the client receives the 407, it collects the required credentials from a local database or by prompting the user
      3. Once it has the credentials, the client resends the request and supplies them in a Proxy-Authorization header field
      4. If the credentials are valid, the proxy passes the request along; otherwise it sends another 407 reply
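The client side of this exchange can be sketched for the Basic scheme, where the credentials are sent as base64 of `username:password`. The function names and the sample realm are assumptions made for this example, and other schemes are deliberately not handled.

```python
import base64


def proxy_authorization(username, password):
    """Build a Proxy-Authorization value for the Basic scheme."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def retry_after_407(status, response_headers, request_headers, credentials):
    """Return the headers for the retried request, or None if no retry applies.

    credentials is a (username, password) pair obtained from a local store
    or by prompting the user.
    """
    if status != 407:
        return None
    challenge = response_headers.get("Proxy-Authenticate", "")
    if not challenge.lower().startswith("basic"):
        # This sketch only knows how to answer a Basic challenge.
        return None
    retried = dict(request_headers)
    retried["Proxy-Authorization"] = proxy_authorization(*credentials)
    return retried
```

Basic credentials are only encoded, not encrypted, so this scheme is reasonable only over a connection that is already protected.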
  12. Proxy interoperability
      1. Handling unsupported headers and methods
      2. OPTIONS method: discovers which optional features are supported, so a client can determine a server's capabilities before interacting with it
      3. Allow header: lists the methods supported by the resource identified by the request URI, for example Allow: GET, HEAD, PUT; a proxy cannot modify the Allow header field
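A client that has sent an OPTIONS request can read the Allow header to check a method before using it. The sketch below only parses the header value; the function names and the three-way result (True, False, or None when the header is absent) are assumptions made for this example.

```python
def allowed_methods(allow_header):
    """Parse an Allow header value such as 'GET, HEAD, PUT' into a set."""
    return {m.strip() for m in allow_header.split(",") if m.strip()}


def can_use(method, options_response_headers):
    """Check a method against the Allow header of an OPTIONS response.

    Returns None when the response has no Allow header, because the
    answer is then unknown rather than negative.
    """
    allow = options_response_headers.get("Allow")
    if allow is None:
        return None
    return method in allowed_methods(allow)
```

For the example header from the notes, `can_use("PUT", {"Allow": "GET, HEAD, PUT"})` is true and `can_use("DELETE", ...)` is false.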
