The whole process of accessing a Web page

Source: Internet
Author: User

Introduction

Open a browser, type a URL into the address bar, press Enter, and the page appears. What happens during this process, and how does it work? This article collects and summarizes the answer.

The whole process can be summarized in four stages:

    1. Resolve the domain name to an IP address;
    2. Establish a TCP connection to the destination host (three-way handshake);
    3. Send and receive data (the browser and the destination host carry out the HTTP exchange);
    4. Close the TCP connection to the destination host (four-way handshake, often called the "four waves");
Body

Each stage is described in detail below.

1. Domain name resolved to IP address

There are two ways to reach the destination host:

① Using its IP address directly. Because an IP address is a string of numbers that is hard to remember, human-readable domain names were introduced as identifiers.

② Using its domain name. Domain name resolution is the process of converting a domain name into an IP address, and it is performed by a DNS server.
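Before any DNS traffic is sent, the host normally checks a local cache of earlier answers (step 17 of the list below and the summary at the end both mention this cache). The cache-then-resolve behaviour can be sketched in Python as follows. This is a minimal illustration, not how any particular browser or operating system works: `CachingResolver` and `system_resolve` are hypothetical names, the operating system's resolver is reached through the standard `socket.getaddrinfo` call, and a fixed 300-second lifetime is an assumption standing in for the per-record TTL that real DNS answers carry.

```python
import socket
import time
from typing import Callable, Dict, List, Tuple

def system_resolve(name: str) -> List[str]:
    """Ask the operating system's resolver for the IPv4 addresses of `name`."""
    infos = socket.getaddrinfo(name, None, family=socket.AF_INET, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP string.
    return sorted({info[4][0] for info in infos})

class CachingResolver:
    """Answer from a local name -> IP cache when possible; otherwise ask the resolver."""

    def __init__(self, resolve: Callable[[str], List[str]] = system_resolve,
                 ttl: float = 300.0) -> None:
        self._resolve = resolve
        self._ttl = ttl  # assumed fixed lifetime; real DNS records carry their own TTL
        self._cache: Dict[str, Tuple[float, List[str]]] = {}

    def lookup(self, name: str) -> List[str]:
        now = time.monotonic()
        entry = self._cache.get(name)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]                 # cache hit: no DNS traffic needed
        addresses = self._resolve(name)     # cache miss: resolve and remember the answer
        self._cache[name] = (now, addresses)
        return addresses
```

On a cache hit `lookup` returns immediately; only a miss (or an expired entry) triggers a real lookup. Passing a different `resolve` function makes it easy to swap in a fake resolver for testing.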

DNS queries are normally sent over UDP (port 53). The whole resolution process is as follows:

    1. If the name is not already in the local DNS cache, the browser hands a DNS request to the local DNS module, which builds a DNS query message;
    2. The DNS module passes the query message to the UDP protocol unit of the transport layer;
    3. The UDP protocol unit encapsulates the message in a UDP datagram and passes it to the IP protocol unit of the network layer;
    4. The IP protocol unit encapsulates the datagram in an IP packet whose destination address is the IP address of the DNS server;
    5. The IP packet is handed to the protocol unit of the data link layer;
    6. Before sending, the host looks up the MAC address of the next hop (the DNS server itself if it is on the local network, otherwise the gateway) in its ARP cache. If there is no entry, it broadcasts an ARP request containing the IP address being queried; every host that receives the broadcast compares that address with its own IP, and the matching host replies to the sender with an ARP response containing its own MAC address. The sender waits for this response;
    7. When the ARP response arrives, the IP-to-MAC mapping of the next hop is written to the ARP cache table;
    8. The next hop's MAC address is filled in as the destination MAC address, and the packet is sent as a data frame;
    9. The frame may be forwarded several times along the way;
    10. The DNS request arrives at the data link layer protocol unit of the DNS server;
    11. The DNS server's data link layer unit unpacks the data frame and passes the IP packet inside it to the network layer IP protocol unit;
    12. The DNS server's IP protocol unit unpacks the IP packet and passes the UDP datagram inside it to the transport layer UDP protocol unit;
    13. The DNS server's UDP protocol unit unpacks the UDP datagram and passes the DNS message inside it to the DNS service unit;
    14. The DNS service unit resolves the domain name to the corresponding IP address and generates a DNS response message;
    15. The DNS response travels back to my host through the same layers: DNS -> UDP -> IP -> MAC (data frame);
    16. My host receives the data frame and unpacks it (MAC -> IP -> UDP), and the resolution result is handed to the browser;
    17. The result of the resolution is written to the DNS cache table as a domain name / IP address pair.
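To make steps 1 to 4 more concrete, the following Python sketch builds the DNS query message that the DNS module hands to UDP, using the message layout defined in RFC 1035, and sends it as a single UDP datagram to port 53. The function names are hypothetical, the DNS server address is up to the caller, and a real resolver would also parse the answer and handle truncated replies and retries.

```python
import socket
import struct

def build_dns_query(name: str, query_id: int, qtype: int = 1) -> bytes:
    """Build a minimal DNS query (RFC 1035 layout); qtype 1 asks for an A (IPv4) record."""
    # Header: ID, flags (0x0100 = RD bit, "recursion desired"), QDCOUNT=1, AN/NS/ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length byte, and the name ends with a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE followed by QCLASS 1 (IN, the Internet class)
    return header + qname + struct.pack("!HH", qtype, 1)

def send_dns_query(query: bytes, server_ip: str, timeout: float = 2.0) -> bytes:
    """Send the query in one UDP datagram to port 53 and return the raw reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(query, (server_ip, 53))
        reply, _ = sock.recvfrom(512)  # plain UDP DNS messages are limited to 512 bytes
        return reply
```

For example, `send_dns_query(build_dns_query("www.example.com", 0x1234), "192.0.2.53")` would send the query to a DNS server at 192.0.2.53, which is only a placeholder documentation address; substitute a real server. Everything below the UDP socket (IP encapsulation, ARP, framing) is handled by the operating system.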

2. TCP connection to the destination host (three-way handshake)

My host sends a TCP connection request segment to the destination host:

    1. The SYN flag bit in this TCP segment is set to 1, indicating a connection request;
    2. The segment is encapsulated in an IP packet (addressed to the IP obtained through DNS) and a data frame (addressed to the next-hop MAC obtained through ARP), and forwarded to the destination host;
    3. The destination host receives the data frame and passes it up through IP to TCP; its TCP protocol unit replies with a response segment;
    4. The SYN and ACK flag bits in this reply are both set to 1, indicating that the connection request is accepted;
    5. The reply travels back to my host the same way (IP, then MAC via ARP);
    6. My host receives the data frame and passes it up through IP to TCP; its TCP protocol unit sends an acknowledgement segment with the ACK flag set to 1;
    7. The acknowledgement is forwarded to the destination host via IP and MAC in the same way;
    8. The destination host receives the data frame and passes it up through IP to TCP, and the connection is established.
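The exchange of flags above can be modelled as a small Python simulation. It is a sketch rather than a TCP implementation: in practice the operating system kernel performs the handshake when a program calls `connect()`. The sequence-number arithmetic (a SYN consumes one sequence number, so it is acknowledged with ISN + 1) and the state names come from the TCP specification rather than from the text above; the `Segment` class and `three_way_handshake` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Segment:
    flags: str  # e.g. "SYN", "SYN+ACK", "ACK"
    seq: int    # sequence number carried by the segment
    ack: int    # acknowledgement number (0 when the ACK flag is not set)

def three_way_handshake(client_isn: int, server_isn: int) -> Tuple[List[Segment], str, str]:
    """Simulate the handshake; return the segments exchanged and both final states."""
    client_state, server_state = "CLOSED", "LISTEN"
    segments: List[Segment] = []

    # 1. Client -> server: SYN with the client's initial sequence number (ISN)
    segments.append(Segment("SYN", seq=client_isn, ack=0))
    client_state = "SYN_SENT"

    # 2. Server -> client: SYN+ACK, acknowledging client ISN + 1
    server_state = "SYN_RCVD"
    segments.append(Segment("SYN+ACK", seq=server_isn, ack=client_isn + 1))

    # 3. Client -> server: ACK, acknowledging server ISN + 1
    client_state = "ESTABLISHED"
    segments.append(Segment("ACK", seq=client_isn + 1, ack=server_isn + 1))
    server_state = "ESTABLISHED"  # set once the final ACK arrives

    return segments, client_state, server_state
```

Calling `three_way_handshake(1000, 5000)` yields a SYN with seq 1000, a SYN+ACK with seq 5000 and ack 1001, and an ACK with seq 1001 and ack 5001, after which both sides are ESTABLISHED.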

3. Send and receive data (the browser and the destination host carry out the HTTP exchange)

Data can be transmitted only after the connection has been established.

    1. The browser sends an HTTP request using the GET method to the destination host;
    2. The GET request is passed down through TCP -> IP (address from DNS) -> MAC (address from ARP) and forwarded through the gateway to the host;
    3. The destination host receives the data frame and passes it up through IP -> TCP -> HTTP. The HTTP protocol unit then builds the HTTP response:
        • it reads from the request the host name the client wants to access;
        • it reads the web application the client wants to access (a web application, or web app, is a program that serves content to browsers);
        • it reads the web resource the client wants to access (web resources are files such as pages, images, videos and text);
        • it reads that resource from the corresponding web application on that host;
        • it creates an HTTP response containing the resource data, for example an HTML page;
    4. The HTML data travels back to my host via TCP -> IP -> MAC;
    5. My host receives the data frames, passes them up through IP -> TCP -> HTTP, and the browser renders the HTML content as a web page.
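Step 3 says the server extracts the host, the web application and the resource from the request. A minimal Python sketch of that parsing is shown below. The rule that the first path segment names the web application is an assumption for illustration only (some servers map a path prefix to an application this way, but real servers are configurable), and `parse_http_request` is a hypothetical helper that ignores details such as chunked bodies.

```python
from typing import Dict

def parse_http_request(raw: bytes) -> Dict[str, object]:
    """Split a raw HTTP/1.1 request into the pieces a server uses to locate a resource."""
    head, _, body = raw.partition(b"\r\n\r\n")        # headers end at the first blank line
    lines = head.decode("iso-8859-1").split("\r\n")
    method, target, version = lines[0].split(" ", 2)  # the request line
    headers: Dict[str, str] = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive

    # Hypothetical mapping: first path segment = web app, remainder = web resource
    path = target.split("?", 1)[0]
    parts = path.lstrip("/").split("/", 1)
    return {
        "method": method,
        "version": version,
        "host": headers.get("host", ""),
        "web_app": parts[0],
        "resource": parts[1] if len(parts) > 1 else "",
        "headers": headers,
        "body": body,
    }
```

For a request line `GET /shop/images/logo.gif HTTP/1.1` with `Host: www.example.com`, this sketch reports host `www.example.com`, web app `shop` and resource `images/logo.gif`.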
HTTP protocol

HTTP request: an HTTP request consists of three parts: the request line, the message headers, and the request body.

The request line begins with the method, followed by the Request-URI and the protocol version, separated by single spaces, in this format: Method Request-URI HTTP-Version CRLF
Here Method is the request method, Request-URI is a Uniform Resource Identifier, HTTP-Version is the HTTP version used by the request, and CRLF is a carriage return followed by a line feed (no bare CR or LF character may appear except in the final CRLF).
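The request format can be illustrated with a small Python function that assembles the request line, the headers, the blank line and the body. `build_http_request` is a hypothetical helper; it always uses HTTP/1.1 and adds a Content-Length header only when a body is present, which is one simple way (not the only one) for the server to know where the body ends.

```python
from typing import Dict

def build_http_request(method: str, uri: str, headers: Dict[str, str], body: bytes = b"") -> bytes:
    """Assemble 'Method SP Request-URI SP HTTP-Version CRLF', headers, a blank line, and the body."""
    CRLF = "\r\n"
    # Method names are case-sensitive and uppercase; upper() is only a convenience here
    request_line = f"{method.upper()} {uri} HTTP/1.1{CRLF}"
    header_lines = "".join(f"{name}: {value}{CRLF}" for name, value in headers.items())
    if body:
        header_lines += f"Content-Length: {len(body)}{CRLF}"  # tells the server where the body ends
    # The empty line (a lone CRLF) separates the headers from the body
    return (request_line + header_lines + CRLF).encode("ascii") + body
```

For example, `build_http_request("GET", "/index.html", {"Host": "www.guet.edu.cn"})` produces exactly `GET /index.html HTTP/1.1`, `Host: www.guet.edu.cn` and an empty line, each terminated by CRLF.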

There are several request methods (all uppercase); each is described below:

    • GET: requests the resource identified by the Request-URI
    • POST: submits data to be appended to, or processed by, the resource identified by the Request-URI
    • HEAD: requests only the response headers for the resource identified by the Request-URI
    • PUT: asks the server to store a resource under the Request-URI
    • DELETE: asks the server to delete the resource identified by the Request-URI
    • TRACE: asks the server to echo back the request it received, mainly for testing or diagnostics
    • CONNECT: reserved for use with a proxy that can switch to being a tunnel (for example, for HTTPS)
    • OPTIONS: queries the capabilities of the server, or the options and requirements associated with a resource

The HTTP response also consists of three parts: the status line, the message headers, and the response body.
The status line has the following format: HTTP-Version Status-Code Reason-Phrase CRLF
Here HTTP-Version is the HTTP version used by the server, Status-Code is the three-digit response status code returned by the server, and Reason-Phrase is a short textual description of the status code.
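Splitting a status line into its three fields is straightforward. The Python sketch below assumes single spaces between the fields, as the format above specifies, and keeps any spaces inside the reason phrase intact; `parse_status_line` is a hypothetical name.

```python
from typing import Tuple

def parse_status_line(line: str) -> Tuple[str, int, str]:
    """Split 'HTTP-Version SP Status-Code SP Reason-Phrase CRLF' into its three fields."""
    line = line.rstrip("\r\n")
    version, code, *rest = line.split(" ", 2)  # at most 3 pieces; the phrase may contain spaces
    if not version.startswith("HTTP/") or len(code) != 3 or not code.isdigit():
        raise ValueError(f"malformed status line: {line!r}")
    reason = rest[0] if rest else ""           # the reason phrase may be empty
    return version, int(code), reason
```

For instance, `parse_status_line("HTTP/1.1 503 Service Unavailable\r\n")` returns `("HTTP/1.1", 503, "Service Unavailable")`.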

The status code consists of three digits. The first digit defines the class of the response, and there are five possible values:

    • 1XX (Informational): the request has been received and processing continues
    • 2XX (Success): the request was successfully received, understood, and accepted
    • 3XX (Redirection): further action is needed to complete the request
    • 4XX (Client Error): the request has bad syntax or cannot be fulfilled
    • 5XX (Server Error): the server failed to fulfill an apparently valid request

Common status codes and their meanings:

    • 200 OK: the client request succeeded
    • 400 Bad Request: the request has a syntax error and cannot be understood by the server
    • 401 Unauthorized: the request is not authorized; this status code must be sent together with the WWW-Authenticate header field
    • 403 Forbidden: the server received the request but refuses to fulfill it
    • 404 Not Found: the requested resource does not exist, e.g. because a wrong URL was entered
    • 500 Internal Server Error: the server encountered an unexpected error
    • 503 Service Unavailable: the server cannot handle the request right now and may recover after some time
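Python's standard library already lists the registered status codes and their reason phrases in `http.HTTPStatus`, and the class of a response follows from the first digit. The `describe_status` helper and its class labels below are hypothetical and only meant to illustrate both ideas.

```python
from http import HTTPStatus

# Class labels keyed by the first digit of the status code (illustrative wording)
CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}

def describe_status(code: int) -> str:
    """Return '<code> <reason phrase> (<class>)' using the registry in http.HTTPStatus."""
    category = CLASSES.get(code // 100, "Unknown")  # the first digit selects the class
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:                               # code not known to HTTPStatus
        phrase = "Unknown"
    return f"{code} {phrase} ({category})"
```

For example, `describe_status(404)` returns `"404 Not Found (Client Error)"`.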

Example: HTTP/1.1 200 OK (CRLF)

Message header:


Common request headers
Accept
The Accept request header field specifies which types of content the client accepts. E.g. Accept: image/gif means the client wants a resource in GIF image format; Accept: text/html means the client wants HTML text.
Accept-Charset
The Accept-Charset request header field specifies the character sets the client accepts. E.g. Accept-Charset: iso-8859-1,gb2312. If the field is absent from the request, the client is assumed to accept any character set.
Accept-Encoding
The Accept-Encoding request header field is similar to Accept, but it specifies acceptable content encodings. E.g. Accept-Encoding: gzip, deflate. If the field is absent from the request, the server assumes that the client accepts any content encoding.
Accept-Language
The Accept-Language request header field is similar to Accept, but it specifies natural languages. E.g. Accept-Language: zh-cn. If the field is absent from the request, the server assumes that the client accepts any language.
Authorization
The Authorization request header field is used to prove that the client is entitled to view a resource. When a browser receives a 401 (Unauthorized) response while accessing a page, it can send a new request containing the Authorization header field so that the server can validate it.
Host (this header field is required when sending a request)
The Host request header field specifies the Internet host and port number of the requested resource and is usually extracted from the HTTP URL. E.g. if we enter http://www.guet.edu.cn/index.html in the browser, the request message sent by the browser contains the following Host header field:
Host: www.guet.edu.cn
This uses the default port number 80. If a port number is specified, the field becomes Host: www.guet.edu.cn:<port>
User-Agent
The User-Agent request header field lets the client tell the server about its operating system, browser, and other properties. This header field is not required.
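The rule for the Host header can be sketched in Python with the standard `urllib.parse.urlsplit` function: the port is appended only when it differs from the scheme's default. `host_header` and the `DEFAULT_PORTS` table are hypothetical, and the sketch does not handle IPv6 literal addresses, which would need square brackets.

```python
from urllib.parse import urlsplit

# Default ports for the schemes handled here (assumption: only http and https)
DEFAULT_PORTS = {"http": 80, "https": 443}

def host_header(url: str) -> str:
    """Derive the Host header value from a URL, omitting the port when it is the default."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname is lower-cased by urlsplit
    port = parts.port            # None when the URL has no explicit port
    if port is None or port == DEFAULT_PORTS.get(parts.scheme):
        return host
    return f"{host}:{port}"
```

With the URL from the example above, `host_header("http://www.guet.edu.cn/index.html")` gives `www.guet.edu.cn`, while `http://www.guet.edu.cn:8080/index.html` gives `www.guet.edu.cn:8080`.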

Common response headers
Location
The Location response header field redirects the recipient to a new location. It is commonly used when a site changes its domain name.
Server
The Server response header field contains information about the software the server uses to handle the request; it is the counterpart of the User-Agent request header field. An example of the Server response header field:
Server: Apache-Coyote/1.1
WWW-Authenticate
The WWW-Authenticate response header field must be included in every 401 (Unauthorized) response. It tells the client which authentication scheme the server expects; the client can then send a new request containing an Authorization header field for the server to validate.
E.g. WWW-Authenticate: Basic realm="Basic Auth test!" shows that the server uses the Basic authentication scheme for the requested resource.
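For the Basic scheme in the example above, the client answers the challenge by sending `Authorization: Basic <token>`, where the token is the Base64 encoding of `user:password` (RFC 7617). The Python sketch below parses a simple challenge and builds that header value. Both functions are hypothetical helpers, the parser handles only `name="value"` parameters rather than the full challenge grammar, and because Base64 is an encoding rather than encryption, Basic credentials should only be sent over HTTPS.

```python
import base64
import re
from typing import Dict, Tuple

def parse_www_authenticate(value: str) -> Tuple[str, Dict[str, str]]:
    """Split a simple challenge such as 'Basic realm="..."' into scheme and parameters."""
    scheme, _, params = value.partition(" ")
    # Only name="quoted value" pairs are recognised; the full grammar is more complex
    pairs = dict(re.findall(r'(\w+)\s*=\s*"([^"]*)"', params))
    return scheme, pairs

def basic_authorization(user: str, password: str) -> str:
    """Build the Authorization header value for the Basic scheme."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"
```

For the challenge in the text, `parse_www_authenticate('Basic realm="Basic Auth test!"')` returns `("Basic", {"realm": "Basic Auth test!"})`.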

For a more detailed description of the HTTP protocol, see: http://www.cnblogs.com/li0803/archive/2008/11/03/1324746.html

4. Disconnect the TCP connection to the destination host (four-way handshake, or "four waves")

The TCP connection release process:

    1. My host (on behalf of the browser) sends a TCP connection release request to the destination host and enters the FIN_WAIT_1 state;
    2. The FIN flag bit of this segment is set to 1, indicating a release request;
    3. The release request is forwarded via IP (address from DNS) -> MAC (address from ARP) through the gateway to the host;
    4. The destination host receives the data frame and passes it up through IP to TCP; its TCP protocol unit replies with an acknowledgement;
    5. At this point the destination host only acknowledges: it may still have data to send, so it does not close the connection yet;
    6. The ACK flag bit of this reply is set to 1, indicating that the release request was received (on receiving it, my host moves to FIN_WAIT_2);
    7. Once it has finished sending all its remaining data, the destination host sends its own TCP connection release request to my host;
    8. The FIN flag bit of this segment is set to 1, indicating a release request;
    9. The release request is forwarded via IP -> MAC to my host;
    10. My host receives the data frame and passes it up through IP to TCP; its TCP protocol unit replies with an acknowledgement and enters the TIME_WAIT state. Because the network is not assumed to be reliable, my host waits: if the destination host does not receive the acknowledgement, it resends its FIN, and my host can acknowledge it again;
    11. The ACK flag bit of this reply is set to 1, acknowledging the release request;
    12. The acknowledgement is forwarded via IP -> MAC through the gateway to the destination host;
    13. The destination host closes the connection;
    14. When the TIME_WAIT period (twice the maximum segment lifetime, 2MSL) ends without any retransmitted FIN arriving, my host concludes that the destination host has closed normally, and it closes the connection as well.
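The release sequence, including the reason for TIME_WAIT, can be traced with a small Python simulation. It is a sketch, not a TCP implementation: the state names come from the TCP specification, `four_way_close` is hypothetical, and its optional flag simulates losing the final ACK so that the retransmitted FIN described in step 10 shows up in the log.

```python
from typing import List, Tuple

def four_way_close(lose_final_ack: bool = False) -> Tuple[List[str], List[str], List[str]]:
    """Trace an active close started by my host (client); return the log and both state histories."""
    log: List[str] = []
    client: List[str] = ["ESTABLISHED"]
    server: List[str] = ["ESTABLISHED"]

    log.append("client -> server: FIN")  # client has no more data to send
    client.append("FIN_WAIT_1")

    log.append("server -> client: ACK")  # server acknowledges but may still send data
    server.append("CLOSE_WAIT")
    client.append("FIN_WAIT_2")

    log.append("server -> client: FIN")  # server has finished sending
    server.append("LAST_ACK")

    log.append("client -> server: ACK")  # final acknowledgement
    client.append("TIME_WAIT")           # client waits 2*MSL before closing
    if lose_final_ack:
        # The ACK is lost, so the server (still in LAST_ACK) retransmits its FIN;
        # the client is still in TIME_WAIT and can acknowledge it again.
        log.append("server -> client: FIN (retransmitted)")
        log.append("client -> server: ACK")
    server.append("CLOSED")              # server closes once an ACK arrives
    client.append("CLOSED")              # client closes when the TIME_WAIT timer expires
    return log, client, server
```

Without the flag the log shows the four segments in the order described above; with `lose_final_ack=True` it also shows the retransmitted FIN and the second ACK, which the client can only send because it is still waiting in TIME_WAIT.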

Summary:

The network transmission involved in visiting a website by URL can be summarized as follows:

First, find the IP address for the domain name, querying a DNS server if the address is not in the cache. With the IP address, perform the three-way handshake with the destination host to establish a TCP connection. Once the connection is established, carry out the HTTP exchange to transmit and receive the web content. When the transfer is finished, perform the four-way handshake with the destination host to close the TCP connection.
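The four stages can be seen together in a short Python sketch that uses only the standard `socket` module. `fetch` is a hypothetical helper for plain HTTP (no HTTPS, redirects or chunked decoding); name resolution goes through the operating system's resolver, and the kernel performs the TCP handshakes when the socket connects and closes. Because the request asks for `Connection: close`, the server usually closes first in this example, so here it is the server rather than the browser that starts the four-way close.

```python
import socket

def fetch(host: str, port: int = 80, path: str = "/") -> bytes:
    """Fetch one resource over plain HTTP and return the raw response bytes."""
    # Stage 1: resolve the name to an address (the OS resolver may use its cache or DNS)
    family, socktype, proto, _, address = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM)[0]

    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(5)
        # Stage 2: connect() makes the kernel perform the three-way handshake
        sock.connect(address)

        # Stage 3: send a GET request and read until the peer closes the connection
        host_field = host if port == 80 else f"{host}:{port}"
        request = (f"GET {path} HTTP/1.1\r\n"
                   f"Host: {host_field}\r\n"
                   "Connection: close\r\n"
                   "\r\n")
        sock.sendall(request.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # empty read: the peer has sent its FIN
                break
            chunks.append(data)
    # Stage 4: leaving the with-block closes our side of the connection
    return b"".join(chunks)
```

For example, `fetch("example.com")` would return the raw status line, headers and body as one byte string, which a browser would then parse and render.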
