B/S architecture
B/S (browser/server) is the most common architecture for web applications.
Advantages:
1. The client is a uniform browser (uniform mainly in the sense that all browsers follow the same standards), so no special client software or network configuration is needed.
2. The server side is based on the uniform HTTP protocol. Many servers speak this protocol, such as Tomcat, Nginx, and JBoss, and they can be used directly.
The complete request process
The most important feature of HTTP is that it is a stateless, short-connection protocol: one request usually completes one data exchange, which usually corresponds to one piece of business logic, and then the connection is closed.
This is done so that the server can serve more users at the same time and is not blocked by one user holding a connection exclusively.
The following focuses on the entire process that occurs after the user enters the URL in the browser.
1. The user enters a URL, such as www.google.com.
2. A DNS server resolves the domain name to an IP address.
3. The client initiates a request to the server based on the IP address.
4. The server side may have a load balancer that distributes user requests evenly. After a request reaches a server, business logic handles it. The requested data may be in one of three places: a distributed cache system, a file system, or a database. The data is fetched from there and returned to the browser.
5. After the data is returned, the browser parses it and discovers static resources (CSS, JS, or images), and then issues further requests for them. These static resources are most likely on a CDN, and if so, a CDN server handles those requests.
6. Finally, the user sees the complete page in the browser.
The above is the approximate process of requesting data.
No matter how the architecture changes, there are three principles that always remain the same:
1) All resources on the Internet are represented by a URL.
2) You must interact with the server based on HTTP.
3) The data display must be done in the browser.
How to initiate a request
Initiating a request essentially means establishing a socket connection, although a somewhat special one.
Before establishing the socket, the browser must resolve the domain name in the URL entered in the address bar to an IP address. It then opens a socket connection to the remote server using this IP address and the default port 80, assembles a GET-type HTTP request header from the URL, and sends it to the target server via OutputStream.write. The returned data is then read via InputStream.read, and finally the connection is closed.
Port number: 80
So you can fully simulate a browser when initiating an HTTP request. There is a dedicated open-source toolkit for handling HTTP requests called HttpClient.
On Linux, you can issue an HTTP request simply with curl followed by the URL, for example: curl "www.google.com"
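The steps described above (resolve the host, connect to port 80, write a GET request, read until the server closes) can be sketched in Python. This is only an illustration under assumptions: build_get_request and send_request are hypothetical helper names, and the minimal header set (Host and Connection) is chosen for brevity rather than taken from what any real browser sends.

```python
import socket
from urllib.parse import urlsplit


def build_get_request(url):
    """Return (host, port, request_bytes) for a plain-HTTP GET of `url`."""
    if "://" not in url:
        url = "http://" + url          # a bare "www.google.com" has no scheme
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port or 80            # 80 is the default HTTP port
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    host_header = host if port == 80 else f"{host}:{port}"
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host_header}",
        "Connection: close",           # ask the server to close after replying
        "",
        "",                            # blank line terminates the header block
    ]
    return host, port, "\r\n".join(lines).encode("ascii")


def send_request(url, timeout=5.0):
    """Open a socket, write the request, and read until the server closes."""
    host, port, request = build_get_request(url)
    # create_connection resolves the host name and connects in one step.
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(request)
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:               # empty read: the server closed the socket
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling send_request("www.google.com") needs network access; build_get_request can be inspected offline to see exactly which bytes would be written to the socket.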
Parsing of requests
The most important part of the HTTP request is the HTTP header.
HTTP request headers:
Accept-Charset: specifies the character sets the client accepts
Accept-Encoding: specifies the acceptable content encodings
Accept-Language: specifies the acceptable natural languages, such as zh-cn
Host: specifies the Internet host and port number of the requested resource
User-Agent: the client tells the server its operating system, browser, and other properties
Connection: whether the current connection is kept open
HTTP response headers:
Server: the name of the server software in use
Content-Type: the media type of the entity body sent to the recipient, such as Content-Type: text/html; charset=GBK
Content-Encoding: tells the browser which compression encoding the server used
Content-Language: the natural language used by the resource
Content-Length: the length of the entity body
Keep-Alive: how long the connection is kept open
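To make the header layout concrete, here is a small sketch that splits a raw HTTP response into its status line, headers, and body. The function name parse_response_head is hypothetical, and the sketch assumes a well-formed response with CRLF line endings and no repeated or folded headers; a real client library handles many more cases.

```python
def parse_response_head(raw):
    """Split raw response bytes into (status_code, reason, headers, body)."""
    # The header block ends at the first blank line (CRLF CRLF).
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # Status line looks like: HTTP/1.1 200 OK
    version, code, *reason = lines[0].split(" ", 2)

    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them lower-cased.
        headers[name.strip().lower()] = value.strip()

    return int(code), (reason[0] if reason else ""), headers, body
```

For example, feeding it a response whose headers include Content-Type: text/html; charset=gbk yields a dictionary in which headers["content-type"] holds that value.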
Common status codes:
200: the client request succeeded
302: temporary redirect; the target address is given by the Location header
400: the client request has a syntax error and cannot be understood by the server
403: the server received the request but refuses to provide service
404: the requested resource does not exist
500: an unexpected error occurred on the server side
Browser cache settings
In the browser, Ctrl+F5 re-issues the request without using cached data. The request carries header fields that tell the server to return the latest data rather than a cached copy.
These two fields are Pragma: no-cache and Cache-Control: no-cache.
Common Cache-Control values:
public: all content may be cached; set in the response header
private: content is cached only in a private cache; set in the response header
no-cache: a cached copy must not be used without first revalidating it with the server; set in the request header or the response header
no-store: none of the content is stored in a cache or in temporary Internet files; set in the response header
must-revalidate / proxy-revalidate: once the cached content has expired, the request must be sent to the server/proxy for revalidation; set in the response header
max-age=xxx: cached content expires after xxx seconds; only available in HTTP/1.1
When Cache-Control and Expires appear at the same time, Cache-Control takes precedence.
Expires: usually followed by a date and time, after which the cached content is considered expired. Before making a request, the browser checks this field to see whether the page has expired, and if it has, it re-issues the request to the server.
Last-Modified: indicates when the resource was last modified on the server. From this time the server can determine whether the browser's copy of the requested resource is up to date; if it is, the server returns a 304 status code to tell the browser so.
ETag: the server assigns each page a unique identifier and compares identifiers to determine whether the page content is up to date.
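The interaction of these fields can be sketched as two small decisions: whether a cached copy is still fresh, and whether the server should answer a revalidation with 304. This is a simplified sketch, not a full HTTP cache. The function names are hypothetical, and the request headers If-None-Match and If-Modified-Since (the standard counterparts of ETag and Last-Modified) are an addition not mentioned in the text above.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def parse_cache_control(value):
    """Turn 'max-age=60, public' into {'max-age': '60', 'public': None}."""
    directives = {}
    for part in value.split(","):
        part = part.strip().lower()
        if part:
            name, _, arg = part.partition("=")
            directives[name] = arg or None
    return directives


def is_fresh(response_headers, stored_at, now):
    """Client side: may the cached copy be used without asking the server?"""
    cc = parse_cache_control(response_headers.get("Cache-Control", ""))
    if "no-store" in cc or "no-cache" in cc:
        return False
    if cc.get("max-age") is not None:
        # Cache-Control takes precedence: Expires is not consulted at all.
        return now - stored_at < timedelta(seconds=int(cc["max-age"]))
    expires = response_headers.get("Expires")
    if expires:
        return now < parsedate_to_datetime(expires)
    return False


def conditional_status(request_headers, current_etag, last_modified):
    """Server side: 304 if the client's copy is still current, else 200."""
    if "If-None-Match" in request_headers:
        same = request_headers["If-None-Match"] == current_etag
        return 304 if same else 200
    since = request_headers.get("If-Modified-Since")
    if since and parsedate_to_datetime(since) >= last_modified:
        return 304
    return 200
```

Passing the current time in as an argument, rather than reading the clock inside the functions, keeps the sketch easy to reason about and to test.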
DNS domain name resolution process
1. The user enters the URL, and the browser checks whether its own cache holds an IP address for the domain name.
2. If not, the browser checks the operating system cache for the result of this DNS resolution.
3. If the operating system cache has no entry either, the operating system sends the domain name to the LDNS (local DNS server).
4. If the LDNS has no result either, it must ask a root name server to resolve the name. This situation is rare, and there are only 13 or so root servers around the world.
5. The root name server returns to the LDNS the address of the top-level domain name server (gTLD server) for the domain being queried. gTLD servers are the top-level domain name servers, for domains such as .com, .cn, and .org; there are only about 13 of them worldwide.
6. The LDNS sends a request to the gTLD server.
7. The gTLD server accepts the request and returns the address of the name server responsible for this domain name. This name server is usually the one provided by the registrar where you registered the domain.
8. The name server returns the IP address and TTL value for the domain name, and the LDNS caches this mapping for the time given by the TTL.
9. The resolution result is returned to the user, who caches the mapping on the local system according to the TTL value.
TTL stands for "time to live": it is the lifetime of a resolution result, which simply means how long a DNS record is cached on a DNS server.
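The lookup order and the role of the TTL can be sketched as a chain of caches in front of an authoritative table. This is a toy model under assumptions: TTLCache and resolve are hypothetical names, the root and gTLD hops are collapsed into a single dictionary lookup, time is a plain number of seconds passed in by the caller, and 192.0.2.10 is a documentation-only address.

```python
class TTLCache:
    """Maps a domain name to (ip, expires_at); entries vanish once expired."""

    def __init__(self):
        self._entries = {}

    def get(self, name, now):
        entry = self._entries.get(name)
        if entry is None:
            return None
        if now >= entry[1]:            # TTL elapsed: drop the stale record
            del self._entries[name]
            return None
        return entry

    def put(self, name, ip, expires_at):
        self._entries[name] = (ip, expires_at)


def resolve(name, caches, authoritative, now):
    """Check each cache in order (browser, OS, LDNS), then the name server.

    `caches` is an ordered list of (label, TTLCache) pairs and
    `authoritative` maps a name to (ip, ttl_seconds).
    Returns (ip, label_of_whoever_answered).
    """
    for i, (label, cache) in enumerate(caches):
        hit = cache.get(name, now)
        if hit is not None:
            ip, expires_at = hit
            # Copy the answer into the caches nearer to the user.
            for _, nearer in caches[:i]:
                nearer.put(name, ip, expires_at)
            return ip, label
    ip, ttl = authoritative[name]
    for _, cache in caches:
        cache.put(name, ip, now + ttl)  # every level caches for the TTL
    return ip, "name server"
```

With a 300-second TTL, a lookup at time 0 is answered by the name server, a lookup at time 10 by the browser cache, and a lookup at time 300 goes back to the name server because every cached copy has expired.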
Clearing cached domain names
The LDNS cache is controlled by the TTL and is difficult to intervene in manually. However, the local machine's cache can be cleared with the following commands:
Windows: ipconfig /flushdns
Linux: /etc/init.d/nscd restart
Several types of domain name resolution
Domain name resolution records are mainly divided into A records, MX records, CNAME records, NS records, and TXT records.
A record: A stands for Address; it specifies the IP address for a domain name. Multiple domain names can be mapped to one IP address, but a single A record does not map one domain name to multiple IP addresses.
MX record: Mail Exchange; it points the mail addresses under a domain name to the domain's own mail server.
CNAME record: alias resolution; it sets one or more aliases for a domain name.
NS record: specifies the DNS server that resolves a domain name.
TXT record: sets a text description for a host name or domain name.
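As a rough illustration of how these record types relate, the sketch below stores a handful of records in a list and follows CNAME aliases until an A record is found. All names and addresses are made up for the example (example.com and the 192.0.2.x range are reserved for documentation), and lookup and resolve_address are hypothetical helpers, not part of any DNS library.

```python
# (name, record type, value) triples for an imaginary zone.
RECORDS = [
    ("example.com", "NS", "ns1.example.com"),
    ("example.com", "MX", "mail.example.com"),
    ("example.com", "TXT", "site verification text"),
    ("www.example.com", "CNAME", "web.example.com"),
    ("web.example.com", "A", "192.0.2.10"),
    ("mail.example.com", "A", "192.0.2.25"),
]


def lookup(name, rtype, records=RECORDS):
    """Return every value stored for `name` with record type `rtype`."""
    return [value for n, t, value in records if n == name and t == rtype]


def resolve_address(name, records=RECORDS, max_hops=8):
    """Follow CNAME aliases until an A record is found, or give up."""
    for _ in range(max_hops):          # the hop limit guards against alias loops
        addresses = lookup(name, "A", records)
        if addresses:
            return addresses[0]
        aliases = lookup(name, "CNAME", records)
        if not aliases:
            return None
        name = aliases[0]
    return None
```

Here www.example.com is only an alias, so resolving it first follows the CNAME to web.example.com and then reads that name's A record.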
CDN working mechanism
A CDN is a content delivery network. By adding a new layer of network architecture on top of the existing Internet, it publishes a website's content to the network edge closest to the user, so that users can fetch the content they need from a nearby node and the website responds to them faster.
Load Balancing
Load balancing spreads work across multiple operating units (such as servers) so that they complete the tasks together.
Three types of architectures:
1. Link load balancing
Load balancing is done through DNS resolution: the user accesses the target server directly without going through another proxy server, so access is usually faster.
Disadvantage: once a server goes down, users may be unable to reach the domain name, because the DNS records cached by the user's local DNS server and on the user's machine are not updated in time.
2. Cluster load balancing
Hardware load balancing: a dedicated hardware device is responsible for forwarding requests. It is too expensive for most companies to afford, but the performance is very good (you get what you pay for).
Software load balancing: requests are forwarded by multiple proxy servers, so network latency is higher.
3. Operating system load balancing
This is done with operating-system-level soft interrupts or hardware interrupts.
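As one concrete instance of the software approach, the sketch below hands out back-end servers in round-robin order and skips any that have been marked as down. Round-robin is only one possible policy, chosen here for simplicity; the class name, its methods, and the server names in the usage note are hypothetical.

```python
class RoundRobinBalancer:
    """Hand out servers in turn, skipping any that are marked as down."""

    def __init__(self, servers):
        self._servers = list(servers)
        self._down = set()
        self._next = 0                 # index of the next server to try

    def mark_down(self, server):
        self._down.add(server)

    def mark_up(self, server):
        self._down.discard(server)

    def pick(self):
        # Try each server at most once per call, starting where we left off.
        for _ in range(len(self._servers)):
            server = self._servers[self._next]
            self._next = (self._next + 1) % len(self._servers)
            if server not in self._down:
                return server
        raise RuntimeError("no healthy servers available")
```

With three servers app1, app2, and app3, successive calls to pick() cycle through them in order; after mark_down("app2") the cycle continues with only app1 and app3, which is the failover behavior that DNS-based balancing lacks while stale records are still cached.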
Dynamic Acceleration
During DNS resolution in a CDN, dynamic link probing finds the best back-to-source path, and DNS scheduling then dispatches all requests onto the selected path, which speeds up user access.
Back to source: when a user accesses a URL, if the CDN node that the name resolves to has no cached copy of the response, or the cached copy has expired, the node goes back to the origin site to fetch the content. If no one accesses the content, the CDN node does not go to the origin site on its own initiative.
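The back-to-source behavior amounts to a pull-through cache, which can be sketched as follows. This is a simplified model under assumptions: EdgeNode is a hypothetical class, the origin is any callable that returns content for a URL, every object gets the same fixed TTL, and time is a plain number of seconds supplied by the caller.

```python
class EdgeNode:
    """A CDN node that fetches from the origin only on a miss or expiry."""

    def __init__(self, fetch_from_origin, ttl):
        self._fetch = fetch_from_origin   # callable: url -> content
        self._ttl = ttl                   # seconds a cached copy stays valid
        self._cache = {}                  # url -> (content, expires_at)
        self.origin_fetches = 0           # how often we went back to source

    def get(self, url, now):
        entry = self._cache.get(url)
        if entry is not None and now < entry[1]:
            return entry[0]               # cache hit: origin is not contacted
        # Miss or expired copy: go back to the source and cache the result.
        content = self._fetch(url)
        self.origin_fetches += 1
        self._cache[url] = (content, now + self._ttl)
        return content
```

Nothing is fetched until the first user request arrives, which matches the point above that a node does not go to the origin on its own initiative. It also shows the trade-off of caching: a change at the origin becomes visible at the edge only after the cached copy expires.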
Web request Process