Sinsing Website Architect's Notes, Part 5: Caching in Detail

Last Update:2014-09-17 Source: Internet

Author: User

Tags internet cache varnish squid proxy

We introduced caching technology earlier, but knowing only its basic use is not enough. We also need to understand how caching improves a site's performance and relieves the heavy pressure that high-volume access places on back-end applications.

First of all, what is a cache? Like the caches found in an operating system or in hardware devices, a website cache temporarily stores data that needs to be processed. Reading from a cache is much faster than reading from the hard disk, so the more data the cache server holds, the less pressure falls on the back-end application servers and the higher the overall performance.

The purpose of a website cache is to improve site performance and speed up access. Sensible caching of certain types of data reduces the load on the system. Because data in memory can be accessed much faster than data on the hard disk, caching is the most important way to accelerate a site. When it is implemented with a proxy or cache server, the site can gain a significant speed-up without any change to the site itself.

The basic form of caching is to cache page elements for a fixed period, which can range from a few seconds to a few days. Within that period the page only needs to be generated once. Every later visit to the page is served from the cache, so the web server and the database do not have to regenerate the same page, which greatly reduces the burden on both.
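The periodic caching described above can be sketched in Python roughly as follows. This is a minimal illustration, not how any particular cache server works: the TTLPageCache class, the render_page function, and the fake clock are all assumptions made for the example.

```python
import time
from typing import Callable, Dict, Tuple


class TTLPageCache:
    """Cache rendered pages for a fixed number of seconds (hypothetical helper)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}  # url -> (expiry time, html)

    def get(self, url: str, render: Callable[[str], str]) -> str:
        now = self.clock()
        entry = self._store.get(url)
        if entry is not None and now < entry[0]:
            return entry[1]        # still fresh: no rendering, no database access
        html = render(url)         # expired or never cached: regenerate once
        self._store[url] = (now + self.ttl, html)
        return html


# Usage with a fake clock so the behaviour is easy to follow.
fake_now = [0.0]
render_calls = []


def render_page(url: str) -> str:
    render_calls.append(url)       # stands in for database queries and templating
    return f"<html>{url} v{len(render_calls)}</html>"


cache = TTLPageCache(ttl_seconds=600, clock=lambda: fake_now[0])
first = cache.get("/news", render_page)
fake_now[0] = 300.0
second = cache.get("/news", render_page)   # within 10 minutes: served from cache
fake_now[0] = 601.0
third = cache.get("/news", render_page)    # cache period over: regenerated
```

Passing the clock in as a parameter is only a convenience for demonstration; a real deployment would simply use the default.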

Suppose a hot page is accessed 10,000 times an hour. If every access reads the back-end database and builds the page from scratch, the site generates the page 10,000 times in one hour, which is very inefficient. If the page is instead cached for 10 minutes at a time, it is generated only once every 10 minutes, or six times in one hour. Which approach is efficient and which is not should be obvious at a glance.
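The arithmetic can be checked with a small simulation. The sketch below assumes the 10,000 requests are spread evenly over the hour, which the text does not state; count_page_generations is a name invented for this example.

```python
def count_page_generations(requests_per_hour: int, ttl_seconds: int) -> int:
    """Count how often a page is generated in one hour of evenly spaced requests.

    ttl_seconds == 0 means no caching: every request regenerates the page.
    Times are in milliseconds to keep the arithmetic in integers.
    """
    interval_ms = 3_600_000 // requests_per_hour
    ttl_ms = ttl_seconds * 1000
    generations = 0
    expires_at_ms = None  # no cached copy yet
    for i in range(requests_per_hour):
        now_ms = i * interval_ms
        if expires_at_ms is None or now_ms >= expires_at_ms:
            generations += 1                 # cache empty or expired: build the page
            expires_at_ms = now_ms + ttl_ms
    return generations


without_cache = count_page_generations(10_000, ttl_seconds=0)
with_cache = count_page_generations(10_000, ttl_seconds=600)
print(without_cache, with_cache)  # prints: 10000 6
```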

There are two main ways to cache a site. The first is the memory cache, in which data is stored in the server's memory. This is the most efficient mode, but we cannot blindly load all data into memory, because server resources are limited. The second is the file cache, in which data is stored on the server's hard disk as files of various types, such as TXT, CSS, JS, and JPG. Note, however, that the server's I/O capacity is limited: when too much data is read at one time, efficiency drops sharply. A sensible file layout is needed to address this problem.
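One common reading of "a sensible file layout" is to spread cached files over many subdirectories so that no single directory grows huge. The sketch below shows that idea only; the FileCache class, the cache_path function, and the two-level hashed layout are assumptions, not something the text prescribes.

```python
import hashlib
import os
import tempfile
from typing import Optional


def cache_path(root: str, key: str) -> str:
    """Map a cache key to root/ab/cd/<sha1 hex>, spreading files over subdirectories."""
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    return os.path.join(root, digest[:2], digest[2:4], digest)


class FileCache:
    """Store cached objects as files on disk (hypothetical helper)."""

    def __init__(self, root: str):
        self.root = root

    def put(self, key: str, data: bytes) -> None:
        path = cache_path(self.root, key)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)

    def get(self, key: str) -> Optional[bytes]:
        try:
            with open(cache_path(self.root, key), "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None            # not cached on disk


root = tempfile.mkdtemp()          # throwaway directory for the demonstration
files = FileCache(root)
files.put("/static/site.css", b"body { margin: 0 }")
hit = files.get("/static/site.css")
miss = files.get("/static/missing.js")
```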

Here we introduce Squid, a high-performance cache server that supports FTP, HTTP, and other protocols. Squid handles all client requests in a single, non-blocking, I/O-driven process. It keeps data objects cached in memory and also caches the results of DNS queries. Squid supports non-blocking DNS queries and negative caching of failed requests, as well as SSL and access control rules. By using ICP (the lightweight Internet Cache Protocol), Squid caches can be cascaded into a hierarchy of proxies to maximize bandwidth savings.
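Negative caching means remembering failures as well as successes, so that a lookup known to fail is not repeated immediately. The following Python sketch illustrates the idea for DNS lookups. It is not Squid's implementation: the DnsCache class, the TTL values, and the fake_resolve function are assumptions made for the example.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class DnsCache:
    """Remember both successful and failed lookups (hypothetical helper)."""

    def __init__(self, resolve: Callable[[str], str],
                 positive_ttl: float = 300.0, negative_ttl: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.resolve = resolve
        self.positive_ttl = positive_ttl
        self.negative_ttl = negative_ttl
        self.clock = clock
        self._entries: Dict[str, Tuple[float, Optional[str]]] = {}

    def lookup(self, host: str) -> Optional[str]:
        now = self.clock()
        entry = self._entries.get(host)
        if entry is not None and now < entry[0]:
            return entry[1]        # cached answer; None means a cached failure
        try:
            address: Optional[str] = self.resolve(host)
            ttl = self.positive_ttl
        except OSError:            # socket.gaierror is a subclass of OSError
            address = None
            ttl = self.negative_ttl
        self._entries[host] = (now + ttl, address)
        return address


calls = []


def fake_resolve(host: str) -> str:
    calls.append(host)             # record every real lookup
    if host == "example.invalid":
        raise OSError("name not found")
    return "192.0.2.10"            # documentation-only address


dns = DnsCache(fake_resolve)
a1 = dns.lookup("www.example.com")
a2 = dns.lookup("www.example.com")     # answered from the cache
f1 = dns.lookup("example.invalid")
f2 = dns.lookup("example.invalid")     # failure answered from the cache
```

Failures are given a shorter lifetime than successes here so that a name that starts resolving again is noticed sooner; the specific numbers are arbitrary.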

Now for how it works. Squid consists of the main server program squid, a DNS lookup program dnsserver, and several programs for rewriting requests and performing authentication. When Squid starts, it spawns a predetermined number of dnsserver processes, each of which can perform a DNS query independently. This greatly reduces the time the server spends waiting for DNS queries.
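The benefit of running several lookup helpers side by side can be illustrated with a worker pool. In this sketch threads stand in for the dnsserver processes, and blocking_lookup fakes a slow DNS query; the names, the helper count, and the addresses are all assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Tuple

NUM_HELPERS = 4  # stands in for the predetermined number of dnsserver processes


def blocking_lookup(host: str) -> Tuple[str, str]:
    """Pretend to resolve a host; the sleep imitates waiting on a DNS server."""
    time.sleep(0.05)
    return host, f"192.0.2.{len(host)}"   # fake, deterministic address


hosts = ["a.example", "bb.example", "ccc.example", "dddd.example"]

# Each helper blocks on its own query, so the four waits overlap
# instead of adding up one after another.
with ThreadPoolExecutor(max_workers=NUM_HELPERS) as pool:
    results = dict(pool.map(blocking_lookup, hosts))
```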

A Squid proxy server can be deployed in three ways. The first is the standard proxy server, which is used to cache static web pages. When a cached page is requested a second time, the browser gets the data directly from the local proxy server instead of sending a request to the back-end web server. This both saves bandwidth and improves access speed. It requires the proxy server's IP address and port number to be set explicitly in the browser of every internal host. When a client goes online, every request is sent to the Squid proxy server for processing, and the proxy server decides from the request whether it needs to connect to the remote web server to obtain the data. If the target file is available locally, it is passed directly to the user. If not, the proxy fetches it, saves a copy locally, and sends the file to the client's browser.
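The decision the proxy makes for each request (serve the local copy, or fetch, store, and forward) can be sketched as follows. This is a simplified illustration with an in-memory dictionary and a fake origin server, not Squid's actual request handling; CachingProxy and fake_origin are hypothetical names.

```python
from typing import Callable, Dict, List


class CachingProxy:
    """Serve from the local copy when possible, otherwise fetch and keep a copy."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]):
        self.fetch_from_origin = fetch_from_origin
        self.store: Dict[str, bytes] = {}

    def handle(self, url: str) -> bytes:
        if url in self.store:
            return self.store[url]             # local copy exists: no remote request
        body = self.fetch_from_origin(url)     # no local copy: ask the remote server
        self.store[url] = body                 # save a copy for later requests
        return body


origin_requests: List[str] = []


def fake_origin(url: str) -> bytes:
    origin_requests.append(url)                # count what actually reaches the origin
    return f"content of {url}".encode("utf-8")


proxy = CachingProxy(fake_origin)
r1 = proxy.handle("http://example.com/index.html")
r2 = proxy.handle("http://example.com/index.html")   # second access: served locally
```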

The second is the transparent proxy server. It functions exactly like the standard proxy server, except that the proxying is invisible to the client's browser, which means the proxy server's IP address and port do not need to be configured. The transparent proxy server intercepts network traffic and picks out the HTTP traffic bound for outside sites. If the cache server holds the information the client requested, it sends the data directly to the user. If it does not, it sends a request to the remote server, and the remaining steps are identical to those of the standard proxy server.

The third is the reverse proxy server, which works on a different principle from the first two. Its purpose is to reduce the load on the origin web server. The reverse proxy server takes over requests for static pages on behalf of the origin web server, to keep the origin from being overloaded. It sits between the local web server and the Internet and handles all requests to the web server. If the requested page is present on the proxy server, the proxy sends the content directly to the user. If not, it sends a request to the web server, retrieves the data, stores it in the local cache, and sends it to the user. This reduces the number of requests that reach the web server and therefore its load.

With these principles understood, we also need to know a few cache management concepts. A cache hit occurs each time Squid satisfies an HTTP request from its cache. The cache hit ratio is the proportion of all HTTP requests that are hits. A cache miss occurs when Squid cannot satisfy an HTTP request from its cache. There are many reasons for a miss. For example, a miss occurs when Squid receives a request for a particular resource for the first time. A second reason is that Squid has removed the object from the cache to free space for new objects.
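Both causes of a miss, and the resulting hit ratio, can be seen in a small cache with a fixed capacity. The sketch below uses least-recently-used eviction as a simple stand-in for a replacement policy; the LRUCache class and the request sequence are assumptions made for the example.

```python
from collections import OrderedDict
from typing import Callable


class LRUCache:
    """Fixed-size cache that evicts the least recently used object when full."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items: "OrderedDict[str, str]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: str, load: Callable[[str], str]) -> str:
        if key in self.items:
            self.items.move_to_end(key)        # mark as most recently used
            self.hits += 1
            return self.items[key]
        self.misses += 1                       # first request, or evicted earlier
        value = load(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)     # free space for the new object
        return value

    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


cache = LRUCache(capacity=2)
for url in ["/a", "/b", "/a", "/c", "/b"]:
    cache.get(url, lambda u: f"page {u}")

# /a and /b miss on first request, /a then hits, /c misses and evicts /b,
# so the final /b misses again: 1 hit out of 5 requests.
print(cache.hits, cache.misses, cache.hit_ratio())  # prints: 1 4 0.2
```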

Sarg is a Squid log analysis tool. It produces reports in HTML format that list, for each user, the Internet sites accessed, time usage, rankings, number of connections, number of visits, and so on.

Varnish is another high-performance, open-source reverse proxy and cache server. Its developer is one of the core developers of FreeBSD. Varnish uses a new software architecture that is closely matched to current hardware. A computer's storage today includes not only main memory but also the CPU's L1 and L2 caches, sometimes an L3 cache, and the hard disk has its own cache as well. Squid's architecture cannot make optimal use of all these levels, but the operating system can, so Varnish leaves this part of the work to the operating system. That is the design idea behind the Varnish cache.

One example: Norway's largest online newspaper replaced its original 12 Squid servers with three Varnish servers, and performance was even better than before. This shows what Varnish is good at.

So what are the advantages of Varnish compared with Squid? First, it is more stable. Under the same workload, a Squid server is more likely to fail than a Varnish server, which means Squid needs to be restarted more often. Second, Varnish is faster. It uses "Visual Page Cache" technology, in which all cached data is read directly from memory, whereas Squid reads cached data from the hard disk. Third, Varnish supports more concurrent connections. Fourth, Varnish can manage the cache through its management port, using regular expressions to purge parts of the cache in bulk, which Squid cannot do.
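The idea of purging part of a cache in bulk with a regular expression can be sketched in Python. This only illustrates the concept on a plain dictionary and is not Varnish's management interface; purge_matching and the sample URLs are assumptions.

```python
import re
from typing import Dict


def purge_matching(cache: Dict[str, bytes], pattern: str) -> int:
    """Remove every cached URL that matches the regular expression; return the count."""
    regex = re.compile(pattern)
    doomed = [url for url in cache if regex.search(url)]  # collect first, then delete
    for url in doomed:
        del cache[url]
    return len(doomed)


page_cache: Dict[str, bytes] = {
    "/news/1.html": b"first story",
    "/news/2.html": b"second story",
    "/img/logo.png": b"binary image data",
}

removed = purge_matching(page_cache, r"^/news/.*\.html$")
```

After the call, only the image remains cached, so one command has invalidated the whole news section without touching anything else.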

So what are the main drawbacks of Varnish? First, under high concurrency it consumes more CPU, I/O, and memory than Squid. Second, once the Varnish process hangs, crashes, or restarts, all cached data is released from memory, and the flood of requests that follows puts great pressure on the back end.

In short, I personally feel that Varnish replacing Squid is only a matter of time, and I look forward to it getting even better. For how to use both of them, please follow my tutorials.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
