Varnish: acceleration and caching technology


Varnish

Varnish is a high-performance, open-source reverse proxy server and HTTP accelerator. Its software architecture is newer than Squid's and works more closely with modern hardware. Compared with the traditional Squid, Varnish offers higher performance, greater speed, and easier management, among other advantages. Many large websites have started replacing Squid with Varnish, which has driven Varnish's rapid development.

Norway's largest online newspaper, Verdens Gang (vg.no), replaced its original 12 Squid servers with 3 Varnish servers and achieved better performance than before; this is one of Varnish's best-known success stories.

First, installing Varnish

First install the PCRE library, which provides regular-expression support. If it is not installed, building Varnish 2.0 or later will fail with an error saying the PCRE library cannot be found. The following is the installation process for PCRE:

# tar zxvf pcre.tar.gz
# cd pcre/
# ./configure --prefix=/usr/local/pcre/
# make && make install

Install varnish:

# tar xzvf varnish-3.0.2.tar.gz
# cd varnish-3.0.2
# export PKG_CONFIG_PATH=/usr/local/pcre/lib/pkgconfig
# ./configure --prefix=/usr/local/varnish
# make
# make install

Start varnish:


# varnishd -f /etc/varnish/default.vcl -s file,/var/varnish_cache,1G -T 127.0.0.1:2000 -a 0.0.0.0:9082

The meaning of each parameter:

-f: specifies the location of the Varnish configuration file;

-s: specifies how Varnish stores its cache; the most common form is "-s file,<dir_or_file>,<size>";

-T address:port: sets the address and port of Varnish's telnet management interface;

-a address:port: sets the address and port on which Varnish listens for HTTP requests.
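The -f option points at a VCL file such as /etc/varnish/default.vcl, which must at least declare a backend so Varnish knows where to fetch content from. Below is a minimal sketch of such a file; the backend address 127.0.0.1:8080 is only an assumption and should be replaced with the address of the real web server. With nothing but this declaration in place, Varnish applies its built-in default policy for every subroutine.

backend default {
    .host = "127.0.0.1";    # address of the backend web server (assumed)
    .port = "8080";         # port of the backend web server (assumed)
}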

Second, Varnish in detail

(i) About varnish

1. Varnish system Architecture

Varnish mainly runs two processes: the management process and the child process (also called the cache process).

The management process mainly handles applying new configurations, compiling VCL, monitoring Varnish, initializing Varnish, and providing a command-line interface. It probes the child process every few seconds to determine whether it is functioning properly, and restarts the child process if it does not respond within the specified length of time.

The child process contains several types of threads; common ones include:

Acceptor thread: accepts new connection requests and responds to them;

Worker threads: the child process starts a worker thread for each session, so there may be hundreds of worker threads or more in high-concurrency scenarios;

Expiry thread: purges expired content from the cache;

Varnish relies on "workspaces" to reduce the likelihood of contention when threads request or modify memory. There are several different workspaces inside Varnish, the most important of which is the session workspace, which is used to manage session data.

2. Varnish Log

In order to interact with the rest of the system, the child process uses a shared memory log that can also be accessed through the file system interface. When a thread needs to log information, it only has to acquire a lock, write the data into a region of the shared memory segment, and then release the lock. To reduce contention, each worker thread caches its log data before writing it.

The shared memory log is generally 90MB in size and is divided into two parts: the first part holds counters and the second part holds data about client requests. Varnish provides several tools, such as varnishlog, varnishncsa, and varnishstat, to analyze the information in the shared memory log and display it in the desired form.

3. VCL

The Varnish Configuration Language (VCL) is the tool Varnish provides for configuring caching policies. It is a simple domain-specific programming language that supports limited arithmetic and logical operations, allows string matching with regular expressions, lets users set custom variables, supports if statements, and comes with built-in functions and variables. A caching policy written in VCL is typically saved to a .vcl file and must be compiled into binary form before Varnish can use it. In fact, the entire caching policy consists of several specific subroutines, such as vcl_recv and vcl_fetch, which are executed at different points (or times) in request processing; if no subroutine has been customized for a given point, Varnish executes the default definition.

A VCL policy is converted to C code by the management process before it is enabled, and the C code is then compiled into a binary program by the GCC compiler. When compilation is complete, the management process loads it into the running child process. Because the compilation is done outside the child process, the risk of loading a malformed VCL is avoided. As a result, the cost of changing the Varnish configuration is very small: Varnish can keep several old configurations that are still referenced while allowing a new configuration to take effect immediately. Compiled old configurations are normally discarded when Varnish restarts; if manual cleanup is required, it can be done with the varnishadm vcl.discard command.

4. Varnish back-end storage

Varnish supports several different types of back-end storage, which can be specified with the -s option when varnishd is started. The available types include:

(1) file: stores all cached data in a single file and maps the entire cache file into memory with the operating system's mmap() system call (where conditions permit);

(2) malloc: uses the malloc() library call to request a memory area of the specified size from the operating system when Varnish starts, and stores cached objects in it;

(3) persistent (experimental): works like file but persists the data (that is, cached data is not purged when Varnish restarts); it is still experimental;

With the file storage method, Varnish does not track which cached objects have actually been written to the cache file, and therefore cannot tell whether the cache file on disk is still usable, so the cached data is effectively discarded when Varnish stops or restarts. The persistent method was introduced to remedy this, but it is still experimental; for example, it cannot yet handle the situation where the total size of the cached objects exceeds the cache space, so it is only suitable for scenarios with a huge amount of cache space.

Choosing the appropriate storage method helps improve system performance. As a rule of thumb, use malloc when memory is large enough to hold all cached objects; otherwise, file storage performs better. Note, however, that varnishd actually uses more space than specified with the -s option: in general it needs roughly 1KB of additional storage per cached object, which means that with 1 million cached objects the space used will exceed the specified size by about 1GB. In addition, Varnish itself takes a small amount of memory for its own data structures.

When specifying the cache type for varnishd, the -s option accepts the following parameter formats:

malloc[,size] or

file[,path[,size[,granularity]]] or

persistent,path,size (experimental)

The granularity parameter of file sets the allocation unit for the cache space; it defaults to bytes, and all other sizes are rounded to a multiple of this unit.

(ii) HTTP protocol and Varnish

1. Cache-related HTTP headers

The HTTP protocol provides multiple headers for page caching and cache invalidation, the most common of which are:

(1) Expires: specifies the expiration date/time of a web object, usually in GMT format; the expiry time should not be set too far in the future, and one year is sufficient for most scenarios; it is commonly used to set cache lifetimes for purely static content such as JavaScript files, style sheets, or images;

(2) Cache-Control: defines caching directives that all caching mechanisms must obey; its directives include public, private, no-cache (which means the response may be stored but must be revalidated before being used to answer a client request), no-store, max-age, s-maxage, must-revalidate, and so on; a lifetime set with Cache-Control overrides the time specified in Expires;

(3) ETag: a response header used to attach a version identifier to the web resource in a response message;

(4) Last-Modified: a response header that tells the client the last modification time of the web object it requested; it is used in answering conditional requests that carry If-Modified-Since or If-None-Match headers;

(5) If-Modified-Since: a conditional request header; if the requested web content has changed since the time given in this header, the server responds with the new content, otherwise it responds 304 (Not Modified);

(6) If-None-Match: a conditional request header; the web server assigns an ETag header to a piece of web content, and the client can fetch and save this tag along with the content; in subsequent requests the client sends the tags it holds in an If-None-Match header, and the server checks whether the current content matches one of the tags in this list; if so it responds 304, otherwise it returns the content itself;

(7) Vary: a response header; the origin server uses it to indicate which request headers can cause the response to differ; the most common is Vary: Accept-Encoding, which tells the caching mechanism that the content may differ depending on the Accept-Encoding header sent with the request (see the sketch after this list);

(8) Age: a response header added by a cache server to indicate how long the response has been held in the cache; browsers usually judge freshness from this header, and if the response also carries a max-age directive, the remaining lifetime of the cached copy is "max-age minus Age";
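As an illustration of how Vary interacts with caching (item (7) above), the following vcl_recv sketch normalizes the Accept-Encoding request header so that one URL does not get a separate cached variant for every slightly different Accept-Encoding value browsers send. This is a commonly used Varnish 3 pattern rather than anything specific to this article, and the exact values matched are an assumption.

sub vcl_recv {
    if (req.http.Accept-Encoding) {
        if (req.http.Accept-Encoding ~ "gzip") {
            # any client that accepts gzip is treated identically
            set req.http.Accept-Encoding = "gzip";
        } else if (req.http.Accept-Encoding ~ "deflate") {
            set req.http.Accept-Encoding = "deflate";
        } else {
            # neither gzip nor deflate: drop the header so only one
            # uncompressed variant is cached
            remove req.http.Accept-Encoding;
        }
    }
}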

(iii) Varnish state engine

VCL lets the administrator define caching policies. A defined policy is parsed by the Varnish management process, converted to C code, compiled into a binary program, and loaded into the child process. Varnish processes each request through a series of states, and the policies defined in VCL are attached to these states to implement the corresponding caching behavior; for this reason VCL is often described as a "domain-specific" language or a state engine, where "domain-specific" refers to the fact that certain data is only available in particular states.

1. VCL State Engine

In the VCL state engine, the states are related to one another but isolated from each other; each state uses return(x) to exit the current state and instruct Varnish to enter the next one.

When Varnish begins processing a request, it first parses the HTTP request itself, for example extracting the request method from the headers and verifying that it is a legitimate HTTP request. Once this basic analysis is finished, the first decision must be made: should Varnish look for the requested resource in the cache? This decision is made by the VCL, specifically by the vcl_recv subroutine. If the administrator has not defined a custom vcl_recv, Varnish executes the default vcl_recv. Even if a custom vcl_recv exists, the default vcl_recv is still executed afterwards if the custom one does not end with a terminating statement. In fact, it is strongly recommended to let Varnish fall through to the default vcl_recv, so that it can handle cases a custom vcl_recv may have missed.
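The sketch below illustrates this fall-through behavior, assuming a Varnish 3 setup: the custom vcl_recv only records the client address in an X-Forwarded-For header and deliberately contains no return() statement, so the built-in default vcl_recv still runs afterwards and makes the final lookup/pass decision.

sub vcl_recv {
    # record the client address for the backend servers
    if (req.http.X-Forwarded-For) {
        set req.http.X-Forwarded-For = req.http.X-Forwarded-For + ", " + client.ip;
    } else {
        set req.http.X-Forwarded-For = client.ip;
    }
    # no return() here: Varnish falls through to the default vcl_recv
}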

2. VCL syntax

The design of VCL borrows from the C and Perl languages, so it is easy to pick up for anyone with C or Perl programming experience. The basic syntax is described below (see the sketch after this list):

(1) //, # or /* ... */ introduce comments;

(2) sub $name defines a subroutine (function);

(3) no loops are supported; a set of built-in variables is provided;

(4) flow ends with terminating statements; there are no return values;

(5) the language is domain-specific;

(6) operators: = (assignment), == (comparison), ~ (regular-expression match), ! (negation), && (logical AND), || (logical OR).
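The following short sketch shows these syntax elements together; the host name static.example.com and the X-Static header are purely illustrative and not part of any standard configuration.

# "#", "//" and "/* ... */" all introduce comments
sub vcl_recv {
    // "=" assigns, "==" compares, "~" matches a regular expression
    if (req.http.host == "static.example.com" || req.url ~ "\.(css|js|png|jpg)$") {
        set req.http.X-Static = "yes";
    }
    /* "!" negates, "&&" is logical AND */
    if (!req.http.X-Static && req.request == "GET") {
        set req.http.X-Static = "no";
    }
}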

A VCL subroutine accepts no arguments and has no return value, so it is not a function in the usual sense; this also means that data can only be passed around within VCL by hiding it in HTTP headers. The return statement in VCL hands control from the VCL state engine back to Varnish rather than returning a value to a caller, which is why VCL has terminating statements instead of return values. For each domain, one or more terminating statements can be used to tell Varnish what to do next, for example whether or not to query the cache.

3. Built-in functions of VCL

VCL provides several built-in functions for modifying strings, adding bans, restarting the VCL state engine, and returning control to Varnish:

regsub(str, regex, sub)

regsuball(str, regex, sub): both search str for substrings matching the regular expression regex and replace them with sub; regsuball() replaces every match in str, while regsub() replaces only the first one (see the sketch after this list);

ban(expression): invalidates all cached objects that match the given expression;

ban_url(regex): invalidates all cached objects whose URL matches the regular expression regex;

purge: removes the selected object and all of its variants from the cache; this is typically exposed through an HTTP PURGE method handled in VCL;

hash_data(str): appends str to the hash input used to look up an object in the cache;

return(): ends a VCL domain, returns control to Varnish, and tells it what to do next; the instructions that can be returned include lookup, pass, pipe, hit_for_pass, fetch, deliver, hash, and so on;

A specific domain may return only certain instructions, not all of the instructions listed previously;

return(restart): reruns the entire VCL, that is, processing starts again from vcl_recv; each restart increments the req.restarts variable, and the max_restarts parameter limits the maximum number of restarts.
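As a quick illustration of the difference between regsub() and regsuball(), the sketch below collapses repeated slashes in the request URL; the X-One header exists only to show the single-replacement behavior and is purely hypothetical.

sub vcl_recv {
    # regsub() replaces only the first match:
    # "/a//b//c" would become "/a/b//c"
    set req.http.X-One = regsub(req.url, "//", "/");

    # regsuball() replaces every match:
    # "/a//b//c" becomes "/a/b/c"
    set req.url = regsuball(req.url, "//", "/");
}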

4. vcl_recv

vcl_recv is the first subroutine executed after Varnish has decoded the request message into its basic data structures. It typically has four main uses:

(1) modifying client data to reduce differences among cached objects, for example removing "www." from the URL (see the sketch after this list);

(2) choosing a caching policy based on client data, for example caching only specific URLs or not caching POST requests;

(3) applying URL rewrite rules for certain web applications;

(4) selecting an appropriate backend web server;
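A minimal sketch of use (1), assuming the site treats www.example.com and example.com as the same content: stripping the "www." prefix from the Host header keeps the two forms from producing separate cache objects.

sub vcl_recv {
    # normalize the Host header so www.example.com and example.com
    # share the same cached objects (the host name is an assumption)
    if (req.http.host ~ "^www\.") {
        set req.http.host = regsub(req.http.host, "^www\.", "");
    }
}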

The following terminating statements can be used here, i.e. the actions indicated to Varnish via return():

pass: bypasses the cache, i.e. the content is neither looked up in the cache nor stored in it;

pipe: does not inspect the request at all; instead a dedicated "pipe" is set up between the client and the backend server and data is transferred directly between the two; subsequent data on the same keep-alive connection also goes directly through this pipe and does not appear in any log;

lookup: looks up the object requested by the user in the cache; if the cache does not yet contain the object, the subsequent steps may end up caching it;

error: Varnish itself synthesizes a response message, typically to return error messages, redirects, or responses to health checks from a load balancer probing the backend web servers;

With well-designed policies, vcl_recv can also provide a degree of security by stopping certain attacks before they reach the backend, and it can detect and fix some common spelling mistakes in requests.

The default vcl_recv shipped with Varnish is designed to implement a safe caching policy. It mainly does two things (approximated in the sketch after this list):

(1) it only processes recognized HTTP methods and only caches GET and HEAD requests;

(2) it does not cache any user-specific data;
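The following sketch approximates those two behaviors. It is modeled on the stock default.vcl that ships with Varnish 3 rather than copied verbatim, so the file installed with your version remains the authoritative reference.

sub vcl_recv {
    if (req.request != "GET" && req.request != "HEAD" &&
        req.request != "PUT" && req.request != "POST" &&
        req.request != "TRACE" && req.request != "OPTIONS" &&
        req.request != "DELETE") {
        # unrecognized methods are piped straight to the backend
        return (pipe);
    }
    if (req.request != "GET" && req.request != "HEAD") {
        # only GET and HEAD are candidates for caching
        return (pass);
    }
    if (req.http.Authorization || req.http.Cookie) {
        # requests carrying user-specific data are not cached
        return (pass);
    }
    return (lookup);
}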

For security reasons, it is generally best not to end a custom vcl_recv with a return() terminating statement, but to fall through to the default vcl_recv and let it make the final decision.

The following is a custom usage example:

sub vcl_recv {
    if (req.http.User-Agent ~ "iPad" ||
        req.http.User-Agent ~ "iPhone" ||
        req.http.User-Agent ~ "Android") {
        set req.http.X-Device = "mobile";
    } else {
        set req.http.X-Device = "desktop";
    }
}

The VCL in this example creates an X-Device request header whose value is either mobile or desktop, so the web server can serve different types of responses based on it and improve the user experience.
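If Varnish itself (rather than the backend) should keep separate cached copies for the two device classes, the X-Device header can also be added to the cache hash. The following is a minimal sketch of that idea; because it contains no return(hash), the built-in vcl_hash still adds the URL and Host header to the hash as usual.

sub vcl_hash {
    if (req.http.X-Device) {
        # cache mobile and desktop variants of the same URL separately
        hash_data(req.http.X-Device);
    }
    # no return (hash) here, so the default vcl_hash still runs
}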

5. vcl_fetch

As mentioned earlier, vcl_fetch makes caching decisions based on the server's response, in contrast to vcl_recv, which makes decisions based on the client's request. A pass returned from any VCL state will eventually be handled by vcl_fetch. Many built-in variables are available in vcl_fetch, for example the frequently used beresp.ttl variable, which defines how long an object is cached. The instructions that can be returned to Varnish via return() are:

(1) deliver: caches the object and sends it to the client (via vcl_deliver);

(2) hit_for_pass: does not cache the object, but causes subsequent requests for it to be sent directly to vcl_pass;

(3) restart: restarts the entire VCL and increments the restart counter; once the max_restarts limit on the number of restarts is exceeded, an error is returned;

(4) error code [reason]: returns the given error code to the client and discards the request;

The default vcl_fetch refuses to cache any response that carries a Set-Cookie header.
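A vcl_fetch sketch along these lines is shown below; the one-hour TTL and the two-minute hit_for_pass window are illustrative values only, not recommendations from this article.

sub vcl_fetch {
    if (beresp.http.Set-Cookie) {
        # do not cache responses that set cookies; remember that
        # decision for two minutes so matching requests go to vcl_pass
        set beresp.ttl = 120s;
        return (hit_for_pass);
    }
    # cache everything else for one hour, overriding the backend TTL
    set beresp.ttl = 1h;
    return (deliver);
}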

















