Varnish: A High-Performance Open-Source HTTP Accelerator

Last Update:2015-04-12 Source: Internet

Author: User

Tags varnish

Varnish Introduction

Varnish is a high-performance, open-source HTTP accelerator. The project released its first version, 0.9, in 2006. Documents written in 2007, more than eight years ago, still described Varnish as unstable; since then, thanks to the hard work of the Varnish development team and the community, it has become very robust. Many portal sites have deployed Varnish and report good results: in their experience it is more stable than Squid, more efficient, and less resource-intensive. For reverse proxying and web acceleration, Varnish is believed to be fully capable of replacing Squid.

System Architecture for Varnish

Varnish mainly runs two processes: the management process and the child process (also called the cache process).

The management process applies new configurations, compiles VCL, monitors Varnish, initializes Varnish, and provides a command-line interface. It probes the child process every few seconds to check that it is working properly, and restarts the child process if no response arrives within the specified length of time.
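The probe-and-restart behavior described above can be sketched as a small supervisor loop. This is only an illustrative model: the Supervisor class, the probe and restart hooks, and the max_missed threshold are hypothetical names, and the sketch counts missed probes instead of measuring elapsed time as the real management process would.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Supervisor:
    """Toy model of a manager that probes a child and restarts it when it stalls."""
    probe: Callable[[], bool]      # returns True if the child answered this probe
    restart: Callable[[], None]    # hypothetical hook that restarts the child
    max_missed: int = 3            # assumed threshold, not a real varnishd default
    missed: int = 0
    restarts: int = 0

    def tick(self) -> None:
        """Run one probe cycle; restart the child after too many missed replies."""
        if self.probe():
            self.missed = 0
            return
        self.missed += 1
        if self.missed >= self.max_missed:
            self.restart()
            self.restarts += 1
            self.missed = 0


def simulate(replies: Iterable[bool], max_missed: int = 3) -> int:
    """Feed a scripted sequence of probe results and return the restart count."""
    replies = list(replies)
    it = iter(replies)
    sup = Supervisor(probe=lambda: next(it), restart=lambda: None,
                     max_missed=max_missed)
    for _ in replies:
        sup.tick()
    return sup.restarts
```

For example, simulate([True, False, False, False, True]) reports one restart, because the child misses three probes in a row before answering again.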

The child process contains several types of threads, the most common being:
Acceptor thread: accepts new connection requests and responds to them;
Worker thread: the child process starts one worker thread per session, so there may be hundreds of worker threads or more under high concurrency;
Expiry thread: purges expired content from the cache.

Varnish's Log

To interact with other parts of the system, the child process uses a shared memory log that can be accessed through a file system interface. When a thread needs to log information, it only has to acquire a lock, write the data to a region of the shared memory, and then release the lock. To reduce lock contention, each worker thread buffers its log data locally.
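The lock-write-release pattern with per-worker buffering can be illustrated with ordinary threads. The sketch below is not Varnish's implementation: SharedLog, WorkerLogBuffer, and the flush_at batch size are hypothetical, and a Python list stands in for the shared memory region.

```python
import threading
from typing import List


class SharedLog:
    """Stand-in for the shared memory log: one list guarded by one lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.records: List[str] = []

    def append_batch(self, batch: List[str]) -> None:
        # Hold the lock only while copying the batch, then release it.
        with self._lock:
            self.records.extend(batch)


class WorkerLogBuffer:
    """Per-worker buffer that batches records to cut down on lock acquisitions."""

    def __init__(self, log: SharedLog, flush_at: int = 8) -> None:
        self._log = log
        self._flush_at = flush_at      # assumed batch size, chosen for illustration
        self._pending: List[str] = []

    def write(self, record: str) -> None:
        self._pending.append(record)
        if len(self._pending) >= self._flush_at:
            self.flush()

    def flush(self) -> None:
        if self._pending:
            self._log.append_batch(self._pending)
            self._pending = []


def run_workers(n_workers: int, records_each: int) -> SharedLog:
    """Start n_workers threads, each logging through its own buffer."""
    log = SharedLog()

    def work(worker_id: int) -> None:
        buf = WorkerLogBuffer(log)
        for i in range(records_each):
            buf.write(f"worker={worker_id} seq={i}")
        buf.flush()  # push any leftover records before the thread exits

    threads = [threading.Thread(target=work, args=(w,)) for w in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

With a batch size of 8, a worker that writes 25 records takes the lock four times instead of 25, which is the point of buffering locally.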

The shared memory log is generally 90M in size and is divided into two parts: the first part holds the counters, and the second part holds the data of client requests. Varnish provides several tools, such as varnishlog, varnishncsa, and varnishstat, that analyze the information in the shared memory log and display it in a specified format.

VCL

Varnish Configuration Language (VCL) is Varnish's tool for configuring caching policies. It is a simple domain-specific programming language that supports limited arithmetic and logical operations, string matching with regular expressions, custom variables assigned with set, if conditionals, and a number of built-in functions and variables. A caching policy written in VCL is usually saved to a .vcl file, which must be compiled into binary form before Varnish can call it. The whole caching policy is in fact made up of several specific subroutines, such as vcl_recv and vcl_fetch, which run at different points (or times) in request processing; if no custom subroutine is defined for a given point, Varnish runs the default definition.

Before a VCL policy is enabled, the management process converts it to C code, which the GCC compiler then compiles into a binary. When compilation finishes, the management process links it into the Varnish instance, that is, the child process. Because compilation happens outside the child process, the risk of loading malformed VCL is avoided. As a result, changing the Varnish configuration is very cheap: Varnish can keep several old configurations that are still referenced while letting the new configuration take effect immediately. Compiled old configurations are normally discarded when Varnish restarts; to clean them up manually, use the varnishadm vcl.discard command.
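One way to picture how old configurations stay alive while a new one takes effect immediately is a registry with reference counts. The following is a toy model only: VclRegistry and its methods are hypothetical names, and the rule that discard refuses an active or still-referenced configuration is an assumption of this sketch, not a statement about how vcl.discard behaves.

```python
from typing import Dict, Optional


class VclRegistry:
    """Toy registry of compiled configurations, keyed by name."""

    def __init__(self) -> None:
        self._refs: Dict[str, int] = {}   # name -> requests still using it
        self.active: Optional[str] = None

    def load_and_use(self, name: str) -> None:
        """Register a newly compiled config and make it active for new requests."""
        self._refs.setdefault(name, 0)
        self.active = name

    def begin_request(self) -> str:
        """A new request pins whichever config is active at this moment."""
        if self.active is None:
            raise RuntimeError("no configuration loaded")
        self._refs[self.active] += 1
        return self.active

    def end_request(self, name: str) -> None:
        """Release the reference a request took in begin_request."""
        self._refs[name] -= 1

    def discard(self, name: str) -> bool:
        """Drop an old config; refuse if it is active or still referenced."""
        if name == self.active or self._refs.get(name, 0) > 0:
            return False
        return self._refs.pop(name, None) is not None
```

A request that started under the old configuration keeps it pinned until end_request is called, while every request started after load_and_use sees the new one.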

Varnish Back-End Storage

Varnish supports several types of back-end storage, selected with the -s option when varnishd starts. The storage types are:
(1) file: stores all cached data in a single file and maps the whole cache file into memory with the operating system's mmap() system call (if conditions permit);
(2) malloc: uses the malloc() library call to request a specified amount of memory from the operating system at Varnish startup and stores cached objects there;
(3) persistent (experimental): works like file but persists the data (that is, the data is not purged when Varnish restarts); it is still in the testing phase.

Varnish cannot track whether a cached object has been stored in the cache file, so it cannot tell whether the cache file on disk is usable; for this reason the file storage method clears its data when Varnish stops or restarts. The persistent method was introduced to make up for this, but it is still experimental. For example, it cannot yet handle the case where the total size of the cached objects exceeds the cache space, so it is only suitable for scenarios with a very large cache space.

Choosing an appropriate storage method helps system performance. As a rule of thumb, use malloc when memory is large enough to hold all cached objects; otherwise, file storage performs better. Note, however, that varnishd actually uses more space than the size given with the -s option: in general it needs a little over 1K of additional storage per cached object, so with 1 million cached objects the space used exceeds the specified size by about 1G. In addition, Varnish itself takes up a small amount of memory for its own data structures.
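The overhead estimate above is simple arithmetic, shown here as a small helper. The function name and the exact figure of 1024 bytes per object are assumptions for illustration; the text only says the overhead is somewhat more than 1K, so the result should be read as a lower bound.

```python
KIB = 1024  # bytes; stands in for the "1K" per-object overhead mentioned above


def min_footprint(storage_bytes: int, n_objects: int,
                  per_object_overhead: int = KIB) -> int:
    """Lower bound on space used: configured storage plus per-object overhead."""
    return storage_bytes + n_objects * per_object_overhead


def overhead_gib(n_objects: int, per_object_overhead: int = KIB) -> float:
    """Per-object overhead alone, expressed in GiB."""
    return n_objects * per_object_overhead / 1024 ** 3
```

For 1 million objects the overhead alone is 1,024,000,000 bytes, a little under 1 GiB, which matches the "about 1G" figure; a cache configured with 4 GiB of storage would therefore need at least roughly 5 GiB in that scenario.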

When specifying the cache type for varnishd, the -s option accepts the following parameter formats:
malloc[,size] or
file[,path[,size[,granularity]]] or
persistent,path,size {experimental}

The granularity parameter of file sets the allocation unit of the cache space, expressed in bytes by default; all other sizes are rounded to a multiple of this unit.
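To make the three parameter formats concrete, here is a small parser that splits such a string into its fields. It is an illustration of the grammar listed above, not varnishd's own parser: StorageSpec and parse_storage_spec are hypothetical names, and sizes are kept as plain strings because the accepted size syntax is not described here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StorageSpec:
    kind: str
    path: Optional[str] = None
    size: Optional[str] = None
    granularity: Optional[str] = None


def parse_storage_spec(spec: str) -> StorageSpec:
    """Split a -s style argument into fields according to the formats above."""
    kind, *args = spec.split(",")
    if kind == "malloc":
        # malloc[,size]
        if len(args) > 1:
            raise ValueError("malloc accepts at most a size")
        return StorageSpec(kind, size=args[0] if args else None)
    if kind == "file":
        # file[,path[,size[,granularity]]]
        if len(args) > 3:
            raise ValueError("file accepts at most path, size, granularity")
        padded: List[Optional[str]] = list(args) + [None] * (3 - len(args))
        return StorageSpec(kind, path=padded[0], size=padded[1],
                           granularity=padded[2])
    if kind == "persistent":
        # persistent,path,size (both fields required)
        if len(args) != 2:
            raise ValueError("persistent requires path and size")
        return StorageSpec(kind, path=args[0], size=args[1])
    raise ValueError(f"unknown storage type: {kind}")
```

For instance, parse_storage_spec("malloc,256m") yields a spec with only the size set, while "persistent" without both a path and a size is rejected. The path and size values used in such calls are made-up examples.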

Varnish State Engine

VCL lets the administrator define the caching policy; the Varnish management process parses the defined policy, converts it to C code, compiles it into a binary, and links it into the child process. Varnish has several so-called states, and the policies defined in VCL attach to these states to carry out the corresponding cache processing. For this reason VCL is often called a "domain-specific" language or a state engine, where "domain-specific" means that some data is available only in a particular state.

1. VCL State Engine

In the VCL state engine, the states are related to one another but isolated from each other; each state uses return(x) to exit the current state and tell Varnish which state to enter next.

When Varnish starts processing a request, it first parses the HTTP request itself, for example reading the request method from the headers and verifying that it is a valid HTTP request. Once this basic analysis is done, the first decision has to be made: whether Varnish should look up the requested resource in the cache. That decision is implemented in VCL, specifically by the vcl_recv subroutine. If the administrator has not defined a custom vcl_recv, Varnish runs the default vcl_recv. Even when a custom vcl_recv exists, the default vcl_recv still runs whenever the custom one finishes without a terminating statement. In fact, it is strongly recommended to let Varnish run the default vcl_recv so that it can catch problems the custom vcl_recv does not handle.
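The fall-through from a custom vcl_recv to the default one can be modeled in a few lines. This is a sketch of the control flow only: run_recv, default_recv, and custom_recv are hypothetical Python functions, requests are plain dictionaries, and default_recv is a simplified stand-in rather than the real built-in logic.

```python
from typing import Callable, Dict, Optional

Request = Dict[str, str]
Recv = Callable[[Request], Optional[str]]


def default_recv(req: Request) -> str:
    """Simplified stand-in for the built-in vcl_recv; always terminates."""
    if req.get("method") not in ("GET", "HEAD"):
        return "pass"
    if "cookie" in req or "authorization" in req:
        return "pass"
    return "lookup"


def custom_recv(req: Request) -> Optional[str]:
    """Example custom policy: bypass the cache for a hypothetical /admin area."""
    if req.get("url", "").startswith("/admin"):
        return "pass"      # terminating statement: the default is skipped
    return None            # no terminating statement: fall through


def run_recv(req: Request, custom: Optional[Recv] = None) -> str:
    """Run the custom subroutine first; use the default if it did not terminate."""
    if custom is not None:
        action = custom(req)
        if action is not None:
            return action
    return default_recv(req)
```

A request for /admin/users is decided by the custom code alone, whereas a POST to /form falls through and is handled by the default, which is the safety net the text recommends keeping.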

2. VCL syntax

The design of VCL borrows from the C and Perl languages, so it is easy to understand for anyone with C or Perl programming experience. Its basic syntax is as follows:
(1) //, #, or /* comment */ for comments
(2) sub $name defines a function
(3) Loops are not supported; built-in variables are provided
(4) Termination statements are used; there are no return values
(5) Domain-specific
(6) Operators: = (assignment), == (equality comparison), ~ (pattern matching), ! (negation), && (logical AND), || (logical OR)

A VCL function accepts no parameters and has no return value, so it is not a function in the true sense; this also means that data passed around inside VCL has to be carried in HTTP headers. The return statement in VCL hands control from the VCL state engine back to Varnish rather than to a calling function, which is why VCL has only termination statements and no return values. For each domain, one or more termination statements can be defined to tell Varnish what to do next, for example whether or not to query the cache.

3. Built-in Functions of VCL

VCL provides several functions to implement string modifications, add bans, restart the VCL state engine, and turn control back to varnish.

regsub(str,regex,sub)
regsuball(str,regex,sub): both search str for matches of the regular expression regex and replace them with sub; regsuball() replaces every match, whereas regsub() replaces only the first one;
ban(expression): bans all cached objects that match the expression;
ban_url(regex): bans all cached objects whose URLs match the regex;
purge: removes the selected object and its associated variants from the cache; this can be triggered through the HTTP PURGE method;
hash_data(str): adds a string to the hash input;
return(): hands control back to Varnish at the end of a VCL domain and tells Varnish what to do next; the possible return instructions include lookup, pass, pipe, hit_for_pass, fetch, deliver, and hash, but a given domain may return only some of them, not all of those listed;
return(restart): reruns the entire VCL, that is, starts processing again from vcl_recv; each restart increments the req.restarts variable, and the max_restarts parameter limits the maximum number of restarts.
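Two of the behaviors in the list above are easy to show in miniature: the first-match versus all-matches difference between regsub() and regsuball(), and the restart counter bounded by max_restarts. The Python functions below are analogues written for illustration, not Varnish code. Python's re module differs from the regex engine Varnish uses in some details, the max_restarts value is chosen arbitrarily, and returning "error" once the limit is reached is an assumption of the sketch.

```python
import re
from typing import Callable, Tuple


def regsub(s: str, regex: str, sub: str) -> str:
    """Replace only the first match of regex in s."""
    return re.sub(regex, sub, s, count=1)


def regsuball(s: str, regex: str, sub: str) -> str:
    """Replace every match of regex in s."""
    return re.sub(regex, sub, s)


def process_with_restarts(handle: Callable[[int], str],
                          max_restarts: int = 4) -> Tuple[str, int]:
    """Run handle repeatedly while it asks for a restart, up to max_restarts.

    handle receives the current restart count (the analogue of req.restarts)
    and returns an action name such as "deliver" or "restart".
    """
    restarts = 0
    while True:
        action = handle(restarts)
        if action != "restart":
            return action, restarts
        if restarts >= max_restarts:
            # Assumed behavior: give up with an error once the limit is hit.
            return "error", restarts
        restarts += 1
```

Applied to the string "a//b//c" with the pattern "//" and replacement "/", regsub gives "a/b//c" and regsuball gives "a/b/c". A handler that asks for a restart on its first two passes and then delivers ends with a restart count of 2.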

This article is from the "Linux Learning path" blog, so be sure to keep this source http://xslwahaha.blog.51cto.com/4738972/1631392
