From: http://bbs.chinaunix.net/viewthread.php?tid=575138
Preface:
I maintain several Squid servers at work and have read Duane Wessels's book many times (he is also the creator of Squid). Its original title is "Squid: The Definitive Guide", published by O'Reilly. I translated it into Chinese in my spare time, hoping to help Chinese Squid users. For ordinary Internet users, Squid can serve as a proxy server, while for large sites such as Sina and Netease it acts as a web accelerator. It performs both roles exceptionally well. The open-source world is as beautiful as a starry sky, and Squid is one of its dazzling stars.
Please contact me if you have any questions about this translation. My email is yonghua_peng@yahoo.com.cn. Peng Yonghua
--------------------------------------------------------------------------------------
7. Disk Cache Basics
7.1 The cache_dir Directive
The cache_dir directive is one of the most important in the squid.conf configuration file. It tells Squid where and how to store cache files on disk. The cache_dir directive takes the following arguments:
cache_dir scheme directory size L1 L2 [options]
7.1.1 Parameter: scheme
Squid supports a number of different storage schemes. The default (and original) is ufs. Depending on your operating system, you may be able to choose other schemes. You must use the --enable-storeio=LIST option with ./configure to compile in the code for additional storage schemes. I discuss aufs, diskd, coss, and null in Chapter 8.7. For now I discuss only the ufs scheme, which is compatible with aufs and diskd.
7.1.2 Parameter: directory
This parameter is a filesystem directory in which Squid stores cached object files. Normally, a cache_dir corresponds to an entire filesystem or disk partition. Squid does not really care if you put more than one cache directory on a single filesystem partition. However, I recommend configuring only one cache directory per physical disk. For example, if you have two unused disks, you can do this:
# newfs /dev/da1d
# newfs /dev/da2d
# mount /dev/da1d /cache0
# mount /dev/da2d /cache1
Then add these lines to squid.conf:
cache_dir ufs /cache0 7000 16 256
cache_dir ufs /cache1 7000 16 256
If you do not have a spare hard disk, you can use an existing filesystem partition. Choose one with plenty of free space, such as /usr or /var, and create a new directory under it. For example:
# mkdir /var/squidcache
Then add this line to squid.conf:
cache_dir ufs /var/squidcache 7000 16 256
7.1.3 Parameter: size
This parameter specifies the size of the cache directory in megabytes. It is the upper limit on the disk space that Squid can use for this cache_dir. Calculating a reasonable value can be difficult. You must leave enough free space for temporary files and the swap.state log (see Chapter 13.6). I recommend mounting the empty filesystem and running df:
% df -k
Filesystem  1K-blocks     Used    Avail Capacity  Mounted on
/dev/da1d     3037766        8  2794737     0%    /cache0
/dev/da2d     3037766        8  2794737     0%    /cache1
Here you can see that each filesystem has about 2790 MB of available space. Remember that UFS reserves some minimum free space, about 8% here, which is why Squid cannot use the full 3040 MB.
You might be tempted to allocate all 2790 MB to cache_dir. This might work if the cache is not very busy and you rotate the logs often. To be safe, however, I recommend holding back about 10% of the space. This extra space holds Squid's swap.state file and temporary files.
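The 10% rule of thumb is easy to turn into arithmetic. The following Python sketch is only an illustration: the function name suggest_cache_dir_mb is hypothetical, and it assumes that 1 MB means 1024 KB when converting the df figure into a cache_dir size.

```python
def suggest_cache_dir_mb(avail_kb: int, reserve_fraction: float = 0.10) -> int:
    """Return a conservative cache_dir size in MB for one filesystem.

    avail_kb is the "Avail" column reported by `df -k`; reserve_fraction is
    the share held back for swap.state and temporary files.
    """
    usable_kb = avail_kb * (1.0 - reserve_fraction)
    return int(usable_kb // 1024)  # assumption: 1 MB = 1024 KB


# "Avail" value for /cache0 in the df output above
print(suggest_cache_dir_mb(2794737))  # prints 2456
```

Starting from a value like this, or even lower, and raising it later is safer than starting too high.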
Note that the cache_swap_low directive also affects how much space Squid uses. I discuss the low and high watermarks in Chapter 7.2.
The bottom line is that you should start with a conservative estimate of the cache_dir size. Set cache_dir to a smaller value and let the cache fill up. After Squid has run for a while with full cache directories, you will be in a better position to re-evaluate the size settings. If you have plenty of free space, you can easily increase the cache directory size.
7.1.3.1 Inodes
Inodes are fundamental building blocks of Unix filesystems. They hold information about disk files, such as permissions, owner, size, and timestamps. If your filesystem runs out of inodes, you cannot create new files, even if free space is available. Running out of inodes is bad, so before running Squid you should make sure you have enough of them.
Programs that create new filesystems (such as newfs or mkfs) reserve a number of inodes based on the total size. These programs usually let you set the ratio of disk space to inodes. For example, see the -i option in the newfs and mkfs manual pages. The ratio of disk space to inodes determines the mean file size that the filesystem can actually support. Most Unix systems create one inode for every 4 KB, which is usually sufficient for Squid. Research shows that, for most caching proxies, the mean file size is about 10 KB. You may be able to get away with 8 KB per inode, but it is risky.
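To see why 4 KB per inode is comfortable and 8 KB is tight, compare the number of inodes a filesystem gets with the number of files it would hold if it filled up with 10 KB objects. This is a rough Python sketch; inode_margin is a hypothetical helper, and it ignores filesystem overhead and reserved space.

```python
def inode_margin(fs_kb: int, kb_per_inode: float, mean_file_kb: float = 10.0) -> float:
    """Return inodes available per expected cache file.

    A value well above 1 leaves headroom; a value near or below 1 means the
    filesystem may run out of inodes before it runs out of space.
    """
    inodes = fs_kb / kb_per_inode          # inodes created by newfs/mkfs
    expected_files = fs_kb / mean_file_kb  # files stored if the disk fills up
    return inodes / expected_files


# A filesystem of 3037766 1K-blocks, as in the df example
for kb_per_inode in (4, 8, 16):
    print(kb_per_inode, inode_margin(3037766, kb_per_inode))
```

With 4 KB per inode there are about 2.5 inodes for every expected file; with 8 KB the margin shrinks to about 1.25, which leaves little room if the mean object size turns out to be smaller than 10 KB.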
You can monitor your system's inodes with the df -i command. For example:
% df -ik
Filesystem   1K-blocks     Used    Avail Capacity   iused    ifree %iused  Mounted on
/dev/ad0s1a     197951    57114   125001    31%      1413    52345     3%  /
/dev/ad0s1f    5004533  2352120  2252051    51%    129175  1084263    11%  /usr
/dev/ad0s1e     396895     6786   358358     2%       205    99633     0%  /var
/dev/da0d      8533292  7222148   628481    92%    430894   539184    44%  /cache1
/dev/da1d      8533292  7181645   668984    91%    430272   539806    44%  /cache2
/dev/da2d      8533292  7198600   652029    92%    434726   535352    45%  /cache3
/dev/da3d      8533292  7208948   641681    92%    427866   542212    44%  /cache4
As long as inode usage (%iused) is lower than space usage (capacity), you are in good shape. Unfortunately, you cannot add more inodes to an existing filesystem. If you find that you are running out of inodes, you must stop Squid and recreate the filesystem. If you are not willing to do that, reduce the cache_dir size instead.
7.1.3.2 The Relationship Between Disk Space and Process Size
Squid's disk space usage directly affects its memory usage. Every object on disk requires a small amount of memory, which Squid uses to index the on-disk data. If you add a new cache directory or otherwise increase the disk cache size, make sure you have enough free memory. Squid's performance degrades sharply if its process size reaches or exceeds the system's physical memory capacity.
Each object in Squid's cache directories takes either 76 or 112 bytes of memory, depending on your system. The memory is allocated as StoreEntry, MD5 digest, and LRU policy node structures. Small-pointer (i.e., 32-bit) systems, such as those based on the Intel Pentium, take 76 bytes. On CPUs with 64-bit pointers, each object takes 112 bytes. You can find out how much memory these structures use on your system by viewing the memory utilization page of the cache manager (see Chapter 14.2.1.2).
Unfortunately, it is difficult to predict precisely how much additional memory a given amount of disk space requires. It depends on the mean response size, which fluctuates over time. In addition, Squid allocates memory for many other data structures and purposes. Do not assume that your estimates are correct. Always monitor Squid's process size and, if necessary, consider reducing the cache size.
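As a rough first estimate only, the per-object figures can be multiplied out. The Python sketch below assumes a mean object size of 10 KB and treats 1 MB as 1000 KB for simplicity; index_memory_mb is a hypothetical name, and the result covers only the index structures, not Squid's other memory use.

```python
def index_memory_mb(cache_mb: int, mean_object_kb: float = 10.0,
                    bytes_per_object: int = 76) -> float:
    """Estimate the memory (in MB) needed to index a disk cache of cache_mb."""
    objects = cache_mb * 1000 / mean_object_kb  # assumption: 1 MB ~ 1000 KB
    return objects * bytes_per_object / 1_000_000


print(index_memory_mb(7000))                        # 32-bit pointers
print(index_memory_mb(7000, bytes_per_object=112))  # 64-bit pointers
```

For a 7000 MB cache_dir this comes to roughly 53 MB on a 32-bit system and roughly 78 MB on a 64-bit one, per cache directory. Treat the result as a lower bound and keep watching the real process size.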
7.1.4 Parameters: L1 and L2
For the ufs, aufs, and diskd schemes, Squid creates a two-level directory tree under the cache directory. The L1 and L2 parameters specify the number of first-level and second-level directories. The defaults are 16 and 256. Figure 7-1 shows the filesystem structure.
Figure 7-1. Cache directory structure for the ufs-based storage schemes
Some people think that Squid performs better or worse depending on the particular values of L1 and L2. It seems intuitive that small directories can be searched faster than large ones. If so, L1 and L2 should be large enough that each L2 directory holds relatively few files.
For example, if your cache directory stores 7000 MB and the mean file size is 10 KB, you can store about 700,000 files in this cache_dir. With 16 L1 and 256 L2 directories, there are 4096 second-level directories in total. Dividing 700,000 by 4096 leaves about 170 files in each second-level directory.
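The same arithmetic can be repeated for other sizes and L1/L2 values. A minimal Python sketch, using the same rounding as the text (1 MB taken as 1000 KB); files_per_l2_dir is a hypothetical helper name.

```python
def files_per_l2_dir(cache_mb: int, mean_file_kb: int = 10,
                     l1: int = 16, l2: int = 256) -> float:
    """Return the average number of files in each second-level directory."""
    total_files = cache_mb * 1000 // mean_file_kb  # about 700,000 for 7000 MB
    return total_files / (l1 * l2)


print(int(files_per_l2_dir(7000)))               # prints 170
print(int(files_per_l2_dir(2000, l1=16, l2=32)))  # prints 390
```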
If the values of L1 and L2 are small, creating the swap directories with squid -z goes faster. Thus, if your cache is small, you may want to reduce the number of L1 and L2 directories.
Squid assigns each cached object a unique file number. This is a 32-bit integer that uniquely identifies a file on disk. Squid uses a relatively simple algorithm to convert a file number into a pathname, and this algorithm takes L1 and L2 as parameters. Thus, if you change L1 and L2, you change the mapping from file number to pathname. Changing these parameters for a non-empty cache_dir makes the existing files inaccessible. You should never change L1 and L2 once a cache directory is in use.
Squid allocates file numbers within a cache directory sequentially. The file-number-to-pathname algorithm (e.g., storeUfsDirFullPath()) maps each group of L2 consecutive files to the same second-level directory. Squid does this to take advantage of locality of reference: it makes an HTML file and its embedded images more likely to be stored in the same second-level directory. Some people expect Squid to spread cache files evenly across the second-level directories. However, when the cache is first filling up, you will find that only the first few directories contain any files. For example:
% cd /cache0; du -k
2164    ./00/00
2146    ./00/01
2689    ./00/02
1974    ./00/03
2201    ./00/04
2463    ./00/05
2724    ./00/06
3174    ./00/07
1144    ./00/08
1       ./00/09
1       ./00/0A
1       ./00/0B
This is perfectly normal and nothing to worry about.
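The mapping from file number to pathname can be sketched in a few lines of Python. The formula below is written from memory of how Squid 2.5's storeUfsDirFullPath() builds the path, so treat the exact expression as an assumption rather than a quotation of the source code; the function name ufs_full_path is hypothetical.

```python
def ufs_full_path(cache_dir: str, filn: int, l1: int = 16, l2: int = 256) -> str:
    """Map a file number to a pathname under a two-level directory tree.

    Assumed layout: <cache_dir>/<L1 dir>/<L2 dir>/<file number>, all in
    uppercase hexadecimal.
    """
    first = (filn // l2 // l2) % l1   # changes once every l2 * l2 files
    second = (filn // l2) % l2        # changes once every l2 files
    return f"{cache_dir}/{first:02X}/{second:02X}/{filn:08X}"


print(ufs_full_path("/cache0", 0))      # /cache0/00/00/00000000
print(ufs_full_path("/cache0", 256))    # /cache0/00/01/00000100
print(ufs_full_path("/cache0", 65536))  # /cache0/01/00/00010000
```

Because file numbers are handed out sequentially, the first 256 objects all land in 00/00, the next 256 in 00/01, and so on. That is why a freshly started cache fills only the first few directories, and why changing L1 or L2 makes existing files unreachable.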
7.1.5 Parameter: options
Squid has two cache_dir options that are independent of the storage scheme: the read-only flag and the max-size value.
7.1.5.1 read-only
The read-only option tells Squid to continue reading files from the cache_dir but to stop writing new objects to it. It looks like this in squid.conf:
cache_dir ufs /cache0 7000 16 256 read-only
You can use this option when you want to migrate cache files from one disk to another. If you simply add one cache_dir and remove another, Squid's hit ratio drops sharply. While the old directory is read-only, you still get cache hits from it. After some time, you can remove the read-only cache directory from the configuration file.
7.1.5.2 max-size
With this option, you specify the maximum size of objects stored in the cache directory. For example:
cache_dir ufs /cache0 7000 16 256 max-size=1048576
Note that the value is in bytes. In most cases you do not need this option. If you do use it, try to list all cache_dir lines in order of increasing max-size.
7.2 Disk Space Watermarks
The cache_swap_low and cache_swap_high directives control the replacement of objects stored on disk. Their values are percentages of the maximum cache size, which is the sum of all cache_dir sizes. For example:
cache_swap_low 90
cache_swap_high 95
As long as total disk usage is below cache_swap_low, Squid does not remove cached objects. As the cache size grows beyond that, Squid removes objects more and more aggressively. In steady state, you should find that disk usage stays close to the cache_swap_low value. You can see the current disk usage on the storedir page of the cache manager (see Chapter 14.2.1.39).
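To make the percentages concrete, the watermarks can be converted into megabytes. A minimal Python sketch, assuming two cache directories of 7000 MB each; watermarks_mb is a hypothetical helper.

```python
def watermarks_mb(cache_dir_sizes_mb, swap_low=90, swap_high=95):
    """Return (low, high) watermarks in MB for the combined cache size."""
    total = sum(cache_dir_sizes_mb)
    return total * swap_low / 100, total * swap_high / 100


low, high = watermarks_mb([7000, 7000])
print(low, high)  # prints 12600.0 13300.0
```

With these settings, Squid starts removing objects once usage passes about 12,600 MB, so roughly 1400 MB of the configured 14,000 MB is normally left unused.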
Note that changing cache_swap_high probably has little effect on Squid's disk usage. In earlier versions of Squid this parameter played a more important role; now it does not.
7.3 Object Size Limits
You can control both the maximum and minimum size of cached objects. Responses larger than maximum_object_size are not cached on disk, although they are still proxied. The reasoning behind this directive is that you do not want a single very large response to take up space that many small responses could use better. The syntax is:
maximum_object_size size-specification
Here are some examples:
maximum_object_size 100 KB
maximum_object_size 1 MB
maximum_object_size 12382 bytes
maximum_object_size 2 GB
Squid checks the response size in two different ways. If the response has a Content-Length header, Squid compares that value to maximum_object_size. If the content length is larger, the object is immediately marked uncacheable and never consumes any disk space.
Unfortunately, not every response has a Content-Length header. In that case, Squid writes the response to disk as data arrives from the origin server and checks the object size only after the response is complete. Thus, an object may keep consuming disk space as its size approaches the maximum_object_size limit. However, the total cache size is incremented only after Squid has finished reading the response.
In other words, active (in-transit) objects do not count toward the cache size value that Squid maintains internally. This is good because it means Squid does not remove other objects from the cache unless the new object turns out to be cacheable and adds to the total cache size. It is also bad, however, because Squid may run out of free disk space if the response is very large. To reduce the chance of this happening, you should also use the reply_body_max_size directive. A response that reaches the reply_body_max_size limit is cut off immediately.
Squid also has a minimum_object_size directive. It lets you place a lower limit on the size of cached objects. Responses smaller than this value are not cached on disk or in memory. Note that this size is compared to the response's content length (that is, the size of the reply body), which does not include the HTTP headers.
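The two ways of checking the size can be summarized as a small decision function. This Python sketch only models the behavior described in this section; the function name and the returned labels are hypothetical and do not correspond to anything in Squid's source.

```python
from typing import Optional


def disk_cache_decision(content_length: Optional[int],
                        min_size: int, max_size: int) -> str:
    """Decide what happens to a response, given its Content-Length (if any)."""
    if content_length is None:
        # No Content-Length header: the size is known only once the whole
        # response has been written, so the check happens at the end.
        return "store-then-check"
    if content_length > max_size or content_length < min_size:
        return "do-not-cache"  # rejected up front, no disk space used
    return "cache"


print(disk_cache_decision(5_000_000, 0, 1_048_576))  # do-not-cache
print(disk_cache_decision(None, 0, 1_048_576))       # store-then-check
print(disk_cache_decision(20_000, 0, 1_048_576))     # cache
```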
7.4 Allocating Objects to Cache Directories
When Squid wants to store a cacheable response on disk, it calls a function that selects one of the cache directories. It then opens a disk file for writing in the selected directory. If the open() call fails for some reason, the response is not stored. Squid does not try to open a disk file in one of the other cache directories.
Squid has two cache_dir selection algorithms. The default is called least-load; the alternative is round-robin.
The least-load algorithm, as its name implies, selects the cache directory that currently has the smallest workload. The notion of load depends on the storage scheme. For the aufs, coss, and diskd schemes, the load is related to the number of pending operations. For ufs, the load is constant. When cache_dir loads are equal, the algorithm uses free space and maximum object size as tie-breakers.
The selection algorithm also takes the max-size and read-only options into account. Squid skips a cache directory if it knows the object size exceeds the directory's limit. It also always skips read-only directories.
The round-robin algorithm also uses load as a measure. It selects the next cache directory whose load is less than 100%, provided that the object does not exceed the directory's size limit and the directory is not read-only.
In some situations, Squid may fail to select a cache directory. This can happen if every cache_dir is overloaded, or if the object is larger than the max-size limit of every directory. In that case, Squid simply does not write the object to disk. You can use the cache manager to track the number of times Squid fails to select a cache directory: on the store_io page (Chapter 14.2.1.41), look for the create.select_fail line.
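The two selection algorithms can be modeled as follows. This is a simplified Python sketch of the behavior described above, not Squid's implementation: the CacheDir class, the numeric load scale, and the tie-break on free space alone are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CacheDir:
    path: str
    load: int                 # assumed scale: percent, 100 = fully loaded
    free_mb: int
    max_size: int = -1        # bytes; -1 means no max-size option
    read_only: bool = False


def eligible(d: CacheDir, object_size: int) -> bool:
    """Apply the read-only and max-size rules shared by both algorithms."""
    if d.read_only:
        return False
    return d.max_size < 0 or object_size <= d.max_size


def select_least_load(dirs: List[CacheDir], object_size: int) -> Optional[CacheDir]:
    candidates = [d for d in dirs if eligible(d, object_size)]
    if not candidates:
        return None  # corresponds to a create.select_fail event
    # Lowest load wins; more free space breaks ties (simplified tie-break).
    return min(candidates, key=lambda d: (d.load, -d.free_mb))


class RoundRobin:
    def __init__(self, dirs: List[CacheDir]):
        self.dirs = dirs
        self.next_index = 0

    def select(self, object_size: int) -> Optional[CacheDir]:
        for offset in range(len(self.dirs)):
            i = (self.next_index + offset) % len(self.dirs)
            d = self.dirs[i]
            if eligible(d, object_size) and d.load < 100:
                self.next_index = (i + 1) % len(self.dirs)
                return d
        return None


dirs = [
    CacheDir("/cache0", load=10, free_mb=500),
    CacheDir("/cache1", load=10, free_mb=900),
    CacheDir("/cache2", load=0, free_mb=100, read_only=True),
    CacheDir("/cache3", load=50, free_mb=100, max_size=1000),
]
print(select_least_load(dirs, 5000).path)  # /cache1
```

In this example /cache2 is skipped because it is read-only and /cache3 because the 5000-byte object exceeds its max-size, so the choice falls to the less-full of the two equally loaded directories.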
7.5 Replacement Policies
The cache_replacement_policy directive controls the replacement policy for Squid's disk cache. Squid 2.5 offers three different replacement policies: least recently used (LRU), greedy dual-size frequency (GDSF), and least frequently used with dynamic aging (LFUDA).
LRU is the default policy, not only for Squid but for most other caching products as well. LRU is a popular choice because it is easy to implement and performs quite well. On 32-bit systems, LRU uses slightly less memory than the others (12 versus 16 bytes per object). On 64-bit systems, all policies use 24 bytes per object.
Over the years, many researchers have proposed alternatives to LRU. These other policies are typically designed to optimize one specific characteristic of the cache, such as response time, hit ratio, or byte hit ratio. Although the research usually shows an improvement, the results can be misleading. Some studies use unrealistically small cache sizes. Other studies show that the choice of replacement policy matters less as the cache size increases.
If you want to use the GDSF or LFUDA policy, you must use the --enable-removal-policies option with ./configure (see Chapter 3.4.1). Martin Arlitt and John Dilley of HP Labs wrote the GDSF and LFUDA algorithms for Squid. You can read their paper online:
http://www.hpl.hp.com/techreports/1999/HPL-1999-69.html
I also discuss these algorithms in the book "Web Caching", published by O'Reilly.
The cache_replacement_policy directive is unusual in one important way. Unlike most other squid.conf directives, its position in the file matters. The value of cache_replacement_policy is used at the moment Squid parses a cache_dir directive. By setting the replacement policy before a cache_dir line, you choose the policy for that cache directory. For example:
cache_replacement_policy lru
cache_dir ufs /cache0 2000 16 32
cache_dir ufs /cache1 2000 16 32
cache_replacement_policy heap GDSF
cache_dir ufs /cache2 2000 16 32
cache_dir ufs /cache3 2000 16 32
In this case, the first two cache directories use the LRU replacement policy, and the last two use GDSF. Keep this property of the replacement policy directive in mind if you ever use the config option of the cache manager (see Chapter 14.2.1.7). The cache manager outputs only the last replacement policy value and places it before all of the cache directories. For example, you may have these lines in squid.conf:
cache_replacement_policy heap GDSF
cache_dir ufs /tmp/cache1 10 4 4
cache_replacement_policy lru
cache_dir ufs /tmp/cache2 10 4 4
But when you select config from the cache manager, you get:
cache_replacement_policy lru
cache_dir ufs /tmp/cache1 10 4 4
cache_dir ufs /tmp/cache2 10 4 4
As you can see, the heap GDSF setting for the first cache directory has been lost.
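Since the cache manager output cannot be trusted here, one way to see which policy each cache directory really gets is to read squid.conf in order, the way the text says Squid does. The Python sketch below is a hypothetical helper, not a Squid tool; it handles only the two directives involved and assumes lru when no policy has been set yet.

```python
def policies_by_cache_dir(conf_text: str, default_policy: str = "lru") -> dict:
    """Return {cache directory: replacement policy in effect when it was parsed}."""
    current = default_policy
    result = {}
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if fields[0] == "cache_replacement_policy":
            current = " ".join(fields[1:])
        elif fields[0] == "cache_dir" and len(fields) >= 3:
            result[fields[2]] = current      # fields[2] is the directory
    return result


conf = """
cache_replacement_policy heap GDSF
cache_dir ufs /tmp/cache1 10 4 4
cache_replacement_policy lru
cache_dir ufs /tmp/cache2 10 4 4
"""
print(policies_by_cache_dir(conf))
# {'/tmp/cache1': 'heap GDSF', '/tmp/cache2': 'lru'}
```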
7.6 Removing Cached Objects
At some point, you may find it necessary to manually remove one or more objects from Squid's cache. This might happen if:
+ Your users complain that they always receive stale data;
+ Your cache becomes "poisoned" by a bad response;
+ Squid's cache index becomes corrupted after disk I/O errors or frequent crashes and restarts;
+ You want to remove some large objects to free up space for new data;
+ Squid was caching responses from local servers, and now you no longer want it to.
Some of these problems can be solved by forcing a reload in the web browser. However, this is not always reliable. For example, some browsers launch external programs to display certain content types, and such a program may not have a reload button or even know about caches.
If necessary, you can always use the squidclient program to reload a cached object. Simply insert the -r option before the URI:
% squidclient -r http://www.lrrr.org/junk > /tmp/foo
If you have set the ignore-reload option in a refresh_pattern directive, you and your users cannot force a cached response to be refreshed. In that case, you are better off purging the offending cached objects.
7.6.1 Removing Individual Objects
Squid accepts a custom request method for removing cached objects. The PURGE method is not one of the official HTTP request methods. It is different from DELETE, which Squid forwards to the origin server. A PURGE request asks Squid to remove the object given in the URI. Squid returns either 200 (OK) or 404 (Not Found).
The PURGE method is somewhat dangerous because it removes cached objects. Squid disables the PURGE method unless you define an ACL for it. Normally, you should allow PURGE requests only from localhost and perhaps a small number of trusted hosts. The configuration may look like this:
acl AdminBoxes src 127.0.0.1 172.16.0.1 192.168.0.1
acl Purge method PURGE
http_access allow AdminBoxes Purge
http_access deny Purge
The squidclient program provides an easy way to generate PURGE requests:
% squidclient -m PURGE http://www.lrrr.org/junk
Alternatively, you can use some other tool (such as a Perl script) to generate your own HTTP request. It can be very simple:
PURGE http://www.lrrr.org/junk HTTP/1.0
Accept: */*
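The text mentions Perl; the same request can be produced in a few lines of Python. This is a minimal sketch under stated assumptions: the proxy address 127.0.0.1:3128 and the function names are illustrative, and the client host must be permitted by the PURGE ACL for the request to succeed.

```python
import socket


def build_purge_request(uri: str) -> bytes:
    """Build the raw PURGE request shown above, terminated by a blank line."""
    return f"PURGE {uri} HTTP/1.0\r\nAccept: */*\r\n\r\n".encode("ascii")


def send_purge(uri: str, proxy_host: str = "127.0.0.1",
               proxy_port: int = 3128, timeout: float = 10.0) -> str:
    """Send a PURGE request to the proxy and return the reply's status line."""
    with socket.create_connection((proxy_host, proxy_port), timeout=timeout) as sock:
        sock.sendall(build_purge_request(uri))
        reply = b""
        while b"\r\n" not in reply:
            chunk = sock.recv(4096)
            if not chunk:
                break
            reply += chunk
    return reply.split(b"\r\n", 1)[0].decode("latin-1")


print(build_purge_request("http://www.lrrr.org/junk").decode("ascii"))
```

Calling send_purge("http://www.lrrr.org/junk") against a running Squid should return a status line with 200 if the object was removed or 404 if it was not in the cache.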
Note that a URI alone does not uniquely identify a cached response. Squid also uses the original request method in the cache key. It may use other request headers as well if the response has a Vary header. When you issue a PURGE request, Squid looks for cached objects that were originally requested with the GET and HEAD methods. In addition, Squid removes all variants of a response, unless you select a specific variant by including the appropriate headers in the PURGE request. Squid removes only the variants for GET and HEAD requests.
7.6.2 Removing a Group of Objects
Unfortunately, Squid does not provide a good mechanism for removing a group of objects at once. This need usually arises when someone wants to remove all objects belonging to a certain origin server.
Squid lacks this feature for a couple of reasons. First, Squid would have to perform a linear search through all cached objects, which is CPU-intensive and takes a long time. While Squid is searching, your users may experience degraded performance. Second, Squid keeps MD5 hashes of URIs, rather than the URIs themselves, in memory. MD5 is a one-way hash, which means, for example, that you cannot tell whether a given MD5 hash was generated from a URI containing the string "www.example.com". The only way to know is to recompute the MD5 value from the original URI and see whether it matches. Because Squid does not keep the original URI, it cannot perform this computation.
So what can you do?
You can use the data in access.log to get a list of URIs that might be in the cache. Then feed them to squidclient, or another tool, to generate PURGE requests. For example:
% awk '{print $7}' /usr/local/squid/var/logs/access.log \
| grep www.example.com \
| xargs -n 1 squidclient -m PURGE
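The pipeline purges the same URI once for every log line that mentions it. If that is a concern, the list can be deduplicated first. The Python sketch below assumes Squid's native access.log format, in which the URI is the seventh whitespace-separated field (the same assumption the awk command makes); uris_to_purge is a hypothetical helper and the sample log lines are made up for illustration.

```python
def uris_to_purge(log_lines, needle):
    """Return the unique URIs (field 7) containing needle, in first-seen order."""
    seen = set()
    uris = []
    for line in log_lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # skip malformed or truncated lines
        uri = fields[6]
        if needle in uri and uri not in seen:
            seen.add(uri)
            uris.append(uri)
    return uris


sample = [
    "1066037222.011 126 192.0.2.1 TCP_MISS/200 4615 GET http://www.example.com/a.html - DIRECT/192.0.2.80 text/html",
    "1066037223.500 3 192.0.2.1 TCP_HIT/200 4615 GET http://www.example.com/a.html - NONE/- text/html",
    "1066037224.100 90 192.0.2.2 TCP_MISS/200 812 GET http://www.lrrr.org/junk - DIRECT/192.0.2.81 text/plain",
]
print(uris_to_purge(sample, "www.example.com"))
# ['http://www.example.com/a.html']
```

Each URI in the result can then be passed to squidclient -m PURGE.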
7.6.3 Removing All Objects
In extreme circumstances, you may need to wipe out the entire cache, or at least one of the cache directories. First, you must make sure that Squid is not running.
One of the easiest ways to make Squid forget about all cached objects is to overwrite the swap.state file. Note that you cannot simply remove the swap.state file, because Squid then scans the cache directories and opens all the object files. You also cannot simply truncate swap.state to a zero-length file. Instead, you should put a single byte there, like this:
# echo '' > /usr/local/squid/var/cache/swap.state
When Squid reads the swap.state file, it gets an error because the record it finds there is too short. The next read reaches the end of the file, and Squid completes the rebuild procedure without loading any object metadata.
Note that this technique does not remove the cache files from disk. You have only tricked Squid into thinking that its cache is empty. As Squid runs, it adds new files to the cache and may overwrite the old ones. In some cases, this may cause your disk to run out of free space. If that happens, you must remove the old files before restarting Squid again.
One way to remove cache files is with rm. However, it often takes a very long time to remove all the files that Squid has created. To get Squid running again quickly, you can rename the old cache directory, create a new one, start Squid, and remove the old directory in the background. For example:
# squid -k shutdown
# cd /usr/local/squid/var
# mv cache oldcache
# mkdir cache
# chown nobody:nobody cache
# squid -z
# squid -s
# rm -rf oldcache &
Another technique is simply to run newfs (or mkfs) on the cache filesystem. This works only if your cache_dir uses an entire disk partition.
7.7 refresh_pattern
The refresh_pattern directive controls the disk cache only indirectly. It helps Squid decide whether a given request can be a cache hit or must be treated as a cache miss. Liberal settings increase your cache hit ratio but also increase the chance that users receive stale responses. Conservative settings, on the other hand, decrease both the hit ratio and the number of stale responses.
The refresh_pattern rules apply only to responses without an explicit expiration time. Origin servers can specify an expiration time with either the Expires header or the Cache-Control: max-age directive.
You can put any number of refresh_pattern lines in the configuration file. Squid searches them in order for a regular expression that matches the URI. When Squid finds a match, it uses the corresponding values to determine whether a cached response is fresh or stale. The refresh_pattern syntax is:
refresh_pattern [-i] regexp min percent max [options]
For example:
refresh_pattern -i \.jpg$ 30 50% 4320 reload-into-ims
refresh_pattern -i \.png$ 30 50% 4320 reload-into-ims
refresh_pattern -i \.htm$ 0 20% 1440
refresh_pattern -i \.html$ 0 20% 1440
refresh_pattern -i . 5 25% 2880
The regexp parameter is a regular expression that is normally case-sensitive. You can make it case-insensitive with the -i option. Squid checks the refresh_pattern lines in order and stops searching as soon as one of the regular expressions matches the URI.
The min parameter is a number of minutes. It is essentially a lower bound on stale responses: a response cannot be stale unless its time in the cache exceeds the min value. Similarly, max (also in minutes) is an upper bound on fresh responses: a response whose time in the cache exceeds the max value must be refreshed.
Responses whose age falls between min and max are subject to Squid's last-modified factor (LM-factor) algorithm. For such responses, Squid calculates the response age and the LM-factor, and then compares the LM-factor to the percent value. The response age is simply the amount of time that has passed since the origin server generated, or last validated, the response. The resource age is the difference between the Last-Modified and Date headers. The LM-factor is the ratio of the response age to the resource age.
Figure 7-2 demonstrates the LM-factor algorithm. Squid caches an object that is three hours old (based on the Date and Last-Modified headers). With a percent value of 50%, the response stays fresh for the next 1.5 hours, after which the object expires and is considered stale. If a user requests the cached object during the fresh period, Squid returns an unvalidated cache hit. For a request that arrives during the stale period, Squid forwards a validation request to the origin server.
Figure 7-2. Calculating expiration times based on LM-factor
It is important to understand the order in which Squid checks the various values. Here is a simplified description of Squid's refresh_pattern algorithm:
+ The response is stale if the response age is greater than the refresh_pattern max value;
+ The response is fresh if the LM-factor is less than the refresh_pattern percent value;
+ The response is fresh if the response age is less than the refresh_pattern min value;
+ Otherwise, the response is stale.
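The four rules translate almost line for line into code. The Python sketch below follows the simplified description only: it covers responses without an explicit expiration time, ignores the refresh_pattern options, and uses a hypothetical function name. All times are in minutes.

```python
from typing import Optional


def is_fresh(response_age: float, resource_age: Optional[float],
             min_minutes: float, percent: float, max_minutes: float) -> bool:
    """Apply the simplified refresh_pattern rules in the order Squid checks them.

    resource_age is Date minus Last-Modified; pass None when the response
    has no Last-Modified header, in which case the LM-factor rule is skipped.
    """
    if response_age > max_minutes:
        return False                              # rule 1: older than max
    if resource_age is not None and resource_age > 0:
        lm_factor = response_age / resource_age
        if lm_factor < percent / 100.0:
            return True                           # rule 2: LM-factor below percent
    if response_age < min_minutes:
        return True                               # rule 3: younger than min
    return False                                  # rule 4: otherwise stale


# An object that was 3 hours (180 minutes) old when cached, with percent = 50%
print(is_fresh(60, 180, 0, 50, 1440))   # True: 1 hour in cache
print(is_fresh(100, 180, 0, 50, 1440))  # False: past the 1.5 hour mark
```

With min set to 0 and percent set to 50%, the response changes from fresh to stale once it has spent 90 minutes in the cache, that is, half of the 3-hour resource age.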
The refresh_pattern directive also has a handful of options that cause Squid to violate the HTTP protocol specification. They are as follows:
override-expire
This option causes Squid to check the min value before checking the Expires header. Thus, a non-zero min time makes Squid return an unvalidated cache hit even if the response is supposed to have expired.
override-lastmod
This option causes Squid to check the min value before the LM-factor percentage.
reload-into-ims
This option makes Squid turn a request with a no-cache directive into a validation request. In other words, Squid adds an If-Modified-Since header to the request before forwarding it. Note that this works only for objects that have a Last-Modified timestamp. The forwarded request retains the no-cache directive, so that it reaches the origin server.
ignore-reload
This option causes Squid to ignore any no-cache directive in the request.