Web cache acceleration based on reverse proxies: design of a cacheable CMS


For a web site with millions of daily visits, speed quickly becomes a bottleneck. Beyond optimizing the content publishing application itself, a significant speedup comes from publishing dynamic pages that do not need real-time updates as static pages: a dynamic page is typically 2-10 times slower than a static one, and if the static content can also be cached in memory, access can be 2-3 orders of magnitude faster than the original dynamic page.


This article covers:

• Comparison of dynamic and static caching
• Site planning based on reverse proxy acceleration
• Reverse proxy acceleration with Apache mod_proxy
• Fast reverse proxying with Squid
• Cache-oriented page design

If the backend content management system's page output follows a cacheable design, the performance problem can be handed off to the frontend cache server, which greatly simplifies the CMS itself.

Comparison of static and dynamic caching

There are two forms of static page caching; the main difference is whether the CMS is responsible for managing cache updates of related content.

Static caching: the corresponding static pages are generated immediately when new content is published. For example: on March 22, 2003, an administrator enters an article through the backend content management interface; the static page http://www.chedong.com/tech/2003/03/22/001.html is generated at once, and the links on the related index pages are updated synchronously.
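The date-plus-sequence naming in the example is easy to reproduce. A minimal sketch (the helper name and the /tech/ prefix are taken only from the example URL above, not from any real CMS API):

```python
from datetime import date

def static_page_path(publish_date: date, sequence: int) -> str:
    """Build a date-based static page path following the
    /tech/YYYY/MM/DD/NNN.html pattern shown in the text.
    Hypothetical helper for illustration only."""
    return "/tech/%04d/%02d/%02d/%03d.html" % (
        publish_date.year, publish_date.month, publish_date.day, sequence)

print(static_page_path(date(2003, 3, 22), 1))
# /tech/2003/03/22/001.html
```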


Dynamic caching: no static page is generated when new content is published. Only when a request for the content arrives and the frontend cache server cannot find a corresponding cache entry does it forward the request to the backend content management server, which then generates the static page. The first visitor to a page may get a slower response, but every subsequent request hits the cache directly.

Visiting ZDNet and other foreign sites that use the Vignette content management system, you will see page names like 0,22342566,300458.html. The 0,22342566,300458 part is actually several parameters separated by commas: when the page is not found on first access, it is equivalent to issuing a doc_type=0&doc_id=22342566&doc_template=300458 query on the server side, and the query result is then saved as the cached static page 0,22342566,300458.html.
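The split into doc_type/doc_id/doc_template can be illustrated in a few lines; the function name is hypothetical, and the parameter reading follows the interpretation given in the text:

```python
def parse_vignette_name(filename: str) -> dict:
    """Split a Vignette-style page name such as 0,22342566,300458.html
    into its comma-separated parameters (illustrative sketch)."""
    stem = filename.rsplit(".", 1)[0]          # drop the ".html" suffix
    doc_type, doc_id, doc_template = stem.split(",")
    return {"doc_type": doc_type, "doc_id": doc_id, "doc_template": doc_template}

print(parse_vignette_name("0,22342566,300458.html"))
# {'doc_type': '0', 'doc_id': '22342566', 'doc_template': '300458'}
```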

Disadvantages of static caching:

• Complex trigger-based updates: the mechanism works well when the content management system is simple, but on a site with more complex relationships the logical references between pages become very hard to track. The typical example is a news article that appears both on the news front page and in three related news topics: in static caching mode, publishing the article requires the system to regenerate, via triggers, not only the article page itself but every related static page. Triggering this related-page logic often becomes one of the most complex parts of the content management system.
• Bulk updates of old content: with static caching it is hard to apply changes to previously generated pages, so when a user visits an old page, a new template does not take effect at all.

In dynamic caching mode, each dynamic page only needs to take care of itself; related pages are refreshed automatically as their caches expire, which largely removes the need to design update triggers for related pages.


A similar approach used to be common in small applications: after the first visit, the database query result is saved locally as a file, and subsequent requests first check the local cache directory, reducing access to the backend database. This can indeed carry a fairly large load, but such a design makes content management and cache management hard to separate, and data consistency is poor: whenever content is updated, the application itself must delete the corresponding cache files. Moreover, the cache files usually need to be distributed across subdirectories; otherwise, once a single directory holds more than about 3,000 file nodes, commands like rm * start to fail (the expanded argument list becomes too long).
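A minimal sketch of such a file-based cache, including the subdirectory hashing that keeps any one directory small (the names CACHE_ROOT and fetch, and the use of MD5 for bucketing, are illustrative assumptions, not from any real system):

```python
import hashlib
import os

CACHE_ROOT = "/tmp/page_cache"  # illustrative location, not from the original text

def cache_path(query: str, root: str = CACHE_ROOT) -> str:
    """Spread cache files over 256 subdirectories keyed by the first two
    hex digits of an MD5 digest, so no single directory grows past a few
    thousand entries (the 'rm *' problem mentioned above)."""
    digest = hashlib.md5(query.encode("utf-8")).hexdigest()
    return os.path.join(root, digest[:2], digest + ".html")

def fetch(query: str, render, root: str = CACHE_ROOT) -> str:
    """Return the cached page for `query`, rendering and storing it on a
    miss. `render` stands in for the real database query plus page
    formatting; a content update must delete the file to invalidate it."""
    path = cache_path(query, root)
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            return f.read()
    html = render(query)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)
    return html
```

The first call for a query renders and writes the file; later calls read it back without touching the database, which is exactly why invalidation becomes the application's burden.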


At this point the system calls for a division of labor: the complex content management system is decomposed into two relatively simple systems, content input and caching.


Backend: the content management system focuses purely on content publishing, e.g. complex workflow management, complex template rules, etc.
Frontend: page cache management, implemented with a dedicated caching system.


After this division of labor, both halves, content management and cache management, have a wide range of options: software (for example, Squid on the frontend port 80 caching a backend content publishing system on port 8080), cache appliance hardware, or even a professional service provider such as Akamai.

Cache-oriented site planning

An HTTP acceleration scheme using Squid to accelerate multiple web sites:

The original plan for a site might be this:
200.200.200.207 www.chedong.com
200.200.200.208 news.chedong.com
200.200.200.209 bbs.chedong.com
200.200.200.205 images.chedong.com

Cache-oriented server design: through external DNS, all sites point to the same IPs, 200.200.200.200/201 (two, for redundant backup).
How it works:
when an external request arrives, the cache server resolves the target according to its configuration file and forwards the request to the internal address we specify.

For steering multiple virtual hosts, mod_proxy is simpler than Squid: it can forward different sites to different ports on multiple backend IPs.
Squid can only do this by disabling internal DNS resolution and forwarding by requested domain name according to the local /etc/hosts file, so all backend servers must use the same port.

With reverse proxy acceleration, we not only get performance improvements, but also gain additional security and configuration flexibility:

    • Greater configuration flexibility: DNS resolution of the backend servers can be controlled on your own internal DNS server, so migrations and adjustments between servers require no bulk changes to the external DNS configuration, only changes to the internal DNS.
    • Better data security: all backend servers can be conveniently placed behind the firewall.
    • Lower backend application complexity: previously, for efficiency, a dedicated image server such as images.chedong.com was often split off from a heavily loaded application server such as bbs.chedong.com. In reverse proxy mode, all frontend requests go through the cache server and are effectively served as static pages, so the application design no longer needs to separate images from the application itself. This also greatly reduces the complexity of the backend content distribution system: data and application live together, which simplifies file maintenance and management.

Reverse proxy cache acceleration based on Apache mod_proxy

Apache includes the mod_proxy module, which can be used to implement a proxy server and reverse-proxy acceleration for backend servers.

When compiling Apache 1.3.x, configure with:
./configure --enable-shared=max --enable-module=most

Note: in Apache 2.x, this functionality has been split into mod_proxy and mod_cache, and mod_cache has both file-based and memory-based implementations.

Create /var/www/proxy and make it writable by the Apache service user.

mod_proxy configuration sample: reverse proxy + cache.
Set up the frontend www.example.com as a reverse proxy for the backend www.backend.com port 8080 service.
Modify httpd.conf:
<VirtualHost *>
ServerName www.example.com
ServerAdmin admin@example.com

# Reverse proxy setting
ProxyPass / http://www.backend.com:8080/
ProxyPassReverse / http://www.backend.com:8080/

# Cache dir root
CacheRoot "/var/www/proxy"
# Max cache storage (KB)
CacheSize 50000000
# Garbage collection interval: every 4 hours
CacheGcInterval 4
# Max page expire time: hours
CacheMaxExpire 240
# Expire time = (now - last_modified) * CacheLastModifiedFactor
CacheLastModifiedFactor 0.1
# Default expire time: hours
CacheDefaultExpire 1
# Force completion after this percent of content is retrieved: 60-90%
CacheForceCompletion 80

CustomLog /usr/local/apache/logs/dev_access_log combined
</VirtualHost>
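The CacheLastModifiedFactor comment in the sample describes a freshness heuristic for responses that carry Last-Modified but no Expires header. A rough Python illustration of the same arithmetic (an approximation for explanation, not Apache's exact code; the function name is made up):

```python
def estimated_expiry(now: float, last_modified: float, factor: float = 0.1,
                     max_expire_hours: int = 240,
                     default_expire_hours: int = 1) -> float:
    """Sketch of the heuristic from the config comment:
    freshness = (now - last_modified) * CacheLastModifiedFactor,
    clamped by CacheMaxExpire, falling back to CacheDefaultExpire
    when there is no usable age. Times are Unix-style seconds."""
    age = now - last_modified
    if age <= 0:
        return now + default_expire_hours * 3600
    freshness = min(age * factor, max_expire_hours * 3600)
    return now + freshness

# a page untouched for 10 hours is assumed fresh for 1 more hour
print(estimated_expiry(36000.0, 0.0) - 36000.0)  # 3600.0
```

The intuition: a page that has not changed for a long time is assumed likely to stay unchanged, in proportion to its current age.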

Fast implementation of reverse proxy based on squid

Squid is a more specialized proxy server, with much higher performance and efficiency than Apache's mod_proxy.
If you need combined-format logs, apply this patch:
http://www.squid-cache.org/mail-archive/squid-dev/200301/0164.html

Compiling Squid:
./configure --enable-useragent-log --enable-referer-log --enable-default-err-language=Simplify_Chinese --enable-err-languages="Simplify_Chinese English" --disable-internal-dns
make
make install
cd /usr/local/squid
mkdir cache
chown squid.squid *
vi /usr/local/squid/etc/squid.conf

In /etc/hosts, add internal DNS resolution entries, for example:
192.168.0.4 www.chedong.com
192.168.0.4 news.chedong.com
192.168.0.3 bbs.chedong.com

--------------------- cut here ----------------------------------
# Visible name
visible_hostname cache.example.com

# Cache config: use 1G of disk and 256M of memory
cache_dir ufs /usr/local/squid/cache 1024 16 256
cache_mem 256 MB
cache_effective_user squid
cache_effective_group squid

http_port 80
httpd_accel_host virtual
httpd_accel_single_host off
httpd_accel_port 80
httpd_accel_uses_host_header on
httpd_accel_with_proxy on

# Accelerate my domains
acl acceleratedHostA dstdomain .example1.com
acl acceleratedHostB dstdomain .example2.com
acl acceleratedHostC dstdomain .example3.com
# Accelerate the HTTP protocol on port 80
acl acceleratedProtocol protocol HTTP
acl acceleratedPort port 80
# Access ACL
acl all src 0.0.0.0/0.0.0.0

# Allow requests when they are to an accelerated host on the
# right port and right protocol
http_access allow acceleratedProtocol acceleratedPort acceleratedHostA
http_access allow acceleratedProtocol acceleratedPort acceleratedHostB
http_access allow acceleratedProtocol acceleratedPort acceleratedHostC

# Logging
emulate_httpd_log on
cache_store_log none

# Manager
acl manager proto cache_object
http_access allow manager all
cachemgr_passwd pass all
---------------------- cut here ---------------------------------

Create the cache directories:
/usr/local/squid/sbin/squid -z

Start Squid:
/usr/local/squid/sbin/squid

Stop Squid:
/usr/local/squid/sbin/squid -k shutdown

Enable a new configuration:
/usr/local/squid/sbin/squid -k reconfigure

Rotate the logs daily at 0:00 via crontab:
0 0 * * * (/usr/local/squid/sbin/squid -k rotate)

Cache-friendly dynamic page design

What kind of pages are cached well by a cache server? Pages whose HTTP response headers include Last-Modified and Expires declarations, for example:
Last-Modified: Wed, May 2003 13:06:17 GMT
Expires: Fri, Jun 2003 13:06:17 GMT
The frontend cache server then caches the generated page locally (on disk or in memory) until the page expires.
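As a sketch of how a cache decides how long it may keep such a response (simplified: real caches also honor Cache-Control: max-age, the Age header, and revalidation; the function name is made up):

```python
from email.utils import parsedate_to_datetime

def seconds_until_expiry(headers: dict) -> float:
    """How long, in seconds, a cache may keep a response, judged from
    the Expires header relative to the Date header. Simplified sketch
    of HTTP/1.1 expiration; see RFC 2616 section 13 for the full rules."""
    sent = parsedate_to_datetime(headers["Date"])
    expires = parsedate_to_datetime(headers["Expires"])
    return (expires - sent).total_seconds()

print(seconds_until_expiry({
    "Date": "Wed, 14 May 2003 13:06:17 GMT",
    "Expires": "Wed, 14 May 2003 14:06:17 GMT",
}))  # 3600.0
```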

So, a cacheable page:

    • The page must include a Last-Modified header.
      Purely static pages usually carry Last-Modified information by themselves; dynamic pages must set it explicitly, for example in PHP:

      // mark the page as modified right now
      header("Last-Modified: " . gmdate("D, d M Y H:i:s") . " GMT");

    • The page must have an Expires or Cache-Control: max-age header to set its expiry time:
      For static pages, Apache's mod_expires can set the cache lifetime according to the page's MIME type: for example, images for 1 month and stylesheets for 2 days:
      <IfModule mod_expires.c>
      ExpiresActive On
      ExpiresByType image/gif "access plus 1 month"
      ExpiresByType text/css "now plus 2 days"
      ExpiresDefault "now plus 1 day"
      </IfModule>

      For dynamic pages, you can write the HTTP response headers directly: for example, the news front page index.php might expire after 20 minutes, while a specific news page might expire after 1 day. In PHP, to expire 1 month from now:

      // expire one month later
      header("Expires: " . gmdate("D, d M Y H:i:s", time() + 3600 * 24 * 30) . " GMT");

    • If the server uses HTTP-based authentication, a Cache-Control: public header must be present to allow the frontend cache to store the content.

Cache-enabling modifications for ASP applications: first add the following common functions to a shared include file (for example, include.asp):

<%
' Set Expires header in minutes
Function SetExpiresHeader(ByVal minutes)
    ' Set the page Last-Modified header
    Response.AddHeader "Last-Modified", DateToHTTPDate(Now())

    ' the page expires after `minutes` minutes
    Response.Expires = minutes

    ' set cache control so external (shared) caches may store the page
    Response.CacheControl = "Public"
End Function

' Converts a date (19991022 11:08:38) to HTTP form (Fri, 22 Oct 1999 12:08:38 GMT)
Function DateToHTTPDate(ByVal OleDATE)
    Const GMTdiff = #08:00:00#   ' local time is GMT+8
    OleDATE = OleDATE - GMTdiff
    DateToHTTPDate = engWeekdayName(OleDATE) & _
        ", " & Right("0" & Day(OleDATE), 2) & " " & engMonthName(OleDATE) & _
        " " & Year(OleDATE) & " " & Right("0" & Hour(OleDATE), 2) & _
        ":" & Right("0" & Minute(OleDATE), 2) & ":" & Right("0" & Second(OleDATE), 2) & " GMT"
End Function

Function engWeekdayName(dt)
    Dim out
    Select Case WeekDay(dt, 1)
        Case 1: out = "Sun"
        Case 2: out = "Mon"
        Case 3: out = "Tue"
        Case 4: out = "Wed"
        Case 5: out = "Thu"
        Case 6: out = "Fri"
        Case 7: out = "Sat"
    End Select
    engWeekdayName = out
End Function

Function engMonthName(dt)
    Dim out
    Select Case Month(dt)
        Case 1: out = "Jan"
        Case 2: out = "Feb"
        Case 3: out = "Mar"
        Case 4: out = "Apr"
        Case 5: out = "May"
        Case 6: out = "Jun"
        Case 7: out = "Jul"
        Case 8: out = "Aug"
        Case 9: out = "Sep"
        Case 10: out = "Oct"
        Case 11: out = "Nov"
        Case 12: out = "Dec"
    End Select
    engMonthName = out
End Function
%>

Then, at the top of specific pages such as index.asp and news.asp, add the following code to set the HTTP headers:

<!--#include file="../include.asp"-->
<%
' the page will be set to expire after 20 minutes
SetExpiresHeader(20)
%>
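The same headers set by the PHP and ASP helpers above can be sketched in Python; strftime replaces the manual weekday/month name tables (assuming an English/C locale for the names, and with the max-age value added as the Cache-Control equivalent of the Expires lifetime; the function name is made up):

```python
from datetime import datetime, timedelta, timezone

HTTP_DATE = "%a, %d %b %Y %H:%M:%S GMT"  # HTTP date format, e.g. Fri, 22 Oct 1999 12:08:38 GMT

def cache_headers(expires_minutes: int) -> dict:
    """Build the headers the PHP/ASP helpers above set: Last-Modified
    (now), Expires (now + N minutes) and a public Cache-Control.
    Illustrative sketch, not from the original article."""
    now = datetime.now(timezone.utc)
    return {
        "Last-Modified": now.strftime(HTTP_DATE),
        "Expires": (now + timedelta(minutes=expires_minutes)).strftime(HTTP_DATE),
        "Cache-Control": "public, max-age=%d" % (expires_minutes * 60),
    }
```

A 20-minute page would call cache_headers(20) and emit the returned dict as response headers.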


How can you check the cacheability of your site's pages? You can use tools such as:
http://www.ircache.net/cgi-bin/cacheability.py


Appendix: Squid performance test

phpman.php is a PHP-based man page server; each request invokes the backend man command and many page-formatting tools, so the system load is relatively high. It provides cache-friendly URLs. Below are performance measurements for the same page:
Test environment: Red Hat 8 on Cyrix 266 / 192 MB RAM
Test program: Apache ab (ApacheBench)
Test conditions: 50 requests, concurrency 5
Test comparison: direct via Apache 1.3 (port 80) vs. Squid 2.5 (port 8000, accelerating port 80)

Test 1: dynamic output on port 80, no cache:
ab -n 50 -c 5 http://www.chedong.com/phpman.php/man/kill/1
This is ApacheBench, Version 1.3d <$Revision: 1.1 $> apache-1.3
Copyright (c) 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright (c) 1998-2001 the Apache Group, http://www.apache.org/

Benchmarking localhost (be patient)...done

Server Software:        Apache/1.3.23
Server Hostname:        localhost
Server Port:            80

Document Path:          /phpman.php/man/kill/1
Document Length:        4655 bytes

Concurrency Level:      5
Time taken for tests:   63.164 seconds
Complete requests:      50
Failed requests:        0
Broken pipe errors:     0
Total transferred:      245900 bytes
HTML transferred:       232750 bytes
Requests per second:    0.79 [#/sec] (mean)
Time per request:       6316.40 [ms] (mean)
Time per request:       1263.28 [ms] (mean, across all concurrent requests)
Transfer rate:          3.89 [Kbytes/sec] received

Connnection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    29  106.1      0   553
Processing:  2942  6016 1845.4   6227 10796
Waiting:     2941  5999 1850.7   6226 10795
Total:       2942  6045 1825.9   6227 10796

Percentage of the requests served within a certain time (ms)
  50%   6227
  66%   7069
  75%   7190
  80%   7474
  90%   8195
  95%   8898
  98%   9721
  99%  10796
 100%  10796 (last request)

Test 2: Squid cached output:
/home/apache/bin/ab -n 50 -c 5 "http://localhost:8000/phpman.php/man/kill/1"
This is ApacheBench, Version 1.3d <$Revision: 1.1 $> apache-1.3
Copyright (c) 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright (c) 1998-2001 the Apache Group, http://www.apache.org/

Benchmarking localhost (be patient)...done

Server Software:        Apache/1.3.23
Server Hostname:        localhost
Server Port:            8000

Document Path:          /phpman.php/man/kill/1
Document Length:        4655 bytes

Concurrency Level:      5
Time taken for tests:   4.265 seconds
Complete requests:      50
Failed requests:        0
Broken pipe errors:     0
Total transferred:      248043 bytes
HTML transferred:       232750 bytes
Requests per second:    11.72 [#/sec] (mean)
Time per request:       426.50 [ms] (mean)
Time per request:       85.30 [ms] (mean, across all concurrent requests)
Transfer rate:          58.16 [Kbytes/sec] received

Connnection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0     1    9.5      0    68
Processing:     7    83  537.4      7  3808
Waiting:        5    81  529.1      6  3748
Total:          7    84  547.0      7  3876

Percentage of the requests served within a certain time (ms)
  50%      7
  66%      7
  75%      7
  80%      7
  90%      7
  95%      7
  98%      8
  99%   3876
 100%   3876 (last request)

Conclusion: no cache / cache = 6045 / 84 ≈ 72
That is, for cacheable pages, server speed can improve by roughly two orders of magnitude, because Squid keeps cached pages in memory (so there is almost no hard disk I/O).

Summary:

    • Sites with heavy traffic should, as far as possible, publish dynamic pages as static cached pages; even for highly dynamic applications such as search engines, the caching mechanism is very important.
    • In dynamic pages, define the cache update policy with HTTP headers.
    • Use the cache server to gain extra configuration flexibility and security.
    • Logging is very important: Squid's logs do not support the combined format by default, so this patch matters for anyone who needs referer logs: http://www.squid-cache.org/mail-archive/squid-dev/200301/0164.html

Resources:

HTTP proxy caching
http://vancouver-webpages.com/proxy.html

Cacheable page design
http://linux.oreillynet.com/pub/a/linux/2002/02/28/cachefriendly.html

Related RFC documents:

    • RFC 2616:
      • Section 13 (Caching)
      • Section 14.9 (Cache-Control header)
      • Section 14.21 (Expires header)
      • Section 14.32 (Pragma: no-cache), important if you are interacting with HTTP/1.0 caches
      • Section 14.29 (Last-Modified), the most common validation method
      • Section 3.11 (Entity Tags), covering the extra validation method

Cache checking:
http://www.web-caching.com/cacheability.html

Cache design elements:
http://vancouver-webpages.com/CacheNow/detail.html

Several documents on accelerating Zope with Apache mod_proxy / mod_gzip:
http://www.zope.org/Members/anser/apache_zserver/
http://www.zope.org/Members/softsign/ZServer_and_Apache_mod_gzip
http://www.zope.org/Members/rbeer/caching


