First of all, I would like to thank KK, CERD's technical partner, for sharing his views and experience on the technical details, configuration parameters, and system tuning of the website. How did CERD hold up under the impact of high traffic and high concurrency?
The application server uses a J2EE architecture built on the free Resin 2.1.17, with a self-developed cache system for load balancing. The web servers are the heavyweight Apache and the lightweight Lighttpd: dynamic content is handled by Apache, while static content (such as video, CSS, and JS) is served by Lighttpd.
The following are the main configuration parameters and suggestions:
I. For the Java application server, we recommend Resin 2.1.17. If you have the budget, buy a Resin 3 license; otherwise note that the free Resin 3 is slower than Resin 2. GlassFish is also worth considering, since its performance differs little from Resin's; Tomcat is still not up to the job.
If the cache needs a particularly large amount of memory, we recommend a 64-bit operating system. In theory a 32-bit JDK can address 4 GB of memory, but in practice only about 3 GB is usable, and those 3 GB are split in two: native memory, plus the heap defined by the -Xmx parameter. This split is hard to get right; errors are common, the system can pause for 6 to 10 seconds, and the JVM can even crash from insufficient native memory (even while Linux still shows free memory). Here is some of my configuration experience:
1. Serial garbage collection, the default configuration: 100,000 requests completed in 153 seconds. The JVM parameters are as follows: (omitted)
This configuration usually shows no major problems in the first 24 hours after Resin starts, and the website can be accessed normally. After about 24 hours, however, full GCs run more and more frequently, up to one every three minutes, and each full GC pauses the system for about 6 seconds. For a website, making users wait 6 seconds is too long, so this approach needs improvement. MaxTenuringThreshold=7 means that an object which has been copied 7 times within the survivor ("rescue") spaces is promoted to the old generation. GCTimeRatio=19 means Java may spend 5% of its time on garbage collection: 1/(1+19) = 1/20 = 5%.
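The exact parameters are omitted above; as a rough illustration only (heap sizes and log settings here are assumptions, not the values actually tested), such a serial-collector configuration might look like:

    -Xms2048m -Xmx2048m
    -Xmn512m
    -XX:+UseSerialGC
    -XX:MaxTenuringThreshold=7
    -XX:GCTimeRatio=19
    -XX:+PrintGCDetails
    -Xloggc:gc.log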
2. Parallel collection: 100,000 requests completed in 117 seconds. The configuration is as follows: (omitted)
I tried many combinations of parallel-collection parameters, and none seemed to help: about three hours after Resin starts, the system pauses for over 10 seconds. Perhaps my parameter settings were simply not good enough. MaxGCPauseMillis sets a goal for the maximum GC pause time; the system behaves normally right after Resin starts, before any full GC has run, but once a full GC kicks in, MaxGCPauseMillis is useless and the pause can exceed 20 seconds. Rather than wait to see what would happen next, I restarted Resin and tried other collection policies.
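A parallel-collection configuration of the kind described might, as a sketch (heap sizes and thread count are assumptions), look like:

    -Xms2048m -Xmx2048m
    -XX:+UseParallelGC
    -XX:+UseParallelOldGC
    -XX:ParallelGCThreads=4
    -XX:MaxGCPauseMillis=500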
3. Concurrent collection: 100,000 requests completed in 60 seconds, twice as fast as parallel collection and 2.5 times the throughput of the default policy. The configuration is as follows: (omitted)
With this configuration there are no 10-second freezes, but about three hours after Resin starts, the system becomes unreachable for roughly 5 seconds every few minutes. Checking gc.log shows a "promotion failed" error during ParNew GC, which forces a full GC and causes the frequent pauses, once every few minutes, so this also had to be improved. UseCMSCompactAtFullCollection compacts memory after a full GC to avoid fragmentation; CMSFullGCsBeforeCompaction=n compacts memory after every n full GCs.
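A concurrent (CMS) configuration using the two compaction flags discussed above might, as an illustrative sketch (heap sizes assumed), look like:

    -Xms2048m -Xmx2048m
    -Xmn512m
    -XX:+UseConcMarkSweepGC
    -XX:+UseParNewGC
    -XX:+UseCMSCompactAtFullCollection
    -XX:CMSFullGCsBeforeCompaction=0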
4. Incremental collection: 100,000 requests completed in 171 seconds. The configuration is as follows: (omitted)
The collection does not seem very thorough and it hurts performance noticeably, so it is not worth pursuing.
5. The i-CMS mode of the concurrent collector: almost the same as incremental collection, with 100,000 requests completed in 170 seconds. Configuration: (omitted)
Even with the parameters Sun recommends, the collection results are poor: pauses remain, and within a few hours they become frequent. So much for Sun's recommended parameters.
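For reference, the i-CMS flags that Sun's GC tuning guide recommended at the time were roughly the following (shown here as a sketch, not as the omitted configuration above):

    -XX:+UseConcMarkSweepGC
    -XX:+CMSIncrementalMode
    -XX:+CMSIncrementalPacing
    -XX:CMSIncrementalDutyCycleMin=0
    -XX:CMSIncrementalDutyCycle=10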
6. The incremental low-pause collector, also called the "train" collector; I am not sure which family it belongs to. 100,000 requests completed in 153 seconds. Configuration: (omitted)
This configuration does not work well either and hurts performance, so I did not experiment with it further.
7. By comparison, concurrent collection is the best and gives relatively high performance. As long as the "promotion failed" error during ParNew GC (parallel collection of the young generation) can be solved, everything else falls into place. From the many articles I checked, the cause of "promotion failed" is that CMS has not reclaimed the old generation in time (by default, CMS only runs when the old generation is about 90% full), so the old generation lacks the space the GC needs to promote surviving objects from the young generation, and a full GC is forced. CMSInitiatingOccupancyFraction=70 makes CMS run when the old generation is about 70% full, so that a full GC never occurs. SoftRefLRUPolicyMSPerMB is also useful in my opinion. The official explanation: softly reachable objects will remain alive for some amount of time after the last time they were referenced; the default value is one second of lifetime per free megabyte in the heap. I see no need to wait that second, so I set it to 0. The configuration is as follows: (omitted)
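Combining the CMS flags from configuration 3 with the two parameters just discussed, a sketch of configuration 7 might look like this (heap sizes are assumptions; the actual values are not given):

    -Xms2048m -Xmx2048m
    -Xmn512m
    -XX:+UseConcMarkSweepGC
    -XX:+UseParNewGC
    -XX:+UseCMSCompactAtFullCollection
    -XX:CMSFullGCsBeforeCompaction=0
    -XX:CMSInitiatingOccupancyFraction=70
    -XX:SoftRefLRUPolicyMSPerMB=0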
8. "Promotion failed" can also be caused by insufficient survivor ("rescue") space in configuration 7. I simply removed the survivor spaces and adjusted SurvivorRatio and MaxTenuringThreshold accordingly; in short, tune according to the actual situation. The following is my final configuration, which is very stable: (omitted). On a 64-bit system the memory could be increased further. Resin still has to be restarted periodically, though, because the perm space always fills up eventually.
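Since the final configuration itself is not shown, here is a hypothetical sketch of what "no survivor space" tuning commonly looks like: an extreme SurvivorRatio plus MaxTenuringThreshold=0 promotes objects straight to the old generation (all values below are assumptions, not the author's):

    -Xms2048m -Xmx2048m
    -Xmn512m
    -XX:SurvivorRatio=65536
    -XX:MaxTenuringThreshold=0
    -XX:+UseConcMarkSweepGC
    -XX:+UseParNewGC
    -XX:CMSInitiatingOccupancyFraction=70
    -XX:SoftRefLRUPolicyMSPerMB=0
    -XX:MaxPermSize=256m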
II. Apache configuration and tips.
When compiling Apache, only a few parameters are needed. Other optimization parameters can be found online, but the performance difference is probably small. For example:
./configure --prefix=/usr/local/apache2.2.10 --enable-so --enable-deflate --enable-rewrite --enable-expires
I have tested the mod_mem_cache that ships with Apache, and it is not very good: once its memory usage grows beyond a certain number of MB, problems can appear, such as the memory usage suddenly dropping. We therefore recommend not using Apache as an image server; Lighttpd or nginx would be better. I have never configured Lighttpd + memcached; nginx + memcached works, but cannot serve as the cache here. So I use Lighttpd + mod_mem_cache (Lighttpd must be patched), which I will cover in detail later.
For simple anti-DDoS protection in Apache, find the mod_evasive20 module online and install it with /usr/local/apache/bin/apxs -cia mod_evasive20.c. In general, though, this does not need to be configured.
<IfModule mod_evasive20.c>
    DOSHashTableSize 10000
    DOSPageCount 2
    DOSSiteCount 50
    DOSPageInterval 1
    DOSSiteInterval 1
    DOSBlockingPeriod 10
    DOSEmailNotify webmaster@xxx.com
    DOSLogDir /var/log/mod_dosevasive.log
</IfModule>
Install the mod_security module so Apache can defend against SQL injection attacks.
<IfModule mod_security.c>
    SecFilterEngine On
    SecFilterCheckURLEncoding On
    SecFilterForceByteRange 32 126
    SecFilterCheckUnicodeEncoding On
    SecServerResponseToken Off
    SecAuditEngine RelevantOnly
    SecAuditLog logs/audit_log
    SecFilterDebugLog logs/modsec_debug_log
    SecFilterDebugLevel 0
    SecFilterDefaultAction "deny,log,status:406"
    SecFilter /etc/*passwd
    SecFilter /bin/*sh
    SecFilter "\.\./"
    SecFilter "<( |\n)*script"
    SecFilter "<(.|\n)+>"
    SecFilter "delete[[:space:]]+from"
    SecFilter "insert[[:space:]]+"
    SecFilter "select.+from"
    SecFilter "union[[:space:]]+from"
    SecFilter "drop[[:space:]]"
    SecFilterSelective "HTTP_USER_AGENT|HTTP_HOST" "^$"
</IfModule>
Increasing Apache's maximum number of connections, here using prefork (the difference between prefork and worker will not be discussed). From Apache 2.0 on, ServerLimit can be set directly; before 2.0 you had to modify the source code for ServerLimit to take effect. We recommend Apache 2.0 for the application server; its performance is not much worse.
<IfModule mpm_prefork_module>
    # ServerLimit is said to need to be the first directive in this block
    ServerLimit 20000
    StartServers 50
    MinSpareServers 50
    MaxSpareServers 100
    MaxClients 10000
    MaxRequestsPerChild 10000
</IfModule>
Compressed transfer is very important for a website: compressed output is only about 20% of the original size, which means a user can fetch a page roughly five times faster. Images, however, should not be compressed.
<IfModule mod_deflate.c>
    SetOutputFilter DEFLATE
    DeflateCompressionLevel 3
    DeflateFilterNote Input instream
    DeflateFilterNote Output outstream
    DeflateFilterNote Ratio ratio
    LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %{outstream}n/%{instream}n (%{ratio}n%%)" deflate
    # Netscape 4.x has some problems...
    BrowserMatch ^Mozilla/4 gzip-only-text/html
    # Netscape 4.06-4.08 have some more problems
    BrowserMatch ^Mozilla/4\.0[678] no-gzip
    # MSIE masquerades as Netscape, but it is fine
    # BrowserMatch \bMSIE !no-gzip !gzip-only-text/html
    # NOTE: Due to a bug in mod_setenvif up to Apache 2.0.48,
    # the above regex won't work. You can use the following
    # workaround to get the desired effect:
    BrowserMatch \bMSI[E] !no-gzip !gzip-only-text/html
    # Don't compress images, JavaScript, and style sheets
    SetEnvIfNoCase Request_URI \.(?:gif|jpe?g|png|js|css)$ no-gzip dont-vary
    # Make sure proxies don't deliver the wrong content.
    # This needs mod_headers, but it's very important,
    # so I don't add an <IfModule> around it.
    # Header append Vary User-Agent env=!dont-vary
    # CustomLog logs/deflate_log.log deflate
    # CustomLog "|/usr/local/cronolog/sbin/cronolog /usr/local/apache2.0.59_2/logs/www.shedewang.com.access.log.%Y%m%d" deflate env=!imag
</IfModule>
Configuring Apache's mod_mem_cache. This module is hard to use well, and we recommend against it.
<IfModule mod_cache.c>
    # CacheForceCompletion 100
    CacheDefaultExpire 3600
    CacheMaxExpire 86400
    CacheLastModifiedFactor 0.1
    CacheIgnoreNoLastMod On
    <IfModule mod_mem_cache.c>
        CacheEnable mem /
        MCacheSize 2000000
        MCacheMaxObjectCount 10000
        MCacheMinObjectSize 1000
        MCacheMaxObjectSize 512000
        MCacheRemovalAlgorithm LRU
    </IfModule>
</IfModule>
CacheEnable: enables mod_cache and takes two parameters. The first is the cache type, either mem (memory cache) or disk (disk cache); the second is the URI path to cache. To cache an entire website (or virtual host), simply specify the root directory /.
CacheForceCompletion: the percentage of content generation that must already be completed for an aborted HTTP request to still be cached. The default is 60 (%).
CacheDefaultExpire: the default cache expiry in seconds. The default is one hour (3600).
CacheMaxExpire: the maximum cache expiry in seconds. The default is one day (86400).
CacheLastModifiedFactor: used to compute an expiry date from the Last-Modified information in the response:
expiry period = (time since last modification) × CacheLastModifiedFactor
expiry date = current time + expiry period
For example, a file last modified 10 hours ago with a factor of 0.1 gets an expiry period of 1 hour. The expiry date can never exceed what CacheMaxExpire allows.
Configuring the mod_expires module.
mod_expires can cut repeat requests by about 10%: returning users cache the results of specified page requests locally and never send a request to the server at all. This is especially useful on an image server.
mod_expires configuration:
<IfModule mod_expires.c>
    # Turn on the module for this directory
    ExpiresActive On
    # Cache common graphics for 365 days
    ExpiresByType image/jpg "access plus 365 days"
    ExpiresByType image/gif "access plus 365 days"
    ExpiresByType image/jpeg "access plus 365 days"
    ExpiresByType image/png "access plus 365 days"
</IfModule>