As a website grows, the number of visits increases, the Web server becomes overloaded, and visitors start complaining that pages open slowly. The simplest remedy is more caching. There are many places in a Web stack where a cache can sit: between the database and the Web application, between the Web application and the Web server, between the Web server and the user's browser, or even directly in the browser itself.
The simplest of these is the cache between the database and the Web application: it needs minimal configuration and gives the biggest speedup, because I/O is the largest bottleneck in modern computer systems. The common approach is to use memcached or a key-value NoSQL store for caching.
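The database-side caching described above usually follows the cache-aside pattern. Here is a minimal sketch in Python; the in-memory dict stands in for memcached, and the key format and `get_user` helper are assumptions for illustration:

```python
import time

# In-memory stand-in for memcached; a real deployment would use a client
# such as pymemcache or redis-py. All names here are illustrative.
_cache = {}

def get_user(user_id, db_lookup, ttl=60):
    """Cache-aside lookup: try the cache first, fall back to the database."""
    key = "user:%d" % user_id
    entry = _cache.get(key)
    if entry is not None and entry[1] > time.time():
        return entry[0]                       # cache hit
    value = db_lookup(user_id)                # cache miss: query the DB
    _cache[key] = (value, time.time() + ttl)  # store with an expiry time
    return value
```

The point of the pattern is simply that repeated reads of hot keys never touch the database until the entry expires or is invalidated.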
After running comfortably for a while, though, the site's response times will drop again. If you do not want to add servers, the next step is page output caching.
The reason is that caches on the database and application side only cache keys and values from the database, so under heavy traffic there is a hit-rate problem, and a single page often contains many pieces of data that must each be fetched. Even if every lookup hits the cache, there is still a query step per item, which is not instant. Another problem is that, for a Web application, turning data into an output page involves some logic and a template-rendering step, which also takes time.
If instead the result of a page response, i.e. the finished HTML page, is cached as a whole, the Web server can return the cached result directly. The speed is then essentially the same as serving a static page, and the load on the server side is greatly reduced.
Page output caching, or "page staticization" as some call it, raises a few main issues:
First, handling the personalized part of the page. Most sites today have a login feature, so the same page looks different before and after login, and different for each logged-in user: a logged-out visitor sees "Log in or register", a logged-in one sees "Welcome, user XXX". This means most pages cannot be cached as-is.
In fact, the real question is when the personalized and non-personalized parts of the page are composed. Traditionally both are assembled inside the Web application, so each user gets a different output page and the result cannot be cached. If the composition step is moved to after the page cache, the same cached result can safely be reused.
The common approaches are SSI, ESI, and CSI.
SSI (Server Side Include) is supported by most Web servers today. In the page, the personalized part can be replaced with a directive such as <!--#include file="/user_info.shtml" -->; the Web server then looks up that .shtml file and composes it into the page. There is a problem, however: SSI can only compose static files into a page, which makes the page output cache far less useful.
ESI (Edge Side Include) was designed for exactly this situation. It uses an XML-like syntax such as <esi:include src="http://example.com/1.html" alt="http://bak.example.com/2.html"/> to invoke a dynamic page during composition and merge that page's content into the existing one. This makes it possible to combine a user's personalized information with the cached page file. The biggest problem with ESI is that existing Web servers do not support it well: Nginx needs a third-party module compiled in, while the proxy cache server Varnish supports ESI natively.
CSI (Client Side Include), or browser-side include, composes the personalized and non-personalized parts in the user's browser. Bluntly put, it means fetching the personalized information asynchronously with Ajax, or displaying it directly in an iframe. In fact, the closer this composition happens to the user's browser, the more layers can cache the identical page: the Web server, the proxy cache server, even the CDN. Ajax has its own problems, though: browsers that do not support JavaScript, or have it disabled, cannot view the page properly, and many sites must now consider mobile clients, most of which handle JavaScript poorly. Heavy use of Ajax also burdens the user's browser and can make page loading feel very "laggy".
Which approach to use depends on the actual environment.
The remaining question is how to cache the pages themselves.
Using a proxy cache server is a good option, for example Squid or the newer Varnish. They sit in front of the Web server as proxy caches and cache whole pages. When a page needs updating, you send a purge request naming the specific URL; the proxy cache then fetches the page from the Web server again, generates a fresh copy, and only then invalidates the old cached one. During this process, new user requests are still served from the previous cache until the new page is ready, so there is no need to worry that all requests will hit the Web server directly while the cache is refreshing. One problem remains: for the user who triggered the update, say by posting a reply, the page request and the cache refresh are asynchronous. If his connection is fast enough, he may not yet see his own post, leading to duplicate submissions and similar behavior. This problem is dealt with later.
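Sending a purge is just an HTTP request with a non-standard method. A minimal sketch, assuming the proxy cache (e.g. Varnish or Squid with a purge ACL) is configured to accept PURGE from this client; the host, port, and path are illustrative:

```python
import http.client

def purge(cache_host, path, port=80):
    """Send a PURGE request for `path` to the proxy cache server."""
    conn = http.client.HTTPConnection(cache_host, port, timeout=5)
    conn.request("PURGE", path)        # non-standard method, same wire format as GET
    resp = conn.getresponse()
    conn.close()
    return resp.status                 # typically 200 if the purge was accepted
```

The exact status codes and whether PURGE is allowed at all depend entirely on the cache server's configuration.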
If you would rather not run a separate proxy server, Nginx's fastcgi cache is a good tool: it caches FastCGI page output directly. With the third-party cache purge module, you can invalidate a cached page before it expires by requesting /purge/<url>. Likewise, Nginx + proxy cache + cache purge can play the same role as Squid and Varnish above.
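A configuration sketch of the setup just described, assuming PHP over FastCGI on port 9000 and the third-party ngx_cache_purge module compiled in; the paths, zone name, and upstream are illustrative, not a drop-in config:

```nginx
# Cache storage on disk, keyed into a 10 MB shared-memory zone.
fastcgi_cache_path /var/cache/nginx levels=1:2 keys_zone=pagecache:10m;

server {
    listen 80;

    location ~ \.php$ {
        fastcgi_pass 127.0.0.1:9000;
        include fastcgi_params;
        fastcgi_cache pagecache;
        fastcgi_cache_key $scheme$host$request_uri;
        fastcgi_cache_valid 200 10m;      # cache successful pages for 10 minutes
    }

    # Requesting /purge/foo.php evicts the cached copy of /foo.php
    # (requires the ngx_cache_purge module).
    location ~ ^/purge(/.*) {
        fastcgi_cache_purge pagecache $scheme$host$1;
    }
}
```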
There is another way, using Nginx rewrite: save the cache as static files, first check whether the static page exists, rewrite to it if it does, and fall through to the Web application if it does not. This requires an additional separate process to generate the cached pages. When the Web application triggers an update, it sends a signal to this process; on receiving the signal, the cache process requests the page from the Web server or the Web application (for loose coupling and easier server separation later, preferably from the Web server), and writes the generated page directly over the static cache file. This updates the cache, and during the update the previous cache file can still be served to visitors.
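The "overwrite the static file" step is worth doing atomically, so visitors never see a half-written page. A minimal sketch of the regeneration worker, where `fetch` stands in for the HTTP request to the Web server and the URL-to-filename mapping is an assumption for illustration:

```python
import os
import tempfile

def regenerate(url, fetch, cache_dir):
    """Re-fetch a page and atomically replace its static cache file."""
    html = fetch(url)  # e.g. urllib.request.urlopen(...).read() in practice
    path = os.path.join(cache_dir, url.strip("/").replace("/", "_") + ".html")
    # Write to a temporary file first, then rename over the old file.
    # os.replace is atomic on the same filesystem, so readers always see
    # either the complete old page or the complete new page.
    fd, tmp = tempfile.mkstemp(dir=cache_dir)
    with os.fdopen(fd, "w") as f:
        f.write(html)
    os.replace(tmp, path)
    return path
```

The atomic rename is what lets the old cache file keep serving visitors right up until the new one is in place.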
There is also a problem: when many users trigger an update to the same page at the same time, the page gets regenerated over and over, and most of those regenerations are unnecessary. This is the thundering herd problem.
One way to prevent this is to use a queue.
When a POST request is submitted, a message is sent to the cache update system recording the type and ID of the update (e.g. "post updated", plus the post's ID). On receipt, the cache update system preprocesses the message and, based on its configuration, computes which pages the update affects (the page for this post, the pages of the post list, and so on).
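The preprocessing step amounts to a lookup from event type to affected URL patterns. A sketch, where the event names and URL patterns are made up for a hypothetical forum (a real system would load them from configuration):

```python
# Hypothetical mapping: which page URLs does each update event affect?
AFFECTED = {
    "post_update": ["/post/{id}.html", "/post/list.html", "/index.html"],
    "user_update": ["/user/{id}.html"],
}

def affected_urls(event_type, obj_id):
    """Expand an update event into the list of page URLs to refresh."""
    return [pattern.format(id=obj_id) for pattern in AFFECTED[event_type]]
```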
The URLs of these pages are inserted into a queue. At the other end, the updater takes a URL off the queue; each time it does, it filters the whole queue and discards any messages for the same URL. This guarantees the page gets updated while collapsing all duplicate update events into one. After taking a URL, it refreshes the cache either by sending the appropriate purge request for the cache system in use, or by generating a new cached page to overwrite the old one.
Such a de-duplicating queue can be built from existing modules: in Python, for example, by inheriting from the queue module's Queue class and overriding a few methods. You can also use an existing queue system, or even implement the queue on top of a database. The advantage of a database is durability: if the whole system goes down, the queued messages are still there after restart, and those pages' caches can still be updated. With a purely in-memory queue, a crash and restart loses any unprocessed messages, and the corresponding pages' caches are never refreshed. Whether that probability and its impact are acceptable depends on the actual situation; on a forum where posting is not very frequent, the occasional post or reply failing to appear is not a serious problem.
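The Python approach mentioned above can be sketched by subclassing `queue.Queue` and overriding its `_init`/`_put`/`_get` hooks, which are the intended extension points. This is a minimal version that ignores `task_done` bookkeeping; a production queue would need more care:

```python
import queue

class DedupQueue(queue.Queue):
    """A queue that silently drops items already waiting in it."""

    def _init(self, maxsize):
        super()._init(maxsize)
        self._pending = set()   # URLs currently enqueued

    def _put(self, item):
        if item not in self._pending:   # drop duplicate update events
            self._pending.add(item)
            super()._put(item)

    def _get(self):
        item = super()._get()
        self._pending.discard(item)
        return item
```

Checking membership in a set on every put is O(1), which is cheaper than the "scan the whole queue on each take" approach described above, and achieves the same collapsing of duplicate URLs.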
A good system should be designed with failure in mind. If the cache update system breaks and no purge requests reach the cache server, none of the update actions will take effect: users will repeatedly resubmit data, and other problems may follow. When this happens, stopping the cache server and letting users access the Web application directly might be a reasonable move (though the servers may themselves be brought down by the flood of uncached direct requests); whether it is viable depends on the situation.