Last time, using LiveJournal as an example, we analyzed in detail how a small website grows step by step into a large-scale one, solving the performance problems caused by load growth as it develops, and how those problems can be fundamentally avoided or solved when the website architecture is designed up front.
Today, let's look at some common solutions for handling large-scale access and high load in website design. We will focus on the following aspects:
1. Front-end load
2. Business logic layer
3. Data layer
In the LJ performance-optimization article, we mentioned that grouping servers is one way to solve load problems and achieve nearly unlimited scaling. We usually adopt an LDAP-based solution, as used in mail servers, personal-website hosting, and blog applications; under Windows, Active Directory plays a similar role. Some applications (such as blogs or personal webpages) require that the user be located to the right server group at the moment the second-level domain name is resolved. At that point the request has not yet reached the application, so the problem must be solved at the DNS level. Here you can use bind dlz, a BIND plug-in that replaces BIND's text-based zone files and supports multiple storage back ends, including LDAP and BDB, to solve this problem cleanly.
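Conceptually, what a database-backed resolver like bind dlz does is replace the static zone file with a lookup against a store such as LDAP or BDB. The following is a minimal Python sketch of that idea only; the user names, group names, and IP addresses are invented for illustration, and a real deployment would of course do this inside BIND, not in application code.

```python
# Stand-in for the LDAP/BDB store; all names and IPs here are made up.
USER_GROUPS = {
    "alice": "group1",
    "bob": "group2",
}
GROUP_IPS = {
    "group1": "192.0.2.10",
    "group2": "192.0.2.20",
}

def resolve(hostname: str, default_ip: str = "192.0.2.1") -> str:
    """Resolve e.g. 'alice.example.com' to the IP of alice's server group."""
    subdomain = hostname.split(".", 1)[0]
    group = USER_GROUPS.get(subdomain)
    return GROUP_IPS.get(group, default_ip)

print(resolve("alice.example.com"))    # alice's group
print(resolve("unknown.example.com"))  # no group: falls back to the default
```

The point is that adding a user only requires a database write, not a zone-file edit and reload.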
Another DNS-related problem is the common north-south interconnection problem. BIND 9's built-in view feature can return different answers depending on the source IP of the query, so that users in the south are resolved to servers in the south and users in the north to servers in the north. Two problems arise here. One is obtaining the north-south IP distribution list; the other is ensuring smooth communication between the northern and southern servers. There is a crude solution to the first problem: extract all visitor IPs from the logs, write a script that pings them from both a northern and a southern server, and analyze the results, which yields a rough but reasonably accurate list. Of course, the best way is to get this list from the carriers (Update: see this article). The second problem has several solutions. The best is to lease space in a dual-line data center, where a single machine has two IPs and is reachable from both north and south. The alternative is to place servers in data centers in both the north and the south, running extensive tests to find a pair of data centers with smooth interconnection; this usually costs less but performs worse and is more inconvenient to maintain.
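The classification step of that crude approach can be sketched as follows. This assumes you have already collected average ping round-trip times for each visitor IP from the two vantage points; the measurements below are made-up sample data, and in practice you would gather them with a ping script on the northern and southern servers.

```python
# Label each visitor IP by whichever vantage point reaches it faster.
# rtt_from_north / rtt_from_south map IP -> average RTT in milliseconds.

def classify(rtt_from_north: dict, rtt_from_south: dict) -> dict:
    labels = {}
    for ip, north in rtt_from_north.items():
        south = rtt_from_south.get(ip, float("inf"))
        labels[ip] = "north" if north <= south else "south"
    return labels

# Made-up measurements for two visitor IPs.
rtt_north = {"203.0.113.5": 12.0, "198.51.100.7": 95.0}
rtt_south = {"203.0.113.5": 80.0, "198.51.100.7": 10.0}
print(classify(rtt_north, rtt_south))
```

The resulting labels can then be turned into the address lists that BIND views match against.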
In addition, DNS round-robin is a widely used load-balancing method: multiple parallel A records for the same name distribute access across several front-end servers. This method is usually used for applications that serve mostly static pages; the content front ends of several major portals use it.
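In a BIND zone file, that looks like several A records for one name, along the lines of the hypothetical fragment below (the IPs are examples only):

```
; Parallel A records for one name. Resolvers receive all of them,
; typically in rotated order, spreading requests across the front ends.
www   IN  A   192.0.2.10
www   IN  A   192.0.2.11
www   IN  A   192.0.2.12
```

Note that round-robin DNS gives no health checking: a dead front end keeps receiving its share of traffic until its record is removed.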
After the user is located to the correct server group, the application takes over the request and processes it along the defined business logic. These requests fall into two main types: static files (images, JS scripts, CSS, etc.) and dynamic requests.
Static requests are generally cached with squid. Different cache configurations can be used depending on the application's scale, from a single-level cache to a multi-level one; in general, a cache hit rate of about 70% is achievable, which effectively improves server throughput. Apache's deflate module can compress responses in transit to improve speed, and versions after 2.0 also ship cache modules with built-in disk and memory caching, so a separate reverse proxy is not strictly required.
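As a rough illustration of the Apache side, a configuration enabling compression and disk caching might look like the fragment below. This is a hypothetical httpd.conf excerpt: the module paths, MIME-type list, and cache directory are examples only, and the exact module names vary between Apache 2.x releases.

```apache
# Compress text responses in transit (mod_deflate).
LoadModule deflate_module    modules/mod_deflate.so
AddOutputFilterByType DEFLATE text/html text/css application/javascript

# Cache responses on disk (mod_cache with a disk provider).
LoadModule cache_module      modules/mod_cache.so
CacheEnable disk /
CacheRoot   "/var/cache/httpd"
```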
Dynamic requests are currently handled in two main ways. One is static generation: pages are regenerated whenever their content changes. A large number of CMS and BBS systems use this scheme, and combined with caching it provides fast access. It is usually a suitable solution for applications with few write operations.
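The regenerate-on-change scheme can be sketched in a few lines: whenever a post is created or edited, its HTML file is rewritten, and from then on the web server serves the file directly with no application code on the read path. The template and storage below are simplified stand-ins, not any particular CMS's implementation.

```python
from pathlib import Path
import tempfile

def render(title: str, body: str) -> str:
    # Trivial stand-in for a real template engine.
    return f"<html><head><title>{title}</title></head><body>{body}</body></html>"

def publish(slug: str, title: str, body: str, out_dir: Path) -> Path:
    """Called on every create/edit; rewrites the post's static file."""
    out = out_dir / f"{slug}.html"
    out.write_text(render(title, body), encoding="utf-8")
    return out

out_dir = Path(tempfile.mkdtemp())
page = publish("hello", "Hello", "First post", out_dir)
print(page.read_text(encoding="utf-8"))
```

Writes are expensive (every edit re-renders), which is exactly why this fits read-heavy applications.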
The other solution is dynamic caching. All accesses are still handled by the application, but the application works mostly from memory rather than the database. Database access is generally very slow, while memory access is fast, with at least an order-of-magnitude gap between them. memcached can be used to implement this scheme, and a well-tuned memcached deployment can reach a cache hit rate of more than 90%. Ten years ago I was using 2 MB of memory; from around that time I remember a funny parent-child conversation:
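The usual pattern here is cache-aside: try memory first, fall back to the slow database on a miss, then populate the cache for next time. In the sketch below a plain dict stands in for memcached and another for the database so the example is self-contained; with a real memcached you would call a client library's get/set instead, and the key and record shown are invented.

```python
cache = {}                           # stand-in for memcached
DB = {"user:1": {"name": "alice"}}   # stand-in for the database
db_hits = 0                          # counts trips to the "database"

def get_user(key: str):
    global db_hits
    value = cache.get(key)           # fast path: memory
    if value is None:
        db_hits += 1
        value = DB.get(key)          # slow path: database
        cache[key] = value           # populate the cache for next time
    return value

get_user("user:1")   # miss: goes to the database
get_user("user:1")   # hit: served from memory
print(db_hits)       # the database was only queried once
```

With a high hit rate, almost all reads take the fast path, which is where the 90%+ figures come from.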
Son: Dad, I want 1 GB of memory.
Dad: No, son. Not even for your birthday.
Today, large amounts of memory are cheap enough. Google uses large numbers of commodity PCs to build clusters for data processing, and I have always felt that PCs with large memory can solve front-end and even middle-tier load problems at low cost. PC hard disks are short-lived and slow, and PC CPUs are not fast either, so use cheap PCs as web front ends and play to the advantage of large memory: if one breaks, you simply replace it, and there is no data-migration problem.
Next comes application design. When designing applications, we should design for scalability as much as possible: the database can be expanded dynamically and memory caching is supported. This is the lowest-cost approach. Another approach is middleware, such as ICE. Its advantage is that front-end applications can be kept relatively simple: the data layer is transparent to them, provided through ICE, while the distributed database design is implemented at the back end and wrapped by ICE for the front end. This design places lower demands on each individual part and layers the business more cleanly, but because middleware introduces additional layers, the implementation cost is also higher.
In terms of database design, you can use clusters and grouped (partitioned) databases. At the same time, apply common database-optimization principles in the details: design the schema and the data-layer applications to avoid temporary tables and deadlocks. These optimization principles are easy to find online; Google will turn them up. As for database selection, choose Oracle or MySQL according to your own habits. Oracle cannot solve every problem, nor does MySQL imply a small application; the best choice is the one that fits.
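One of the deadlock-avoidance principles mentioned above can be sketched with ordinary threads and locks: if every transaction acquires the locks it needs in one global order (here, sorted by resource name), two transactions can never hold resources in opposite orders while waiting on each other. The resource names below are invented for illustration; the same idea applies to row or table locks in a database.

```python
import threading

# Stand-ins for lockable resources (e.g. tables). Names are examples only.
locks = {"accounts": threading.Lock(), "orders": threading.Lock()}

def run_transaction(names, work):
    """Acquire all needed locks in a single global (sorted) order,
    do the work, then release in reverse order."""
    ordered = sorted(names)
    for name in ordered:
        locks[name].acquire()
    try:
        return work()
    finally:
        for name in reversed(ordered):
            locks[name].release()

# Both callers name the resources in different orders, but the locks are
# always taken in the same sorted order, so they cannot deadlock.
result = run_transaction(["orders", "accounts"], lambda: "ok")
print(result)
```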
The above are all software-based performance designs. In fact, a good combination of hardware can also effectively reduce time cost and development and maintenance costs, but we will not go into that here.
Website architecture design is a holistic undertaking. It must weigh performance, scalability, hardware cost, time cost, and more, and produce a solution appropriate to the business positioning, the available funds, and the schedule; this is not easy. But with enough practice, you will eventually form a website design philosophy of your own to guide the work and lay a solid foundation for the site's growth.