A small website can get by with plain static HTML pages. The pages, together with a few images, are stored in a directory, and such a site places few demands on system architecture or performance. As Internet services have become more diverse, website technology has, after years of development, been subdivided into many specialized areas. Large websites in particular draw on a very wide range of technologies: from hardware to software, programming languages, databases, web servers, and firewalls, every area has demanding requirements that bear no comparison with those of the original simple static HTML site.
Large websites such as portals face a large number of user visits and highly concurrent requests. The basic solutions concentrate on a few areas: high-performance servers, high-performance databases, efficient programming languages, and high-performance web containers. To some extent, though, all of these mean greater investment.
1. Static HTML
Purely static HTML pages are the most efficient to serve and consume the fewest resources, so we try to make as many pages on the site static as possible. This simplest method is in fact the most effective one. However, a website with a large amount of frequently updated content cannot produce all of these pages by hand, which is why the content management system (CMS) became common. The news channels of the portals we often visit, and even their other channels, are managed through such an information publishing system, which automatically generates static pages from the simplest information input. It can also provide channel management, permission management, automatic crawling, and other functions. For a large website, an efficient and manageable CMS is essential.
Beyond portals and information publishing sites, community-type websites with high interaction requirements also need to be as static as possible to improve performance. Rendering posts and articles to static pages in real time, and re-rendering them when they are updated, is a widely used strategy; the Mop hodgepodge forum uses it, as does the NetEase community.
Static HTML is also a technique used by some caching strategies. For parts of the system that are queried from the database frequently but updated rarely, consider making them static. A forum's public settings are an example: mainstream forums today let you manage these settings in the admin back end and store them in the database, and the program reads them constantly even though they change very rarely. You can write this content to static files whenever it is updated in the back end, which avoids a large number of database requests.
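To make the idea concrete, here is a minimal Python sketch of regenerating a static page whenever an article is saved. Everything in it (the `publish` function, the template, and the `<id>.html` file naming) is a hypothetical illustration, not the way any particular CMS works.

```python
import html
from pathlib import Path
from string import Template

# Hypothetical page template; a real CMS would load this from a template file.
PAGE = Template(
    "<html><head><title>$title</title></head>"
    "<body><h1>$title</h1><p>$body</p></body></html>"
)


def render_article(article: dict) -> str:
    """Turn one article record into a complete HTML page."""
    return PAGE.substitute(
        title=html.escape(article["title"]),
        body=html.escape(article["body"]),
    )


def publish(article: dict, out_dir: Path) -> Path:
    """Write (or rewrite) the static page for an article.

    Call this from the back end each time the article is created or updated,
    so that visitors are served a plain file and never trigger a query.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / f"{article['id']}.html"
    tmp = target.with_name(target.name + ".tmp")
    tmp.write_text(render_article(article), encoding="utf-8")
    # Rename into place so readers never see a half-written page.
    tmp.replace(target)
    return target
```

Calling `publish` again with the same `id` simply overwrites the old page, which corresponds to the "re-render on update" step described above.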
2. Separate image servers
For a web server, whether Apache, IIS, or another container, images are the most resource-hungry content to serve. It is therefore necessary to separate images from pages, a strategy adopted by essentially all large websites: they run independent image servers, often many of them. This architecture reduces the load on the servers that handle page requests and ensures that image problems cannot bring those servers down.
The application servers and image servers can also be tuned differently. On an image server, for example, Apache can be configured to support as few content types and load as few modules (LoadModule directives) as possible, which keeps resource consumption low and execution efficiency high.
3. Database clustering and table hashing
Large websites run complex applications, and these applications depend on databases. Under heavy access the database quickly becomes the bottleneck, and a single database soon cannot keep up with the applications, so we need a database cluster or database table hashing.
For database clusters, many databases ship their own solutions; Oracle and Sybase both have good ones, and the commonly used MySQL master/slave replication is a similar approach. Whichever database you use, refer to its corresponding solution.
Because database clustering is constrained by the database product in terms of architecture, cost, and scalability, we also need to improve the architecture at the application level. Database table hashing is a common and highly effective approach.
We first split the database by business area, application, or functional module, so that different modules map to different databases or tables. Within a given page or function we then hash the data into smaller databases or tables according to some policy; a user table, for example, can be hashed by user ID. This improves performance at low cost and gives the system good scalability.
Sohu's forum uses this architecture: it separates forum users, settings, posts, and other information into different databases, then hashes the post and user databases and tables by section and ID. A simple change to the configuration file is enough to add a low-cost database at any time to boost system performance.
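As a rough illustration of hashing a user table by user ID, the Python sketch below maps an ID to a database and table with a simple modulo rule. The database names, the `user_NN` table naming, and the modulo policy are all assumptions made for the example; the article does not say which policy Sohu actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShardConfig:
    """Hypothetical configuration: which databases exist, and how many
    user tables each one holds. In practice this would come from the
    configuration file mentioned above."""
    databases: tuple[str, ...]
    tables_per_db: int


def locate_user(user_id: int, cfg: ShardConfig) -> tuple[str, str]:
    """Return the (database, table) pair that stores the given user."""
    total_slots = len(cfg.databases) * cfg.tables_per_db
    slot = user_id % total_slots                      # hash by user ID
    database = cfg.databases[slot // cfg.tables_per_db]
    table = f"user_{slot % cfg.tables_per_db:02d}"
    return database, table


cfg = ShardConfig(databases=("forum_user_0", "forum_user_1"), tables_per_db=4)
print(locate_user(13, cfg))  # ('forum_user_1', 'user_01')
```

Note that with a plain modulo rule, changing the number of databases changes where existing rows map, so a real deployment needs a migration plan or a range-based policy to add databases cheaply.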
4. Caching
The word "cache" comes up in many fields, and caching is just as important in website architecture and development. Here we describe only the two most basic kinds of cache; advanced and distributed caching are described later.
For caching at the architecture level, anyone familiar with Apache knows that it provides its own cache module; you can also add Squid in front of it as an extra caching layer. Both approaches effectively improve Apache's response capacity.
For caching in application development, a memory cache running on Linux, such as memcached, is a common cache interface for web development. In Java development, for example, you can call it to cache and share data, and some large communities use this architecture. In addition, most web development languages have their own cache modules and methods: PHP has the PEAR Cache module, Java has even more, and .NET, which I am less familiar with, surely has some as well.
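The pattern behind program-level caching is the same whatever the cache back end: look in the cache first and go to the database only on a miss. The Python sketch below shows it with an in-process dictionary and a time-to-live; the `TTLCache` class and its method names are hypothetical, and a shared cache server would replace the dictionary in a multi-server setup.

```python
import time
from typing import Any, Callable


class TTLCache:
    """A tiny in-process cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._store: dict[str, tuple[float, Any]] = {}
        self.hits = 0
        self.misses = 0

    def get_or_load(self, key: str, loader: Callable[[str], Any]) -> Any:
        """Return the cached value, calling loader(key) only on a miss."""
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        self.misses += 1
        value = loader(key)          # e.g. the expensive database query
        self._store[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Drop an entry, for example right after the data is updated."""
        self._store.pop(key, None)
```

A short TTL bounds how stale a page can get, while `invalidate` lets the update path remove an entry immediately.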
5. Mirroring
Mirroring is often used by large websites to improve performance and data security. Mirrors even out the differences in access speed that users experience because of different network providers and regions. The gap between ChinaNet and EduNet, for example, has prompted many websites to set up mirror sites inside CERNET, with data synchronized on a schedule or in real time. I will not go into the details of mirroring here; there are many professional off-the-shelf solutions and products to choose from, as well as low-cost software approaches such as rsync and similar tools on Linux.
6. Load balancing
Load balancing is a high-end solution for large websites that must handle high load and a large number of concurrent requests.
Load balancing technology has been developing for many years, and there are many professional vendors and products to choose from. I have personally worked with some of these solutions; two architectures are described here for reference.
(1) Layer-4 hardware switching
Layer-4 switching uses the header information of layer-3 and layer-4 packets to identify traffic flows by application session, and distributes all the traffic of a session to the appropriate application server for processing.
A layer-4 switch works like a virtual IP (VIP) that points to physical servers. The services it carries use a variety of protocols, including HTTP, FTP, NFS, Telnet, and others, and they require complex load-balancing algorithms on top of the physical servers. In the IP world, the service type is determined by the destination TCP or UDP port, and an application session in a layer-4 switch is identified jointly by the source and destination IP addresses and the TCP or UDP ports.
In hardware layer-4 switching there are well-known products to choose from, such as Alteon and F5. These products are expensive but worth the money, offering excellent performance and flexible management. Yahoo China, with close to 2,000 servers at the time, used only three or four Alteon devices to handle them.
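The way a layer-4 switch identifies a session and pins it to one server can be sketched in a few lines of Python. This is only an illustration of the principle: the `Flow` tuple, the `POOLS` table, and the hash-based choice are assumptions for the example, and real switches implement this in hardware with their own algorithms.

```python
import hashlib
from typing import NamedTuple, Optional


class Flow(NamedTuple):
    """The fields a layer-4 switch reads from the IP and TCP/UDP headers."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: str  # "tcp" or "udp"


# Hypothetical server pools, keyed by protocol and destination port,
# which together determine the service type (80 = HTTP, 21 = FTP).
POOLS = {
    ("tcp", 80): ["10.0.0.11", "10.0.0.12", "10.0.0.13"],
    ("tcp", 21): ["10.0.0.21"],
}


def pick_backend(flow: Flow, pools: dict = POOLS) -> Optional[str]:
    """Map every packet of the same flow to the same physical server."""
    servers = pools.get((flow.proto, flow.dst_port))
    if not servers:
        return None  # no service is published on this port
    key = (f"{flow.src_ip}:{flow.src_port}>"
           f"{flow.dst_ip}:{flow.dst_port}/{flow.proto}").encode()
    digest = hashlib.sha256(key).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]
```

Because the choice depends only on the addresses and ports, every packet of a session reaches the same server without the switch inspecting the application payload.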
(2) Layer-4 software switching
Once the principle of hardware layer-4 switching is understood, software layer-4 switching based on the OSI model follows naturally. It works on the same principle but with somewhat lower performance; even so, it handles a fair amount of load without difficulty. Some argue that the software approach is actually more flexible, and that its processing capacity depends entirely on how well you know the configuration.
For software layer-4 switching we can use LVS (Linux Virtual Server), which is widely used on Linux. Combined with heartbeat it provides real-time failover to improve system robustness, and it offers flexible virtual IP (VIP) configuration and management that can serve several application needs at once, which is essential for distributed systems.
A typical load-balancing strategy is to build a Squid cluster on top of layer-4 software or hardware switching. Many large websites, search engines included, take this approach. The architecture is low-cost, high-performance, and highly scalable, and nodes can easily be added or removed at any time.
For large websites, every method described above may be in use at the same time. This is only a brief introduction; the implementation involves many details that you need to know and understand, and sometimes a single small Squid or Apache parameter has a large effect on system performance.
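To show why adding or removing nodes is cheap in this kind of architecture, here is a small Python sketch of a round-robin pool that skips nodes marked unhealthy. The `BackendPool` class is purely hypothetical; it is not how LVS or heartbeat are implemented, only an illustration of the behavior described above.

```python
from typing import Iterable, Optional


class BackendPool:
    """Round-robin over cache nodes, skipping those marked as down."""

    def __init__(self, nodes: Iterable[str]):
        self._nodes = list(nodes)
        self._healthy = set(self._nodes)
        self._next = 0

    def add(self, node: str) -> None:
        """Bring a new node into rotation."""
        if node not in self._nodes:
            self._nodes.append(node)
            self._healthy.add(node)

    def remove(self, node: str) -> None:
        """Take a node out of the cluster entirely."""
        if node in self._nodes:
            self._nodes.remove(node)
            self._healthy.discard(node)

    def mark(self, node: str, healthy: bool) -> None:
        """Record the result of a health check (the heartbeat's job)."""
        if node not in self._nodes:
            return
        if healthy:
            self._healthy.add(node)
        else:
            self._healthy.discard(node)

    def pick(self) -> Optional[str]:
        """Return the next healthy node, or None if none is available."""
        for _ in range(len(self._nodes)):
            node = self._nodes[self._next % len(self._nodes)]
            self._next += 1
            if node in self._healthy:
                return node
        return None
```

Requests keep flowing to the remaining nodes while one is down, and a new node starts receiving traffic as soon as `add` is called.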
7. Latest: CDN acceleration
What is a CDN?
CDN stands for content delivery network. The idea is to add a new layer of network architecture on top of the existing Internet and publish a website's content to the network "edge" closest to users, so that users fetch the content they want from nearby, which improves the response speed they see when accessing the site.
A CDN differs from mirroring in that it is smarter. As a rough analogy: CDN = smarter mirroring + caching + traffic steering. A CDN can therefore markedly improve the efficiency of information flow on the Internet. It addresses problems such as limited network bandwidth, heavy user traffic, and unevenly distributed points of presence, and improves the response speed of websites for users.
CDN types and features
A CDN can be implemented in three ways: mirroring, caching, and leased lines.
Mirror sites are the most common. They allow content to be published directly and suit the synchronization of static and quasi-dynamic data. However, buying and maintaining the additional servers is expensive, and you must also set up mirror servers in each region and assign professional technicians to manage and maintain them. For large websites, the bandwidth cost of keeping the mirrors updated also rises considerably.
Caching is low-cost and suits static content. Internet statistics show that more than 80% of users regularly access only 20% of a site's content. Under this rule, the cache servers can handle most users' static requests, and the origin server only has to handle the roughly 20% of requests that are not cached, plus dynamic requests. This greatly shortens response times and reduces the load on the origin server.
CDN services generally place cache servers at key network nodes across the country.
Leased lines let users reach the data source directly, which makes it possible to keep dynamic data synchronized.
CDN example
For example, when a user visits a website, the site uses global load balancing to direct the user to the nearest cache server that is working normally, and that server responds to the request directly.
When a user visits a site that uses a CDN, the biggest difference from traditional name resolution is that the site's authoritative name server no longer answers the local DNS server's query with traditional round-robin. Instead, it takes into account where the request originated and the network conditions at the time, and directs the user to the nearest cache node that is relatively lightly loaded.
By combining a user-location algorithm with server health checks, the CDN directs user requests to the nearest cache servers distributed on the network "edge", so that users receive faster and more reliable responses.
Because a large share of user requests is answered directly by CDN cache nodes on the edge of the network, this not only improves the quality of access for users but also effectively reduces the load on the origin server.
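The node-selection step can be sketched as follows in Python. The ranking rule (prefer the same operator, then the same region, then the lowest load), the `max_load` threshold, and all the names are assumptions made for illustration; real CDNs use their own location databases and measurements.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class CacheNode:
    name: str
    isp: str            # network operator hosting the node
    region: str
    load: float         # current load, from 0.0 (idle) to 1.0 (saturated)
    healthy: bool = True


def choose_node(nodes: Iterable[CacheNode], user_isp: str, user_region: str,
                max_load: float = 0.8) -> Optional[CacheNode]:
    """Pick the node that the name server would hand back to this user."""
    # The health check comes first: dead or overloaded nodes never qualify.
    candidates = [n for n in nodes if n.healthy and n.load < max_load]
    if not candidates:
        return None  # the caller would fall back to the origin server

    def rank(node: CacheNode):
        # False sorts before True, so matching nodes rank ahead of others.
        return (node.isp != user_isp, node.region != user_region, node.load)

    return min(candidates, key=rank)
```

Unlike plain round-robin, two users asking for the same host name can receive different answers depending on where they are and how busy each node is.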
Appendix: service description from a CDN provider
gcdn acceleration
With gcdn acceleration, a gcdn server is placed between the visitors and your server. When visitors access your server, static data such as images and multimedia files is read directly from the gcdn server, which greatly reduces the amount of static data served from the origin server.
For VIP virtual hosts, a high-speed compressed VPN channel runs over cross-network leased lines such as China Telecom <==> China Netcom, China Telecom <==> International (HK), and China Netcom <==> International (HK). It offers smart multi-line routing that automatically finds the fastest path, fast real-time responses to concurrent dynamic requests, and real-time synchronization of the site's dynamic scripts, so its acceleration effect on dynamic websites is even more pronounced.
A gcdn server for your site sits on each operator's network (China Telecom, China Netcom, China Tietong, CERNET), so wherever your visitors come from, gcdn serves your site to them as quickly as possible. Your data is also backed up in real time to keep it safer.
Four suggestions for high-concurrency website access
Separating hard disk reads from writes, separating functions from presentation, encapsulating basic functions into classes, and ensuring scalability in the architecture design are all crucial to building a large website.
The ever-growing scale of the Internet, the expanding user base, and the rise of the Web place new demands on website construction: a site must be high-performance, highly scalable, and able to support highly concurrent access.
Separate hard disk reads and writes
If hard disk read/write performance is the bottleneck for the whole website, consider separating the disk's read and write functions and optimizing each. On the disks used for writing, raising I/O throughput will inevitably raise the failure rate of the whole file system, because that failure rate is the sum of the failure rates of all the drives; you cannot have both high disk I/O and a low failure rate. The disks used for reading can be ordinary server disks, which keeps costs down.
Balancing CPU and I/O consumption not only makes full use of server resources but also lets the system ride out temporary overload: when an emergency causes a sharp increase in traffic, the overall performance of the system degrades instead of the system crashing immediately.
Functions and presentation must be separated
Once a website is in operation, requirements are certain to change many times. If every change means modifying the source code, the development of the site can be considered a failure.
Most importantly, functions and presentation must be separated. The core functions are written in a scripting language, and the front end is HTML with special tags. This speeds up development and makes later maintenance and upgrades easier. For the front-end templates, you usually extract the header and footer of the page and split the main body by module or function, which can also effectively reduce the load on the server.
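Here is a minimal Python sketch of that separation, using the standard library's `string.Template` as the "HTML with special tags". The template fragments and function names are hypothetical; the point is only that `latest_subjects` contains no HTML and the templates contain no logic.

```python
import html
from string import Template

# Presentation: shared header and footer plus one module of the page body.
HEADER = Template("<html><head><title>$site - $title</title></head><body>")
FOOTER = Template("<footer>$copyright</footer></body></html>")
POST_LIST = Template("<ul>$items</ul>")
POST_ITEM = Template("<li>$subject</li>")


def latest_subjects(posts: list, limit: int = 2) -> list:
    """Function layer: decide WHAT to show. No markup in here."""
    newest_first = sorted(posts, key=lambda p: p["created"], reverse=True)
    return [p["subject"] for p in newest_first[:limit]]


def render_page(site: str, title: str, subjects: list,
                copyright_text: str) -> str:
    """Presentation layer: decide HOW to show it, by filling the templates."""
    items = "".join(
        POST_ITEM.substitute(subject=html.escape(s)) for s in subjects
    )
    return (
        HEADER.substitute(site=html.escape(site), title=html.escape(title))
        + POST_LIST.substitute(items=items)
        + FOOTER.substitute(copyright=html.escape(copyright_text))
    )
```

A redesign then means editing the four templates, and a change to the selection rule means editing `latest_subjects`, without either side touching the other.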
Encapsulation makes development more efficient
At the level of functional blocks, if you use JSP, basic functions such as database connections and session management should be encapsulated into classes. If you use PHP, encapsulate the code explicitly: each functional block goes into a function, file, or class.
At a higher level, the website can be divided into presentation, logic, and persistence layers, each encapsulated separately, so that a change in the architecture of one layer does not affect the others. The recently popular MVC architecture splits the whole website into three parts: model, view, and controller. There are also many excellent code frameworks available, such as Struts and Spring for JSP and php.MVC and Studs for PHP. With a ready-made framework, website development gets twice the result with half the effort.
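The same layering can be shown in a few lines of Python with the standard `sqlite3` module. The `Database` and `UserService` classes are hypothetical examples of a persistence layer and a logic layer; they stand in for the JSP or PHP classes discussed above.

```python
import sqlite3
from typing import Optional


class Database:
    """Persistence layer: the only code that touches the connection."""

    def __init__(self, path: str = ":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.row_factory = sqlite3.Row

    def execute(self, sql: str, params: tuple = ()) -> sqlite3.Cursor:
        with self._conn:  # commits on success, rolls back on error
            return self._conn.execute(sql, params)

    def query(self, sql: str, params: tuple = ()) -> list:
        return [dict(row) for row in self._conn.execute(sql, params)]

    def close(self) -> None:
        self._conn.close()


class UserService:
    """Logic layer: business rules only, no connection handling."""

    def __init__(self, db: Database):
        self._db = db
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS users "
            "(id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )

    def register(self, name: str) -> int:
        name = name.strip()
        if not name:
            raise ValueError("name must not be empty")
        cursor = self._db.execute(
            "INSERT INTO users (name) VALUES (?)", (name,)
        )
        return cursor.lastrowid

    def find(self, user_id: int) -> Optional[dict]:
        rows = self._db.query(
            "SELECT id, name FROM users WHERE id = ?", (user_id,)
        )
        return rows[0] if rows else None
```

If the storage engine changes later, only `Database` has to be rewritten; `UserService` and everything above it stay as they are.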
Scalability to cope with sudden increases in traffic
A large website must allow for future capacity expansion when its architecture is designed. For mobile websites, sudden increases in traffic can be huge. On the website's primary storage server, a configuration file specifies the ID range of the data files stored on each storage disk. When a front-end server needs to read a data file, it first asks an interface on the primary storage server for the disk and directory address of the data, and then reads the actual file. To add a storage cabinet, you only need to modify the configuration file; the front-end program is not affected at all.
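The lookup on the primary storage server can be sketched like this in Python. The range table, the mount points, and the directory layout (one subdirectory per thousand files) are invented for the example; the article only states that each disk is assigned an ID range in a configuration file.

```python
import bisect

# Hypothetical configuration: (first_id, last_id, mount point of the disk).
STORAGE_RANGES = [
    (1, 1_000_000, "/data/disk01"),
    (1_000_001, 2_000_000, "/data/disk02"),
]


class StorageLocator:
    """Answers the question: where does data file N live?"""

    def __init__(self, ranges):
        self._ranges = sorted(ranges)
        self._starts = [first for first, _, _ in self._ranges]

    def locate(self, file_id: int) -> str:
        # Find the last range whose first ID is <= file_id.
        index = bisect.bisect_right(self._starts, file_id) - 1
        if index < 0:
            raise KeyError(f"no storage configured for file {file_id}")
        first, last, root = self._ranges[index]
        if file_id > last:
            raise KeyError(f"no storage configured for file {file_id}")
        # Assumed layout: one subdirectory per 1,000 files.
        return f"{root}/{file_id // 1000:06d}/{file_id}.dat"


locator = StorageLocator(STORAGE_RANGES)
print(locator.locate(1_234_567))  # /data/disk02/001234/1234567.dat
```

Adding capacity then amounts to appending one line, such as `(2_000_001, 3_000_000, "/data/disk03")`, to the configuration; callers of `locate` do not change.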