Small websites, such as personal websites, can be built from the simplest static HTML pages plus a few images for decoration, with all pages stored in one directory. Such a website places very simple demands on architecture and performance. As Internet services have grown more diverse, website technology has, over years of development, split into many specialized areas. Large websites in particular draw on a very wide range of technologies, from hardware to software, and place high demands on programming languages, databases, web servers, firewalls, and other fields.
Large websites, such as portals, face huge numbers of user visits and highly concurrent requests. The basic responses focus on a few aspects: high-performance servers, high-performance databases, efficient programming languages, and high-performance web containers. However, these measures alone cannot fundamentally solve the high load and high concurrency that large websites face.
These solutions also mean, to some extent, a larger investment, and they have bottlenecks and scale poorly. Below I share some of my experience from the perspectives of low cost, high performance, and high scalability.
1. Static HTML
As we all know, purely static HTML pages are the most efficient and consume the fewest resources, so we try to serve as many pages of our website as possible as static pages. This simplest method is actually the most effective one. However, for websites with a large amount of frequently updated content, we cannot produce all of these pages by hand, which is why content management systems (CMS) came about. The news channels of the portal sites we visit, and often their other channels as well, are managed through such a publishing system. A CMS can automatically generate static pages from the simplest form of content entry, and it also provides channel management, permission management, and similar functions.
For a large website, an efficient and manageable CMS is essential.
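A minimal sketch of what the publishing step of such a system does, assuming a hypothetical `publish` function, a made-up inline template, and an output directory that the web server serves directly (none of these come from any particular CMS): render the article once, then write it to disk as a plain `.html` file.

```python
import html
import tempfile
from pathlib import Path
from string import Template

# Hypothetical page template; a real CMS would load this from its template store.
PAGE = Template("""<!DOCTYPE html>
<html><head><meta charset="utf-8"><title>$title</title></head>
<body><h1>$title</h1>
$body
</body></html>
""")


def publish(article_id: int, title: str, body: str, out_dir: Path) -> Path:
    """Render an article once and write it as a static .html file.

    The web server then serves the file directly, with no database
    query or script execution per request.
    """
    # Each blank-line-separated block of the body becomes one escaped paragraph.
    paragraphs = "\n".join(
        f"<p>{html.escape(p)}</p>" for p in body.split("\n\n") if p.strip()
    )
    page = PAGE.substitute(title=html.escape(title), body=paragraphs)
    path = out_dir / f"{article_id}.html"
    # Write to a temporary file, then rename over the old page.
    tmp = path.with_suffix(".tmp")
    tmp.write_text(page, encoding="utf-8")
    tmp.replace(path)
    return path


if __name__ == "__main__":
    out = Path(tempfile.mkdtemp())
    p = publish(42, "Hello", "First paragraph.\n\nSecond paragraph.", out)
    print(p.read_text(encoding="utf-8"))
```

Calling `publish` again after an article is edited overwrites the file, so the static copy stays current. Writing to a temporary file and renaming it means that, on most systems, a reader never receives a half-written page.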
Besides portals and news publishing sites, community websites with heavy user interaction also need to be made as static as possible to improve performance. A widely used strategy is to turn community posts and articles into static pages in real time and to regenerate them whenever they are updated; Mop's Dazahui ("hodgepodge") board and the NetEase community, for example, use this strategy.
Static HTML is also used as part of some caching strategies. For data that the system queries from the database frequently but that rarely changes, consider generating static HTML. A forum's public configuration is an example: mainstream forums let administrators manage it in the back end and store it in the database, and the front-end programs read it constantly even though it seldom changes. This content can be regenerated as static files whenever it is updated in the back end, which avoids a large number of database requests.
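As a sketch of the configuration case, assume a hypothetical `config` table and two hypothetical functions: the admin back end calls `save_forum_config`, which writes to the database and then exports all settings to a static file, and ordinary requests call `load_forum_config`, which only reads that file. JSON is used here for brevity; an HTML or script fragment included by the pages works the same way. `sqlite3` stands in for the forum's real database.

```python
import json
import sqlite3
import tempfile
from pathlib import Path

SCHEMA = "CREATE TABLE IF NOT EXISTS config (name TEXT PRIMARY KEY, value TEXT)"


def save_forum_config(conn: sqlite3.Connection, settings: dict, static_file: Path) -> None:
    """Called from the admin back end when an administrator edits settings.

    The database stays the source of truth, but every save also exports
    the full settings to a static file that page requests read instead.
    """
    with conn:  # commits the transaction on success
        conn.execute(SCHEMA)
        conn.executemany(
            "INSERT OR REPLACE INTO config (name, value) VALUES (?, ?)",
            [(k, json.dumps(v)) for k, v in settings.items()],
        )
    rows = conn.execute("SELECT name, value FROM config ORDER BY name").fetchall()
    snapshot = {name: json.loads(value) for name, value in rows}
    # Write a temporary file and rename it so readers never see a partial file.
    tmp = static_file.with_suffix(".tmp")
    tmp.write_text(json.dumps(snapshot, ensure_ascii=False, indent=2), encoding="utf-8")
    tmp.replace(static_file)


def load_forum_config(static_file: Path) -> dict:
    """Used on every page request: reads the file, never the database."""
    return json.loads(static_file.read_text(encoding="utf-8"))


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    f = Path(tempfile.mkdtemp()) / "forum_config.json"
    save_forum_config(db, {"site_name": "My Forum", "posts_per_page": 20}, f)
    print(load_forum_config(f))
```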
2. Image server separation
As you know, for web servers, whether Apache, IIS, or another container, images are what consume the most resources. It is therefore worth separating images from pages. Large websites generally have independent image servers, often many of them. This architecture reduces the load on the servers that handle page requests and ensures that the system does not crash because of image traffic. The application servers and image servers can also be tuned differently. For example, when configuring Apache, you can support as few ContentTypes and load as few modules (LoadModule) as possible, keeping resource consumption low and execution efficiency high.
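Pages then have to reference images on the image servers rather than on the page server. A minimal sketch, assuming hypothetical host names `img1.example.com` to `img3.example.com`: a helper that page templates could call to build image URLs, always sending the same image to the same host so that each image server's cache stays effective.

```python
import zlib

# Hypothetical image host names; substitute your own image server domains.
IMAGE_HOSTS = ["img1.example.com", "img2.example.com", "img3.example.com"]


def image_url(path: str, hosts: list[str] = IMAGE_HOSTS) -> str:
    """Map an image path to a URL on one of the dedicated image servers.

    The same path always maps to the same host, so page servers never
    handle image requests and each image is cached on one image server.
    """
    path = path.lstrip("/")  # normalize so "/a.png" and "a.png" map alike
    index = zlib.crc32(path.encode("utf-8")) % len(hosts)
    return f"http://{hosts[index]}/{path}"
```

A template would emit `image_url("photos/cat.jpg")` instead of a relative path. Hashing by path keeps the mapping stable, but changing the number of hosts reshuffles most images, so the host list should change rarely.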
3. Database clustering and database/table hashing
Large websites run complex applications, and these applications all rely on databases. Under heavy traffic, database bottlenecks appear quickly, and a single database soon cannot keep up with the application. At that point we need a database cluster or database/table hashing.
For database clusters, many databases ship their own solutions; Oracle and Sybase both have good ones. The commonly used MySQL master/slave replication is a similar solution. Whichever database you use, you can refer to its corresponding solution.
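With master/slave replication, writes must go to the master while reads can be spread across the slaves. The class below is a minimal sketch of that routing in application code; the class name and the string placeholders standing in for connections are hypothetical, and real projects often do this in a driver, a connection pool, or a proxy instead.

```python
import itertools


class MasterSlaveRouter:
    """Send writes to the master and spread reads over the slaves.

    Connection objects are placeholders here; in practice they would be
    MySQL connections from a driver or pool.
    """

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves) or [master]  # fall back to the master if no slaves
        self._next_slave = itertools.cycle(self.slaves)

    def connection_for(self, sql: str):
        text = sql.lstrip().upper()
        # Plain SELECTs may go to a slave; locking reads and all writes go to the master.
        if text.startswith("SELECT") and "FOR UPDATE" not in text:
            return next(self._next_slave)
        return self.master
```

Because MySQL replication is asynchronous by default, a slave can lag behind the master; reads that must see a write just made should also go to the master.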
Because a database cluster is limited by the database type in terms of architecture, cost, and expansion, we also need to improve the system architecture from the application side, and database/table hashing is the most common and effective solution. We split the database by business, application, or functional module, so that different modules use different databases or tables. Then, following a chosen policy, we hash a particular page or function into smaller databases or tables, for example hashing the user table by user ID. This improves system performance and scalability at low cost. Sohu's forum uses this architecture: it separates forum users, settings, posts, and other data into different databases, then hashes the post and user data into databases and tables by board and by ID. With a simple change to a configuration file, the system can add a low-cost database at any time to increase capacity.
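The hashing policy itself can be very small. The sketch below assumes a hypothetical layout read from a configuration file (a list of databases and a fixed number of user tables per database) and maps a user ID to the database and table holding that user's rows by simple modulo. It illustrates the idea and is not Sohu's actual scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShardConfig:
    """Hypothetical shard layout, as it might be read from a config file."""
    databases: tuple[str, ...]  # database host or DSN names
    tables_per_db: int          # user tables in each database


def user_shard(user_id: int, cfg: ShardConfig) -> tuple[str, str]:
    """Return (database, table) holding the given user's rows.

    Users are spread over len(databases) * tables_per_db tables by
    taking user_id modulo the total table count.
    """
    total = len(cfg.databases) * cfg.tables_per_db
    slot = user_id % total
    db = cfg.databases[slot // cfg.tables_per_db]  # consecutive slots share a database
    return db, f"user_{slot}"
```

With plain modulo, changing the number of databases changes where existing users live, so their rows must be migrated. Schemes that map ID ranges to databases, or that keep a lookup table, let a new database take only new users.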
4. Cache
The word cache is used in many fields, and caching is also very important in website architecture and development. Here I first describe the two most basic kinds of cache; advanced and distributed caching are covered later.
For caching at the architecture level, anyone familiar with Apache knows that Apache has its own cache module, and that a Squid proxy can be added in front of it for caching. Both approaches effectively improve Apache's response capacity.
For caching in website program development, a memory cache on Linux, such as memcached, is a common caching interface for web development. For example, a Java application can call such a memory cache to cache data and share it between processes, and some large communities use this architecture. In addition, most web development languages have their own cache modules and methods: PHP has PEAR's Cache module, and Java has even more. I am not very familiar with .NET, but I am sure it has them too.
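The usual way to use such a cache from application code is the cache-aside pattern: look in the cache first, and only on a miss query the database and store the result with an expiry time. In the sketch below, a small in-process `TTLCache` class stands in for memcached so the example runs on its own; `get_user` and the `load_from_db` callback are hypothetical names.

```python
import time


class TTLCache:
    """A tiny in-process stand-in for a memory cache such as memcached."""

    def __init__(self, clock=time.monotonic):
        self._data = {}      # key -> (value, expiry time)
        self._clock = clock  # injectable so expiry can be tested

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self._clock() >= expires_at:
            del self._data[key]  # expired: behave like a miss
            return None
        return value

    def set(self, key, value, ttl):
        self._data[key] = (value, self._clock() + ttl)


def get_user(cache: TTLCache, user_id: int, load_from_db, ttl: float = 60.0):
    """Cache-aside read: try the cache, fall back to the database, then fill the cache."""
    key = f"user:{user_id}"
    value = cache.get(key)
    if value is None:
        value = load_from_db(user_id)
        cache.set(key, value, ttl)
    return value
```

With a real memcached client the `get` and `set` calls go over the network instead, which lets many web servers share one cache.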
5. Mirroring
Large websites often use mirroring to improve performance and data safety. Mirroring can reduce the differences in access speed that users experience on different network providers and in different regions. The gap between ChinaNet and CERNET (the China Education and Research Network), for example, has led many websites to set up mirror sites inside CERNET and to synchronize data to them periodically or in real time. I will not go into the details of mirroring here; many professional solutions and products are available, as are low-cost software approaches such as rsync on Linux.
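Users still have to be sent to the right mirror. One simple approach, sketched below, is to look up which network the client's IP address belongs to and answer with that network's mirror. The prefix table, network names, and host names are hypothetical placeholders (the addresses are reserved documentation ranges); a real deployment would load the carriers' published address ranges.

```python
import ipaddress

# Hypothetical prefix table: which network each client prefix belongs to.
NETWORK_PREFIXES = {
    "cernet": [ipaddress.ip_network("192.0.2.0/24")],
    "chinanet": [ipaddress.ip_network("198.51.100.0/24")],
}

# Hypothetical mirror hosts, one per network.
MIRRORS = {
    "cernet": "edu-mirror.example.com",
    "chinanet": "www.example.com",
}
DEFAULT_MIRROR = "www.example.com"


def pick_mirror(client_ip: str) -> str:
    """Return the mirror host for the network the client address falls in."""
    addr = ipaddress.ip_address(client_ip)
    for network, prefixes in NETWORK_PREFIXES.items():
        if any(addr in prefix for prefix in prefixes):
            return MIRRORS[network]
    return DEFAULT_MIRROR  # unknown networks use the main site
```

The same lookup is often done in DNS, returning a different address per network, so no page has to be rewritten. Keeping the mirrors' content in sync is a separate job, for example a scheduled rsync run.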
6. Load balancing
Load balancing is the ultimate solution for large websites facing high load and large numbers of concurrent requests.
Load balancing technology has been developing for many years, and many professional vendors and products are available. I have worked with a few solutions; here are two architectures for your reference.
Hardware layer-4 switching
A layer-4 switch uses the header information of layer-3 and layer-4 packets to identify traffic flows by application session, and distributes each session's traffic to an appropriate application server for processing. The layer-4 switching function acts like a virtual IP address that points to the physical servers. The services it carries may use many protocols, such as HTTP, FTP, NFS, and Telnet, and distributing them across physical servers requires sophisticated load-balancing algorithms. In the IP world, the service type is determined by the destination TCP or UDP port; in a layer-4 switch, an application session is determined jointly by the source and destination IP addresses and the TCP/UDP ports.
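To make the idea of an application session concrete: every packet of a TCP or UDP connection carries the same source address, source port, destination address, destination port, and protocol. The sketch below hashes that 5-tuple to choose a real server behind the virtual IP. It is an illustration with hypothetical names, not how any particular switch is implemented; commercial products also keep connection tables and offer weighted and least-connection algorithms.

```python
import zlib
from typing import NamedTuple


class Flow(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str    # the virtual IP clients connect to
    dst_port: int
    proto: str     # "tcp" or "udp"


def pick_server(flow: Flow, servers: list[str]) -> str:
    """Hash the flow's addresses and ports to choose a real server.

    Every packet of the same connection has the same 5-tuple, so it
    lands on the same server without the balancer keeping any state.
    """
    key = f"{flow.proto}|{flow.src_ip}|{flow.src_port}|{flow.dst_ip}|{flow.dst_port}"
    return servers[zlib.crc32(key.encode("utf-8")) % len(servers)]
```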
Among hardware layer-4 switching products there are several well-known options, such as Alteon and F5. They are expensive but worth the money, offering excellent performance and flexible management. Yahoo China once served nearly 2,000 servers with three or four Alteon switches.
Software layer-4 switching
Once the principle of hardware layer-4 switches was understood, software layer-4 switching based on the OSI model followed. It works on the same principle with somewhat lower performance, but it can easily handle a considerable load. Some people argue that the software approach is actually more flexible, and that how much it can handle depends entirely on how well you know its configuration.
For software layer-4 switching we can use LVS (Linux Virtual Server), the most common choice on Linux. It offers a real-time failover solution based on Heartbeat, which improves the robustness of the system, and flexible virtual IP (VIP) configuration and management, so it can serve several applications at once. This is indispensable for distributed systems.
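Heartbeat-style failover comes down to noticing that a server has stopped answering and no longer sending traffic to it. The class below is a small sketch of that idea applied to real servers behind a virtual service: each server is expected to report a heartbeat, and round-robin scheduling skips any server whose last heartbeat is older than a timeout. The class, its methods, and the 3-second timeout are hypothetical; this is not LVS's code or configuration.

```python
import time


class RealServerPool:
    """Round-robin over real servers, skipping any whose heartbeat is stale."""

    def __init__(self, servers, timeout=3.0, clock=time.monotonic):
        self.servers = list(servers)
        self.timeout = timeout
        self.clock = clock  # injectable so failover can be tested
        now = clock()
        self.last_seen = {s: now for s in self.servers}
        self._pos = 0

    def heartbeat(self, server):
        """Record that a server answered its health check."""
        self.last_seen[server] = self.clock()

    def alive(self):
        """Servers whose last heartbeat is within the timeout."""
        now = self.clock()
        return [s for s in self.servers if now - self.last_seen[s] <= self.timeout]

    def next_server(self):
        healthy = self.alive()
        if not healthy:
            raise RuntimeError("no healthy real servers")
        server = healthy[self._pos % len(healthy)]
        self._pos += 1
        return server
```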
A typical load-balancing strategy is to build a Squid cluster on top of software or hardware layer-4 switching. Many large websites, including search engines, adopt this approach. The architecture is low-cost, high-performance, and highly scalable, and nodes can be added or removed at any time. I will discuss this architecture in detail separately.
There are also some difficult issues, such as user data synchronization and parallel processing of heterogeneous systems.
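In a Squid cluster it helps to send each URL to the same cache node, so the node that already holds the page answers it, and to keep most of that mapping intact when nodes are added or removed. The sketch below uses consistent hashing for this; it is one common technique and is not named in the text above (Squid itself offers a related hash-based scheme, CARP, for selecting parent caches). The `HashRing` class and the node names are hypothetical.

```python
import bisect
import zlib


class HashRing:
    """Consistent hashing: map URLs to cache nodes so that adding or
    removing a node only moves the URLs near that node's hash points."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas  # virtual points per node, for smoother balance
        self._keys = []           # sorted hash points
        self._ring = {}           # hash point -> node
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value: str) -> int:
        return zlib.crc32(value.encode("utf-8"))

    def add(self, node: str) -> None:
        for i in range(self.replicas):
            h = self._hash(f"{node}#{i}")
            if h not in self._ring:
                bisect.insort(self._keys, h)
            self._ring[h] = node

    def remove(self, node: str) -> None:
        for i in range(self.replicas):
            h = self._hash(f"{node}#{i}")
            if self._ring.get(h) == node:
                del self._ring[h]
                self._keys.remove(h)

    def node_for(self, url: str) -> str:
        """Return the node owning the first hash point clockwise from the URL's hash."""
        if not self._keys:
            raise LookupError("ring is empty")
        idx = bisect.bisect(self._keys, self._hash(url)) % len(self._keys)
        return self._ring[self._keys[idx]]
```

When a node is added, only the URLs whose nearest hash point now belongs to the new node move to it; every other URL keeps its cache node, so the rest of the cluster's cached content stays useful.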
A large website may use all of the methods above at the same time. I have only introduced them briefly here; implementing them requires understanding many details, and sometimes a single small Squid or Apache parameter has a large impact on system performance. I hope we can discuss these topics together.