High concurrency and high traffic website architecture (notes)

1. Glossary
1.1 Concurrent access volume

Concurrent access volume = average number of requests per unit time (requests/second) × average processing time per request (seconds/request)
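For example (hypothetical numbers, purely for illustration): a site that receives 200 requests per second and needs 0.5 seconds on average to process each request is holding roughly 200 × 0.5 = 100 requests concurrently at any given moment.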

1.2 Server load balancing
Server load balancing has two meanings:
1) a large volume of concurrent access or data traffic is distributed across multiple node devices to be processed separately, reducing the time users wait for a response;
2) a single heavy-load operation is split across multiple node devices and processed in parallel; after each node finishes, the results are combined and returned to the user, greatly increasing the system's processing capability.
Load balancing is generally classified by the layer of the network (in the seven-layer network model) at which it operates. Layer-2 load balancing refers to aggregating multiple physical links into a single logical link, i.e. trunking; it is not an independent device but a technique commonly used by switches and other network equipment. Modern load balancing usually operates at layer 4 (the transport layer) or layer 7 (the application layer). This is load balancing for network applications; it is completely independent of switches and servers and has become a class of device in its own right.
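As a minimal illustration of the first meaning above (spreading requests across several node devices), the following Python fragment shows simple round-robin selection over a list of backend nodes; the node addresses are invented for illustration, and a real deployment would use a dedicated layer-4/7 load balancer rather than application code.

    import itertools

    # Hypothetical backend node addresses, for illustration only.
    BACKENDS = ["10.0.0.11:80", "10.0.0.12:80", "10.0.0.13:80"]
    _next_backend = itertools.cycle(BACKENDS)

    def pick_backend():
        """Return the next backend node in round-robin order."""
        return next(_next_backend)

    # Each incoming request is handed to a different node in turn.
    for _ in range(5):
        print(pick_backend())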

2 network layer architecture
2.1 Mirror site technology
A mirror site is a copy of the same website placed on several servers, each with its own URL; the sites on these servers are called mirror sites.
Advantage: website backup.
Disadvantages: 1) content on multiple servers must be updated at the same time; 2) the user chooses a server on their own, which is not necessarily the best one; 3) because the choice is left to the user, there is little control and the load cannot be balanced effectively.

2.2 CDN (Content Delivery Network)
Advantages: 1) website backup; 2) the CDN, rather than the user, selects the content server, which improves controllability.
Disadvantages: 1) because multiple mirror servers must be synchronized whenever content is updated, a CDN is only suitable for sites whose content changes infrequently or has loose real-time requirements; 2) when a mirror site is moved, DNS records in different regions take some time to update, so controllability is still limited.

2.3 Distributed Application Layer Design
Sina Podcast provides an interface through which the player queries the address of a video file. When the video playback page is opened, the player first calls this query interface to obtain the address of the best mirror server for the video file, and then downloads the video file from that server; a small sketch of this kind of interface follows below.
Advantages: 1) website backup; 2) fully controllable mirror sites.
Disadvantage: not suitable for sites whose content is updated frequently.
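A minimal sketch of such a query interface, assuming a hypothetical table mapping client regions to mirror servers (the actual Sina implementation is not described in the source):

    # Hypothetical region-to-mirror table; a real service would use
    # IP geolocation plus live load/health data for each mirror.
    MIRRORS = {
        "north": "http://mirror-bj.example.com/videos/",
        "south": "http://mirror-gz.example.com/videos/",
    }
    DEFAULT_MIRROR = "http://mirror-sh.example.com/videos/"

    def best_video_url(client_region, video_id):
        """Return the download URL on the mirror best suited to the client."""
        base = MIRRORS.get(client_region, DEFAULT_MIRROR)
        return base + video_id + ".flv"

    # The player calls the interface, then downloads from the returned URL.
    print(best_video_url("north", "12345"))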

3. Switching layer architecture
Layer-4 switching -> hardware implementation and software implementation

4. Server Optimization
4.1 Overall server performance considerations
Common factors that affect server processing speed are the network connection, hard disk read/write, memory size, and CPU speed. To get the most out of a server, the key is to find and eliminate the bottleneck among them.

Network:
4.2 Socket optimization
Adjustable kernel parameters that affect TCP/IP stack performance...
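Those kernel parameters are tuned through sysctl; independently of them, a few common optimizations can also be applied at the application level when sockets are created. A minimal Python sketch (the port and backlog values are illustrative, not tuned recommendations):

    import socket

    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Allow the server to rebind to the port quickly after a restart.
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("0.0.0.0", 8080))
    # A larger accept backlog helps absorb bursts of new connections.
    srv.listen(1024)

    conn, addr = srv.accept()
    # Disable Nagle's algorithm on the accepted connection to cut latency
    # for small responses.
    conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)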

Disk read/write:
4.3 Hard disk cache
Hard-disk-level caching means that content which would otherwise be generated dynamically is cached temporarily on the hard disk, so that identical requests within an acceptable latency window are not regenerated. On Linux, Squid is generally used for hard disk caching.
Squid is a high-performance proxy cache server. When a user downloads (browses) a web page through a browser, the browser asks Squid to fetch the page on its behalf. Squid connects to the origin server where the page is located, retrieves it, returns it to the browser, and saves a copy in its local cache directory. When a user requests the same page again, Squid can simply read its copy from the cache instead of contacting the origin server.
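Squid itself is configured rather than programmed, but the idea of a hard disk cache can be sketched in a few lines of Python (the cache directory and TTL are arbitrary illustrative values, not Squid settings):

    import hashlib
    import os
    import time

    CACHE_DIR = "/tmp/page_cache"   # illustrative location
    TTL = 300                       # seconds a cached copy stays valid

    def cached_page(url, generate):
        """Return the page for url, regenerating it only when the cached
        copy on disk is missing or older than TTL seconds."""
        os.makedirs(CACHE_DIR, exist_ok=True)
        path = os.path.join(CACHE_DIR, hashlib.md5(url.encode()).hexdigest())
        if os.path.exists(path) and time.time() - os.path.getmtime(path) < TTL:
            with open(path) as f:           # serve the cached copy
                return f.read()
        page = generate(url)                # the expensive dynamic generation
        with open(path, "w") as f:
            f.write(page)
        return page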

4.4 Read/write splitting
If the read/write performance of the website's hard disks is the bottleneck for the performance of the whole site, consider separating the read and write functions onto different disks. For the disk dedicated to writing, software RAID under Linux can be used; for the disk dedicated to reading, an ordinary server hard disk is sufficient.

Memory:
4.5 Memory cache
Memory-level caching means that content which would otherwise be generated dynamically is cached temporarily in memory, so that identical requests within an acceptable latency window are not regenerated. On Linux, memcached is a good choice.
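A minimal usage sketch, assuming a memcached daemon running on localhost and the python-memcached client library; the key name, TTL, and load_from_db callback are invented for illustration:

    import memcache

    mc = memcache.Client(["127.0.0.1:11211"])

    def get_user_profile(user_id, load_from_db):
        """Serve the profile from memory when possible; otherwise generate
        it once and cache it for 60 seconds."""
        key = "profile:%s" % user_id
        profile = mc.get(key)
        if profile is None:
            profile = load_from_db(user_id)   # the expensive path
            mc.set(key, profile, time=60)
        return profile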

CPU:
4.6 Balancing CPU and I/O
In a server cluster, when the CPU and I/O utilization on some machines differ greatly, consider exchanging some CPU-intensive processes with I/O-intensive processes across the servers to bring them back into balance.

5 Application Layer Optimization
5.1 Website server program selection
Apache: the preferred web server in the open source world. It is powerful and reliable and suits most applications, but its power sometimes makes it cumbersome: the configuration files are complex and daunting, and its efficiency under high concurrency is not very high.
Lighttpd: a lightweight web server and a relative newcomer. Based on single-process I/O multiplexing, its static file serving capability is much higher than Apache's. Lighttpd also supports PHP well and supports other languages such as Python through FastCGI.
Putting Lighttpd in front of Squid forms the processing chain Lighttpd -> Squid -> Apache. Lighttpd handles static content; if Squid's cache contains the requested page and it has not expired, the page is returned directly to Lighttpd; Apache handles new or expired page requests. This architecture reduces the load on the web application and distributes the different stages of processing across multiple machines.

5.2 Database selection
MySQL is the first choice for website development under Linux.

5.3 Server-side script parser selection
Three server-side scripting languages are currently common: ASP (Active Server Pages), JSP (Java Server Pages), and PHP (Hypertext Preprocessor).
Under Linux there are many other options, such as Python (used by Google) and Perl. The advantage of these less common scripting languages is that for some special applications they have strengths the mainstream scripts lack; the disadvantage is that there is less documentation and reference material available.

5.4 Configurability
No matter what technology is used to develop a large website, making the site configurable is essential.
First, and most importantly, functionality must be separated from presentation.
Second, within the core functional scripts, server-related configuration (such as database connection settings and script header file paths) must be kept separate from the code.
Finally, changes to the configuration file should take effect in real time wherever possible, so that the service program does not have to be restarted after the configuration file is modified.
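One common way to make configuration changes take effect without a restart is to re-read the file whenever its modification time changes. A minimal sketch, with an invented file path and JSON chosen arbitrarily as the format:

    import json
    import os

    CONFIG_PATH = "app_config.json"   # illustrative path
    _cache = {"mtime": 0.0, "data": {}}

    def get_config():
        """Reload the configuration file only when it has changed on disk."""
        mtime = os.path.getmtime(CONFIG_PATH)
        if mtime != _cache["mtime"]:
            with open(CONFIG_PATH) as f:
                _cache["data"] = json.load(f)
            _cache["mtime"] = mtime
        return _cache["data"]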

5.5 Encapsulation and intermediate layers
At the level of functional blocks, JSP provides encapsulation through the features of the language itself; with PHP, each functional block must be explicitly wrapped in the script code as a function, a file, or a class.
At a higher level, a website can be divided into a presentation layer, a logic layer, and a persistence layer, each encapsulated separately.
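A minimal sketch of the three layers in Python; the class and function names are invented for illustration and the storage is just an in-memory dictionary:

    # Persistence layer: only knows how to store and fetch data.
    class UserStore:
        def __init__(self):
            self._users = {}
        def save(self, user_id, name):
            self._users[user_id] = name
        def load(self, user_id):
            return self._users.get(user_id)

    # Logic layer: business rules, no storage or display details.
    class UserService:
        def __init__(self, store):
            self.store = store
        def register(self, user_id, name):
            if not name:
                raise ValueError("name required")
            self.store.save(user_id, name)
        def profile(self, user_id):
            return self.store.load(user_id)

    # Presentation layer: only formats output for the user.
    def render_profile(service, user_id):
        name = service.profile(user_id) or "unknown user"
        return "<h1>%s</h1>" % name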

6. Scaling and fault tolerance
6.1 Scaling
File system scaling
Database system scaling

6.2 Fault Tolerance
For large-area network outages: the website must keep data in the regions where its main users are located.
For server failures: redundant design is generally used to guard against them.
For storage servers (mainly the servers responsible for writes): RAID (redundant array of disks) can be used; for databases (mainly the master databases responsible for writes), a dual-master setup can be used.
For the front end that serves requests: a cluster behind a layer-4 switch can be used; multiple servers provide service at the same time, which not only spreads the traffic load but also lets the servers act as backups for one another.
In the application-layer program: user-friendly error handling should be designed.

7 Summary and prospects
7.1 Conclusion
For a high-concurrency, high-traffic website, a bottleneck at any level will cause website performance to decline.
1) At the Internet level, distributed design should be used to shorten the network distance between the website and its users and to keep the site reachable in the event of network failures.
2) At the LAN level, server clusters should be used to support larger traffic volumes and to provide redundant backup.
3) At the single-server level, the operating system, file system, and application-layer software should be configured to balance the consumption of the various resources and eliminate system performance bottlenecks.
4) At the application layer, various caches can be used to improve program efficiency and reduce server resource consumption.
5) In addition, the application-layer program should be properly designed to prepare for future changes in demand and for scaling.
At every layer, fault tolerance must be considered and single points of failure eliminated, so that the website service is not interrupted by application-layer program errors, server software errors, server hardware failures, or network failures.

7.2 Outlook
Under Linux there is the well-known LAMP (Linux + Apache + MySQL + PHP/Perl/Python) stack for building websites, but it suits only small and medium sites. For large commercial websites with high concurrency and high traffic, there is still no complete, cost-effective solution: besides hardware investment in servers, hard disks, and bandwidth, a great deal of budget and time must be spent on the software side.
As the Internet continues to develop, Web 2.0 is bound to produce new, efficient, low-cost solutions. Such a solution should include transparent third-party CDN acceleration services; inexpensive layer-4 (or higher) switching devices; operating systems with optimized network performance; distributed, highly reliable file systems with optimized read/write performance; HTTP servers integrated with memory, hard disk, and other levels of cache; more efficient server-side script interpreters; and application-layer design frameworks that encapsulate most of the details.
