Website Performance Optimization: A Practical Introduction to Load Balancing, Server Architecture, and Database Scaling


This article looks at website performance optimization from the perspectives of load balancing, server architecture, and database scaling. It offers some optimization suggestions for readers who need to build medium-sized and large websites.
1. Separating the Web Server and DB Server
For small websites or B/S (browser/server) projects, there are not many concurrent online users, so the same physical host can serve as both the web server and the DB server. However, both consume a large amount of CPU, memory, and disk I/O. It is therefore recommended to run them on separate hosts, which spreads the load and increases capacity. In addition, if the two hosts are on the same subnet (CIDR block), connect them through private intranet IP addresses rather than public IP addresses or host names.
Almost any software or hardware that serves many users at once consumes CPU. For databases, however, CPU is not necessarily the heaviest cost; they use considerably more memory and disk I/O than web servers do. The general recommendation is therefore that web servers can be ordinary PCs with reasonably good CPUs. The DB server should not be skimped on. Buy a high-end server with a RAID 5 or RAID 6 disk array (hardware RAID performs far better than operating-system or software RAID) and more than 4 GB of memory. Ideally, use 64-bit versions of both the operating system and the database, for example 64-bit SQL Server on 64-bit Windows Server, so that memory can be configured up to 64 GB. Note, though, that on older PCs some peripheral hardware drivers may not support 64-bit operating systems and software.
If the number of online users keeps growing, you can move to a larger distributed deployment. This can include multiple web servers and DB servers, web server clusters, high-availability (HA) clusters, and database clusters.
Further reading on deployment plans:
• Three-Tiered Distribution (the hardware/physical layer, split across different hosts)
• Three-Layered Services Application (the software and code layer)
• Tiered Distribution: http://msdn.microsoft.com/en-gb/library/ms978701(En-CN).aspx
• Deployment Patterns
2. Load Balancing
Load balancing has been evolving for many years, and there are many vendors and products to choose from. Broadly, they fall into "software" and "hardware" solutions:
(1) Hardware:
The hardware solution is known as a Layer 4 switch. It distributes traffic to suitable application (AP) servers for processing. Well-known products include Alteon and F5. These hardware products cost much more than software solutions. In return, they offer far better performance and a convenient, easy-to-manage UI that lets administrators configure them quickly. Reportedly, when Yahoo China had close to 2,000 servers, it used only three Alteon switches.
(2) Software:
Apache is a well-known HTTP server. Its proxy/reverse proxy feature can also provide HTTP load balancing, although its efficiency is not especially good. HAProxy, on the other hand, is dedicated purely to load balancing and has a simple caching function.
Several operating systems have built-in load balancing. On UNIX, Sun Solaris provides support. Linux commonly uses LVS (Linux Virtual Server). Microsoft Windows Server 2003/2008 has NLB (Network Load Balancing).
LVS uses IPVS (IP Virtual Server), a Layer 4 load balancer in the Linux kernel that is administered with the ipvsadm tool. It can balance load for TCP/IP services. Because it runs in the Linux kernel, it is quite efficient and uses little CPU. The drawback is that it cannot inspect packet data above Layer 4.
As for Windows Server NLB, the principle is that however many servers there are, they all share one "cluster IP". Take Figure 1 as an example. The Active Load Balancer, the load-balanced Virtual Server 1 (web server), and the virtual DB server are assigned according to the load-balancing type (active/active, active/standby, ...). From outside, users see only a single IP address. Users do not need to know how many servers sit behind it (similar to the concepts of clustering and cloud computing).

Figure 1: Distributed users vs. server farm (web server farm)

Figure 2: The red arrows show the failover (HA) architecture, which serves a different purpose than load balancing.
In Figure 2, four real web servers and three real DB servers form web and DB server clusters. Note that Virtual Server 1 at the top holds no data or services of its own (such as .NET code, images, and so on). Its only function is to redirect users' connection requests to the four real servers below. Distributing service load across real servers in this way is called load balancing.


At present, there does not seem to be a common way to automatically measure a host's "load" and use it to decide which real server should receive a request (for example, by its CPU usage percentage). Round-robin is generally used instead, sometimes with weights added.
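To make weighted round-robin concrete, here is a minimal Python sketch of one common variant, smooth weighted round-robin. The backend names and weights are hypothetical, and real load balancers implement scheduling internally. The sketch only shows how a server with weight 2 receives twice as many requests as its peers, spread through the cycle rather than sent in one burst.

```python
from dataclasses import dataclass

@dataclass
class Backend:
    name: str
    weight: int
    current: int = 0  # running score used by the smooth algorithm

class WeightedRoundRobin:
    """Smooth weighted round-robin: on each pick every backend gains its weight,
    the highest scorer wins, and the winner pays back the total weight."""

    def __init__(self, backends):
        self.backends = backends
        self.total = sum(b.weight for b in backends)

    def pick(self) -> str:
        for b in self.backends:
            b.current += b.weight
        best = max(self.backends, key=lambda b: b.current)  # first max wins ties
        best.current -= self.total
        return best.name

# Hypothetical pool: web1 is assumed to have twice the capacity of web2 and web3.
lb = WeightedRoundRobin([Backend("web1", 2), Backend("web2", 1), Backend("web3", 1)])
sequence = [lb.pick() for _ in range(8)]
print(sequence)
```

With weights 2:1:1, eight picks send four requests to web1 and two each to web2 and web3.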
If you run an ASP.NET application behind a load balancer, pay attention to which web server's memory holds the session state. Otherwise, a user may spend a long time filling in a form, only to find on submission that Windows Server NLB has switched them to another web server host, and their session is gone. In this case, consider storing session state in SQL Server.
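Here is a minimal sketch of why an out-of-process session store survives a server switch. It uses Python's in-memory sqlite3 database as a stand-in for the SQL Server session-state database. That substitution is an assumption for illustration only; ASP.NET's SQL Server session mode uses its own schema. The WebServer class is hypothetical.

```python
import json
import sqlite3

class SqlSessionStore:
    """Session state kept in a shared database instead of one server's memory.
    sqlite3 stands in for the SQL Server session database (an assumption)."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS sessions (id TEXT PRIMARY KEY, data TEXT)")

    def load(self, session_id: str) -> dict:
        row = self.conn.execute(
            "SELECT data FROM sessions WHERE id = ?", (session_id,)).fetchone()
        return json.loads(row[0]) if row else {}

    def save(self, session_id: str, data: dict) -> None:
        self.conn.execute(
            "INSERT OR REPLACE INTO sessions (id, data) VALUES (?, ?)",
            (session_id, json.dumps(data)))
        self.conn.commit()

class WebServer:
    """A hypothetical web node; it keeps no session data in its own memory."""

    def __init__(self, name: str, store: SqlSessionStore):
        self.name, self.store = name, store

    def handle(self, session_id: str, field: str, value: str) -> dict:
        session = self.store.load(session_id)
        session[field] = value
        self.store.save(session_id, session)
        return session

store = SqlSessionStore(sqlite3.connect(":memory:"))
web1, web2 = WebServer("web1", store), WebServer("web2", store)
web1.handle("abc123", "step1", "name filled in")
# The load balancer moves the user to web2 before submission; the state survives.
final = web2.handle("abc123", "step2", "submitted")
print(final)
```

The trade-off is an extra database round trip per request, in exchange for letting any web server handle any request.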
How much does software load balancing improve performance? With software, performance generally does not scale as 1 + 1 = 2. It does usually improve availability, that is, HA (a high-availability cluster) in failover mode. For example, Virtual Server 2 at the top of Figure 2 can automatically take over Virtual Server 1's IP address when Virtual Server 1 goes down or cannot provide service. HA is therefore about providing "uninterrupted service". The load balancing discussed in this post is about providing "service that can withstand high load". The two are not the same thing. IT (MIS) staff should decide whether both are needed based on the company's hardware resources, costs, and budget.
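The failover behavior described above (a standby node taking over the shared IP when the active node stops responding) can be sketched as a heartbeat timeout. This is a minimal, single-process illustration; the IP address, node names, and timeout are hypothetical. Time is passed in explicitly so the example is deterministic.

```python
class FailoverPair:
    """Active/standby pair sharing one virtual IP. The standby takes over the
    IP when the active node has missed heartbeats for longer than `timeout`."""

    def __init__(self, virtual_ip: str, active: str, standby: str, timeout: float):
        self.virtual_ip = virtual_ip
        self.owner, self.standby = active, standby
        self.timeout = timeout
        self.last_heartbeat = 0.0

    def heartbeat(self, node: str, now: float) -> None:
        # Only the current owner's heartbeats keep its lease alive.
        if node == self.owner:
            self.last_heartbeat = now

    def check(self, now: float) -> str:
        if now - self.last_heartbeat > self.timeout:
            # Active node looks dead: swap roles so the standby owns the IP.
            self.owner, self.standby = self.standby, self.owner
            self.last_heartbeat = now
        return self.owner

pair = FailoverPair("10.0.0.100", active="vs1", standby="vs2", timeout=3.0)
pair.heartbeat("vs1", now=1.0)
owner_at_2 = pair.check(now=2.0)  # vs1 is still healthy
owner_at_5 = pair.check(now=5.0)  # 4 s without a heartbeat: vs2 takes over
```

This adds no capacity, only continuity, which is the distinction the paragraph draws between HA and load balancing. A production setup must also stop both nodes from claiming the IP at once (split-brain), which this sketch ignores.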

Further reading:
• Load-Balanced Cluster
• Server Clustering: http://msdn.microsoft.com/en-gb/library/ms998414(En-CN).aspx
• Installing Network Load Balancing (NLB) on Windows Server 2008
• Linux Load Balancing Support & Consulting
• Load Balancing (WCF; not directly related to this article)
3. Layering Presentation and Business Logic
Large websites split the front-end presentation (HTML, script), the back-end business logic, and database access (.NET/C#, SQL) into multiple layers. This supports future scalability and makes the source code easier to maintain.
According to Martin Fowler in Patterns of Enterprise Application Architecture (P of EAA), a layer is a logical separation and a tier is a physical separation. If your ASP.NET application uses "virtual" layers (n-layer) to separate the UI, BLL, and DAL, there is usually no performance problem. If "physical" tiers (n-tier) are used, as in Figures 2 and 3, each application (AP) server may handle different business logic (sales, inventory, logistics, manufacturing, accounting, ...). The source code for each is deployed on a different physical host and can run independently. In that case, we must consider three costs: coordinating the AP servers, the relatively poor performance of calling XML web services, and executing distributed transactions.
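The layer/tier distinction can be illustrated with the same UI-BLL-DAL code run in two ways. In the sketch below, the classes and data are hypothetical, and RemoteProxy only simulates a tier boundary by serializing every call's arguments and results to JSON, as a web service would. The business result is identical either way. The physical split adds serialization work (and, in real life, network latency) to every call.

```python
import json

class OrderDal:
    """Data access layer (hypothetical); ignores order_id for brevity."""
    def get_lines(self, order_id: int) -> list:
        return [{"sku": "A1", "qty": 2, "price": 9.5},
                {"sku": "B2", "qty": 1, "price": 20.0}]

class OrderBll:
    """Business logic layer: computes an order total from DAL rows."""
    def __init__(self, dal):
        self.dal = dal

    def total(self, order_id: int) -> float:
        return sum(line["qty"] * line["price"] for line in self.dal.get_lines(order_id))

class RemoteProxy:
    """Simulates a physical tier boundary: each call's arguments and result are
    serialized and deserialized, and the payload size is tallied."""
    def __init__(self, target):
        self.target = target
        self.bytes_on_wire = 0

    def __getattr__(self, name):
        method = getattr(self.target, name)
        def call(*args):
            request = json.dumps({"method": name, "args": args})
            result = method(*json.loads(request)["args"])
            response = json.dumps(result)
            self.bytes_on_wire += len(request) + len(response)
            return json.loads(response)
        return call

in_process = OrderBll(OrderDal()).total(1)     # n-layer: plain method calls
proxy = RemoteProxy(OrderDal())
across_tiers = OrderBll(proxy).total(1)        # n-tier: same logic, serialized calls
print(in_process, across_tiers, proxy.bytes_on_wire)
```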

Figure 3: "Physical" tiering. Different business logic may reside on multiple physical hosts.
Speaking of distributed transactions: Microsoft's Enterprise Services, COM+, WCF, and WF use the operating system's MS DTC to coordinate transactions. MS DTC runs in a different process from these applications, so their communication requires serialization and deserialization. MS DTC must also coordinate the transactions and resources of all the AppDomains involved on different hosts. Performance inevitably suffers.
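MS DTC coordinates such transactions with the two-phase commit protocol. The toy coordinator below is not MS DTC. It runs in-process with hypothetical participant names and has no logging, timeouts, or recovery. It shows the shape of the protocol and why every distributed transaction costs at least two rounds of messages to every participating host.

```python
class Participant:
    """A resource manager on some host (hypothetical); it votes in phase 1."""

    def __init__(self, name: str, can_commit: bool = True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self) -> bool:
        # Phase 1: a real resource manager would durably log its vote here.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "rolled back"


def two_phase_commit(participants):
    """Return (committed, message_count). Phase 1 asks every participant to
    prepare; phase 2 commits only if all voted yes, otherwise rolls all back."""
    messages = 0
    votes = []
    for p in participants:          # phase 1: one round trip per host
        votes.append(p.prepare())
        messages += 1
    decision = all(votes)
    for p in participants:          # phase 2: another round trip per host
        if decision:
            p.commit()
        else:
            p.rollback()
        messages += 1
    return decision, messages


good = [Participant("sales-db"), Participant("inventory-db"), Participant("accounting-db")]
ok, msgs = two_phase_commit(good)
bad = [Participant("sales-db"), Participant("inventory-db", can_commit=False)]
ok2, _ = two_phase_commit(bad)
```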
Further reading: Web Applications: N-Tier vs. N-Layer
4. Splitting Large Data Tables
Large or historical data tables can be split according to some logic. If the daily data volume is very large, you can store each day's data separately and use a "summary table" to record summary values for that day. You can also split a large table into several tables and then use an "index table" to relate them. This avoids the performance problems of querying one huge table [1].
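Here is a minimal sketch of the "one table per day plus a summary table" idea, using sqlite3 so it runs anywhere. The table names, columns, and the orders_YYYYMMDD naming scheme are hypothetical. The index-table variant mentioned above is not shown.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_summary (day TEXT PRIMARY KEY, order_count INTEGER, revenue REAL)")

def table_for(day: date) -> str:
    # One physical table per day, e.g. orders_20090315 (hypothetical naming).
    # Table names come from this function only, never from user input.
    return f"orders_{day:%Y%m%d}"

def insert_order(day: date, amount: float) -> None:
    name = table_for(day)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {name} (id INTEGER PRIMARY KEY, amount REAL)")
    conn.execute(f"INSERT INTO {name} (amount) VALUES (?)", (amount,))

def close_day(day: date) -> None:
    """Roll one day's detail rows into a single summary row."""
    name = table_for(day)
    count, revenue = conn.execute(f"SELECT COUNT(*), SUM(amount) FROM {name}").fetchone()
    conn.execute("INSERT OR REPLACE INTO daily_summary VALUES (?, ?, ?)",
                 (day.isoformat(), count, revenue))
    conn.commit()

d = date(2009, 3, 15)
for amount in (10.0, 20.0, 30.5):
    insert_order(d, amount)
close_day(d)
summary = conn.execute("SELECT * FROM daily_summary WHERE day = ?", (d.isoformat(),)).fetchone()
```

Reports over many days can then read the small summary table instead of scanning every detail row.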
You can also use table partitioning to store data in different files placed on separate physical disks. This increases I/O throughput and improves read/write performance.
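Table partitioning routes each row to a partition by comparing a key against sorted boundary values. The Python sketch below mirrors the RANGE LEFT / RANGE RIGHT semantics of SQL Server partition functions and returns 1-based partition numbers. The yearly boundaries and filegroup names are hypothetical, and in a real database the engine does this routing itself.

```python
from bisect import bisect_left, bisect_right
from datetime import date

# Hypothetical yearly boundaries: N boundary values define N + 1 partitions.
BOUNDARIES = [date(2008, 1, 1), date(2009, 1, 1), date(2010, 1, 1)]
# Hypothetical filegroups, each of which could sit on its own physical disk.
FILEGROUPS = ["fg_archive", "fg_2008", "fg_2009", "fg_current"]

def partition_number(value, boundaries=BOUNDARIES, range_right=True) -> int:
    """1-based partition number for `value`. With RANGE RIGHT a boundary value
    belongs to the partition on its right; with RANGE LEFT, to the one on its left."""
    find = bisect_right if range_right else bisect_left
    return find(boundaries, value) + 1

def filegroup_for(value, range_right=True) -> str:
    return FILEGROUPS[partition_number(value, range_right=range_right) - 1]
```

For example, 2009-01-01 lands in fg_2009 under RANGE RIGHT but in fg_2008 under RANGE LEFT.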
I also noted in "30-Minute Happy Learning: SQL Performance Tuning", an earlier article in this series, that a table with too many columns should be split vertically (this is different from the row-count problem above). Split it into two or more tables linked by a primary key column with the same name, in a one-to-one or one-to-many relationship, like the Orders and Order Details tables in Northwind. This avoids loading too much data when a clustered index scan reads the table. It also avoids transactions locking each other for too long when data is modified.
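Here is a minimal sketch of a one-to-one vertical split, again with sqlite3 as a stand-in. The product and product_detail tables and their sample rows are hypothetical. Frequently read, narrow columns stay in one table. Wide, rarely read columns move to a second table that shares the same primary key and is joined in only when needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT, price REAL);
CREATE TABLE product_detail (
    product_id INTEGER PRIMARY KEY REFERENCES product(product_id),
    description TEXT,
    spec_sheet TEXT
);
""")
conn.execute("INSERT INTO product VALUES (1, 'Chai', 18.0)")
conn.execute("INSERT INTO product_detail VALUES (1, 'Long description...', 'Large spec text...')")

# Hot path (list pages): scans touch only the narrow table.
listing = conn.execute("SELECT product_id, name, price FROM product").fetchall()

# Detail page: join the wide columns back on the shared key only when needed.
detail = conn.execute("""
    SELECT p.name, d.description
    FROM product p JOIN product_detail d ON d.product_id = p.product_id
    WHERE p.product_id = ?""", (1,)).fetchone()
```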

Article to learn it Network: http://www.xueit.com/asp.net/show-4592-2.aspx


5. Separating Image Servers
For web servers, image requests consume the most system resources. Depending on the size and characteristics of your website, you can deploy a dedicated image server, or even several.
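When there are several image servers, page code needs a rule for which host to put in each image URL. One simple option, sketched below with hypothetical host names, hashes the image path so a given image is always served by the same host. Note that simple modulo hashing remaps most paths whenever a host is added or removed.

```python
import hashlib

# Hypothetical image hosts; a real site would read these from configuration.
IMAGE_HOSTS = ["img1.example.com", "img2.example.com", "img3.example.com"]

def image_url(path: str, hosts=IMAGE_HOSTS) -> str:
    """Pick an image server from a stable hash of the path, so a given image is
    always served by the same host and stays warm in that host's cache."""
    path = path.lstrip("/")
    digest = hashlib.md5(path.encode("utf-8")).digest()  # stable across processes
    index = int.from_bytes(digest[:4], "big") % len(hosts)
    return f"http://{hosts[index]}/{path}"

print(image_url("/products/1.jpg"))
```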
6. Read/Write Splitting
Performing reads and writes on the same database at the same time is very inefficient. A better practice, based on read and write load and requirements, is to set up two database servers with identical structures. One handles writes, and its data is regularly replicated to the other, which handles reads.
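On the application side, read/write splitting needs a router that sends each statement to the right server. The sketch below uses a naive first-keyword check and round-robin over read replicas. The server names are hypothetical, and replication itself is not shown. Because replicas are copied periodically, they can lag. For that reason, reads inside a transaction are kept on the write server here.

```python
import itertools

class ReadWriteRouter:
    """Send writes to the primary and spread reads across replicas."""

    WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "MERGE", "CREATE", "ALTER", "DROP"}

    def __init__(self, primary: str, replicas: list):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)

    def route(self, sql: str, in_transaction: bool = False) -> str:
        verb = sql.lstrip().split(None, 1)[0].upper()  # naive: first keyword only
        # Reads inside a transaction stay on the primary to see their own writes.
        if verb in self.WRITE_VERBS or in_transaction:
            return self.primary
        return next(self._replicas)

router = ReadWriteRouter("db-write", ["db-read1", "db-read2"])
targets = [router.route(q) for q in (
    "SELECT * FROM orders",
    "update orders SET status = 'paid' WHERE id = 7",
    "SELECT * FROM products",
)]
```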
7. Scalability for Burst Traffic
When designing the architecture, large websites must plan for future capacity expansion [1]. For mobile websites, sudden traffic spikes can be huge. On the website's primary storage server, a configuration file specifies the range of data file IDs stored on each storage disk. When a front-end server needs to read a data file, it first calls an interface on the primary storage server to get the cabinet and directory where the data lives. It then reads the actual file. To add a cabinet, you only need to modify the configuration file; the front-end programs are not affected.
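The configuration-driven lookup described above can be sketched as a sorted list of ID ranges. The configuration entries and directory layout below are hypothetical, since the article does not give the real format. Front-end code only calls locate(), so adding a cabinet changes data, not code.

```python
from bisect import bisect_right

# Hypothetical contents of the configuration file on the primary storage
# server: (first file ID in range, cabinet/disk directory holding that range).
STORAGE_CONFIG = [
    (0, "cabinet01/disk1"),
    (1_000_000, "cabinet01/disk2"),
    (2_000_000, "cabinet02/disk1"),
]

class StorageDirectory:
    """The lookup interface on the primary storage server: front-end code asks
    where a file lives instead of hard-coding disk locations."""

    def __init__(self, config):
        self.config = sorted(config)
        self.starts = [start for start, _ in self.config]

    def locate(self, file_id: int) -> str:
        i = bisect_right(self.starts, file_id) - 1  # last range starting <= file_id
        if i < 0:
            raise KeyError(f"no storage range covers file ID {file_id}")
        return f"{self.config[i][1]}/{file_id}"

directory = StorageDirectory(STORAGE_CONFIG)
# Adding a cabinet is only a configuration change; callers of locate() are untouched.
expanded = StorageDirectory(STORAGE_CONFIG + [(3_000_000, "cabinet03/disk1")])
```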
8. Cache
A cache is a temporary in-memory container for data or objects. A cache can greatly reduce database reads by serving data from memory. For example, you can add a "data cache layer" between the web servers and the DB servers that keeps in-memory copies of frequently requested objects. Data can then be served without accessing the database. Suppose 100 users request the same data. Without a cache, the database is queried 100 times; with a cache, only one query is needed and the rest are served from cached data. Read speed and page response times also improve greatly.
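The "100 requests, one query" arithmetic above is the cache-aside pattern: check memory first and query the database only on a miss. Here is a minimal sketch with a hypothetical fake database that counts the queries reaching it. Expiration and invalidation are deliberately left out.

```python
class FakeDatabase:
    """Stand-in for the DB server; counts how many queries reach it."""
    def __init__(self):
        self.queries = 0

    def query(self, key: str) -> str:
        self.queries += 1
        return f"row for {key}"

class DataCacheLayer:
    """Cache-aside: check memory first, query the database only on a miss."""
    def __init__(self, db: FakeDatabase):
        self.db = db
        self._store = {}

    def get(self, key: str) -> str:
        if key not in self._store:
            self._store[key] = self.db.query(key)
        return self._store[key]

db = FakeDatabase()
cache = DataCacheLayer(db)
results = [cache.get("product:42") for _ in range(100)]
print(db.queries)  # 1
```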
Caching products can be divided into hardware and software caches. Examples include:
• ASP.NET's built-in caching
• third-party cache suites
• the Session and SessionFactory caching mechanisms in Hibernate and NHibernate
• Oracle's cache group technology
• Velocity, Microsoft's new official distributed cache technology, which I introduced earlier in "Using IIS 7, ARR, and Velocity to Build High-Performance Large Websites"
A proxy server can also act as a web page cache:
Client <----> proxy server <----> destination Server

The .NET class library provides CacheDependency and AggregateCacheDependency. They associate .NET cache objects (such as a DataSet) with one or more physical files (such as XML files) or database tables. When any of those XML files is modified or removed, the associated DataSet is also removed from memory. Cached items can, of course, also be set to expire automatically at a time specified in your program.
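File-based cache dependencies work by remembering the file's state when the item is cached and evicting the item once that state changes. The Python sketch below is loosely modeled on that idea; it is not the .NET API. It records the file's modification time in nanoseconds and treats a changed or deleted file as "dependency fired". The file name and cached data are hypothetical.

```python
import os
import tempfile

class FileDependentCache:
    """Cache entries tied to a file: if the file changes or disappears,
    the entry is dropped on the next read."""

    def __init__(self):
        self._items = {}  # key -> (value, path, mtime_ns at insert time)

    def insert(self, key, value, path: str) -> None:
        self._items[key] = (value, path, os.stat(path).st_mtime_ns)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, path, stamp = entry
        try:
            changed = os.stat(path).st_mtime_ns != stamp
        except FileNotFoundError:
            changed = True
        if changed:
            del self._items[key]  # dependency fired: evict the stale entry
            return None
        return value

with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False) as f:
    f.write("<products/>")
cache = FileDependentCache()
cache.insert("products", ["chai", "chang"], f.name)
before = cache.get("products")
stamp = os.stat(f.name).st_mtime_ns + 5_000_000_000
os.utime(f.name, ns=(stamp, stamp))  # simulate someone editing the XML file
after = cache.get("products")
os.remove(f.name)
```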
The biggest change to caching since ASP.NET 2.0 is that Microsoft rewrote the CacheDependency class so that it can be inherited (it was sealed in 1.x). By deriving a custom class from it, we can:
• Invalidate the cache on a request from Active Directory (the cached item is removed automatically)
• Invalidate the cache on a request from MSMQ or MQSeries
• Invalidate the cache on a request from a web service
• Create a CacheDependency for Oracle
• And more
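The list above describes pluggable invalidation sources. In ASP.NET 2.0, a derived CacheDependency signals invalidation through the protected NotifyDependencyChanged method. The Python sketch below imitates only the design idea, not that API. A base class defines "has this changed?", and a hypothetical MessageDependency is fired by whatever listener (a queue reader, a web service call) learns that the data is stale.

```python
from abc import ABC, abstractmethod

class Dependency(ABC):
    """Base class for cache invalidation sources (an analogy to deriving from
    CacheDependency; this is not the .NET API)."""
    @abstractmethod
    def has_changed(self) -> bool: ...

class MessageDependency(Dependency):
    """Hypothetical: invalidated when a listener, such as a queue reader or a
    web service handler, calls notify()."""
    def __init__(self):
        self._fired = False

    def notify(self) -> None:
        self._fired = True

    def has_changed(self) -> bool:
        return self._fired

class DependencyCache:
    def __init__(self):
        self._items = {}

    def insert(self, key, value, dependency: Dependency) -> None:
        self._items[key] = (value, dependency)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, dep = entry
        if dep.has_changed():
            del self._items[key]  # evict once the external source says so
            return None
        return value

dep = MessageDependency()
cache = DependencyCache()
cache.insert("price-list", {"chai": 18.0}, dep)
hit = cache.get("price-list")
dep.notify()  # e.g. a queue message reports that prices changed
miss = cache.get("price-list")
```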

In addition, SQL Server supports SqlCacheDependency, which monitors whether the data in a table has changed so that stale data is not served from the cache. As long as the data does not change, users keep reading it from the cache. Once it changes, the cached item is invalidated so that fresh data is loaded. To enable SqlCacheDependency, you run the aspnet_regsql.exe tool with the appropriate commands. This creates a new AspNet_SqlCacheTablesForChangeNotification table in SQL Server. As shown in Figure 4, each row of this table represents one table you want to monitor. The system uses the value in the rightmost changeId column to decide whether a .NET request can be served from the in-memory cache or must re-query the database.
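The changeId mechanism can be modeled as follows. Triggers bump a per-table counter on every write. The cache remembers the counter value it loaded with and reloads only when a cheap poll shows the counter has moved. This sqlite3 sketch uses simplified, hypothetical table and column names rather than the real AspNet_SqlCacheTablesForChangeNotification schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE change_notification (table_name TEXT PRIMARY KEY, change_id INTEGER NOT NULL);
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO change_notification VALUES ('products', 0);
-- Any write to the monitored table bumps its change_id.
CREATE TRIGGER products_ins AFTER INSERT ON products BEGIN
    UPDATE change_notification SET change_id = change_id + 1 WHERE table_name = 'products';
END;
CREATE TRIGGER products_upd AFTER UPDATE ON products BEGIN
    UPDATE change_notification SET change_id = change_id + 1 WHERE table_name = 'products';
END;
CREATE TRIGGER products_del AFTER DELETE ON products BEGIN
    UPDATE change_notification SET change_id = change_id + 1 WHERE table_name = 'products';
END;
""")

class PolledTableCache:
    """Caches a table's rows with the change_id seen at load time; each read
    polls the small notification row and reloads only if the id moved."""

    def __init__(self, conn, table: str):
        self.conn, self.table = conn, table  # table name is trusted, not user input
        self.rows, self.seen_change_id, self.reloads = None, None, 0

    def current_change_id(self) -> int:
        return self.conn.execute(
            "SELECT change_id FROM change_notification WHERE table_name = ?",
            (self.table,)).fetchone()[0]

    def get_rows(self):
        change_id = self.current_change_id()
        if change_id != self.seen_change_id:
            self.rows = self.conn.execute(
                f"SELECT id, name FROM {self.table} ORDER BY id").fetchall()
            self.seen_change_id = change_id
            self.reloads += 1
        return self.rows

conn.execute("INSERT INTO products (name) VALUES ('Chai')")
cache = PolledTableCache(conn, "products")
first = cache.get_rows()
second = cache.get_rows()  # change_id unchanged: served from memory
conn.execute("INSERT INTO products (name) VALUES ('Chang')")
third = cache.get_rows()   # change_id moved: reloaded
```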

Figure 4: Tables to be monitored are added automatically after SqlCacheDependency is enabled

In addition, my article "What should I do if my website performance is getting worse?" also mentions the following:
(4) Using programs or software for caching
Use application code for caching, such as ASP.NET's built-in cache mechanism (available since the 1.x era), or third-party helper software and frameworks.

(5) Using hardware for caching or buffering, and adding AP servers
He added a group of application servers to the original web server and database server architecture to act as the cache data source for the web servers.
After the revision, the new website's search speed improved greatly. Previously, daily statistics showed more than 500,000 queries taking longer than 3 seconds. After the revision, fewer than 10 queries per week take longer than 3 seconds.
(6) Caching with hardware
At its peak, the U.S.-hosted blog received 800,000 hits per day. That number is not very high and would be a piece of cake for an experienced programmer, but I am only a half-baked engineer. With my limited knowledge, the program may not have been well written. The hosting provider frequently warned me and demanded that I improve the site's performance. In the end, I decided to develop a caching system. After it went online, database reads and writes dropped from 800,000 to 160,000 per day.
(Peter Z. Lu)
There are many middleware options:
NCache, Coherence, Velocity, memcached, ...

There are also distributed cache systems such as memcached and Cacheman. The former runs on both Linux and Win32 platforms. It maintains a huge hash table in memory, can store images, videos, files, and database query results, and supports multiple servers. ASP.NET's built-in cache, by contrast, only works within a single server. The latter is reportedly the work of Sriram Krishnan, a member of Microsoft's Popfly team, and may become an official Microsoft product in the future.
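In memcached, spreading one cache over several servers happens on the client side: the client hashes each key to choose a server. One common scheme for this is consistent hashing, sketched below in Python. The server names, the 100 virtual nodes per server, and the use of MD5 are assumptions for illustration, not any particular client's implementation. The payoff is that adding a server moves only the keys that now fall in its segment of the ring.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Maps cache keys to servers; adding or removing a server only remaps
    the keys in its part of the ring."""

    def __init__(self, servers, replicas: int = 100):
        self.replicas = replicas  # virtual nodes per server, to even out the spread
        self._ring = []           # sorted list of (hash, server)
        for s in servers:
            self.add(s)

    @staticmethod
    def _hash(value: str) -> int:
        return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")

    def add(self, server: str) -> None:
        for i in range(self.replicas):
            bisect.insort(self._ring, (self._hash(f"{server}#{i}"), server))

    def remove(self, server: str) -> None:
        self._ring = [(h, s) for h, s in self._ring if s != server]

    def server_for(self, key: str) -> str:
        # First ring point at or after the key's hash, wrapping around the end.
        i = bisect.bisect(self._ring, (self._hash(key), ""))
        return self._ring[i % len(self._ring)][1]

ring = ConsistentHashRing(["cache1", "cache2", "cache3"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.server_for(k) for k in keys}
ring.add("cache4")
moved = [k for k in keys if ring.server_for(k) != before[k]]
```

With four servers, roughly a quarter of the keys move, and all of them move to the new server. Simple modulo hashing would reshuffle most keys.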



9. Distributed Data Architecture: MySpace as an Example
A widely circulated article, "Viewing Distributed System Data Architecture Changes from MySpace's Databases", describes how the large community site MySpace uses Windows Server, SQL Server, and ASP.NET. Today it receives as many as 50 billion visits per month and has more than 200 million registered users. The following is a summary of that article only:
First-generation architecture: adding more web servers
When MySpace had 500,000 registered users, the site used only two Dell web servers with dual CPUs and 4 GB of memory each (to distribute user requests) and one DB server (storing all data).
Second-generation architecture: adding database servers
The site then ran on three database servers. One handled updates, which were replicated to the other two, and the other two served reads, since far more people view pages than write data. As users and traffic grew, more hard disks were added.
Later, database server I/O became a bottleneck, and the design moved to vertical partitioning. Different site functions, such as login, user profiles, and blogs, were moved to different database servers to share the load. Adding a new feature meant investing in a new database server.
When registered users reached 2 million, the site stopped attaching storage directly to the database servers. It switched to a SAN (storage area network), a high-bandwidth, purpose-built network that connects large numbers of disk storage devices, and MySpace connected its databases to it. However, when users grew to 3 million, the vertical partitioning strategy became hard to maintain. The architects then upgraded to an expensive server with 34 CPUs, but it still could not handle the load.
Third-generation architecture: moving to a distributed computing architecture
The architects moved MySpace to a distributed computing architecture. The servers are physically distributed, but logically the whole system must behave like a single machine. For the database, applications could no longer be split as before, with different databases supporting different parts of the site. Instead, the entire site had to be treated as one application, and the database was no longer split by site function. MySpace began partitioning users into groups of 1 million and storing all data for each group in a separate SQL Server instance. Later, each MySpace database server actually ran two SQL Server instances, so each server served about 2 million users.
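The third-generation user partitioning amounts to range sharding on user ID. The sketch below assumes sequential integer user IDs starting at 1, which the article does not state. It is not MySpace's actual code; it only shows how "1 million users per SQL Server instance, two instances per server" becomes a lookup.

```python
USERS_PER_INSTANCE = 1_000_000  # from the article: one group of 1M users per instance
INSTANCES_PER_SERVER = 2        # later, each server ran two SQL Server instances

def locate_user(user_id: int) -> tuple:
    """Return (server_number, instance_number), both 1-based, for a user ID.
    Assumes sequential IDs starting at 1 (an assumption, not from the article)."""
    if user_id < 1:
        raise ValueError("user IDs are assumed to start at 1")
    instance = (user_id - 1) // USERS_PER_INSTANCE  # 0-based global instance index
    server = instance // INSTANCES_PER_SERVER
    return server + 1, instance % INSTANCES_PER_SERVER + 1

print(locate_user(3_500_000))
```

A range scheme keeps each user's data together on one instance. Growth is handled by adding instances for new ID ranges rather than by re-splitting existing ones.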
Fourth-generation architecture: adding a data cache layer
When users reached 9 to 10 million, MySpace hit a storage bottleneck again. It later adopted a new SAN product, but the site's requirements had already exceeded the SAN's disk I/O capacity and maximum read/write speed.
When users reached 17 million, a data cache layer was added between the web servers and the database servers. Its only function is to keep in-memory copies of frequently requested data objects. Previously, every user query went to the database. Now, when a user requests data, the cache layer keeps a copy, and when other users request the same data, the database does not need to be queried. In this way, data can be served without accessing the database.
Fifth-generation architecture: moving to 64-bit operating systems and database software
When users reached 26 million, MySpace moved to SQL Server 2005, which was still in beta at the time but supported 64-bit processors. After upgrading to 64-bit SQL Server 2005 and Windows Server 2003, each MySpace server was equipped with 32 GB of memory, later increased to 64 GB.

Further reading: Viewing Distributed System Data Architecture Changes from MySpace's Databases
[1] Li Tianping, Liang Jian .NET: In-Depth Experience and Practical Essentials of .NET, Chapter 2.
[2] Zhu, Coming Out of the Software Workshop.
[3] Various books, online documents, and MSDN.
Author: wizardwu

