I have previously written three articles about performance optimization for websites and ASP.NET systems, covering SQL statements, database design, ASP.NET, and the IIS 7 suite. This post is the fourth in the series. It collects material from books and articles I have read and, from the point of view of load balancing, server architecture, and database scaling, offers some performance suggestions as a reference for readers who need to build large-scale websites.
The three posts I wrote earlier:
(1) 30 minutes to happily learn SQL performance tuning
http://www.cnblogs.com/WizardWu/archive/2008/10/27/1320055.html
(2) What to do when website performance keeps getting worse?
http://www.cnblogs.com/WizardWu/archive/2009/01/03/1367527.html
(3) Building large, high-performance websites with IIS 7, ARR and Velocity
http://www.cnblogs.com/WizardWu/archive/2009/05/16/1458108.html
-----------------------------------------------------
1. Separating the Web server from the DB server
A small website or B/S (browser/server) project has few concurrent users, so a single physical host can act as both the Web server and the DB server. However, both roles consume a lot of CPU, memory, and disk I/O, so it is better to run them on two separate hosts to spread the pressure and raise load capacity. In addition, if the two hosts are on the same network segment, they should talk to each other over private intranet IP addresses rather than public IP addresses or host names.
Broadly speaking, a Web application, whatever its software and hardware, consumes mostly CPU when it handles requests from many users at once. A database does not necessarily consume much CPU, but it uses far more memory and disk I/O than the Web server. It is therefore generally recommended that the Web server be an ordinary PC with a good CPU, while the DB server should not be skimped on: buy a higher-end server with a RAID 5 or RAID 6 disk array (hardware RAID performs far better than operating-system or software RAID) and more than 4 GB of memory. Ideally the operating system and the database are both 64-bit versions, for example 64-bit SQL Server on 64-bit Windows Server, so that much more memory can be configured than a 32-bit system can address. Remember, though, that on a very old PC some peripheral hardware drivers may not support a 64-bit operating system and software.
If the number of online users keeps growing, you can add more Web servers and DB servers and combine server clusters, load-balancing clusters, high-availability (HA) clusters, and database clusters to achieve a larger distributed deployment.
Deployment Plan:
http://msdn.microsoft.com/zh-cn/library/ms978676.aspx
Three-Tiered Distribution (hardware: physical tiers on different hosts):
http://msdn.microsoft.com/zh-cn/library/ms978694.aspx
Three-Layered Services Application (software: code layering):
http://msdn.microsoft.com/zh-cn/library/ms978689.aspx
Tiered Distribution:
http://msdn.microsoft.com/en-gb/library/ms978701(ZH-CN).aspx
Deployment Patterns:
http://msdn.microsoft.com/zh-cn/library/ms998478.aspx
http://msdn.microsoft.com/en-us/library/ms998478.aspx
-----------------------------------------------------
2. Load balancing (Load Balance)
Load balancing technology has been developing for many years, and there are many professional vendors and products to choose from. The solutions fall broadly into "software" and "hardware" categories:
(1) Hardware:
The hardware solution is a Layer 4 switch, which distributes traffic to the appropriate AP server for processing. Well-known products include Alteon and F5. These products cost far more than software solutions, but you get what you pay for: they usually perform better than software and offer a convenient, easy-to-manage UI that lets administrators configure them quickly. [1] Yahoo China is said to have served nearly 2,000 servers with only three Alteon units.
(2) Software:
Apache is a well-known HTTP server whose proxy/reverse proxy feature can also provide HTTP load balancing, although its efficiency is not especially good. HAProxy, by contrast, is dedicated to load balancing and has a simple caching function.
Some Unix systems, such as Sun's Solaris, have load balancing built into the operating system. Linux commonly uses LVS (Linux Virtual Server), and Microsoft Windows Server 2003/2008 has NLB (Network Load Balancing).
LVS uses IPVS, an IP-level load balancer managed with the ipvsadm tool, so any protocol carried over TCP/IP can be load balanced. Because it is supported by the Linux kernel, it is quite efficient and uses little CPU. Its drawback is that it cannot inspect network packet data above Layer 4.
As for Windows Server NLB, the principle is that no matter how many servers there are, they all share one "cluster IP". Examples are the Active Load Balancer in Figure 1, and Virtual Server 1 (Web server) and the Virtual DB Server in Figure 2, configured according to the type of load balancing (active/active, active/standby, ...). From outside, users see only a single IP; how many servers sit behind it is something they do not need to know (similar to the concepts of clustering and cloud computing).
Figure 1 Distributed users vs server farms (Web server farm)
Figure 2 The red arrow shows the failover architecture (HA), whose function differs from load balancing
In Figure 2 there are four real Web servers and three real DB servers, forming a Web server cluster and a DB server cluster. Virtual Server 1 at the top holds no data or services itself (for example .NET code, images, and so on). Its only function is to redirect users' connection requests to the four real servers below it. Using a director in this way to distribute the service load across the real servers is called load balancing.
There seems to be no way to measure each host's "load" automatically, for example how much CPU is in use, in order to decide which real server should receive a request. At present requests are generally distributed by rotation (round-robin), sometimes with weights added to the settings.
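The rotation and weighting described above can be sketched in a few lines. This is only an illustration in Python, not how NLB or LVS is implemented; the server names and weights are made up for the example.

```python
from itertools import cycle

def round_robin(servers):
    """Hand out servers in plain rotation, one request each."""
    return cycle(servers)

def weighted_round_robin(weights):
    """Hand out servers in rotation, repeating each one according to its weight."""
    expanded = [name for name, weight in weights.items() for _ in range(weight)]
    return cycle(expanded)

# Four equal real servers: requests go web1, web2, web3, web4, web1, ...
rr = round_robin(["web1", "web2", "web3", "web4"])
first_six = [next(rr) for _ in range(6)]

# web1 is assumed to be three times as powerful as web2.
wrr = weighted_round_robin({"web1": 3, "web2": 1})
first_eight = [next(wrr) for _ in range(8)]
```

Neither variant looks at the actual CPU load of a host; the weights are fixed settings chosen by the administrator.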
If we set up load balancing and run ASP.NET on the servers, note that Session state is stored in the memory of the Web server by default. A user might spend a long time filling in a form, and by the time he submits it, Windows Server NLB may have switched his work to another Web server host, where his Session does not exist. To avoid this, consider storing Session state in SQL Server instead.
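The Session problem can be shown with a small simulation. This is a sketch in Python rather than ASP.NET code: the WebServer class is hypothetical, and a plain dictionary stands in for the SQL Server session-state database.

```python
class WebServer:
    """A toy Web server that keeps form data in a session store."""

    def __init__(self, name, shared_store=None):
        self.name = name
        # Use the shared store if one is given; otherwise keep sessions
        # in this server's own memory.
        self.sessions = shared_store if shared_store is not None else {}

    def save_form(self, session_id, data):
        self.sessions[session_id] = data

    def load_form(self, session_id):
        return self.sessions.get(session_id)

# Sessions in local memory: the draft saved on web1 is invisible to web2.
web1, web2 = WebServer("web1"), WebServer("web2")
web1.save_form("session-1", {"comment": "draft"})
lost = web2.load_form("session-1")

# Sessions in a shared store (stand-in for SQL Server): both servers see it.
shared = {}
web3, web4 = WebServer("web3", shared), WebServer("web4", shared)
web3.save_form("session-1", {"comment": "draft"})
kept = web4.load_form("session-1")
```

The shared store costs an extra round trip per request, which is the usual trade-off for surviving a switch between hosts.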
How does software load balancing perform? With software, performance is never a simple 1 + 1 = 2, but availability usually improves. This is the often-heard HA (high availability), that is, failover mode. In Figure 2 above, Virtual Server 2, at the left of the red arrow, automatically takes over the IP address of Virtual Server 1 when it detects that Virtual Server 1 is down or cannot provide service. HA therefore means providing "uninterrupted service", whereas the load balancing discussed in this post means providing "service that can withstand a high load"; the two are not the same thing. MIS staff should weigh the company's hardware resources, costs, and budget to decide whether to do both.
Load-Balanced Cluster:
http://msdn.microsoft.com/zh-cn/library/ms978730.aspx
http://msdn.microsoft.com/en-us/library/ms978730.aspx
Server Clustering:
http://msdn.microsoft.com/en-gb/library/ms998414(ZH-CN).aspx
Installing Network Load Balancing (NLB) on Windows Server 2008:
http://blogs.msdn.com/clustering/archive/2008/01/08/7024154.aspx
Linux Load Balancing Support & Consulting:
http://www.netdigix.com/linux-loadbalancing.php
Load Balancing (WCF, not directly related to this article):
http://msdn.microsoft.com/zh-cn/library/ms730128.aspx
http://msdn.microsoft.com/en-us/library/ms730128.aspx
-----------------------------------------------------
3. Layering of presentation and functionality
Large websites, for future extensibility and easier source-code maintenance, often split the front-end presentation (HTML, script) from the back-end business logic and database access (.NET/C#, SQL) into multiple layers.
According to Martin Fowler in PoEAA (Patterns of Enterprise Application Architecture), a layer is a "logical" separation, while a tier is a "physical" separation. If our ASP.NET platform uses logical layering (N-layer) to split UI, BLL, and DAL, there is usually no performance problem. With physical tiering (N-tier), as in Figures 2 and 3, each AP server may be responsible for different business logic (sales, inventory, logistics, manufacturing, accounting, ...), with its source code stored on a different physical host and able to run independently. In that case we must consider the performance cost of coordinating the AP servers with one another, of calling Web Services (XML performs poorly), and of executing distributed transactions.
Figure 3 With "physical" tiering, the various kinds of business logic may live on multiple physical hosts
For distributed transactions across physical tiers, Microsoft Enterprise Services, COM+, WCF, and WF rely on MS DTC in the operating system to coordinate the transaction. Because MS DTC and these applications run in different processes, their communication involves serialization and deserialization, and MS DTC must also coordinate all the AppDomains in the transaction and the resources on different hosts, so performance inevitably suffers.
Web Applications: N-Tier vs. N-Layer:
http://codebetter.com/blogs/david.hayden/archive/2005/07/23/129745.aspx
-----------------------------------------------------
4. Splitting large data tables
Larger data tables, or history tables that hold a lot of data, can be split according to some logic. If the daily data volume is very large, you can store each day in its own table and use a "summary table" to record a summary of each day. Alternatively, split the large table into several tables first and then tie them together through an "index table", which avoids the performance problems caused by querying one large table [1].
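The "one table per day plus a summary table" idea can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module instead of SQL Server; the table names, columns, and sample data are all assumptions made for the example.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE daily_summary (day TEXT PRIMARY KEY, row_count INTEGER, total REAL)"
)

def table_for(day):
    # Parsing the ISO date validates it before it is used in a table name.
    return date.fromisoformat(day).strftime("orders_%Y%m%d")

def insert_order(day, amount):
    """Store the row in the table that belongs to its day."""
    table = table_for(day)
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} (id INTEGER PRIMARY KEY, amount REAL)"
    )
    conn.execute(f"INSERT INTO {table} (amount) VALUES (?)", (amount,))

def summarize(day):
    """Record one summary row for the day so reports can skip the big tables."""
    table = table_for(day)
    count, total = conn.execute(
        f"SELECT COUNT(*), COALESCE(SUM(amount), 0) FROM {table}"
    ).fetchone()
    conn.execute(
        "INSERT OR REPLACE INTO daily_summary VALUES (?, ?, ?)", (day, count, total)
    )

for day, amount in [("2009-09-21", 10.0), ("2009-09-21", 5.5), ("2009-09-22", 7.0)]:
    insert_order(day, amount)
for day in ("2009-09-21", "2009-09-22"):
    summarize(day)

summary = conn.execute(
    "SELECT day, row_count, total FROM daily_summary ORDER BY day"
).fetchall()
```

Reports that only need daily totals read the small daily_summary table and never touch the per-day tables.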
Alternatively, you can use table partitioning to store the data in separate files and then place those files on separate physical servers, increasing I/O throughput and improving read and write performance.
In addition, as mentioned in the earlier post in this series, "30 minutes to happily learn SQL performance tuning", if a table has too many columns (as opposed to too many rows, the case just discussed), it should be cut vertically into two or more tables that are linked in a one-to-many relation by a primary key column of the same name, like the Orders and Order Details tables in Northwind. This avoids loading too much data when a clustered index scan is used to read the data, and avoids locking too much data, or holding locks too long, when data is modified.
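A small sketch of two tables linked by a key of the same name, again using Python's sqlite3 for illustration. The schema below is a simplified stand-in, not the real Northwind definition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT);
CREATE TABLE order_details (
    order_id INTEGER NOT NULL REFERENCES orders(order_id),
    product  TEXT NOT NULL,
    quantity INTEGER NOT NULL,
    PRIMARY KEY (order_id, product)
);
INSERT INTO orders VALUES (1, 'Ann');
INSERT INTO order_details VALUES (1, 'Tea', 2);
INSERT INTO order_details VALUES (1, 'Coffee', 1);
""")

# Listing orders touches only the narrow table.
order_list = conn.execute("SELECT order_id, customer FROM orders").fetchall()

# The detail rows are joined in only when a page actually needs them.
detail_rows = conn.execute(
    "SELECT o.customer, d.product, d.quantity "
    "FROM orders AS o JOIN order_details AS d ON d.order_id = o.order_id "
    "WHERE o.order_id = ? ORDER BY d.product",
    (1,),
).fetchall()
```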
-----------------------------------------------------
5. Separate image servers
For a Web server, user requests for images consume the most system resources. Depending on the size of the site and the nature of the project, deploy a separate image server, or even several image servers.
-----------------------------------------------------
6. Read/write separation
Reading from and writing to the same database at the same time is a very inefficient way to access it. A better practice is to set up two identical database servers, sized according to the read and write pressure and demand, and replicate the data from the server responsible for "writes" to the server responsible for "reads".
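On the application side, read/write separation needs some routing rule. The following Python sketch shows one simple possibility; the ReadWriteRouter class and the server names are hypothetical, and real data-access layers usually decide this more carefully than by looking at the first SQL keyword.

```python
import itertools

class ReadWriteRouter:
    """Send SELECT statements to read servers and everything else to the writer."""

    def __init__(self, writer, readers):
        self.writer = writer
        self._readers = itertools.cycle(readers)

    def route(self, sql):
        words = sql.split()
        if words and words[0].upper() == "SELECT":
            return next(self._readers)
        return self.writer

router = ReadWriteRouter("db-write", ["db-read-1", "db-read-2"])
statements = (
    "SELECT * FROM posts",
    "UPDATE posts SET title = 'x'",
    "select 1",
    "INSERT INTO posts VALUES (1)",
)
targets = [router.route(sql) for sql in statements]
```

Because replication takes time, a read issued right after a write may still see the old data on a read server, so pages that must show their own fresh changes may need to read from the writer.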
-----------------------------------------------------
7. Scaling to handle sudden traffic spikes
[1] Large sites must plan for future capacity expansion when the architecture is designed. Event-type websites in particular see huge, irregular bursts of traffic. On the site's primary storage server, a configuration file specifies the ID range of the data files stored on each storage enclosure. When a front-end server needs to read a piece of data, it first asks an interface on the primary storage server for the enclosure and directory address of that data, and then reads the actual data file from that enclosure. To add an enclosure, you only need to modify the configuration file; the front-end program is completely unaffected.
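The lookup on the primary storage server might look roughly like this. It is a sketch in Python under stated assumptions: the configuration is represented as a list of (first ID, enclosure, directory) entries, each range runs up to the next entry's first ID, and all names and paths are invented for the example.

```python
import bisect

def build_locator(storage_map):
    """Build a lookup function from (first_id, enclosure, directory) entries."""
    rows = sorted(storage_map)
    starts = [first_id for first_id, _, _ in rows]

    def locate(file_id):
        # Find the last entry whose first_id is <= file_id.
        index = bisect.bisect_right(starts, file_id) - 1
        if index < 0:
            raise KeyError(f"no enclosure configured for id {file_id}")
        _, enclosure, directory = rows[index]
        return enclosure, directory

    return locate

config = [
    (1, "enclosure-a", "/data/a"),
    (1_000_001, "enclosure-b", "/data/b"),
]
locate = build_locator(config)
before = locate(1_500_000)

# Adding an enclosure only changes the configuration; callers still call locate().
config.append((2_000_001, "enclosure-c", "/data/c"))
locate = build_locator(config)
after_low = locate(1_500_000)
after_high = locate(2_500_000)
```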
-----------------------------------------------------
8. Cache
A cache is a temporary container in memory for database data or objects. Using a cache greatly reduces reads against the database, because the data is served from memory instead. For example, we can add a "data cache" layer between the Web server and the DB server that keeps an in-memory copy of frequently requested objects, so that the data can be supplied without touching the database. If 100 users request the same data, the database previously had to be queried 100 times; now it is queried only once and the remaining requests are served from the cache, which greatly improves read speed and page response time.
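The "100 requests, 1 query" effect can be reproduced with a toy cache layer. This Python sketch uses hypothetical classes; a counter stands in for the real database so the number of queries is visible.

```python
class CountingDatabase:
    """A stand-in database that counts how often it is queried."""

    def __init__(self, rows):
        self.rows = rows
        self.query_count = 0

    def query(self, key):
        self.query_count += 1
        return self.rows[key]

class DataCache:
    """A cache layer that asks the database only on the first request for a key."""

    def __init__(self, database):
        self.database = database
        self._items = {}

    def get(self, key):
        if key not in self._items:
            self._items[key] = self.database.query(key)
        return self._items[key]

db = CountingDatabase({"product:1": "Widget"})
cache = DataCache(db)

# 100 users ask for the same data.
results = [cache.get("product:1") for _ in range(100)]
queries_used = db.query_count
```

This sketch never expires or invalidates entries; a real cache also has to decide when cached data is too old to serve.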
Many kinds of products can be used for caching, implemented in hardware or in software, for example: ASP.NET's built-in cache; third-party cache suites; the Session and SessionFactory caching mechanisms of Hibernate and NHibernate; Oracle's cache group technology; and Velocity, Microsoft's official new-generation distributed cache technology, which I covered in my earlier post "Building large, high-performance websites with IIS 7, ARR and Velocity". A proxy server can also serve as a cache for Web pages:
Client <----> Proxy server <----> destination server
The .NET class library provides two classes, CacheDependency and AggregateCacheDependency, which associate an object cached in ASP.NET (such as a DataSet) with one or more physical files (such as XML files) or with a database table. When any of the XML files is modified or removed, the DataSet associated with it is removed from memory as well. A cached item can of course also be removed automatically after a time limit set in your program.
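The idea of a file dependency can be imitated outside .NET. The Python sketch below is not the CacheDependency API: the FileDependentCache class is hypothetical, and it simply compares the file's modification time on each read, whereas ASP.NET removes the item when the file changes.

```python
import os
import tempfile

class FileDependentCache:
    """Cache entries that become invalid when their source file changes."""

    def __init__(self):
        self._items = {}  # key -> (value, path, modification time at insert)

    def insert(self, key, value, path):
        self._items[key] = (value, path, os.stat(path).st_mtime_ns)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, path, cached_mtime = entry
        try:
            current_mtime = os.stat(path).st_mtime_ns
        except FileNotFoundError:
            current_mtime = None  # a removed file also invalidates the entry
        if current_mtime != cached_mtime:
            del self._items[key]
            return None
        return value

with tempfile.TemporaryDirectory() as folder:
    xml_path = os.path.join(folder, "products.xml")
    with open(xml_path, "w") as handle:
        handle.write("<products/>")

    cache = FileDependentCache()
    cache.insert("products", ["parsed", "rows"], xml_path)
    before_change = cache.get("products")

    # Simulate a later edit by moving the modification time forward 10 seconds.
    stamp = os.stat(xml_path)
    os.utime(xml_path, ns=(stamp.st_atime_ns, stamp.st_mtime_ns + 10_000_000_000))
    after_change = cache.get("products")
```

When get() returns None, the caller reloads the file and inserts the fresh object again.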
The biggest change to caching since ASP.NET 2.0 is that Microsoft rewrote the CacheDependency class so that we can derive our own custom classes from it to implement features such as:
• Invalidating the cache on a request from Active Directory (the cached item is removed automatically)
• Invalidating the cache on a request from MSMQ or MQSeries
• Invalidating the cache on a request from a Web Service
• Creating a CacheDependency for Oracle
• Others
In addition, SqlCacheDependency for SQL Server monitors whether the data in a table has changed, so that users do not get stale data from the cache. As long as the data is unchanged, users always get it from the cache; as soon as the data changes, the cached data is refreshed automatically. To enable SqlCacheDependency, run the aspnet_regsql.exe tool with the appropriate command-line parameters to create a new AspNet_SqlCacheTablesForChangeNotification table in SQL Server, as shown in Figure 4. Each row of this table represents one table you want to monitor. The value of the rightmost changeId column lets the system decide whether a user's request to ASP.NET should be served from the in-memory cache or should query the database again.
Figure 4 The monitoring table added automatically after SqlCacheDependency is enabled
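The change-counter mechanism can be imitated with a small experiment. This Python sketch uses sqlite3 rather than SQL Server; the change_notification table, the triggers, and the ChangeIdCache class are simplified stand-ins invented for the example, not what aspnet_regsql.exe actually generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE change_notification (table_name TEXT PRIMARY KEY, change_id INTEGER NOT NULL);
INSERT INTO change_notification VALUES ('products', 0);
""")

# Every insert, update, or delete on the monitored table bumps its counter.
for event in ("INSERT", "UPDATE", "DELETE"):
    conn.execute(
        f"CREATE TRIGGER products_{event.lower()} AFTER {event} ON products "
        "BEGIN UPDATE change_notification SET change_id = change_id + 1 "
        "WHERE table_name = 'products'; END"
    )

class ChangeIdCache:
    """Serve cached rows until the monitored table's change counter moves."""

    def __init__(self, connection):
        self.connection = connection
        self.cached_rows = None
        self.cached_change_id = None
        self.reload_count = 0

    def get_products(self):
        change_id = self.connection.execute(
            "SELECT change_id FROM change_notification WHERE table_name = 'products'"
        ).fetchone()[0]
        if self.cached_rows is None or change_id != self.cached_change_id:
            self.cached_rows = self.connection.execute(
                "SELECT id, name FROM products ORDER BY id"
            ).fetchall()
            self.cached_change_id = change_id
            self.reload_count += 1
        return self.cached_rows

conn.execute("INSERT INTO products (name) VALUES ('Widget')")
cache = ChangeIdCache(conn)
first = cache.get_products()
second = cache.get_products()          # counter unchanged: served from memory
reloads_before_update = cache.reload_count
conn.execute("UPDATE products SET name = 'Gadget' WHERE id = 1")
third = cache.get_products()           # counter moved: reloaded from the table
```

Checking one small counter row is much cheaper than re-running the full query on every request.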
Incidentally, my earlier post "What to do when website performance keeps getting worse?" also mentioned the following:
(4) Caching with programs or software
Cache in code, for example with the cache mechanism built into ASP.NET since the 1.x era, or with third-party helper software and frameworks.
(5) Using hardware for caching or buffering: spending money to add AP servers
He likewise added a tier of application servers to the original Web server and database server architecture, to serve as the Web servers' cache data source.
After the revamped site went live, searches became much faster. Previously the daily statistics showed more than 500,000 requests taking longer than 3 seconds to process; after the revamp, fewer than 10 queries per week took longer than 3 seconds.
(6) Caching with hardware (cache)
At its peak, the database traffic of an American blog site reached 800,000 requests a day. The figure is not high and would be a piece of cake for an expert programmer, but its author came to engineering from another field, had limited knowledge, and may not have written the program well, so the hosting provider frequently sent warning letters asking him to improve the site's performance. In the end the author decided to develop a cache system. After it went online, database reads and writes dropped from 800,000 a day to 160,000 a day.
Reply:
Peter.z.lu
There are many middleware options:
NCache, Coherence, Velocity, MemCache ...
In addition, there are distributed cache systems such as Memcached and Cacheman. The former runs on Linux and Win32, maintains a huge hash table in memory, can store images, videos, files, and database query results, and supports multiple servers, which overcomes the limitation that ASP.NET's built-in cache serves only a single server. The latter is said to be the work of Sriram Krishnan, a member of Microsoft's Popfly project team, and may become an official Microsoft product in the future.
-----------------------------------------------------
9. Distributed system data structure--a case study of MySpace
A widely circulated article on the Internet, "Changes in distributed system data structure as seen from the MySpace database", describes how MySpace, a large community site, runs on the Microsoft platform: Windows Server, SQL Server, and ASP.NET. Today it receives up to 50 billion visits per month and has more than 200 million registered users. The following is only an excerpt of the article's key points:
- First-generation architecture: add more Web servers
When MySpace had 500,000 registered users, the site used only two Dell dual-CPU Web servers with 4 GB of memory (which shared the user requests) and one DB server (where all data was stored).
- Second-generation architecture: add database servers
The site ran on three database servers: one for updating data (which it replicated to the other two) and two for reading data, because many people view pages and far fewer write. As users and traffic grew, more hard drives were added.
Later, database server I/O became the bottleneck. Following a vertical partitioning design, different functions of the site, such as login, user profiles, and blogs, were moved to different database servers to share the access pressure. Whenever a new feature was added, a new database server was brought online for it.
When registered users reached 2 million, MySpace also switched from storage devices attached directly to the database servers to a SAN (storage area network), a high-bandwidth, purpose-built network that connects a large number of disk storage devices together, and connected the databases to the SAN. However, when users grew to 3 million, the vertical partitioning strategy became hard to maintain. The architects later upgraded to an expensive 34-CPU server, but it still could not carry the load.
- Third-generation architecture: move to a distributed computing architecture
The architects moved MySpace to a distributed computing architecture: physically spread across many servers, but logically the whole must behave like a single machine. On the database side, the application could no longer be split by function and backed by separate databases as before; the entire site had to be treated as one application. This time, instead of dividing the databases by site function and application, MySpace split its users into groups of one million and put all the data for each group into a separate SQL Server instance. Later, each MySpace database server actually ran two SQL Server instances, so each server served about 2 million users.
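The mapping from a user to a database instance can be sketched as simple arithmetic. This Python illustration assumes sequential numeric user IDs assigned to groups by range, with one million users per group and two instances per server; the article excerpt does not say how MySpace actually assigned users to groups, so the function below is hypothetical.

```python
USERS_PER_GROUP = 1_000_000   # one group of users per SQL Server instance
INSTANCES_PER_SERVER = 2      # two instances on each database server

def locate_user(user_id):
    """Return (server number, instance number) for a 1-based user ID."""
    if user_id < 1:
        raise ValueError("user_id must be 1 or greater")
    group = (user_id - 1) // USERS_PER_GROUP
    server = group // INSTANCES_PER_SERVER
    instance = group % INSTANCES_PER_SERVER
    return server, instance

first_user = locate_user(1)
last_of_first_group = locate_user(1_000_000)
first_of_second_group = locate_user(1_000_001)
first_on_second_server = locate_user(2_000_001)
```

With this layout, all the data for one user lives in exactly one instance, so a request for that user touches a single database.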
- Fourth-generation architecture: add a data cache layer
When users reached 9 million, MySpace hit storage bottlenecks again. It later brought in a new SAN product, but the site's demands by then exceeded the I/O capacity of the SAN's disk storage system, that is, the maximum speed at which it could read and write data.
When users reached 17 million, a data cache layer was added between the Web servers and the database servers. Its only function is to keep an in-memory copy of frequently requested data objects. Previously, every time a user looked something up, the database was queried once. Now, when any user requests data from the database, the cache layer keeps a copy, and when another user asks for the same data there is no need to query the database again.
- Fifth-generation architecture: move to an operating system and database software that support 64-bit processors
When users reached 26 million, MySpace moved to SQL Server 2005, which was still in beta but supported 64-bit processors. After upgrading to 64-bit SQL Server 2005 and Windows Server 2003, MySpace equipped each server with 32 GB of memory and later raised that to 64 GB.
Changes in distributed system data structure as seen from the MySpace database:
http://www.cnblogs.com/cxccbv/archive/2009/07/15/1524387.html
http://www.javaeye.com/topic/152766
http://smb.pconline.com.cn/database/0808/1403100.html
http://idai.blogbus.com/logs/14736411.html
-----------------------------------------------------
Resources:
[1] Liang Jian .NET: .NET In-Depth Experience and Practical Essentials, Chapter 15, author: Li Tianping
http://www.fecit.com.cn/
http://www.litianping.com
[2] Out of the Software Workshop, author: Zhu
http://www.china-pub.com/508874
http://blog.csdn.net/david_lv
[3] Various books, Web documents, and MSDN
-----------------------------------------------------
Translated from: http://www.cnblogs.com/WizardWu/archive/2009/09/22/1571499.html
Website performance optimization-database and Server architecture Chapter