Evolution of websites and databases


Forwarded from blog: http://www.cnblogs.com/birdshover/
Address: http://www.cnblogs.com/birdshover/archive/2009/08/03/1537225.html

(1)

The simplest website — a demo, say — runs perfectly well on a single computer, and in general this deployment is also the most efficient. So why do we separate the web server from the database at all? The answer comes down to communication efficiency.

Whenever you write a program — WinForms, WebForms, or a Windows service — there will always be two objects that need to exchange data, right? There are several ways to exchange it:

1. Through memory;

2. Through the hard disk;

3. Through the NIC (network card).

1. Data exchange through memory

Fig 1.1

Figure 1.1 shows data exchange between two classes. The withdrawal record is a reference type, so the original object itself is passed around, while the "balance" is a value type and is copied on use. Either way, this kind of communication within a single process goes through memory, which is also the fastest channel: on my home machine, memory read/write speed is roughly 6 GB/s.

As mentioned above, communication inside a process goes through memory (ignoring non-mainstream approaches), and inter-process communication also exchanges data via memory — for example, memory-mapped files, or anonymous pipes. (Those techniques are not described in detail here.)
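As a minimal illustration (not from the original post), two processes exchanging data through anonymous pipes can be sketched like this in Python; the OS sets up the stdin/stdout pipes between parent and child. The .NET equivalents would be the AnonymousPipeServerStream/AnonymousPipeClientStream classes.

```python
import subprocess
import sys

# The child process reads one request from its stdin pipe and writes a
# reply to its stdout pipe -- two processes exchanging data through
# anonymous pipes created by the operating system.
child_code = "data = input(); print('processed:' + data)"

result = subprocess.run(
    [sys.executable, "-c", child_code],  # spawn a second Python process
    input="balance-query",               # parent -> child over the stdin pipe
    capture_output=True,                 # child -> parent over the stdout pipe
    text=True,
)
print(result.stdout.strip())  # processed:balance-query
```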

2. Data exchange through the hard disk

Fig 2.1

This kind of data exchange also occurs within a process, but it is more commonly used between processes. For inter-process communication, memory-mapped files and named pipes are harder to understand than simply reading and writing files on disk, so systems with modest speed requirements often take the file route; Figure 2.1 shows this situation. Hard disks currently come in three main interface types: IDE, SCSI, and SATA (serial). IDE read/write speed is generally between 10 and 100 MB/s; SCSI tops out around 320 MB/s; SATA has several revisions, with the 1.0 specification at 150 MB/s and the 3.0 specification at 600 MB/s. Clearly disk read/write speed is far below memory speed, and another drawback of frequent disk I/O is that, compared with memory, a hard disk fails more easily.

3. Data exchange through the NIC

Fig 3.1

Figure 3.1 shows data exchange through the NIC. Network adapters are currently 100 Mbit/s as a rule; the actual transfer speed of a 100 Mbit/s adapter is about 12 MB/s, and a gigabit adapter is proportionally faster.

So, bluntly: why separate the web server from the database? Doesn't putting everything on one machine give the highest efficiency? Once they are split apart, even over a gigabit NIC, data transfer slows down dramatically! It is a question of scale. While the site is small there is of course no problem keeping everything together (here we analyze only performance and ignore other factors). But at larger scale problems appear, and that is a question of system bottlenecks. Under heavy traffic the database and the web server start competing for CPU; worse, when the database devotes a lot of CPU to its cache, you find that memory has been eaten by the database as well. Once web and database are separated, the site starts to grow into a real website; from then on, a great deal of time goes into thinking about performance, and the work becomes more challenging.

But just as we grow complacent, trouble arrives again. We now have two servers — one web, one database — and performance is worsening anyway. The problems at this stage are relatively simple to locate: the bottleneck is either the database or the web server, and the database is the more likely culprit. Why? A company running a site of this scale has usually hired programmers who are just starting out; they care about getting features working and have no experience with performance tuning. Seriously studying database optimization and efficient queries carries a fairly high learning cost, so at this stage the usual dodge is static processing — sidestepping the heavy lifting altogether.

(2)

Part (1) ended by mentioning static processing as a way to improve the load a website can take.

I. Static processing schemes (the file-generation approach in particular)

Fig 1.1

1. Static HTML solution

Figure 1.1 shows the most common static-processing scheme. IIS hands the request to ASP.NET, which uses the request path to decide whether the corresponding static file has already been generated. If it has, the file is output directly; if not, the page is generated from the database and then output. This approach is the easiest to understand, and its low barrier to entry is exactly why people think of it first.
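A minimal sketch of this generate-on-first-request idea (in Python rather than ASP.NET; `render_from_db` and the cache directory name are hypothetical stand-ins):

```python
import os

CACHE_DIR = "static_pages"  # hypothetical directory for generated HTML


def render_from_db(page_id: int) -> str:
    # Stand-in for reading data from the database and rendering the page.
    return f"<html><body>page {page_id}</body></html>"


def handle_request(page_id: int) -> str:
    """Serve the static file if it was already generated, else generate it."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    path = os.path.join(CACHE_DIR, f"{page_id}.html")
    if os.path.exists(path):                  # already generated: output directly
        with open(path, encoding="utf-8") as f:
            return f.read()
    html = render_from_db(page_id)            # not yet: read data, generate page
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)
    return html
```

After the first request, every subsequent request for the same page is a plain file read and never touches the database.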

That seems to solve the problem, but a new one appears: once the static page is generated, every visitor sees the same content, and meanwhile the data in the database keeps updating. What now? If you do not want to make major changes to the system, the best trick is to use a small piece of JavaScript to fill in the areas that must vary per user. As for refreshing static files after data updates, you can define a set of regeneration policies. Of course this still does not solve everything — for example, if the overall style of the site changes, must every page be regenerated?

A static XML + XSLT scheme appeared a few years back precisely to solve the style-change problem; CSDN has used it since one of its forum redesigns (I forget which year). It is an evolution of the plain-HTML scheme, but the results do not seem entirely satisfactory. What problems does it run into? I have never gone down that road myself, so I cannot say clearly.

2. Static solution with dynamic pages as the carrier

This scheme is a derivative of Figure 1.1: the generated static files are .aspx files instead of .html. Now the style/template update problem is solved, because the generated file is an aspx page and can use the template mechanisms built into .NET! Of course, if some parts must display user-specific data there is still no way around it — you fall back on the JavaScript-call trick. This scheme mainly addresses sites that need a uniformly updatable style.

After the processing above, one web server plus one database can withstand a certain amount of traffic. How much? In my experience, around 8,000 page views per 15 minutes is sustainable; much beyond that gets difficult. The premise, of course, is that your pages (and the pages they pull in) contain no IFRAMEs, and the figure is also affected by bandwidth, machine configuration, the distribution of user operations, and so on.

II. Cache solutions

ASP.NET provides an off-the-shelf page (output) cache. This page cache is static processing in essence, except that the static content lives in memory; given the memory-versus-disk speeds discussed in the previous part, you can see it is faster than file-based static pages. It, too, needs the local-static or JavaScript-call tricks for regions that must vary. The approach is not perfect, mainly because once a large amount of memory is tied up in cache, IIS becomes unstable when the ASP.NET worker process is recycled.
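In spirit this resembles ASP.NET's OutputCache directive; here is a language-neutral sketch (in Python, all names hypothetical) of an in-process page cache with an expiration time:

```python
import time

_page_cache = {}  # url -> (expires_at, html); lives in this process's memory


def cached_page(url, render, ttl=60):
    """Return a cached copy of the rendered page, regenerating after ttl seconds.

    render is the (expensive) function that actually builds the page.
    """
    now = time.time()
    hit = _page_cache.get(url)
    if hit is not None and hit[0] > now:
        return hit[1]                    # cache hit: no rendering, no database
    html = render(url)                   # miss or expired: rebuild the page
    _page_cache[url] = (now + ttl, html)
    return html
```

Because the cache dictionary lives inside the worker process, everything in it vanishes when that process is recycled — which is exactly the fragility described above.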

Hence the out-of-process cache: cached data is stored in another process, outside IIS — typically a Windows service. On the same machine it can talk to ASP.NET over anonymous pipes; across the network it can use Remoting, sockets, and so on. This is less efficient than an in-IIS cache, but its virtue is stable operation. The most famous application of the idea is memcached. Note that it caches data rather than whole pages — the data sits in memory and is bound into the ASP.NET page at render time, which is the biggest difference from the preceding approaches.

At this point we can breathe a sigh of relief, our problems seemingly solved. But as the site develops, with more users and more traffic, the system hits a bottleneck again.

(3)

Part (2) discussed the pros and cons of several static-processing schemes. Some readers asked for more detail — sorry, that is outside the scope of this part. Others pointed out that some sites are not suited to static processing at all. True; but at that stage the site is still in its early phase, when user numbers are usually small and the focus is mostly on serving content — the typical Web 1.0 system, which is exactly the background those static schemes belong to. What problems the site meets as it grows depends on how it actually develops, but broadly there are two kinds: 1. content/information sites, where users mostly arrive from search engines and interaction is light; 2. SNS and other highly interactive products such as forums (download sites and novel-reading sites are not considered here).

1. Content-based systems

For the first kind — content sites — there are two failure modes. One is that early design mistakes left the data volume too large, so database access is slow. The other is that there are too many visitors for IIS to keep up, so pages load slowly or "Service Unavailable" errors appear. Or both happen at once.

When data access is slow, optimize the database: tune query statements, tune the database structure, tune indexes. Once a single table reaches tens of millions of rows, optimization means splitting the table. Versions before SQL Server 2005 have no built-in table partitioning, so the usual technique is to store data in separate tables by time period and then aggregate them behind a view for querying. This differs a great deal from SQL Server 2005 partitioning, and its efficiency is far lower. Why? Suppose on SQL Server 2000 you create two identically structured tables, each holding 5 million rows. A query must filter table 1, filter table 2, merge the results, and sort them by the query's conditions — all on a single thread; how could that not be slow? SQL Server 2005 can place partitions and their indexes separately and operate on them with multiple threads, filtering and sorting in parallel, so it stays fast — provided, of course, the server has enough cores. (Table partitioning in SQL Server 2005 is available only in Enterprise Edition.)
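The pre-2005 workaround described above — manual time-based tables unioned behind a view — can be demonstrated with any SQL engine. A sketch using SQLite in place of SQL Server 2000 (table and column names are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Two identically structured tables holding different years of data.
cur.execute("CREATE TABLE orders_2008 (id INTEGER, amount REAL)")
cur.execute("CREATE TABLE orders_2009 (id INTEGER, amount REAL)")
cur.executemany("INSERT INTO orders_2008 VALUES (?, ?)", [(1, 10.0), (2, 20.0)])
cur.executemany("INSERT INTO orders_2009 VALUES (?, ?)", [(3, 30.0)])

# A view aggregates the per-year tables so queries see one logical table.
cur.execute("""CREATE VIEW orders AS
               SELECT * FROM orders_2008
               UNION ALL
               SELECT * FROM orders_2009""")

rows = cur.execute("SELECT id, amount FROM orders ORDER BY amount DESC").fetchall()
print(rows)  # [(3, 30.0), (2, 20.0), (1, 10.0)]
```

As the text notes, the engine still has to scan every underlying table and merge-sort the results on one thread, which is why this lags far behind true partitioning.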

When IIS responds slowly or the service is unavailable, the cause may be too little bandwidth or too many connections. I recall someone measuring that IIS tops out around 8,000 concurrent TCP connections, while Apache (or was it httpd? I forget) on Unix manages over 10,000 — apparently a limitation of the operating system's TCP/IP stack, though I do not know the details. If the web service becomes unstable from exceeding such limits or similar causes, it is time to add servers.

2. Highly interactive systems

Highly interactive systems are prone to heavy database concurrency. Many database operations take locks, and lock bookkeeping lives in system tables; once the system's throughput cannot keep up, the locks themselves become the problem. Suppose a database can serve at most 100 connections at a time (as I measured on SQL Server 2005): the 101st will time out. If some statement both runs long and runs often, database timeout errors come easily.

If the database itself cannot meet the demands of such a system, an interceptor placed in front of it can help — but its design needs thought. Suppose there are 100 database commands per second, all different, and the database can process exactly those 100 in one second. If 101 different commands now arrive every second, and each second's commands are again all different, an interceptor is useless — at best a palliative — because one unprocessed command piles up every second.

Fig 2.1

Fortunately, many statements repeat. Suppose the interceptor in Figure 2.1 receives 101 commands in one second; by merging those with identical query content (typically list pages) it is left with, say, 40 commands to actually execute. Once the results come back from the database, they are distributed to all 101 waiting requests. In other words, 101 jobs are compressed into 40.
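The interceptor's merge step can be sketched as follows (names are hypothetical; the same idea appears in modern "single-flight" query coalescing): identical commands arriving in the same window share one database round trip, and the result is fanned back out to every requester.

```python
from collections import defaultdict


def run_batch(commands, execute):
    """Execute each distinct command once and fan results back out.

    commands: list of SQL strings, possibly containing duplicates.
    execute:  function that actually hits the database.
    Returns one result per incoming command, in original order.
    """
    waiters = defaultdict(list)            # distinct command -> request positions
    for pos, cmd in enumerate(commands):
        waiters[cmd].append(pos)

    results = [None] * len(commands)
    for cmd, positions in waiters.items():  # one DB round trip per distinct command
        value = execute(cmd)
        for pos in positions:               # distribute to every duplicate request
            results[pos] = value
    return results
```

With 101 incoming commands of which only 40 are distinct, `execute` runs 40 times while all 101 callers still get an answer.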

You can also cache data that rarely changes — article categories, user names (depending on how fast registered users grow), and so on. This turns the model of Figure 2.1 into Figure 2.2.

Fig 2.2

The cache block can of course also live inside the web application itself, holding data that goes unchanged for a while; naturally, such a cache needs an expiration policy.

The cache can also help optimize SQL. Take a join that queries an article table together with an article-category table: you could instead query only the article table, which carries just a category ID — but then what do you display? Cache a category dictionary in memory, keyed by category ID with the category name as the value; at display time, look the article's category ID up in the dictionary. The SQL becomes simpler and faster.
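A tiny sketch of that join-avoidance trick (category IDs and names are invented for illustration):

```python
# Cached once in memory: category id -> category name (rarely changes).
category_names = {1: "News", 2: "Tech", 3: "Sports"}


def display_articles(articles):
    """Each article row carries only a category id; the name comes from
    the cached dictionary instead of a SQL join."""
    return [(title, category_names.get(cat_id, "Unknown"))
            for title, cat_id in articles]


print(display_articles([("Hello", 2), ("World", 1)]))
# [('Hello', 'Tech'), ('World', 'News')]
```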

For the case of large tables, refer to the first part of this article.

One problem remains that this part cannot solve — the question left over from part (1): how should additional servers be deployed?

(4)

Part (3) raised the question of adding servers: how many should be added, and how should the site be deployed across them? The simplest approach is to split the application.

I. Separating applications

Figure 1

1. Split the application into parts and place them on different servers according to load; do the same with the databases. Give each part its own subdomain for access, and serve images from a server of their own.

User traffic is thereby spread across different servers. The benefit is obvious — the load the site can carry rises markedly. The drawback is that the program has to be reworked.

2. Multiple copies

Figure 2

2. Use a load balancer to spread the load across identical copies on different servers. No program changes are needed, and since the pressure is not yet enormous, the load balancing built into Windows (NLB) is enough to do the job.
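Windows NLB itself balances at the network layer, but the core idea of rotating requests across identical copies can be sketched in a few lines (server addresses are made up):

```python
import itertools


class RoundRobinBalancer:
    """Distribute incoming requests across identical web servers in turn."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)  # endless rotation over the pool

    def pick(self):
        """Return the server that should handle the next request."""
        return next(self._cycle)


lb = RoundRobinBalancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
print([lb.pick() for _ in range(4)])
# ['10.0.0.1', '10.0.0.2', '10.0.0.3', '10.0.0.1']
```

Because every copy serves the same application, no request cares which server it lands on — which is exactly why this scheme needs no code changes.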

Some hot spots can be absorbed by caching; the database itself, though, can be scaled in three ways.

1. database/table sharding

The principle mirrors splitting the web application: the data belonging to each part of the application is split out and placed on its own database server.

2. Distributed Computing

When databases are attached this way, the application must also be split; but it only ever talks to the primary database, while the databases behind it are black boxes whose details the application need not know.

3. Distribution

The principle is the same as running multiple copies of the website: the data, too, is replicated into multiple copies, and different requests read different copies. The drawback is that the copies drift out of sync — synchronizing data between the database servers takes time.

When traffic grows beyond this, you need professional hardware or complex custom software — BigTable, MapReduce, F5 devices, and the like. I have never worked with those, so I will not presume to discuss them. This series is a summary of my own work over the past few years, and I hope it helps; if you know better methods, please enlighten me.

(5)

This part has little to do with the previous four; it mainly discusses, in general terms, how websites evolve under pressure.

As a website administrator facing a performance bottleneck, you naturally ask why the problem arises and how to solve it. The previous four parts offered methods, most of which amount to splitting the application — vertical partitioning. So why vertical partitioning?

Consider the following scenarios:

1. Any single server with Windows installed has, in itself, no performance bottleneck;

2. The world is full of small websites; viewed as a whole, they never hit a performance bottleneck either.

This shows that performance bottlenecks arise because users concentrate on a limited set of servers. A website generally exposes only one (or a few) domain names, for the users' convenience and for the brand — and there lies the problem. Picture the stream of users as a river: in the site's early days the river is nearly dry; as the site grows, the water level rises, and our website acts as a sluice gate. When one gate can no longer cope, we open several gates to divert the flow — like the flood diversion used in places during the 1998 floods. That is what vertical splitting is. In the second scenario above there are effectively n gates; yet the greater the flow and the fewer the gates, the greater the pressure.

In fact the website's limitations come from the Internet's own addressing scheme. Even if one of our domain names could be bound to N servers, that may still not meet the web servers' performance needs: the servers a domain points to are the sluice gate, and the water a gate can pass is finite. The remedies are to open more gates elsewhere, widen the gate, or speed up the water passing through it.

Opening gates elsewhere is vertical splitting: route users' browsing to different subdomains, each of which can be bound to different servers. Widening the gate means buying better hardware so each server performs better. Speeding up the flow means making each user connection as short-lived as possible — something that holds not just for websites but for online games as well.

In fact, the ultimate remedy so far is exactly that: keep user connection time as short as possible. Why? In theory a website can be split into arbitrarily many sub-applications, but that quickly becomes hard to manage — split into five applications, and any one of the five can develop its own performance problem; then what?

Online game servers (I ran private servers for Legend of Mir) separate the connection/login application from the game itself, and players never notice — just as separating a website's database server from its web server is invisible to users. Ordinary development uses caching and similar tricks to speed up the response to user operations, but the user never leaves the server: if www.a.com points to 111.111.111.111 (call it server A), then however much server A accelerates responses via the LAN behind it, the user is still attached to server A. Now do it cluster-style: at layer 4 of the TCP/IP stack, forward requests arriving at server A on to server B, and have server B send its response data directly back to the client. Server A's actual workload becomes tiny; the site as a whole carries a high load, but that load has been spread elsewhere.

Friends are waiting for dinner, so I'll stop here for now....
