High-Load, High-Concurrency Website Architecture: An Overview of High-Traffic Website Architecture

Source: Internet
Author: User
Tags: zend framework
[others] Post by wrong-T/Monday

I. Hardware Architecture

 

1: Data Center Selection:

When selecting a data center, you can choose a China Netcom or a China Telecom data center based on the regional distribution of your users, but a dual-line data center is usually the better choice. The larger the city, the more expensive its data centers. To save cost, servers can be hosted in small or medium-sized cities. For example, a company in Guangzhou can consider hosting its servers in Dongguan, Foshan, or other nearby places: they are not far away, but the price is much lower.

2: Bandwidth Size:

Usually, when the boss asks us to build a website, we are given some targets, for example that the site must withstand 1 million PVs (page views) per day. We then need to estimate how much bandwidth is required. The calculation mainly involves two indicators, peak traffic and page size, so we make two assumptions before calculating:

1. Assume that peak traffic is 5 times the average traffic.
2. Assume that the average page size per visit is about 100 KB.

If 1 million PVs are spread evenly over one day, that is about 12 visits per second (1,000,000 visits / 86,400 seconds). If each page is about 100 KB, those 12 visits transfer about 1,200 KB per second. Data size is measured in bytes while bandwidth is measured in bits, and 1 byte = 8 bits, so 1,200 KB per second is 9,600 Kbit per second, or roughly 9 Mbps. In practice the website must stay reachable during peak traffic, so with the assumed peak factor of 5 the actual bandwidth requirement is around 45 Mbps.

Of course, this conclusion is based on the two assumptions mentioned above. If your actual situation is different from these two assumptions, the results will also be different.
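The estimate can be redone for other assumptions with a few lines of code. The following is only a sketch of the arithmetic described above; the function name and parameters are made up for illustration, and 1 Mbit is taken as 1,000 Kbit.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def estimate_bandwidth_mbps(daily_page_views, page_size_kb, peak_factor):
    """Return (average_mbps, peak_mbps) for the given traffic assumptions."""
    visits_per_second = daily_page_views / SECONDS_PER_DAY
    # Page size is in kilobytes, bandwidth is in bits: 1 byte = 8 bits.
    kbit_per_second = visits_per_second * page_size_kb * 8
    average_mbps = kbit_per_second / 1000
    return average_mbps, average_mbps * peak_factor


# The two assumptions from the text: 100 KB pages, peak = 5 x average.
average, peak = estimate_bandwidth_mbps(1_000_000, 100, 5)
print(f"average: {average:.1f} Mbps, peak: {peak:.1f} Mbps")
```

Without rounding the intermediate steps, this gives roughly 9.3 Mbps on average and 46.3 Mbps at peak, which is in line with the rounded figures above.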

3: Server Division:

Let's look at the kinds of servers we need: image servers, page servers, database servers, application servers, log servers, and so on.

For websites with heavy traffic, it is necessary to separate the image servers from the page servers. We can use Lighttpd for the image servers and Apache for the page servers, although other choices work too. We can even expand to many image servers and many page servers and give them separate domain names, such as img.domain.com and www.domain.com. Image paths in the pages should all be absolute paths, and DNS round robin can then provide basic load balancing. Of course, once there are several servers, their content has to be kept in sync, which can be done with rsync.
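To make the idea of DNS round robin concrete, here is a toy model in Python. It is not a real DNS server, only a sketch of the rotation behaviour; the class name is invented, and the IP addresses are placeholder values from a documentation range.

```python
from collections import deque


class RoundRobinDNS:
    """Toy model: every lookup returns the address list rotated by one."""

    def __init__(self, records):
        # records maps a hostname to the list of server addresses behind it.
        self._records = {name: deque(addrs) for name, addrs in records.items()}

    def resolve(self, hostname):
        addresses = self._records[hostname]
        answer = list(addresses)
        # Rotate left so the next lookup starts with the next server.
        addresses.rotate(-1)
        return answer


# Hypothetical addresses for three image servers.
dns = RoundRobinDNS({"img.domain.com": ["192.0.2.11", "192.0.2.12", "192.0.2.13"]})
for _ in range(4):
    # Clients normally connect to the first address in the answer.
    print(dns.resolve("img.domain.com")[0])
```

Because each client usually takes the first address, successive lookups are spread over the servers in turn. This balances requests only roughly, which is why the text calls it initial load balancing.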

Database servers are the top priority, because the bottleneck of a website usually lies in the database. Most small and medium-sized websites currently use MySQL, but its cluster feature does not seem to have reached a stable stage yet, so we will not comment on it here. When using MySQL, we should generally build a master-slave structure (one master, multiple slaves). The master server uses InnoDB tables and the slave servers use MyISAM tables, which plays to the strengths of each engine, and the master-slave structure separates reads from writes, which relieves read pressure.

We can even dedicate one slave as a backup server. Otherwise, with only a master server, mysqldump is basically unusable once the data set is large, and copying the data files directly requires stopping the database service first, or the backup will be corrupt. For many websites, stopping the database service even for one second is unacceptable. With a slave server, you can stop replication on it (slave stop), back up the data, and then restart replication (slave start); the slave then automatically catches up with the master.

However, the master-slave structure has a fatal drawback: it relieves read pressure but not write pressure. To scale further, the only step left may be to split the database, vertically or horizontally.

A vertical split stores different tables on different database servers. For example, the user table is kept on database server A and the article table on database server B. This split has a cost: most basically, you can no longer perform operations such as LEFT JOIN across those tables.

A horizontal split generally divides the rows of a table across servers, for example by user ID (user_id). Suppose we have five database servers: if "user_id % 5 + 1" equals 1, the data is saved to server 1; if it equals 2, to server 2; and so on. There are many possible rules for a horizontal split, and you can choose one as needed. Like a vertical split, a horizontal split has a cost: most basically, summary operations such as COUNT and SUM become troublesome, because they must be run across all servers.

To sum up, the database solution is usually a hybrid chosen to fit the situation, so that the strengths of each approach are combined. Sometimes third-party software such as memcached is also needed to cope with higher traffic.
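The two routing ideas in this section, sending reads to slaves and splitting rows by user_id, can be sketched as follows. This is a simplified illustration, not a production router: the class and function names are invented, the server names are placeholders, and treating any statement that starts with "select" as a read is an assumption made for brevity.

```python
import itertools


class ReplicatedDatabase:
    """Routes writes to the master and spreads reads over the slaves."""

    def __init__(self, master, slaves):
        self.master = master
        self._slaves = itertools.cycle(slaves)

    def server_for(self, sql):
        # Simplification: anything that starts with SELECT counts as a read.
        is_read = sql.lstrip().lower().startswith("select")
        return next(self._slaves) if is_read else self.master


def shard_for(user_id, shard_count=5):
    """Apply the rule from the text: user_id % 5 + 1 gives the server number."""
    return user_id % shard_count + 1


def count_all_users(rows_by_shard):
    """A summary such as COUNT has to visit every shard and add up the results."""
    return sum(len(rows) for rows in rows_by_shard.values())


db = ReplicatedDatabase("master", ["slave1", "slave2"])
print(db.server_for("SELECT * FROM users"))       # goes to a slave
print(db.server_for("INSERT INTO users VALUES"))  # goes to the master

# Place ten hypothetical users on five shards, then count across all shards.
rows_by_shard = {n: [] for n in range(1, 6)}
for user_id in range(1, 11):
    rows_by_shard[shard_for(user_id)].append(user_id)
print(count_all_users(rows_by_shard))
```

The count function shows the cost mentioned above: a query that was a single COUNT on one server becomes one query per server plus a merge step in application code.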

Ideally there is a dedicated application server to run the PHP scripts, so that the page servers only hold static pages. The application server can be given its own domain name, such as app.domain.com, to distinguish it from the page servers. For application servers, I prefer Apache in prefork mode together with a PHP cache such as XCache. Load as few modules as possible: apart from necessary ones such as mod_rewrite, drop everything else to minimize the memory used by each httpd process. For static content, such as the image servers and page servers, Lighttpd or TUX can be used, so that each kind of server plays to its strengths.

If conditions permit, an independent log server is also necessary. A small website usually combines the page server and the log server and has cron compute the previous day's log statistics in the early hours of the morning. However, log analysis software such as AWStats consumes a lot of time and server resources even when logs are archived by day. A separate log server keeps this work from affecting the production servers.
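As a rough illustration of the nightly job, the sketch below counts the previous day's page views per URL from access-log lines. It assumes the logs are in the common log format, and the function name and the sample lines are invented for the example; real tools such as AWStats do far more than this.

```python
import re
from collections import Counter

# Matches lines such as:
# 192.0.2.1 - - [10/Oct/2023:13:55:36 +0800] "GET /index.html HTTP/1.1" 200 2326
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] '
    r'"(?:GET|POST) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)


def count_page_views(lines, day):
    """Count successful requests per path for one day, e.g. '10/Oct/2023'."""
    views = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("day") == day and match.group("status") == "200":
            views[match.group("path")] += 1
    return views


sample = [
    '192.0.2.1 - - [10/Oct/2023:13:55:36 +0800] "GET /articles/view/123 HTTP/1.1" 200 2326',
    '192.0.2.2 - - [10/Oct/2023:14:01:02 +0800] "GET /articles/view/123 HTTP/1.1" 200 2326',
    '192.0.2.3 - - [11/Oct/2023:00:00:01 +0800] "GET /articles/view/123 HTTP/1.1" 200 2326',
    '192.0.2.4 - - [10/Oct/2023:15:00:00 +0800] "GET /missing HTTP/1.1" 404 0',
]
print(count_page_views(sample, "10/Oct/2023"))
```

Even this simple pass reads every log line once, which hints at why the work is better moved to a separate machine on a busy site.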

II. Software Architecture

1: Framework Selection:

There are many PHP frameworks to choose from, such as CakePHP, symfony, and Zend Framework. There is no single right answer as to which one to use; it depends on how well the team members know each framework. In many cases you can write a good program without any framework, for example with class libraries such as PEAR plus Smarty. So whether or not you use a framework is generally not the most important thing. What matters is having a framework mindset in the way we program.

Currently, .NET also has many options, such as cnforums, .Text, CS, and Castle.

2: Logical Layering:

Once a website reaches a certain scale, the different kinds of logic in its code become entangled, which seriously hinders maintenance and extension. The solution is actually simple: refactor and separate the logic into layers. From top to bottom, the code can generally be divided into a presentation layer, an application layer, a domain layer, and a persistence layer.

The presentation layer is more than just templates; its scope is wider. All display-related logic belongs in the presentation layer, for example that the text in one place must be shown in red, or that the start of some block must be indented. The most common mistake is to put presentation logic into other layers. Here is a very common example: when we show article titles on a list page, we set a maximum length, and a title that exceeds it is truncated and followed by "...". This is typical presentation logic, yet many programmers fetch and truncate the data in non-presentation code and then assign the result to the template. The direct drawback is that the same piece of data may need to show its first 10 characters on one page and its first 15 on another; once the length is hard-coded in the program, the code is no longer reusable. The correct approach is to write a view helper for this kind of logic. For example, the truncate modifier in Smarty is such a view helper (although its implementation is not suitable for Chinese).
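A view helper of this kind can be very small. The sketch below is an illustration in Python rather than a port of Smarty's truncate; the function name and default suffix are assumptions. Python strings are sequences of Unicode characters, so slicing counts characters rather than bytes, which avoids cutting a Chinese character in half.

```python
def truncate(title, max_chars, suffix="..."):
    """Return the title cut to max_chars characters, with a suffix if it was cut."""
    if len(title) <= max_chars:
        return title
    return title[:max_chars] + suffix


# The same data is shown at different lengths on different pages,
# so the length is chosen by the template, not by the data-fetching code.
title = "High-traffic website architecture"
print(truncate(title, 10))
print(truncate(title, 15))
print(truncate("高负载高并发网站架构", 3))
```

The data layer hands over the full title every time, and each page decides how much of it to display.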

The application layer defines what users can do and feeds the results of operations back to the presentation layer. How the work is done is usually not its responsibility but that of the domain layer, so it delegates the work to the domain layer. On a website that uses the MVC architecture, we often see URLs such as domain.com/articles/view/123. Internally this is generally implemented as an articles controller class containing a view method. This is a typical application-layer operation, because it defines that users can perform a view action. MVC has a saying: "Rich model is good." The implication is that the controller should be thin: the application layer should be as simple as possible and should not contain logic that involves domain content.
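A thin controller for the URL above might look like the following sketch. It is framework-neutral Python written for illustration: the class names, the dictionary returned to the presentation layer, and the tiny dispatch function are all assumptions, not the API of any particular MVC framework.

```python
class Article:
    def __init__(self, article_id, title):
        self.article_id = article_id
        self.title = title


class ArticleCatalog:
    """Stand-in for the domain layer: it knows how articles are looked up."""

    def __init__(self, articles):
        self._articles = {a.article_id: a for a in articles}

    def find(self, article_id):
        return self._articles.get(article_id)


class ArticlesController:
    """Application layer: says what the user may do, delegates the how."""

    def __init__(self, catalog):
        self._catalog = catalog

    def view(self, article_id):
        article = self._catalog.find(article_id)
        if article is None:
            return {"status": 404, "template": "errors/not_found", "data": {}}
        return {"status": 200, "template": "articles/view", "data": {"article": article}}


def dispatch(path, controllers):
    """Map '/articles/view/123' to controllers['articles'].view(123)."""
    name, action, raw_id = path.strip("/").split("/")
    return getattr(controllers[name], action)(int(raw_id))


catalog = ArticleCatalog([Article(123, "High-traffic website architecture")])
controllers = {"articles": ArticlesController(catalog)}
response = dispatch("/articles/view/123", controllers)
print(response["status"], response["data"]["article"].title)
```

The controller contains no rules about articles themselves; it only picks the operation and passes the result on to a template.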

The most direct explanation of the domain layer is that it is the layer containing the domain logic. It is the soul of a piece of software. So what is domain logic? Simply put, logic that carries a clear domain concept is domain logic. For example, withdrawing money at an ATM goes roughly like this: insert a UnionPay card, enter the password, enter the amount, confirm, take the money, and then the ATM prints a transaction receipt. In this process, debiting the account when the card is used at the ATM is domain logic, because withdrawing money is a clear domain concept for a bank. Printing the receipt, however, is not domain logic but application logic: it is only a technical means, and the ATM could just as well send a reminder text message instead of printing a receipt. This is not fixed, though. If in practice we require that a receipt be printed after every withdrawal, that is, if printing the receipt is tightly bound to the withdrawal, then you can also regard printing the receipt as part of the domain logic. Everything depends on the specifics of the problem.

In Eric Evans's classic Domain-Driven Design, the domain layer is divided into five basic elements: entity, value object, service, factory, and repository. For details, refer to the book.

The most common mistake with the domain layer is to leak logic that belongs to it into other layers. For example, in a CMS, a popular article is defined as one that is viewed more than 1,000 times and commented on more than 100 times in a day. For a CMS, "popular article" is undoubtedly an important domain concept. How should this logic be designed? You might write the following code: "select... from... where browsing > 1000 and comment > 100". Yes, this is the simplest implementation, but note that the important domain rule "more than 1,000 views and more than 100 comments per day" is now hidden inside an SQL statement. SQL statements obviously do not belong to the domain layer, so our domain logic has leaked.
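One way to keep the popular-article rule in the domain layer is to state it on the domain object itself. The sketch below is a minimal illustration in Python; the class, field, and constant names are invented, and only the two thresholds come from the text.

```python
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    daily_views: int
    daily_comments: int

    # The domain rule lives here, in one named place, not inside SQL.
    HOT_MIN_VIEWS = 1000
    HOT_MIN_COMMENTS = 100

    def is_hot(self):
        """An article is hot if it exceeds both daily thresholds."""
        return (self.daily_views > self.HOT_MIN_VIEWS
                and self.daily_comments > self.HOT_MIN_COMMENTS)


articles = [
    Article("Scaling MySQL", daily_views=5000, daily_comments=250),
    Article("Quiet post", daily_views=1200, daily_comments=3),
]
print([a.title for a in articles if a.is_hot()])
```

A query can still filter in SQL for speed, but it would then take its thresholds from the domain object instead of defining them, so the meaning of "popular" stays in one place.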

The persistence layer stores our domain model in the database. Because our program code is object-oriented while databases are generally relational, the domain model has to be flattened before it can be saved. In PHP there has so far been no really good ORM, so there are not many solutions here. Following Martin Fowler's book Patterns of Enterprise Application Architecture, we can use a row data gateway or a table data gateway, or combine the domain layer and the persistence layer into an active record.
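To show what the active record option means in practice, here is a minimal sketch in Python using the standard sqlite3 module with an in-memory database. The table layout, class name, and helper function are assumptions made for the example; a real implementation would handle more columns, errors, and connection management.

```python
import sqlite3


class Article:
    """Active record: one object wraps one table row and can save itself."""

    connection = None  # shared database connection, set by the caller

    def __init__(self, title, views=0, article_id=None):
        self.id = article_id
        self.title = title
        self.views = views

    def save(self):
        cursor = self.connection.cursor()
        if self.id is None:
            # New object: insert a row and remember the generated key.
            cursor.execute(
                "INSERT INTO articles (title, views) VALUES (?, ?)",
                (self.title, self.views),
            )
            self.id = cursor.lastrowid
        else:
            # Existing object: write its current state back to its row.
            cursor.execute(
                "UPDATE articles SET title = ?, views = ? WHERE id = ?",
                (self.title, self.views, self.id),
            )
        self.connection.commit()

    @classmethod
    def find(cls, article_id):
        row = cls.connection.execute(
            "SELECT id, title, views FROM articles WHERE id = ?", (article_id,)
        ).fetchone()
        if row is None:
            return None
        return cls(title=row[1], views=row[2], article_id=row[0])


def open_database():
    """Create an in-memory database with the hypothetical articles table."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, views INTEGER)"
    )
    return connection


Article.connection = open_database()
article = Article("High-traffic website architecture")
article.save()
article.views = 5
article.save()
print(Article.find(article.id).views)
```

The object carries both its data and its own persistence code, which is the combination of domain and persistence layers described above. It is convenient for simple models, at the price of tying the domain object to the table structure.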
