The large website architecture discussed here covers only highly interactive, data-driven sites. For well-known reasons, we will not talk about news sites or other architectures that can be served as static HTML. Our examples are sites with high load, heavy data exchange, and fast-moving data, such as Kaixin.com and similar domestic Web 2.0 sites. We will not compare PHP, JSP, or .NET either. We look at the problem from the perspective of architecture, where the implementation language is not the issue: a language's strength lies in what it lets you implement, not in being good or bad. Whichever language you choose, these are the architectural problems you must face.
Here we will discuss the considerations for large websites.
1. Massive data processing
As we all know, a relatively small site does not hold much data. Plain SELECT and UPDATE statements solve the problems we face, the load is light, and at most we add a few indexes. A large website, however, may take in millions of rows a day. Even a many-to-many relationship that looked well designed causes no trouble in the early stage, but as the user base grows the data volume grows geometrically. At that point a single SELECT or UPDATE on one table is already very expensive, not to mention multi-table join queries.
2. Concurrent Data processing
The CTOs of Web 2.0 sites all carry one trusted sword: the cache. Under high concurrency and heavy processing, however, the cache itself becomes a big problem. The cache is shared globally across the whole application, so if two or more requests try to update it at the same time, the application can die outright. What is needed here is a sound policy for concurrent data handling together with a sound caching policy.
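One possible shape for such a policy is to let only one request rebuild a given cache entry while the others wait for its result. The sketch below is a minimal in-process illustration, not a production cache. The class name GuardedCache and its methods are hypothetical, and a real site would more likely apply the same idea in front of a shared cache server.

```python
import threading

_MISSING = object()  # sentinel so that a cached None is still a hit


class GuardedCache:
    """Tiny in-process cache whose rebuilds are serialized per key."""

    def __init__(self):
        self._data = {}
        self._locks = {}
        self._locks_guard = threading.Lock()

    def _lock_for(self, key):
        # Create (or fetch) the lock that protects one cache key.
        with self._locks_guard:
            if key not in self._locks:
                self._locks[key] = threading.Lock()
            return self._locks[key]

    def get_or_rebuild(self, key, rebuild):
        value = self._data.get(key, _MISSING)
        if value is not _MISSING:
            return value
        with self._lock_for(key):
            # Re-check: another request may have rebuilt the entry
            # while this one was waiting for the lock.
            value = self._data.get(key, _MISSING)
            if value is _MISSING:
                value = rebuild()
                self._data[key] = value
            return value

    def invalidate(self, key):
        with self._lock_for(key):
            self._data.pop(key, None)
```

With this arrangement, eight simultaneous requests for the same missing key trigger one expensive rebuild instead of eight. The trade-off is that the waiting requests block until the rebuild finishes.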
Then there is the database deadlock problem. We may not notice it under normal conditions, but under high concurrency the probability of deadlock is very high, and disk caching is a big problem as well.
3. File storage problems
For Web 2.0 sites that support file uploads, we have to consider how files should be stored and effectively indexed as disk capacity keeps growing. A common solution is to store files by date and type. But once the file volume becomes massive, for example a single hard disk holding 500 GB of small files, disk I/O becomes a huge problem during both maintenance and normal use. Even if your bandwidth is sufficient, your disk may not keep up. If uploads arrive at the same moment, the disk is easily overwhelmed.
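As an illustration of the date-and-type layout, the sketch below builds a storage path from the file type and upload date, then adds two hash-derived directory levels so that no single directory collects an unbounded number of small files. The hash fan-out is an assumption added here, not something the layout above requires, and the function name storage_path and its parameters are hypothetical.

```python
import hashlib
from datetime import date
from pathlib import PurePosixPath


def storage_path(file_id: str, kind: str, uploaded: date,
                 fanout: int = 2) -> PurePosixPath:
    """Return a relative path of the form kind/YYYY/MM/DD/ab/cd/<digest>."""
    digest = hashlib.sha1(file_id.encode("utf-8")).hexdigest()
    # Each shard directory takes two hex characters of the digest,
    # which gives 256 possible subdirectories per level.
    shards = [digest[i * 2:(i + 1) * 2] for i in range(fanout)]
    return PurePosixPath(kind, f"{uploaded:%Y/%m/%d}", *shards, digest)
```

The same file_id always maps to the same path, so the path can be recomputed on demand and does not have to be stored in a separate index.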
RAID and dedicated storage servers may solve the immediate problem, but access from different regions remains an issue. Suppose our servers are located in Beijing: how do we keep access fast from Yunnan or Xinjiang? And if we distribute the files, how should we plan the file index and the architecture?
Therefore, we have to admit that file storage is not easy.
4. Data relationship processing
We can easily design a database that conforms to third normal form and is full of many-to-many relationships, and we can use GUIDs in place of identity columns. But in the Web 2.0 era, where many-to-many relationships are everywhere, third normal form is the first thing that should be abandoned. Multi-table join queries must be kept to a minimum.
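One common way to reduce joins is to denormalize: store a derived value next to the row that needs it and keep it up to date on every write. The sketch below uses an in-memory SQLite database, with a hypothetical schema and hypothetical function names. It keeps a friend_count column on the users table, so that showing the count never touches the friendships table.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            friend_count INTEGER NOT NULL DEFAULT 0  -- denormalized counter
        );
        CREATE TABLE friendships (
            user_id INTEGER NOT NULL,
            friend_id INTEGER NOT NULL,
            PRIMARY KEY (user_id, friend_id)
        );
    """)
    return conn


def add_friend(conn, user_id, friend_id):
    # Both writes commit together, so the counter cannot drift from the rows.
    with conn:
        conn.execute("INSERT INTO friendships VALUES (?, ?)",
                     (user_id, friend_id))
        conn.execute("UPDATE users SET friend_count = friend_count + 1 "
                     "WHERE id = ?", (user_id,))


def friend_count(conn, user_id):
    # Single-row primary-key lookup: no join and no COUNT(*) scan.
    row = conn.execute("SELECT friend_count FROM users WHERE id = ?",
                       (user_id,)).fetchone()
    return row[0]
```

The price is an extra write on every change and a deliberate break with third normal form, which is the trade this section argues for.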
5. Data Index problems
As we all know, indexing is the cheapest and easiest way to improve database efficiency. Under a heavy update load, however, the cost of UPDATE and DELETE becomes unbearably high. I have encountered a case where updating a clustered index took 10 minutes, which is basically intolerable for a website.
Indexing and updating are natural enemies. Problems 1, 4, and 5 are ones we have to consider when designing the architecture, and they may also be the most time-consuming.
6. Distributed Processing
For Web 2.0 sites, the benefit of a CDN is basically zero, because the sites are highly interactive and their content is updated in real time; handling such updates is our routine work. To keep access fast across regions, we face a major problem: how to synchronize and update data effectively. Real-time communication between servers in different regions is something that must be considered.
7. Analysis of Ajax advantages and disadvantages
AJAX has become mainstream, and people suddenly discovered that POST and GET over XMLHTTP are so easy. The client sends a GET or POST to the server, and the server returns data after receiving the request. This is a normal AJAX request. However, anyone with a packet capture tool can see at a glance what data comes back and how it is processed. For AJAX requests that are computationally expensive, an attacker can build a request sender and easily take down a web server.
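One common defence against this kind of flood, not described above and offered here only as a sketch, is to rate-limit expensive endpoints per client. The token bucket below is a minimal illustration. The class name, its parameters, and the injectable clock are all hypothetical choices made for the example.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self, cost=1.0):
        now = self._clock()
        # Refill in proportion to the time elapsed, capped at capacity.
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

A server would keep one bucket per client identifier (for example per session or per IP address) and reject or delay a request when allow() returns False. A heavy AJAX call can be given a larger cost so that it drains the bucket faster.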
8. Data Security Analysis
Over HTTP, data packets are transmitted in plain text. We might say that we can encrypt them, but as problem 7 shows, the encryption process itself may be visible to the client. QQ's encryption, for example, can be worked out fairly easily, after which someone can write encryption and decryption routines identical to its own. Nobody bothers you while your site's traffic is small, but once traffic picks up, the so-called bots and mass-posting tools arrive one after another, as QQ saw from its early days. We might say that we can use stronger checks, or even HTTPS. Note that when you do these things, you pay a heavy cost in database load, I/O, and CPU. Against some kinds of mass posting, prevention is basically impossible. I have already been able to mass-post to Baidu Space and QQ Space; anyone willing to try will find it is not difficult.
9. Data synchronization and cluster processing problems
When one of our database servers is overwhelmed, we need to spread the load across a database cluster. This may be the most troubling problem of all. Depending on how the database is designed, data has to travel over the network, and the resulting delay is a frightening problem. We therefore need other means to keep interaction effective during a delay of a few seconds or longer, such as data hashing, partitioning, and content processing.
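To make "data hashing" concrete, the sketch below routes each user to one of several database shards by hashing the user ID. All reads and writes for a user then land on the same server and do not depend on replication catching up. The shard names and the function shard_for are hypothetical.

```python
import hashlib

# Hypothetical shard identifiers; in practice these would map to connections.
SHARDS = ["db0", "db1", "db2", "db3"]


def shard_for(user_id: int, shards=SHARDS) -> str:
    """Pick a shard deterministically from the hash of the user ID."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    # Use the first 8 bytes as an integer, then reduce modulo the shard count.
    bucket = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[bucket]
```

Plain modulo hashing is the simplest variant. Its known weakness is that changing the number of shards remaps most keys, which is why larger deployments often move to consistent hashing or a lookup table.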
10. Data sharing channels and the OpenAPI trend
OpenAPI has become an inevitable trend. From Google, Facebook, and MySpace to domestic campus social networks, everyone is considering it. It retains users more effectively, stimulates more of their interest, and lets more people help with development in the most effective way. An effective data sharing platform then becomes an essential part of opening up the data platform. Ensuring data security and performance while the interfaces are open is another question we must think about seriously.
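One widely used building block for protecting an open interface is to have each third-party application sign its requests with a shared secret, so the platform can reject forged or tampered calls. The sketch below assumes an HMAC-SHA256 scheme with a made-up canonical string format. The function names and the format are illustrative and are not taken from any particular platform's API.

```python
import hashlib
import hmac


def sign_request(secret: bytes, method: str, path: str, params: dict) -> str:
    """Return a hex signature over the method, path, and sorted parameters."""
    canonical = "&".join(f"{k}={params[k]}" for k in sorted(params))
    message = f"{method.upper()}\n{path}\n{canonical}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_request(secret: bytes, method: str, path: str, params: dict,
                   signature: str) -> bool:
    expected = sign_request(secret, method, path, params)
    # A constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, signature)
```

Signing alone does not stop replayed requests. A real scheme would also include a timestamp or nonce among the signed parameters.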