10 Questions to consider for a large web site architecture

Source: Internet
Author: User

The large web site architecture discussed here covers only highly interactive, data-driven large sites. For reasons everyone knows, we will not talk about news sites or architectures that rely on static HTML; instead we take high-load, high-data-exchange, high-data-mobility sites as examples, i.e. the Web 2.0 family of architectures found at home and abroad. We also do not discuss whether the environment is PHP, JSP, or .NET; we look at the problem from the architectural side. The implementation language is not the issue: a language's strength lies in what it lets you build, not in being inherently good or bad. Whatever language you choose, the architecture still has to be faced.

Here is a discussion of the issues that large websites need to be aware of and consider.

1. Processing of massive data

As we all know, for relatively small sites the data volume is not large: SELECT and UPDATE can solve the problems we face, the load is not heavy, and at most a few extra indexes will do. For a large site, the data added each day may run into the millions. A poorly designed many-to-many relationship causes no trouble in the early days, but as the user base grows, the data volume grows geometrically. At that point the cost of a SELECT or UPDATE on a single table becomes very high, to say nothing of multi-table join queries.
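As a rough illustration only (the table and column names below are hypothetical, and SQLite stands in for whatever database the site actually uses), one common way to keep single-table SELECT and UPDATE costs bounded is horizontal partitioning, for example routing rows into per-month tables so that no single table has to absorb millions of new rows per day forever:

```python
import sqlite3
from datetime import date

# Hypothetical sketch: route message rows into per-month tables so that no
# single table keeps growing without limit.
conn = sqlite3.connect(":memory:")

def table_for(day: date) -> str:
    """Pick the partition (one table per month) that a row belongs to."""
    name = f"messages_{day:%Y_%m}"
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {name} "
        "(id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT, created TEXT)"
    )
    return name

def insert_message(user_id: int, body: str, day: date) -> None:
    conn.execute(
        f"INSERT INTO {table_for(day)} (user_id, body, created) VALUES (?, ?, ?)",
        (user_id, body, day.isoformat()),
    )

insert_message(42, "hello", date(2009, 3, 1))   # lands in messages_2009_03
insert_message(42, "again", date(2009, 4, 2))   # lands in messages_2009_04
```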

2. Processing of data concurrency

At some point, the cache becomes the Web 2.0 CTO's trump card. But for caching, high concurrency is also a big problem: the cache is shared globally across the whole application, and when we modify it, the application can crash outright if two or more requests try to update the same cache entry at the same time. At this point you need a good data-concurrency strategy and a good caching strategy.
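A minimal sketch of the idea, using an in-process dictionary as a stand-in for whatever cache the site really uses: read-modify-write operations on a shared cache entry are serialized with a lock so that two concurrent requests cannot corrupt the entry or lose each other's update.

```python
import threading

# Hypothetical in-process cache; a real site would use a dedicated cache server,
# but the concurrency concern is the same: concurrent writers must be serialized.
_cache = {}
_lock = threading.Lock()

def update_cached_counter(key: str, delta: int) -> int:
    """Read-modify-write on a shared cache entry, protected by a lock."""
    with _lock:
        value = _cache.get(key, 0) + delta
        _cache[key] = value
        return value

# Two or more "requests" updating the same entry concurrently stay consistent.
threads = [threading.Thread(target=update_cached_counter, args=("hits", 1))
           for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert _cache["hits"] == 100
```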

In addition, there is the database deadlock problem. We may not normally notice it, but under high concurrency the probability of deadlocks is very high, and disk caching is also a big problem.

3. The problem of file storage

For Web 2.0 sites that support file uploads, even as we are grateful that hard drives keep getting bigger, we should be more concerned with how files are stored and indexed effectively. A common scheme is to store files by date and type. But when the file volume becomes massive, say a single disk holding 500 GB of small, scattered files, maintenance and disk I/O become a huge problem; even if you have enough bandwidth, your disks may simply not keep up. If uploads are hitting the same disks at the same time, they are easily overwhelmed.
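One widely used alternative to plain date/type folders, sketched here with purely hypothetical paths, is to derive the directory from a hash of the file's content or ID, so millions of small files spread evenly across many directories instead of piling up in a few hot ones:

```python
import hashlib
from pathlib import Path

STORAGE_ROOT = Path("/data/uploads")   # hypothetical mount point

def storage_path(file_bytes: bytes, original_name: str) -> Path:
    """Spread files across two levels of hash-derived directories."""
    digest = hashlib.sha1(file_bytes).hexdigest()
    # e.g. /data/uploads/ab/cd/abcdef1234..._photo.jpg
    return STORAGE_ROOT / digest[:2] / digest[2:4] / f"{digest}_{original_name}"

print(storage_path(b"example image bytes", "photo.jpg"))
```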

Perhaps RAID and dedicated storage servers can solve the problem for the moment, but there is still the problem of access locality: our servers may be in Beijing while the user may be in Yunnan or Xinjiang, so how do we solve access speed? And if we go distributed, how should our file index and architecture be planned?

So we have to admit that file storage is a very difficult problem.

4. Processing of data relations

We can easily design a database that conforms to third normal form, fill it with many-to-many relationships, and even replace the IDENTITY column with a GUID. However, many-to-many relationships flooded the Web 2.0 era, and third normal form is the first thing that should be discarded: multi-table join queries must be reduced as much as possible.
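As a hedged sketch of that trade-off (the schema is invented for illustration), instead of joining a link table on every read we can keep a redundant, denormalized copy of the data, here a tag count and a flattened tag list, on the main row and refresh it whenever the tags change:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT,
                    tag_count INTEGER DEFAULT 0, tag_list TEXT DEFAULT '');
CREATE TABLE post_tags (post_id INTEGER, tag TEXT);  -- the many-to-many link table
""")

def add_tag(post_id: int, tag: str) -> None:
    """Write the link row, then refresh the denormalized columns on posts."""
    conn.execute("INSERT INTO post_tags (post_id, tag) VALUES (?, ?)", (post_id, tag))
    tags = [row[0] for row in
            conn.execute("SELECT tag FROM post_tags WHERE post_id = ?", (post_id,))]
    conn.execute("UPDATE posts SET tag_count = ?, tag_list = ? WHERE id = ?",
                 (len(tags), ",".join(tags), post_id))

conn.execute("INSERT INTO posts (id, title) VALUES (1, 'hello')")
add_tag(1, "web")
add_tag(1, "architecture")
# Reads that only need the tags can now skip the join entirely.
print(conn.execute("SELECT tag_count, tag_list FROM posts WHERE id = 1").fetchone())
```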

5. Problems with data indexing

As we all know, an index is the cheapest and easiest way to improve database query performance. However, when updates are heavy, the cost of UPDATE and DELETE becomes unimaginable. I have run into a case where updating a clustered index took 10 minutes to complete; for a website, that is basically intolerable.

Indexes and updates are natural enemies. This, together with the data-volume and data-relationship problems above, is something we have to weigh when we design the architecture, and it may be the most time-consuming problem.
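One possible way to soften the index-versus-update conflict, sketched below with illustrative names and thresholds, is to buffer hot counter updates in memory and flush them to the indexed table in batches, so heavily indexed rows are rewritten far less often:

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE topics (id INTEGER PRIMARY KEY, views INTEGER DEFAULT 0)")
conn.execute("CREATE INDEX idx_topics_views ON topics (views)")  # index we don't want to thrash
conn.execute("INSERT INTO topics (id) VALUES (1), (2)")

_pending = defaultdict(int)
FLUSH_THRESHOLD = 1000   # illustrative: flush after this many buffered increments

def record_view(topic_id: int) -> None:
    _pending[topic_id] += 1
    if sum(_pending.values()) >= FLUSH_THRESHOLD:
        flush()

def flush() -> None:
    """One batched UPDATE per topic instead of one per page view."""
    with conn:
        for topic_id, delta in _pending.items():
            conn.execute("UPDATE topics SET views = views + ? WHERE id = ?",
                         (delta, topic_id))
    _pending.clear()

for _ in range(1500):
    record_view(1)
flush()
print(conn.execute("SELECT views FROM topics WHERE id = 1").fetchone())  # (1500,)
```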

6. Distributed processing

For Web 2.0 sites, because of their high interactivity, a CDN is basically useless: the content is updated in real time and processed continuously. To guarantee access speed from every region, we face a huge problem: how to synchronize and update data effectively. Real-time communication between regional servers is an issue that has to be considered.
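Since a CDN helps little with content that changes in real time, one common pattern (sketched below with purely hypothetical names, using an in-process queue as a stand-in for a real replication channel) is to write to the local primary immediately and publish each change onto a queue that remote regional servers consume asynchronously, accepting a short replication delay instead of a blocking cross-region write:

```python
import queue
import threading

# Hypothetical stand-in for a replication channel between regions
# (in production this might be a message queue or the database's own replication).
change_log: "queue.Queue[tuple[str, str]]" = queue.Queue()

primary_store = {}    # e.g. the Beijing master
regional_store = {}   # e.g. a Yunnan/Xinjiang read replica

def write(key: str, value: str) -> None:
    """Writes hit the primary immediately and are queued for the replica."""
    primary_store[key] = value
    change_log.put((key, value))

def replica_worker() -> None:
    """The remote region applies changes asynchronously, a little later."""
    while True:
        key, value = change_log.get()
        regional_store[key] = value
        change_log.task_done()

threading.Thread(target=replica_worker, daemon=True).start()
write("profile:42", "updated bio")
change_log.join()   # in reality the lag is a few seconds or minutes
print(regional_store["profile:42"])
```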

7. Analysis of the pros and cons of Ajax

Since Ajax arrived, it became the mainstream trend, and suddenly we found that GET and POST based on XMLHTTP are so easy. In a normal Ajax request the client sends data to the server by GET or POST, and the server returns the requested data. But if we use a packet-capture tool, it is obvious what data is returned and how it is processed. For computationally expensive Ajax requests, an attacker can build a simple request-flooding tool that easily kills a web server.
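Because a captured Ajax request can be replayed by a script, a minimal first line of defense, sketched below with illustrative limits, is to rate-limit expensive endpoints per client before doing the heavy work:

```python
import time
from collections import defaultdict

WINDOW_SECONDS = 60
MAX_REQUESTS = 30            # illustrative limit for a costly Ajax endpoint
_request_log = defaultdict(list)

def allow_request(client_ip: str, now: float | None = None) -> bool:
    """Sliding-window check: reject clients replaying the endpoint too fast."""
    now = time.time() if now is None else now
    recent = [t for t in _request_log[client_ip] if now - t < WINDOW_SECONDS]
    _request_log[client_ip] = recent
    if len(recent) >= MAX_REQUESTS:
        return False         # the web framework would return HTTP 429 here
    recent.append(now)
    return True

print(all(allow_request("1.2.3.4", now=100.0) for _ in range(30)))  # True
print(allow_request("1.2.3.4", now=100.0))                          # False: 31st call
```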

8. Analysis of data security

With the HTTP protocol, packets are transmitted in clear text. Perhaps we can say we can use encryption, but the encryption scheme itself may be as good as clear text (take QQ, for example: its encryption can easily be analyzed, and an equivalent encryption and decryption routine written). When your site's traffic is small, nobody will bother you, but once the traffic grows, so-called plug-ins and so-called mass-message bots will follow (the history of QQ mass messaging shows this clearly). Perhaps we can say we can use higher-level validation or even HTTPS, but note that these measures cost a huge amount of database, I/O and CPU resources, and against some mass messaging they are basically impossible to enforce. The author has already managed mass posting against Baidu Space and QQ Space; it is not really difficult for anyone who cares to try.
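Where full HTTPS is judged too costly, a lighter compromise that is sometimes used (sketched below with a hypothetical shared secret and parameter layout) is to sign each request with an HMAC and a timestamp, so captured packets cannot simply be forged or replayed; note that this still costs CPU and does not hide the payload:

```python
import hashlib
import hmac
import time

SHARED_SECRET = b"hypothetical-per-user-secret"

def sign(params: dict, timestamp: int) -> str:
    """HMAC over the sorted parameters plus a timestamp."""
    message = "&".join(f"{k}={v}" for k, v in sorted(params.items()))
    message += f"&ts={timestamp}"
    return hmac.new(SHARED_SECRET, message.encode(), hashlib.sha256).hexdigest()

def verify(params: dict, timestamp: int, signature: str, max_age: int = 300) -> bool:
    if abs(time.time() - timestamp) > max_age:
        return False                      # stale: likely a replayed packet
    expected = sign(params, timestamp)
    return hmac.compare_digest(expected, signature)

ts = int(time.time())
sig = sign({"action": "post", "body": "hello"}, ts)
print(verify({"action": "post", "body": "hello"}, ts, sig))    # True
print(verify({"action": "post", "body": "tampered"}, ts, sig)) # False
```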

9. Data synchronization and cluster processing

When one of our database servers becomes overwhelmed, we need to do database-level load balancing and clustering, and this may be the most troublesome problem at that point. Depending on the database design, data transmitted over the network introduces latency, which is a terrible but unavoidable problem, so we need other means to keep the delay within a few seconds or, at worst, a few minutes while still achieving effective interaction, such as data hashing, segmentation, content-based partitioning, and so on.
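As a hedged sketch of the "data hashing and segmentation" idea (the shard list is hypothetical), the helper below routes each user to one of several database servers by hashing the key, so the same user always lands on the same shard and no single database server carries the whole load:

```python
import hashlib

# Hypothetical shard list; in practice each entry is a separate database server.
SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]

def shard_for(user_id: int) -> str:
    """Stable hash routing: the same user always maps to the same shard."""
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

for uid in (1, 42, 1001, 99999):
    print(uid, "->", shard_for(uid))
```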

10. Data sharing channels and the OpenAPI trend

OpenAPI has become an unavoidable trend. From Google, Facebook and MySpace to domestic school-network sites, everyone is considering this problem: it retains users more effectively, stimulates more of their interest, and lets more people help you do the most effective development. An effective data-sharing platform, an open data platform, is an indispensable path, and ensuring the security and performance of the data behind the open interface is another problem we have to think about seriously.
