A few issues to be considered in large Web site architecture
Source: Internet
Author: User
My machine has been broken for the past two days and is out for repair, so I am using the time to write a series of articles on large website architecture. I hope they are of some help to webmaster friends who want a career on the Internet.
Note: "large Web site architecture" here covers only highly interactive, data-heavy large Web sites. For reasons we all know, we will not discuss news sites or architectures that can rely on static HTML. We mean sites with high load, heavy data exchange, and highly fluid data, such as Hainei, Kaixin (Happy Net), and similar Web 2.0 architectures. We will not discuss PHP, JSP, or .NET environments either. We look at the problem from the architectural side, where the implementation language is not the issue: a language's strength lies in how it implements things, not in being good or bad, and whichever language you choose, the architecture still has to be faced.
Now to the main text:
First, let us discuss the issues that large Web sites need to watch for and consider.
A. Massive data processing
As we all know, on a relatively small site the data volume is modest: plain SELECT and UPDATE statements solve the problems we face, the load is light, and adding a few more indexes is usually enough. On a large Web site, the data may grow by millions of rows per day. A poorly designed many-to-many relationship causes no trouble in the early stages, but as the user base grows, the data volume grows geometrically. At that point a SELECT or UPDATE on a single table becomes very expensive, to say nothing of multi-table join queries.
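To make the growth argument concrete, here is a rough back-of-the-envelope sketch. The function name `link_rows` and the model itself (each user links to about one in every `one_in` users) are assumptions for illustration, not figures from any real site.

```python
def link_rows(users: int, one_in: int) -> int:
    """Rows in a many-to-many link table (for example, user-to-user follows).

    Assumption: each user links to roughly one in every `one_in` users,
    so the number of links per user grows along with the user base.
    """
    links_per_user = users // one_in
    return users * links_per_user


# Print the table size for three hypothetical user counts.
for users in (10_000, 100_000, 1_000_000):
    print(f"{users:>9} users -> {link_rows(users, 1000):>13} link rows")
```

Under this assumed model, ten times as many users means one hundred times as many rows, which is why a relationship that looked harmless early on can dominate the database later.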
B. Data concurrency processing
At some point, the CTOs of Web 2.0 sites all reach for the same trump card: caching. Yet under high concurrency, caching is itself a big problem. A cache is shared globally across the entire application, and if two or more requests update it at the same time without coordination, the application can simply die. This calls for a sound data concurrency strategy and a sound caching strategy.
Then there is the problem of database deadlocks. We may not notice them in ordinary use, but under high concurrency the probability of a deadlock is very high. Disk caching is a big problem as well.
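One simple way to coordinate concurrent cache updates is to guard every read-modify-write with a lock. The sketch below assumes a single-process, in-memory cache; the class name `LockedCache` and the `hits` counter are hypothetical, and a real site would more likely use a shared cache server with its own atomic operations.

```python
import threading


class LockedCache:
    """A tiny in-process cache whose updates are serialized by one lock."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key, default=None):
        with self._lock:
            return self._data.get(key, default)

    def update(self, key, fn, default=0):
        """Atomically replace the cached value with fn(old_value)."""
        with self._lock:
            value = fn(self._data.get(key, default))
            self._data[key] = value
            return value


def hammer(cache, times):
    # Each call increments the shared counter `times` times.
    for _ in range(times):
        cache.update("hits", lambda v: v + 1)


cache = LockedCache()
threads = [threading.Thread(target=hammer, args=(cache, 10_000)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(cache.get("hits"))  # 8 threads x 10,000 increments each
```

Because the read and the write happen under the same lock, no increment is lost. The cost is that all writers queue behind one lock, so this is a starting point rather than a strategy for very hot keys.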
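A classic way to avoid one kind of deadlock is to always acquire locks in the same order. The sketch below is an in-process analogy using Python threads, not a database API: the `Row` class and `transfer` function are hypothetical stand-ins for two transactions that each need to lock the same two rows.

```python
import threading


class Row:
    """Stand-in for a database row protected by its own lock."""

    def __init__(self, row_id, value):
        self.row_id = row_id
        self.value = value
        self.lock = threading.Lock()


def transfer(src, dst, amount):
    """Move `amount` from src to dst, locking rows in ascending id order."""
    if src is dst:
        return  # nothing to do, and locking the same row twice would block
    first, second = sorted((src, dst), key=lambda r: r.row_id)
    with first.lock:
        with second.lock:
            src.value -= amount
            dst.value += amount


def worker(src, dst, times):
    for _ in range(times):
        transfer(src, dst, 1)


a, b = Row(1, 1000), Row(2, 1000)
# The two threads touch the same rows in opposite directions, which would
# risk a deadlock if each thread locked its own `src` first.
t1 = threading.Thread(target=worker, args=(a, b, 5000))
t2 = threading.Thread(target=worker, args=(b, a, 5000))
t1.start()
t2.start()
t1.join()
t2.join()

print(a.value, b.value)
```

In a real database the same idea usually means touching tables and rows in a consistent order inside every transaction, and retrying a transaction the database aborts as a deadlock victim.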
C. File storage issues
For Web 2.0 sites that support file uploads, as hard drives keep getting bigger we should care more about how files are stored and indexed efficiently. A common scheme is to store files by date and type. But when the file volume becomes massive, for example a single hard disk holding 500 GB of small files, disk I/O becomes a huge problem for both maintenance and use: even if you have enough bandwidth, your disk may not be able to respond in time. If uploads are happening at the same moment, the disk is easily overwhelmed.
RAID and dedicated storage servers may solve the immediate problem, but there remains the issue of access from different regions: our servers may be in Beijing while our users may be in Yunnan or Xinjiang, so how do we solve access speed? And if we go distributed, how should the file index and the architecture be planned?
So we have to admit that file storage is a very difficult problem.
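The "by date and type" scheme can be sketched as a function that derives a storage path. The extra two levels of hash-based subdirectories are my own assumption, added so that no single directory collects too many small files; the function name `storage_path` and the directory layout are hypothetical.

```python
import hashlib
from datetime import date
from pathlib import PurePosixPath


def storage_path(file_type: str, uploaded: date, content: bytes, ext: str) -> PurePosixPath:
    """Build <type>/<YYYY>/<MM>/<DD>/<h0h1>/<h2h3>/<sha1>.<ext> for an upload."""
    digest = hashlib.sha1(content).hexdigest()
    return PurePosixPath(
        file_type,
        f"{uploaded:%Y/%m/%d}",   # date buckets, as in the common scheme
        digest[:2],               # assumed fan-out level 1 (256 directories)
        digest[2:4],              # assumed fan-out level 2 (256 directories)
        f"{digest}.{ext}",
    )


p = storage_path("image", date(2009, 3, 15), b"example bytes", "jpg")
print(p)
```

Naming the file after a hash of its content also means the same upload always maps to the same path. This only decides where a file lives; it does nothing by itself about disk I/O limits or about serving users far from the server.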
D. Handling data relationships
We can easily design a database that conforms to third normal form, full of many-to-many relationships, and can even replace the IDENTITY column with a GUID. But in the Web 2.0 era, where many-to-many relationships abound, third normal form is the first thing to be discarded. Multi-table join queries must be kept to a minimum.
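As a small illustration of trading normal form for fewer joins, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The tables `users`, `groups_`, and `membership`, and the redundant `group_name` column, are hypothetical examples rather than a schema from any real site.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE groups_ (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Many-to-many link table. group_name is a deliberate redundant copy
-- of groups_.name, which breaks third normal form.
CREATE TABLE membership (
    user_id    INTEGER NOT NULL,
    group_id   INTEGER NOT NULL,
    group_name TEXT    NOT NULL,
    PRIMARY KEY (user_id, group_id)
);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])
conn.executemany("INSERT INTO groups_ VALUES (?, ?)", [(10, "photo"), (20, "music")])
conn.executemany(
    "INSERT INTO membership VALUES (?, ?, ?)",
    [(1, 10, "photo"), (1, 20, "music"), (2, 20, "music")],
)

# Normalized read: needs a join to fetch the group names.
joined = conn.execute(
    "SELECT g.name FROM membership m JOIN groups_ g ON g.id = m.group_id "
    "WHERE m.user_id = ? ORDER BY m.group_id",
    (1,),
).fetchall()

# Denormalized read: the link table alone answers the same question.
flat = conn.execute(
    "SELECT group_name FROM membership WHERE user_id = ? ORDER BY group_id",
    (1,),
).fetchall()

print(joined, flat)
```

The price of the redundant column is that renaming a group now means updating every matching row in `membership`, so this trade suits data that is read far more often than it is changed.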
E. Problems with data indexing
As we all know, indexing is the cheapest and easiest way to improve database query efficiency. Under a high rate of updates, however, the cost of UPDATE and DELETE becomes unthinkably high. I once encountered a case where updating a clustered index took 10 minutes to complete; for a Web site, that is basically intolerable.
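The trade-off can be observed on a small scale with Python's built-in `sqlite3` module. This is only a sketch: the table `items`, the index `idx_items_score`, and the row count are assumptions, SQLite is not the database the anecdote above refers to, and the timings will vary from machine to machine.

```python
import sqlite3
import time


def build(with_index: bool, rows: int = 20_000) -> sqlite3.Connection:
    """Create an in-memory table, optionally with an index on `score`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, score INTEGER NOT NULL)")
    conn.executemany(
        "INSERT INTO items (id, score) VALUES (?, ?)",
        ((i, i % 100) for i in range(rows)),
    )
    if with_index:
        conn.execute("CREATE INDEX idx_items_score ON items (score)")
    conn.commit()
    return conn


def query_plan(conn: sqlite3.Connection) -> str:
    """Return SQLite's plan description for a lookup by score."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT id FROM items WHERE score = 42"
    ).fetchall()
    return " ".join(row[-1] for row in rows)  # last column holds the detail text


def time_bulk_update(conn: sqlite3.Connection) -> float:
    """Time an UPDATE that changes the indexed column on every row."""
    start = time.perf_counter()
    conn.execute("UPDATE items SET score = score + 1")
    conn.commit()
    return time.perf_counter() - start


plain, indexed = build(False), build(True)
plan_plain, plan_indexed = query_plan(plain), query_plan(indexed)
t_plain, t_indexed = time_bulk_update(plain), time_bulk_update(indexed)

print("no index  :", plan_plain, f"| update took {t_plain:.4f}s")
print("with index:", plan_indexed, f"| update took {t_indexed:.4f}s")
```

The plan shows the lookup using the index, while the update on the indexed table also has to rewrite every index entry it touches. The same mechanism, at far larger scale and with more indexes, is what makes heavy-write tables expensive to index.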