Large Web site bottlenecks and solutions

Source: Internet
Author: User

Large websites face the following problems:
    • Massive data processing

A large website may add millions or even tens of millions of records per day. If many-to-many relationships are poorly designed, nothing goes wrong at first, but as the user base grows the data volume grows geometrically. At that point, SELECT and UPDATE on even a single table become very expensive, to say nothing of multi-table join queries.

    • Concurrent data processing

Under high concurrency, deadlocks become very likely, and caching remains a major problem because the cache is shared globally across the application. When two or more requests update the cache at the same time, the locking mechanism is in place but is not very effective, and the application can simply hang.
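To make the contention described above concrete, here is a minimal sketch of a shared in-process cache that uses one lock per key, so that concurrent requests for the same entry rebuild it only once while requests for other keys are not blocked. It is an illustration under stated assumptions, not the mechanism of any particular framework: `LockedCache`, `get_or_load`, and the `loader` callback are hypothetical names.

```python
import threading


class LockedCache:
    """Hypothetical in-process cache with one lock per key."""

    def __init__(self):
        self._data = {}
        self._key_locks = {}
        self._registry_lock = threading.Lock()  # guards _key_locks only

    def _lock_for(self, key):
        # Create the per-key lock on first use.
        with self._registry_lock:
            return self._key_locks.setdefault(key, threading.Lock())

    def get_or_load(self, key, loader):
        # Fast path: the value is already cached.
        if key in self._data:
            return self._data[key]
        with self._lock_for(key):
            # Re-check: another thread may have loaded it while we waited.
            if key not in self._data:
                self._data[key] = loader(key)
            return self._data[key]
```

With this layout, several threads asking for the same missing key call `loader` once in total. A single global lock would also be correct, but it would serialize every request, which is closer to the behaviour the paragraph warns about.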

    • File storage issues

With massive amounts of data, disk I/O is a huge problem: even if bandwidth is sufficient, the disks may not be able to keep up. If uploads are involved at the same time, the disks are easily overwhelmed.

    • Data relationship processing

In the Web 2.0 era, most data relationships are many-to-many, and most queries involve multi-table joins. How to avoid these is a problem.

    • Data indexing issues

Indexes and updates pull against each other: an index that is cheap to add can make updates expensive.

    • Distributed processing

To keep access fast in every region, data must be synchronized and updated effectively across sites. Achieving real-time communication between servers in different locations is a big problem.

    • Ajax pros and cons

Ajax transfers data with simple POST and GET requests, so the traffic can be captured with an HTTP debugger, which exposes the site to attack.

    • Data security

Large sites mainly face threats such as bots and mass posting. Countermeasures such as verification codes (CAPTCHAs) noticeably hurt the user experience.

    • Data synchronization and cluster processing

When the database server is overwhelmed, you need database load balancing and clustering. At this point you may run into the most troublesome problem: depending on the database design, network transmission introduces data latency. This is a serious problem, and an unavoidable one. We therefore need additional measures to keep interactions working during a delay of a few seconds or longer, such as data hashing, partitioning, content processing, and asynchronous processing.

    • Open APIs and data sharing

Open APIs have become an unavoidable trend. From Google, Facebook, and MySpace to domestic social networking sites, everyone is considering them: an open API retains users more effectively, encourages them to participate more, and lets more people help you with development.

    • Performance bottlenecks caused by heavy use of LIKE, OR, IN, and multi-table queries
    • Attacks that upload files in bulk

Solutions

Classify Web 2.0 sites into three levels by number of users: millions (M), tens of millions (S), and hundreds of millions or more (Q). Full-table queries can be handled with partitioned views and table indexes.

At the M level, the main issue is I/O: spread the database files across separate disks (different physical hard disks, not partitions of one disk), and adjust the number of disks and file partitions to the load. At the S level, the registration and data-insertion process needs a simple modification. The solutions are data hashing and partitioned views.
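As a rough illustration of the data hashing mentioned above, the sketch below maps a user name to one of a fixed number of partitions with a stable hash. The function name `partition_for` and the partition count of 8 are assumptions made for the example. The article does not specify a hash function, so SHA-256 from the standard library is used here only because it gives the same result in every process.

```python
import hashlib

NUM_PARTITIONS = 8  # assumed partition count, for illustration only


def partition_for(user_name: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a user name to a partition index using a stable hash."""
    digest = hashlib.sha256(user_name.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_partitions
```

Python's built-in `hash()` is deliberately avoided: for strings it is randomized per process by default, so two application servers could route the same user to different partitions.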

There are three common schemes.

The first is equal-capacity expansion: control user registration so that each library (database) holds no more than 5 million users; once one is full, start a second, and so on. This scheme scales well but does not guarantee that the data can be indexed effectively.

The second is a shared partition-index scheme, which is in fact the first scheme with a reasonable optimization: data is placed in a database according to the user name. For example, set up 26 databases and use the user name as the index that decides which library holds a user's data. If the user name is Crazycode, that user's data is stored in user table C.

The third is a more model-driven scheme that encodes the user ID. The user name is serialized and stored in encoded form: Crazycode is broken down into C, r, a, ... and stored as a numeric index, which is then partitioned for storage. Numeric data can be queried, updated, and shared more efficiently in the database. This is the combination of schemes three and two.

At the Q level, data can be placed in temporary tables weighted by user activity and data volume. Barring unusual events, the number of users logging in per day will not reach tens of millions. A simple data proxy, a temporary user-authentication database, and a daily batch job collect highly active user accounts into the temporary database. A query checks the temporary library first and falls back to the full library only if nothing is found.

A more advanced query scheme is a data caching service: the most frequently used data is stored directly on a cache server, which periodically fetches updates from the master server.
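The three placement schemes described above can be sketched as small routing functions. This is a simplified illustration, not the author's implementation: all function names are hypothetical, the sketch assumes user names made of the letters a to z, and the base-27 letter encoding in `numeric_key` is an assumption, since the article does not say how the name is turned into a number. Only the 5 million limit and the 26 libraries come from the text.

```python
USERS_PER_LIBRARY = 5_000_000  # capacity limit stated in the text


def library_by_capacity(user_seq: int) -> int:
    """Scheme 1: the first 5 million users go to library 0, the next to 1, ..."""
    return user_seq // USERS_PER_LIBRARY


def library_by_initial(user_name: str) -> str:
    """Scheme 2: route by the first letter of the user name (26 libraries)."""
    first = user_name[:1]
    if not (first.isascii() and first.isalpha()):
        raise ValueError("this sketch only handles names starting with a-z")
    return first.upper()


def numeric_key(user_name: str) -> int:
    """Scheme 3 (assumed encoding): a=1 ... z=26, combined in base 27."""
    key = 0
    for ch in user_name.lower():
        if not "a" <= ch <= "z":
            raise ValueError("this sketch only handles the letters a-z")
        key = key * 27 + (ord(ch) - ord("a") + 1)
    return key


def partition_by_numeric_key(user_name: str, num_partitions: int = 16) -> int:
    """Partition on the numeric key instead of on the raw string."""
    return numeric_key(user_name) % num_partitions
```

For example, `library_by_initial("Crazycode")` returns `"C"`, matching the example in the text, and `library_by_capacity(5_000_000)` returns `1`, the second library.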
Going further, the cache server can add a second level of caching: the input is processed once into list data and held in memory as a global variable for querying, with a hash table or array serving as the data index. Queries are dispatched to the appropriate variable and read the data directly from memory.
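The lookup order described above (check the small store of active users first, then fall back to the full library) can be sketched with an in-memory hash table in front of a slower store. Everything here is hypothetical: `TieredLookup`, `refresh`, and `find` are illustrative names, and a plain dictionary stands in for the full user library.

```python
class TieredLookup:
    """In-memory hash index in front of a slower full store (both hypothetical)."""

    def __init__(self, full_store):
        self._full_store = full_store  # stands in for the full user library
        self._hot = {}                 # hash table held in memory
        self.hits = 0
        self.misses = 0

    def refresh(self, active_user_names):
        """Batch job: copy the currently active users into the in-memory index."""
        self._hot = {
            name: self._full_store[name]
            for name in active_user_names
            if name in self._full_store
        }

    def find(self, user_name):
        """Check the in-memory index first, then fall back to the full store."""
        record = self._hot.get(user_name)
        if record is not None:
            self.hits += 1
            return record
        self.misses += 1
        return self._full_store.get(user_name)
```

In a real deployment `refresh` would be run by the daily batch job or the periodic cache update mentioned in the text. The hit and miss counters are included only to make it easy to see how often the fallback path is taken.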
