Key points for developing large-scale and high-load website applications

After reading some people's descriptions of their so-called large projects, I feel uneasy if I don't share some ideas of my own, so here are my views. I personally think it is hard to define whether a project is "large";
even a simple application under high load and high growth is a challenge. Many of the problems below need to be considered under high concurrency or rapid growth.
They are not purely program-development issues; they are closely tied to the architecture of the whole system.

    • Database

That's right: the first thing is the database, which is the first SPOF (single point of failure) most applications face, especially Web 2.0 applications. How well the application responds largely comes down to how the database problem is resolved.
In general, MySQL is the most commonly used, probably starting out as a single MySQL host. When the data grows past about one million rows,
MySQL performance drops sharply. The common optimization is the M-S (master-slave) mode with replication, putting queries and writes on different
servers. I recommend the M-M-Slaves approach: two master MySQL instances and multiple slaves. Note that although there are two masters,
only one of them is active at any given time, and we can switch between them when needed. The two masters exist to ensure that the master does not become the SPOF of the system.
The slaves, combined with LVS, can further balance SELECT operations across the different slave instances.
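
To make the read/write split concrete, here is a minimal PHP sketch of how an application might use such an M-M-Slaves layout. The host names and credentials are illustrative, and in practice LVS can pick the slave instead of the application code:

    <?php
    // Writes go to the currently active master; SELECT traffic is
    // spread across the slaves. Hosts and credentials are placeholders.

    $activeMaster = ['host' => 'db-master-1', 'user' => 'app', 'pass' => 'secret'];
    $slaves = [
        ['host' => 'db-slave-1', 'user' => 'app', 'pass' => 'secret'],
        ['host' => 'db-slave-2', 'user' => 'app', 'pass' => 'secret'],
        ['host' => 'db-slave-3', 'user' => 'app', 'pass' => 'secret'],
    ];

    function write_db(array $master): mysqli
    {
        // All INSERT/UPDATE/DELETE statements hit the active master only.
        return new mysqli($master['host'], $master['user'], $master['pass'], 'app');
    }

    function read_db(array $slaves): mysqli
    {
        // Pick a slave at random for SELECT statements.
        $s = $slaves[array_rand($slaves)];
        return new mysqli($s['host'], $s['user'], $s['pass'], 'app');
    }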

The architecture above can withstand a certain amount of load, but as the number of users grows and your user table exceeds ten million rows, the master becomes the
SPOF. You also cannot add slaves at will, or the overhead of replication will rise. What should you do? My method is table partitioning,
that is, partitioning at the business layer. Take user data as the simplest example: based on some splitting key, such as the user ID, the data is split across different database clusters.
A global database is used to look up the metadata. The disadvantage is that every query gains an extra step. For example, to query a user, you must first
look in the global group database for the cluster ID corresponding to that user, and then fetch the user's actual data from the specified cluster.
Each cluster can be M-M or M-M-Slaves.
This is a scalable structure. As the load increases, you can simply add new MySQL clusters.
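
The extra lookup step can be seen in this rough PHP sketch; it is my reading of the scheme, and the table and host names are invented:

    <?php
    // Step 1: the global metadata database maps a user ID to a cluster.
    // Step 2: the query runs against that cluster's master (or a slave).

    $clusters = [
        1 => ['host' => 'cluster1-master', 'db' => 'app_shard1'],
        2 => ['host' => 'cluster2-master', 'db' => 'app_shard2'],
    ];

    function find_user(mysqli $globalDb, array $clusters, int $userId): ?array
    {
        // Step 1: which cluster owns this user?
        $stmt = $globalDb->prepare('SELECT cluster_id FROM user_cluster_map WHERE user_id = ?');
        $stmt->bind_param('i', $userId);
        $stmt->execute();
        $row = $stmt->get_result()->fetch_assoc();
        if ($row === null) {
            return null;
        }

        // Step 2: fetch the actual data from the owning cluster.
        $c = $clusters[(int) $row['cluster_id']];
        $shard = new mysqli($c['host'], 'app', 'secret', $c['db']);
        $stmt = $shard->prepare('SELECT * FROM users WHERE id = ?');
        $stmt->bind_param('i', $userId);
        $stmt->execute();
        return $stmt->get_result()->fetch_assoc();
    }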

Note that:
1. Disable all auto_increment fields.
2. IDs must be allocated centrally by a common algorithm (one possible shape is sketched after this list).
3. You must have a good way to monitor the load of the MySQL hosts and the health of the service. If you have more than 30 MySQL databases running, you will understand what I mean.
4. Do not use persistent connections (do not use pconnect). Instead, use a third-party database connection pool such as SQL Relay, or simply build one yourself, because persistent MySQL connections in PHP 4 frequently cause problems.
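
For point 2, one possible shape of a centralized allocator is a single sequence table on the global database. This is a sketch under my own assumptions, not the author's exact scheme:

    <?php
    // Assumed table on the global database, seeded with one row per sequence:
    //   CREATE TABLE sequences (
    //       name    VARCHAR(32) NOT NULL PRIMARY KEY,
    //       next_id BIGINT UNSIGNED NOT NULL
    //   );

    function allocate_id(mysqli $globalDb, string $sequence): int
    {
        // LAST_INSERT_ID(expr) makes the incremented value readable from
        // this same connection, so no extra SELECT (and no race) is needed.
        $stmt = $globalDb->prepare(
            'UPDATE sequences SET next_id = LAST_INSERT_ID(next_id + 1) WHERE name = ?'
        );
        $stmt->bind_param('s', $sequence);
        $stmt->execute();
        return (int) $globalDb->insert_id;
    }

    // Usage: $userId = allocate_id($globalDb, 'user');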

    • Cache

Caching is another big problem. I generally use memcached as the cache cluster, usually deploying around 10 servers (roughly a 10 GB memory pool). Note that memcached must never be allowed to
swap; it is best to disable swap on the Linux machines.
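
A typical way to use such a memcached pool from PHP is the cache-aside pattern sketched below. The server list, key format, and 300-second TTL are illustrative, and this uses the PECL Memcached extension (the older memcache extension works similarly):

    <?php
    // Cache-aside: try the cache cluster, fall back to the database on a
    // miss, then populate the cache so the next request hits.

    $mc = new Memcached();
    $mc->addServers([
        ['10.0.0.11', 11211],
        ['10.0.0.12', 11211],
        // ... the rest of the ~10-node pool
    ]);

    function get_user_cached(Memcached $mc, mysqli $db, int $userId): ?array
    {
        $key = "user:$userId";

        $user = $mc->get($key);
        if ($user !== false) {
            return $user;                       // cache hit
        }

        $stmt = $db->prepare('SELECT id, name FROM users WHERE id = ?');
        $stmt->bind_param('i', $userId);
        $stmt->execute();
        $user = $stmt->get_result()->fetch_assoc();

        if ($user !== null) {
            $mc->set($key, $user, 300);         // cache for 5 minutes
        }
        return $user;
    }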

    • Load Balancing/acceleration

Some people may think of page caching first, the so-called static HTML pages. In my opinion that is common sense, not the key point. Once pages are static, the problem becomes one for the static service:
load balancing and acceleration. I think lighttpd + squid is the best approach.
LVS <-------> lighttpd ==> squid(s) === lighttpd

The above is what I use most often. Note that I do not use Apache; I do not deploy it unless it is specifically required, because I usually run PHP-FastCGI with lighttpd,
and its performance is much better than Apache + mod_php.

Squid can take much of the file-serving load, but be aware that you must monitor the cache hit rate closely and push it above 90% as far as possible.
Both Squid and lighttpd have many topics worth discussing in their own right.
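
As one possible way to watch the hit rate (my own sketch, not from the article), Squid's cache manager can be polled with squidclient and an alert raised when the ratio drops below 90%. The exact label of the hit-ratio line varies between Squid versions, so the pattern may need adjusting:

    <?php
    // Poll Squid's cache manager stats and warn on a low hit ratio.
    // Host and port are placeholders for one of the squid boxes.

    $host = '10.0.0.21';
    $info = shell_exec("squidclient -h $host -p 3128 mgr:info");

    if (is_string($info) && preg_match('/5min:\s*([\d.]+)%/', $info, $m)) {
        $hitRatio = (float) $m[1];
        if ($hitRatio < 90.0) {
            error_log("squid hit ratio on $host is only {$hitRatio}%");
        }
    } else {
        error_log("could not read squid stats from $host");
    }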

    • Storage

Storage is also a big problem. One part is the storage of small files, such as images; the other is the storage of large files, such as search-engine indexes, generally more than 2 GB per file.
The simplest way to serve small files is to distribute them with lighttpd. Or simply use Red Hat GFS: the advantage is that it is transparent to the application, the disadvantage is the cost, by which I mean
you have to buy a disk array. In my project the storage volume is 2-10 TB, and I use distributed storage. File replication and redundancy have to be solved here,
so that each file can have a different level of redundancy. For details, refer to Google's GFS paper.
For the storage of large files, see the Nutch solution; it is now an independent sub-project, Hadoop (you can Google it).
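
As a toy illustration of the placement idea behind such distributed small-file storage (my own sketch, nothing like a full GFS), a file key can be hashed onto the node list and copied to several consecutive nodes, so that each file carries its own redundancy level:

    <?php
    // Hash the file key onto the node list and keep $replicas copies on
    // consecutive nodes. Node names below are hypothetical.

    function placement(string $fileKey, array $nodes, int $replicas = 2): array
    {
        $start = hexdec(substr(md5($fileKey), 0, 8)) % count($nodes);

        $targets = [];
        for ($i = 0; $i < min($replicas, count($nodes)); $i++) {
            $targets[] = $nodes[($start + $i) % count($nodes)];
        }
        return $targets;
    }

    $nodes = ['store-01', 'store-02', 'store-03', 'store-04'];
    // An avatar might need 2 copies, a critical index chunk 3.
    print_r(placement('avatars/12345.jpg', $nodes, 2));
    print_r(placement('index/part-00001',  $nodes, 3));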

Others:
In addition, things like a passport (single sign-on) service also need to be considered, but they are all relatively simple.

Time for dinner; I'll stop writing here.
