Several key points in developing large-scale, high-load web applications

Source: Internet
Author: User

Reading some people's so-called big-project methodology, I feel it misses the point, and that bothers me a bit.

Let me offer my own view. Personally, I think it is hard to measure the size of a project; even a simple application becomes a challenge under high load and high growth. So in my mind the real issues are high load, high concurrency, and high growth, and those are what need to be considered. Many of these problems have little to do with program development itself, but are closely tied to the architecture of the system as a whole.

Database


Yes, the first is the database, which is the first single point of failure (SPOF) most applications face. Especially for Web 2.0 applications, database response time is the first thing that must be solved.

In general, MySQL is the most common choice. You may start with a single MySQL host, but once the data grows past about one million rows, MySQL's efficiency drops dramatically. A common optimization is M-S (master-slave) replication, separating queries from writes onto different servers. I recommend the M-M-slaves arrangement: two MySQL masters and multiple slaves. Note that although there are two masters, only one is active at any given time, and we can switch between them when needed. The point of using two masters is to keep the master from becoming the system's SPOF again. The slaves can be further load balanced, for example with LVS, spreading SELECT operations across different slaves.
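The M-M-slaves routing above can be sketched in application code. This is a minimal illustration, not the author's actual implementation: the DSN strings and class name are hypothetical, and in practice a layer like LVS does the slave balancing at the network level.

```python
import itertools


class ReplicaRouter:
    """Route writes to the active master and spread reads across slaves.

    In M-M mode only one master is active at a time; failover simply
    switches which master is considered active.
    """

    def __init__(self, masters, slaves):
        self.masters = list(masters)            # e.g. two masters in M-M mode
        self.active = 0                         # index of the active master
        self.slave_cycle = itertools.cycle(slaves)

    def for_write(self):
        # All INSERT/UPDATE/DELETE traffic goes to the active master.
        return self.masters[self.active]

    def for_read(self):
        # SELECTs are balanced round-robin across the slaves.
        return next(self.slave_cycle)

    def failover(self):
        # Switch to the standby master when the active one dies.
        self.active = (self.active + 1) % len(self.masters)
```

The same idea applies whether the balancing happens here or in LVS: writes converge on one master, reads fan out to slaves.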

The architecture above can withstand a certain amount of load, but as users grow further and your user table exceeds ten million rows, the master becomes the SPOF again. You cannot add slaves arbitrarily, or the cost of replication synchronization will climb steeply. So what do you do? My approach is to partition the tables at the business level. The simplest example is user data.

Split the data by some key, such as the user ID, across different database clusters, and use a global database for the metadata that queries need. The drawback is that every query costs one extra lookup: for example, to look up a user named Nightsailer, you first go to the global database cluster to find the cluster ID for nightsailer, then go to that cluster to fetch nightsailer's actual data.

Each cluster can run in M-M or M-M-slaves mode. This structure scales out: as load increases, you simply add new MySQL clusters.
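The two-step lookup described above can be sketched as follows. This is a simplified illustration under stated assumptions: the cluster DSNs are made up, and the "global database" is simulated with an in-memory dict where the real system would use a small MySQL cluster holding only the user-to-cluster mapping.

```python
import zlib

# Hypothetical cluster DSNs; each id maps to an M-M(-slaves) MySQL cluster.
CLUSTERS = {0: "dsn://cluster0", 1: "dsn://cluster1", 2: "dsn://cluster2"}


class ShardDirectory:
    """Global metadata lookup: username -> cluster id -> cluster DSN."""

    def __init__(self, clusters):
        self.clusters = clusters
        self.directory = {}  # username -> cluster_id (the "global database")

    def assign(self, username):
        # Simplest placement: a stable hash of the name picks the cluster.
        cid = zlib.crc32(username.encode()) % len(self.clusters)
        self.directory[username] = cid
        return cid

    def locate(self, username):
        # Step 1: hit the global database for the cluster id.
        cid = self.directory[username]
        # Step 2: go to that cluster for the actual user row.
        return self.clusters[cid]
```

Note the extra round trip per query: `locate` always touches the global directory before the data cluster, which is exactly the cost the text mentions.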

Points to note:

1. Disable all auto_increment fields.

2. IDs must be allocated centrally, using a common algorithm.

3. Have a good way to monitor the load and service state of the MySQL hosts. If you have more than 30 MySQL databases running, you know what I mean.

4. Do not use persistent connections (do not use pconnect). Instead, use a third-party connection pool such as SQL Relay, or simply build one yourself, because PHP 4's MySQL persistent connections often cause problems.
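Points 1 and 2 above, replacing auto_increment with centralized allocation, can be sketched like this. The class name and block-allocation scheme are my own illustration, not the author's algorithm; a real deployment would persist `next_id` (for example as one row in the global database) instead of keeping it in memory.

```python
import threading


class IdAllocator:
    """Central ID allocation, replacing per-table auto_increment.

    Hands out blocks of ids so application servers do not have to hit
    the allocator on every insert; ids stay unique across all shards.
    """

    def __init__(self, block_size=1000):
        self.next_id = 1
        self.block_size = block_size
        self.lock = threading.Lock()  # one allocator serves many app servers

    def allocate_block(self):
        # Reserve a contiguous block; the caller consumes it locally.
        with self.lock:
            start = self.next_id
            self.next_id += self.block_size
        return range(start, start + self.block_size)
```

Because every id comes from one authority, a row can move between clusters without colliding with ids generated elsewhere.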

Caching

Caching is another big topic. I generally use memcached as a cache cluster, usually around ten nodes (a memory pool of roughly 10 GB). Note that you must never let it swap; it is best to turn off Linux swap entirely.
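Two pieces of the memcached setup above can be sketched without a live cluster: picking which node owns a key (real clients such as libmemcached do this hashing internally) and the cache-aside read path. The server addresses are hypothetical, and `LocalCache` is an in-memory stand-in for a memcached client's get/set interface.

```python
import zlib

# Hypothetical pool of ~10 memcached nodes.
SERVERS = ["10.0.0.%d:11211" % i for i in range(1, 11)]


def server_for(key, servers=SERVERS):
    """Pick the node that owns a key; a stable hash keeps the mapping fixed."""
    return servers[zlib.crc32(key.encode()) % len(servers)]


class LocalCache:
    """In-memory stand-in exposing a memcached-like get/set interface."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def set(self, key, value):
        self.data[key] = value


def cache_get(cache, key, loader):
    # Cache-aside: try the cache first, fall back to the loader
    # (e.g. a database query), then populate the cache.
    value = cache.get(key)
    if value is None:
        value = loader(key)
        cache.set(key, value)
    return value
```

The point of the hashing step is that every application server computes the same key-to-node mapping independently, so no coordination is needed.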

Load Balancing/Acceleration

Speaking of caching, some people first think of static pages, so-called static HTML. I consider that common sense, not one of the key points. Serving static pages goes together with load balancing and acceleration, and I think lighttpd + Squid is the best way to do it:

LVS <-------> lighttpd ====> squid(s) ====> lighttpd

I use this setup a lot. Note that I do not use Apache: unless there is a specific requirement, I do not deploy it, because I generally run PHP as FastCGI behind lighttpd, which performs far better than Apache + mod_php.

Squid can also be used to solve file synchronization and similar problems, but be aware that you must monitor the cache hit rate and push it as high as possible, ideally above 90%. Squid and lighttpd have many more topics worth discussing; I will not repeat them here.

Storage

Storage is also a big problem. One side is small-file storage, such as images; the other is large-file storage, such as search-engine indexes, where a single file commonly exceeds 2 GB.

The simplest way to store small files is to combine the storage with lighttpd for distribution. Or simply use Red Hat GFS; the advantage is transparency to the application, the disadvantage is higher cost, by which I mean the disk arrays you have to buy. In my project the storage runs 2-10 TB, and I use a distributed store, which has to solve file replication and redundancy. Each file can have a different redundancy level; for this you can refer to Google's GFS paper. For large-file storage, you can refer to Nutch's scheme, now the independent Hadoop subproject. (You can Google it.)
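The per-file redundancy idea above can be sketched as a replica-placement rule. This is only an illustration of the concept, not Google GFS's or the author's actual placement algorithm: a stable hash picks a starting node, then the requested number of copies lands on successive distinct nodes, so a hot file can simply ask for more copies.

```python
import zlib


def replica_nodes(filename, nodes, copies):
    """Pick `copies` distinct storage nodes for a file.

    Deterministic placement: every server computes the same node list
    for a given filename, so reads need no central lookup.
    """
    start = zlib.crc32(filename.encode()) % len(nodes)
    # Walk the node ring from the hashed starting point.
    return [nodes[(start + i) % len(nodes)] for i in range(min(copies, len(nodes)))]
```

Varying `copies` per file is what gives each file "a different redundancy", as the text describes.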

Other:

In addition, things like passport (single sign-on) also need to be considered, but they are all relatively simple. Just a brief mention.


