Large Website Architecture: Study Notes

Last Update:2017-04-25 Source: Internet

Author: User

Tags: message queue, browser cache

Preface

I recently read two books:

1. "Large-Scale Website Technical Architecture: Core Principles and Case Analysis" by Li Zhihui

2. "Large-Scale Website Systems and Java Middleware in Practice" by Zeng Xianjie

Reading them and relating them to my current project was very rewarding and deepened my understanding of large websites considerably. My study notes follow.

Study Notes

1. The evolution of large-scale website architecture (in the original post, the key point of each step was highlighted in red)

(1) Start with a small website on a single server: the application, the database, the files, and all other resources live on one machine.

(2) As the business grows, a single server gradually fails to keep up, so the application is separated from the data. After the separation the site uses three servers: an application server, a file server, and a database server.

(3) As the site grows further, heavy database load causes access delays, so caching is introduced to improve performance (remember: caching is the first step in improving website performance). Websites use two kinds of cache: a local cache on the application server and a remote cache on dedicated distributed cache servers.

(4) With caching in place, database pressure is effectively relieved, but during peak traffic the application server becomes the bottleneck of the whole site. The point to understand here is not to try swapping in a more powerful server: for a large site, no single server, however powerful, can keep up with continuously growing business demand. Instead, add servers to share the load, and use a load balancer to distribute requests from users' browsers to a server in the application server cluster.

(5) Although caching keeps most reads away from the database, cache misses and expired entries still go to the database. Once the site reaches a certain scale, database read and write pressure becomes very high and the database becomes the bottleneck. At this point, read/write separation can relieve the load: the application server writes data to the write (master) database and reads data from the read (slave) database. Most mainstream databases provide master-slave hot standby; by configuring a master-slave relationship between two database servers, data updates on one server are synchronized to the other.

(6) As the business continues to grow, the user base keeps expanding. Because of China's complex network environment, access speed differs greatly between users in different regions. Reverse proxies and CDNs can be used to speed up user access and, at the same time, reduce the load on the back-end servers, because the underlying principle of both is caching.

(7) After read/write separation the database has gone from one server to two, but that still cannot meet the site's traffic needs, so a distributed database is used. The main way of splitting is by business, with the data of different businesses deployed on different physical servers.

(8) To cope with increasingly complex business scenarios, a large website can use divide and conquer to split the whole site's business into different applications. Each application is deployed independently; the applications can be connected to one another through hyperlinks, or distribute data through a message queue.

By the time a large website has evolved this far, most of its technical problems have essentially been solved.
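Steps (4) and (5) above can be illustrated with a minimal Python sketch. It is only an assumption-laden illustration, not how any particular load balancer or database driver works: the class names, the server names, and the rule of picking the target by the first SQL keyword are all hypothetical.

```python
import itertools


class RoundRobinBalancer:
    """Hands out servers in rotation, one per request (step 4)."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(list(servers))

    def pick(self):
        return next(self._cycle)


class ReadWriteRouter:
    """Sends writes to the master and reads to a replica (step 5)."""

    WRITE_VERBS = {"INSERT", "UPDATE", "DELETE"}

    def __init__(self, master, replicas):
        self.master = master
        self._replicas = RoundRobinBalancer(replicas)

    def route(self, sql):
        # Hypothetical rule: decide by the first keyword of the statement.
        verb = sql.lstrip().split(None, 1)[0].upper()
        if verb in self.WRITE_VERBS:
            return self.master
        return self._replicas.pick()


# Hypothetical server names, for illustration only.
balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
picked = [balancer.pick() for _ in range(4)]

router = ReadWriteRouter(master="db-master", replicas=["db-replica-1"])
write_target = router.route("UPDATE users SET name = 'a' WHERE id = 1")
read_target = router.route("SELECT * FROM users WHERE id = 1")
```

Round-robin is only one possible scheduling policy; a real load balancer may also weigh servers by capacity or current connections.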

2. The key to a high-performance website: control the amount of concurrency. Once that is under control, many otherwise tricky data problems stop being problems.

3. Do not try to solve every problem with technology; business problems can also be solved by business means.

Take 12306, China's railway ticketing website, as an example. When it was first launched, tickets went on sale at midnight (0:00), and the site had to withstand tens of millions of visits at once, which directly caused it to crash. Professionals of all kinds offered differing opinions and ideas. But can this problem be solved by technology alone? In response, 12306 not only improved its technical architecture but also adjusted its business structure: it stopped selling tickets at midnight, introduced a queuing mechanism for ticket purchases, and changed on-the-hour ticket releases into releases staggered across time slots. With concurrency under control, the overall performance of the site improved.
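The effect of staggering releases can be shown with simple arithmetic. The sketch below uses made-up figures (12 ticket batches, one million requests per batch) that are assumptions for illustration only and do not come from 12306; the function name is hypothetical too.

```python
from collections import Counter


def peak_requests(release_hours, requests_per_release):
    """Return the largest number of requests arriving in any single hour."""
    per_hour = Counter()
    for hour in release_hours:
        per_hour[hour] += requests_per_release
    return max(per_hour.values())


# Hypothetical figures: 12 batches of tickets, 1,000,000 requests per batch.
batches = 12
all_at_midnight = [0] * batches                # every batch released at 00:00
staggered = [8 + i for i in range(batches)]    # one batch per hour from 08:00

peak_single = peak_requests(all_at_midnight, 1_000_000)
peak_staggered = peak_requests(staggered, 1_000_000)
```

Under these assumptions the total demand is unchanged, but the peak the site must absorb in any one hour drops from twelve million requests to one million.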

4. An important goal and driving force in the development of computer software is reducing coupling: the fewer relationships there are between things, the less they affect one another and the more independently each can evolve.

5. Asynchronous architecture is a typical producer-consumer model

6. There are several benefits of using an asynchronous queue

(1) Improve the availability of the system

(2) Speed up website access

(3) Eliminate concurrent access spikes
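The producer-consumer model behind an asynchronous queue can be sketched with Python's standard `queue` and `threading` modules. This is a minimal in-process illustration, not a distributed message queue; the doubling "work" and the `None` sentinel are assumptions chosen to keep the example short.

```python
import queue
import threading

tasks = queue.Queue()
results = []


def consumer():
    """Take items off the queue and process them until a sentinel arrives."""
    while True:
        item = tasks.get()
        if item is None:           # sentinel: no more work
            tasks.task_done()
            break
        results.append(item * 2)   # stand-in for the real, slower work
        tasks.task_done()


worker = threading.Thread(target=consumer)
worker.start()

# The producer only enqueues and moves on; it never waits for the work itself.
for n in range(5):
    tasks.put(n)
tasks.put(None)

tasks.join()     # block until every queued item has been processed
worker.join()
```

Because the producer returns as soon as an item is enqueued, the caller responds quickly, and a burst of requests simply makes the queue longer instead of overwhelming the consumer.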

7. Website scalability means being able to keep adding servers to the cluster to relieve growing user access pressure and growing demand for data storage.

8. Metrics to measure the scalability of the architecture

(1) Whether a cluster can be built from multiple servers

(2) Whether it is easy to add a new server to the cluster

(3) Whether an added server can provide the same service as the original servers, with no difference

(4) Whether there is a limit on the total number of servers the cluster can contain

9. Load: an important indicator of how busy or idle the system is

System load, also known as Load, is the sum of the number of threads currently being executed by the CPU and the number waiting to be executed by the CPU; it is an important indicator of how busy the system is. With multicore CPUs, the ideal situation is that every CPU is in use and no thread is waiting to be processed. A load value lower than the number of CPUs indicates that CPUs are idle and resources are being wasted; a load value higher than the number of CPUs indicates that work is queuing for CPU scheduling and the system is short of CPU resources.
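The comparison described above can be sketched in Python. The function names and status strings are hypothetical; `os.getloadavg()` is available on Unix-like systems only and reports load averaged over the last 1, 5, and 15 minutes, so the sketch is an approximation of the idea rather than a monitoring tool.

```python
import os


def describe_load(load, cpu_count):
    """Compare a load value with the number of CPUs."""
    if load < cpu_count:
        return "idle capacity"    # some CPUs have nothing to do
    if load > cpu_count:
        return "tasks waiting"    # work is queuing for a CPU
    return "fully used"           # every CPU busy, nothing waiting


def current_load_status():
    """Classify the 1-minute load average of this machine (Unix only)."""
    one_minute, _five, _fifteen = os.getloadavg()
    return describe_load(one_minute, os.cpu_count() or 1)
```

For example, `describe_load(2.0, 4)` reports idle capacity, while `describe_load(6.5, 4)` reports tasks waiting.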

10. Browser access optimization techniques

(1) Reduce HTTP requests: merge CSS, JS, and images so that fetching them does not require many separate HTTP requests

(2) Use the browser cache to store static resources, controlled through caching-related HTTP headers such as Cache-Control, Expires, Pragma, and Last-Modified

(3) Enable compression to effectively reduce the amount of data transmitted

(4) Put CSS at the top of the page and JS at the bottom: a script is executed as soon as it is downloaded, which may block the rest of the page from loading

(5) Reduce cookie transmission
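Items (2) and (3) can be illustrated from the server side with the Python standard library. The helper names, the one-day lifetime, and the sample stylesheet are assumptions for illustration; in practice a web server or framework usually sets these headers and performs the compression.

```python
import gzip
import time
from email.utils import formatdate


def static_cache_headers(max_age_seconds, now=None):
    """Build response headers that let the browser cache a static resource."""
    now = time.time() if now is None else now
    return {
        "Cache-Control": f"public, max-age={max_age_seconds}",
        # Expires is the older, absolute-date form of the same instruction.
        "Expires": formatdate(now + max_age_seconds, usegmt=True),
    }


def compress_body(body: bytes) -> bytes:
    """Gzip a response body, as sent with 'Content-Encoding: gzip'."""
    return gzip.compress(body)


# Hypothetical static resource: a repetitive stylesheet, cached for one day.
headers = static_cache_headers(86400, now=0)
css = b"body { margin: 0; }\n" * 200
compressed = compress_body(css)
```

Text resources such as CSS and JS tend to be repetitive and compress well; already compressed formats such as JPEG gain little.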

11. A few details on using caches

(1) Do not cache frequently modified data. As a rule, caching is worthwhile only when the read-to-write ratio is at least 2:1, that is, each write is followed by at least two reads. On Sina Weibo, for example, a popular post is written once and may be read millions of times, so caching it pays off handsomely.

(2) Caching is pointless when access has no hot spots, that is, when most accesses are not concentrated on a small portion of the data. The cache has an eviction mechanism, so most entries would be squeezed out of the cache before they are ever accessed again.

(3) Cached data must be able to tolerate being inconsistent with the source for a period of time, unless the cache is notified immediately whenever the data is updated; but that approach brings its own overhead and transactional-consistency problems.

(4) Use a distributed cache cluster to improve cache availability.

(5) A newly started cache holds no data, and while it is being rebuilt both system performance and database load suffer. So, depending on the project and the business, part of the data can be loaded into the cache at startup; this is cache warm-up.

(6) Cache invalid keys and give them an expiration time, so that faulty business logic or a malicious attack cannot hit the database by repeatedly calling the query interface. Once a key turns out to have no data in the database, record it in the cache as invalid; for a period of time, requests for that key return no data without touching the database.
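Points (5) and (6) can be sketched as a small in-process cache with expiration, warm-up, and caching of missing keys. Everything here is an assumption made for illustration: the class name, the TTL values, and the convention that the loader returns `None` for a key the database does not have. A production system would more likely use a dedicated distributed cache.

```python
import time

_MISSING = object()  # marker stored for keys the database does not have


class TTLCache:
    def __init__(self, load_from_db, ttl=60.0, missing_ttl=5.0,
                 clock=time.monotonic):
        self._load = load_from_db
        self._ttl = ttl                    # lifetime of normal entries
        self._missing_ttl = missing_ttl    # shorter lifetime for invalid keys
        self._clock = clock
        self._entries = {}                 # key -> (value, expires_at)

    def warm_up(self, keys):
        """Preload selected keys at startup (cache warm-up)."""
        for key in keys:
            self.get(key)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[1] > self._clock():
            value = entry[0]               # fresh entry: no database access
        else:
            value = self._load(key)        # miss or expired: go to the database
            if value is None:
                value, ttl = _MISSING, self._missing_ttl
            else:
                ttl = self._ttl
            self._entries[key] = (value, self._clock() + ttl)
        return None if value is _MISSING else value


# Hypothetical "database" and a loader that records every lookup.
database = {"user:1": "Alice"}
db_calls = []


def load_from_db(key):
    db_calls.append(key)
    return database.get(key)


cache = TTLCache(load_from_db)
cache.warm_up(["user:1"])
first = cache.get("user:1")        # served from the warmed cache
ghost_a = cache.get("user:404")    # one database lookup, remembered as missing
ghost_b = cache.get("user:404")    # answered from the cache, no database lookup
```

In this run the database is queried only twice, once for the warm-up and once for the nonexistent key, no matter how often either key is requested before its entry expires.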

12. A message queue is very effective at peak shaving (as mentioned above), but the business process must be adjusted appropriately to go with it

With asynchronous processing, the transaction messages generated by a short burst of high concurrency are stored in the message queue, which flattens the concurrency peak. Note, however, that a response is returned to the user as soon as the data is written to the queue, while the data may still fail later business validation, the database write, and so on. So once a message queue is used for asynchronous business processing, the business process must be modified to match. For example, after an order is submitted and its data is written to the message queue, the site cannot immediately tell the user that the order succeeded. The order is truly complete only after the queue's order consumer has finished processing it, or even only after the goods have left the warehouse; the user is then notified of the successful order by email or SMS, which avoids transaction disputes.
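The adjusted order flow can be sketched as follows. The status names, the stock check standing in for "later business validation", and the list standing in for email or SMS are all assumptions; the consumer runs in the same thread here only to keep the example deterministic.

```python
import queue

order_queue = queue.Queue()
order_status = {}
notifications = []        # stand-in for email or SMS messages
stock = {"book": 1}       # hypothetical inventory: one copy left


def notify_user(order_id, status):
    notifications.append((order_id, status))


def submit_order(order_id, item):
    """Accept the order and enqueue it; do NOT report success yet."""
    order_status[order_id] = "PENDING"
    order_queue.put((order_id, item))
    return "PENDING"


def process_orders():
    """Order consumer: validate, finalize, and only then notify the user."""
    while not order_queue.empty():
        order_id, item = order_queue.get()
        if stock.get(item, 0) > 0:
            stock[item] -= 1
            order_status[order_id] = "CONFIRMED"
        else:
            order_status[order_id] = "FAILED"
        notify_user(order_id, order_status[order_id])


# Two users order the last copy at almost the same time.
reply_1 = submit_order(1, "book")
reply_2 = submit_order(2, "book")
process_orders()
```

Both users are first told that their order is pending. Only the consumer decides the outcome, so the second user is never wrongly told that the order succeeded.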

This article is an English version of an article originally published in Chinese on aliyun.com.
