Reading Note 4: High-performance architecture of a website with instantaneous response

Last Update: 2016-03-26 Source: Internet

Author: User

Large-Scale Website Technical Architecture, reading note 4: instantaneous response, the high-performance architecture of a website.

I. Website performance testing

(1) Performance test indicators: ① response time; ② concurrency; ③ throughput; ④ performance counters;
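The first three indicators are linked. The minimal Python sketch below summarizes one test run from a list of measured response times. It uses Little's law (average number of requests in flight = throughput x average response time), a standard queueing relation that this note does not itself state. The function name `summarize` and the nearest-rank 95th percentile are illustrative choices, not features of any particular testing tool.

```python
import math
import statistics


def summarize(latencies_s, test_duration_s):
    """Summarize one test run.

    latencies_s: response time of every completed request, in seconds.
    test_duration_s: wall-clock length of the run, in seconds.
    """
    completed = len(latencies_s)
    throughput = completed / test_duration_s        # requests completed per second
    avg_response = statistics.mean(latencies_s)     # average response time
    # Little's law: average requests in flight = throughput * average response time
    concurrency = throughput * avg_response
    ordered = sorted(latencies_s)
    # Nearest-rank 95th percentile: 95% of requests finished at or below this time
    p95 = ordered[math.ceil(0.95 * completed) - 1]
    return {
        "throughput_rps": throughput,
        "avg_response_s": avg_response,
        "concurrency": concurrency,
        "p95_response_s": p95,
    }
```

The percentile is reported alongside the average because a few slow requests can hide behind a good-looking mean.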

(2) Performance testing methods: ① performance testing; ② load testing; ③ stress testing; ④ stability testing;

(3) Performance optimization strategies:

① Performance analysis: check the logs of each stage of request processing to find which stage has an unreasonable response time, and analyze monitoring data to find the factors that affect performance;

② Performance optimization: Web front-end optimization, application server optimization, and storage server optimization;
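Performance analysis often starts from per-stage timing logs. Assuming a hypothetical log format in which each line records a request ID, a stage name, and the elapsed milliseconds, this Python sketch finds the stage with the highest average time, which is the first candidate for optimization:

```python
from collections import defaultdict


def slowest_stage(log_lines):
    """Return (stage, average_ms) for the stage with the highest average time.

    Each line is assumed (hypothetically) to look like:
    "<request_id> <stage> <elapsed_ms>"
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for line in log_lines:
        _request_id, stage, elapsed_ms = line.split()
        totals[stage] += float(elapsed_ms)
        counts[stage] += 1
    averages = {stage: totals[stage] / counts[stage] for stage in totals}
    worst = max(averages, key=averages.get)
    return worst, averages[worst]
```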

II. Web front-end performance optimization

(1) Browser access optimization:

① Reduce HTTP requests: because HTTP is stateless, every request is expensive (a communication link must be established to transmit the data, and the server must start an independent thread to process each HTTP request). The main way to reduce HTTP requests is to merge CSS, JS, and image files (for images, CSS sprites, which use offsets to locate each image within the combined file);

② Use the browser cache: set the Cache-Control and Expires headers in the HTTP response;

③ Enable compression: enabling Gzip compression for HTML, CSS, and JS files achieves a high compression ratio, but compressing and decompressing puts extra load on servers and browsers;

④ Put CSS at the top of the page and JS at the bottom: the browser renders the page only after it has downloaded all of the CSS, so CSS is best placed at the top; the browser executes JS immediately after loading it, which may block the whole page and slow down its display, so JS is best placed at the bottom;

⑤ Reduce cookie transmission: on the one hand, too many cookies seriously slow down data transmission, since cookies travel with every request; on the other hand, sending cookies when accessing static resources (such as CSS and JS) is pointless;
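Items ② and ③ both come down to response headers. The sketch below, using only the Python standard library, builds the headers for a static CSS or JS file: `Cache-Control` with `max-age` and an HTTP-date `Expires`, plus Gzip compression when the client's `Accept-Encoding` mentions gzip. The function name and the one-day default lifetime are assumptions, and the `Accept-Encoding` check is deliberately naive (it ignores quality values such as `gzip;q=0`).

```python
import gzip
import time
from email.utils import formatdate


def static_response(body: bytes, accept_encoding: str, max_age_s: int = 86400):
    """Return (headers, payload) for a static asset such as CSS or JS."""
    headers = {
        # HTTP/1.1 caching: the browser may reuse the file for max_age_s seconds
        "Cache-Control": f"public, max-age={max_age_s}",
        # Absolute expiry date for older HTTP/1.0 clients
        "Expires": formatdate(time.time() + max_age_s, usegmt=True),
    }
    if "gzip" in accept_encoding.lower():
        body = gzip.compress(body)            # text assets usually shrink a lot
        headers["Content-Encoding"] = "gzip"
        headers["Vary"] = "Accept-Encoding"   # caches must keep both variants apart
    headers["Content-Length"] = str(len(body))
    return headers, body
```

Repetitive text such as CSS compresses well, which is why the note singles out HTML, CSS, and JS; already-compressed images gain little from Gzip.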

(2) CDN acceleration:

A CDN (Content Delivery Network) is, in essence, still a cache: it caches data at the location nearest to the user so that the user can obtain data as quickly as possible. This is the so-called "first hop of network access" (as shown in the figure).

A CDN caches only frequently accessed hotspot content (such as images, videos, CSS, and JS scripts), which greatly speeds up user access and reduces the load on the data center.
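To make the "first hop" idea concrete, here is a toy Python model, not how any real CDN is implemented: the user is routed to the edge node with the lowest round-trip time, and that node caches a file only once it has been requested a hypothetical `hot_threshold` number of times, so only hot content occupies the edge cache and only misses travel back to the data center. How a real CDN chooses the node is outside this sketch.

```python
class EdgeNode:
    """One CDN edge node; `origin` is a dict standing in for the data center."""

    def __init__(self, name, rtt_ms, origin, hot_threshold=2):
        self.name = name
        self.rtt_ms = rtt_ms                # round-trip time from the user to this node
        self.origin = origin
        self.hot_threshold = hot_threshold  # hypothetical: requests before a file counts as hot
        self.requests = {}
        self.cache = {}
        self.origin_fetches = 0

    def get(self, path):
        self.requests[path] = self.requests.get(path, 0) + 1
        if path in self.cache:              # served at the first hop
            return self.cache[path]
        self.origin_fetches += 1            # a miss travels back to the data center
        data = self.origin[path]
        if self.requests[path] >= self.hot_threshold:
            self.cache[path] = data         # keep only content that has proven hot
        return data


def nearest(nodes):
    """Route the user to the node with the lowest round-trip time."""
    return min(nodes, key=lambda node: node.rtt_ms)
```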

(3) Reverse proxy:

The reverse proxy server is located in the website's data center. Acting on behalf of the website's web servers, it receives HTTP requests and forwards them to those servers (as shown in the figure).

The reverse proxy server has the following features:

① Protect website security: every request from the Internet must pass through the proxy server first;

② Accelerate web requests by enabling the proxy's cache: this reduces the load on the real web servers;

③ Achieve load balancing: distribute requests evenly so that each server in the cluster carries a balanced share of the load;
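The caching and load-balancing functions can be sketched together. In this toy Python model (the class name is hypothetical, and backends are plain functions standing in for web servers), the proxy is the single entry point, answers repeated requests from its cache, and forwards uncached requests to the backends in round-robin order. Real reverse proxies cache only responses marked as cacheable; this sketch caches everything for brevity.

```python
import itertools


class ReverseProxy:
    """Toy reverse proxy: single entry point, response cache, round-robin forwarding."""

    def __init__(self, backends):
        # backends: callables standing in for the real web servers
        self._next_backend = itertools.cycle(backends)
        self._cache = {}

    def handle(self, path):
        if path in self._cache:              # cache hit: no web server is touched
            return self._cache[path]
        backend = next(self._next_backend)   # round-robin: each server takes its turn
        response = backend(path)
        self._cache[path] = response         # simplification: caches every response
        return response
```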

III. Application server performance optimization

(1) Distributed cache:

PS: The first law of website performance optimization: consider caching first. A cache stores data on a relatively fast storage medium (such as memory) so that the system can process and respond to user requests quickly.

① A cache is essentially an in-memory hash table: data is stored in memory as (key, value) pairs.

② A cache is mainly used for data with a high read/write ratio that rarely changes, such as product category information and popular product information. When the application reads data, it first looks in the cache; if the data is not in the cache or has expired, it reads the data from the database and writes it into the cache for the next access. This improves system performance, speeds up data reads, and reduces the load on storage.

③ Distributed cache architectures fall into two schools: the communicating school, represented by JBoss Cache, and the non-communicating school, represented by Memcached;

JBoss Cache must synchronize cached data to every machine in the cluster, which is costly. Memcached uses centralized cache-cluster management: the cache is deployed separately from the application, and the application uses a consistent hashing algorithm to select the cache server that holds a piece of data and then accesses it remotely. Because the cache servers do not communicate with one another, the cluster can easily be scaled out, which gives good scalability.

Memcached consists of two core components: the server (ms) and the client (mc). In a memcached query, mc first computes the hash of the key to determine which ms holds the key-value pair; once that ms is determined, the client sends the query to it to look up the exact data. Because the servers neither interact with each other nor use a multicast protocol, memcached's impact on the network is minimized.
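The sketch below combines the ideas above in Python: a consistent hash ring with virtual nodes that a client uses to pick the cache server for a key (the servers never talk to each other), and the read path described in ② (check the cache, fall back to the database on a miss, then write the value back). MD5 as the hash, 100 virtual nodes per server, and the names `ConsistentHashRing` and `get_with_cache` are assumptions for illustration; plain dictionaries stand in for the cache servers and the database.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Maps keys to cache servers; adding a server remaps only a fraction of keys."""

    def __init__(self, servers, replicas=100):
        self.replicas = replicas   # virtual nodes per server, for a more even spread
        self._points = []          # sorted hash positions on the ring
        self._owner = {}           # hash position -> server name
        for server in servers:
            self.add(server)

    @staticmethod
    def _hash(text):
        return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)

    def add(self, server):
        for i in range(self.replicas):
            point = self._hash(f"{server}#{i}")
            self._owner[point] = server
            bisect.insort(self._points, point)

    def remove(self, server):
        self._points = [p for p in self._points if self._owner[p] != server]
        self._owner = {p: s for p, s in self._owner.items() if s != server}

    def server_for(self, key):
        # first virtual node clockwise from the key's position, wrapping at the end
        i = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[i]]


def get_with_cache(key, ring, cache_servers, database):
    """Read path from item 2: cache first, database on a miss, then fill the cache."""
    node = cache_servers[ring.server_for(key)]  # the client alone decides; servers never talk
    if key in node:
        return node[key]
    value = database[key]
    node[key] = value          # the next read of this key is served from the cache
    return value
```

Adding a fourth server to a three-server ring moves only the keys that now fall on the new server's arcs (roughly a quarter of them), whereas a simple `hash(key) % N` scheme would remap most keys. That is why consistent hashing makes the cluster easy to expand.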

(2) Asynchronous operations:

① Using a message queue to make calls asynchronous improves both the scalability and the performance of a website;

② A message queue provides peak shaving: transaction messages generated with high concurrency in a short period are stored in the queue, which flattens the concurrent transactions during peak periods;

PS: Anything that can be done later should be done later, provided that it really can be done later.
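Peak shaving can be shown with a deterministic simulation rather than real threads. In this Python sketch (the function name and the tick-based model are assumptions), producers drop a burst of messages into a queue and return immediately, while the back end drains at most `capacity_per_tick` messages per tick, so the back end never sees more than its capacity at once:

```python
from collections import deque


def simulate(arrivals_per_tick, capacity_per_tick):
    """Return how many queued messages the back end processes in each tick."""
    queue = deque()
    processed = []
    tick = 0
    while tick < len(arrivals_per_tick) or queue:
        if tick < len(arrivals_per_tick):
            # producers enqueue and return at once; they do not wait for processing
            queue.extend(range(arrivals_per_tick[tick]))
        done = 0
        while queue and done < capacity_per_tick:   # consumer drains at a fixed rate
            queue.popleft()
            done += 1
        processed.append(done)
        tick += 1
    return processed
```

For example, `simulate([500, 0, 0, 0, 0, 0], 100)` spreads a 500-message burst over five ticks of 100 each. The trade-off is latency: the last messages of the burst are processed four ticks after they arrive, which is acceptable only for work that can be done later.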

(3) Using clusters:

① In high-concurrency scenarios, use load-balancing technology to build a cluster of multiple servers for one application;

② This prevents a single server from responding slowly under heavy load and gives user requests better response latency;

③ Load balancing can be implemented with hardware devices or in software. Commercial hardware load balancers (such as the well-known F5) are usually expensive (a price of hundreds of thousands or even millions per unit is normal), so software load balancing is used when conditions permit. Software load balancing solves two core problems: which server to choose and how to forward the request to it. The best-known solution is LVS (Linux Virtual Server).

PS: LVS is a layer-4 load balancer, that is, it works at layer 4 of the OSI model, the transport layer, where the familiar TCP and UDP protocols live. LVS supports load balancing for both TCP and UDP.

LVS forwarding is mainly implemented either by modifying IP addresses (NAT mode, which includes SNAT, modifying the source address, and DNAT, modifying the destination address) or by modifying the destination MAC address (DR mode). For more information about LVS, see: http://www.importnew.com/11229.html
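The difference between the two forwarding modes is which header fields the load balancer (the LVS "director") rewrites. The Python sketch below models a packet with only three fields; the IP addresses come from documentation ranges, the MAC strings are placeholders, and the function names are hypothetical. In NAT mode both the request and the reply pass through the director; in DR mode the real server, which also has the VIP configured locally, replies to the client directly, so only the inbound destination MAC changes.

```python
from dataclasses import dataclass, replace

VIP = "203.0.113.10"          # hypothetical virtual IP that clients connect to


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    dst_mac: str


def nat_request(pkt, real_server_ip):
    # NAT mode, request path (DNAT): the destination IP becomes the chosen real server
    return replace(pkt, dst_ip=real_server_ip)


def nat_reply(pkt):
    # NAT mode, reply path (SNAT): the reply goes back through the director, which
    # restores the VIP as the source so the client sees the address it contacted
    return replace(pkt, src_ip=VIP)


def dr_request(pkt, real_server_mac):
    # DR mode: only the destination MAC changes; the IP header still carries the VIP
    return replace(pkt, dst_mac=real_server_mac)
```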

(4) Code optimization:

① Multithreading: there are two reasons to use multiple threads: I/O blocking (other threads can keep working while one waits for I/O) and multiple CPUs (threads make full use of all CPU resources). Both improve system throughput and performance;

② Resource reuse: the goal is to reduce the creation and destruction of expensive system resources. Two patterns are mainly used: Singleton and Object Pool. For example, the thread pools and database connection pools commonly used in .NET development are essentially object pools.

③ Data structures: choosing the appropriate data structure for each scenario can greatly improve program performance.

④ Garbage collection: understanding the garbage collection mechanism helps with program optimization, parameter tuning, and writing memory-safe code. This point mainly applies to languages with garbage collection (GC), such as Java (JVM) and C# (CLR).
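Item ② can be illustrated with a minimal thread-safe object pool in Python, built on the standard `queue.Queue`. It is a sketch, not the .NET pool implementations mentioned above: `FakeConnection` is a hypothetical stand-in for an expensive resource such as a database connection, and the pool size is fixed. All objects are created once up front, and callers borrow and return them instead of creating and destroying their own.

```python
import queue
import threading
from contextlib import contextmanager


class FakeConnection:
    """Hypothetical stand-in for an expensive resource such as a DB connection."""


class ObjectPool:
    """Fixed-size, thread-safe pool: objects are created once and then reused."""

    def __init__(self, factory, size):
        self._idle = queue.Queue()
        self.created = 0
        for _ in range(size):
            self._idle.put(factory())   # pay the creation cost once, up front
            self.created += 1

    def acquire(self, timeout=None):
        # blocks while every object is in use (raises queue.Empty on timeout)
        return self._idle.get(timeout=timeout)

    def release(self, obj):
        self._idle.put(obj)

    @contextmanager
    def borrowed(self, timeout=None):
        obj = self.acquire(timeout)
        try:
            yield obj
        finally:
            self.release(obj)           # always returned, even if the caller raises


def run_workers(pool, n_threads=8, borrows=50):
    """Usage example: many threads borrow from one small pool concurrently."""
    used = set()
    lock = threading.Lock()

    def worker():
        for _ in range(borrows):
            with pool.borrowed() as conn:
                with lock:
                    used.add(id(conn))

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(used)                    # distinct objects actually handed out
```

With eight threads each borrowing fifty times from a pool of two, only two connections are ever created; threads that find both in use simply wait in `acquire`.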

IV. Storage performance optimization

(1) Mechanical hard disk or solid-state drive?

① Mechanical hard drive: uses a motor to move the head arm to a specified position on the disk to access data. It achieves fast sequential reads/writes but slow random reads/writes, since every random access requires moving the head arm.

② SSD: has no mechanical parts; data is stored in persistent silicon memory chips, so it supports fast random access, much like memory.

In current website applications, most data access is random. In this scenario SSDs perform better, but their price/performance ratio still needs to improve (they are quite expensive, which is awkward).

(2) B+ tree vs. LSM tree

① Traditional relational databases widely use B+ trees, which keep data sorted to speed up retrieval.

PS: Currently, most databases use B+ trees with two-level indexes and at most three levels, so updating a single record may require 5 disk accesses (three disk accesses to obtain the data index and the row ID, then one read and one write of the data file). Database operations are therefore cumbersome and time-consuming.

② NoSQL products (such as HBase) widely adopt the LSM tree:

The specific idea is: incremental modifications to data are kept in memory, and once a specified size limit is reached, they are written to disk in batches. Reading is somewhat more cumbersome, because historical data on disk must be merged with the latest modifications in memory. Write performance is therefore greatly improved, whereas a read first checks whether it hits in memory and, if not, may need to access many disk files.

The principle of the LSM tree is to split one big tree into N small trees. Data is first written to a small tree in memory; as it grows, it is flushed from memory to disk, and the trees on disk can be merged periodically into one big tree to optimize read performance.

The advantage of the LSM tree is that a data update does not require disk access and can be completed in memory, which is much faster than with a B+ tree.
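The write and read paths described above can be sketched in a few dozen lines of Python. This is a teaching model under stated assumptions, not HBase's implementation: a dictionary plays the in-memory memtable, each flush appends an immutable sorted run (the "small tree" on disk, here just a pair of Python lists), reads check memory first and then the runs from newest to oldest, and `compact` merges all runs into one big run. Write-ahead logging, deletions, and real disk I/O are omitted.

```python
import bisect


class TinyLSM:
    """Teaching model of an LSM tree: memtable in memory plus sorted runs 'on disk'."""

    def __init__(self, memtable_limit=4):
        self.memtable = {}              # latest modifications, kept in memory
        self.runs = []                  # flushed (keys, values) runs, oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value      # the update completes in memory
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # write the whole memtable out in one batch as an immutable sorted run
        items = sorted(self.memtable.items())
        self.runs.append(([k for k, _ in items], [v for _, v in items]))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:        # newest data: check memory first
            return self.memtable[key]
        for keys, values in reversed(self.runs):   # then runs, newest to oldest
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return values[i]
        return None

    def compact(self):
        # merge all small runs into one big sorted run; newer values win
        merged = {}
        for keys, values in self.runs:  # oldest first, so later updates overwrite
            merged.update(zip(keys, values))
        items = sorted(merged.items())
        self.runs = [([k for k, _ in items], [v for _, v in items])] if items else []
```

Every `put` touches only memory, while a `get` for an old key may have to search several runs; `compact` trades background work for faster reads, mirroring the trade-off described above.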

V. Learning summary

Reading the high-performance architecture chapter of this expert's book, we learned performance optimization strategies from three main angles. Although the material is theoretical and the descriptions are brief, for novice developers like us it is well worth broadening our knowledge and understanding these optimization strategies; in daily work we can pay attention to coding conventions, and writing efficient code is also worth studying. In the book I came across the following passage, which I share here with other novices who are still learning: "In the final analysis, technology serves the business. Technology selection and architecture decisions depend on the enterprise's business planning and even its strategic planning. Without the support and drive of business development, technology will not go far and may even lose its way." After more than a year of internship, this passage resonates deeply with me; I have learned many lessons the hard way and formed my own views on communicating with customers, so I am sharing it with my friends. Finally, I hope that we novices can go a long way on the road of technology. Getting lost does not matter; what matters is finding our way back! In a little over a month I will start looking for a job, and I hope to follow my reading plan carefully until then. Keep going!

References

(1) Li Zhihui, "Large-Scale Website Technical Architecture: Core Principles and Case Analysis", http://item.jd.com/11322972.html

(2) Zhou Yan, "Memcached details", http://blog.csdn.net/zlb824/article/details/7466943

(3) Baidu Encyclopedia, CDN, http://baike.baidu.com/view/8689800.htm

(4) Wang Chenchun, "Web infrastructure: load balancing and LVS", http://www.importnew.com/11229.html

(5) the light, "B tree, B-tree, B+ tree", http://www.cnblogs.com/oldhorse/archive/2009/11/16/1604009.html

(6) yanghuahui's blog, "LSM tree origin, design ideas and application to HBase index", http://www.cnblogs.com/yanghuahui/p/3483754.html

Mind map of this chapter

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.