[Reading Notes] 2016.12.10 "building high-performance Web sites" to build high-performance web Sites

Source: Internet
Author: User
Tags: website performance

[Reading Notes] 2016.12.10 "building high-performance Web sites" to build high-performance web Sites


 

Sharing outline:

1. Overview

2. Knowledge points

3. To be sorted

4. References

 

1. Overview

1.1) [Book Information]

Building High-Performance Web Sites:

      

-- Baidu encyclopedia

-- Book directory:

Chapter 1 Introduction: 1.1 The truth about waiting; 1.2 Where is the bottleneck; 1.3 Increasing bandwidth; 1.4 Reducing HTTP requests in web pages; 1.5 Speeding up server-side script computation; 1.6 Using dynamic content caching; 1.7 Using data caching; 1.8 Making dynamic content static; 1.9 Changing the web server software; 1.10 Separating page components; 1.11 Deploying servers sensibly; 1.12 Using load balancing; 1.13 Optimizing the database; 1.14 Considering scalability; 1.15 Reducing visual waiting
Chapter 2 Network Transmission of Data: 2.1 The layered network model; 2.2 Bandwidth; 2.3 Response time; 2.4 Interconnection
Chapter 3 Server Concurrent Processing Capability: 3.1 Throughput; 3.2 CPU concurrent computing; 3.3 System calls; 3.4 Memory allocation; 3.5 Persistent connections; 3.6 I/O models; 3.7 Server concurrency strategies
Chapter 4 Dynamic Content Caching: 4.1 Duplicated overhead; 4.2 Cache and speed; 4.3 Page caching; 4.4 Partial non-caching; 4.5 Static content
Chapter 5 Dynamic Script Acceleration: 5.1 opcode caching; 5.2 Interpreter extension modules; 5.3 Script tracing and analysis
Chapter 6 Browser Cache: 6.1 Don't forget the browser; 6.2 Cache negotiation; 6.3 Eliminating requests completely
Chapter 7 Web Server Cache: 7.1 URL mapping; 7.2 Caching response content; 7.3 Caching file descriptors
Chapter 8 Reverse Proxy Cache: 8.1 Traditional proxies; 8.2 What "reverse" means; 8.3 Creating a cache on the reverse proxy; 8.4 Passing through the proxy carefully; 8.5 Traffic distribution
Chapter 9 Web Component Separation: 9.1 The controversial separation; 9.2 Teaching each according to its aptitude; 9.3 Using different domain names; 9.4 Browser concurrency; 9.5 Letting each play to its potential
Chapter 10 Distributed Cache: 10.1 The database's front-end cache; 10.2 Using memcached; 10.3 Caching read operations; 10.4 Caching write operations; 10.5 Monitoring status; 10.6 Cache expansion
Chapter 11 Database Performance Optimization: 11.1 Friendly status reports; 11.2 Using indexes correctly; 11.3 Locking and waiting; 11.4 Performance of transactional tables; 11.5 Using the query cache; 11.6 Temporary tables; 11.7 Thread pool; 11.8 Denormalized design; 11.9 Abandoning relational databases
Chapter 12 Web Load Balancing: 12.1 Some thoughts; 12.2 HTTP redirection; 12.3 DNS load balancing; 12.4 Reverse proxy load balancing; 12.5 IP load balancing; 12.6 Direct routing; 12.7 IP tunneling; 12.8 Considering availability
Chapter 13 Shared File System: 13.1 Network sharing; 13.2 NFS; 13.3 Limitations
Chapter 14 Content Distribution and Synchronization: 14.1 Replication; 14.2 SSH; 14.3 WebDAV; 14.4 rsync; 14.5 Hash tree; 14.6 Distribution or synchronization; 14.7 Reverse proxies
Chapter 15 Distributed File System: 15.1 File systems; 15.2 Storage nodes and trackers; 15.3 MogileFS
Chapter 16 Database Scaling: 16.1 Replication and separation; 16.2 Vertical partitioning; 16.3 Horizontal partitioning
Chapter 17 Distributed Computing: 17.1 Asynchronous computing; 17.2 Parallel computing
Chapter 18 Performance Monitoring: 18.1 Real-time monitoring; 18.2 Monitoring agents; 18.3 System monitoring; 18.4 Service monitoring; 18.5 Response time monitoring
References


-- [Content Overview ]:

Building High-Performance Web Sites (revised edition) is the best-selling revised edition. It describes comprehensively, from many aspects and perspectives, how to build a high-performance Web site, and it covers almost every area of website performance optimization, including network transmission of data, server concurrent processing capability, dynamic page caching, static generation of dynamic pages, application-layer data caching, distributed caching, Web server caching, reverse proxy caching, script interpretation speed, page component separation, browser local caching, browser concurrent requests, file distribution, database I/O optimization, database access, distributed database design, load balancing, distributed file systems, and performance monitoring. The book captures the essence of these topics, combines them with practice, and lets readers fully and deeply understand the principles of high-performance architecture through easy-to-understand text and vivid, interesting diagrams.

-- Aside:

I read this book in a reading app and it was a good experience. I am sharing some thoughts here; you can buy the paper book and study it carefully.

 

 

2. Knowledge points

  

1) [function tracing]
Another important tracing feature of Xdebug is function tracing: during an actual run it records, in execution order, the running time of every function and the context of each function call, including the actual parameters and return values. That sounds like exactly what we need.


2) [joint index]
Because a single query can use only one index on a given data table.
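As a minimal sketch of what this implies (MySQL syntax, with a hypothetical orders table; the one-index-per-table behavior is what the book describes, and newer MySQL versions can sometimes merge several indexes):

-- Hypothetical table with two single-column indexes.
CREATE TABLE orders (
    id      INT PRIMARY KEY,
    user_id INT,
    status  VARCHAR(16),
    KEY idx_user (user_id),
    KEY idx_status (status)
);
-- When filtering on both columns, the optimizer generally picks only one
-- of the two indexes (verify with EXPLAIN):
EXPLAIN SELECT * FROM orders WHERE user_id = 42 AND status = 'paid';
-- A composite (joint) index lets a single index serve both conditions:
ALTER TABLE orders ADD KEY idx_user_status (user_id, status);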


3) [leftmost prefix]
You must have heard of the basic "leftmost prefix" principle for composite indexes.
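To make the rule concrete, a small sketch with an illustrative composite index (column names are hypothetical):

-- Composite index on (last_name, first_name).
CREATE TABLE people (
    id         INT PRIMARY KEY,
    last_name  VARCHAR(32),
    first_name VARCHAR(32),
    KEY idx_name (last_name, first_name)
);
-- These queries can use idx_name: the leftmost column is constrained.
SELECT * FROM people WHERE last_name = 'Smith';
SELECT * FROM people WHERE last_name = 'Smith' AND first_name = 'Alice';
-- This one cannot use idx_name for the lookup: the leftmost column is missing.
SELECT * FROM people WHERE first_name = 'Alice';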

 

4) [database lock mechanism]
The lock mechanism is another important factor affecting query performance. When multiple users concurrently access a resource in the database, the database must use locking to coordinate their access and guarantee the consistency of concurrent access.
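A hedged sketch of how this shows up in MySQL (table and column names are illustrative): MyISAM coordinates access with table-level locks, while InnoDB can lock individual rows inside a transaction:

-- Table-level locking (MyISAM style): readers and writers serialize on the whole table.
LOCK TABLES accounts WRITE;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UNLOCK TABLES;
-- Row-level locking (InnoDB): only the selected rows stay locked until COMMIT,
-- so sessions touching other rows are not blocked.
START TRANSACTION;
SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
COMMIT;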


5) [third normal form]
The third normal form requires that no dependency exists between non-key fields in a data table.
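A small illustrative schema (hypothetical columns): in the first table the non-key field city depends on another non-key field zip_code, which violates the third normal form; moving the dependency into its own table fixes it:

-- Violates 3NF: city is determined by zip_code, not by the primary key.
CREATE TABLE customer_bad (
    id       INT PRIMARY KEY,
    name     VARCHAR(64),
    zip_code CHAR(6),
    city     VARCHAR(64)
);
-- 3NF version: the transitive dependency lives in its own table.
CREATE TABLE customer (
    id       INT PRIMARY KEY,
    name     VARCHAR(64),
    zip_code CHAR(6)
);
CREATE TABLE zip_city (
    zip_code CHAR(6) PRIMARY KEY,
    city     VARCHAR(64)
);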


6) [Load Balancing at different layers]
In fact, load balancing with different mechanisms can be implemented at the data link layer (Layer 2), the network layer (Layer 3), and the transport layer (Layer 4). However, the work of these load-balancing schedulers must be completed by the Linux kernel.


7) [dedicated bandwidth]

This is called exclusive bandwidth. What is exclusively occupied is part of the router's egress bandwidth, not the switch's bandwidth, because on a switch each port already has its own dedicated bandwidth and the ports do not affect one another.


8) [China Internet]
The Internet operated by China Telecom, commonly known as the "China broadband Internet (CHINANET)", has its backbone core node located in the telecom data center at Shangdi, Beijing. Through the eight major domestic city nodes directly connected to it, including Beijing, it connects to the second-level networks and spreads layer by layer outward to the surrounding cities, IDCs, and home broadband access.

9) [Nmon]

We use the Nmon tool to monitor the number of context switches per second on the server. Nmon is a very good Linux performance monitoring tool.

10) [Apache Multi-Process Model]
We know that the overhead of Apache's multi-process model limits its number of concurrent connections, but Apache also has its own advantages, such as stability and compatibility. The strength of the multi-process model is that each process is relatively secure and independent: the crash of any child process does not affect Apache itself, and the Apache parent process can simply create a new child process.

11) [script interpreter]
The script interpreter usually runs inside a Web server process (for example, as a child process under Apache's prefork model) or as a FastCGI process.


12) [introduce Java in PHP]
For example, some PHP developers who want to reference Java class libraries directly in their Web applications must load the Java extension module in PHP.

13) [ESI]

ESI is a W3C standard. Its syntax is very similar to SSI, and like SSI it can embed sub-pages into a web page; the difference is that SSI assembles the content on the Web server, while ESI assembles it on the HTTP proxy server, including reverse proxies.

14) [wordpress]
For example, in a blog built with WordPress, the dynamic content and the database can use a UNIX socket to exchange data faster.

15) [database status]
mysql> show status;
mysql> show innodb status;
The show processlist command.
For example, we can see through the dig command that www.sina.com.cn points to 16 servers.
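For reference, a few status-report statements of the kind these notes refer to (the LIKE patterns are just examples; on newer MySQL versions the InnoDB report is SHOW ENGINE INNODB STATUS):

-- Global counters; filter with LIKE to avoid the full list.
SHOW STATUS LIKE 'Threads_connected';
SHOW STATUS LIKE 'Slow_queries';
-- Detailed InnoDB internals.
SHOW ENGINE INNODB STATUS;
-- Currently running client threads and their statements.
SHOW FULL PROCESSLIST;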

16) [index data structure]
The data structures used for indexes (MySQL uses B-tree, hash, and R-tree) guarantee very efficient search algorithms, so we basically do not need to worry about this part of the overhead.
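A short sketch of how these index structures appear in MySQL DDL (table names are hypothetical; engine support varies, e.g. hash indexes are typical of the MEMORY engine and R-trees back SPATIAL indexes):

-- B-tree index (the default for InnoDB/MyISAM secondary indexes).
CREATE INDEX idx_created ON articles (created_at);
-- Hash index on a MEMORY table: fast equality lookups, no range scans.
CREATE TABLE sessions (
    sid  CHAR(32),
    data VARCHAR(255),
    PRIMARY KEY (sid) USING HASH
) ENGINE = MEMORY;
-- R-tree, exposed as a SPATIAL index on a NOT NULL geometry column.
CREATE SPATIAL INDEX idx_location ON shops (location);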

17) [index creation]
Generally, if a field appears in a query statement's row selection, filtering, or sorting conditions, it is worthwhile to create an index on that field.
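For example (hypothetical table): if a query filters on user_id and sorts by created_at, those are the columns worth indexing:

-- Frequent query: filter on user_id, sort by created_at.
SELECT id, title FROM posts WHERE user_id = 42 ORDER BY created_at DESC;
-- A composite index on the filter column followed by the sort column
-- lets MySQL both locate and order the rows via the index:
ALTER TABLE posts ADD KEY idx_user_created (user_id, created_at);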

18) [leftmost principle]
You must have heard of the basic "leftmost prefix" principle for composite indexes.


19) [lock and wait]
The lock mechanism is another important factor affecting query performance. When multiple users concurrently access a resource in the database, the database must lock it to guarantee the consistency of concurrent access.

20) [reverse proxy nginx]

We know that a reverse proxy server works at the HTTP layer and must itself forward every HTTP request.

21) [sticky sessions]

All we need to do is adjust the scheduling policy so that all requests within one session are always forwarded to a specific backend server. This mechanism is also called Sticky Sessions; the key to implementing it is how to design the persistent scheduling algorithm.


22) [persistence algorithm]
You can also use cookies to design a persistence algorithm. For example, the scheduler can append the number of a backend server to the cookie written to the user; the scheduler then knows to which backend server the user's subsequent requests should be forwarded. In this way every user can be tracked at a finer granularity. Imagine that when many users are hiding behind one public IP address, a cookie-based persistence algorithm becomes even more effective.

23) [best to be independent of the local machine]
It is indeed unwise to store session data and local caches on a backend server; it makes the backend server too "personalized" and thus incompatible with the system as a whole. If possible, we should try to avoid such designs, for example by using distributed sessions or a distributed cache, so that the applications on the backend servers depend as little as possible on the local server and can better adapt to the environment.

24) [IP load balancing]
Recall the layered network model. In fact, load balancing with different mechanisms can be implemented at the data link layer (Layer 2), the network layer (Layer 3), and the transport layer (Layer 4); the difference is that the work of these load-balancing schedulers must be completed by the Linux kernel.

25) [Iptables implements Load Balancing]
Speaking of iptables, its most common application scenario is the firewall; I configure an iptables firewall on almost every Linux server without hesitation.

26) [multiple IP addresses of one Nic]
A network interface naturally has one IP address, but beyond that we can configure additional IP addresses for it, called IP aliases. The network interface can be a physical NIC (such as eth0 or eth1) or a virtual interface (such as the loopback interface lo). By specification, a network interface can have up to 256 IP aliases. That's right: you could assign all the IP addresses of a class C network segment to a single NIC, and in theory there would be no problem.
Your jaw may have dropped: one network card can carry multiple IP addresses, all sharing the same MAC address? Yes, and they work just fine.

27) [LVS-DR.DNS-RR]
Fortunately, with LVS-DR, once the scheduler fails you can immediately switch from LVS-DR to DNS-RR mode. This requires little more than adding a few DNS records that resolve the domain name to the real IP addresses of the multiple servers. Once the scheduler recovers, you modify the DNS records again so that the domain name points only to the scheduler, and switch back to LVS-DR.

28) [IP tunneling-based request forwarding]
Similar to LVS-DR, an IP-tunneling-based load-balancing system can also be implemented with LVS; it is known as LVS-TUN.

29) [Shared File System]
Commonly used implementations of shared file systems are NFS (Network File System) and Samba.

30) [RPC]
At the transport layer, the RPC service uses UDP by default.

31) [SSH]
We are all familiar with SSH (Secure Shell). It is a security protocol built on top of the application layer and the transport layer, and it can be used to transmit any data, so we can use it for file replication; of course, this is an active distribution method.

32) [http extension protocol WebDAV for file distribution]
WebDAV was designed with version control support in mind. Do you still remember Subversion's HTTP access method? It is implemented with WebDAV.

33) [update the upper-level directory time]
The operating system does not automatically update the modification time of the parent directory when a file is modified; only certain applications do this. For example, after editing a file with vi and saving it, you will find that all of its parent directories have had their modification times updated automatically. So we still have to find a way to implement file synchronization ourselves.

34) [reduce overhead and increase scalability]
For services whose responses contain many data packets (such as video), the more the scheduler's request-forwarding overhead can be reduced, the more the overall scalability improves; ultimately, what you depend on is the WAN egress bandwidth.

35) [MogileFS]

MogileFS is an open-source distributed file system written in Perl, consisting of a tracker, storage nodes, and some management tools. In addition, the tracker uses MySQL to store all of the distributed file system's runtime information.


36) [MySQL master-slave replication]

Take MySQL as an example. It supports master-slave replication, and the configuration is not complicated. Simply put, you only need to do the following:
● Enable the binary log (log-bin) on the master server.
● Perform simple configuration and authorization on the master server and the slave server respectively.
We know that MySQL's master-slave replication is based on the master server's binary log; that is, the operations recorded in the master's log are replayed on the slave server to achieve replication. The master must therefore enable the binary log, which automatically records all update operations on the database, including potential update operations such as a DELETE that ends up deleting no actual rows.
Obviously, this replication is asynchronous.
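A hedged sketch of those steps (host names, account names, and binlog coordinates are purely illustrative; the real coordinates come from SHOW MASTER STATUS, and the GRANT ... IDENTIFIED BY form applies to the MySQL versions of the book's era):

-- On the master, in my.cnf:
--   [mysqld]
--   server-id = 1
--   log-bin   = mysql-bin
-- Then create a replication account and note the current binlog position:
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'10.0.0.%' IDENTIFIED BY 'secret';
SHOW MASTER STATUS;
-- On the slave, in my.cnf:
--   [mysqld]
--   server-id = 2
-- Then point the slave at the master and start replication:
CHANGE MASTER TO
    MASTER_HOST     = '10.0.0.1',
    MASTER_USER     = 'repl',
    MASTER_PASSWORD = 'secret',
    MASTER_LOG_FILE = 'mysql-bin.000001',
    MASTER_LOG_POS  = 107;
START SLAVE;
-- Check Slave_IO_Running and Slave_SQL_Running in the output of:
SHOW SLAVE STATUS\G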

37) [reverse database proxy]

Use a database reverse proxy.
If you are using MySQL, you can try MySQL Proxy, which works between the application and the MySQL server and forwards all requests and response data.


38) [site growth]

In fact, many large-scale sites have gone through the steps from simple master-slave replication, to vertical partitioning, and then to horizontal partitioning. This is an inevitable growth process.
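To make the last two steps concrete, a rough sketch with a hypothetical user_message table: vertical partitioning moves whole tables onto different database servers, while horizontal partitioning splits the rows of one table across servers, for example by user id:

-- Horizontal partitioning sketch: the same schema exists on several
-- database servers, and each row lives on the server chosen by
-- user_id MOD <number of partitions> (here 2 partitions).
-- On partition 0 (e.g. db0), and identically on partition 1 (e.g. db1):
CREATE TABLE user_message (
    id      BIGINT PRIMARY KEY,
    user_id INT NOT NULL,
    body    TEXT
);
-- The application (or a partitioning proxy) routes a query for user 42
-- to partition 42 MOD 2 = 0:
SELECT * FROM user_message WHERE user_id = 42;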


39) [reverse partitioning proxy]
Reverse proxy for partitioning.
Do you still remember the MySQL Proxy mentioned above? It helps applications implement read/write splitting. Spock Proxy, another open-source product, plays a similar role here: it can help applications schedule access across horizontal partitions, which means we do not need to maintain the partition mapping inside the application.

40) [Gearman]
Gearman is an open-source product. It was originally used to call remote functions so that computation can be transferred to other servers; all of this is cleverly hidden behind the APIs it provides.

41) [Map/Reduce]
However, some parallel computing frameworks do exist; see the Map/Reduce introduced later.

42) [Nmon]
Nmon is real-time monitoring software that runs locally on the server and provides system monitoring at intervals measured in seconds.

43) [monitoring center]
Of course, we also need to set up a monitoring center to collect and present the status data. Fortunately, many open-source products can help us here; take Cacti as an example. It fully supports the system monitoring just mentioned and draws the corresponding charts, which are easy to browse.
Cacti uses RRDtool as the storage engine for the monitoring data. RRDtool is a storage format designed specifically for plotting time-series charts; compared with other storage structures it saves a great deal of space, which makes long-term monitoring of a large number of servers practical.

 

 

3. To be sorted

 

 

 

4. References

 
