"reading notes" 2016.12.10 "building a high-performance web site"

Source: Internet
Author: User
Tags: sessions browser cache


Outline:

1. Overview

2. Knowledge points

3. Points to be organized

4. Reference Documentation

1. Overview

1.1) "Book information"

"Building a High-Performance Web Site":

--Baidu Encyclopedia

--Table of contents of the book:

Chapter 1, Introduction: 1.1 The truth about waiting; 1.2 Where the bottleneck is; 1.3 Increase bandwidth; 1.4 Reduce HTTP requests in web pages; 1.5 Speed up server-side script computation; 1.6 Use dynamic content caching; 1.7 Use data caching; 1.8 Make dynamic content static; 1.9 Replace the web server software; 1.10 Page component separation; 1.11 Deploy servers sensibly; 1.12 Use load balancing; 1.13 Optimize the database; 1.14 Consider scalability; 1.15 Reduce perceived waiting
Chapter 2, Network transmission of data: 2.1 Layered network model; 2.2 Bandwidth; 2.3 Response time; 2.4 Interconnection
Chapter 3, Server concurrency capacity: 3.1 Throughput; 3.2 CPU concurrent computing; 3.3 System calls; 3.4 Memory allocation; 3.5 Persistent connections; 3.6 I/O models; 3.7 Server concurrency strategies
Chapter 4, Dynamic content caching: 4.1 The cost of repetition; 4.2 Caching and speed; 4.3 Page caching; 4.4 Partial no-cache; 4.5 Making content static
Chapter 5, Dynamic script acceleration: 5.1 Opcode caching; 5.2 Interpreter extension modules; 5.3 Script tracing and analysis
Chapter 6, Browser caching: 6.1 Do not forget the browser; 6.2 Cache negotiation; 6.3 Eliminating requests entirely
Chapter 7, Web server caching: 7.1 URL mapping; 7.2 Caching response content; 7.3 Caching file descriptors
Chapter 8, Reverse proxy caching: 8.1 Traditional proxies; 8.2 What "reverse" means; 8.3 Creating a cache on the reverse proxy; 8.4 Passing through proxies with care; 8.5 Traffic distribution
Chapter 9, Web component separation: 9.1 The controversial separation; 9.2 Treating each component according to its nature; 9.3 Using different domain names; 9.4 Browser concurrency; 9.5 Bringing out each component's potential
Chapter 10, Distributed caching: 10.1 A front-end cache for the database; 10.2 Using memcached; 10.3 Caching read operations; 10.4 Caching write operations; 10.5 Monitoring status; 10.6 Scaling the cache
Chapter 11, Database performance optimization: 11.1 Friendly status reports; 11.2 Using indexes correctly; 11.3 Locking and waiting; 11.4 Performance of transactional tables; 11.5 Using the query cache; 11.6 Temporary tables; 11.7 Thread pool; 11.8 Denormalized design; 11.9 Abandoning the relational database
Chapter 12, Web load balancing: 12.1 Some thoughts; 12.2 HTTP redirection; 12.3 DNS load balancing; 12.4 Reverse proxy load balancing; 12.5 IP load balancing; 12.6 Direct routing; 12.7 IP tunneling; 12.8 Considering availability
Chapter 13, Shared file systems: 13.1 Network sharing; 13.2 NFS; 13.3 Limitations
Chapter 14, Content distribution and synchronization: 14.1 Replication; 14.2 SSH; 14.3 WebDAV; 14.4 rsync; 14.5 Hash tree; 14.6 Distribution or synchronization; 14.7 Reverse proxy
Chapter 15, Distributed file systems: 15.1 File systems; 15.2 Storage nodes and trackers; 15.3 MogileFS
Chapter 16, Database scaling: 16.1 Replication and separation; 16.2 Vertical partitioning; 16.3 Horizontal partitioning
Chapter 17, Distributed computing: 17.1 Asynchronous computing; 17.2 Parallel computing
Chapter 18, Performance monitoring: 18.1 Real-time monitoring; 18.2 Monitoring agents; 18.3 System monitoring; 18.4 Service monitoring; 18.5 Response time monitoring
References; Index


--"Content summary":

"Building a High-Performance Web Site" (revised edition) is the revised edition of a best-seller. It discusses how to build a high-performance web site from many angles and covers nearly every aspect of web site performance optimization: network data transmission, server concurrency capacity, dynamic page caching, making dynamic pages static, application-layer data caching, distributed caching, web server caching, reverse proxy caching, script interpretation speed, page component separation, browser local caching, browser concurrent requests, file distribution, database I/O optimization, database access, distributed database design, load balancing, distributed file systems, performance monitoring, and so on. It distills the essentials of these topics and ties them to practice, using easy-to-understand text and lively illustrations so that readers gain a full and deep understanding of high-performance architecture.


I read this book in a reading app and liked it a lot. The app lets you share your thoughts as you read. I plan to buy the paper edition and read it carefully.

2. Knowledge points


1) "Function tracing"
Another important feature of Xdebug is function tracing: following the order in which the program actually executes at run time, it records every function call, including its execution time, the actual arguments, and the return value. Yes, that sounds like exactly what we need.
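
Xdebug is a PHP extension, so the following is only a language-neutral illustration of the idea in Python, not Xdebug's API. It is a minimal sketch in which a hypothetical `traced` decorator records each call's name, arguments, return value, and elapsed time in call order. The names `TRACE`, `traced`, `add`, and `total` are made up for this example.

```python
import functools
import time

# Hypothetical in-memory trace log; a real tracer would write to a file.
TRACE = []

def traced(func):
    """Record name, arguments, return value and elapsed time of each call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        entry = {"name": func.__name__, "args": args, "kwargs": kwargs}
        TRACE.append(entry)  # appended on entry, so list order = call order
        start = time.perf_counter()
        result = func(*args, **kwargs)
        entry["elapsed"] = time.perf_counter() - start
        entry["return"] = result
        return result
    return wrapper

@traced
def add(a, b):
    return a + b

@traced
def total(values):
    acc = 0
    for v in values:
        acc = add(acc, v)
    return acc

total([1, 2, 3])
for e in TRACE:
    print(e["name"], e["args"], "->", e["return"], f'{e["elapsed"]:.6f}s')
```

The outer call appears first in the log, followed by the nested calls it made, which is the kind of ordering a function trace is useful for.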

2) "Composite index"
Because a single query can generally use only one index per table, a composite (multi-column) index is needed when a query filters on several fields.

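3) "Leftmost prefix"
You have probably heard of the "leftmost prefix" rule, the basic principle of composite indexes: an index on (a, b) can serve a query that filters on a, or on a and b, but not one that filters on b alone.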
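
The leftmost prefix rule can be observed directly in a query plan. The sketch below uses SQLite (through Python's standard `sqlite3` module) rather than MySQL, purely because it runs without a server. The table `users`, its columns, and the index name `idx_city_age` are made up for the example, and MySQL's `EXPLAIN` output looks different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, city TEXT, age INTEGER, name TEXT)"
)
# Composite index: city is the leftmost column.
conn.execute("CREATE INDEX idx_city_age ON users (city, age)")

def plan(sql):
    """Return SQLite's query plan for sql as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the detail text

plan_city = plan("SELECT * FROM users WHERE city = 'Beijing'")
plan_both = plan("SELECT * FROM users WHERE city = 'Beijing' AND age = 30")
plan_age = plan("SELECT * FROM users WHERE age = 30")

print("city only:", plan_city)   # uses idx_city_age
print("city+age :", plan_both)   # uses idx_city_age
print("age only :", plan_age)    # full scan, index not usable
```

Only the queries that constrain the leftmost column (`city`) can search the index; filtering on `age` alone falls back to scanning the table.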

4) "Database lock mechanism"
The lock mechanism is another important factor that affects query performance. When more than one user accesses a resource in the database concurrently, the database must coordinate these accesses through a lock mechanism to keep concurrent access consistent.

5) "Third normal form"
Third normal form requires that no non-primary-key field in a table depend on another non-primary-key field.

6) "Load balancing at different layers"
In fact, load balancing can be implemented with different mechanisms at the data link layer (layer 2), the network layer (layer 3), and the transport layer (layer 4); the difference is that the scheduling work of these load balancers must be done by the Linux kernel.

7) "Dedicated (exclusive) bandwidth"

This is called dedicated bandwidth: you exclusively hold a portion of the router's egress bandwidth, not the switch's bandwidth, because on a switch each port inherently has its own independent bandwidth.

8) "China's Internet"
In China, the Internet operated by China Telecom is what we usually call ChinaNet (the "China broadband Internet"). Its backbone core node is located in the Beijing Telecom data center. It connects directly to nodes in 8 major cities, including Beijing, which in turn connect to the second-tier network, and the network then extends layer by layer to the surrounding cities, IDCs, home broadband access, and so on.

9) "Nmon"

We use the Nmon tool to monitor the server's number of context switches per second. Nmon is a very good Linux performance monitoring tool.
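
To make "context switches per second" concrete: on Linux, `/proc/stat` contains a `ctxt` line with the cumulative number of context switches since boot, and a rate can be derived from two samples. The sketch below parses sample text rather than the live file so that it runs anywhere. The sample values are made up, and nothing here is a claim about how Nmon itself is implemented.

```python
def read_ctxt(stat_text):
    """Extract the cumulative context-switch count from /proc/stat content."""
    for line in stat_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "ctxt":
            return int(fields[1])
    raise ValueError("no ctxt line found")

def switches_per_second(before_text, after_text, interval_seconds):
    """Rate between two samples taken interval_seconds apart."""
    delta = read_ctxt(after_text) - read_ctxt(before_text)
    return delta / interval_seconds

# Made-up data standing in for two reads of /proc/stat taken 2 seconds apart.
sample_1 = "cpu  100 0 50 1000\nctxt 120000\nbtime 1481328000\n"
sample_2 = "cpu  110 0 55 1190\nctxt 123000\nbtime 1481328000\n"
print(switches_per_second(sample_1, sample_2, 2))  # 1500.0
```

On a real server the two texts would come from reading `/proc/stat` twice with a sleep in between.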

10) "Apache multi-process model"
We know that the overhead of Apache's multi-process model limits the number of concurrent connections it can handle, but Apache has its own strengths, such as stability and compatibility. The advantage of the multi-process model lies in its relatively safe, independent processes: the crash of any one child process does not affect Apache itself, and the Apache parent process can create a new child process.

11) "Script interpreter"
Script interpreters typically run inside the web server's processes, such as the child processes of Apache's prefork model, or run independently as FastCGI processes.

12) "Using Java from PHP"
For example, PHP developers who want to use a Java class library directly in a web application have to load the Java extension module in PHP.

13) "ESI"

ESI (Edge Side Includes) is a specification submitted to the W3C. Its syntax is very similar to SSI (Server Side Includes), and like SSI it can embed sub-pages in a web page. The difference is that SSI assembles the content on the web server, whereas ESI assembles the content on an HTTP proxy server, which includes a reverse proxy.

14) "WordPress"
For example, for a blog built with WordPress, the dynamic content and the database can exchange data faster by communicating entirely over a UNIX socket.

15) "Database status"
mysql> show status;
mysql> show innodb status;   (show engine innodb status; in MySQL 5.5 and later)
The show processlist command
For example, using the dig command we can see that www.sina.com.cn points to 16 servers.

16) "Index data structures"
The data structures behind indexes (MySQL uses B-tree, hash, and R-tree) give them very efficient search algorithms, so we basically do not need to worry about this part of the overhead.

17) "How to create an index"
In general, if a field appears in the selection, filtering, or sorting conditions of a query statement, indexing that field is worthwhile.

18) "Leftmost principle"
You have probably heard of the "leftmost prefix" rule, the basic principle of composite indexes.

19) "Locking and waiting"
The lock mechanism is another important factor that affects query performance. When more than one user accesses a resource in the database concurrently, the database must coordinate these accesses through a lock mechanism to keep concurrent access consistent.

20) "Reverse proxy: Nginx"

We know that a reverse proxy server works at the HTTP level, and for all HTTP requests, go to

21) "Sticky sessions"

What we need to do is adjust the scheduling policy so that all requests a user makes during one session are always forwarded to one specific back-end server. This is known as sticky sessions, and the key to implementing it is designing a persistence scheduling algorithm.

22) "Persistence algorithm"
A persistence algorithm can also be built on cookies: for example, the scheduler appends the number of a back-end server to a cookie written to the user, so that on the user's subsequent requests the scheduler knows which back-end server to forward to. This tracks each user at a finer granularity. Imagine many users hidden behind one public IP address: a cookie-based persistence algorithm is more effective in that case.
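
The cookie-based persistence idea can be sketched as follows. This is a minimal illustration under stated assumptions, not any particular load balancer's implementation: the back-end addresses, the cookie name `backend_id`, and the round-robin choice for new users are all hypothetical.

```python
import itertools

BACKENDS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # hypothetical back-end servers
COOKIE_NAME = "backend_id"                        # hypothetical cookie name
_round_robin = itertools.cycle(range(len(BACKENDS)))

def schedule(cookies):
    """Return (backend, cookies_to_set) for one request.

    cookies is a dict of the cookies the client sent with the request.
    """
    try:
        index = int(cookies.get(COOKIE_NAME, ""))
    except ValueError:
        index = -1  # missing or malformed cookie: treat as a new user
    if 0 <= index < len(BACKENDS):
        # Returning user: stick to the back end recorded in the cookie.
        return BACKENDS[index], {}
    # New user: pick a back end round-robin and remember it in a cookie.
    index = next(_round_robin)
    return BACKENDS[index], {COOKIE_NAME: str(index)}

# Two users behind the same public IP get different back ends,
# and each of them sticks to their own on later requests.
backend_a, set_a = schedule({})
backend_b, set_b = schedule({})
print(backend_a, backend_b)                    # 10.0.0.1 10.0.0.2
print(schedule(set_a)[0], schedule(set_b)[0])  # 10.0.0.1 10.0.0.2
```

Because the decision is keyed on the cookie rather than the source address, users who share one public IP are still tracked individually.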

23) "Best to be independent of local state"
Keeping session data and caches local to a back-end server is unwise: it makes each back-end server too individual to fit in with the rest of the system. Where possible, we should avoid such designs, for example by using distributed sessions or a distributed cache, so that back-end applications are as independent of local state as possible and adapt better to the environment.

24) "IP load balancing"
Recall the layered network model. In fact, load balancing can be implemented with different mechanisms at the data link layer (layer 2), the network layer (layer 3), and the transport layer (layer 4); the difference is that the scheduling work of these load balancers must be done by the Linux kernel.

25) "iptables for load balancing"
Speaking of iptables, its most common use is as a firewall; I configure an iptables firewall on almost every Linux server without hesitation.

26) "Multiple IPs on one NIC"
A network interface naturally has an IP address, but beyond that we can also configure more than one IP address on it; these are called IP aliases. The network interface can be either a physical network card (such as eth0 or eth1) or a virtual interface (such as the loopback interface lo). By rule, a network interface can have up to 256 IP aliases; yes, you can put all the IP addresses of a class C network segment on one network card, and in theory there is no problem.
You may be surprised that one network card can hold multiple IP addresses that all share the same MAC address, but yes, they work fine.
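
The figure of 256 matches the size of a class C (/24) segment, which can be checked with Python's standard `ipaddress` module. The network `192.168.1.0/24` and the `eth0:N` alias names below are only examples chosen for illustration.

```python
import ipaddress

# Example private network of class C size (a /24).
network = ipaddress.ip_network("192.168.1.0/24")
print(network.num_addresses)  # 256

# Hypothetical alias names in the traditional eth0:N style, one per address.
# Note that .0 and .255 are normally the network and broadcast addresses.
aliases = {f"eth0:{i}": str(addr) for i, addr in enumerate(network)}
print(aliases["eth0:0"], aliases["eth0:255"])  # 192.168.1.0 192.168.1.255
```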

27) "LVS-DR and DNS-RR"
Fortunately, with LVS-DR, once the scheduler fails you can immediately switch from LVS-DR to DNS-RR mode: you only need to add a few DNS records that resolve the domain name to the real IP addresses of the actual servers. Once the scheduler recovers, you can modify the DNS records again so that the domain name points only to the scheduler, which switches back to LVS-DR.

28) "Request forwarding based on IP tunneling"
Similar in principle to LVS-DR, a load balancing system based on IP tunneling can also be implemented with LVS; this is known as LVS-TUN.

29) "Shared file systems"
Common implementations of shared file systems include NFS (Network File System) and Samba.

30) "RPC"
At the transport layer, the RPC service uses UDP by default.

31) "SSH"
Everyone is familiar with SSH (Secure Shell). It is a security protocol built on the application layer and the transport layer, and it can be used to transfer any data. We want to use it for file replication, which is of course a form of active distribution.

32) "WebDAV, an HTTP extension protocol for file distribution"
WebDAV's design includes support for versioning. Remember how Subversion works over HTTP? It is implemented using WebDAV.

33) "Updating the parent directory's modification time"
When the operating system modifies a file, it does not automatically update the modification time of the parent directory, although some applications do cause this: for example, after you edit a file with vi and save it, you will find that the modification time of its parent directory has been updated. So, for file synchronization, we have to find a way to do this ourselves.
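
The behaviour can be reproduced in a few lines. The sketch below rewrites a file in place and then creates a new file in the same directory, checking the directory's modification time after each step. The file names are made up, and the remark about editors saving through a temporary or swap file is an assumption about why tools such as vi trigger the update, not something these notes state.

```python
import os
import tempfile

OLD = 1_000_000_000  # arbitrary timestamp in the past (year 2001)

with tempfile.TemporaryDirectory() as root:
    path = os.path.join(root, "page.html")
    with open(path, "w") as f:
        f.write("v1")

    # Reset the directory's mtime so any later change is easy to spot.
    os.utime(root, (OLD, OLD))

    # Overwrite the existing file in place: the directory entry is untouched.
    with open(path, "r+") as f:
        f.write("v2")
    after_edit = os.stat(root).st_mtime

    # Create a new entry in the directory (assumption: editors that save via
    # a temporary or swap file do something similar).
    with open(os.path.join(root, "page.html.tmp"), "w") as f:
        f.write("v3")
    after_create = os.stat(root).st_mtime

print("unchanged after in-place edit:", after_edit == OLD)
print("changed after creating a file:", after_create > OLD)
```

A synchronization tool that relies on directory timestamps therefore has to touch the parent directories itself after changing a file's contents.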

34) "Reduce overhead, increase scalability"
For a service whose response packets are far larger than its request packets (video, for example), the more we reduce the scheduler's overhead in forwarding requests, the more we improve overall scalability, and the more the system ultimately depends on WAN egress bandwidth.

35) "MogileFS"

MogileFS is an open source distributed file system written in Perl. It comprises trackers, storage nodes, and some management tools, and the tracker uses MySQL to store all the information needed to run the distributed file system.

36) "MySQL master-slave replication"

Take MySQL as an example: it supports master-slave replication, and the configuration is not complicated. Put simply, you only need to do the following two things:
Turn on the binary log (log-bin) on the master server.
Do some simple configuration and authorization on the master server and the slave server respectively.
We know that MySQL master-slave replication is based on the master server's binary log: the operations recorded in the master's log are replayed on the slave server, which achieves replication. The master server must therefore turn on the binary log, which automatically records every operation that updates the database, including potential updates, such as a delete operation that does not actually delete any records.
Obviously, this replication is asynchronous.
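
The log-and-replay idea described above can be modelled in a few lines. This is a toy model for intuition only, not MySQL's implementation: the classes `Master` and `Slave`, the `set`/`delete` operations, and the in-memory list standing in for the binary log are all made up.

```python
def apply_op(data, op, key, value):
    """Apply one logged operation to a key-value store."""
    if op == "set":
        data[key] = value
    elif op == "delete":
        data.pop(key, None)  # deleting a missing key changes nothing

class Master:
    def __init__(self):
        self.data = {}
        self.binlog = []  # ordered record of every update operation

    def execute(self, op, key, value=None):
        """Log the update, then apply it locally."""
        self.binlog.append((op, key, value))
        apply_op(self.data, op, key, value)

class Slave:
    def __init__(self):
        self.data = {}
        self.position = 0  # how far into the master's log we have replayed

    def catch_up(self, master):
        """Replay log entries not yet applied (runs some time after the writes)."""
        for op, key, value in master.binlog[self.position:]:
            apply_op(self.data, op, key, value)
        self.position = len(master.binlog)

master, slave = Master(), Slave()
master.execute("set", "user:1", "alice")
master.execute("delete", "user:2")  # deletes no record, but is still logged
lag = (len(master.binlog) - slave.position, dict(slave.data))
slave.catch_up(master)
print(lag)         # (2, {})
print(slave.data)  # {'user:1': 'alice'}
```

Between the writes and the call to `catch_up`, the slave is behind the master, which is what asynchronous replication means in practice.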

37) "Database reverse proxy"

Using a database reverse proxy
If you use MySQL, you can try MySQL Proxy, which sits between the application and the MySQL server and is responsible for forwarding all requests and response data.
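
One common job for a proxy in this position is read-write splitting: writes go to the master, reads are spread over the slaves. The sketch below shows only the routing decision, under the simplifying assumption that a statement's first keyword tells reads from writes. The class `ReadWriteRouter` and the server names are hypothetical, and this is not how MySQL Proxy itself is implemented.

```python
import itertools

class ReadWriteRouter:
    """Send writes to the master and spread reads across the slaves."""

    READ_PREFIXES = ("select", "show")  # simplifying assumption

    def __init__(self, master, slaves):
        self.master = master
        self._slaves = itertools.cycle(slaves)

    def route(self, sql):
        """Return the server that should receive this statement."""
        if sql.lstrip().lower().startswith(self.READ_PREFIXES):
            return next(self._slaves)
        return self.master

router = ReadWriteRouter("master:3306", ["slave1:3306", "slave2:3306"])
print(router.route("SELECT * FROM posts"))        # slave1:3306
print(router.route("UPDATE posts SET hits = 1"))  # master:3306
print(router.route("select 1"))                   # slave2:3306
```

The application keeps talking to a single address while the router decides where each statement goes.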

38) "Site growth"

In fact, many large sites have gone through the steps from simple master-slave replication to vertical partitioning and then to horizontal partitioning; this is an inevitable growth process.

39) "Partition reverse proxy"
Partition reverse proxy
Do you remember the MySQL Proxy mentioned earlier? It helps the application implement read-write splitting. Another open source product, Spock Proxy, plays a similar role here: it helps the application schedule access across horizontal partitions, which means we do not need to maintain the partition mapping in the application.
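
The partition mapping that such a proxy takes off the application's hands can be as simple as a function from a key to a database. The sketch below assumes a modulo rule on a numeric user id. The shard host names and the functions `shard_for` and `route_query` are hypothetical, and the notes do not describe how Spock Proxy actually decides.

```python
# Hypothetical shard databases.
SHARDS = [
    "db0.example.internal",
    "db1.example.internal",
    "db2.example.internal",
]

def shard_for(user_id):
    """Map a user id to the database that holds that user's rows."""
    return SHARDS[user_id % len(SHARDS)]

def route_query(user_id, sql):
    """What a partition-aware proxy would do: pick the shard, then forward."""
    return shard_for(user_id), sql

print(route_query(7, "SELECT * FROM orders WHERE user_id = 7"))
# ('db1.example.internal', 'SELECT * FROM orders WHERE user_id = 7')
```

A plain modulo rule is easy to reason about, but changing the number of shards remaps most keys, which is one reason real systems often use a lookup table or consistent hashing instead.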

40) "Gearman"
Gearman is an open source product designed for remote function calls: it lets you hand computation off to other servers, with all the details neatly hidden behind the API it provides.

41) "Map/Reduce"
That said, some parallel computing frameworks do exist; see Map/Reduce, introduced later.
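
As a reminder of what the Map/Reduce model looks like, here is a minimal single-machine word count. It is a sketch of the programming model only: the thread pool stands in for worker machines, and the function names and sample documents are made up.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def map_phase(document):
    """Emit (word, 1) pairs for one input document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(mapped_lists):
    """Group the emitted values by key."""
    groups = defaultdict(list)
    for pairs in mapped_lists:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values emitted for one key."""
    return key, sum(values)

documents = ["cache the page", "cache the data", "the proxy"]
with ThreadPoolExecutor() as pool:  # map tasks are independent of each other
    mapped = list(pool.map(map_phase, documents))
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts["the"], counts["cache"], counts["proxy"])  # 3 2 1
```

Because each map task only sees its own document and each reduce task only sees one key, both phases can be spread over many machines.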

42) "Nmon"
Nmon is real-time monitoring software that runs locally on the server and provides system monitoring at intervals measured in seconds.

43) "Monitoring center"
Of course, we also need to set up a monitoring center that aggregates and presents this status data. Fortunately, there are many open source products to help us. Here we mainly take Cacti as an example: it fully supports the system monitoring just mentioned and draws the corresponding charts, which makes them easy to browse.
Cacti uses RRDtool as the storage engine for monitoring data. RRDtool's storage format is designed specifically for plotting graphs and saves a lot of storage space compared with other storage structures, which makes long-term monitoring of a large number of servers feasible.
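
The space saving comes from the round-robin design that gives RRDtool its name: an archive has a fixed number of slots, and new samples overwrite the oldest ones. The sketch below shows just that property with a bounded deque. The class `RoundRobinArchive` is made up, and real RRDtool additionally consolidates samples into coarser archives, which is not shown.

```python
from collections import deque

class RoundRobinArchive:
    """Keep only the newest `slots` samples; the storage size never grows."""

    def __init__(self, slots):
        self.samples = deque(maxlen=slots)

    def update(self, timestamp, value):
        # When the deque is full, appending silently drops the oldest sample.
        self.samples.append((timestamp, value))

archive = RoundRobinArchive(slots=3)
for t, load in enumerate([0.5, 0.7, 0.9, 1.2, 0.4]):
    archive.update(t, load)
print(list(archive.samples))  # [(2, 0.9), (3, 1.2), (4, 0.4)]
```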

3. Points to be organized

4. Reference Documentation