The process of sending data (from host to the wire)
The application first writes the data into the address space of its process. It then makes system calls to the kernel through the system library interface, and the kernel copies the data from user-space memory into the kernel buffer (the kernel buffer is limited in size, and all data queues up there). After the data is written to the kernel buffer, the kernel notifies the NIC controller to fetch it and copy it into the network card's own buffer. The NIC converts the buffered bytes into bits (binary) and emits them onto the wire sequentially.
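As a rough illustration of the user-space-to-kernel-buffer copy just described, here is a minimal Python sketch (the endpoint is hypothetical) that inspects a socket's bounded kernel send buffer and writes data into it:

```python
import socket

# send() copies data from user-space memory into the socket's kernel send
# buffer; the buffer is bounded, and its size can be inspected with SO_SNDBUF.
s = socket.create_connection(("example.com", 80))   # hypothetical endpoint
snd_buf = s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
print("kernel send buffer size:", snd_buf, "bytes")
s.sendall(b"GET / HTTP/1.0\r\nHost: example.com\r\n\r\n")  # user space -> kernel buffer -> NIC
s.close()
```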
Bottleneck of network data transmission
The bottleneck is generally at the exchange point with the lowest bandwidth.
The formula for the response time and download speed of data
Response time = send time (data size in bits / bandwidth) + propagation time (propagation distance / propagation speed) + processing time
Download speed = data size in bytes / response time.
The concept of throughput rate
The number of requests processed by the server per unit of time, measured in reqs/s. Three elements are important when testing the throughput rate: the number of concurrent users, the total number of requests, and the description of the requested resource. For example, requesting a 13KB static page with 100 concurrent users and 1000 total requests can reach a throughput rate of 11769 reqs/s if bandwidth is ignored. But on 100Mbit of exclusive bandwidth, the throughput rate = ((100Mbit/8) * 1000) / 13KB ≈ 961.53 reqs/s. So one optimization is to buy more exclusive bandwidth (such as 1Gbit), install a Gigabit network card in the server, and connect the server to a Gigabit switch.
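A quick back-of-the-envelope check of the bandwidth-bound figure above, as a Python sketch:

```python
# Bandwidth-bound throughput for the 13KB static page example above.
bandwidth_bit_s = 100 * 10**6            # 100 Mbit/s of exclusive bandwidth
page_size_kb = 13                        # size of the requested static page
bytes_per_second = bandwidth_bit_s / 8   # bits -> bytes
throughput = (bytes_per_second / 1000) / page_size_kb
print(round(throughput, 2), "reqs/s")    # ~961.5 reqs/s, matching the figure above
```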
How to load-test the server's throughput rate for requests of different natures
A resource-demand model that interleaves requests of different natures, such as fetching static resources and fetching dynamic content, is too complex: there are too many factors in the CPU processing and I/O operations involved. So we generally simplify the model, stress test a representative request of one particular type, and then, as needed, compute a weighted average of the throughput rates of the different request types according to their proportions.
The concept of processes, lightweight processes, threads, and coroutines
Process: a process is scheduled by the kernel; from the kernel's point of view, the purpose of a process is to act as the entity to which system resources are allocated. A process can also be understood as a running instance of a program's data, and multiple processes are associated with that data through different process descriptors. Each process has its own independent memory address space and life cycle. Processes are created with the fork() system call, which is not expensive in itself, but creating processes frequently costs a lot of performance and memory. All processes share the CPU registers: suspending a process essentially means saving its data from the CPU registers into the kernel stack, and resuming a process essentially means reloading that data back into the CPU registers.
Lightweight process: Linux 2.0 added support for lightweight processes, which are created with the clone() system call and managed directly by the kernel, as independent as ordinary processes. However, these processes are allowed to share some resources, such as the address space and open files. This reduces memory overhead and gives multi-process applications direct support for data sharing, but the context-switch overhead is still unavoidable.
Threads: user-level multithreading is generally just an ordinary process in which library functions simulate multiple flows of execution in user space; context-switch overhead is minimal, but it cannot take advantage of multiple CPUs. Another thread implementation is LinuxThreads, which can be considered kernel-level threads; they are also created through clone() and managed by the kernel's process scheduler, so the implementation is based on a one-to-one relationship between threads and lightweight processes.
Coroutine: a user-defined lightweight thread that actually runs inside a single thread: a function executes part of its work, yields so that something else can run, and then continues from where it left off, for example on the next frame. The advantage is that there is no thread context-switch overhead, the capacity of a single CPU is fully used, and resource consumption is low, which suits highly concurrent I/O. The disadvantage is just as obvious: there is no way to take advantage of multiple CPUs.
How to view the Linux system load
cat /proc/loadavg shows the system load averages over the last 1, 5, and 15 minutes. top shows the system's load statistics and the consumption of individual processes in real time.
UNIX provides five I/O models
Blocking I/O model: recvfrom is called in process space, and the system call does not return until the packet arrives and is copied into the application process's buffer, or an error occurs; the process blocks for the whole wait.
Non-blocking I/O model: recvfrom goes from the application layer to the kernel; if the buffer has no data, it returns an EWOULDBLOCK error, and the application then polls this state to see whether the kernel has data ready.
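A minimal Python sketch of the non-blocking model just described (the endpoint is hypothetical): recv() returns immediately, and EWOULDBLOCK/EAGAIN means the kernel buffer has no data yet, so the caller polls:

```python
import errno
import socket

s = socket.create_connection(("example.com", 80))  # hypothetical endpoint
s.sendall(b"GET / HTTP/1.0\r\nHost: example.com\r\n\r\n")
s.setblocking(False)

while True:
    try:
        data = s.recv(4096)          # returns at once whether or not data is ready
        break
    except OSError as e:
        if e.errno in (errno.EWOULDBLOCK, errno.EAGAIN):
            continue                 # nothing in the kernel buffer yet: poll again
        raise
print(len(data), "bytes received")
s.close()
```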
I/O multiplexing model:
Select/poll: the process passes one or more FDs to a select or poll system call, which sequentially scans these FDs for readiness; a single select call supports at most 1024 FDs, and the whole operation blocks on select.
Epoll: supports both level triggering and edge triggering, with level triggering as the default; Nginx, however, uses edge triggering by default. Epoll also only reports the file descriptors that are ready: when epoll_wait() is called to get the ready file descriptors, what is returned is not the actual descriptors but a value representing the number of ready descriptors, and the corresponding number of file descriptors is then read in order from the array that epoll fills in, using memory mapping (mmap), which eliminates the overhead of copying descriptors during the system call. Another essential improvement is that epoll uses event-based readiness notification: when a file descriptor becomes ready, a callback-like mechanism activates that descriptor, and the process is notified when it calls epoll_wait(). epoll_wait() only handles active connections; inactive or idle connections do not affect epoll's performance.
Signal-driven I/O model: first enable the socket's signal-driven I/O function and install a signal handler through the sigaction system call (this system call returns immediately and the process keeps working, so it is non-blocking). When the data is ready, a SIGIO signal is generated for the process; the signal callback notifies the application to call recvfrom to read the data and notifies the main loop to process it.
Asynchronous I/O model: tell the kernel to start an operation and let the kernel notify us when the entire operation is complete, including copying the data from the kernel into the user's own buffer. The difference between the asynchronous I/O model and the signal-driven I/O model: with signal-driven I/O the kernel notifies us when we can start an I/O operation, whereas with asynchronous I/O the kernel notifies us when the I/O operation has been completed.
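A minimal sketch of the epoll readiness notification described earlier in this section, using Python's select module (Linux only; the address and port are illustrative):

```python
import select
import socket

server = socket.socket()
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(("0.0.0.0", 8080))                 # illustrative address/port
server.listen(128)
server.setblocking(False)

ep = select.epoll()
ep.register(server.fileno(), select.EPOLLIN)   # level-triggered by default; OR in select.EPOLLET for edge triggering
conns = {}

while True:
    # epoll reports only the active (ready) descriptors; idle connections cost nothing here
    for fd, events in ep.poll(timeout=1):
        if fd == server.fileno():
            conn, _ = server.accept()
            conn.setblocking(False)
            conns[conn.fileno()] = conn
            ep.register(conn.fileno(), select.EPOLLIN)
        elif events & select.EPOLLIN:
            data = conns[fd].recv(4096)
            if data:
                conns[fd].send(b"echo: " + data)
            else:                              # peer closed the connection
                ep.unregister(fd)
                conns[fd].close()
                del conns[fd]
```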
Memory mapping
Associates a block of the memory address space with a specified disk file, which turns accesses to that memory into accesses to the disk file. This can improve I/O performance. There are two types of memory mapping, shared and private. A shared mapping can be shared by multiple processes, but it is less efficient than a private one because access needs to be synchronized.
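A minimal Python sketch of a memory-mapped file, assuming a hypothetical file data.bin of at least a few bytes; ACCESS_WRITE gives a shared mapping whose changes reach the file, while ACCESS_COPY would give a private, copy-on-write mapping:

```python
import mmap

with open("data.bin", "r+b") as f:             # hypothetical file
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE)   # shared mapping
    first_bytes = mm[:16]                      # reading the memory reads the file
    mm[:4] = b"HEAD"                           # writing the memory updates the file
    mm.flush()
    mm.close()
```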
Management of static content
Generally a CMS (content management system) is used for management, and the CMS can also help us update the static content when necessary.
The update policy for static pages
Regenerate the static content when the data is updated. In general, if there is no data update, nothing is re-published; instead there is a buffer of static content to update, content that needs updating is added to this buffer, and when a certain amount accumulates it is processed as a batch (see the sketch below). Alternatively, according to the real-time requirements of different content, set different time intervals and update periodically. In practice, both approaches are usually used together.
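A minimal sketch of that batching idea, assuming a hypothetical regenerate_page() that rebuilds one static page; pages are queued and regenerated when the queue reaches a threshold or a time interval has passed:

```python
import time

BATCH_SIZE = 100        # illustrative threshold
FLUSH_INTERVAL = 60     # seconds, tuned to the content's real-time requirements

pending = []
last_flush = time.time()

def regenerate_page(page_id):
    # placeholder for the real rebuild: render the template and write the HTML file
    print("rebuilding static page", page_id)

def mark_dirty(page_id):
    global last_flush
    pending.append(page_id)
    if len(pending) >= BATCH_SIZE or time.time() - last_flush >= FLUSH_INTERVAL:
        for p in pending:
            regenerate_page(p)
        pending.clear()
        last_flush = time.time()
```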
Local static content
SSI technology enables independent updates of individual parts of a page and avoids regenerating the entire static page, which greatly saves the computational and disk I/O overhead of rebuilding a whole page, and even the network I/O overhead when the page is distributed.
How the browser negotiates with the server whether a file can be cached
There are two main types of negotiation: Last-Modified time and ETag.
1. Last-Modified: the first response carries a Last-Modified time; on later requests the browser sends an If-Modified-Since time to the server, and the server checks whether that time matches the file being accessed. If it matches, the browser cache is used; if not, the file is fetched again. However, with a cluster it is difficult to ensure that the same file has a consistent last modified time on different servers.
2. ETag: different web servers have their own algorithms for generating the ETag, but the principle is the same: if a page's ETag has not changed, the page has definitely not been updated. Nginx's ETag = the file's last modified time (converted to hexadecimal) + the file's size (converted to hexadecimal); see the sketch after this list.
3. A file can also be cached using an Expires expiration time or a Cache-Control: max-age validity period. Unlike points 1 and 2 above, this does not send a request to the server at all, whereas the two methods above still need to send a request to the server to decide whether the cache can be used.
Applicable scenarios for browser caching
Cache large files to save bandwidth. For small files, and when bandwidth is not the constraint, browser caching does not increase the system's throughput rate.
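As a rough illustration of the Nginx-style ETag described in point 2 above, a minimal Python sketch (the file path is hypothetical) that builds the hexadecimal mtime-size pair:

```python
import os

def nginx_style_etag(path):
    # Nginx's default ETag: last modified time and file size, both in hexadecimal
    st = os.stat(path)
    return '"%x-%x"' % (int(st.st_mtime), st.st_size)

print(nginx_style_etag("index.html"))   # hypothetical file
```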
The browser's limit on concurrent requests for resources
Browsers limit the number of concurrent requests for resources within the same domain. So to let the browser request all of a page's resources concurrently and speed the page up, you can assign different domain names to different resources.
The TCP/IP network layered model
The TCP/IP model is divided into 5 layers: the application layer (application layer, presentation layer, session layer: the 5th layer, which can be further divided into 3 layers), the transport layer (4th layer), the network layer (3rd layer), the data link layer (2nd layer), and the physical layer (1st layer).
Application layer: mainly user-facing interaction; commonly used protocols are HTTP, FTP, and so on.
Transport layer: transports the application layer's data. Commonly used protocols are TCP (the reliable Transmission Control Protocol) and UDP (the User Datagram Protocol).
Network layer: processes the packets flowing through the network. Commonly used protocols are IP, ICMP, and ARP (which obtains the physical MAC address by analyzing the IP address).
Data link layer: handles the hardware side of the connection, including controlling the network card, hardware-related device drivers, and so on.
Physical layer: the hardware responsible for data transmission; commonly twisted-pair cable, wireless, optical fiber, and so on.
Data transfer process: first the application layer encapsulates the data message according to the protocol format and passes it to the transport layer; the transport layer wraps the data into a segment and passes it, via its protocol, to the network layer; the network layer encapsulates the segment into a packet and passes it to the data link layer, which encapsulates it as a data frame; the frame is then turned into bits at the physical layer, which sends them to the target as optical or electrical signals.
Seven-layer load balancing
Seven-layer load balancing mainly refers to today's mainstream reverse proxy servers. The core work of a reverse proxy server is to forward HTTP requests, so it works at the HTTP level, which is the application layer (the seventh layer of the OSI seven-layer model), and load balancing based on a reverse proxy is therefore also called seven-layer load balancing.
Four-layer load balancing
Four-layer load balancing is based on IP. NAT-based load balancing works at the transport layer (the fourth layer) and modifies the IP address and port information in packets, so it is also called four-layer load balancing.
1. LVS-NAT: use one server as the NAT scheduler (equivalent to the gateway) and load balance the back-end servers with LVS. Both the request and the returned response must pass through the NAT scheduler, so the scheduler's bandwidth can become a bottleneck. However, performance is much better than a reverse proxy server, because forwarding and scheduling happen inside the kernel.
2. LVS-DR (direct routing): the back-end servers are connected to the external network and configure the same IP alias, and LVS does the load balancing; responses are returned to the user directly by the back-end servers rather than through the scheduler, so the advantage over LVS-NAT is that there is no gateway bottleneck. The prerequisite is that the scheduler and the back-end servers must be in the same LAN segment.
3. LVS-TUN (IP tunneling): the principle is basically the same as LVS-DR. The difference is that the real servers and the scheduler do not have to be in the same LAN segment; the scheduler forwards requests to the real servers through IP tunneling. CDN services are based on IP tunneling technology.
Distributed computing framework
The map/reduce framework breaks a big job down into several small jobs, computes them separately in the map phase, and then aggregates the results in the reduce phase.
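A minimal word-count sketch of that idea in Python: each document is a small map task, and the partial counts are merged in a reduce step.

```python
from collections import Counter
from functools import reduce

documents = ["the quick brown fox", "the lazy dog", "the quick dog"]

# map: each document is counted independently (a small task)
mapped = [Counter(doc.split()) for doc in documents]

# reduce: the partial counts are aggregated into the final result
total = reduce(lambda a, b: a + b, mapped, Counter())
print(total.most_common(3))   # [('the', 3), ('quick', 2), ('dog', 2)]
```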