The following is a draft chapter from "The Performance of Open Source Applications" (POSA), the successor to The Architecture of Open Source Applications. POSA includes a number of chapters on performance optimization and design, as well as on managing performance in the development process, and is expected to be available in spring 2013.
by Ilya Grigorik on January 31, 2013 (translated by Horky [http://blog.csdn.net/horkychen])
Google Chrome's history and guiding principles
(Translator's note: this part is not translated in detail; only the key points are kept.)
The core principles that drive Chrome forward are:
- Speed: be the fastest browser.
- Security: provide users with the most secure Internet environment.
- Stability: provide a resilient and stable web application platform.
- Simplicity: wrap sophisticated technology in a simple user experience.
This article focuses on the first principle, speed, and looks at all aspects of performance.
A modern browser is a platform, just like an operating system. Browsers before Chrome were single-process applications in which all pages shared the same address space and resources. The multi-process architecture is Chrome's best-known improvement (details that have been covered many times elsewhere are omitted here).
Within a process, a web application mainly needs to do three things: fetch resources, lay out and render the page, and run JavaScript. Rendering and scripting take turns on a single thread in order to keep the DOM consistent, and JavaScript itself is a single-threaded language. So optimizing rendering and scripting is extremely important, both for page developers and for browser developers.
Chrome's rendering engine is WebKit, and its JavaScript engine is the heavily optimized V8 ("V8" JavaScript runtime). However, if the network is slow, no amount of optimization of V8's JavaScript execution or WebKit's parsing and rendering helps much: the browser still has to sit and wait for the data!
Where the user experience is concerned, the most visible gains come from optimizing the load order, the priority, and the latency of each network resource. You may not realize it, but Chrome's network module improves every day, steadily reducing the cost of loading each resource: it learns from DNS lookups, remembers the topology of the web, pre-connects to likely destination URLs, and much more. From the outside it looks like a simple resource-loading mechanism, but inside it is a fascinating world.
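The same ideas are also exposed to page authors through resource hints. Here is a minimal, illustrative sketch, assuming a hypothetical asset host cdn.example.com (dns-prefetch was supported by Chrome at the time; preconnect arrived in later versions of the Resource Hints spec):

```typescript
// Illustrative resource hints that mirror the browser's own predictive
// optimizations. "cdn.example.com" is a hypothetical host.

// Hint the browser to resolve the hostname ahead of time (DNS only).
const dnsPrefetch = document.createElement("link");
dnsPrefetch.rel = "dns-prefetch";
dnsPrefetch.href = "//cdn.example.com";
document.head.appendChild(dnsPrefetch);

// Hint the browser to open a full connection (DNS + TCP + TLS) early.
const preconnect = document.createElement("link");
preconnect.rel = "preconnect";
preconnect.href = "https://cdn.example.com";
document.head.appendChild(preconnect);
```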
About web applications
Before we get started, let's look at what web pages and web applications look like today.
The HTTP Archive project has been tracking how web pages are built. In addition to page content, it analyzes the number and types of resources, headers, and other metadata for the different destination addresses used by popular pages. The following are the statistics for January 2013, averaged over roughly 300,000 target pages:
- Approximately 1280 KB in size
- Composed of 88 resources (images, JavaScript, CSS, ...)
- Connecting to more than 15 distinct hosts
These numbers have been steadily increasing over the past few years, with no sign of stopping; we keep building larger and more ambitious web applications. Also note that, on average, each resource is only about 12 KB, which means most network transfers are short and bursty. That is at odds with TCP, which is optimized for large, streaming downloads, and exactly because of this it introduces some complications. Let's peel apart an example and take a closer look...
The life of a resource request

The Navigation Timing specification defines a set of APIs that let you observe the timing and performance data behind each request the browser makes. Let's look at some of the details. Given the address of a web resource, the browser first checks its local cache and the application cache. If the resource was fetched before and the appropriate cache headers (such as Expires, Cache-Control, etc.) are present, the request is answered from the cache; after all, the fastest request is a request not made. Otherwise, we revalidate the resource; if it has expired, or has never been seen at all, an expensive network request must be sent.
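Here is a minimal sketch of observing these phases from a page. It uses the later Navigation Timing Level 2 entry type (older browsers exposed the same data through performance.timing), so treat it as illustrative:

```typescript
// Minimal sketch: read the network phases of the main document load.
const [nav] = performance.getEntriesByType(
  "navigation"
) as PerformanceNavigationTiming[];

if (nav) {
  const dns = nav.domainLookupEnd - nav.domainLookupStart; // DNS lookup
  const tcp = nav.connectEnd - nav.connectStart;           // TCP (+ TLS) handshakes
  const tls = nav.secureConnectionStart > 0
    ? nav.connectEnd - nav.secureConnectionStart           // TLS portion only
    : 0;
  const ttfb = nav.responseStart - nav.requestStart;       // request sent -> first byte
  console.log({ dns, tcp, tls, ttfb });
}
```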
Given a hostname and a resource path, Chrome first checks whether there is an existing open connection it can reuse; sockets are pooled by {scheme, host, port}. If a proxy is configured, or a proxy auto-config (PAC) script is specified, Chrome instead checks connections through the appropriate proxy: a PAC script can return different proxies based on the URL or other rules, and each proxy can have its own socket pool. Finally, if none of the above applies, the request starts with a DNS lookup to obtain the server's IP address.
With luck, the hostname is already cached. Otherwise, a DNS query must be made first. The time this takes depends on your ISP, the popularity of the page, how likely the hostname is to sit in an intermediate cache, and the response time of the authoritative name servers. In other words, there are a lot of variables in play, and it is not unusual for a DNS lookup to take hundreds of milliseconds. With the resolved IP, Chrome opens a new TCP connection to the target address, which means we have to perform the three-way handshake: SYN > SYN-ACK > ACK. This exchange must be completed for every new TCP connection, and there is no shortcut. Depending on the distance and the routing path, this can take up to hundreds of milliseconds, or even seconds. And at this point we still have not received a single byte of actual data.
When the TCP handshake is complete, if we are connecting to an HTTPS address there is also an SSL handshake, which adds up to two more round trips of delay. If the SSL session is cached, it takes only one.
Finally, Chrome sends the HTTP request (marked as requestStart). Once the server receives the request, it processes it and sends the response back to the client; that adds at least one round trip of network delay plus the server's processing time, and then the request is complete. But what if it is an HTTP redirect? Then we get to start this whole process over again. If your page has some gratuitous redirects, you may want to think twice!
Have you been adding up all of the delays? Let's assume a typical broadband environment: no local cache, a relatively fast DNS lookup (50ms), a TCP handshake, an SSL negotiation, a fairly fast server response time (100ms), and an 80ms round trip (the average within the continental United States):
- 50ms for DNS
- 80ms for TCP handshake (one RTT)
- 160ms for SSL handshake (two RTTs)
- 40ms for the request to travel to the server
- 100ms for server processing
- 40ms for the response to travel back from the server
That is 470 milliseconds for a single request, of which roughly 80% is network latency rather than server time. See, we really do have a lot of work to do! In fact, 470 milliseconds is already optimistic:
- If the server response does not fit within the initial TCP congestion window (4-15 KB), more round trips of delay are introduced.
- SSL latency can get even worse: if we need to fetch a missing certificate or perform an online certificate status check (OCSP), that requires a new TCP connection and can add hundreds of milliseconds to several seconds of extra delay.
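A minimal sketch of the arithmetic above, using the illustrative figures from this example rather than measurements:

```typescript
// Illustrative latency budget from the example above (milliseconds).
const phases = {
  dns: 50,
  tcpHandshake: 80,          // one RTT
  sslHandshake: 160,         // two RTTs
  requestPropagation: 40,    // client -> server
  serverProcessing: 100,
  responsePropagation: 40,   // server -> client
};

const total = Object.values(phases).reduce((a, b) => a + b, 0); // 470ms
const network = total - phases.serverProcessing;                // 370ms
console.log(`${total}ms total, ${((network / total) * 100).toFixed(0)}% of it network latency`);
```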
What counts as "fast enough"?
As we saw above, server response time accounted for only 20% of the total delay; the rest was taken up by DNS, handshakes, and other network operations. User experience research has shown how users react to different delay times:
| Delay | User reaction |
|---|---|
| 0-100ms | Feels instant |
| 100-300ms | Feels slightly slow |
| 300-1000ms | The machine is working... |
| 1s+ | Thinking about something else... |
| 10s+ | I'll come back to it later... |
The table above also applies to page performance: to hold the user's attention, the page needs to render, or at least give some visible response, within 250ms. And this is not simply about speed for its own sake: studies from Google, Amazon, Microsoft, and thousands of other sites show that additional latency directly affects how a page performs, with faster pages earning more page views, higher user engagement, and higher conversion rates.
So the target is 250ms, yet the example above tells us that the DNS lookup, the TCP and SSL handshakes, and the request propagation alone took 370ms; even without counting any server processing time, we are already more than 50% over budget.
For the vast majority of users and web developers, DNS, TCP, and SSL latency are invisible; few people ever think about them. That is exactly why Chrome's network module is so complex.
Now that we have identified the problem, let's dive into the implementation details.
Inside Chrome's network module

Multi-process architecture
Chrome's multi-process architecture has important implications for how the browser handles network requests, and it currently supports four different execution models.
By default, desktop Chrome uses the process-per-site model, which isolates different sites from one another while grouping pages of the same site into the same process. To keep things simple, though, you can just think of each tab as having its own process. From a network performance point of view the difference is not fundamental; process-per-tab is simply easier to reason about.
Each tab has a render process, which contains the WebKit layout engine for interpreting and laying out the HTML, the V8 JavaScript engine, and the DOM bindings that glue the two together; if you are curious about this part, there is a great introduction to the plumbing here.
Each render process runs in a sandbox and has only very limited access to the user's computer, including the network. To obtain those resources, each render process has to communicate with the browser (kernel) process, which can then enforce security and access policies on every renderer.

Inter-process communication (IPC) and multi-process resource loading
All communication between a render process and the browser process happens over IPC. On Linux and Mac OS, a socketpair() is used, which provides an asynchronous, named-pipe style of communication. Every message from a render process is serialized and handed to a dedicated I/O thread, which sends it to the browser process. On the receiving end, the browser process provides a filter interface (ResourceMessageFilter) for intercepting resource-related IPC requests; this filter belongs to the network module.
One of the benefits of this design is that all resource requests are handled on the I/O thread, so neither UI-generated activity nor network events interfere with each other. The resource filter running on the I/O thread of the browser (kernel) process parses each resource request message and forwards it to a ResourceDispatcherHost singleton for handling.
This single interface allows the browser to control each render process's access to the network, and it also enables efficient and consistent sharing of resources:
- Socket pool and connection limits: the browser limits each profile to 256 open sockets, each proxy to 32 sockets, and each {scheme, host, port} group to 6 sockets. In other words, at most six HTTP and six HTTPS connections may be open to the same {host, port} at the same time (a small model of these limits is sketched after this list).
- Socket reuse: persistent TCP connections are kept in the socket pool for reuse, which avoids the extra DNS, TCP, and SSL (if needed) setup cost of establishing a new connection.
- Socket late binding: a request is bound to an actual TCP connection only once the socket is ready to send data. This creates an opportunity to prioritize requests (a higher-priority request may arrive while a socket is still connecting) and also improves throughput (a request can reuse a socket that has just become available, taking advantage of a fully established, warm TCP connection). Traditional TCP pre-connect and several other optimizations achieve the same effect.
- Consistent session state: authentication, cookies, and cached data are shared across all render processes.
- Global resource and network optimizations: the browser can make better decisions across all render processes and their outstanding requests, for example giving higher priority to requests from the foreground tab.
- Predictive optimizations: by observing network activity, Chrome builds and continuously refines predictive models to improve performance.
- ... and the list keeps growing.
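Here is a small, purely hypothetical model of the connection limits described above; the numbers come from the list, but the types and class names are illustrative, not Chrome's actual code:

```typescript
// Hypothetical model of the socket-pool limits (not real Chrome code).
// Per-proxy limits (32 sockets) are omitted to keep the sketch short.
interface SocketGroupKey {
  scheme: "http" | "https";
  host: string;
  port: number;
}

const MAX_SOCKETS_PER_PROFILE = 256;
const MAX_SOCKETS_PER_GROUP = 6;

class SocketPool {
  private perGroup = new Map<string, number>();
  private total = 0;

  // True if a new connection to this {scheme, host, port} group may be
  // opened; otherwise the request waits (late binding) until an existing
  // socket in the group becomes available for reuse.
  canOpen(key: SocketGroupKey): boolean {
    const inGroup = this.perGroup.get(this.id(key)) ?? 0;
    return this.total < MAX_SOCKETS_PER_PROFILE && inGroup < MAX_SOCKETS_PER_GROUP;
  }

  opened(key: SocketGroupKey): void {
    const id = this.id(key);
    this.perGroup.set(id, (this.perGroup.get(id) ?? 0) + 1);
    this.total++;
  }

  private id(key: SocketGroupKey): string {
    return `${key.scheme}://${key.host}:${key.port}`;
  }
}
```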
From the point of view of a render process, issuing a resource request is easy: it sends a message over IPC carrying a unique request ID, and the browser kernel process takes care of the rest.
Cross-platform resource loading
Cross-platform support is another major consideration for Chrome's network module, which has to run on Linux, Windows, OS X, Chrome OS, Android, and iOS. To that end, the network module is implemented as a mostly single-threaded (there are separate cache and proxy threads) cross-platform library. This lets every platform share the same infrastructure and the same performance optimizations, and gives Chrome the chance to optimize for all platforms at once.
The relevant code lives in the "src/net" subdirectory. This article will not walk through every component in detail, but a look at how the code is organized gives a good sense of its capabilities. For example:
| Directory | Description |
|---|---|
| net/android | Bindings to the Android runtime |
| net/base | Common networking utilities, such as host resolution, cookies, network change detection, and SSL certificate management |
| net/cookies | Storage, management, and retrieval of HTTP cookies |
| net/disk_cache | Disk and memory cache implementation |
| net/dns | An asynchronous DNS resolver |
| net/http | HTTP protocol implementation |
| net/proxy | Proxy (SOCKS and HTTP) configuration, resolution, script fetching, ... |
| net/socket | Cross-platform implementations of TCP sockets, SSL streams, and socket pools |
| net/spdy | SPDY protocol implementation |
| net/url_request | URLRequest, URLRequestContext, and URLRequestJob implementations |
| net/websockets | WebSockets protocol implementation |
Each of these components is worth reading; the code is well organized, and you will find plenty of unit tests throughout.
Architecture and performance on the mobile platform
Mobile browser usage is growing rapidly, and the Chrome team treats optimizing the mobile experience as its highest priority. The first thing to say is that mobile Chrome is not a straight port of the desktop version, because that would not deliver a good user experience at all. By its very nature, mobile is a resource-constrained environment with some fundamentally different operating parameters:
- Desktop users navigate with a mouse, may have overlapping windows and a large screen, do not need to worry about battery life, usually have a stable network connection, and enjoy plentiful storage and memory.
- Mobile users rely on touch and gestures, work on a small screen with a limited battery, often connect over slow and expensive networks, and have quite limited storage and memory.
In addition, there is no such thing as a typical mobile device; instead, there is a huge range of hardware with wildly varying capabilities, and Chrome has to try to accommodate all of them. Fortunately, Chrome's different execution models make this problem much easier to handle!
On Android, Chrome uses the same multi-process architecture as the desktop version: one browser kernel process and one or more render processes. Because of memory constraints, however, mobile Chrome cannot run a dedicated render process for every tab; instead, it decides how many render processes to run based on available memory and other conditions, and then shares them across multiple tabs.
When memory is truly scarce, or when Chrome cannot run multiple processes for other reasons, it switches to a single-process, multi-threaded model. On iOS devices, for example, Chrome has to run in this mode because of the platform's sandboxing restrictions.
As for network performance, Chrome uses the same network module on Android and iOS as on every other platform. This enables the same cross-platform network optimizations and is one of Chrome's clear advantages. What differs is the tuning: adjustments based on network conditions and device capabilities, including the priority of speculative optimizations, socket timeout and management logic, cache sizes, and so on.
For example, to preserve battery life, mobile Chrome leans toward lazy closing of idle sockets, usually keeping the radio quiet by closing old sockets only when new ones are opened. Likewise, because pre-rendering (described later) can consume significant network and processing resources, it is usually enabled only on WiFi.
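For context, pre-rendering is something a page can request through a resource hint; a minimal, illustrative sketch (the target URL is hypothetical, and the browser is free to ignore the hint, for instance on a cellular connection):

```typescript
// Illustrative: hint that the user is likely to navigate to the next page,
// so the browser may pre-render it. "https://example.com/next" is a
// hypothetical URL, and the browser may ignore the hint (e.g. off WiFi).
const hint = document.createElement("link");
hint.rel = "prerender";
hint.href = "https://example.com/next";
document.head.appendChild(hint);
```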
The mobile browsing experience deserves a chapter of its own, perhaps in a future installment of the POSA series.
Reprint: please credit the source: http://blog.csdn.net/horkychen
Original address: http://www.igvita.com/posa/high-performance-networking-in-google-chrome/
From: http://blog.csdn.net/horkychen/article/details/9708103
High Performance Networking in Google Chrome (I)