Reproduction of this article is prohibited. Produced internally by UC Browser.
0. Foreword and Outline
Browser caching and storage features fall into four categories:
- Loading process
  - Memory Cache
  - Application Cache (AppCache for short)
  - HTTP Cache
  - Cookie Storage
- JavaScript API
  - Web Storage
  - Indexed Database
  - File API
  - Cache Storage (the core of Service Worker)
  - Filesystem API
  - Quota Management API
- Back and forward navigation
  - Page Cache (back-forward cache)
  - History
- Save Web Page
List of terms
| English | Chinese meaning | Explanation |
| --- | --- | --- |
| Resource | Resources | Any file fetched over the network is called a resource: HTML documents, CSS, JavaScript, images, and so on. |
| Loader | Loading device | The browser module responsible for loading resources. |
| NET module/library | Network library | The module responsible for network I/O; it can be thought of simply as the implementation of the HTTP protocol. |
| Layout Engine | Typesetting engine | The module responsible for HTML parsing and load control; in the WebKit era it was called the render engine. |
1. Cache Overview
The module in the browser's layout engine that drives the resource-loading flow is called the loader. In Chromium, the loader controls, at a fine granularity, the resource loads initiated under the HTML standard (from both HTML tags and JavaScript). The actual network I/O is handled by a dedicated network module, and between the network module and the loader sits a layer called fetch. Fetch still belongs to the layout engine, while the network module is a separate layer outside it. Front-end developers only need to know that fetch contains the memory cache, which acts as the first-level cache. The loader looks for a cached resource in the memory cache, AppCache, and HTTP cache according to their respective conditions, moving on to the next cache when one misses.
// JavaScript pseudo-code describing the loading process
function loadResource(request) {
  CookieStorage.addCookieIfMatch(request);
  if (MemoryCache.containsValidCache(request)) {
    return MemoryCache.fetch(request);
  } else if (request.isFromAppCache) {
    if (AppCache.containsValidCache(request)) {
      return AppCache.fetch(request);
    } else {
      return AppCache.loadFromNetworkThenStore(request);
    }
  } else if (HttpCache.containsValidCache(request)) {
    return HttpCache.fetch(request);
  } else {
    return NetworkTransaction.fetch(request);
  }
}
Memory cache data always lives in RAM, whereas both AppCache and the HTTP cache are stored on disk. The design mirrors the CPU, memory, and disk hierarchy.
A disk is an external device: the CPU cannot access data on it directly, so the data is first read into memory and the CPU then works on the in-memory copy. The loader plays the role of the CPU. Whether data comes from the network or from the disk cache, it is organized and placed in memory before anything else is done with it. Later operations on those resources may then be served directly from the in-memory copy, which is very fast.
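The memory-in-front-of-disk arrangement can be sketched as a small read-through cache. This is only an illustration under stated assumptions, not Chromium's implementation: `TwoLevelCache`, `slowStore`, and `decode` are hypothetical names, and a plain `Map` stands in for the disk cache.

```javascript
// Hypothetical sketch: a fast in-memory level in front of a slower store.
// A Map stands in for the disk cache; real browsers use their own backends.
class TwoLevelCache {
  constructor(slowStore, decode) {
    this.memory = new Map();    // url -> decoded data kept in RAM
    this.slowStore = slowStore; // url -> raw data (disk stand-in)
    this.decode = decode;       // raw -> decoded form
    this.slowReads = 0;         // counts trips to the slow level
  }

  get(url) {
    if (this.memory.has(url)) {
      return this.memory.get(url); // served straight from RAM
    }
    if (!this.slowStore.has(url)) {
      return undefined; // a real loader would go to the network here
    }
    this.slowReads += 1;
    const decoded = this.decode(this.slowStore.get(url));
    this.memory.set(url, decoded); // keep the decoded copy for next time
    return decoded;
  }
}

const disk = new Map([["https://example.com/a.txt", "hello"]]);
const cache = new TwoLevelCache(disk, (raw) => raw.toUpperCase());
const first = cache.get("https://example.com/a.txt");
const second = cache.get("https://example.com/a.txt");
console.log(first, second, cache.slowReads);
```

The second read never touches the slow store, which is the effect the CPU and memory analogy describes.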
The HTTP cache, as the name suggests, caches HTTP(S) data streams. It is part of the network module, outside the layout engine, and its data is stored on disk. AppCache and the HTTP cache share the same disk cache implementation; they differ only in the conditions for storing and evicting entries, which follow their respective specifications. The specifications also indicate that whether to use AppCache is decided first.
The following figure shows the data flow through the caches:
(Apologies for the rough drawing.)
As you can see, the memory cache holds the decoded data as well, which is why it is particularly fast.
AppCache is enabled explicitly through the manifest attribute on the html tag. Because this is an opt-in action by the page, it is not covered in the cache chapters.
Whichever cache is involved, the URL is the key used to look up whether the response data is cached.
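Keying by URL means that any difference in the URL string, such as a changed query parameter, produces a separate cache entry. The sketch below illustrates this with a `Map`; `cacheKey` is a hypothetical helper, and dropping the fragment is an assumption based on the fact that fragments are not sent to the server.

```javascript
// Hypothetical helper: derive a cache key from a URL.
// Assumption: the fragment is dropped because it is never sent to the server.
function cacheKey(rawUrl) {
  const url = new URL(rawUrl);
  url.hash = "";
  return url.href;
}

const entries = new Map();
entries.set(cacheKey("https://example.com/lib.js?v=1"), "response for v=1");

// Same URL apart from the fragment: finds the stored entry.
const sameResource = entries.has(cacheKey("https://example.com/lib.js?v=1#top"));
// Different query string: a different key, so a miss.
const otherVersion = entries.has(cacheKey("https://example.com/lib.js?v=2"));
console.log(sameResource, otherVersion);
```

This is also why appending a version query string to a resource URL forces a fresh load.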
In an incognito window, Chromium writes no resources to disk; everything is kept in memory. Some other browsers, for the sake of user experience, still serve some resources from the HTTP cache. Doing so requires an algorithm that both protects privacy and allows the cache to be reused.
2. Memory Cache
Overview
No specification requires a memory cache; it is a browser optimization, though one that follows naturally from implementing the specifications. The browser window may need to be repainted at any moment, for example when the window is resized, the scroll position changes, or JS modifies the DOM, so all resources of the current page must be kept in memory for a quick response. In other words, as long as the user has not navigated away, the current page and all of its resources need to be held in memory. When this data is retained by some algorithm for a while after it is no longer needed, it becomes the memory cache.
Because the memory cache belongs to the layout engine, the loader can use its data directly, which makes it the most efficient cache. In keeping with the HTTP protocol, if a resource has an expiration time set, the memory cache stops treating it as a valid entry once it expires, even though a copy is still held in memory.
Beyond expiration, other conditions determine whether a cached entry can be used, for example: the method and body must match, the security policy must match (whether cookies or credentials are allowed), most headers must match, and so on. There are further considerations not listed here; most are described in the relevant specifications, and the list keeps growing as HTML5 features are added. One point worth mentioning: if a resource is still valid after revalidation (HTTP 304), the copy in the memory cache continues to be used rather than being evicted and re-read from disk.
Chromium code references:
RawResource::canReuse()
ResourceFetcher::determineRevalidationPolicy()
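The reuse conditions described above can be sketched roughly as follows. This is not the Chromium code referenced here: `decidePolicy`, `applyRevalidation`, and all field names are hypothetical, and a real engine compares more than this sketch does (for example most request headers).

```javascript
// Hypothetical sketch of the reuse checks; field names are invented.
function decidePolicy(entry, request, now) {
  if (!entry) return "load";
  const compatible =
    entry.method === request.method &&
    entry.body === request.body &&
    entry.credentials === request.credentials;
  if (!compatible) return "load";
  // An expired entry is not served as-is; it has to be revalidated first.
  return now < entry.expiresAt ? "use" : "revalidate";
}

// After revalidation, a 304 keeps the in-memory data and refreshes the
// expiry; any other status replaces the data.
function applyRevalidation(entry, status, freshData, newExpiresAt) {
  if (status === 304) {
    return { ...entry, expiresAt: newExpiresAt };
  }
  return { ...entry, data: freshData, expiresAt: newExpiresAt };
}

const entry = {
  method: "GET", body: null, credentials: "include",
  data: "v1", expiresAt: 1000,
};
const request = { method: "GET", body: null, credentials: "include" };

const beforeExpiry = decidePolicy(entry, request, 500);
const afterExpiry = decidePolicy(entry, request, 1500);
const mismatch = decidePolicy(entry, { ...request, credentials: "omit" }, 500);
const revalidated = applyRevalidation(entry, 304, null, 2000);
console.log(beforeExpiry, afterExpiry, mismatch, revalidated.data);
```

Timestamps are plain numbers here to keep the example deterministic; a real implementation would derive expiry from the response headers.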
Content
Both the raw data and the decoded data are cached. Text is UTF-8 decoded, and images are decoded into RGBA pixel sequences.
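Because a decoded RGBA image uses four bytes per pixel, the decoded copy can be far larger than the compressed file. A quick calculation, with `decodedImageBytes` as a hypothetical helper name:

```javascript
// Decoded RGBA bitmaps use 4 bytes per pixel, whatever the file size was.
function decodedImageBytes(width, height) {
  return width * height * 4;
}

const bytes = decodedImageBytes(1000, 1000);
console.log(bytes, (bytes / (1024 * 1024)).toFixed(2) + " MiB");
```

A 1000 x 1000 image therefore occupies 4,000,000 bytes once decoded, regardless of how small the original file is.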
Capacity
The memory cache implementation relies on one important concept: resources used by the current page are called active resources, and after the user leaves the page, resources that the new page does not use become inactive resources. The memory cache's capacity limit applies only to inactive resources and is 8 MB, counting both raw and decoded data. Active resources are not limited, and they are not released even when they are not visible. An ordinary infinite-scrolling page will therefore exhaust memory sooner or later, causing the browser to stutter or even crash. One improvement the front end should make is to release elements dynamically: once an element is far from the visible area, remove it from the DOM tree and drop every reference to it. Alternatively, simply move the img tag's src attribute to a different attribute name (such as data-src).
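The "release elements far from the visible area" advice can be sketched as a pure selection function. Everything here is an assumption for illustration: `selectItemsToRelease`, the item shape, and the `margin` parameter are hypothetical, and the DOM wiring is left out so the logic can run anywhere.

```javascript
// Hypothetical helper: pick the items far enough from the viewport to release.
// items: [{ id, top, height }] in page coordinates; margin is a tunable
// assumption for how much off-screen content to keep around.
function selectItemsToRelease(items, scrollTop, viewportHeight, margin) {
  const keepFrom = scrollTop - margin;
  const keepTo = scrollTop + viewportHeight + margin;
  return items
    .filter((item) => item.top + item.height < keepFrom || item.top > keepTo)
    .map((item) => item.id);
}

const items = [
  { id: "a", top: 0, height: 200 },
  { id: "b", top: 3000, height: 200 },
  { id: "c", top: 3300, height: 200 },
  { id: "d", top: 9000, height: 200 },
];
const released = selectItemsToRelease(items, 3000, 800, 1000);
console.log(released);
```

In a real page, the returned elements would be removed from the DOM with all references dropped (or have their image source moved aside), leaving a placeholder of the same height so the scroll position does not jump.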
Eviction
Eviction algorithm: LRU-SP (a Size-adjusted and Popularity-aware extension to Least Recently Used), that is, least-recently-used eviction that also takes size into account. See this paper:
http://www.is.kyusan-u.ac.jp/~chengk/pub/papers/compsac00_A07-07.pdf
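To give a feel for how size and popularity can be combined with recency, here is a simplified sketch loosely based on that idea. It is neither the paper's exact algorithm nor Chromium's code: `SizeAwareLruCache` is a hypothetical class, and bucketing entries by `log2(size / accessCount)` is this sketch's own way of ranking large, rarely used entries as the first victims.

```javascript
// Simplified sketch, not the exact LRU-SP algorithm.
// Entries are bucketed by floor(log2(size / accessCount)); eviction starts
// in the highest bucket (large, rarely used) and takes the least recently
// used entry found there.
class SizeAwareLruCache {
  constructor(capacityBytes) {
    this.capacityBytes = capacityBytes;
    this.usedBytes = 0;
    this.entries = new Map(); // key -> { size, accessCount, lastUsed }
    this.clock = 0;           // logical time, bumped on every access
  }

  touch(key, size) {
    let entry = this.entries.get(key);
    if (!entry) {
      entry = { size, accessCount: 0, lastUsed: 0 };
      this.entries.set(key, entry);
      this.usedBytes += size;
    }
    entry.accessCount += 1;
    entry.lastUsed = ++this.clock;
    this.prune(key);
  }

  bucketOf(entry) {
    return Math.floor(Math.log2(entry.size / entry.accessCount));
  }

  // Evict until within capacity, never evicting the entry just touched.
  prune(protectedKey) {
    while (this.usedBytes > this.capacityBytes) {
      let victimKey = null;
      let victim = null;
      for (const [key, entry] of this.entries) {
        if (key === protectedKey) continue;
        const better =
          victim === null ||
          this.bucketOf(entry) > this.bucketOf(victim) ||
          (this.bucketOf(entry) === this.bucketOf(victim) &&
            entry.lastUsed < victim.lastUsed);
        if (better) {
          victimKey = key;
          victim = entry;
        }
      }
      if (victimKey === null) break; // nothing else to evict
      this.entries.delete(victimKey);
      this.usedBytes -= victim.size;
    }
  }
}

const lru = new SizeAwareLruCache(1000);
lru.touch("small.css", 100);
lru.touch("small.css", 100); // second use raises its popularity
lru.touch("big.png", 800);
lru.touch("new.js", 300);    // over capacity: something must go
console.log([...lru.entries.keys()], lru.usedBytes);
```

Plain LRU would evict `small.css` first here because it was used least recently; the size and popularity weighting evicts `big.png` instead, freeing enough space in one step.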
Hit rate
For any cache, hit rate is one of the performance metrics. For the memory cache, it is the proportion of inactive resources that get reused. Given the information in the previous sections, raising the hit rate naturally generally requires the user to keep browsing the same site, because resource reuse is highest within one site, for example pages that reference the same jQuery URL.
From this point of view, small and medium-sized sites can gain some speedup by referencing resources from the CDN of a high-traffic site. (CDN reference: http://www.jq22.com/cdn/)
According to statistics, the share of hits by resource type is: images > JS > CSS.
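As a metric, hit rate is simply hits divided by total lookups. A minimal counter, with `HitRateCounter` as a hypothetical name, makes the arithmetic concrete:

```javascript
// Hypothetical counter; "hit" means the lookup was answered from the cache.
class HitRateCounter {
  constructor() {
    this.hits = 0;
    this.lookups = 0;
  }
  record(hit) {
    this.lookups += 1;
    if (hit) this.hits += 1;
  }
  get rate() {
    return this.lookups === 0 ? 0 : this.hits / this.lookups;
  }
}

const counter = new HitRateCounter();
[true, true, false, true].forEach((hit) => counter.record(hit));
console.log(counter.rate); // 3 hits out of 4 lookups
```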
Web Development tips for browser insider caching and Storage (1)