Startup speed matters a great deal. How can we improve it? There is one simple guiding principle: locality. Now that computation keeps getting faster, the performance bottleneck usually lies in I/O (machines with SSDs are far faster than those with mechanical hard drives). Reducing the number of disk reads while a program runs is an effective way to speed it up. Fewer disk reads mean fewer page faults, which is achieved by loading most of the data the program needs into physical memory ahead of time; hence the term "pre-read" (read-ahead).
I. System Support for Startup Acceleration
1. Prefetch support
Every time an application starts, the operating system records which files it accessed and at which locations, and stores this information under \Windows\Prefetch. For example, on my machine I easily found a CHROME.EXE-D999B1BA.pf file. The next time the program starts, the system first transfers the data the program will need into memory. Because the system knows in advance all the disk locations that will be read, it can read them in as close to sequential order as possible, reducing both the number of I/O operations and the seek time spent reading.
2. Spatial locality support
The operating system's minimum I/O unit is a 4 KB page. When a page fault triggers disk I/O, the system generally does not fetch only the page that was actually requested; it reads adjacent data into physical memory as well. If that adjacent data happens to be accessed soon afterwards, further I/O is avoided.
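Because faults are resolved at page granularity, two accesses that land on the same 4 KB page cost at most one fault. A minimal sketch of the arithmetic, assuming the 4 KB page size mentioned above (real code would query the OS, e.g. GetSystemInfo on Windows):

```cpp
#include <cstddef>

// Assumed 4 KiB page size, as in the text above; real code should
// query the operating system instead of hard-coding it.
constexpr std::size_t kPageSize = 4096;

// Index of the page containing a given file or memory offset.
constexpr std::size_t PageIndex(std::size_t offset) {
    return offset / kPageSize;
}

// Page-aligned start of the page containing the offset.
constexpr std::size_t PageStart(std::size_t offset) {
    return PageIndex(offset) * kPageSize;
}
```

Offsets 4095 and 4096 fall on pages 0 and 1 respectively: touching the first may fault in page 0, and read-ahead of neighbouring pages often means page 1 is already resident by the time it is needed.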
3. Cache support
The system's virtual memory manager and cache manager also help here. After a program exits, the physical memory occupied by its code and data is not released immediately; it is kept around for a while. If the program is then started a second time, that data does not need to be read from disk again, so there is less I/O and the start is faster. This could be called the "temporal locality principle": after a user opens a program once, they are likely to open it again.
The system already does so much to accelerate program startup that there is very little left for us to do. But "very little" does not mean nothing.
II. A Method of Cold Start Acceleration
A cold start is the first launch of an application since the operating system booted; a hot start, by contrast, is any subsequent launch.
From the perspective of reducing I/O time, the ideal is for all the data to already be in physical memory at startup, so that nothing needs to be transferred from disk at all. A hot start can approach this (though we cannot confirm it, because the system's cache management is transparent to the program). For a hot start there is little we can do about I/O; it is already very good. What can be optimized is the cold start, which inevitably triggers I/O for a large amount of data. How can the number of I/O operations, and the time they take, be reduced? Scattered, random disk reads are significantly slower than centralized reads, so the way to cut I/O time is to convert random, scattered reads into centralized, sequential reads.
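In miniature, that conversion looks like the following sketch (the function name is my own): given the set of file offsets a program is known to need, for example from a prefetch trace, visiting them in sorted order replaces back-and-forth seeking with a single forward sweep over the disk.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Given the file offsets a program is known to need, reorder them so
// the disk is traversed front to back in one pass, instead of seeking
// back and forth in whatever order the program happens to request.
std::vector<std::size_t> ToSequentialOrder(std::vector<std::size_t> offsets) {
    std::sort(offsets.begin(), offsets.end());
    return offsets;
}
```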
The idea is actually very simple: before the program starts, read the dynamic libraries it uses as if they were ordinary data files. After this one centralized read, the system has mapped the disk data into physical memory, and by the temporal locality principle that mapping is kept around for a while. When the program then starts, the system no longer needs to read the disk on demand, and startup is faster.
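A portable sketch of this pre-read idea (the function name is mine; Chromium's real implementation in image_pre_reader_win.cc is Windows-specific and more careful): read the whole file sequentially into a scratch buffer and throw the data away, purely so that the OS file cache ends up warm.

```cpp
#include <cstddef>
#include <fstream>
#include <vector>

// Sequentially reads an entire file into a throwaway buffer so the OS
// file cache is warm before the loader needs the file; later page
// faults then hit memory instead of disk. Returns the bytes touched,
// or 0 if the file could not be opened.
std::size_t PreReadFile(const char* path) {
    std::ifstream file(path, std::ios::binary);
    if (!file) {
        return 0;
    }
    std::vector<char> buffer(1024 * 1024);  // 1 MiB chunks: large, sequential reads
    std::size_t total = 0;
    while (file.read(buffer.data(), buffer.size()) || file.gcount() > 0) {
        total += static_cast<std::size_t>(file.gcount());
    }
    return total;  // the data itself is discarded; only the cache effect matters
}
```

Calling something like this on chrome.dll just before LoadLibrary would approximate what Chromium's pre-reader does, under the assumption that the system keeps the freshly read pages cached.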
Chromium does exactly this. The LoadChromeWithDirectory() function in \src\chrome\app\client_util.cc pre-reads the dynamic library before loading chrome.dll. The pre-read code lives in \src\chrome\app\image_pre_reader_win.cc; Win7 is handled differently from XP, presumably because different system versions support caching and locality differently. The Chromium code is good enough that we can use it directly without spending time researching the system support ourselves.
III. The Effect of Chromium's Pre-Read
Using Process Monitor to count how many times chrome.dll triggers ReadFile, I found that ReadFile is still triggered while the program runs, which presumably depends on how much physical memory the system currently has available. Testing showed the best case is opening the Chromium browser right after the machine boots: chrome.dll's ReadFile calls then occur only during the pre-read at startup. The worst case is when system memory pressure is high: the system cannot allocate enough physical memory to the Chromium process, so page faults occur even after the pre-read completes and the pre-read data is paged out of physical memory again, making the pre-read useless. Also, watching the duration of each ReadFile in Process Monitor, I noticed the times vary: a long read is often followed by several much shorter ones, which suggests the disk itself also caches according to the locality principle.
Chromium also triggers the pre-read on hot starts, where its effect is presumably limited; removing it there might speed hot starts up. How can we tell a cold start from a hot start? One way is to use atoms. This works for GUI applications but not for console programs; see MSDN for details. Sample code:
bool IsColdStartup()
{
    static int nRet = -1;
    if (nRet != -1)
    {
        return nRet == 1;
    }
    nRet = 0;
    // If the atom does not exist yet, this is the first launch since boot.
    ATOM atom = ::GlobalFindAtom(L"cswuyg_test_cold_startup");
    if (atom == 0)
    {
        nRet = 1;
        ::GlobalAddAtom(L"cswuyg_test_cold_startup");
    }
    return nRet == 1;
}
IV. The Process Monitor Tool
While observing ReadFile operations in Process Monitor, I was confused by the Fast I/O, Paging I/O, and Non-cached I/O shown in the output, so I searched for some information. Roughly:
1. Paging I/O reads from the disk.
2. Non-cached generally means the data is not in the cache and must be read from disk, or that caching was deliberately bypassed.
3. If the data is in the cache (cached), fast I/O is possible.
Detailed information is shown below:
1. The diagram in the following slide deck basically explains it all:
From: http://i-web.i.u-tokyo.ac.jp/edu/training/ss/lecture/new-documents/Lectures/15-CacheManager/CacheManager.pdf
2. Glossary
Q25: What is the difference between cached I/O, user non-cached I/O, and paging I/O?
In a file system or file system filter driver, read and write operations fall into several different categories. For the purpose of discussing them, we normally consider the following types:
- Cached I/O. This includes normal user I/O, both via the fast I/O path as well as via the IRP_MJ_READ and IRP_MJ_WRITE path. It also includes the MDL operations (where the caller requests the FSD return an MDL pointing to the data in the cache).
- Non-cached user I/O. This includes all non-cached I/O operations that originate outside the virtual memory system.
- Paging I/O. These are I/O operations initiated by the virtual memory system in order to satisfy the needs of the demand paging system.
Cached I/O is any I/O that can be satisfied by the file system data cache. In such a case, the operation is normally to copy the data from the virtual cache buffer into the user buffer. If the virtual cache buffer contents are resident in memory, the copy is fast and the results are returned to the application quickly. If the virtual cache buffer contents are not all resident in memory, the copy process will trigger a page fault, which generates a second re-entrant I/O operation via the paging mechanism.
Non-cached user I/O is I/O that must bypass the cache, even if the data is present in the cache. For read operations, the FSD can retrieve the data directly from the storage device without making any changes to the cache. For write operations, however, an FSD must ensure that the cached data is properly invalidated (if this is even possible, which it will not be if the file is also memory mapped).
Paging I/O is I/O that must be satisfied from the storage device (whether local to the system or located on some "other" computer system) and is being requested by the virtual memory system as part of the paging mechanism (and hence has special rules that apply to its behavior as well as its serialization).
From: http://www.osronline.com/article.cfm?article=17#q25
3. What does "fast I/O disallowed" mean?
I noticed this "FAST IO DISALLOWED" against the CreateFile API used in an exe. What does this error mean? It's benign, but the explanation is a bit long. Basically, for a few I/O operations there are two ways that a driver can service the request. The first is through a procedural interface where the driver is called with a set of parameters that describe the I/O operation. The other is an interface where the driver receives a packetized description of the I/O operation. The former interface is called the "fast I/O" interface and is entirely optional; the latter is the IRP-based interface and what most drivers use. A driver may choose to register for both interfaces and in the fast I/O path simply return a code that means, "Sorry, can't do it via the fast path, please build me an IRP and call me at my IRP-based entry point." This is what you're seeing in the Process Monitor output: someone is returning "no" to the fast I/O path, and this results in an IRP being generated and the normal path being taken.
Fast I/O is optional; if a driver does not support it, the request is disallowed and falls back to the IRP path. So "fast I/O disallowed" by itself does not tell you whether the cache was hit.
From: http://forum.sysinternals.com/what-is-fast-io-disallowed_topic23154.html
Chapter 5 of Windows NT File System Internals also explains this topic.
V. References
1. Chapters 2 and 10 of C++ Application Performance Optimization
2. The startup acceleration code in the Chromium source
3. My earlier reading notes on C++ Application Performance Optimization: http://www.cnblogs.com/cswuyg/archive/2010/08/27/1809808.html