Until Windows 8, Microsoft did not offer an app store the way Apple did, so both developing and using software on this platform came with a certain barrier to entry. For ordinary users, making a trip to the electronics market to buy an office or entertainment software CD-ROM is not something everyone enjoys, and in China's less than healthy software ecosystem, finding a malware-free download on the major software sites is no easy task either. For developers it is even harder: besides the features the software is supposed to deliver, they must handle a series of extra chores such as packaging, anti-cracking protection, and registration. Every piece of software on the Windows platform runs into these problems, and the solutions are much the same everywhere, yet Windows does nothing to help developers with them.
"Upgrading", the topic of today's discussion, is one of these problems. If you have used Windows for years, you will have noticed how most software handles upgrades today: a menu item called Software Update sits in the Help menu; clicking it opens a loading screen that queries for the latest version, and if a new version is found, the user is asked whether to download and install it.
The logic is very simple to implement, yet it is an important "lifeline" for many applications that depend on continuous operation. A team's work only has lasting meaning if users can feel its effort, and that requires users to upgrade to the new version.
However, even though many teams treat "check whether this is the latest version" as a top priority once the software starts, only a few enthusiastic users actually click the "OK" button. Let's take a look at the concerns of everyone else:
(1) I do not need the new features, so there is no need to upgrade.
(2) The new version may not be stable.
(3) I am using a pirated copy; after upgrading, my crack patch will stop working.
(4) An upgrade means downloading a large installation package, waiting a long time, and then running the installer. Too much hassle.
The first three concerns have no solution that applies across products. They are less technical problems than matters of product positioning or of the development team's quality, and they need case-by-case analysis. The fourth, however, is one that engineers can genuinely take on.
Security Patch
I remember that just a few years ago, the network environment in China was far less comfortable than it is now. Apart from the relatively fast campus networks at some colleges and universities, 1 Mbps "thin pipes" accounted for the vast majority of connections. Let us call that period the "narrowband era", a time when watching video online was only possible with P2P acceleration.
In stark contrast to those snail-paced connections, installation packages often weighed in at ten or even tens of megabytes. Waiting for a new installer to download in that environment was a real test of patience, and I believe most of us have clicked the "Cancel" button on a slow download screen.
It seems, then, that simply downloading the new version's installation package and running the installer only works for loyal users and specific scenarios. So what did Microsoft's own development team do? Bear in mind that the Windows team has to deal with more than one newly exposed security vulnerability every day, so its fixes are not only frequent but urgent.
Faced with the cost in bandwidth and installation time, Microsoft adopted security patches. A security patch is, in essence, one or more new versions of program files that carry the new logic. For example, if the original abc.dll is found to have a buffer overflow vulnerability, a protected abc.dll 2.0 is released, packaged into a security patch that can be installed on its own, and distributed to users; the replacement is completed when the user restarts Windows. (Once Windows has finished initializing, its system files are hard to replace because they are in use. Hotfix technology mitigates this, but in most cases the user still has to restart to finish the installation.)
The process looks simple, and a slim security patch wastes little time on download or installation, so in theory it is an excellent measure. In practice, however, it is enormously expensive, more than the average team can afford:
First, in any sufficiently complex software the modules are tightly coupled: version 2.0 of abc.dll often requires version 2.0 of "ddd.dll", "eee.dll", and so on. Unless the module packaged in a DLL is extremely independent, no single DLL can be dropped into every earlier release. That calls for a costly job: customization. If as many as a dozen versions of the software are in service, we need to provide a dozen builds of abc.dll 2.0, one specific patch for each specific released version.
Second, every patch must be rigorously tested before release. If testing one version takes several days, what about a dozen versions? The cost is easy to imagine. (For example, the Windows development team needs a few weeks of internal and external testing for each security patch release, so a dedicated team of impressive size is responsible for it.)
Figure 1: Patch Upgrade
With a single confirmation from the user, the entire process runs quietly in the background (although the nagging restart prompts are admittedly a nuisance). Microsoft has maintained this safe, effective, and user-friendly solution for many years, but it has to be said that it is too expensive for the average development team.
Keep up to date
But we need not cling to that approach. Microsoft's practice of pushing a specific patch to each specific version reflects considerations particular to its own position, such as:
(1) Microsoft's patches have very complex execution logic, far more than simple file replacement; that is a product of a complex operating system that has reached a certain stage of development. For general application software, simple file matching and replacement is enough for most scenarios.
(2) Microsoft certainly does not want XP or Vista users to be upgraded to Win7 for nothing. Ordinary free software, or paid software whose license is good for life, has no such concern.
If we set aside the two burdens mentioned above, we can design a scheme that starts from a simple idea: let the program files on the user's machine follow the latest version on the server. We call this "keeping the files up to date". The scheme itself is very simple. The client program uploads its own version number to the upgrade server. The upgrade server determines which files must be updated to bring that version to the latest one and sends the list of those files to the client as a file-list. While running, the client downloads the files named in the file-list one by one, and it completes the replacement before the program next starts.
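The server side of this exchange can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names RELEASES, LATEST, and build_file_list are hypothetical, files are compared by SHA-256 digest, and the "delete" entry is an addition of this sketch for files that no longer exist in the latest version.

```python
import hashlib


def digest(content: bytes) -> str:
    """Fingerprint a file's content so that changed files can be detected."""
    return hashlib.sha256(content).hexdigest()


def make_manifest(files: dict) -> dict:
    """Map each file name of one release to the digest of its content."""
    return {name: digest(content) for name, content in files.items()}


# Hypothetical release history kept by the upgrade server.
RELEASES = {
    "1.0": make_manifest({"app.exe": b"exe-v1", "abc.dll": b"abc-v1", "old.dll": b"old"}),
    "2.0": make_manifest({"app.exe": b"exe-v1", "abc.dll": b"abc-v2", "new.dll": b"new"}),
}
LATEST = "2.0"


def build_file_list(client_version: str) -> dict:
    """Return the file-list that brings `client_version` up to LATEST."""
    current = RELEASES[client_version]
    latest = RELEASES[LATEST]
    # Download every file whose digest is new or different in the latest release.
    download = sorted(name for name, d in latest.items() if current.get(name) != d)
    # Assumption of this sketch: files dropped from the latest release are removed.
    delete = sorted(name for name in current if name not in latest)
    return {"target": LATEST, "download": download, "delete": delete}


print(build_file_list("1.0"))
```

The client would then fetch each entry under "download" into a staging directory and swap the files in before its next start, as described above.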
If we think a little more carefully, some knowledge of compilation also comes into play. A PE file (dll, exe, and so on) carries build-specific information in its header, such as a timestamp. For upgrade purposes only changes in the data and code sections matter, so when comparing a file across versions we need to strip out this interfering information.
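As a minimal sketch of that filtering step, the Python below zeroes the TimeDateStamp field of the COFF header before two builds are compared. The function names are hypothetical, and the sketch deliberately handles only this one field; a real comparison would probably also need to neutralize other build-specific fields.

```python
import struct


def strip_pe_timestamp(data: bytes) -> bytes:
    """Return a copy of a PE image with the COFF TimeDateStamp set to zero."""
    if data[:2] != b"MZ":
        raise ValueError("not a PE file: missing MZ signature")
    # The 4-byte field at offset 0x3C (e_lfanew) points to the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE file: missing PE signature")
    out = bytearray(data)
    # COFF header after the signature: Machine (2 bytes), NumberOfSections
    # (2 bytes), then TimeDateStamp (4 bytes), so the stamp is at pe_offset + 8.
    struct.pack_into("<I", out, pe_offset + 8, 0)
    return bytes(out)


def same_ignoring_timestamp(build_a: bytes, build_b: bytes) -> bool:
    """Compare two builds of a file while ignoring the header timestamp."""
    return strip_pe_timestamp(build_a) == strip_pe_timestamp(build_b)
```

With this in place, two builds of unchanged source that differ only in their build time compare as equal and drop out of the file-list.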
The figure below illustrates this simple idea:
Figure 2: Keeping the files up to date
The scheme seems reasonable, yet the fact is that of the tens of millions of applications on Windows, very few upgrade this way. Why?
Because the scheme is too idealistic. If the software is written in a compiled language such as C++, then even with the header changes filtered out, almost every program file still differs between two versions. To explain this, we again need some knowledge of compiling and linking, as we did above:
Let's look at what causes two builds of a PE file to differ. There are three main factors (for details see http://www.daemonology.net/papers/bsdiff.pdf):
(1) Header changes: as mentioned above, this is information the compiler regenerates every time it produces a binary. As noted earlier, it is easy to work around.
(2) Code changes: when you modify part of the code in a project, the binary PE file shows a clear, concentrated difference. This is unavoidable.
(3) Indirect effects: much of the information in a compiled PE file consists of absolute addresses. If the target of one pointer moves, every address that references it changes as well. You have to accept that a single-line code change is greatly amplified by the compiler, and that the changes in a PE file may not originate in that project's own source code at all.
In a huge system with millions of lines of code, made up of a hundred or so projects with extremely complex dependencies between modules, the third kind of difference (the indirect effect) plays an even larger role in determining how much the program files differ.
In other words, with this scheme an ordinary application upgrade may consume more than 100 MB of bandwidth. Although the network environment in China has improved in recent years, that magnitude is still not acceptable. So now we can answer the earlier question of why so little software is upgraded this way.
Difference compression
Readers with even a passing interest in military affairs know that the most innovative weapons of the 20th century were born in the later part of World War II: the demands of war greatly accelerated technological progress. In the same way, many needs of the modern Internet keep driving technical advances, and games are one example.
Large online games have long faced a similar problem. On the one hand, most game resource files are packed together and encrypted, and for 3D online games in particular a resource pack easily reaches gigabytes. On the other hand, games constantly add holiday events to keep players engaged, so frequent updates are essential. You cannot force every user to download a gigabyte-sized file for each update, can you?
The technical teams found a roundabout solution. Instead of looking only at whole files, they shrank the granularity of comparison down to the bit level and looked directly at binary differences. The same idea can be applied here:
If a DLL file changes between two successive versions, certainly not every bit has changed; only some of them differ. If we extract these differences into a binary patch, the upgrade server only needs to deliver the patch, and the client can "synthesize" the new version of the file from the patch and the old file.
When a 5 MB DLL differs between versions, it is often because just a single line of code changed. Using the idea above, we can produce a difference patch of less than 1 KB and deliver it to the client. Going from 5 MB to 1 KB saves 99.98% of the bandwidth.
Figure 3: Reduce the granularity to the bit level
So how is this achieved?
There are many ways, but the most commonly used one today is a scheme called "sliding comparison". The diagram below roughly illustrates the idea:
Figure 4: Sliding delta algorithm
In computer science this idea is called "delta compression" or "difference compression". A typical example comes from video compression, where the second frame stores only the parts that changed relative to the first frame.
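To make the idea concrete, here is a toy copy/insert delta in Python. It is a sketch, not the sliding algorithm of Figure 4: the matching step is delegated to the standard library's difflib.SequenceMatcher, which is far too slow for real multi-megabyte binaries, and the patch format (a list of "copy" and "add" operations) is a hypothetical one chosen for readability.

```python
import random
from difflib import SequenceMatcher


def make_patch(old: bytes, new: bytes) -> list:
    """Describe `new` as copies from `old` plus literal bytes."""
    ops = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))   # reuse bytes the client already has
        elif tag in ("replace", "insert"):
            ops.append(("add", new[j1:j2]))     # literal bytes that must be shipped
        # "delete": bytes of the old file that are simply not copied
    return ops


def apply_patch(old: bytes, ops: list) -> bytes:
    """Client side: rebuild the new file from the old file and the patch."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, start, length = op
            out += old[start:start + length]
        else:
            out += op[1]
    return bytes(out)


def payload_size(ops: list) -> int:
    """Number of literal bytes carried by the patch."""
    return sum(len(op[1]) for op in ops if op[0] == "add")


# Demo data: an "old file" of random bytes and a new version with 7 bytes inserted.
rng = random.Random(1)
old_file = bytes(rng.randrange(256) for _ in range(2000))
new_file = old_file[:500] + b"PATCHED" + old_file[500:]
patch = make_patch(old_file, new_file)
print(len(new_file), "bytes in the new file,", payload_size(patch), "literal bytes in the patch")
```

When the change is concentrated in one place, as in this demo, almost the whole new file is expressed as copies from the old one and only the changed bytes travel over the network.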
【PE file】
For PE files, however, the problem comes back: the variations caused by address changes tend to be scattered sparsely across the entire file. (In the worst case, a single-line change in a shared header file can make the resulting binary differ by 10%.) If you carefully compare two versions of a DLL file in Beyond Compare, you will see that every few dozen bytes an address has changed, and a dozen bytes later the same address has changed again. The "sliding comparison" scheme therefore does not work here, because the difference patch it computes is often not much smaller than the file itself.
Figure 5: PE file differences
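The scattering effect is easy to reproduce with a toy model in Python. Everything here is an assumption made for illustration: the "image" is just 12 filler bytes followed by a 4-byte absolute address, repeated 200 times, and the new build moves every target at or above THRESHOLD by SHIFT bytes, as if a few bytes of code had been inserted at that point.

```python
import random
import struct

FILLER = 12            # bytes of "instructions" between two address operands
THRESHOLD = 0x401400   # every target at or above this address moves...
SHIFT = 0x10           # ...by this many bytes in the new build


def build_image(targets):
    """Toy code section: identical filler, then one absolute address, repeated."""
    rng = random.Random(42)  # fixed seed, so the filler is the same in every build
    out = bytearray()
    for target in targets:
        out += bytes(rng.randrange(256) for _ in range(FILLER))
        out += struct.pack("<I", target)
    return bytes(out)


def changed_runs(old: bytes, new: bytes):
    """Return (start, length) for every contiguous run of differing bytes."""
    runs, start = [], None
    n = min(len(old), len(new))
    for i in range(n):
        if old[i] != new[i]:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:
        runs.append((start, n - start))
    return runs


old_targets = [0x401000 + 0x40 * (i % 50) for i in range(200)]
new_targets = [t + SHIFT if t >= THRESHOLD else t for t in old_targets]
old_image = build_image(old_targets)
new_image = build_image(new_targets)

runs = changed_runs(old_image, new_image)
changed = sum(length for _, length in runs)
print(f"{changed} of {len(old_image)} bytes differ, in {len(runs)} separate runs")
```

Only a small share of the bytes differs, but each moved pointer produces its own isolated run. A copy/insert patch has to spend a copy instruction and an insert instruction on every one of those runs, which is why its size stops shrinking.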
The real problem is that we were heading in the wrong direction. Remember the third kind of difference between PE files mentioned above? We were using exact matching to attack a problem that cannot be compared exactly. It is like genetics: the genetic difference between humans and chimpanzees may be as little as about 1%, but that 1% is scattered across all 23 pairs of chromosomes.
Figure 6: Finding the pattern
If you look closely at the binary changes in Figure 6 above, some patterns emerge:
■ Differences within nearby regions result from changes in pointer addresses
■ Pointers to the same address change in the same way
■ Nearby addresses tend to change in a regular pattern
Our algorithm therefore needs to handle these cases separately. Binary differences caused by code changes can be treated as one class, because they are concentrated and obvious. The indirect differences caused by pointer offsets should instead be grouped statistically: they tend to be homogeneous, and dozens of binary differences often come from the movement of a single pointer.
Figure 7: Divide and conquer
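One way to exploit that homogeneity, sketched below in Python, is to subtract the old bytes from the new bytes position by position and compress the result: wherever the two builds match approximately, the difference is mostly zeros plus a few repeated offsets, which a general-purpose compressor shrinks very well. This is a simplified illustration of the idea, not the actual BsDiff or Courgette format. It assumes both files have the same length and are already aligned, and toy_image is a hypothetical stand-in for a real binary.

```python
import bz2
import random
import struct


def bytewise_delta(old: bytes, new: bytes) -> bytes:
    """(new - old) mod 256 for each byte; toy assumption: equal lengths."""
    if len(old) != len(new):
        raise ValueError("this sketch only handles files of equal length")
    return bytes((n - o) & 0xFF for o, n in zip(old, new))


def make_patch(old: bytes, new: bytes) -> bytes:
    """Compress the byte-wise difference; mostly zeros when files nearly match."""
    return bz2.compress(bytewise_delta(old, new))


def apply_patch(old: bytes, patch: bytes) -> bytes:
    """Client side: add the decompressed differences back onto the old bytes."""
    delta = bz2.decompress(patch)
    return bytes((o + d) & 0xFF for o, d in zip(old, delta))


def toy_image(shift: int) -> bytes:
    """Hypothetical binary: random filler plus absolute addresses, some shifted."""
    rng = random.Random(7)  # fixed seed, so the filler is the same in every build
    out = bytearray()
    for i in range(300):
        out += bytes(rng.randrange(256) for _ in range(12))
        target = 0x401000 + 0x40 * (i % 50)
        if target >= 0x401400:
            target += shift
        out += struct.pack("<I", target)
    return bytes(out)


old_image = toy_image(0)
new_image = toy_image(0x10)
patch = make_patch(old_image, new_image)
print(len(bz2.compress(new_image)), "bytes to ship the compressed file,",
      len(patch), "bytes to ship the patch")
```

The hundreds of scattered pointer changes collapse into the same small value repeated over a background of zeros, so the cost of the patch no longer grows with the number of places where an address moved.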
Finding these differences in a binary program section and compressing them sensibly is a pressing need, and mature solutions already exist:
■ BsDiff: an open source tool written to deliver fast, lightweight security updates for FreeBSD (similar in purpose to Microsoft's security patches). The core idea of the algorithm is to find approximate matches based on statistical regularities and then apply a series of transformations (such as the BWT transform) to improve the compression ratio of the "approximately matching segments".
■ Courgette: the core of Google Chrome's upgrade system. It is based on BsDiff but adds a series of improvements that bring in platform-specific information (x86 assembly instructions) in order to locate pointers more precisely and avoid the errors a statistical algorithm makes when the differences are large.
The internals of both are quite interesting: the former is a typical product of academic thinking, the latter shows the style of master engineers. So as not to wear down your motivation to read this article, I will cover them in a follow-up article.
Cross-platform concerns aside, Google's Courgette is undoubtedly the best of the best. The chart below was created from the statistics published on the Chrome team's website (http://dev.chromium.org/developers/design-documents/software-updates-courgette):
Figure 8: Substantial bandwidth savings
Thanks to the striking bandwidth savings of binary differencing, users can download an upgrade package at very low bandwidth cost, and the whole process disturbs them as little as possible. Chrome users, for example, are upgraded to the latest version without noticing, which does as much as possible to keep browser vulnerabilities from being exploited by malicious web pages.
Most importantly, because every patch package is produced by one uniform algorithm, the entire process can be automated, freeing people from the work. Many different versions of the software may be in circulation, but the repetitive job of customizing a patch for each version, mentioned at the outset, is left to the computer.
Review and summary
That is about it for this topic. This article has summarized several common upgrade schemes for client software. They differ in implementation, but all were born to solve the same problem. As in every field, progress in computing comes from facing demanding requirements and thorny problems again and again, through constant experimentation and constant evolution, and that is what has forged the great era we live in today.