Zero-Cost Repair of Server Kernel Defects: Secrets of UCloud's Kernel Hot Patching Technology

Source: Internet
Author: User

September 16: On July 18, the ArchSummit global architect summit, sponsored by InfoQ, opened in Shenzhen. The conference focused on six of the most popular fields, including games, e-commerce, and mobile Internet. As a cloud service provider in China that focuses on these verticals, UCloud was invited to attend. At the conference, Qiu Mo, a senior UCloud engineer, lifted the veil on the kernel technology behind the UCloud platform in a talk titled "Kernel practices of the UCloud platform". Of the topics covered, "UCloud kernel hot patching technology" drew great attention from the architects.

How can kernel defects on a massive fleet of Linux servers be fixed at zero cost?

For a company with thousands of servers, Linux kernel defects often lead to crashes. Engineers then face a dilemma: should they fix the defect by upgrading the kernel on each server? Upgrading means restarting the server, interrupting service, and doing a great deal of preparation. Not upgrading means living with the risk of a crash, which also interrupts the business and requires heavy cleanup afterwards.

In today's cloud computing era, a host often runs multiple VM instances. Every restart, whether a planned upgrade or an unplanned crash, interrupts all VM instances running on that host. Repairing host kernel defects is therefore even harder.

As a cloud service provider that supports the IT infrastructure of tens of thousands of enterprise users, how does UCloud fix kernel defects across the massive number of hosts on its platform?

Qiu Mo-tao revealed that repairing defects with the traditional restart method would mean heavy O&M work and service interruption for both UCloud and its users. Instead, UCloud uses "kernel hot patching technology", that is, binary patches are applied to the running kernel. In this way UCloud has fixed kernel defects on a massive number of servers at zero cost and without restarts. To date, UCloud has fixed more than 10 upstream kernel defects by hot patching, applying the patches tens of thousands of times in total with no failures and no side effects. In theory, this avoided an equal number of host restarts, along with the cloud host service interruptions they imply. The technology is now mature at UCloud.

Secrets of UCloud kernel hot patching technology

UCloud's hot patching technology was customized and optimized years ago on the basis of the open-source ksplice. It fixes the kernel by loading a specially prepared hot patch module. The process is as follows.

The hot patch module is compiled and generated by the ksplice program. It contains both the defective binary instructions and the repaired binary instructions, organized at the function level. After the module is loaded, it automatically locates the defective code in the kernel and dynamically replaces the defective instructions with the repaired ones.
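To make function-level replacement concrete, here is a toy model of one common way to redirect an old function to a new one: overwrite the first bytes of the old function with a jump to its replacement. This is only a sketch under stated assumptions. It assumes x86 and a 5-byte `jmp rel32`, it simulates kernel memory with a `bytearray`, and the helper names (`encode_jmp_rel32`, `apply_patch`, `undo_patch`) are hypothetical. The article does not describe how UCloud's modified ksplice performs the replacement.

```python
import struct

JMP_REL32_OPCODE = 0xE9  # x86 near jump with a 32-bit relative displacement
JMP_REL32_SIZE = 5       # 1 opcode byte + 4 displacement bytes


def encode_jmp_rel32(src_addr: int, dst_addr: int) -> bytes:
    """Encode the jump that, placed at src_addr, transfers control to dst_addr."""
    # The displacement is relative to the address of the *next* instruction.
    displacement = dst_addr - (src_addr + JMP_REL32_SIZE)
    if not -(2**31) <= displacement < 2**31:
        raise ValueError("target is out of range for a rel32 jump")
    return struct.pack("<Bi", JMP_REL32_OPCODE, displacement)


def apply_patch(memory: bytearray, base_addr: int, old_func: int, new_func: int) -> bytes:
    """Overwrite the start of old_func with a jump to new_func.

    `memory` is a simulated code region that starts at base_addr.
    Returns the overwritten bytes so the patch can be undone later.
    """
    offset = old_func - base_addr
    saved = bytes(memory[offset:offset + JMP_REL32_SIZE])
    memory[offset:offset + JMP_REL32_SIZE] = encode_jmp_rel32(old_func, new_func)
    return saved


def undo_patch(memory: bytearray, base_addr: int, old_func: int, saved: bytes) -> None:
    """Restore the original bytes at the start of old_func."""
    offset = old_func - base_addr
    memory[offset:offset + len(saved)] = saved
```

Keeping the overwritten bytes is what makes the patch reversible in this model: restoring them returns the simulated memory to its original state.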

Besides restart-free repair, hot patching is also used for performance analysis and fault locating in kernel development. For example, a hot patch generated from performance statistics code lets you analyze performance issues of interest online, and a hot patch that adds extra debugging code can capture exceptions in the running kernel. These uses are very valuable: they are the best way to capture kernel exceptions that occur across a large number of servers but cannot be reproduced. Because hot patching does not require a restart and a patch can be applied or undone at will, it has no side effects.

UCloud has optimized the open-source ksplice in the following three respects:

Support for later kernel versions

Hot patching is tightly coupled with the kernel. Different kernel versions differ in their instruction structures, symbol table structures, and some features (for example, earlier kernels do not have ftrace), and these differences directly determine whether hot patching succeeds or fails. UCloud studied the differences between kernel versions so that the same ksplice supports different versions of the Linux kernel. It is worth noting that ftrace is not compatible with ksplice.

Allow hot patching of frequently called functions

With any hot patching technology, two types of kernel functions are difficult to hot patch: frequently called kernel functions, such as schedule and hrtimer, and functions that are often at the top of a thread's kernel stack, such as sys_poll and sys_read. UCloud modified the ksplice-related kernel code and user-space tools and successfully removed these restrictions. For example, three hrtimer hot patches are currently installed on UCloud servers.
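One way to see why such functions are hard to patch is the kind of stack safety check a hot patcher may run before replacing code: if any thread's kernel stack holds an address inside the function being replaced, the patch is not safe to apply at that moment. The sketch below illustrates that idea only. The data layout and the name `threads_blocking_patch` are hypothetical, and it does not show how UCloud lifted the restriction, which the article does not describe.

```python
from typing import Dict, List, Tuple

AddressRange = Tuple[int, int]  # [start, end) of one function's code


def threads_blocking_patch(
    stacks: Dict[str, List[int]],
    patched: List[AddressRange],
) -> List[str]:
    """Return the names of threads whose stacks reference a function being patched.

    `stacks` maps a thread name to the code addresses found on its kernel stack.
    An empty result means the patch could be applied at this moment.
    """
    blocking = []
    for name, addresses in stacks.items():
        if any(start <= addr < end for addr in addresses for start, end in patched):
            blocking.append(name)
    return blocking
```

In this model, a function such as sys_poll that many sleeping threads are blocked inside would almost always appear on some stack, so a naive check would rarely pass.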

Reduce service interruption time

Ksplice replaces binary instructions inside stop_machine. Although a single stop_machine operation causes an interruption of only about one millisecond, some frequently used kernel functions need a large number of retries before a suitable moment for hot patching is found, which can add up to hundreds of milliseconds of interruption. UCloud has made optimizations here so that the service interruption time is kept within 10 milliseconds.
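The arithmetic behind these figures can be modeled in a few lines: if every attempt pays roughly one stop_machine pause and the patch only succeeds once a safety check passes, the total pause grows linearly with the number of retries. The following is a simplified simulation, not UCloud's optimization (which the article does not detail). The function `patch_with_retries`, the `is_safe` callback, and the 1 ms default cost are illustrative assumptions.

```python
from typing import Callable, Tuple


def patch_with_retries(
    is_safe: Callable[[], bool],
    max_attempts: int,
    stop_cost_ms: float = 1.0,
) -> Tuple[bool, int, float]:
    """Simulate retrying a hot patch until the safety check passes.

    Every attempt pays one stop-the-machine pause of stop_cost_ms,
    whether or not the check passes.
    Returns (patched, attempts_made, total_pause_ms).
    """
    paused_ms = 0.0
    for attempt in range(1, max_attempts + 1):
        paused_ms += stop_cost_ms  # the machine is stopped while we check
        if is_safe():
            return True, attempt, paused_ms
    return False, max_attempts, paused_ms
```

With a check that never passes, `patch_with_retries(lambda: False, 200)` accumulates 200 ms of simulated pause, matching the "hundreds of milliseconds" case. Capping `max_attempts` at 10 bounds the pause at 10 ms in this model, at the price of sometimes giving up.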

In a massive server environment, hot patching can fix kernel defects at zero cost and without side effects, and it also lets kernel development go further and better. In the past, for lack of auxiliary analysis methods and out of fear of kernel bugs, even features well suited to a kernel implementation were often pushed into user space. With hot patching, that thinking can be adjusted as appropriate, and kernel development can become bolder.
