Summary of the Latest Power Management Technologies in the Linux 2.6 Kernel

Source: Internet
Author: User

Preface

This series of articles covers the new power-saving technologies that have been steadily added to hardware in recent years, including CPUs, chipsets, new bus standards such as PCI Express, and peripherals.

Working from the perspective of the Linux 2.6 kernel and the whole software stack (the kernel, middleware, and various user-space utilities), it explains how support for these innovations has been added, the considerable progress the Linux operating system has made in power management in recent years, and the direction of its future development.

As the first article in the series, this one introduces cpufreq, a new subsystem in the Linux 2.6 kernel designed to better support the frequency-scaling technologies that have appeared in mainstream CPUs in recent years.

Origin of cpufreq

With the arrival of concepts such as energy-efficient computing and performance per watt, and the development of the Advanced Configuration and Power Interface (ACPI) standard, mainstream CPUs on the market now support frequency scaling. For example, Intel processors support Enhanced SpeedStep Technology and AMD processors support PowerNow! technology, and recent PowerPC, ARM, and SuperH processors offer similar support. Refer to the list of processors whose frequency-scaling technology is supported by the Linux 2.6 kernel. Note that the frequency scaling discussed here is different from the familiar practice of overclocking. Overclocking makes a processor run at a non-standard frequency, often by raising the core voltage, which can have serious consequences such as shortening the CPU's service life and reducing system stability.

Frequency scaling, by contrast, means that the CPU hardware itself supports running at several different frequencies. While the system is running, it can switch dynamically among these frequencies according to a load that may change at any moment, striking a balance between performance and power consumption.

Although many processor vendors support frequency scaling, their hardware implementations and usage inevitably differ, sometimes slightly and sometimes greatly. As a result, each vendor added code to the kernel tailored to its own hardware so that the frequency-scaling technology in its products could be used under Linux. The consequence of this development model was that each vendor's implementation was scattered across different corners of the Linux kernel source tree, with no code shared among them. This created a heavy burden for kernel maintenance and for supporting future products, and led directly to the birth of the cpufreq kernel subsystem.

As mentioned above, the purpose of frequency scaling is to let the system adjust the CPU frequency at any time according to changes in load. This splits naturally into two questions: "what to do" and "how to do it". "What to do" is choosing the proper CPU frequency as the system load changes; "how to do it" is actually setting the CPU to the chosen frequency at the chosen moment so that it really runs at that frequency. This is the familiar separation of mechanism and policy in software design: well-designed software keeps the two cleanly apart and lets them communicate through a standardized interface.

Design and use of cpufreq

The cpufreq kernel subsystem was created to solve these problems. It provides a unified framework in the Linux kernel for better supporting the frequency-scaling technologies of different CPUs. Its software structure is shown in Figure 1.

Figure 1. Software Structure of cpufreq
 

As shown in Figure 1, cpufreq is divided into the following three main parts:

The cpufreq module itself abstracts two things: how to control the frequency-scaling technologies of the various CPUs at the lower layer, and how to choose a suitable frequency dynamically based on system load at the upper layer. It defines a clear interface between the two, so the separation of mechanism and policy described above is built into the design.

Below the cpufreq module, each CPU vendor only needs to supply a CPU-specific frequency-scaling driver based on how its hardware implements frequency scaling. For example, Intel supplies a driver for CPUs with Enhanced SpeedStep Technology, and AMD supplies a driver for CPUs with PowerNow! technology.

Above the cpufreq module, a governor acts as the decision maker. Using its own criteria, it picks a suitable target frequency at a suitable time and then, through the interface defined by the cpufreq module, instructs the underlying CPU-specific driver to set the CPU to that frequency.

The latest Linux kernel provides five governors for users to choose from: performance, powersave, userspace, conservative, and ondemand. They use different criteria to select the CPU frequency and suit different application scenarios. Only one governor can be active at a time, but you can switch to another governor while the system is running, according to application needs.

The benefit of this design is that governors and CPU-specific drivers can be developed independently of each other, and code reuse is maximized. Kernel developers writing and experimenting with a new governor no longer get bogged down in the hardware details of a particular CPU's frequency-scaling technology, while a CPU vendor adding support for its technology to the Linux kernel only needs to provide a relatively simple driver, without having to consider how to choose the right frequency in different application scenarios.

The cpufreq subsystem exposes its user interface to applications through the sysfs file system. For each CPU in the system, the cpufreq sysfs interface is located in the /sys/devices/system/cpu/cpuX/cpufreq/ directory, where X is the processor ID, matching the information in /proc/cpuinfo. Taking cpu0 as an example, you will usually see the following files in this directory:

$ ls -F /sys/devices/system/cpu/cpu0/cpufreq/
affected_cpus
cpuinfo_cur_freq
cpuinfo_max_freq
cpuinfo_min_freq
ondemand/
scaling_available_frequencies
scaling_available_governors
scaling_cur_freq
scaling_driver
scaling_governor
scaling_max_freq
scaling_min_freq
stats/

All readable files can be read with cat, and all writable files can be written with echo; frequency values in these files are in kHz. cpuinfo_max_freq and cpuinfo_min_freq give the highest and lowest frequencies supported by the CPU hardware, and cpuinfo_cur_freq reads the CPU's current frequency from the hardware. Although the hardware supports several frequencies, in some situations you may want to allow only a subset of them. This is controlled through scaling_max_freq and scaling_min_freq: when choosing a frequency, a governor only selects from the range they define. scaling_available_frequencies lists the frequencies the driver can set. Unlike cpuinfo_cur_freq, scaling_cur_freq returns the current frequency as cached by the cpufreq module, instead of reading the hardware registers. scaling_available_governors lists the governors available to users, and scaling_driver shows the frequency-scaling driver used by this CPU. The stats directory provides statistics on frequency usage, such as the time the CPU has spent at each frequency and the number of frequency transitions. The ondemand directory belongs to the ondemand governor, which is introduced later.
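To make the file descriptions above concrete, here is a minimal shell sketch that reads a few of these files and prints a short summary. The function name cpufreq_show is hypothetical (it is not part of cpufrequtils), and the default path assumes cpu0; pass another directory as the first argument to inspect a different CPU or a test copy.

```sh
# Hypothetical helper: print a short summary of one CPU's cpufreq sysfs
# directory. Assumes the file layout listed above; $1 overrides the path.
cpufreq_show() (
    dir=${1:-/sys/devices/system/cpu/cpu0/cpufreq}
    # All *_freq files hold values in kHz; divide by 1000 for MHz.
    hw_min=$(cat "$dir/cpuinfo_min_freq")
    hw_max=$(cat "$dir/cpuinfo_max_freq")
    pol_min=$(cat "$dir/scaling_min_freq")
    pol_max=$(cat "$dir/scaling_max_freq")
    # scaling_cur_freq is the value cached by cpufreq; cpuinfo_cur_freq
    # queries the hardware instead and is often readable only by root.
    cur=$(cat "$dir/scaling_cur_freq")
    echo "driver:   $(cat "$dir/scaling_driver")"
    echo "governor: $(cat "$dir/scaling_governor")"
    echo "hardware: $((hw_min / 1000))-$((hw_max / 1000)) MHz"
    echo "policy:   $((pol_min / 1000))-$((pol_max / 1000)) MHz"
    echo "current:  $((cur / 1000)) MHz"
)
```

Running cpufreq_show with no arguments summarizes cpu0. Using scaling_cur_freq rather than cpuinfo_cur_freq keeps the sketch usable without root privileges.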

You now have a general idea of how to use the user interface that cpufreq provides through sysfs, but for most users, manipulating these files one by one is tedious and time-consuming. Dominik Brodowski and others therefore developed the cpufrequtils toolkit [2], which gives users a simpler interface to the kernel cpufreq subsystem. The output of cpufreq-info clearly shows the contents of each file in the /sys/devices/system/cpu/cpuX/cpufreq/ directory introduced above.

$ cpufreq-info
cpufrequtils 002: cpufreq-info (C) Dominik Brodowski 2004-2006
Report errors and bugs to linux@brodo.de, please.
analyzing CPU 0:
  driver: acpi-cpufreq
  CPUs which need to switch frequency at the same time: 0 1
  hardware limits: 1000 MHz - 1.67 GHz
  available frequency steps: 1.67 GHz, 1.33 GHz, 1000 MHz
  available cpufreq governors: userspace, conservative, ondemand, powersave, performance
  current policy: frequency should be within 1000 MHz and 1.67 GHz.
                  The governor "ondemand" may decide which speed to use
                  within this range.
  current CPU frequency is 1000 MHz.
analyzing CPU 1:
  driver: acpi-cpufreq
  CPUs which need to switch frequency at the same time: 0 1
  hardware limits: 1000 MHz - 1.67 GHz
  available frequency steps: 1.67 GHz, 1.33 GHz, 1000 MHz
  available cpufreq governors: userspace, conservative, ondemand, powersave, performance
  current policy: frequency should be within 1000 MHz and 1.67 GHz.
                  The governor "ondemand" may decide which speed to use
                  within this range.
  current CPU frequency is 1000 MHz.

Origin and implementation of the ondemand governor

In the cpufreq-info output above, we can see that the cpufreq subsystem offers five governors: userspace, conservative, ondemand, powersave, and performance. In the latest kernels, ondemand is used as the default governor unless configured otherwise. To understand why, let us review the history of governors in the kernel's cpufreq subsystem.

When cpufreq was first added to the Linux kernel, it had only three governors: performance, powersave, and userspace. With the performance governor, the CPU runs at the highest frequency allowed by the current policy (scaling_max_freq); with the powersave governor, it runs at the lowest allowed frequency (scaling_min_freq). Both are therefore static governors: they do not adjust the CPU frequency as the system load changes. They correspond to two extreme scenarios: performance represents the ultimate pursuit of high performance, and powersave the ultimate pursuit of low power consumption. Although both needs do exist, most users, most of the time, want a more flexible frequency-scaling policy.

The early cpufreq subsystem offered this flexibility through the userspace governor. As its name suggests, the userspace governor hands frequency-policy decisions to a user-space application and provides an interface through which that application can adjust the CPU frequency. After you use cpufreq-set from the cpufrequtils toolkit to make userspace the active governor, a file named scaling_setspeed appears in the /sys/devices/system/cpu/cpuX/cpufreq/ directory. This is the interface specific to the userspace governor: writing any frequency listed in scaling_available_frequencies to this file sets the CPU to run at that frequency.

# cpufreq-set -g userspace
# cat cpuinfo_cur_freq
1000000
# cat scaling_available_frequencies
1667000 1333000 1000000
# echo 1333000 > scaling_setspeed
# cat cpuinfo_cur_freq
1333000
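The session above can be wrapped in a small script that only writes values the driver actually advertises. This is a minimal sketch: set_userspace_speed is a hypothetical helper, it writes scaling_governor directly instead of calling cpufreq-set, it defaults to cpu0, and on a real system it must run as root.

```sh
# Hypothetical helper: sysfs equivalent of "cpufreq-set -g userspace"
# followed by "echo FREQ > scaling_setspeed".
# Usage: set_userspace_speed FREQ_KHZ [CPUFREQ_DIR]
set_userspace_speed() (
    freq=$1
    dir=${2:-/sys/devices/system/cpu/cpu0/cpufreq}
    # Check that the userspace governor is available.
    case " $(cat "$dir/scaling_available_governors") " in
        *" userspace "*) ;;
        *) echo "userspace governor not available" >&2; exit 1 ;;
    esac
    # Only accept a frequency listed in scaling_available_frequencies.
    case " $(cat "$dir/scaling_available_frequencies") " in
        *" $freq "*) ;;
        *) echo "unsupported frequency: $freq kHz" >&2; exit 1 ;;
    esac
    echo userspace > "$dir/scaling_governor"
    # scaling_setspeed is the userspace governor's own interface.
    echo "$freq" > "$dir/scaling_setspeed"
)
```

For example, set_userspace_speed 1333000 reproduces the session above; an unlisted value such as 1200000 is rejected before anything is written.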

As mentioned above, with the userspace governor the system hands frequency-policy decisions to a user-space application. This application is usually a daemon that periodically collects system information and adjusts the CPU frequency according to the load through the scaling_setspeed interface. Major Linux distributions generally used powersaved or cpuspeed as this daemon. Both measure the CPU load over a sampling period every few seconds and adjust the CPU frequency based on the result. Although combining the userspace governor with a user-space daemon gives users some flexibility, wide use in the open-source community gradually exposed two serious flaws in this approach.

The first is performance. Suppose powersaved analyzes the system load every five seconds, and consider the following scenario. powersaved has just finished a sampling pass and, because the load was low during the period that just ended, has set the CPU to its lowest frequency. If you now start a CPU-intensive program such as Firefox, powersaved only gets a chance to notice the need for a higher frequency at the next sampling point, about five seconds later. In other words, the CPU's computing power is not fully used for up to five seconds after Firefox starts, which clearly hurts the user experience.

The second is the accuracy of load sampling and analysis. It is not reasonable to give a user-space program the job of monitoring system load and predicting the CPU's future performance needs. On the one hand, it is hard for a user-space program to collect all the information it needs, because most of that information lives in kernel space. On the other hand, collecting it requires exchanging data between user space and the kernel, and frequent exchanges of this kind hurt system performance.
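To illustrate what such a daemon has to do on every sampling tick, here is a minimal sketch that turns two snapshots of the aggregate "cpu" line of /proc/stat into a busy percentage. It is not the code of powersaved or cpuspeed; the function name cpu_busy_pct is hypothetical, and treating iowait as idle time is a simplifying choice of this sketch.

```sh
# Hypothetical helper: CPU busy percentage between two samples of the
# aggregate "cpu ..." line from /proc/stat.
# Usage: cpu_busy_pct "PREV_LINE" "CUR_LINE"
cpu_busy_pct() (
    # Print "<total jiffies> <idle jiffies>" for one "cpu ..." line.
    # Only the first 8 counters are summed (user nice system idle iowait
    # irq softirq steal); guest time is already counted in user/nice.
    totals() {
        set -- $1            # split the line on whitespace
        shift                # drop the "cpu" label
        t=0 n=0
        for v in "$@"; do
            n=$((n + 1))
            if [ "$n" -gt 8 ]; then break; fi
            t=$((t + v))
        done
        # idle time = idle + iowait (4th and 5th counters)
        echo "$t $(( $4 + ${5:-0} ))"
    }
    prev=$(totals "$1")
    cur=$(totals "$2")
    dt=$(( ${cur% *} - ${prev% *} ))
    di=$(( ${cur#* } - ${prev#* } ))
    if [ "$dt" -le 0 ]; then echo 0; else echo $(( 100 * (dt - di) / dt )); fi
)
# Example with the five-second period from the scenario above:
#   prev=$(grep '^cpu ' /proc/stat); sleep 5; cur=$(grep '^cpu ' /proc/stat)
#   cpu_busy_pct "$prev" "$cur"
```

Every sample means reading /proc/stat from user space, which is exactly the kind of user/kernel data exchange that becomes costly when the sampling interval is shortened.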

Are there solutions to these two problems? The community easily agreed on the second one: since collecting and analyzing system load in user space is problematic, it is more reasonable to move that work into the kernel. What about the first? The most intuitive fix is to shorten the sampling interval so that powersaved can respond to load changes sooner. However, this simple fix has two problems of its own. First, it means more frequent data exchange between user space and the kernel, which necessarily hurts system performance more. Second, and more importantly, the frequency-scaling hardware of the various CPU vendors was still immature at the time, and changing the CPU frequency took a long time. For example, early Intel SpeedStep technology needed 250 microseconds to change the CPU frequency, during which the CPU could not execute instructions normally. Even for a CPU clocked at 1 GHz, 250 microseconds is 250,000 clock cycles, during which the CPU could otherwise have executed hundreds of thousands of instructions. From this perspective, shortening the sampling interval, and therefore changing frequency more often, would make the performance penalty even worse. Fortunately, as hardware improved, the time needed to change the CPU frequency dropped significantly: the latest Intel Enhanced SpeedStep Technology has reduced it to 10 microseconds, an improvement of more than an order of magnitude.
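The figures in the paragraph above can be checked with a line of arithmetic; this only restates the text's own numbers, using its example 1 GHz clock:

```latex
\begin{aligned}
250\,\mu\text{s} \times 1\,\text{GHz} &= (250 \times 10^{-6}\,\text{s})(10^{9}\,\text{s}^{-1}) = 2.5 \times 10^{5}\ \text{cycles},\\
10\,\mu\text{s} \times 1\,\text{GHz} &= (10 \times 10^{-6}\,\text{s})(10^{9}\,\text{s}^{-1}) = 10^{4}\ \text{cycles},\\
\frac{250\,\mu\text{s}}{10\,\mu\text{s}} &= 25 > 10 .
\end{aligned}
```

Each frequency change on the newer hardware therefore costs 25 times fewer cycles, which is the "more than an order of magnitude" improvement.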
It was precisely this progress in CPU hardware that gave kernel developers the opportunity to solve these long-standing problems. Venkatesh and others proposed, designed, and implemented a new governor named ondemand: the long-awaited governor that runs entirely in the kernel and can sample and analyze system load at a much finer interval. Before looking at how the ondemand governor is implemented, let us see how to use it and what interfaces it offers to users. After you use cpufreq-set to make ondemand the active governor, a subdirectory named ondemand appears in the /sys/devices/system/cpu/cpuX/cpufreq directory.

$ sudo cpufreq-set -g ondemand
$ ls /sys/devices/system/cpu/cpu0/cpufreq/ondemand/
ignore_nice_load
powersave_bias
sampling_rate
sampling_rate_max
sampling_rate_min
up_threshold
$ sudo cat sampling_rate_min sampling_rate sampling_rate_max
40000
80000
40000000
$ sudo cat up_threshold
30

The three files whose names begin with sampling_ give the minimum sampling interval allowed by the ondemand governor, the interval currently in use, and the maximum allowed interval, all in microseconds.
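Because sampling_rate is writable, the sampling interval can be tuned at run time. The following minimal sketch clamps a requested interval to the advertised window before writing it, since the governor may reject out-of-range values. set_sampling_rate is a hypothetical helper, the default path assumes the per-CPU ondemand directory shown above for cpu0, and writing requires root.

```sh
# Hypothetical helper: write an ondemand sampling interval (microseconds),
# clamped to [sampling_rate_min, sampling_rate_max]. Prints the value used.
# Usage: set_sampling_rate USEC [ONDEMAND_DIR]
set_sampling_rate() (
    want=$1
    dir=${2:-/sys/devices/system/cpu/cpu0/cpufreq/ondemand}
    lo=$(cat "$dir/sampling_rate_min")
    hi=$(cat "$dir/sampling_rate_max")
    if [ "$want" -lt "$lo" ]; then want=$lo; fi
    if [ "$want" -gt "$hi" ]; then want=$hi; fi
    echo "$want" > "$dir/sampling_rate"
    echo "$want"
)
```

With the values shown above (40000 to 40000000), asking for 10000 writes 40000, while 200000 (200 ms) is written unchanged.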

On the author's computer, the ondemand governor therefore samples every 80 milliseconds. Another important file is up_threshold: when the system load exceeds this percentage, the ondemand governor automatically raises the CPU frequency. On the author's computer the value is 30%. How is this load percentage measured? CPUs that support Intel's latest Enhanced SpeedStep Technology provide two model-specific registers (MSRs) in hardware that the ondemand governor can use to sample and analyze system load. They are named IA32_MPERF and IA32_APERF [5]. MPERF stands for maximum performance and APERF for actual performance. As the names suggest, while the CPU is in the ACPI C0 state, IA32_MPERF is a counter that increments every clock cycle at the maximum frequency supported by the CPU hardware, and IA32_APERF is a counter that increments every clock cycle at the CPU's actual operating frequency. With these two registers, combined with the proportion of time the CPU spends in ACPI C0 versus C1, C2, and C3 (that is, the ratio of working time to sleeping time), the ondemand governor can calculate the CPU load accurately.
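The arithmetic behind this can be sketched as follows. This is only an illustration of one way to combine the counters, not the kernel's code: aperf_mperf_load is a hypothetical helper that takes the counter deltas over one sampling period, the percentage of that period spent in C0, and cpuinfo_max_freq, and prints the average frequency while in C0 together with the load as a percentage of full capacity. Reading the raw counters (MSR 0xE7 for IA32_MPERF, 0xE8 for IA32_APERF) is shown only as a comment and assumes msr-tools, the msr kernel module, and root.

```sh
# Hypothetical helper: combine APERF/MPERF deltas with C0 residency.
# Usage: aperf_mperf_load APERF_DELTA MPERF_DELTA C0_PCT MAX_KHZ
# Prints: <average C0 frequency in kHz> <load in % of full capacity>
# Raw counters could be read with, e.g.: rdmsr -p 0 -u 0xe8 (APERF)
#                                        rdmsr -p 0 -u 0xe7 (MPERF)
aperf_mperf_load() (
    aperf=$1 mperf=$2 c0_pct=$3 max_khz=$4
    if [ "$mperf" -le 0 ]; then echo "0 0"; exit 0; fi
    # APERF/MPERF = average actual frequency / maximum frequency in C0.
    avg_khz=$(( max_khz * aperf / mperf ))
    # Time spent working, scaled by how fast the CPU ran while working.
    load=$(( c0_pct * aperf / mperf ))
    echo "$avg_khz $load"
)
```

For instance, a CPU that was in C0 for the whole period but only at half its maximum frequency (APERF delta 500, MPERF delta 1000) yields an average of 833500 kHz and a load of 50% of its full capacity.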

Once the CPU load is known, the next question is how to choose a suitable CPU frequency. As mentioned above, when the system load exceeds the percentage set by up_threshold, the ondemand governor raises the CPU frequency. Such a load indicates that the user needs more processing power, so the ondemand governor sets the CPU to its highest frequency; developers and users in the community had no objection to this. But when the ondemand governor detects that the load has fallen, which frequency should it drop to? The initial implementation stepped down to the next lower available frequency. For example, the author's CPU supports three frequencies, 1.67 GHz, 1.33 GHz, and 1 GHz; if the ondemand governor finds the CPU running at 1.67 GHz, it picks 1.33 GHz as the target. The idea behind this strategy was to minimize the impact on system performance, so that performance would not drop sharply in a short time and degrade the user experience. However, after the first version of the ondemand governor was released, feedback from many users showed that this worry was unnecessary and that the governor could be more aggressive when lowering the frequency. The latest ondemand governor therefore picks, in a single step among all available frequencies, one that keeps CPU utilization above 80%; if no available frequency meets this requirement, it selects the lowest frequency the CPU supports.

Test results from many users show that the new algorithm saves energy more effectively without hurting system performance. After the algorithm was improved, the ondemand governor kept its name, and the initial implementation was preserved as a separate governor; because of the conservative nature of its algorithm, it was named conservative.
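The single-step selection rule described above can be sketched in a few lines. This follows the rule as this article states it (jump to the maximum above up_threshold; otherwise choose the highest available frequency at which the projected utilization load × cur / f stays at or above 80%, falling back to the minimum); ondemand_next_freq is a hypothetical name, and the thresholds and rounding used by real kernel versions may differ.

```sh
# Hypothetical helper: next frequency under the rule described above.
# Usage: ondemand_next_freq CUR_KHZ LOAD_PCT UP_THRESHOLD FREQ_KHZ...
ondemand_next_freq() (
    cur=$1 load=$2 up=$3
    shift 3
    max=$1 min=$1
    for f in "$@"; do
        if [ "$f" -gt "$max" ]; then max=$f; fi
        if [ "$f" -lt "$min" ]; then min=$f; fi
    done
    if [ "$load" -gt "$up" ]; then
        # Load above up_threshold: go straight to the highest frequency.
        echo "$max"
    else
        # Highest frequency whose projected utilization is still >= 80%.
        best=
        for f in "$@"; do
            if [ $((load * cur)) -ge $((80 * f)) ]; then
                if [ -z "$best" ] || [ "$f" -gt "$best" ]; then best=$f; fi
            fi
        done
        # No frequency qualifies: fall back to the minimum.
        echo "${best:-$min}"
    fi
)
```

With the author's frequencies and an up_threshold of 80, a 70% load at 1667000 kHz moves directly to 1333000 kHz (projected utilization about 87%), while a 50% load goes straight to 1000000 kHz instead of stepping down one level at a time as the conservative governor does.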

Implementation of the CPU driver for Intel Enhanced SpeedStep Technology

As discussed in the section on the software structure of cpufreq, cpufreq is designed to separate the policy of CPU frequency scaling from the mechanism, with the governor in the upper layer deciding the proper CPU frequency. However, once a governor decides to change the CPU frequency in response to load changes, a CPU-specific driver has to actually carry out the change. Here we describe how the driver for CPUs with Intel's latest Enhanced SpeedStep Technology works, focusing on how it sets the CPU frequency. Processors that support Enhanced SpeedStep Technology offer a very simple programming interface: the CPU frequency is set through an MSR named IA32_PERF_CTL, and another MSR named IA32_PERF_STATUS reports the CPU's current operating state. To set the CPU frequency, the driver only needs to write the appropriate value to IA32_PERF_CTL as described in Intel's developer manuals.
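As a small illustration of this register interface, the sketch below extracts the 16-bit performance-state field (bits 15:0) from a raw IA32_PERF_STATUS value (MSR 0x198); IA32_PERF_CTL is MSR 0x199. perf_state_field is a hypothetical helper, and the mapping from this field to an actual frequency and voltage is model-specific, so consult Intel's manuals. The rdmsr call in the comment assumes msr-tools, the msr kernel module, root privileges, and rdmsr's default hexadecimal output without a 0x prefix. Writing IA32_PERF_CTL by hand bypasses cpufreq entirely and is not something this sketch attempts.

```sh
# Hypothetical helper: print bits 15:0 of a raw IA32_PERF_STATUS value.
# Assumes the value fits in the shell's signed 64-bit arithmetic.
# Usage: perf_state_field HEX_VALUE
perf_state_field() {
    printf '0x%04x\n' $(( $1 & 0xffff ))
}
# Example on real hardware:
#   perf_state_field "0x$(rdmsr -p 0 0x198)"
```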

Summary and future development direction

This article described the emergence of frequency-scaling technology in CPU hardware and the problems with its early support in the Linux kernel, which led to the birth of the cpufreq kernel subsystem. Although the three governors initially provided by cpufreq met users' needs to some extent and offered some flexibility, the limitations of CPU hardware at the time left much to be desired. As CPU frequency-scaling hardware continued to develop, especially with the arrival of Intel Enhanced SpeedStep Technology, the original technical barriers were removed, and the cpufreq subsystem gained a new, more comprehensive and efficient governor: ondemand.

It is easy to see that the kernel's cpufreq subsystem came into being because of CPU frequency-scaling hardware and has evolved along with it. This also points to the subsystem's future direction: continuing to keep pace with developments in CPU frequency-scaling hardware. To close, here is a brief introduction to a new hardware feature in Intel's latest CPUs that supports frequency scaling. As mentioned above, CPUs that support Intel's latest Enhanced SpeedStep Technology provide two MSRs, IA32_MPERF and IA32_APERF, to give the cpufreq governor direct hardware support for measuring system load. While the CPU is in the ACPI C0 state, IA32_APERF increments once per clock cycle at the CPU's actual operating frequency. Intel's latest processors further take into account that the CPU may stall while waiting on memory or I/O; in such cases, simply incrementing IA32_APERF every cycle at the actual frequency, as before, does not fully reflect the CPU load. In the latest Intel processors, IA32_APERF temporarily stops counting when the pipeline stalls and counts only during periods of genuinely useful work, so the hardware can give the cpufreq governor more accurate load statistics.

 
