Translation: Linux Power Management Architecture

Device Power Management

Copyright (c) 2010 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc.

Copyright (c) 2010 Alan Stern <stern@rowland.harvard.edu>

 

*************************************************************

This article was translated by droidphone on 2011.8.5.

*************************************************************

 

 

Most of the code in Linux is device driver code, so most of the power management (PM) code also lives in drivers. Many drivers do very little PM work; others, especially those for battery-powered platforms such as mobile phones, do a lot.

 

This document gives an overview of how drivers interact with system-wide power management, with emphasis on the models and interfaces shared by everything that hooks into the driver model core. People working on drivers are advised to read it as background for the domain-specific work any particular driver requires.

 

Two models of device power management

==========================================

Drivers can use one of the following models to bring devices to a low-power state:

1. System sleep model:

Drivers can put devices into low-power states as part of entering a system-wide low-power state such as "suspend" (also called "suspend-to-RAM") or, mostly on systems with disks, "hibernation" (also called "suspend-to-disk").

 

Here, device, bus, and class drivers collaborate by implementing role-specific suspend and resume methods that cleanly power down hardware and software subsystems and later reactivate them without loss of data.

 

Some drivers can manage hardware wakeup events, which make the system leave the low-power state. This feature can be enabled or disabled through the relevant /sys/devices/.../power/wakeup file (for Ethernet drivers, the ioctl interface used by ethtool serves the same purpose). Enabling it may cost some extra power, but it lets the whole system enter low-power states more often.

 

2. Runtime power management model:

This model puts devices into low-power states while the system is running, in principle independently of other power management activity. However, devices are generally not independent of one another (for example, a parent device cannot be suspended unless all of its child devices are suspended). In addition, some bus-specific operations may be needed depending on the bus type the device sits on. Devices put into low-power states at run time may also need special handling during system-wide power transitions (suspend or hibernation).
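
The parent/child constraint above can be sketched as a small userspace C model. The names (struct model_dev, model_runtime_suspend(), MAX_CHILDREN) are hypothetical and the model is deliberately simplified: the kernel's runtime PM core keeps a per-device count of active children instead of scanning a child list, and it also handles the resume direction, which is omitted here.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILDREN 4  /* hypothetical limit for the sketch */

/* Hypothetical, simplified device node; not the kernel's struct device. */
struct model_dev {
    const char *name;
    bool suspended;                          /* runtime-suspended? */
    struct model_dev *children[MAX_CHILDREN];
    size_t nr_children;
};

/*
 * Try to runtime-suspend a device. Following the rule in the text, a
 * parent may be suspended only once every child is already suspended.
 * Returns true on success, false if an active child blocks the request.
 */
static bool model_runtime_suspend(struct model_dev *dev)
{
    for (size_t i = 0; i < dev->nr_children; i++) {
        if (!dev->children[i]->suspended)
            return false;   /* an active child keeps the parent powered */
    }
    dev->suspended = true;
    return true;
}
```

A leaf device (no children) can always be suspended in this model; the parent succeeds only after its child has gone first.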

 

For these reasons, not only the device driver itself but also the corresponding subsystem (bus type, device type, device class) drivers and the PM core are involved in runtime power management. As in the system sleep case, they must cooperate by implementing role-specific suspend and resume methods, so that the hardware is cleanly powered down and later reactivated without loss of data or service.

 

There is not much to say about the low-power states themselves, except that they are system-specific and often device-specific. Note, however, that if enough devices are put into low-power states at run time, the effect can be very similar to entering a system-wide low-power state, and several drivers using runtime power management together may bring the system into a state where even deeper power saving is possible.

 

Most suspended devices have stopped all I/O: no more DMA or IRQs (except for wakeup events), no more data read or written, and no more requests accepted from upper-layer drivers. A given bus or platform may impose different requirements, though.

 

Examples of hardware wakeup events include an alarm from a real-time clock (RTC), the arrival of a network packet, keyboard or mouse activity, and media insertion or removal (PCMCIA, MMC/SD, USB, and so on).

 

Interfaces for entering system sleep states

==========================================================

The kernel provides programming interfaces that subsystems (bus types, device types, device classes) and drivers use to participate in the power management of the devices they are concerned with. These interfaces cover both system sleep and runtime power management.

 

Device power management operations

----------------------------------------------------------

The device power management operations of subsystems and drivers are defined in the dev_pm_ops structure:

struct dev_pm_ops {
    int (*prepare)(struct device *dev);
    void (*complete)(struct device *dev);
    int (*suspend)(struct device *dev);
    int (*resume)(struct device *dev);
    int (*freeze)(struct device *dev);
    int (*thaw)(struct device *dev);
    int (*poweroff)(struct device *dev);
    int (*restore)(struct device *dev);
    int (*suspend_noirq)(struct device *dev);
    int (*resume_noirq)(struct device *dev);
    int (*freeze_noirq)(struct device *dev);
    int (*thaw_noirq)(struct device *dev);
    int (*poweroff_noirq)(struct device *dev);
    int (*restore_noirq)(struct device *dev);
    int (*runtime_suspend)(struct device *dev);
    int (*runtime_resume)(struct device *dev);
    int (*runtime_idle)(struct device *dev);
};

This structure is defined in include/linux/pm.h, and its methods are described below. For now, note that the last three methods are used only for runtime power management, while all the others are used for system-wide power transitions.
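
In kernel code, a driver typically fills in only the callbacks it needs, using designated initializers, and leaves the rest NULL. The userspace sketch below mimics that pattern. Here struct device and struct dev_pm_ops are cut-down stand-ins (the real ones have many more members), and foo_power_down(), foo_power_up(), foo_pm_ops, and call_pm() are hypothetical names.

```c
#include <stddef.h>

/* Minimal userspace stand-ins for the kernel structures. */
struct device {
    const char *name;
    int state;          /* hypothetical: 1 = full power, 0 = low power */
};

struct dev_pm_ops {
    int (*suspend)(struct device *dev);
    int (*resume)(struct device *dev);
    int (*freeze)(struct device *dev);
    int (*runtime_suspend)(struct device *dev);
    int (*runtime_resume)(struct device *dev);
};

/* Hypothetical driver "foo": one pair of helpers serves both the
 * system sleep and the runtime PM callbacks. */
static int foo_power_down(struct device *dev) { dev->state = 0; return 0; }
static int foo_power_up(struct device *dev)   { dev->state = 1; return 0; }

static const struct dev_pm_ops foo_pm_ops = {
    .suspend         = foo_power_down,
    .resume          = foo_power_up,
    .runtime_suspend = foo_power_down,
    .runtime_resume  = foo_power_up,
    /* .freeze and any other omitted member stay NULL */
};

/* Invoke a callback only if the driver provided one; in this sketch a
 * missing callback counts as success. */
static int call_pm(int (*cb)(struct device *), struct device *dev)
{
    return cb ? cb(dev) : 0;
}
```

Sharing helpers between the system sleep and runtime callbacks is a common design choice when the hardware's low-power state is the same in both cases; drivers with different requirements would supply separate functions.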

 

Some subsystems still support so-called "legacy" power management callbacks that do not use struct dev_pm_ops. They apply only to system sleep and are not described in this document; refer to the kernel source code if you need the details.

 

Subsystem-level methods

------------------------------------------------

The core methods for suspending and resuming devices live in the struct dev_pm_ops pointed to by the pm member of struct bus_type, struct device_type, and struct class. They are mostly of interest to the people who write the infrastructure for specific buses (such as PCI or USB) or for device types and device classes.

 

Bus drivers implement these methods as appropriate for their hardware and for the drivers that use it; PCI works differently from USB, and so on. Few people write subsystem-level drivers; most driver code is device driver code built on top of bus-specific framework code.

 

These calls are described in more detail below. They are made in phases, for every device, following the parent-child ordering of the device model tree.

 

/sys/devices/.../power/wakeup files

-------------------------------------------------

All devices in the device model have two flags that control the handling of wakeup events (hardware signals that can force the device or the system out of a low-power state). These flags are initialized by bus or device driver code using device_set_wakeup_capable() and device_set_wakeup_enable(), which are defined in include/linux/pm_wakeup.h.

 

The "can_wakeup" flag records whether the device (and its driver) can physically support wakeup events; device_set_wakeup_capable() affects this flag. The "should_wakeup" flag controls whether the device should try to use its wakeup mechanism; device_set_wakeup_enable() affects this flag, and for the most part drivers should not change its value. The initial value of should_wakeup is false for most devices; the main exceptions are power buttons, keyboards, and Ethernet adapters whose wake-on-LAN feature has been set up with ethtool.

 

Whether a device is capable of issuing wakeup events is a hardware matter, and the kernel is responsible for keeping track of it. Whether a wakeup-capable device should issue wakeup events, on the other hand, is a policy decision, and it is managed by user space through the power/wakeup sysfs attribute. Writing "enabled" or "disabled" to this file sets or clears the should_wakeup flag, respectively. When the file is read, it returns the corresponding string if can_wakeup is true; if can_wakeup is false it returns an empty string, indicating that the device does not support wakeup events. (Even though the file then appears empty, writing to it still affects the should_wakeup flag.)
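
The read and write behaviour of power/wakeup described above can be modelled in userspace C. This is a sketch only: struct wakeup_flags, wakeup_show(), and wakeup_store() are hypothetical names, and the model follows this document's description rather than the sysfs code of any particular kernel version.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of the two per-device wakeup flags. */
struct wakeup_flags {
    bool can_wakeup;     /* hardware capability, set by bus/driver code */
    bool should_wakeup;  /* policy, controlled from user space */
};

/* Model of reading power/wakeup: empty when the device cannot wake. */
static const char *wakeup_show(const struct wakeup_flags *f)
{
    if (!f->can_wakeup)
        return "";
    return f->should_wakeup ? "enabled" : "disabled";
}

/* Model of writing power/wakeup. A trailing newline (as written by
 * `echo`) is ignored. Returns 0 on success, -1 on unrecognised input.
 * Per the description above, should_wakeup is updated even when
 * can_wakeup is false. */
static int wakeup_store(struct wakeup_flags *f, const char *buf)
{
    size_t len = strcspn(buf, "\n");

    if (len == strlen("enabled") && strncmp(buf, "enabled", len) == 0)
        f->should_wakeup = true;
    else if (len == strlen("disabled") && strncmp(buf, "disabled", len) == 0)
        f->should_wakeup = false;
    else
        return -1;
    return 0;
}
```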

 

The device_may_wakeup() function returns true only when both flags are set. During a system sleep transition, drivers should call it before putting a device into a low-power state, to decide whether to enable the device's wakeup mechanism. For runtime power management, however, wakeup events should be enabled whenever both the device and its driver support them, regardless of the should_wakeup flag.
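
A driver's use of this check during system suspend can be sketched as follows. The structure and functions are hypothetical userspace stand-ins; in a real driver the check would be device_may_wakeup(dev), and arming the wake source would be a call such as enable_irq_wake().

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-device state involved. */
struct model_device {
    bool can_wakeup;
    bool should_wakeup;
    bool wake_armed;   /* hypothetical: has the driver armed a wake source? */
};

/* Mirrors the rule in the text: true only if both flags are set. */
static bool model_may_wakeup(const struct model_device *dev)
{
    return dev->can_wakeup && dev->should_wakeup;
}

/* Sketch of a driver's system-suspend step: arm the wakeup mechanism
 * only when both capability and policy allow it. */
static int model_driver_suspend(struct model_device *dev)
{
    dev->wake_armed = model_may_wakeup(dev);
    /* ... stop I/O and enter a low-power state here ... */
    return 0;
}
```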

 

/sys/devices/.../power/control files

------------------------------------------------

Each device in the device model has a flag that controls whether it is subject to runtime power management. This flag, runtime_auto, is initialized by the bus type (or, more generally, subsystem) code using pm_runtime_allow() or pm_runtime_forbid(); the default is to allow runtime power management.

 

User space can change the flag by writing "on" or "auto" to the device's power/control sysfs file. Writing "auto" is equivalent to calling pm_runtime_allow(): it sets the flag and allows the driver to manage the device's runtime power. Writing "on" is equivalent to calling pm_runtime_forbid(): it clears the flag, returns the device to full power if it was in a low-power state, and prevents further runtime power management of the device. User space can also read the file to check the current value of runtime_auto.
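
From user space, power/control is an ordinary text attribute. A minimal C helper, assuming the caller already knows the device's sysfs path, might look like this; write_attr() and set_runtime_control() are hypothetical helper names, not a library API.

```c
#include <stdio.h>

/* Write `value` to a sysfs-style attribute file; returns 0 on success. */
static int write_attr(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fputs(value, f) != EOF;
    /* stdio buffers the data, so the real write(2), and any error the
     * kernel reports for it, usually surfaces in fclose(). */
    if (fclose(f) == EOF)
        ok = 0;
    return ok ? 0 : -1;
}

/* allow != 0 writes "auto" (like pm_runtime_allow());
 * allow == 0 writes "on" (like pm_runtime_forbid()). */
static int set_runtime_control(const char *control_path, int allow)
{
    return write_attr(control_path, allow ? "auto" : "on");
}
```

A call would look like set_runtime_control("/sys/devices/.../power/control", 1), with the elided part replaced by the real device path; writing sysfs attributes normally requires root privileges.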

 

The runtime_auto flag has no effect on system-wide power transitions. In particular, a device can (and in most cases should and will) be put into a low-power state during a system-wide transition to a sleep state even if its runtime_auto flag is clear.

 

For more information about the runtime power management framework, see Documentation/power/runtime_pm.txt.

 

Calling drivers to enter and leave system sleep states

========================================

When the system goes into a sleep state, each device's driver is asked to suspend the device by putting it into a state compatible with the target system state. That is usually some version of "off", but the details are system-specific. Devices enabled for wakeup usually remain partly functional so that they can wake the system when appropriate.

 

When the system leaves the low-power state, each device's driver is asked to resume the device by returning it to full power. Suspend and resume always occur as a pair, and both are carried out in multiple phases.

 

For simple drivers, suspend might stop the device using class-level code and then turn the hardware as "off" as possible during the suspend_noirq phase. The matching resume calls would then fully reinitialize the hardware before reactivating its I/O.

 

More power-aware drivers may also prepare their devices to trigger system wakeup events later.

 

Callback order guarantees

-------------------------------------

To ensure that bridges and similar links needed to talk to a device are still available when that device is suspended or resumed, the device tree is walked bottom-up to suspend devices and top-down to resume them.

 

The ordering of the device tree is defined by the order in which devices are registered: a child can never be registered, probed, or resumed before its parent, and it can never be removed or suspended after its parent.

 

The policy is that the device tree should match the hardware bus topology. In particular, this means that registering a child device fails if its parent is being suspended (for example, because the PM core has chosen it as the next device to suspend) or has already been suspended. Device drivers must be prepared to handle this situation.
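
The ordering rules above can be modelled with a flat list kept in registration order: because parents are always registered before their children, walking the list backwards suspends children first, and walking it forwards resumes parents first. This userspace sketch (struct dev_list, register_dev(), suspend_all(), resume_all(), MAX_DEVS are all hypothetical) also rejects registration below a parent that is being suspended. It illustrates the idea and is not the PM core's actual code.

```c
#include <string.h>

#define MAX_DEVS 8  /* hypothetical capacity for the sketch */

/* Devices in registration order; parent[] holds the parent's index. */
struct dev_list {
    const char *names[MAX_DEVS];
    int parent[MAX_DEVS];       /* -1 for a root device */
    int suspending[MAX_DEVS];   /* nonzero once the core has started on it */
    int count;
};

/* Register a device; fails (-1) if the parent index is invalid or the
 * parent is being suspended. Returns the new device's index. */
static int register_dev(struct dev_list *l, const char *name, int parent)
{
    if (l->count == MAX_DEVS || parent < -1 || parent >= l->count)
        return -1;
    if (parent >= 0 && l->suspending[parent])
        return -1;
    l->names[l->count] = name;
    l->parent[l->count] = parent;
    l->suspending[l->count] = 0;
    return l->count++;
}

/* Suspend bottom-up: newest registrations (children) first.
 * Appends names to `trace`, which must be large enough. */
static void suspend_all(struct dev_list *l, char *trace)
{
    for (int i = l->count - 1; i >= 0; i--) {
        l->suspending[i] = 1;
        strcat(trace, l->names[i]);
        strcat(trace, " ");
    }
}

/* Resume top-down: parents before children. */
static void resume_all(struct dev_list *l, char *trace)
{
    for (int i = 0; i < l->count; i++) {
        l->suspending[i] = 0;
        strcat(trace, l->names[i]);
        strcat(trace, " ");
    }
}
```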

 

Various phases of system power management

------------------------------------------------

Suspending and resuming the system are done in phases. Standby, sleep ("suspend-to-RAM"), and hibernation ("suspend-to-disk") use different sets of phases. In each phase, the callbacks for every device are executed before the next phase begins. Not all buses and device classes support all of these callbacks, and not all drivers use all of them. The phases run after tasks have been frozen and before they are thawed. Furthermore, the *_noirq phases run while IRQ handlers are disabled (except for those marked with the IRQ_WAKEUP flag).

 

In most phases, the bus, type, and class callbacks are used (that is, the methods defined in dev->bus->pm, dev->type->pm, and dev->class->pm). The prepare and complete phases are exceptions: they use only bus callbacks. When several callbacks are used in one phase, they are invoked in the order <class, type, bus> during power-down transitions and in the opposite order <bus, type, class> during power-up transitions. For example, during the suspend phase the PM core invokes:

    dev->class->pm->suspend(dev);
    dev->type->pm->suspend(dev);
    dev->bus->pm->suspend(dev);

Conversely, during the resume phase the PM core invokes the following callbacks on the current device before moving on to the next device:

    dev->bus->pm->resume(dev);
    dev->type->pm->resume(dev);
    dev->class->pm->resume(dev);

These callbacks may in turn invoke device- or driver-specific methods stored in dev->driver->pm, but they are not required to.
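
The per-phase callback ordering can be sketched as follows. Each layer's callback merely appends its name to a trace so that the order is visible. The types and functions (layer_cb, struct pm_layers, model_suspend(), model_resume(), model_prepare()) are hypothetical, and in this model a missing (NULL) layer callback is simply skipped.

```c
#include <string.h>

/* Hypothetical layer callback: records which layer ran. */
typedef void (*layer_cb)(char *trace);

static void class_cb(char *t) { strcat(t, "class "); }
static void type_cb(char *t)  { strcat(t, "type "); }
static void bus_cb(char *t)   { strcat(t, "bus "); }

struct pm_layers {
    layer_cb cls;   /* stands in for dev->class->pm */
    layer_cb type;  /* stands in for dev->type->pm */
    layer_cb bus;   /* stands in for dev->bus->pm */
};

static void run_if(layer_cb cb, char *t)
{
    if (cb)
        cb(t);
}

/* Power-down transition: <class, type, bus>. */
static void model_suspend(const struct pm_layers *l, char *t)
{
    run_if(l->cls, t);
    run_if(l->type, t);
    run_if(l->bus, t);
}

/* Power-up transition: the reverse, <bus, type, class>. */
static void model_resume(const struct pm_layers *l, char *t)
{
    run_if(l->bus, t);
    run_if(l->type, t);
    run_if(l->cls, t);
}

/* prepare (and likewise complete): bus callback only. */
static void model_prepare(const struct pm_layers *l, char *t)
{
    run_if(l->bus, t);
}
```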

 

Entering system suspend

------------------------------------

When the system enters the standby or sleep state, it goes through the following phases:

prepare, suspend, suspend_noirq.

1. The prepare phase prevents races by blocking the registration of new devices: if new children could be registered at will, the PM core could never know that all the children of a device had been suspended. (By contrast, devices may be unregistered at any time.) Unlike the other suspend phases, the prepare phase walks the device tree top-down.

 

Only bus callbacks are used in the prepare phase. After the callback returns, no new children may be registered below the device. The callback may also prepare the device or driver for the upcoming system power transition, but it should not put the device into a low-power state.

 

2. The suspend phase is implemented by the suspend callbacks, which stop all I/O on the device. They may also save the device registers and put the device into an appropriate low-power state, depending on the bus type the device is on, and they may enable wakeup events.

 

3. The suspend_noirq phase occurs after IRQ handlers have been disabled, which means the driver's interrupt handler will not be called while the callback runs. The callback should save any device registers not saved in the previous phase and finally put the device into the appropriate low-power state.

Most subsystems and drivers do not need to implement this callback. However, bus types that allow devices to share interrupt vectors, such as PCI, generally need it; otherwise a driver might run into errors by fielding a shared interrupt, generated by another device, after its own device has already been put into a low-power state.

 

At the end of these phases, drivers must have stopped all I/O transactions (DMA, IRQs), saved enough state to re-initialize or restore the previous state (as the hardware requires), and put the device into a low-power state. On many platforms they will gate off one or more clocks; sometimes they will also switch off power supplies or reduce voltages. (Drivers that support runtime PM may already have performed some or all of these steps.)

If device_may_wakeup(dev) returns true, the device should be prepared to generate hardware wakeup signals that trigger a system wakeup event while the system is in the sleep state. For example, enable_irq_wake() might arm a GPIO connected to a switch or other external hardware, and pci_enable_wake() does something similar for the PCI PME# signal.

 

If any of these callbacks returns an error, the system will not enter the desired low-power state. Instead, the PM core unwinds by resuming all the devices that were already suspended.
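
This error rule can be sketched in C. Devices are identified by indices in suspend order (children first), and suspend_fn and model_suspend_devices() are hypothetical; instead of calling real resume callbacks, the model records which devices would be resumed and in what order.

```c
/* Hypothetical per-device suspend hook: returns 0 or a negative error. */
typedef int (*suspend_fn)(int idx);

/*
 * Suspend devices 0..n-1 in order. If device k fails, devices k-1..0
 * (the ones already suspended) are "resumed" newest first, their indices
 * are stored in resume_order[], and the error is returned so that the
 * system does not enter the sleep state. The failing device itself is
 * not resumed, since it never suspended.
 */
static int model_suspend_devices(suspend_fn suspend, int n,
                                 int resume_order[], int *nresumed)
{
    *nresumed = 0;
    for (int k = 0; k < n; k++) {
        int err = suspend(k);
        if (err) {
            for (int i = k - 1; i >= 0; i--)
                resume_order[(*nresumed)++] = i;  /* stand-in for ->resume() */
            return err;
        }
    }
    return 0;
}
```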

 

Leaving system suspend (resume)

----------------------------------------

When the system leaves the standby or sleep state, it goes through the following phases:

resume_noirq, resume, complete.

1. The resume_noirq callbacks should perform any actions needed before the driver's interrupt handlers are invoked. This usually means undoing the actions of the suspend_noirq phase. If the bus type permits shared interrupt vectors, like PCI, the callback should bring the device and its driver into a state in which the driver can recognize whether the device is the source of an incoming interrupt and, if so, handle it correctly.

For example, for the PCI bus, bus->pm->resume_noirq() puts the device into the full-power state (D0 in PCI terminology) and restores the device's standard configuration registers. It then calls the device driver's pm->resume_noirq() method to perform device-specific actions.

 

2. The resume callbacks bring the device back to its operating state, so that it can perform normal I/O. This usually means undoing the actions of the suspend phase.

 

3. The complete phase uses only the bus callback, which should undo the actions of the prepare phase. Note, however, that new children may be registered below the device as soon as its resume callbacks return; registration does not have to wait until the complete phase.

 

At the end of these phases, drivers should be as functional as they were before suspend: I/O can be performed using DMA and IRQs, and the relevant clocks are enabled. Even if the device was already in a low-power state because of runtime PM before the system went to sleep, it should be returned to the full-power state. There are several reasons for this; see Documentation/power/runtime_pm.txt for details.

 

However, the details here may again be platform-specific. For example, some systems support multiple "run" states, and the mode in effect after resume may differ from the one before suspend. That can mean that the availability of certain clocks or power supplies has changed, which can easily affect how a driver works.

 

Drivers must be able to handle hardware that has been reset since the suspend callbacks were called, for example by completely reinitializing it. This may be the hardest part, and the one whose details are most often protected by NDA'd documents and chip errata. The simplest case is when nothing has changed since suspend, but that is not guaranteed (in fact, it is usually not the case).

 

Drivers must also be prepared to discover that the device was removed while the system was powered down, whenever that is physically possible. PCMCIA, MMC, USB, Firewire, SCSI, and even IDE are common examples of buses where Linux platforms see such removals. How drivers are notified of such removals and how they handle them is bus-specific, and often involves a separate thread.

 

Entering hibernation

--------------------------------------

(Omitted in this translation.)

 

Leaving hibernation

--------------------------------------

(Omitted in this translation.)

 

System devices

----------------------------------------

System devices (sysdevs) follow a slightly different API, which can be found in:

include/linux/sysdev.h

drivers/base/sys.c

System devices are suspended with interrupts disabled, after all other devices have been suspended. On resume, they are resumed before any other device, also with interrupts disabled. These actions take place in special "sysdev_driver" phases, which affect only system devices.

 

Thus, after the suspend_noirq (or freeze_noirq or poweroff_noirq) phase, once the non-boot CPUs are offline and IRQs are disabled on the remaining CPU, the sysdev_driver.suspend phase runs and the system enters the sleep state (or, for hibernation, the system image is created). During resume the order is reversed: the sysdev_driver.resume phase runs, IRQs are enabled on the boot CPU, the non-boot CPUs are brought back online, and then the resume_noirq (or thaw_noirq or restore_noirq) phase begins.
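
For reference, the suspend-to-RAM ordering described in this and the preceding sections can be written down as an ordered table. The labels are descriptive strings, not kernel function names, and platform-specific hooks are omitted; step_index() is a hypothetical helper for checking the relative order of two steps.

```c
#include <stddef.h>
#include <string.h>

/* Descriptive labels for the suspend-to-RAM path, in order. */
static const char *const suspend_seq[] = {
    "freeze tasks",
    "prepare",                  /* top-down, bus callbacks only */
    "suspend",                  /* bottom-up */
    "suspend_noirq",            /* device IRQ handlers disabled */
    "disable non-boot CPUs",
    "disable IRQs on boot CPU",
    "sysdev_driver.suspend",
    "enter sleep state",
};

/* The way back, after a wakeup event. */
static const char *const resume_seq[] = {
    "sysdev_driver.resume",
    "enable IRQs on boot CPU",
    "enable non-boot CPUs",
    "resume_noirq",
    "resume",                   /* top-down */
    "complete",                 /* bus callbacks only */
    "thaw tasks",
};

/* Position of `step` in `seq`, or -1 if it is not listed. */
static int step_index(const char *const seq[], size_t n, const char *step)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(seq[i], step) == 0)
            return (int)i;
    }
    return -1;
}
```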

 

The code that actually enters and leaves the system-wide low-power state sometimes involves hardware details known only to the boot firmware, and may leave a CPU running software (from RAM or flash) that monitors the system and manages the wakeup sequence.

 

Device low-power (suspend) states

------------------------------------------

Device low-power states are not standardized. One device may only handle "on" and "off", while another may support a dozen different versions of "on" (how many engines are active?), plus a state from which it can return to "on" faster than from a full "off".

 

Some buses define rules about what the different suspend states mean. PCI is one example: after the suspend sequence completes, a non-legacy PCI device may not perform DMA or issue IRQs, and any wakeup events it issues must go through the PME# bus signal. PCI also defines several standard device states, some of which are optional.

 

In contrast, highly integrated SoC processors often use IRQs as wakeup sources (so drivers must call enable_irq_wake()), and may be able to treat DMA completion as a wakeup event (sometimes DMA can stay active while only the CPU and some peripherals sleep).

 

Some details may be platform-specific. In certain sleep states, some devices can remain fully active. For example, while the system is in a light sleep, an LCD display may keep being refreshed by DMA from a frame buffer, and the frame buffer may even be updated by a DSP or another non-Linux CPU while the CPU running Linux stays idle.

 

Moreover, the specific actions taken may depend on the target system state. One target state may allow a device to stay largely operational, while another may require a hard shutdown and full re-initialization on resume. Two different target systems may also use the same device differently: the LCD mentioned above might stay active in one product's "standby", while a different product built on the same SoC might work differently.

 

Power management notifiers

------------------------------

Some operations cannot be carried out in the power management callbacks discussed above, because those callbacks run too late or too early. To handle such cases, subsystems and drivers can register power management notifiers, which are called before tasks are frozen and after they have been thawed. Generally speaking, PM notifiers are suitable for actions that either need user space to be available or at least will not interfere with user space.

 

For more information, see Documentation/power/notifiers.txt.

 

Runtime Power Management

======================================

Many devices can be powered down dynamically while the system is running. This is especially useful for devices that are not in use, and it can save significant energy on a running system. Such devices often support a range of runtime power states, with names such as "off", "sleep", "idle", and "active". These states are sometimes constrained by the bus the device uses, and they usually include the hardware states also used for system sleep.

 

A system-wide power transition may begin while some devices are already in low-power states because of runtime power management. The system sleep PM callbacks should recognize this situation and react to it appropriately, but the necessary actions are subsystem-specific.

 

In some cases the decision is made at the subsystem level; in others it is left to the device driver. Sometimes a suspended device can be left in that state during a system-wide power transition; in other cases the device must be temporarily returned to full power, for example so that its system wakeup capability can be disabled. All of this depends on the hardware and on the design of the subsystem and driver in question.

 

When the system resumes from a sleep state, it is best to return devices to full power. See Documentation/power/runtime_pm.txt, which discusses this issue in more detail and describes the runtime power management framework in general.
