Linux Dynamic Frequency adjustment system cpufreq 2: Core Architecture and API

Source: Internet
Author: User

In the previous article, we gave a rough overview of the cpufreq sysfs interface exposed to user space and of several important data structures. The cpufreq subsystem gathers its common code into the cpufreq core, which provides the necessary APIs to the rest of cpufreq and to other kernel modules. cpufreq_governor, cpufreq_driver and other modules use these APIs to build a complete cpufreq system. In this article, we discuss the code architecture of the core and how to use these public APIs.

/*****************************************************************************************************/

Statement: all content on this blog is original work from http://blog.csdn.net/droidphone. Please credit the source when reposting. Thank you!

/*****************************************************************************************************/

The core code is in drivers/cpufreq/cpufreq.c. In this series of articles, the kernel version used is 3.10.0.

1. Initialization of the cpufreq subsystem

Let's take a look at the specific code:

static int __init cpufreq_core_init(void)
{
        int cpu;

        if (cpufreq_disabled())
                return -ENODEV;

        for_each_possible_cpu(cpu) {
                per_cpu(cpufreq_policy_cpu, cpu) = -1;
                init_rwsem(&per_cpu(cpu_policy_rwsem, cpu));
        }

        cpufreq_global_kobject = kobject_create_and_add("cpufreq", &cpu_subsys.dev_root->kobj);
        BUG_ON(!cpufreq_global_kobject);
        register_syscore_ops(&cpufreq_syscore_ops);

        return 0;
}
core_initcall(cpufreq_core_init);

It can be seen that in the system startup phase, through the initcall mechanism, cpufreq_core_init is called to complete the initialization of the core part, where:

cpufreq_policy_cpu is a per_cpu variable. On an SMP system, each CPU can have its own independent frequency-scaling policy, or several CPUs can share one policy. In the shared case, one CPU manages the policy and the other CPUs that use it defer to that managing CPU. This per_cpu variable records, for each CPU, which CPU actually manages its policy. The initial value -1 indicates that policy management has not started yet.

The kobject_create_and_add call that follows creates a cpufreq node under /sys/devices/system/cpu. Some configuration parameters of the current governor are placed under this node. The cpu_subsys argument is a kernel global variable that is initialized early in boot. The code is in drivers/base/cpu.c:

struct bus_type cpu_subsys = {
        .name = "cpu",
        .dev_name = "cpu",
};
EXPORT_SYMBOL_GPL(cpu_subsys);

void __init cpu_dev_init(void)
{
        if (subsys_system_register(&cpu_subsys, cpu_root_attr_groups))
                panic("Failed to register CPU subsystem");

        cpu_dev_register_generic();
}

This creates a CPU bus on which all CPUs in the system hang. The root directory of the CPU bus devices is /sys/devices/system/cpu, and a cpu bus node also appears under /sys/bus. The nodes cpu0, cpu1, ... cpux appear in the root directory of the CPU bus devices, one device node per CPU. The cpufreq subsystem uses cpu_subsys to obtain the CPU devices in the system and to create the corresponding cpufreq objects under them. We will discuss this later.

So the initialization of the cpufreq subsystem does not do much: it only initializes a few per_cpu variables and creates a cpufreq file node. The following is the sequence diagram of the initialization process:

Figure 1.1 core layer Initialization

2. Register cpufreq_governor

Multiple governors can exist in the system at the same time. A policy is associated with a governor through the governor pointer in the cpufreq_policy structure. Before a policy can use a governor, the governor must first be registered with the cpufreq core. Registration is done through an API provided by the core layer:

int cpufreq_register_governor(struct cpufreq_governor *governor)
{
        int err;
        ......
        governor->initialized = 0;
        err = -EBUSY;
        if (__find_governor(governor->name) == NULL) {
                err = 0;
                list_add(&governor->governor_list, &cpufreq_governor_list);
        }
        ......
        return err;
}

The core layer defines a global linked list: cpufreq_governor_list. The registration function first uses __find_governor() to check, by governor name, whether the governor has already been registered. If not, the structure representing the governor is added to cpufreq_governor_list; otherwise the function returns -EBUSY. As mentioned in the previous article, the current kernel version provides five governors. The kernel configuration (Kconfig) selects which governors are compiled in and which one is the default. In cpufreq.h, the CPUFREQ_DEFAULT_GOVERNOR macro is defined, according to that configuration, as the address of the default governor's structure. This macro is used during cpufreq_driver registration to set the governor that the system uses by default.
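The duplicate check described above can be illustrated with a minimal user-space sketch. This is not the kernel code: struct governor, find_governor and register_governor are hypothetical stand-ins, a plain next pointer replaces the kernel's list_head, the lookup uses an exact strcmp match as a simplification, and no locking is modeled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space stand-in for struct cpufreq_governor. */
struct governor {
        const char *name;
        int initialized;
        struct governor *next;  /* replaces the kernel's list_head */
};

/* Plays the role of the global cpufreq_governor_list. */
static struct governor *governor_list;

/* Look up a governor by name, in the spirit of __find_governor(). */
static struct governor *find_governor(const char *name)
{
        struct governor *g;

        for (g = governor_list; g != NULL; g = g->next)
                if (strcmp(g->name, name) == 0)
                        return g;
        return NULL;
}

/* Returns 0 on success, -EBUSY if a governor with this name exists. */
static int register_governor(struct governor *gov)
{
        if (gov == NULL || gov->name == NULL)
                return -EINVAL;

        gov->initialized = 0;
        if (find_governor(gov->name) != NULL)
                return -EBUSY;

        gov->next = governor_list;      /* push onto the list head */
        governor_list = gov;
        assert(find_governor(gov->name) == gov);
        return 0;
}
```

In this sketch, registering a second structure with the same name fails with -EBUSY and leaves the list unchanged, mirroring the behaviour of the name check in cpufreq_register_governor.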

3. Register a cpufreq_driver driver

Unlike governors, only one cpufreq_driver can exist in the system. As described in the first article in this series (Linux Dynamic Frequency adjustment system cpufreq 1: Overview), cpufreq_driver is platform-specific and is responsible for actually changing the frequency, while choosing the operating frequency is the governor's job. So only one cpufreq_driver needs to be registered; it only has to know how to control the platform's clock system in order to set the operating frequency chosen by the governor. Registering the cpufreq_driver triggers a series of further initialization steps in the cpufreq core. The core initialization described in section 1 is very simple; most of the initialization actually happens during cpufreq_driver registration. The core provides the API cpufreq_register_driver for registration. Let's analyze how this function works:

int cpufreq_register_driver(struct cpufreq_driver *driver_data)
{
        ......
        if (cpufreq_disabled())
                return -ENODEV;

        if (!driver_data || !driver_data->verify || !driver_data->init ||
            ((!driver_data->setpolicy) && (!driver_data->target)))
                return -EINVAL;

This API takes a single parameter: driver_data, a pointer to a cpufreq_driver that is defined in the driver code in advance and passed in when calling the API. The function first checks whether frequency scaling is currently disabled in the system, and then checks whether the required cpufreq_driver callbacks are implemented. The code shows that the verify and init callbacks must be implemented, and that at least one of setpolicy and target must be implemented. For the purpose of these callbacks, see the first article in this series. Next:

        write_lock_irqsave(&cpufreq_driver_lock, flags);
        if (cpufreq_driver) {
                write_unlock_irqrestore(&cpufreq_driver_lock, flags);
                return -EBUSY;
        }
        cpufreq_driver = driver_data;
        write_unlock_irqrestore(&cpufreq_driver_lock, flags);

Check whether the global variable cpufreq_driver has been assigned a value. If not, the input parameter is assigned to the global variable cpufreq_driver. This ensures that only one cpufreq_driver driver is registered in the system. Then:

        ret = subsys_interface_register(&cpufreq_interface);
        ......
        ......
        register_hotcpu_notifier(&cpufreq_cpu_notifier);

subsys_interface_register creates a cpufreq_policy for each CPU, and register_hotcpu_notifier registers a CPU hot-plug notifier so that, when a CPU is plugged or unplugged, the relationships between the CPUs' policies can be handled dynamically (for example, migrating the managing CPU). Here we focus on subsys_interface_register. Recall from section 1 that cpu_subsys is created during initialization, so each CPU gets its own device under the CPU bus: /sys/devices/system/cpu/cpux. subsys_interface_register registers a common interface on the devices of the cpu_subsys subsystem. Let's look at the definition of the cpufreq_interface argument:

static struct subsys_interface cpufreq_interface = {
        .name           = "cpufreq",
        .subsys         = &cpu_subsys,
        .add_dev        = cpufreq_add_dev,
        .remove_dev     = cpufreq_remove_dev,
};

The code of subsys_interface_register is not expanded here. Roughly, it traverses every device of the subsystem and calls the add_dev callback of the cpufreq_interface structure with that device as the argument; here the callback points to cpufreq_add_dev. How that function works is discussed in the next section.
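The traversal just described can be pictured with a small user-space sketch. It is only an illustration under simplifying assumptions: struct device, struct interface, cpu_devices and interface_register are hypothetical stand-ins, the device list is a fixed array of four entries, and the return value of the callback is ignored.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NR_DEVS 4       /* hypothetical number of CPU devices */

struct device {
        int id;
};

/* Simplified stand-in for struct subsys_interface. */
struct interface {
        const char *name;
        int (*add_dev)(struct device *dev, struct interface *sif);
};

/* Stand-in for the devices hanging on the CPU bus. */
static struct device cpu_devices[NR_DEVS] = { {0}, {1}, {2}, {3} };

/* Call add_dev once for every device of the (simulated) subsystem. */
static int interface_register(struct interface *sif)
{
        int i;

        if (sif == NULL)
                return -ENODEV;

        for (i = 0; i < NR_DEVS; i++)
                if (sif->add_dev)
                        sif->add_dev(&cpu_devices[i], sif);
        return 0;
}

/* Example callback playing the role of cpufreq_add_dev. */
static unsigned int seen_mask;

static int demo_add_dev(struct device *dev, struct interface *sif)
{
        (void)sif;
        assert(dev != NULL);
        seen_mask |= 1u << dev->id;     /* remember which devices we saw */
        return 0;
}
```

Registering an interface whose add_dev is demo_add_dev leaves seen_mask with one bit set per device, which shows that the callback runs once for each CPU device regardless of which one is enumerated first.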

After registration, the driver is stored in the global variable cpufreq_driver for the core layer to use. Each CPU also has its policy set up, and the governor starts working: it monitors the CPU load in real time, computes a suitable operating frequency, and adjusts the actual frequency through the driver. The following is the sequence diagram of the cpufreq_driver registration process:

Figure 3.1 registration process of cpufreq_driver
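The callback validation and the single-driver rule from this section can be sketched in user space as follows. The names struct driver, registered_driver and register_driver are hypothetical, the callback signatures are simplified, and the locking around the global pointer is omitted, so this is an illustration rather than the kernel implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct policy;  /* opaque in this sketch */

/* Simplified stand-in for struct cpufreq_driver. */
struct driver {
        int (*init)(struct policy *p);
        int (*verify)(struct policy *p);
        int (*setpolicy)(struct policy *p);
        int (*target)(struct policy *p, unsigned int freq, unsigned int relation);
};

/* Plays the role of the global cpufreq_driver pointer. */
static struct driver *registered_driver;

static int register_driver(struct driver *drv)
{
        /* init and verify are mandatory; setpolicy or target must exist. */
        if (!drv || !drv->verify || !drv->init ||
            (!drv->setpolicy && !drv->target))
                return -EINVAL;

        /* Only one driver may be registered at a time. */
        if (registered_driver)
                return -EBUSY;

        registered_driver = drv;
        assert(registered_driver->init && registered_driver->verify);
        return 0;
}

/* Dummy callbacks used to build example drivers. */
static int demo_init(struct policy *p) { (void)p; return 0; }
static int demo_verify(struct policy *p) { (void)p; return 0; }
static int demo_target(struct policy *p, unsigned int freq, unsigned int relation)
{
        (void)p; (void)freq; (void)relation;
        return 0;
}
```

A driver that provides init and verify but neither setpolicy nor target is rejected with -EINVAL, and a second complete driver is rejected with -EBUSY once the first one has been stored.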

4. Set a frequency adjustment policy for each CPU

The subsys_interface_register call made during cpufreq_driver registration actually sets up a policy for each CPU. As mentioned in the previous section, this function eventually calls the cpufreq_add_dev callback. Now let's analyze this function:

Because subsys_interface_register enumerates every CPU device, cpufreq_add_dev is called whether the CPU is offline or online, so the function returns immediately at the start if the CPU is offline.

static int cpufreq_add_dev(struct device *dev, struct subsys_interface *sif)
{
        ......
        if (cpu_is_offline(cpu))
                return 0;

On an SMP system, a CPU may share a policy with other CPUs and delegate its management to another CPU, called the managing CPU. The following code checks for this case: if the CPU has already been delegated to another CPU, the function returns immediately. The core layer defines another per_cpu variable, cpufreq_cpu_data, which stores a pointer to the cpufreq_policy structure used by each CPU. The cpufreq_cpu_get function obtains the pointer through this per_cpu variable. If the pointer is not NULL, the CPU already has a policy (possibly one that was created earlier, when its managing CPU set up its own policy).

        policy = cpufreq_cpu_get(cpu);
        if (unlikely(policy)) {
                cpufreq_cpu_put(policy);
                return 0;
        }

cpufreq_add_dev is also called during CPU hot-plug. The following code checks whether the CPU was previously hot-unplugged. If so, it finds one of its related CPUs (related CPUs are those delegated to the same managing CPU) and calls cpufreq_add_policy_cpu. That function adds the CPU to the existing policy and creates a cpufreq link pointing to the cpufreq node of the managing CPU.

        for_each_online_cpu(sibling) {
                struct cpufreq_policy *cp = per_cpu(cpufreq_cpu_data, sibling);
                if (cp && cpumask_test_cpu(cpu, cp->related_cpus)) {
                        read_unlock_irqrestore(&cpufreq_driver_lock, flags);
                        return cpufreq_add_policy_cpu(cpu, sibling, dev);
                }
        }

When cpufreq_add_dev is called for the first time during system initialization (for the first CPU enumerated by subsys_interface_register, usually cpu0), the cpufreq_cpu_data entry for that CPU is still NULL, so a cpufreq_policy structure must be allocated for it. The code then prepares the CPU masks managed by this policy, namely the cpus field (online CPUs) and the related_cpus field (online + offline CPUs), sets this CPU as the managing CPU of the policy, initializes policy->governor with the default governor, and adds the CPU itself to the online cpus field:

        policy = kzalloc(sizeof(struct cpufreq_policy), GFP_KERNEL);
        if (!policy)
                goto nomem_out;

        if (!alloc_cpumask_var(&policy->cpus, GFP_KERNEL))
                goto err_free_policy;

        if (!zalloc_cpumask_var(&policy->related_cpus, GFP_KERNEL))
                goto err_free_cpumask;

        policy->cpu = cpu;
        policy->governor = CPUFREQ_DEFAULT_GOVERNOR;
        cpumask_copy(policy->cpus, cpumask_of(cpu));

        /* Initially set CPU itself as the policy_cpu */
        per_cpu(cpufreq_policy_cpu, cpu) = cpu;

Next, initialize a completion used to synchronize unregistration of the kobject, and a work item. In some situations the policy cannot be updated immediately; the work item lets the update be deferred.

        init_completion(&policy->kobj_unregister);
        INIT_WORK(&policy->update, handle_update);

Next, call the init callback of cpufreq_driver to further initialize the policy:

        ret = cpufreq_driver->init(policy);
        if (ret) {
                pr_debug("initialization failed\n");
                goto err_set_policy_cpu;
        }

The driver's init callback is expected to do the following:

  • Set the maximum and minimum operating frequency of the CPU.
  • Set the maximum and minimum operating frequency of the policy.
  • Set the frequency levels that this policy can select.
  • Set the latency of a CPU frequency change.
  • Set the CPUs that are managed by this policy (policy->cpus).

Continuing:

        /* related cpus should atleast have policy->cpus */
        cpumask_or(policy->related_cpus, policy->related_cpus, policy->cpus);

The comment says it clearly: the CPUs in policy->cpus are added to the related_cpus field, which represents online + offline CPUs. Next, the offline CPUs are removed from policy->cpus:

        cpumask_and(policy->cpus, policy->cpus, cpu_online_mask);
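The effect of the two mask operations above can be checked with plain bitmasks. This is a minimal sketch assuming at most 32 CPUs, with a uint32_t standing in for the kernel's cpumask type; struct policy_masks and fixup_masks are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Bit n set means CPU n is a member of the mask. */
struct policy_masks {
        uint32_t cpus;          /* CPUs sharing the policy, later online only */
        uint32_t related_cpus;  /* online + offline CPUs sharing the policy */
};

static void fixup_masks(struct policy_masks *p, uint32_t online_mask)
{
        /* Counterpart of cpumask_or(): related_cpus must contain cpus. */
        p->related_cpus |= p->cpus;

        /* Counterpart of cpumask_and(): drop offline CPUs from cpus. */
        p->cpus &= online_mask;

        /* Every remaining CPU in cpus is also a related CPU. */
        assert((p->cpus & ~p->related_cpus) == 0);
}
```

For example, if the driver reports that CPUs 0 to 3 share a policy (cpus = 0xF) while only CPUs 0 and 1 are online (online_mask = 0x3), related_cpus becomes 0xF and cpus becomes 0x3.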

Then, the CPUFREQ_START notification is sent:

        blocking_notifier_call_chain(&cpufreq_policy_notifier_list,
                                     CPUFREQ_START, policy);

If the CPU was added by hot-plug, find the governor it used last time:

#ifdef CONFIG_HOTPLUG_CPU
        gov = __find_governor(per_cpu(cpufreq_cpu_governor, cpu));
        if (gov) {
                policy->governor = gov;
                pr_debug("Restoring governor %s for cpu %d\n",
                       policy->governor->name, cpu);
        }
#endif

Finally, the cpufreq sysfs node is created under the CPU device. Its full path is /sys/devices/system/cpu/cpux/cpufreq, and the attribute files below it are created as well. For details about these nodes, refer to the first article in this series (Linux Dynamic Frequency adjustment system cpufreq 1: Overview):

        ret = cpufreq_add_dev_interface(cpu, policy, dev);

At this point, a CPU's policy has been set up: its frequency limits, the governor it uses, and its sysfs file nodes are all in place. Note that cpufreq_add_dev is called once for every CPU in the system, so in the end each CPU has a policy. It is also possible that only some CPUs create real policies while the other CPUs delegate policy management to them. This part of the code can be hard to follow at first. To clarify the relationship, let's look at the cpufreq_add_dev_interface function:

static int cpufreq_add_dev_interface(unsigned int cpu,
                                     struct cpufreq_policy *policy,
                                     struct device *dev)
{
        ......
        /* prepare interface data */
        ret = kobject_init_and_add(&policy->kobj, &ktype_cpufreq,
                                   &dev->kobj, "cpufreq");
        ......
        /* set up files for this cpu device */
        drv_attr = cpufreq_driver->attr;
        while ((drv_attr) && (*drv_attr)) {
                ret = sysfs_create_file(&policy->kobj, &((*drv_attr)->attr));
                if (ret)
                        goto err_out_kobj_put;
                drv_attr++;
        }

At the beginning of the function, create a cpufreq file node, and then create a series of nodes under it. You can control some parameters of the policy through these file nodes. This is not our focus. Let's look at the following code:

        for_each_cpu(j, policy->cpus) {
                per_cpu(cpufreq_cpu_data, j) = policy;
                per_cpu(cpufreq_policy_cpu, j) = policy->cpu;
        }

The earlier code has already set the online CPUs managed by this policy: policy->cpus. Through the two per_cpu variables, the policy of every online CPU in that mask is set to the policy of the current (managing) CPU, and the managing CPU of each of them is set to the current CPU. Then cpufreq_add_dev_symlink is called, so that every CPU in policy->cpus gets a cpufreq link pointing to the real cpufreq node of the current (managing) CPU:

        ret = cpufreq_add_dev_symlink(cpu, policy);
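How the loop over policy->cpus lets several CPUs end up sharing one policy can be illustrated with a small user-space sketch. All names here are hypothetical stand-ins: cpu_data and policy_cpu play the roles of the two per_cpu variables, add_dev stands in for cpufreq_add_dev, and the shared_mask argument replaces what the driver's init callback would report. Hot-plug, sysfs and locking are left out, and the allocated policies are never freed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define NR_CPUS 4

struct policy {
        int cpu;        /* managing CPU */
        uint32_t cpus;  /* bitmask of CPUs sharing this policy */
};

static struct policy *cpu_data[NR_CPUS];                 /* role of cpufreq_cpu_data */
static int policy_cpu[NR_CPUS] = { -1, -1, -1, -1 };     /* role of cpufreq_policy_cpu */
static int policies_created;

/* shared_mask: CPUs that share a clock with cpu (assumed driver input). */
static int add_dev(int cpu, uint32_t shared_mask)
{
        struct policy *p;
        int j;

        assert(cpu >= 0 && cpu < NR_CPUS);

        /* Already covered by a policy created for an earlier CPU. */
        if (cpu_data[cpu] != NULL)
                return 0;

        p = calloc(1, sizeof(*p));
        if (p == NULL)
                return -ENOMEM;

        p->cpu = cpu;
        p->cpus = shared_mask | (1u << cpu);
        policies_created++;

        /* Point every CPU in the mask at this policy and its manager. */
        for (j = 0; j < NR_CPUS; j++) {
                if (p->cpus & (1u << j)) {
                        cpu_data[j] = p;
                        policy_cpu[j] = p->cpu;
                }
        }
        return 0;
}
```

With CPUs 0 and 1 in one clock domain and CPUs 2 and 3 in another, calling add_dev for all four CPUs in order creates only two policies: the calls for CPU 1 and CPU 3 return early because their cpu_data entries were already filled in by CPU 0 and CPU 2.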

Note: suppose the current CPU is cpu0. cpufreq_add_dev has not yet been called for the other CPUs, but their entries in cpufreq_cpu_data already point to cpu0's policy. Going back to the beginning of cpufreq_add_dev: when the CPUs that delegate their policy to cpu0 later enter cpufreq_add_dev, the check at the start of the function finds that their cpufreq_cpu_data entry is already set and returns immediately, so these CPUs do not go through the whole process. Next, let's continue with the code of cpufreq_add_dev_interface:

        memcpy(&new_policy, policy, sizeof(struct cpufreq_policy));
        /* assure that the starting sequence is run in __cpufreq_set_policy */
        policy->governor = NULL;

        /* set default policy */
        ret = __cpufreq_set_policy(policy, &new_policy);
        policy->user_policy.policy = policy->policy;
        policy->user_policy.governor = policy->governor;

The __cpufreq_set_policy function makes the policy take effect. At this point, the policy for each CPU has been set up and is working. The code of __cpufreq_set_policy is not expanded here; only its sequence diagram is shown:

Figure 4.1 set a cpufreq_policy

5. Other APIs

In addition to the Governor registration and cpufreq_driver APIs discussed in the previous sections, the core layer of cpufreq also provides other auxiliary APIs to facilitate the use of other modules.

  • int cpufreq_register_notifier(struct notifier_block *nb, unsigned int list);
  • int cpufreq_unregister_notifier(struct notifier_block *nb, unsigned int list);

The above two APIs register and unregister a notifier for cpufreq events. The second parameter selects the notification type, which is one of the following:

  • CPUFREQ_TRANSITION_NOTIFIER: receive frequency change notifications.
  • CPUFREQ_POLICY_NOTIFIER: receive policy update notifications.

  • int cpufreq_driver_target(struct cpufreq_policy *policy,
    unsigned int target_freq,
    unsigned int relation);
  • int __cpufreq_driver_target(struct cpufreq_policy *policy,
    unsigned int target_freq,
    unsigned int relation);

The preceding two APIs set the CPU operating frequency. The difference is that cpufreq_driver_target takes the lock itself, while __cpufreq_driver_target does not. If the caller is known to be running in governor context, use the version without the lock; otherwise use the locked version.

  • void cpufreq_verify_within_limits(struct cpufreq_policy *policy, unsigned int min, unsigned int max);

This API checks the policy's maximum and minimum frequency and resets them so that they fall within the given limits.

  • int cpufreq_update_policy(unsigned int cpu);

This API triggers a policy update for the given CPU in the cpufreq core.
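The clamping performed by cpufreq_verify_within_limits can be illustrated with a user-space sketch. struct policy_limits and verify_within_limits are hypothetical stand-ins that keep only the min and max fields of a policy; the sketch assumes the caller passes min <= max and is modeled on the behaviour described above rather than copied from the kernel.

```c
#include <assert.h>

/* Only the two fields of a policy that matter for the limit check. */
struct policy_limits {
        unsigned int min;       /* kHz */
        unsigned int max;       /* kHz */
};

/* Clamp the policy's min and max into the range [min, max]. */
static void verify_within_limits(struct policy_limits *p,
                                 unsigned int min, unsigned int max)
{
        assert(min <= max);

        if (p->min < min)
                p->min = min;
        if (p->max < min)
                p->max = min;
        if (p->min > max)
                p->min = max;
        if (p->max > max)
                p->max = max;
        if (p->min > p->max)
                p->min = p->max;
}
```

For instance, a policy asking for 100000 to 2000000 kHz on hardware limited to 300000 to 1200000 kHz is reset to exactly that hardware range, while a policy already inside the range is left unchanged.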
