Contents:
1. Power Management Framework
    1.1 Power State Management
    1.2 Power Saving Management
    1.3 PM Quality of Service
2. Sleep and Hibernation
    2.1 Freezing Processes
    2.2 Sleep Path
    2.3 Hibernation Path
    2.4 Autosleep
3. Shutdown and Reboot
    3.1 User-Space Handling
    3.2 Kernel Handling
4. CPU Dynamic Frequency Scaling
    4.1 CPUFreq Core
    4.2 Governors
    4.3 Drivers
5. CPU Idle
    5.1 CPUIdle Core
    5.2 Governors
    5.3 Drivers
6. PM Quality of Service
    6.1 System-Wide Constraints
    6.2 Device-Level Constraints
7. Summary
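1. Power Management Framework
Computers run in the physical world, and every activity in the physical world consumes energy. Energy takes many forms, such as heat, nuclear energy, and chemical energy. A computer consumes electrical energy, supplied by a battery or an external power source. Inside the computer is a component called the power management IC (PMIC), which takes the incoming power and converts it into supplies at different voltages for the various hardware components. The voltage each component needs is fixed by the relevant electrical standards, and hardware vendors simply build their hardware to those standards. Power-on is carried out by the hardware itself without any software involvement, since software cannot run until the hardware is powered. Once the hardware is running, however, software can manage its power state. Power management covers two areas: power state management and power saving management. Power state management controls the power state of the system as a whole, through operations such as sleep, hibernation, shutdown, and reboot. Power saving management exists because electricity is not free and should be conserved where possible. This matters most on handheld devices: electricity is not expensive, but battery capacity is very limited, so every bit counts. Power saving cannot be pursued blindly, though. Performance has to be considered too, and the goal is a balance between performance and power consumption.
1.1 Power State Management
A computer can only be used once it is powered on, but we do not use it all the time. When we stop using it for a short while, we can put it into sleep or hibernation, which saves power and still lets us return quickly to a usable state. When we will not use it for a long time, we can shut it down, which saves even more power, although we then have to boot it again before the next use. And when the system feels sluggish or is in a bad state, we can reboot it to bring it back to a clean, stable state.
Sleep is also called Suspend to RAM (STR). Hibernation is also called Suspend to Disk (STD). Sometimes "suspend" refers to sleep alone, and sometimes it refers to sleep and hibernation together. When the system sleeps, its state is kept in memory, so memory must stay powered while the other devices can be powered off. When the system hibernates, its state is saved to disk, so the whole system can be powered off, just as in a shutdown. The system can be woken from either state. From sleep, many peripherals can wake the system, the keyboard for example. From hibernation, generally only the power button can. Hibernation thus resembles sleep in that the system state is preserved, and resembles shutdown in that the whole system loses power.
Reboot and shutdown are closely related: a reboot is essentially a shutdown followed by a power-on. Both are implemented by the reboot system call, whose cmd argument selects between shutting down and rebooting. Shutdown and reboot are handled by the init process: whether we use a command, the desktop's shutdown button, or the power key, the event is eventually delivered to init. On receiving the request, init saves what needs saving, stops all service processes, kills all ordinary processes, and finally calls the reboot system call to shut down or restart.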
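1.2 Power Saving Management
When we are not using the computer we can save power by sleeping, hibernating, or shutting down, but there are also many ways to save power while the computer is in use. These fall into two classes: in-use saving and idle saving. Idle saving applies when the machine as a whole is in use but individual components are temporarily not. In-use saving means dynamic frequency scaling, which includes CPU frequency scaling (CPUFreq) and device frequency scaling (DevFreq). If a component is actively in use and we still want to save power, lowering its frequency is the only option. A lower frequency means lower performance, so the frequency has to be adjusted dynamically according to the current load.
Idle saving has more options: CPU idle (CPUIdle), CPU hotplug, CPU isolation, and runtime PM. With CPUIdle, when a CPU has no process to run, parts of it can be temporarily powered down, and power is restored when a process needs to run again. With CPU hotplug, a CPU can be hot-removed, after which the system no longer assigns work to it and it can be powered off completely; when it is hot-added again, power is restored and it resumes normal work. With CPU isolation, a CPU is set apart so that the system no longer treats it as a scheduling target, which lets it stay idle for long periods. CPU isolation is not a dedicated power-saving mechanism, though: after isolating a CPU we can still move processes onto it explicitly by setting their CPU affinity, and it will keep running. Its effect lies somewhere between CPUIdle and CPU hotplug. Runtime PM is dynamic power management for devices. A system contains many devices, but not all of them are in use all the time. The camera, for example, is idle most of the time, so it can be kept powered off and powered on only when it is needed.
1.3 PM Quality of Service
Power saving achieves its goal, but it can also degrade performance, including response latency, bandwidth, and throughput. The kernel therefore provides the PM QoS framework, where QoS stands for Quality of Service. On one side, PM QoS offers interfaces through which its customers state their performance requirements. On the other side, it passes those requirements down to the power-saving mechanisms, which must meet them while saving power. The customers are the kernel and user processes: kernel code calls the PM QoS functions directly, while processes read and write device files that PM QoS exposes to user space. A requirement on one performance metric is called a constraint. Constraints are either system-wide, covering the performance of the whole system, or device-level, covering the performance of a single device.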
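As a rough illustration of how several customers' requests can be combined into one constraint, the sketch below keeps a list of latency requests and reports the strictest one. The names `qos_list`, `qos_add`, and `qos_effective`, the fixed capacity, and the "no constraint" sentinel are all hypothetical; this is not the kernel's PM QoS API.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define QOS_MAX_REQUESTS  8
#define QOS_NO_CONSTRAINT INT_MAX	/* hypothetical "don't care" value */

/* Hypothetical container for latency requests, in microseconds. */
struct qos_list {
	int latency_us[QOS_MAX_REQUESTS];
	size_t count;
};

/* Record one customer's request; returns 0 on success, -1 if full. */
int qos_add(struct qos_list *q, int latency_us)
{
	assert(q != NULL);
	if (q->count >= QOS_MAX_REQUESTS)
		return -1;
	q->latency_us[q->count++] = latency_us;
	return 0;
}

/* The effective constraint is the smallest latency any customer tolerates. */
int qos_effective(const struct qos_list *q)
{
	int best = QOS_NO_CONSTRAINT;

	assert(q != NULL);
	for (size_t i = 0; i < q->count; i++)
		if (q->latency_us[i] < best)
			best = q->latency_us[i];
	return best;
}
```

A power-saving mechanism would then consult the effective value before choosing a state whose wake-up latency exceeds it.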
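The figure below summarizes power management:
2. Sleep and Hibernation
Sleep and hibernation follow a similar overall procedure: pause the system, save the system state, and cut power to all or most of the hardware. Wake-up is the reverse: power is restored, the system resumes running, the saved state is restored, and the machine can be used normally again. Pausing the system consists of syncing file data to disk, freezing almost all processes, pausing devfreq and cpufreq, suspending all devices (calling each device's suspend function), disabling the interrupts of most peripherals, and taking every CPU except the current one offline. For sleep, memory stays powered, so nothing needs to be saved. For hibernation the whole system loses power, so the key system state is written to swap. The system can then power down into sleep or hibernation. Many peripherals can wake the system from sleep, while generally only the power key can wake it from hibernation. Once woken, the system starts restoring itself, and the two cases differ. Resume from sleep is essentially the reverse of the steps above. Resume from hibernation is a normal boot that, near its end, restores the saved state from the swap area.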
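The ordering described above, with resume from sleep undoing the suspend steps in reverse, can be sketched in user space as follows. The step names paraphrase the list in the text; `suspend_cycle` is a hypothetical helper that only records the order and touches no hardware. Treating the sync step as having nothing to undo is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Suspend-side steps in the order given in the text. */
static const char *const suspend_steps[] = {
	"sync filesystems",
	"freeze processes",
	"pause devfreq and cpufreq",
	"suspend devices",
	"disable device interrupts",
	"offline secondary CPUs",
};
#define NSTEPS (sizeof(suspend_steps) / sizeof(suspend_steps[0]))

/*
 * Write the suspend order followed by the mirrored resume order into out.
 * The sync step has nothing to undo, so the resume half stops before it.
 * Returns the number of entries written, or 0 if out is too small.
 */
size_t suspend_cycle(const char **out, size_t cap)
{
	size_t n = 0;

	assert(out != NULL);
	if (cap < 2 * NSTEPS - 1)
		return 0;
	for (size_t i = 0; i < NSTEPS; i++)
		out[n++] = suspend_steps[i];
	for (size_t i = NSTEPS; i-- > 1; )
		out[n++] = suspend_steps[i];
	return n;
}
```

The last step taken on the way down is the first one undone on the way up, which is the same shape the kernel excerpts later in this section follow.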
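2.1 Freezing Processes
Both sleep and hibernation freeze processes, so we look at that first. User processes are frozen first, then kernel threads. Certain special processes are not frozen, and neither is the current process. The freezer first sets the global variable pm_freezing to true and then sends every process a fake signal, which simply wakes it up. A woken process runs, and just before it returns to user space it goes through signal handling. That path begins with a freeze check: if pm_freezing is true and the process is not exempt from freezing, the process is frozen. Freezing itself is simple: the process sets its own state to non-runnable and lets the scheduler pick another process.
The freeze path is shown below. The code is heavily abridged and keeps only the key parts.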
linux-src/kernel/power/process.c

int freeze_processes(void)
{
	pm_freezing = true;
	try_to_freeze_tasks(true);
}

static int try_to_freeze_tasks(bool user_only)
{
	for_each_process_thread(g, p) {
		freeze_task(p);
	}
}

linux-src/kernel/freezer.c

bool freeze_task(struct task_struct *p)
{
	fake_signal_wake_up(p);
}

static void fake_signal_wake_up(struct task_struct *p)
{
	unsigned long flags;

	if (lock_task_sighand(p, &flags)) {
		signal_wake_up(p, 0);
		unlock_task_sighand(p, &flags);
	}
}

linux-src/arch/x86/kernel/signal.c

void arch_do_signal_or_restart(struct pt_regs *regs, bool has_signal)
{
	struct ksignal ksig;

	if (has_signal && get_signal(&ksig)) {
		/* Whee! Actually deliver the signal. */
		handle_signal(&ksig, regs);
		return;
	}
}

linux-src/kernel/signal.c

bool get_signal(struct ksignal *ksig)
{
	try_to_freeze();
}

linux-src/include/linux/freezer.h

static inline bool try_to_freeze(void)
{
	return try_to_freeze_unsafe();
}

static inline bool try_to_freeze_unsafe(void)
{
	if (likely(!freezing(current)))
		return false;
	return __refrigerator(false);
}

static inline bool freezing(struct task_struct *p)
{
	if (likely(!atomic_read(&system_freezing_cnt)))
		return false;
	return freezing_slow_path(p);
}

linux-src/kernel/freezer.c

bool freezing_slow_path(struct task_struct *p)
{
	if (p->flags & (PF_NOFREEZE | PF_SUSPEND_TASK))
		return false;

	if (test_tsk_thread_flag(p, TIF_MEMDIE))
		return false;

	if (pm_nosig_freezing || cgroup_freezing(p))
		return true;

	if (pm_freezing && !(p->flags & PF_KTHREAD))
		return true;

	return false;
}

bool __refrigerator(bool check_kthr_stop)
{
	unsigned int save = get_current_state();
	bool was_frozen = false;

	for (;;) {
		set_current_state(TASK_UNINTERRUPTIBLE);
		/* The check that leaves the loop once the task is thawed is omitted in this excerpt. */
		was_frozen = true;
		schedule();
	}
	set_current_state(save);
	return was_frozen;
}
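Freezing is not carried out in one straight-line path. It has two phases: the freezer sends the freeze signal to wake every process, and then each process freezes itself the next time it runs.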
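The two phases can be modeled in a few lines of ordinary C. This is a toy model for illustration only: `sim_task`, `sim_freeze_request`, `sim_try_to_freeze`, and the `SIM_*` flags are hypothetical stand-ins, and the model covers only the first pass in which user tasks are frozen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_NOFREEZE 0x1u	/* stand-in for PF_NOFREEZE */
#define SIM_KTHREAD  0x2u	/* stand-in for PF_KTHREAD */

struct sim_task {
	unsigned int flags;
	bool woken;	/* set by the "fake signal" */
	bool frozen;	/* set by the task itself */
};

static bool sim_pm_freezing;

/* Phase 1: the freezer raises the global flag and wakes every task. */
void sim_freeze_request(struct sim_task *tasks, size_t n)
{
	assert(tasks != NULL);
	sim_pm_freezing = true;
	for (size_t i = 0; i < n; i++)
		tasks[i].woken = true;
}

/* Phase 2: run by each task itself on its way back to user space. */
bool sim_try_to_freeze(struct sim_task *t)
{
	assert(t != NULL);
	if (!sim_pm_freezing || !t->woken)
		return false;
	if (t->flags & (SIM_NOFREEZE | SIM_KTHREAD))
		return false;	/* exempt, or not a user task in this pass */
	t->frozen = true;
	return true;
}
```

The point of the split is that the freezer never stops a task from the outside; it only sets a flag and a wake-up, and the task parks itself at a safe point.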
2.2 Sleep Path
The code for the sleep path is as follows:
linux-src/kernel/power/suspend.c

int pm_suspend(suspend_state_t state)
{
	int error;

	if (state <= PM_SUSPEND_ON || state >= PM_SUSPEND_MAX)
		return -EINVAL;

	pr_info("suspend entry (%s)\n", mem_sleep_labels[state]);
	error = enter_state(state);
	if (error) {
		suspend_stats.fail++;
		dpm_save_failed_errno(error);
	} else {
		suspend_stats.success++;
	}
	pr_info("suspend exit\n");
	return error;
}

static int enter_state(suspend_state_t state)
{
	int error;

	if (sync_on_suspend_enabled) {
		ksys_sync_helper();
	}

	error = suspend_prepare(state);
	error = suspend_devices_and_enter(state);
	return error;
}

static int suspend_prepare(suspend_state_t state)
{
	int error;

	trace_suspend_resume(TPS("freeze_processes"), 0, true);
	error = suspend_freeze_processes();
	trace_suspend_resume(TPS("freeze_processes"), 0, false);
	return error;
}

int suspend_devices_and_enter(suspend_state_t state)
{
	int error;

	error = platform_suspend_begin(state);
	suspend_console();
	suspend_test_start();
	error = dpm_suspend_start(PMSG_SUSPEND);

	do {
		error = suspend_enter(state, &wakeup);
	} while (!error && !wakeup && platform_suspend_again(state));

 Resume_devices:
	dpm_resume_end(PMSG_RESUME);
	suspend_test_finish("resume devices");
	resume_console();

 Close:
	platform_resume_end(state);
	pm_suspend_target_state = PM_SUSPEND_ON;
	return error;
}

linux-src/drivers/base/power/main.c

int dpm_suspend_start(pm_message_t state)
{
	ktime_t starttime = ktime_get();
	int error;

	error = dpm_prepare(state);
	if (error) {
		suspend_stats.failed_prepare++;
		dpm_save_failed_step(SUSPEND_PREPARE);
	} else
		error = dpm_suspend(state);
	dpm_show_time(starttime, state, error, "start");
	return error;
}

int dpm_suspend(pm_message_t state)
{
	int error = 0;

	devfreq_suspend();
	cpufreq_suspend();

	while (!list_empty(&dpm_prepared_list)) {
		struct device *dev = to_device(dpm_prepared_list.prev);

		get_device(dev);
		error = device_suspend(dev);
	}
	return error;
}

linux-src/kernel/power/suspend.c

static int suspend_enter(suspend_state_t state, bool *wakeup)
{
	int error;

	error = platform_suspend_prepare(state);
	error = dpm_suspend_late(PMSG_SUSPEND);
	error = platform_suspend_prepare_late(state);
	error = dpm_suspend_noirq(PMSG_SUSPEND);
	error = platform_suspend_prepare_noirq(state);
	error = suspend_disable_secondary_cpus();

	arch_suspend_disable_irqs();
	BUG_ON(!irqs_disabled());

	system_state = SYSTEM_SUSPEND;

	error = syscore_suspend();
	if (!error) {
		*wakeup = pm_wakeup_pending();
		if (!(suspend_test(TEST_CORE) || *wakeup)) {
			error = suspend_ops->enter(state);
		} else if (*wakeup) {
			error = -EBUSY;
		}
		syscore_resume();
	}

	system_state = SYSTEM_RUNNING;

	arch_suspend_enable_irqs();
	BUG_ON(irqs_disabled());

 Enable_cpus:
	suspend_enable_secondary_cpus();

 Platform_wake:
	platform_resume_noirq(state);
	dpm_resume_noirq(PMSG_RESUME);

 Platform_early_resume:
	platform_resume_early(state);

 Devices_early_resume:
	dpm_resume_early(PMSG_RESUME);

 Platform_finish:
	platform_resume_finish(state);
	return error;
}
2.3 Hibernation Path
The code for the hibernation path is as follows:
linux-src/kernel/power/hibernate.c

int hibernate(void)
{
	int error;

	lock_system_sleep();
	pm_prepare_console();
	ksys_sync_helper();
	error = freeze_processes();
	lock_device_hotplug();
	error = create_basic_memory_bitmaps();
	error = hibernation_snapshot(hibernation_mode == HIBERNATION_PLATFORM);

	if (in_suspend) {
		pm_pr_dbg("Writing hibernation image.\n");
		error = swsusp_write(flags);
		swsusp_free();
		if (!error) {
			power_down();
		}
	}
	return error;
}

linux-src/kernel/power/snapshot.c

int create_basic_memory_bitmaps(void)
{
	struct memory_bitmap *bm1, *bm2;
	int error = 0;

	bm1 = kzalloc(sizeof(struct memory_bitmap), GFP_KERNEL);
	error = memory_bm_create(bm1, GFP_KERNEL, PG_ANY);
	bm2 = kzalloc(sizeof(struct memory_bitmap), GFP_KERNEL);
	error = memory_bm_create(bm2, GFP_KERNEL, PG_ANY);

	forbidden_pages_map = bm1;
	free_pages_map = bm2;
	mark_nosave_pages(forbidden_pages_map);
	return 0;
}

linux-src/kernel/power/hibernate.c

int hibernation_snapshot(int platform_mode)
{
	int error;

	error = platform_begin(platform_mode);
	error = hibernate_preallocate_memory();
	error = freeze_kernel_threads();
	error = dpm_prepare(PMSG_FREEZE);
	suspend_console();
	pm_restrict_gfp_mask();
	error = dpm_suspend(PMSG_FREEZE);
	error = create_image(platform_mode);

	msg = in_suspend ? (error ? PMSG_RECOVER : PMSG_THAW) : PMSG_RESTORE;
	dpm_resume(msg);
	resume_console();
	dpm_complete(msg);

 Close:
	platform_end(platform_mode);
	return error;
}

static void power_down(void)
{
	switch (hibernation_mode) {
	case HIBERNATION_REBOOT:
		kernel_restart(NULL);
		break;
	case HIBERNATION_PLATFORM:
		hibernation_platform_enter();
		fallthrough;
	case HIBERNATION_SHUTDOWN:
		if (pm_power_off)
			kernel_power_off();
		break;
	}
	kernel_halt();
	/*
	 * Valid image is on the disk, if we continue we risk serious data
	 * corruption after resume.
	 */
	pr_crit("Power down manually\n");
	while (1)
		cpu_relax();
}
That was the hibernation path. Next is resume from hibernation: the system boots normally and then loads the previously saved data from the swap partition.
linux-src/kernel/power/hibernate.c

late_initcall_sync(software_resume);

static int software_resume(void)
{
	int error;

	if (swsusp_resume_device)
		goto Check_image;

	if (resume_delay) {
		pr_info("Waiting %dsec before reading resume device ...\n",
			resume_delay);
		ssleep(resume_delay);
	}

	/* Check if the device is there */
	swsusp_resume_device = name_to_dev_t(resume_file);
	if (!swsusp_resume_device) {
		wait_for_device_probe();

		if (resume_wait) {
			while ((swsusp_resume_device = name_to_dev_t(resume_file)) == 0)
				msleep(10);
			async_synchronize_full();
		}

		swsusp_resume_device = name_to_dev_t(resume_file);
		if (!swsusp_resume_device) {
			error = -ENODEV;
			goto Unlock;
		}
	}

 Check_image:
	pm_pr_dbg("Hibernation image partition %d:%d present\n",
		MAJOR(swsusp_resume_device), MINOR(swsusp_resume_device));

	pm_pr_dbg("Looking for hibernation image.\n");
	error = swsusp_check();
	if (error)
		goto Unlock;

	/* The snapshot device should not be opened while we're running */
	if (!hibernate_acquire()) {
		error = -EBUSY;
		swsusp_close(FMODE_READ | FMODE_EXCL);
		goto Unlock;
	}

	error = freeze_processes();
	error = freeze_kernel_threads();
	error = load_image_and_restore();
	thaw_processes();
 Finish:
	pm_notifier_call_chain(PM_POST_RESTORE);
 Restore:
	pm_restore_console();
	pr_info("resume failed (%d)\n", error);
	hibernate_release();
	/* For success case, the suspend path will release the lock */
 Unlock:
	mutex_unlock(&system_transition_mutex);
	pm_pr_dbg("Hibernation image not present or could not be loaded.\n");
	return error;
 Close_Finish:
	swsusp_close(FMODE_READ | FMODE_EXCL);
	goto Finish;
}

static int load_image_and_restore(void)
{
	int error;

	lock_device_hotplug();
	error = create_basic_memory_bitmaps();
	error = swsusp_read(&flags);
	swsusp_close(FMODE_READ | FMODE_EXCL);
	error = hibernation_restore(flags & SF_PLATFORM_MODE);
	swsusp_free();
	free_basic_memory_bitmaps();
 Unlock:
	unlock_device_hotplug();
	return error;
}

int hibernation_restore(int platform_mode)
{
	int error;

	pm_prepare_console();
	suspend_console();
	pm_restrict_gfp_mask();
	error = dpm_suspend_start(PMSG_QUIESCE);
	if (!error) {
		error = resume_target_kernel(platform_mode);
		/*
		 * The above should either succeed and jump to the new kernel,
		 * or return with an error. Otherwise things are just
		 * undefined, so let's be paranoid.
		 */
		BUG_ON(!error);
	}
	dpm_resume_end(PMSG_RECOVER);
	pm_restore_gfp_mask();
	resume_console();
	pm_restore_console();
	return error;
}
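2.4 Autosleep
With the spread of smartphones, battery life became a serious problem. Earlier phones lasted three to five days, or even more than seven, on one charge, whereas a smartphone lasts a day or half a day. Battery technology has seen no major breakthrough, so the problem had to be tackled in software. Android's answer is opportunistic sleep: for a phone, sleeping is the normal state and running is the exception, which matches how phones are used, since they sit unused for most of the day. Android added a wakelock module to the kernel. By default the kernel always tries to sleep unless a wakelock prevents it. Any user-space component can add a wakelock to signal that it needs to run and that the system must not sleep. Once user space has removed all of its wakelocks, the kernel goes to sleep. Wakelocks drew strong criticism from many core kernel maintainers, and the wakelock source was never merged into the mainline kernel. The kernel later reimplemented the wakelock logic under the name autosleep.
The kernel's autosleep code is as follows: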
linux-src/kernel/power/autosleep.c

int __init pm_autosleep_init(void)
{
	autosleep_ws = wakeup_source_register(NULL, "autosleep");
	if (!autosleep_ws)
		return -ENOMEM;

	autosleep_wq = alloc_ordered_workqueue("autosleep", 0);
	if (autosleep_wq)
		return 0;

	wakeup_source_unregister(autosleep_ws);
	return -ENOMEM;
}

static void try_to_suspend(struct work_struct *work)
{
	unsigned int initial_count, final_count;

	if (!pm_get_wakeup_count(&initial_count, true))
		goto out;

	mutex_lock(&autosleep_lock);

	if (!pm_save_wakeup_count(initial_count) ||
		system_state != SYSTEM_RUNNING) {
		mutex_unlock(&autosleep_lock);
		goto out;
	}

	if (autosleep_state == PM_SUSPEND_ON) {
		mutex_unlock(&autosleep_lock);
		return;
	}
	if (autosleep_state >= PM_SUSPEND_MAX)
		hibernate();
	else
		pm_suspend(autosleep_state);

	mutex_unlock(&autosleep_lock);

	if (!pm_get_wakeup_count(&final_count, false))
		goto out;

	/*
	 * If the wakeup occurred for an unknown reason, wait to prevent the
	 * system from trying to suspend and waking up in a tight loop.
	 */
	if (final_count == initial_count)
		schedule_timeout_uninterruptible(HZ / 2);

 out:
	queue_up_suspend_work();
}
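The "sleep unless something holds a wakelock" rule can be captured with a simple counter. The model below is a deliberately simplified sketch: `sim_autosleep` and its helpers are hypothetical names, and the real implementation works with wakeup sources and a workqueue rather than a bare integer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of opportunistic sleep. */
struct sim_autosleep {
	int held;	/* wakelocks currently held */
	bool enabled;	/* autosleep switched on */
	int suspends;	/* how many times the model "went to sleep" */
};

/* Sleep only if autosleep is on and nobody holds a wakelock. */
bool sim_try_to_suspend(struct sim_autosleep *s)
{
	assert(s != NULL);
	if (!s->enabled || s->held > 0)
		return false;
	s->suspends++;
	return true;
}

void sim_wakelock_acquire(struct sim_autosleep *s)
{
	assert(s != NULL);
	s->held++;
}

/* Dropping the last wakelock gives the system a chance to sleep. */
void sim_wakelock_release(struct sim_autosleep *s)
{
	assert(s != NULL && s->held > 0);
	s->held--;
	sim_try_to_suspend(s);
}
```

With two holders, the first release changes nothing and the second one triggers the suspend attempt.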
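3. Shutdown and Reboot
Shutdown and reboot are the operations we perform most often on a computer. A reboot is a kind of shutdown, one that is followed by a power-on, so the two are covered together, and their code is in fact implemented together. From here on, "shutdown" refers to both. Shutdown has two parts: user-space handling and kernel handling. For a normal shutdown we cannot simply pull the plug, and we cannot have the kernel power off directly either, because user space is running a large number of processes that need to be dealt with properly. Since init is the ancestor of all user-space processes, it is the natural place to handle the shutdown request. Whether you shut down from the command line, from a button in the graphical interface, or by holding the power key, the request ends up with init. Init first stops the service processes, then kills the remaining user-space processes, and finally uses the reboot system call to ask the kernel to complete the shutdown.
3.1 User-Space Handling
When we run the reboot command or shut down from the graphical interface, the request is eventually passed to init. Init first stops the service processes (daemons), then sends SIGTERM to all other processes to give them a chance to exit gracefully and sleeps for a while (typically 3 s) to wait for them. It then sends SIGKILL to any processes that still have not exited, forcing them to quit. Finally, init calls sync to flush in-memory file data to disk and asks the kernel to shut down through the reboot system call.
3.2 Kernel Handling
Here is the kernel's implementation of the reboot system call: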
linux-src/kernel/reboot.c

SYSCALL_DEFINE4(reboot, int, magic1, int, magic2, unsigned int, cmd,
		void __user *, arg)
{
	struct pid_namespace *pid_ns = task_active_pid_ns(current);
	char buffer[256];
	int ret = 0;

	/* We only trust the superuser with rebooting the system. */
	if (!ns_capable(pid_ns->user_ns, CAP_SYS_BOOT))
		return -EPERM;

	/* For safety, we require "magic" arguments. */
	if (magic1 != LINUX_REBOOT_MAGIC1 ||
			(magic2 != LINUX_REBOOT_MAGIC2 &&
			magic2 != LINUX_REBOOT_MAGIC2A &&
			magic2 != LINUX_REBOOT_MAGIC2B &&
			magic2 != LINUX_REBOOT_MAGIC2C))
		return -EINVAL;

	/*
	 * If pid namespaces are enabled and the current task is in a child
	 * pid_namespace, the command is handled by reboot_pid_ns() which will
	 * call do_exit().
	 */
	ret = reboot_pid_ns(pid_ns, cmd);
	if (ret)
		return ret;

	/* Instead of trying to make the power_off code look like
	 * halt when pm_power_off is not set do it the easy way.
	 */
	if ((cmd == LINUX_REBOOT_CMD_POWER_OFF) && !pm_power_off)
		cmd = LINUX_REBOOT_CMD_HALT;

	mutex_lock(&system_transition_mutex);
	switch (cmd) {
	case LINUX_REBOOT_CMD_RESTART:
		kernel_restart(NULL);
		break;

	case LINUX_REBOOT_CMD_CAD_ON:
		C_A_D = 1;
		break;

	case LINUX_REBOOT_CMD_CAD_OFF:
		C_A_D = 0;
		break;

	case LINUX_REBOOT_CMD_HALT:
		kernel_halt();
		do_exit(0);
		panic("cannot halt");

	case LINUX_REBOOT_CMD_POWER_OFF:
		kernel_power_off();
		do_exit(0);
		break;

	case LINUX_REBOOT_CMD_RESTART2:
		ret = strncpy_from_user(&buffer[0], arg, sizeof(buffer) - 1);
		if (ret < 0) {
			ret = -EFAULT;
			break;
		}
		buffer[sizeof(buffer) - 1] = '\0';

		kernel_restart(buffer);
		break;

#ifdef CONFIG_KEXEC_CORE
	case LINUX_REBOOT_CMD_KEXEC:
		ret = kernel_kexec();
		break;
#endif

#ifdef CONFIG_HIBERNATION
	case LINUX_REBOOT_CMD_SW_SUSPEND:
		ret = hibernate();
		break;
#endif

	default:
		ret = -EINVAL;
		break;
	}
	mutex_unlock(&system_transition_mutex);
	return ret;
}

void kernel_power_off(void)
{
	kernel_shutdown_prepare(SYSTEM_POWER_OFF);
	if (pm_power_off_prepare)
		pm_power_off_prepare();
	migrate_to_reboot_cpu();
	syscore_shutdown();
	pr_emerg("Power down\n");
	kmsg_dump(KMSG_DUMP_SHUTDOWN);
	machine_power_off();
}
The shutdown command is ultimately carried out by platform-specific code.
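To make the argument checking and command dispatch easier to follow, here is a harmless user-space model of the decision logic. It never calls the real system call. The `SIM_*` constants are local copies of values from `<linux/reboot.h>` and should be checked against that header; `sim_reboot` and `enum sim_action` are hypothetical, and the model accepts only one of the MAGIC2 variants.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins; verify against <linux/reboot.h> before relying on them. */
#define SIM_MAGIC1        0xfee1deadu
#define SIM_MAGIC2        672274793u
#define SIM_CMD_RESTART   0x01234567u
#define SIM_CMD_HALT      0xcdef0123u
#define SIM_CMD_POWER_OFF 0x4321fedcu

enum sim_action { SIM_EINVAL = -1, SIM_RESTART, SIM_HALT, SIM_POWER_OFF };

/*
 * Decide what the request would do. can_power_off models whether the
 * platform registered a power-off handler (pm_power_off in the excerpt).
 */
enum sim_action sim_reboot(unsigned int magic1, unsigned int magic2,
			   unsigned int cmd, bool can_power_off)
{
	/* Reject callers that do not pass the magic numbers. */
	if (magic1 != SIM_MAGIC1 || magic2 != SIM_MAGIC2)
		return SIM_EINVAL;

	/* Without a power-off handler, a power-off request becomes a halt. */
	if (cmd == SIM_CMD_POWER_OFF && !can_power_off)
		cmd = SIM_CMD_HALT;

	switch (cmd) {
	case SIM_CMD_RESTART:
		return SIM_RESTART;
	case SIM_CMD_HALT:
		return SIM_HALT;
	case SIM_CMD_POWER_OFF:
		return SIM_POWER_OFF;
	default:
		return SIM_EINVAL;
	}
}
```

The magic numbers exist so that a stray or buggy call with arbitrary arguments cannot take the machine down by accident.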
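4. CPU Dynamic Frequency Scaling
Early CPUs ran at a fixed frequency, although some enthusiasts would overclock them. Later, CPU vendors officially supported dynamic frequency scaling. The questions of when to scale, who does the scaling, and what frequency to choose were left to the kernel. Linux designed the CPUFreq framework, which separates these roles clearly and gives each its own responsibility. The framework has three parts: the CPUFreq governor, the CPUFreq core, and the CPUFreq driver. The governor is the decision maker: it decides when to change the frequency and to what value. The driver is the executor: it deals with the actual hardware. The core is the intermediary that coordinates the two. A system can have several candidate governors but only one current governor. Each candidate registers itself with the core, and user space chooses which one is current. A system has exactly one driver, written by the CPU vendor. Building the kernel for a given platform builds that platform's driver, which registers itself with the core. The figure below shows the overall CPUFreq framework.
The figure contains a CPUFreq policy. What does that mean? Frequency cannot always be set independently for each CPU; some CPUs must be scaled together as a unit. The CPUFreq policy abstraction represents such a unit and makes it easier to work with.
4.1 CPUFreq Core
The core defines a global list, cpufreq_governor_list, and governors are registered through cpufreq_register_governor. Many governors can be registered at the same time, but each policy has only one governor in effect. The core also defines a global variable, cpufreq_driver, and drivers are registered through cpufreq_register_driver. A system can have one and only one registered driver; a second registration returns an error. Finally, the core defines the global cpufreq_policy_list, the list of policies, where a policy is a set of CPUs whose frequency must change together.
We start with the definition of a governor and its registration function: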
linux-src/include/linux/cpufreq.h

struct cpufreq_governor {
	char	name[CPUFREQ_NAME_LEN];
	int	(*init)(struct cpufreq_policy *policy);
	void	(*exit)(struct cpufreq_policy *policy);
	int	(*start)(struct cpufreq_policy *policy);
	void	(*stop)(struct cpufreq_policy *policy);
	void	(*limits)(struct cpufreq_policy *policy);
	ssize_t	(*show_setspeed)	(struct cpufreq_policy *policy,
					 char *buf);
	int	(*store_setspeed)	(struct cpufreq_policy *policy,
					 unsigned int freq);
	struct list_head	governor_list;
	struct module		*owner;
	u8			flags;
};

linux-src/drivers/cpufreq/cpufreq.c

int cpufreq_register_governor(struct cpufreq_governor *governor)
{
	int err;

	if (!governor)
		return -EINVAL;

	if (cpufreq_disabled())
		return -ENODEV;

	mutex_lock(&cpufreq_governor_mutex);

	err = -EBUSY;
	if (!find_governor(governor->name)) {
		err = 0;
		list_add(&governor->governor_list, &cpufreq_governor_list);
	}

	mutex_unlock(&cpufreq_governor_mutex);
	return err;
}
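Registration is simple: the governor is just added to the list, unless one with the same name is already there. As for the governor's function pointers, init is called when the governor is attached to a policy, and exit is called when the old governor is replaced. start is called when the governor takes effect, and stop when it ceases to be in effect. limits is called by the core when the policy's frequency limits change and the frequency may need to be adjusted.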
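The registration rule, add to the list unless the name is already taken, can be sketched outside the kernel like this. `sim_governor`, `sim_register_governor`, and the fixed-size array are hypothetical simplifications; the kernel uses a linked list protected by a mutex.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define SIM_MAX_GOVS 8

/* Hypothetical, pared-down stand-in for struct cpufreq_governor. */
struct sim_governor {
	const char *name;
};

static const struct sim_governor *sim_gov_list[SIM_MAX_GOVS];
static size_t sim_gov_count;

/* Look a governor up by name; NULL if it is not registered. */
static const struct sim_governor *sim_find_governor(const char *name)
{
	for (size_t i = 0; i < sim_gov_count; i++)
		if (strcmp(sim_gov_list[i]->name, name) == 0)
			return sim_gov_list[i];
	return NULL;
}

/* Mirrors the shape of the excerpt: duplicate names are refused with -EBUSY. */
int sim_register_governor(const struct sim_governor *gov)
{
	if (!gov)
		return -EINVAL;
	if (sim_find_governor(gov->name))
		return -EBUSY;
	if (sim_gov_count == SIM_MAX_GOVS)
		return -ENOMEM;	/* limit of this sketch only */
	sim_gov_list[sim_gov_count++] = gov;
	return 0;
}
```

Because lookup is by name, user space can later select a governor simply by writing its name, which is why names have to be unique.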
Next, the definition of a driver and its registration function:
linux-src/include/linux/cpufreq.h

struct cpufreq_driver {
	char		name[CPUFREQ_NAME_LEN];
	u16		flags;
	void		*driver_data;

	/* needed by all drivers */
	int		(*init)(struct cpufreq_policy *policy);
	int		(*verify)(struct cpufreq_policy_data *policy);

	/* define one out of two */
	int		(*setpolicy)(struct cpufreq_policy *policy);

	int		(*target)(struct cpufreq_policy *policy,
				  unsigned int target_freq,
				  unsigned int relation);	/* Deprecated */
	int		(*target_index)(struct cpufreq_policy *policy,
					unsigned int index);
	unsigned int	(*fast_switch)(struct cpufreq_policy *policy,
				       unsigned int target_freq);
	/*
	 * ->fast_switch() replacement for drivers that use an internal
	 * representation of performance levels and can pass hints other than
	 * the target performance level to the hardware.
	 */
	void		(*adjust_perf)(unsigned int cpu,
				       unsigned long min_perf,
				       unsigned long target_perf,
				       unsigned long capacity);

	/*
	 * Only for drivers with target_index() and CPUFREQ_ASYNC_NOTIFICATION
	 * unset.
	 *
	 * get_intermediate should return a stable intermediate frequency
	 * platform wants to switch to and target_intermediate() should set CPU
	 * to that frequency, before jumping to the frequency corresponding
	 * to 'index'. Core will take care of sending notifications and driver
	 * doesn't have to handle them in target_intermediate() or
	 * target_index().
	 *
	 * Drivers can return '0' from get_intermediate() in case they don't
	 * wish to switch to intermediate frequency for some target frequency.
	 * In that case core will directly call ->target_index().
	 */
	unsigned int	(*get_intermediate)(struct cpufreq_policy *policy,
					    unsigned int index);
	int		(*target_intermediate)(struct cpufreq_policy *policy,
					       unsigned int index);

	/* should be defined, if possible */
	unsigned int	(*get)(unsigned int cpu);

	/* Called to update policy limits on firmware notifications. */
	void		(*update_limits)(unsigned int cpu);

	/* optional */
	int		(*bios_limit)(int cpu, unsigned int *limit);

	int		(*online)(struct cpufreq_policy *policy);
	int		(*offline)(struct cpufreq_policy *policy);
	int		(*exit)(struct cpufreq_policy *policy);
	int		(*suspend)(struct cpufreq_policy *policy);
	int		(*resume)(struct cpufreq_policy *policy);

	struct freq_attr **attr;

	/* platform specific boost support code */
	bool		boost_enabled;
	int		(*set_boost)(struct cpufreq_policy *policy, int state);

	/*
	 * Set by drivers that want to register with the energy model after the
	 * policy is properly initialized, but before the governor is started.
	 */
	void		(*register_em)(struct cpufreq_policy *policy);
};

linux-src/drivers/cpufreq/cpufreq.c

int cpufreq_register_driver(struct cpufreq_driver *driver_data)
{
	unsigned long flags;
	int ret;

	if (cpufreq_disabled())
		return -ENODEV;

	/*
	 * The cpufreq core depends heavily on the availability of device
	 * structure, make sure they are available before proceeding further.
	 */
	if (!get_cpu_device(0))
		return -EPROBE_DEFER;

	if (!driver_data || !driver_data->verify || !driver_data->init ||
	    !(driver_data->setpolicy || driver_data->target_index ||
		    driver_data->target) ||
	     (driver_data->setpolicy && (driver_data->target_index ||
		    driver_data->target)) ||
	     (!driver_data->get_intermediate != !driver_data->target_intermediate) ||
	     (!driver_data->online != !driver_data->offline))
		return -EINVAL;

	pr_debug("trying to register driver %s\n", driver_data->name);

	/* Protect against concurrent CPU online/offline. */
	cpus_read_lock();

	write_lock_irqsave(&cpufreq_driver_lock, flags);
	if (cpufreq_driver) {
		write_unlock_irqrestore(&cpufreq_driver_lock, flags);
		ret = -EEXIST;
		goto out;
	}
	cpufreq_driver = driver_data;
	write_unlock_irqrestore(&cpufreq_driver_lock, flags);

	/*
	 * Mark support for the scheduler's frequency invariance engine for
	 * drivers that implement target(), target_index() or fast_switch().
	 */
	if (!cpufreq_driver->setpolicy) {
		static_branch_enable_cpuslocked(&cpufreq_freq_invariance);
		pr_debug("supports frequency invariance");
	}

	if (driver_data->setpolicy)
		driver_data->flags |= CPUFREQ_CONST_LOOPS;

	if (cpufreq_boost_supported()) {
		ret = create_boost_sysfs_file();
		if (ret)
			goto err_null_driver;
	}

	ret = subsys_interface_register(&cpufreq_interface);
	if (ret)
		goto err_boost_unreg;

	if (unlikely(list_empty(&cpufreq_policy_list))) {
		/* if all ->init() calls failed, unregister */
		ret = -ENODEV;
		pr_debug("%s: No CPU initialized for driver %s\n", __func__,
			 driver_data->name);
		goto err_if_unreg;
	}

	ret = cpuhp_setup_state_nocalls_cpuslocked(CPUHP_AP_ONLINE_DYN,
						   "cpufreq:online",
						   cpuhp_cpufreq_online,
						   cpuhp_cpufreq_offline);
	if (ret < 0)
		goto err_if_unreg;
	hp_online = ret;
	ret = 0;

	pr_debug("driver %s up and running\n", driver_data->name);
	goto out;

err_if_unreg:
	subsys_interface_unregister(&cpufreq_interface);
err_boost_unreg:
	remove_boost_sysfs_file();
err_null_driver:
	write_lock_irqsave(&cpufreq_driver_lock, flags);
	cpufreq_driver = NULL;
	write_unlock_irqrestore(&cpufreq_driver_lock, flags);
out:
	cpus_read_unlock();
	return ret;
}
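At its heart this registration function is simple too: its main job is to assign the global variable cpufreq_driver. Among the driver's function pointers, the most important are target and target_index, which actually set the frequency of the target policy. target is the older interface, kept for compatibility; target_index is now the recommended one.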
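The "one driver per system" rule amounts to a single global slot that refuses a second occupant. The sketch below models that slot together with a `target_index`-style callback. All names (`sim_driver`, `sim_register_driver`, `sim_set_index`, `sim_demo_target_index`) are hypothetical, and the callback signature is simplified compared with the real structure.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, pared-down stand-in for struct cpufreq_driver. */
struct sim_driver {
	const char *name;
	int (*target_index)(unsigned int index);
};

static const struct sim_driver *sim_current_driver;	/* the single slot */
static unsigned int sim_last_index;			/* set by the demo callback */

/* Demo callback: a real driver would program the hardware here. */
int sim_demo_target_index(unsigned int index)
{
	sim_last_index = index;
	return 0;
}

/* Only the first registration succeeds; later ones get -EEXIST. */
int sim_register_driver(const struct sim_driver *drv)
{
	if (!drv || !drv->target_index)
		return -EINVAL;
	if (sim_current_driver)
		return -EEXIST;
	sim_current_driver = drv;
	return 0;
}

/* The core asks whichever driver is registered to switch frequency. */
int sim_set_index(unsigned int index)
{
	if (!sim_current_driver)
		return -ENODEV;
	return sim_current_driver->target_index(index);
}
```

Passing a table index rather than a raw frequency keeps the lookup in the core, so the driver only has to apply an entry it already knows to be valid.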
Next, the definition of a policy and the function that sets it up:
linux-src/include/linux/cpufreq.h

struct cpufreq_policy {
	/* CPUs sharing clock, require sw coordination */
	cpumask_var_t		cpus;	/* Online CPUs only */
	cpumask_var_t		related_cpus; /* Online + Offline CPUs */
	cpumask_var_t		real_cpus; /* Related and present */

	unsigned int		shared_type; /* ACPI: ANY or ALL affected CPUs
						should set cpufreq */
	unsigned int		cpu;    /* cpu managing this policy, must be online */

	struct clk		*clk;
	struct cpufreq_cpuinfo	cpuinfo;/* see above */

	unsigned int		min;    /* in kHz */
	unsigned int		max;    /* in kHz */
	unsigned int		cur;    /* in kHz, only needed if cpufreq
					 * governors are used */
	unsigned int		suspend_freq; /* freq to set during suspend */

	unsigned int		policy; /* see above */
	unsigned int		last_policy; /* policy before unplug */
	struct cpufreq_governor	*governor; /* see below */
	void			*governor_data;
	char			last_governor[CPUFREQ_NAME_LEN]; /* last governor used */

	struct work_struct	update; /* if update_policy() needs to be
					 * called, but you're in IRQ context */

	struct freq_constraints	constraints;
	struct freq_qos_request	*min_freq_req;
	struct freq_qos_request	*max_freq_req;

	struct cpufreq_frequency_table	*freq_table;
	enum cpufreq_table_sorting freq_table_sorted;

	struct list_head        policy_list;
	struct kobject		kobj;
	struct completion	kobj_unregister;

	/*
	 * The rules for this semaphore:
	 * - Any routine that wants to read from the policy structure will
	 *   do a down_read on this semaphore.
	 * - Any routine that will write to the policy structure and/or may take away
	 *   the policy altogether (eg. CPU hotplug), will hold this lock in write
	 *   mode before doing so.
	 */
	struct rw_semaphore	rwsem;

	/*
	 * Fast switch flags:
	 * - fast_switch_possible should be set by the driver if it can
	 *   guarantee that frequency can be changed on any CPU sharing the
	 *   policy and that the change will affect all of the policy CPUs then.
	 * - fast_switch_enabled is to be set by governors that support fast
	 *   frequency switching with the help of cpufreq_enable_fast_switch().
	 */
	bool			fast_switch_possible;
	bool			fast_switch_enabled;

	/*
	 * Set if the CPUFREQ_GOV_STRICT_TARGET flag is set for the current
	 * governor.
	 */
	bool			strict_target;

	/*
	 * Preferred average time interval between consecutive invocations of
	 * the driver to set the frequency for this policy.  To be set by the
	 * scaling driver (0, which is the default, means no preference).
	 */
	unsigned int		transition_delay_us;

	/*
	 * Remote DVFS flag (Not added to the driver structure as we don't want
	 * to access another structure from scheduler hotpath).
	 *
	 * Should be set if CPUs can do DVFS on behalf of other CPUs from
	 * different cpufreq policies.
	 */
	bool			dvfs_possible_from_any_cpu;

	/* Cached frequency lookup from cpufreq_driver_resolve_freq. */
	unsigned int cached_target_freq;
	unsigned int cached_resolved_idx;

	/* Synchronization for frequency transitions */
	bool			transition_ongoing; /* Tracks transition status */
	spinlock_t		transition_lock;
	wait_queue_head_t	transition_wait;
	struct task_struct	*transition_task; /* Task which is doing the transition */

	/* cpufreq-stats */
	struct cpufreq_stats	*stats;

	/* For cpufreq driver's internal use */
	void			*driver_data;

	/* Pointer to the cooling device if used for thermal mitigation */
	struct thermal_cooling_device *cdev;

	struct notifier_block nb_min;
	struct notifier_block nb_max;
};

linux-src/drivers/cpufreq/cpufreq.c

static int cpufreq_online(unsigned int cpu)
{
	struct cpufreq_policy *policy;
	bool new_policy;
	unsigned long flags;
	unsigned int j;
	int ret;

	pr_debug("%s: bringing CPU%u online\n", __func__, cpu);

	/* Check if this CPU already has a policy to manage it */
	policy = per_cpu(cpufreq_cpu_data, cpu);
	if (policy) {
		WARN_ON(!cpumask_test_cpu(cpu, policy->related_cpus));
		if (!policy_is_inactive(policy))
			return cpufreq_add_policy_cpu(policy, cpu);

		/* This is the only online CPU for the policy.  Start over. */
		new_policy = false;
		down_write(&policy->rwsem);
		policy->cpu = cpu;
		policy->governor = NULL;
		up_write(&policy->rwsem);
	} else {
		new_policy = true;
		policy = cpufreq_policy_alloc(cpu);
		if (!policy)
			return -ENOMEM;
	}

	if (!new_policy && cpufreq_driver->online) {
		ret = cpufreq_driver->online(policy);
		if (ret) {
			pr_debug("%s: %d: initialization failed\n", __func__,
				 __LINE__);
			goto out_exit_policy;
		}

		/* Recover policy->cpus using related_cpus */
		cpumask_copy(policy->cpus, policy->related_cpus);
	} else {
		cpumask_copy(policy->cpus, cpumask_of(cpu));

		/*
		 * Call driver. From then on the cpufreq must be able
		 * to accept all calls to ->verify and ->setpolicy for this CPU.
		 */
		ret = cpufreq_driver->init(policy);
		if (ret) {
			pr_debug("%s: %d: initialization failed\n", __func__,
				 __LINE__);
			goto out_free_policy;
		}

		/*
		 * The initialization has succeeded and the policy is online.
		 * If there is a problem with its frequency table, take it
		 * offline and drop it.
		 */
		ret = cpufreq_table_validate_and_sort(policy);
		if (ret)
			goto out_offline_policy;

		/* related_cpus should at least include policy->cpus. */
		cpumask_copy(policy->related_cpus, policy->cpus);
	}

	down_write(&policy->rwsem);
	/*
	 * affected cpus must always be the one, which are online. We aren't
	 * managing offline cpus here.
	 */
	cpumask_and(policy->cpus, policy->cpus, cpu_online_mask);

	if (new_policy) {
		for_each_cpu(j, policy->related_cpus) {
			per_cpu(cpufreq_cpu_data, j) = policy;
			add_cpu_dev_symlink(policy, j, get_cpu_device(j));
		}

		policy->min_freq_req = kzalloc(2 * sizeof(*policy->min_freq_req),
					       GFP_KERNEL);
		if (!policy->min_freq_req) {
			ret = -ENOMEM;
			goto out_destroy_policy;
		}

		ret = freq_qos_add_request(&policy->constraints,
					   policy->min_freq_req, FREQ_QOS_MIN,
					   FREQ_QOS_MIN_DEFAULT_VALUE);
		if (ret < 0) {
			/*
			 * So we don't call freq_qos_remove_request() for an
			 * uninitialized request.
			 */
			kfree(policy->min_freq_req);
			policy->min_freq_req = NULL;
			goto out_destroy_policy;
		}

		/*
		 * This must be initialized right here to avoid calling
		 * freq_qos_remove_request() on uninitialized request in case
		 * of errors.
		 */
		policy->max_freq_req = policy->min_freq_req + 1;

		ret = freq_qos_add_request(&policy->constraints,
					   policy->max_freq_req, FREQ_QOS_MAX,
					   FREQ_QOS_MAX_DEFAULT_VALUE);
		if (ret < 0) {
			policy->max_freq_req = NULL;
			goto out_destroy_policy;
		}

		blocking_notifier_call_chain(&cpufreq_policy_notifier_list,
				CPUFREQ_CREATE_POLICY, policy);
	}

	if (cpufreq_driver->get && has_target()) {
		policy->cur = cpufreq_driver->get(policy->cpu);
		if (!policy->cur) {
			ret = -EIO;
			pr_err("%s: ->get() failed\n", __func__);
			goto out_destroy_policy;
		}
	}

	/*
	 * Sometimes boot loaders set CPU frequency to a value outside of
	 * frequency table present with cpufreq core. In such cases CPU might be
	 * unstable if it has to run on that frequency for long duration of time
	 * and so its better to set it to a frequency which is specified in
	 * freq-table. This also makes cpufreq stats inconsistent as
	 * cpufreq-stats would fail to register because current frequency of CPU
	 * isn't found in freq-table.
	 *
	 * Because we don't want this change to effect boot process badly, we go
	 * for the next freq which is >= policy->cur ('cur' must be set by now,
	 * otherwise we will end up setting freq to lowest of the table as 'cur'
	 * is initialized to zero).
	 *
	 * We are passing target-freq as "policy->cur - 1" otherwise
	 * __cpufreq_driver_target() would simply fail, as policy->cur will be
	 * equal to target-freq.
	 */
	if ((cpufreq_driver->flags & CPUFREQ_NEED_INITIAL_FREQ_CHECK)
	    && has_target()) {
		unsigned int old_freq = policy->cur;

		/* Are we running at unknown frequency ? */
		ret = cpufreq_frequency_table_get_index(policy, old_freq);
		if (ret == -EINVAL) {
			ret = __cpufreq_driver_target(policy, old_freq - 1,
						      CPUFREQ_RELATION_L);

			/*
			 * Reaching here after boot in a few seconds may not
			 * mean that system will remain stable at "unknown"
			 * frequency for longer duration. Hence, a BUG_ON().
			 */
			BUG_ON(ret);
			pr_info("%s: CPU%d: Running at unlisted initial frequency: %u KHz, changing to: %u KHz\n",
				__func__, policy->cpu, old_freq, policy->cur);
		}
	}

	if (new_policy) {
		ret = cpufreq_add_dev_interface(policy);
		if (ret)
			goto out_destroy_policy;

		cpufreq_stats_create_table(policy);

		write_lock_irqsave(&cpufreq_driver_lock, flags);
		list_add(&policy->policy_list, &cpufreq_policy_list);
		write_unlock_irqrestore(&cpufreq_driver_lock, flags);

		/*
		 * Register with the energy model before
		 * sched_cpufreq_governor_change() is called, which will result
		 * in rebuilding of the sched domains, which should only be done
		 * once the energy model is properly initialized for the policy
		 * first.
		 *
		 * Also, this should be called before the policy is registered
		 * with cooling framework.
		 */
		if (cpufreq_driver->register_em)
			cpufreq_driver->register_em(policy);
	}

	ret = cpufreq_init_policy(policy);
	if (ret) {
		pr_err("%s: Failed to initialize policy for cpu: %d (%d)\n",
		       __func__, cpu, ret);
		goto out_destroy_policy;
	}

	up_write(&policy->rwsem);

	kobject_uevent(&policy->kobj, KOBJ_ADD);

	if (cpufreq_thermal_control_enabled(cpufreq_driver))
		policy->cdev = of_cpufreq_cooling_register(policy);

	pr_debug("initialization complete\n");

	return 0;

out_destroy_policy:
	for_each_cpu(j, policy->real_cpus)
		remove_cpu_dev_symlink(policy, get_cpu_device(j));

	up_write(&policy->rwsem);

out_offline_policy:
	if (cpufreq_driver->offline)
		cpufreq_driver->offline(policy);

out_exit_policy:
	if (cpufreq_driver->exit)
		cpufreq_driver->exit(policy);

out_free_policy:
	cpufreq_policy_free(policy);
	return ret;
}
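A policy represents a set of CPUs whose frequency must be adjusted together; on physical processors, not every core can scale its frequency independently. How many policies a system has depends on the specific CPU hardware.
4.2 Governor Overview
There are six governors in total; we go through them one by one below.
1. performance
The performance governor's strategy is very simple: it always sets the CPU frequency to the policy maximum. The code is as follows: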
linux-src/drivers/cpufreq/cpufreq_performance.c
static struct cpufreq_governor cpufreq_gov_performance = {
    .name   = "performance",
    .owner  = THIS_MODULE,
    .flags  = CPUFREQ_GOV_STRICT_TARGET,
    .limits = cpufreq_gov_performance_limits,
};

static void cpufreq_gov_performance_limits(struct cpufreq_policy *policy)
{
    pr_debug("setting to %u kHz\n", policy->max);
    __cpufreq_driver_target(policy, policy->max, CPUFREQ_RELATION_H);
}
2. powersave
The powersave governor is equally simple: it always sets the CPU frequency to the policy minimum. The code is as follows:
linux-src/drivers/cpufreq/cpufreq_powersave.c
static struct cpufreq_governor cpufreq_gov_powersave = {
    .name   = "powersave",
    .limits = cpufreq_gov_powersave_limits,
    .owner  = THIS_MODULE,
    .flags  = CPUFREQ_GOV_STRICT_TARGET,
};

static void cpufreq_gov_powersave_limits(struct cpufreq_policy *policy)
{
    pr_debug("setting to %u kHz\n", policy->min);
    __cpufreq_driver_target(policy, policy->min, CPUFREQ_RELATION_L);
}
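The performance and powersave governors pass a relation flag to __cpufreq_driver_target(). In the kernel, CPUFREQ_RELATION_L asks for the lowest table frequency at or above the target, and CPUFREQ_RELATION_H asks for the highest table frequency at or below it. The following is a minimal userspace sketch of that lookup, not kernel code: the names pick_freq, REL_L and REL_H and the idea of passing a plain ascending array are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for CPUFREQ_RELATION_L / CPUFREQ_RELATION_H. */
enum relation { REL_L, REL_H };

/*
 * Pick an entry from an ascending frequency table (values in kHz).
 * REL_L: lowest frequency at or above the target.
 * REL_H: highest frequency at or below the target.
 * If no entry satisfies the relation, fall back to the nearest table end.
 */
static unsigned int pick_freq(const unsigned int *table, size_t n,
                              unsigned int target, enum relation rel)
{
    size_t i;

    assert(n > 0);

    if (rel == REL_L) {
        for (i = 0; i < n; i++)
            if (table[i] >= target)
                return table[i];
        return table[n - 1];    /* target is above the whole table */
    }

    for (i = n; i > 0; i--)
        if (table[i - 1] <= target)
            return table[i - 1];
    return table[0];            /* target is below the whole table */
}
```

With a table of 800000, 1200000, 1600000 and 2000000 kHz, a target of 1300000 kHz resolves to 1600000 under REL_L and to 1200000 under REL_H.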
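3. conservative
The conservative governor, as its name says, is the cautious mode: it moves the frequency gradually and always keeps it between the policy's minimum and maximum. The code is as follows: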
linux-src/drivers/cpufreq/cpufreq_conservative.c
static struct dbs_governor cs_governor = {
    .gov = CPUFREQ_DBS_GOVERNOR_INITIALIZER("conservative"),
    .kobj_type = { .default_attrs = cs_attributes },
    .gov_dbs_update = cs_dbs_update,
    .alloc = cs_alloc,
    .free = cs_free,
    .init = cs_init,
    .exit = cs_exit,
    .start = cs_start,
};
linux-src/drivers/cpufreq/cpufreq_governor.h
#define CPUFREQ_DBS_GOVERNOR_INITIALIZER(_name_)            \
    {                                                       \
        .name = _name_,                                     \
        .flags = CPUFREQ_GOV_DYNAMIC_SWITCHING,             \
        .owner = THIS_MODULE,                               \
        .init = cpufreq_dbs_governor_init,                  \
        .exit = cpufreq_dbs_governor_exit,                  \
        .start = cpufreq_dbs_governor_start,                \
        .stop = cpufreq_dbs_governor_stop,                  \
        .limits = cpufreq_dbs_governor_limits,              \
    }
linux-src/drivers/cpufreq/cpufreq_governor.c
void cpufreq_dbs_governor_limits(struct cpufreq_policy *policy)
{
    struct policy_dbs_info *policy_dbs;

    /* Protect gov->gdbs_data against cpufreq_dbs_governor_exit() */
    mutex_lock(&gov_dbs_data_mutex);
    policy_dbs = policy->governor_data;
    if (!policy_dbs)
        goto out;

    mutex_lock(&policy_dbs->update_mutex);
    cpufreq_policy_apply_limits(policy);
    gov_update_sample_delay(policy_dbs, 0);
    mutex_unlock(&policy_dbs->update_mutex);

out:
    mutex_unlock(&gov_dbs_data_mutex);
}
linux-src/include/linux/cpufreq.h
static inline void cpufreq_policy_apply_limits(struct cpufreq_policy *policy)
{
    if (policy->max < policy->cur)
        __cpufreq_driver_target(policy, policy->max, CPUFREQ_RELATION_H);
    else if (policy->min > policy->cur)
        __cpufreq_driver_target(policy, policy->min, CPUFREQ_RELATION_L);
}
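The excerpts above show only how conservative reacts to a limits change. The per-sample decision lives in cs_dbs_update(), which is not quoted here. As a rough userspace sketch of the idea (step up when the load is above an upper threshold, step down when it is below a lower threshold, then clamp to the policy limits as cpufreq_policy_apply_limits() does), assuming a hypothetical struct cs_params and function cs_next_freq that do not exist in the kernel:

```c
#include <assert.h>

/* Hypothetical tunables; the real governor exposes similar knobs via sysfs. */
struct cs_params {
    unsigned int up_threshold;    /* percent load above which to step up */
    unsigned int down_threshold;  /* percent load below which to step down */
    unsigned int freq_step_khz;   /* size of one step in kHz */
};

/* Return the next frequency (kHz) for the given load (0..100). */
static unsigned int cs_next_freq(unsigned int cur, unsigned int load,
                                 unsigned int min, unsigned int max,
                                 const struct cs_params *p)
{
    assert(min <= max);

    /* Bring the starting point inside the policy limits first. */
    if (cur > max)
        cur = max;
    if (cur < min)
        cur = min;

    if (load > p->up_threshold) {
        /* One step up, never past the policy maximum. */
        return (max - cur < p->freq_step_khz) ? max : cur + p->freq_step_khz;
    }
    if (load < p->down_threshold) {
        /* One step down, never past the policy minimum. */
        return (cur - min < p->freq_step_khz) ? min : cur - p->freq_step_khz;
    }
    return cur;  /* load is in the comfortable band: stay put */
}
```

The point of the sketch is the contrast with performance and powersave: the frequency moves by one bounded step per sample instead of jumping straight to a limit.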
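4. userspace
The userspace governor sets the frequency to the value requested from user space, clamped to the policy limits. The code is as follows: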
linux-src/drivers/cpufreq/cpufreq_userspace.c
static struct cpufreq_governor cpufreq_gov_userspace = {
    .name = "userspace",
    .init = cpufreq_userspace_policy_init,
    .exit = cpufreq_userspace_policy_exit,
    .start = cpufreq_userspace_policy_start,
    .stop = cpufreq_userspace_policy_stop,
    .limits = cpufreq_userspace_policy_limits,
    .store_setspeed = cpufreq_set,
    .show_setspeed = show_speed,
    .owner = THIS_MODULE,
};

static void cpufreq_userspace_policy_limits(struct cpufreq_policy *policy)
{
    unsigned int *setspeed = policy->governor_data;

    mutex_lock(&userspace_mutex);

    pr_debug("limit event for cpu %u: %u - %u kHz, currently %u kHz, last set to %u kHz\n",
             policy->cpu, policy->min, policy->max, policy->cur, *setspeed);

    if (policy->max < *setspeed)
        __cpufreq_driver_target(policy, policy->max, CPUFREQ_RELATION_H);
    else if (policy->min > *setspeed)
        __cpufreq_driver_target(policy, policy->min, CPUFREQ_RELATION_L);
    else
        __cpufreq_driver_target(policy, *setspeed, CPUFREQ_RELATION_L);

    mutex_unlock(&userspace_mutex);
}
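5. ondemand
The ondemand governor adjusts on demand: it runs at a lower frequency by default and switches to a high frequency when the system load rises. The code is as follows: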
linux-src/drivers/cpufreq/cpufreq_ondemand.c
static struct dbs_governor od_dbs_gov = {
    .gov = CPUFREQ_DBS_GOVERNOR_INITIALIZER("ondemand"),
    .kobj_type = { .default_attrs = od_attributes },
    .gov_dbs_update = od_dbs_update,
    .alloc = od_alloc,
    .free = od_free,
    .init = od_init,
    .exit = od_exit,
    .start = od_start,
};
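The structure above only wires up the callbacks; the decision itself is made in od_dbs_update(), which is not quoted here. As far as I recall it, the rule is: jump to the maximum when the load exceeds an upper threshold, otherwise pick a frequency proportional to the load between the minimum and the maximum. The sketch below restates that rule in plain C under those assumptions; od_next_freq and its parameters are hypothetical names, and details such as powersave bias and sampling-rate handling are left out.

```c
#include <assert.h>

/*
 * Simplified ondemand-style decision.
 * load is a percentage (0..100), frequencies are in kHz.
 */
static unsigned int od_next_freq(unsigned int load, unsigned int min,
                                 unsigned int max, unsigned int up_threshold)
{
    assert(min <= max);
    assert(load <= 100);

    if (load > up_threshold)
        return max;  /* busy: go straight to the top */

    /* Otherwise scale linearly with the load between min and max. */
    return min + (unsigned int)((unsigned long long)load * (max - min) / 100);
}
```

For example, with limits of 800000 and 2000000 kHz and a threshold of 95, a load of 50 yields 1400000 kHz while a load of 96 yields 2000000 kHz.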
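6. schedutil
The schedutil governor adjusts the frequency dynamically based on the CPU utilization tracked by the scheduler. The code is as follows: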
linux-src/kernel/sched/cpufreq_schedutil.c
struct cpufreq_governor schedutil_gov = {
    .name = "schedutil",
    .owner = THIS_MODULE,
    .flags = CPUFREQ_GOV_DYNAMIC_SWITCHING,
    .init = sugov_init,
    .exit = sugov_exit,
    .start = sugov_start,
    .stop = sugov_stop,
    .limits = sugov_limits,
};

static void sugov_limits(struct cpufreq_policy *policy)
{
    struct sugov_policy *sg_policy = policy->governor_data;

    if (!policy->fast_switch_enabled) {
        mutex_lock(&sg_policy->work_lock);
        cpufreq_policy_apply_limits(policy);
        mutex_unlock(&sg_policy->work_lock);
    }

    sg_policy->limits_changed = true;
}
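How utilization becomes a frequency is not shown in the excerpt. In the kernel versions I am familiar with, schedutil computes roughly next_freq = 1.25 * max_freq * util / max_capacity, so that the CPU reaches its top frequency at about 80% utilization and keeps some headroom below that. The sketch below reproduces only that arithmetic; su_next_freq is a hypothetical name, and the real governor additionally applies rate limiting and resolves the result against the driver's frequency table.

```c
#include <assert.h>

/*
 * Map utilization to a target frequency with 25% headroom.
 * util and max_cap use the same scale (the scheduler uses 0..1024).
 * The result is capped at max_freq (kHz).
 */
static unsigned int su_next_freq(unsigned int max_freq, unsigned long util,
                                 unsigned long max_cap)
{
    unsigned long long f;

    assert(max_cap > 0);

    /* freq + freq/4 is the integer form of 1.25 * freq. */
    f = ((unsigned long long)max_freq + (max_freq >> 2)) * util / max_cap;
    return f > max_freq ? max_freq : (unsigned int)f;
}
```

With max_freq = 2000000 kHz and a capacity scale of 1024, a utilization of 512 maps to 1250000 kHz, and anything from about 820 upward maps to the maximum.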
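4.3 Driver Overview
On x86, the driver normally used on Intel processors is intel_pstate. Other x86 drivers such as acpi-cpufreq exist, but only one cpufreq driver can be registered at a time. Let's look at how intel_pstate is implemented: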
linux-src/drivers/cpufreq/intel_pstate.c
static struct cpufreq_driver intel_pstate = {
    .flags = CPUFREQ_CONST_LOOPS,
    .verify = intel_pstate_verify_policy,
    .setpolicy = intel_pstate_set_policy,
    .suspend = intel_pstate_suspend,
    .resume = intel_pstate_resume,
    .init = intel_pstate_cpu_init,
    .exit = intel_pstate_cpu_exit,
    .offline = intel_pstate_cpu_offline,
    .online = intel_pstate_cpu_online,
    .update_limits = intel_pstate_update_limits,
    .name = "intel_pstate",
};

static int intel_pstate_set_policy(struct cpufreq_policy *policy)
{
    struct cpudata *cpu;

    if (!policy->cpuinfo.max_freq)
        return -ENODEV;

    pr_debug("set_policy cpuinfo.max %u policy->max %u\n",
             policy->cpuinfo.max_freq, policy->max);

    cpu = all_cpu_data[policy->cpu];
    cpu->policy = policy->policy;

    mutex_lock(&intel_pstate_limits_lock);

    intel_pstate_update_perf_limits(cpu, policy->min, policy->max);

    if (cpu->policy == CPUFREQ_POLICY_PERFORMANCE) {
        /*
         * NOHZ_FULL CPUs need this as the governor callback may not
         * be invoked on them.
         */
        intel_pstate_clear_update_util_hook(policy->cpu);
        intel_pstate_max_within_limits(cpu);
    } else {
        intel_pstate_set_update_util_hook(policy->cpu);
    }

    if (hwp_active) {
        /*
         * When hwp_boost was active before and dynamically it
         * was turned off, in that case we need to clear the
         * update util hook.
         */
        if (!hwp_boost)
            intel_pstate_clear_update_util_hook(policy->cpu);
        intel_pstate_hwp_set(policy->cpu);
    }

    mutex_unlock(&intel_pstate_limits_lock);

    return 0;
}

static int __init intel_pstate_init(void)
{
    const struct x86_cpu_id *id;
    int rc;

    if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
        return -ENODEV;

    id = x86_match_cpu(hwp_support_ids);
    if (id) {
        bool hwp_forced = intel_pstate_hwp_is_enabled();

        if (hwp_forced)
            pr_info("HWP enabled by BIOS\n");
        else if (no_load)
            return -ENODEV;

        copy_cpu_funcs(&core_funcs);
        /*
         * Avoid enabling HWP for processors without EPP support,
         * because that means incomplete HWP implementation which is a
         * corner case and supporting it is generally problematic.
         *
         * If HWP is enabled already, though, there is no choice but to
         * deal with it.
         */
        if ((!no_hwp && boot_cpu_has(X86_FEATURE_HWP_EPP)) || hwp_forced) {
            hwp_active++;
            hwp_mode_bdw = id->driver_data;
            intel_pstate.attr = hwp_cpufreq_attrs;
            intel_cpufreq.attr = hwp_cpufreq_attrs;
            intel_cpufreq.flags |= CPUFREQ_NEED_UPDATE_LIMITS;
            intel_cpufreq.adjust_perf = intel_cpufreq_adjust_perf;
            if (!default_driver)
                default_driver = &intel_pstate;

            if (boot_cpu_has(X86_FEATURE_HYBRID_CPU))
                intel_pstate_cppc_set_cpu_scaling();

            goto hwp_cpu_matched;
        }
        pr_info("HWP not enabled\n");
    } else {
        if (no_load)
            return -ENODEV;

        id = x86_match_cpu(intel_pstate_cpu_ids);
        if (!id) {
            pr_info("CPU model not supported\n");
            return -ENODEV;
        }

        copy_cpu_funcs((struct pstate_funcs *)id->driver_data);
    }

    if (intel_pstate_msrs_not_valid()) {
        pr_info("Invalid MSRs\n");
        return -ENODEV;
    }
    /* Without HWP start in the passive mode. */
    if (!default_driver)
        default_driver = &intel_cpufreq;

hwp_cpu_matched:
    /*
     * The Intel pstate driver will be ignored if the platform
     * firmware has its own power management modes.
     */
    if (intel_pstate_platform_pwr_mgmt_exists()) {
        pr_info("P-states controlled by the platform\n");
        return -ENODEV;
    }

    if (!hwp_active && hwp_only)
        return -ENOTSUPP;

    pr_info("Intel P-state driver initializing\n");

    all_cpu_data = vzalloc(array_size(sizeof(void *), num_possible_cpus()));
    if (!all_cpu_data)
        return -ENOMEM;

    intel_pstate_request_control_from_smm();

    intel_pstate_sysfs_expose_params();

    mutex_lock(&intel_pstate_driver_lock);
    rc = intel_pstate_register_driver(default_driver);
    mutex_unlock(&intel_pstate_driver_lock);
    if (rc) {
        intel_pstate_sysfs_remove();
        return rc;
    }

    if (hwp_active) {
        const struct x86_cpu_id *id;

        id = x86_match_cpu(intel_pstate_cpu_ee_disable_ids);
        if (id) {
            set_power_ctl_ee_state(false);
            pr_info("Disabling energy efficiency optimization\n");
        }

        pr_info("HWP enabled\n");
    } else if (boot_cpu_has(X86_FEATURE_HYBRID_CPU)) {
        pr_warn("Problematic setup: Hybrid processor with disabled HWP\n");
    }

    return 0;
}
device_initcall(intel_pstate_init);
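5. CPU Idle
When a CPU has no process to run it enters the idle state, and an idle CPU can enter a low-power mode to save energy. The different low-power modes are collectively called C-states. The modes usually listed are C0, C1, C2, C3, C4, C5 and C6; strictly speaking, ACPI itself specifies C0 through C3 and allows processors to define deeper states beyond C3. CPU vendors can choose to implement C0 through Cn, with n >= 3. The modes are defined as follows:
C0: the normal operating mode; the CPU is fully running.
C1: the CPU's internal main clock is stopped by software; the bus interface unit and the APIC keep running at full speed.
C2: the CPU's internal main clock is stopped by hardware; the bus interface unit and the APIC keep running at full speed.
C3: all internal CPU clocks are stopped.
C4: the CPU voltage is reduced.
C5: the CPU voltage is reduced substantially and the cache is turned off.
C6: the CPU's internal voltage is lowered to any value, including 0 V.
So who decides which idle level the CPU should enter, and who carries out that decision? For this purpose Linux provides the CPUIdle framework, which separates the roles: the governor decides which idle state to enter, the driver carries out the decision, and the Core mediates between the two.
The system can register several governors at the same time but only one driver. Unlike CPUFreq, each CPU can adjust its own low-power state individually, so the driver acts directly on individual CPUs.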
5.1 CPUIdle Core
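The Core defines a global variable, cpuidle_governors, which is the list of all governors. Governors are registered through cpuidle_register_governor, and the global variable cpuidle_curr_governor points to the current governor. The Core also defines cpuidle_curr_driver, the current driver, which is registered through cpuidle_register_driver. A system can register only one driver; later registrations return an error.
Let's look at the definition of a governor and its registration function: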
linux-src/include/linux/cpuidle.h
struct cpuidle_governor {
    char name[CPUIDLE_NAME_LEN];
    struct list_head governor_list;
    unsigned int rating;

    int  (*enable)  (struct cpuidle_driver *drv,
                     struct cpuidle_device *dev);
    void (*disable) (struct cpuidle_driver *drv,
                     struct cpuidle_device *dev);

    int  (*select)  (struct cpuidle_driver *drv,
                     struct cpuidle_device *dev,
                     bool *stop_tick);
    void (*reflect) (struct cpuidle_device *dev, int index);
};
linux-src/drivers/cpuidle/governor.c
int cpuidle_register_governor(struct cpuidle_governor *gov)
{
    int ret = -EEXIST;

    if (!gov || !gov->select)
        return -EINVAL;

    if (cpuidle_disabled())
        return -ENODEV;

    mutex_lock(&cpuidle_lock);
    if (cpuidle_find_governor(gov->name) == NULL) {
        ret = 0;
        list_add_tail(&gov->governor_list, &cpuidle_governors);
        if (!cpuidle_curr_governor ||
            !strncasecmp(param_governor, gov->name, CPUIDLE_NAME_LEN) ||
            (cpuidle_curr_governor->rating < gov->rating &&
             strncasecmp(param_governor, cpuidle_curr_governor->name,
                         CPUIDLE_NAME_LEN)))
            cpuidle_switch_governor(gov);
    }
    mutex_unlock(&cpuidle_lock);

    return ret;
}
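The condition inside cpuidle_register_governor() is dense, so here it is restated as a small userspace model. A newly registered governor becomes current if there is no current governor yet, if it is the one named in param_governor, or if it has a higher rating than the current one and the current one was not itself chosen by name. The types and the function register_gov below are hypothetical simplifications; the list handling and duplicate check are omitted, and strncmp is used where the kernel compares case-insensitively.

```c
#include <assert.h>
#include <string.h>

#define NAME_LEN 16

/* Hypothetical, trimmed-down stand-in for struct cpuidle_governor. */
struct gov {
    char name[NAME_LEN];
    unsigned int rating;
};

static const struct gov *curr_gov;       /* models cpuidle_curr_governor */
static char param_governor[NAME_LEN];    /* governor requested by name, may be empty */

/* Apply the same switch rule as cpuidle_register_governor(). */
static void register_gov(const struct gov *g)
{
    assert(g != NULL);

    if (!curr_gov ||
        !strncmp(param_governor, g->name, NAME_LEN) ||
        (curr_gov->rating < g->rating &&
         strncmp(param_governor, curr_gov->name, NAME_LEN)))
        curr_gov = g;
}
```

With no name requested, registering ladder (rating 10) and then menu (rating 20) leaves menu current. With "ladder" requested by name, ladder stays current even after menu registers with its higher rating.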
Next, the definition of a driver and its registration function:
linux-src/include/linux/cpuidle.h
struct cpuidle_driver {
    const char *name;
    struct module *owner;

    /* used by the cpuidle framework to setup the broadcast timer */
    unsigned int bctimer:1;
    /* states array must be ordered in decreasing power consumption */
    struct cpuidle_state states[CPUIDLE_STATE_MAX];
    int state_count;
    int safe_state_index;

    /* the driver handles the cpus in cpumask */
    struct cpumask *cpumask;

    /* preferred governor to switch at register time */
    const char *governor;
};
linux-src/drivers/cpuidle/driver.c
int cpuidle_register_driver(struct cpuidle_driver *drv)
{
    struct cpuidle_governor *gov;
    int ret;

    spin_lock(&cpuidle_driver_lock);
    ret = __cpuidle_register_driver(drv);
    spin_unlock(&cpuidle_driver_lock);

    if (!ret && !strlen(param_governor) && drv->governor &&
        (cpuidle_get_driver() == drv)) {
        mutex_lock(&cpuidle_lock);
        gov = cpuidle_find_governor(drv->governor);
        if (gov) {
            cpuidle_prev_governor = cpuidle_curr_governor;
            if (cpuidle_switch_governor(gov) < 0)
                cpuidle_prev_governor = NULL;
        }
        mutex_unlock(&cpuidle_lock);
    }

    return ret;
}
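5.2 Governor Overview
CPUIdle has two governors by default, ladder and menu. Ladder, as the name suggests, deepens the CPU's sleep state one step at a time as idle time accumulates, and suits systems with a fixed (periodic) tick. Menu estimates how long the CPU will stay idle and then puts it directly into the matching sleep state in a single step, and suits systems with a dynamic tick. Let's look at each implementation in turn.
1. ladder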
linux-src/drivers/cpuidle/governors/ladder.c
static struct cpuidle_governor ladder_governor = {
    .name = "ladder",
    .rating = 10,
    .enable = ladder_enable_device,
    .select = ladder_select_state,
    .reflect = ladder_reflect,
};

static int ladder_select_state(struct cpuidle_driver *drv,
                               struct cpuidle_device *dev, bool *dummy)
{
    struct ladder_device *ldev = this_cpu_ptr(&ladder_devices);
    struct ladder_device_state *last_state;
    int last_idx = dev->last_state_idx;
    int first_idx = drv->states[0].flags & CPUIDLE_FLAG_POLLING ? 1 : 0;
    s64 latency_req = cpuidle_governor_latency_req(dev->cpu);
    s64 last_residency;

    /* Special case when user has set very strict latency requirement */
    if (unlikely(latency_req == 0)) {
        ladder_do_selection(dev, ldev, last_idx, 0);
        return 0;
    }

    last_state = &ldev->states[last_idx];

    last_residency = dev->last_residency_ns -
                     drv->states[last_idx].exit_latency_ns;

    /* consider promotion */
    if (last_idx < drv->state_count - 1 &&
        !dev->states_usage[last_idx + 1].disable &&
        last_residency > last_state->threshold.promotion_time_ns &&
        drv->states[last_idx + 1].exit_latency_ns <= latency_req) {
        last_state->stats.promotion_count++;
        last_state->stats.demotion_count = 0;
        if (last_state->stats.promotion_count >=
            last_state->threshold.promotion_count) {
            ladder_do_selection(dev, ldev, last_idx, last_idx + 1);
            return last_idx + 1;
        }
    }

    /* consider demotion */
    if (last_idx > first_idx &&
        (dev->states_usage[last_idx].disable ||
         drv->states[last_idx].exit_latency_ns > latency_req)) {
        int i;

        for (i = last_idx - 1; i > first_idx; i--) {
            if (drv->states[i].exit_latency_ns <= latency_req)
                break;
        }
        ladder_do_selection(dev, ldev, last_idx, i);
        return i;
    }

    if (last_idx > first_idx &&
        last_residency < last_state->threshold.demotion_time_ns) {
        last_state->stats.demotion_count++;
        last_state->stats.promotion_count = 0;
        if (last_state->stats.demotion_count >=
            last_state->threshold.demotion_count) {
            ladder_do_selection(dev, ldev, last_idx, last_idx - 1);
            return last_idx - 1;
        }
    }

    /* otherwise remain at the current state */
    return last_idx;
}
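Stripped of the latency checks and the disabled-state handling, the ladder logic above comes down to counting: enough consecutive long sleeps promote the CPU one rung deeper, enough short sleeps demote it one rung. The following userspace sketch keeps only that core. struct rung and ladder_step are hypothetical names, and the thresholds in the usage note are invented values, not kernel defaults.

```c
#include <assert.h>

/* Hypothetical per-state bookkeeping, loosely modelled on ladder_device_state. */
struct rung {
    long long promotion_time_ns;   /* residency above this counts toward promotion */
    long long demotion_time_ns;    /* residency below this counts toward demotion */
    unsigned int promotion_count;  /* long sleeps needed before stepping deeper */
    unsigned int demotion_count;   /* short sleeps needed before stepping shallower */
    unsigned int promotions;       /* running counters */
    unsigned int demotions;
};

/* Return the state index to use next, given the last observed residency. */
static int ladder_step(struct rung *r, int nr, int cur, long long residency_ns)
{
    struct rung *s;

    assert(nr > 0 && cur >= 0 && cur < nr);
    s = &r[cur];

    if (cur < nr - 1 && residency_ns > s->promotion_time_ns) {
        s->promotions++;
        s->demotions = 0;
        if (s->promotions >= s->promotion_count) {
            /* Leaving this rung: reset its counters, go one deeper. */
            s->promotions = s->demotions = 0;
            return cur + 1;
        }
    }

    if (cur > 0 && residency_ns < s->demotion_time_ns) {
        s->demotions++;
        s->promotions = 0;
        if (s->demotions >= s->demotion_count) {
            s->promotions = s->demotions = 0;
            return cur - 1;
        }
    }

    return cur;  /* otherwise stay where we are */
}
```

With a promotion count of 2 and a demotion count of 1, two long sleeps in a row move the CPU from state 0 to state 1, and a single short sleep moves it back. This asymmetry is what makes the governor climb slowly and retreat quickly.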
2. menu
linux-src/drivers/cpuidle/governors/menu.c
static struct cpuidle_governor menu_governor = {
    .name = "menu",
    .rating = 20,
    .enable = menu_enable_device,
    .select = menu_select,
    .reflect = menu_reflect,
};

static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
                       bool *stop_tick)
{
    struct menu_device *data = this_cpu_ptr(&menu_devices);
    s64 latency_req = cpuidle_governor_latency_req(dev->cpu);
    unsigned int predicted_us;
    u64 predicted_ns;
    u64 interactivity_req;
    unsigned int nr_iowaiters;
    ktime_t delta, delta_tick;
    int i, idx;

    if (data->needs_update) {
        menu_update(drv, dev);
        data->needs_update = 0;
    }

    /* determine the expected residency time, round up */
    delta = tick_nohz_get_sleep_length(&delta_tick);
    if (unlikely(delta < 0)) {
        delta = 0;
        delta_tick = 0;
    }
    data->next_timer_ns = delta;

    nr_iowaiters = nr_iowait_cpu(dev->cpu);
    data->bucket = which_bucket(data->next_timer_ns, nr_iowaiters);

    if (unlikely(drv->state_count <= 1 || latency_req == 0) ||
        ((data->next_timer_ns < drv->states[1].target_residency_ns ||
          latency_req < drv->states[1].exit_latency_ns) &&
         !dev->states_usage[0].disable)) {
        /*
         * In this case state[0] will be used no matter what, so return
         * it right away and keep the tick running if state[0] is a
         * polling one.
         */
        *stop_tick = !(drv->states[0].flags & CPUIDLE_FLAG_POLLING);
        return 0;
    }

    /* Round up the result for half microseconds. */
    predicted_us = div_u64(data->next_timer_ns *
                           data->correction_factor[data->bucket] +
                           (RESOLUTION * DECAY * NSEC_PER_USEC) / 2,
                           RESOLUTION * DECAY * NSEC_PER_USEC);
    /* Use the lowest expected idle interval to pick the idle state. */
    predicted_ns = (u64)min(predicted_us,
                            get_typical_interval(data, predicted_us)) *
                   NSEC_PER_USEC;

    if (tick_nohz_tick_stopped()) {
        /*
         * If the tick is already stopped, the cost of possible short
         * idle duration misprediction is much higher, because the CPU
         * may be stuck in a shallow idle state for a long time as a
         * result of it.  In that case say we might mispredict and use
         * the known time till the closest timer event for the idle
         * state selection.
         */
        if (predicted_ns < TICK_NSEC)
            predicted_ns = data->next_timer_ns;
    } else {
        /*
         * Use the performance multiplier and the user-configurable
         * latency_req to determine the maximum exit latency.
         */
        interactivity_req = div64_u64(predicted_ns,
                                      performance_multiplier(nr_iowaiters));
        if (latency_req > interactivity_req)
            latency_req = interactivity_req;
    }

    /*
     * Find the idle state with the lowest power while satisfying
     * our constraints.
     */
    idx = -1;
    for (i = 0; i < drv->state_count; i++) {
        struct cpuidle_state *s = &drv->states[i];

        if (dev->states_usage[i].disable)
            continue;

        if (idx == -1)
            idx = i; /* first enabled state */

        if (s->target_residency_ns > predicted_ns) {
            /*
             * Use a physical idle state, not busy polling, unless
             * a timer is going to trigger soon enough.
             */
            if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) &&
                s->exit_latency_ns <= latency_req &&
                s->target_residency_ns <= data->next_timer_ns) {
                predicted_ns = s->target_residency_ns;
                idx = i;
                break;
            }
            if (predicted_ns < TICK_NSEC)
                break;

            if (!tick_nohz_tick_stopped()) {
                /*
                 * If the state selected so far is shallow,
                 * waking up early won't hurt, so retain the
                 * tick in that case and let the governor run
                 * again in the next iteration of the loop.
                 */
                predicted_ns = drv->states[idx].target_residency_ns;
                break;
            }

            /*
             * If the state selected so far is shallow and this
             * state's target residency matches the time till the
             * closest timer event, select this one to avoid getting
             * stuck in the shallow one for too long.
             */
            if (drv->states[idx].target_residency_ns < TICK_NSEC &&
                s->target_residency_ns <= delta_tick)
                idx = i;

            return idx;
        }
        if (s->exit_latency_ns > latency_req)
            break;

        idx = i;
    }

    if (idx == -1)
        idx = 0; /* No states enabled. Must use 0. */

    /*
     * Don't stop the tick if the selected state is a polling one or if the
     * expected idle duration is shorter than the tick period length.
     */
    if (((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
         predicted_ns < TICK_NSEC) && !tick_nohz_tick_stopped()) {
        *stop_tick = false;

        if (idx > 0 && drv->states[idx].target_residency_ns > delta_tick) {
            /*
             * The tick is not going to be stopped and the target
             * residency of the state to be returned is not within
             * the time until the next timer event including the
             * tick, so try to correct that.
             */
            for (i = idx - 1; i >= 0; i--) {
                if (dev->states_usage[i].disable)
                    continue;

                idx = i;
                if (drv->states[i].target_residency_ns <= delta_tick)
                    break;
            }
        }
    }

    return idx;
}
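The heart of menu_select() is its main loop: walk the states from shallowest to deepest and keep the deepest enabled one whose target residency fits within the predicted idle time and whose exit latency fits within the latency limit. The sketch below isolates just that loop in userspace C. struct idle_state and pick_state are hypothetical names, and the tick handling, the polling special cases and the prediction itself are deliberately left out.

```c
#include <assert.h>

/* Hypothetical, trimmed-down view of a cpuidle state. */
struct idle_state {
    long long target_residency_ns;  /* minimum idle time for the state to pay off */
    long long exit_latency_ns;      /* time needed to wake up from the state */
    int disabled;
};

/*
 * States must be ordered from shallowest to deepest.
 * Returns the index of the deepest state that satisfies both constraints,
 * or the first enabled state if none does (0 if all are disabled).
 */
static int pick_state(const struct idle_state *s, int n,
                      long long predicted_ns, long long latency_req_ns)
{
    int i, idx = -1;

    assert(n > 0);

    for (i = 0; i < n; i++) {
        if (s[i].disabled)
            continue;
        if (idx == -1)
            idx = i;                            /* first enabled state */
        if (s[i].target_residency_ns > predicted_ns)
            break;                              /* would not stay idle long enough */
        if (s[i].exit_latency_ns > latency_req_ns)
            break;                              /* would wake up too slowly */
        idx = i;
    }
    return idx == -1 ? 0 : idx;
}
```

Because the states are sorted by depth, the loop can stop at the first state that violates a constraint; every deeper state would violate it as well.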
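5.3 Driver Overview
As an example of a driver we look at pseries_idle. Note that cpuidle-pseries.c is the driver for IBM POWER pSeries (PowerPC) platforms, not x86; x86 systems typically use intel_idle or the ACPI processor idle driver instead. The registration pattern is the same.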
linux-src/drivers/cpuidle/cpuidle-pseries.c
static struct cpuidle_driver pseries_idle_driver = {
    .name = "pseries_idle",
    .owner = THIS_MODULE,
};

static int __init pseries_processor_idle_init(void)
{
    int retval;

    retval = pseries_idle_probe();
    if (retval)
        return retval;

    pseries_cpuidle_driver_init();
    retval = cpuidle_register(&pseries_idle_driver, NULL);
    if (retval) {
        printk(KERN_DEBUG "Registration of pseries driver failed.\n");
        return retval;
    }

    retval = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
                                       "cpuidle/pseries:online",
                                       pseries_cpuidle_cpu_online, NULL);
    WARN_ON(retval < 0);
    retval = cpuhp_setup_state_nocalls(CPUHP_CPUIDLE_DEAD,
                                       "cpuidle/pseries:DEAD", NULL,
                                       pseries_cpuidle_cpu_dead);
    WARN_ON(retval < 0);
    printk(KERN_DEBUG "pseries_idle_driver registered\n");
    return 0;
}

device_initcall(pseries_processor_idle_init);
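6. Power Management Quality of Service
We have covered many power-saving mechanisms, but saving power cannot be the only goal. After all, we use a computer in order to use it, not in order to save power. Power saving must not sacrifice so much performance that the user experience suffers. The kernel therefore provides the PM QoS module, dedicated to quality of service in power management. PM QoS is a framework. Toward its clients (kernel code and processes) it offers a request interface through which a client can require that some aspect of system performance not fall below a given level. Toward the power-saving mechanisms it offers a query interface, which a mechanism must consult when saving power so that it still satisfies that level.
PM QoS abstracts the minimum requirement on one aspect of performance as a constraint. Any client can issue a request against a constraint, and can also modify or remove its request; PM QoS passes the value that satisfies all requests to the power-saving mechanism. Constraints fall into two classes: system-level constraints apply to the performance of the system as a whole, and device-level constraints apply to a single device. Kernel clients call the interface functions directly to add requests; user-space clients add requests through device node files.
6.1 System-Level Constraints
There are two system-level constraints: CPU frequency and CPU latency. CPU frequency reflects the CPU's performance while running: the higher the frequency, the higher the performance and the higher the power draw. CPU latency is the time it takes an idle CPU to return from a low-power state to running. An idle CPU can sit in different low-power states; the deeper the state, the more power it saves but the longer the wake-up latency.
First, the definitions and the request function for the CPU frequency constraint: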
linux-src/include/linux/pm_qos.h
struct freq_constraints {
    struct pm_qos_constraints min_freq;
    struct blocking_notifier_head min_freq_notifiers;
    struct pm_qos_constraints max_freq;
    struct blocking_notifier_head max_freq_notifiers;
};

struct pm_qos_constraints {
    struct plist_head list;
    s32 target_value;   /* Do not change to 64 bit */
    s32 default_value;
    s32 no_constraint_value;
    enum pm_qos_type type;
    struct blocking_notifier_head *notifiers;
};

struct freq_qos_request {
    enum freq_qos_req_type type;
    struct plist_node pnode;
    struct freq_constraints *qos;
};

enum freq_qos_req_type {
    FREQ_QOS_MIN = 1,
    FREQ_QOS_MAX,
};
These are the definitions related to the frequency constraint.
linux-src/kernel/power/qos.c
int freq_qos_add_request(struct freq_constraints *qos,
                         struct freq_qos_request *req,
                         enum freq_qos_req_type type, s32 value)
{
    int ret;

    if (IS_ERR_OR_NULL(qos) || !req)
        return -EINVAL;

    if (WARN(freq_qos_request_active(req),
             "%s() called for active request\n", __func__))
        return -EINVAL;

    req->qos = qos;
    req->type = type;
    ret = freq_qos_apply(req, PM_QOS_ADD_REQ, value);
    if (ret < 0) {
        req->qos = NULL;
        req->type = 0;
    }

    return ret;
}

int freq_qos_apply(struct freq_qos_request *req,
                   enum pm_qos_req_action action, s32 value)
{
    int ret;

    switch (req->type) {
    case FREQ_QOS_MIN:
        ret = pm_qos_update_target(&req->qos->min_freq, &req->pnode,
                                   action, value);
        break;
    case FREQ_QOS_MAX:
        ret = pm_qos_update_target(&req->qos->max_freq, &req->pnode,
                                   action, value);
        break;
    default:
        ret = -EINVAL;
    }

    return ret;
}
This is the request function for the CPU frequency constraint.
linux-src/kernel/power/qos.c
s32 freq_qos_read_value(struct freq_constraints *qos,
                        enum freq_qos_req_type type)
{
    s32 ret;

    switch (type) {
    case FREQ_QOS_MIN:
        ret = IS_ERR_OR_NULL(qos) ?
            FREQ_QOS_MIN_DEFAULT_VALUE :
            pm_qos_read_value(&qos->min_freq);
        break;
    case FREQ_QOS_MAX:
        ret = IS_ERR_OR_NULL(qos) ?
            FREQ_QOS_MAX_DEFAULT_VALUE :
            pm_qos_read_value(&qos->max_freq);
        break;
    default:
        WARN_ON(1);
        ret = 0;
    }

    return ret;
}
The CPUFreq module reads the CPU frequency constraints through freq_qos_read_value so that dynamic frequency scaling still meets the minimum performance requirement.
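What "the value that satisfies all requests" means depends on the direction of the constraint. For minimum-frequency requests the effective value is the largest one requested, because it is the only value that honours every client's floor. For maximum-frequency requests it is the smallest one requested. In the kernel this is expressed through the constraint type (PM_QOS_MAX or PM_QOS_MIN). Here is a minimal userspace sketch of that aggregation, with the hypothetical names agg_type and aggregate and a plain array in place of the kernel's priority list:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's PM_QOS_MAX / PM_QOS_MIN types. */
enum agg_type { AGG_MAX, AGG_MIN };

/*
 * Combine all outstanding requests into one effective value.
 * AGG_MAX: the largest request wins (used for minimum-frequency floors).
 * AGG_MIN: the smallest request wins (used for maximum-frequency caps
 *          and for latency limits).
 * With no requests at all, the caller-supplied default applies.
 */
static int aggregate(const int *req, size_t n, enum agg_type type,
                     int no_constraint)
{
    size_t i;
    int v;

    if (n == 0)
        return no_constraint;

    assert(req != NULL);
    v = req[0];
    for (i = 1; i < n; i++) {
        if (type == AGG_MAX ? req[i] > v : req[i] < v)
            v = req[i];
    }
    return v;
}
```

Two clients asking for floors of 800000 and 1200000 kHz therefore produce an effective floor of 1200000 kHz, while caps of 2000000 and 1600000 kHz produce an effective cap of 1600000 kHz.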
Next, the definition and the request function for the CPU latency constraint:
linux-src/include/linux/pm_qos.h
struct pm_qos_constraints {
    struct plist_head list;
    s32 target_value;   /* Do not change to 64 bit */
    s32 default_value;
    s32 no_constraint_value;
    enum pm_qos_type type;
    struct blocking_notifier_head *notifiers;
};

struct pm_qos_request {
    struct plist_node node;
    struct pm_qos_constraints *qos;
};
This is the definition of the CPU latency constraint.
linux-src/kernel/power/qos.c
void cpu_latency_qos_add_request(struct pm_qos_request *req, s32 value)
{
    if (!req)
        return;

    if (cpu_latency_qos_request_active(req)) {
        WARN(1, KERN_ERR "%s called for already added request\n", __func__);
        return;
    }

    trace_pm_qos_add_request(value);

    req->qos = &cpu_latency_constraints;
    cpu_latency_qos_apply(req, PM_QOS_ADD_REQ, value);
}

static void cpu_latency_qos_apply(struct pm_qos_request *req,
                                  enum pm_qos_req_action action, s32 value)
{
    int ret = pm_qos_update_target(req->qos, &req->node, action, value);

    if (ret > 0)
        wake_up_all_idle_cpus();
}
This is the request function for the CPU latency constraint.
linux-src/kernel/power/qos.c
s32 cpu_latency_qos_limit(void)
{
    return pm_qos_read_value(&cpu_latency_constraints);
}

s32 pm_qos_read_value(struct pm_qos_constraints *c)
{
    return READ_ONCE(c->target_value);
}
The CPUIdle module uses this interface, cpu_latency_qos_limit, to read the strictest outstanding CPU latency requirement.
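From user space, the CPU latency constraint is reached through the device node /dev/cpu_dma_latency: a process opens the node, writes a 32-bit latency limit in microseconds, and keeps the file descriptor open for as long as the request should stay in force. Closing the descriptor removes the request. The sketch below wraps that sequence; the function name cpu_latency_request is hypothetical, and the path is passed in as a parameter only so the function can be exercised without the real node.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Open `path` (normally "/dev/cpu_dma_latency", which usually requires
 * root) and write a 32-bit latency limit in microseconds.
 * Returns the open descriptor on success, or -1 on failure.
 * The request stays active until the caller closes the descriptor.
 */
static int cpu_latency_request(const char *path, int32_t max_latency_us)
{
    int fd;

    assert(path != NULL);

    fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    if (write(fd, &max_latency_us, sizeof(max_latency_us)) !=
        (ssize_t)sizeof(max_latency_us)) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A latency-sensitive program would call cpu_latency_request("/dev/cpu_dma_latency", 20) at startup, hold the returned descriptor while it runs, and close it on exit. While the descriptor is open, CPUIdle governors see the limit through cpu_latency_qos_limit() and avoid states with a longer exit latency.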
6.2 Device-Level Constraints
Omitted for now. The relevant code is in:
linux-src/drivers/base/power/qos.c
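7. Summary and Review
This article has given us a basic understanding of power management in a computer. To recap:
Power management has two major parts: power state management and power saving. Power state management controls the power state of the whole computer and covers sleep, hibernation, shutdown and reboot. Power saving consists of mechanisms inside the kernel that reduce energy use while the system is running. Saving power alone is not enough, because performance matters too, so power management also includes PM QoS to guarantee quality of service, ensuring that the computer still meets a required level of performance.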