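The Android smartphone market is red-hot right now and hardware is advancing quickly; an ARM Cortex-A9 with 1GB of DDR already seems to lag behind mainstream configurations. Hardware may be king, but one cannot help wondering whether such powerful hardware is actually being fully used. For that reason, my future kernel analyses will target the ARM platform.
Main text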
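In Linux memory management, two resources matter most: virtual addresses and physical addresses. That may sound like a platitude, but memory management really does revolve around these two concepts. If you do not yet have a picture of how the Linux kernel manages virtual and physical addresses, I suggest browsing reference [2], an excellent and concise book. Reference [1] covers more of the implementation details.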
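The main goal of this article is to give an overall picture of how the kernel's 1GB virtual address space is mapped, including:
1. What the 1GB kernel virtual address space is actually used for.
2. How it maps to actual physical addresses.
3. Some board-specific macro definitions. I have collected them here for future reference; from these macros you can easily draw the kernel virtual address map for your own platform.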
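A disclaimer first: the mapping plan in this example is not necessarily optimal, but it is a real one. In fact, I personally think quite a few points in it are open to debate.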
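In the figure below, the pink region, 0xbf80 0000 ~ 0xc000 0000, is for modules and pkmap. As the board-level macro definitions later show, modules are placed here because they must lie within the 32MB branch range of the kernel code segment. I am not yet sure why pkmap is placed in this range; it is used when mapping highmem.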
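The orange region, 0xc000 0000 ~ 0xe000 0000, maps lowmem (low memory, i.e. zone[Normal]). This is a flat linear mapping: virtual and physical addresses differ by a constant offset, and once the kernel has set the mapping up during initialization, the page tables never change. That avoids repeatedly modifying page tables and flushing the TLB. (The TLB can be thought of as a hardware cache of page table entries; if the translation for the accessed virtual address is in that cache, the CPU does not need to go to DDR to walk the page tables, which makes memory accesses faster.) This address range is clearly precious because the mapping is so efficient. The figure shows that, of the 512MB mapped here, 128MB is reserved for PMEM (Android's own mechanism for managing physically contiguous memory) and 16MB is reserved for the CP (the space the modem runs in). Only about 360MB of lowmem is actually usable.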
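To make the flat mapping concrete, here is a minimal user-space sketch in C. It assumes the values given in the macro list later in this article (PAGE_OFFSET = 0xC0000000, PHYS_OFFSET = 0x80000000) and the 512MB lowmem window described above. The helper names are hypothetical and chosen for illustration; they are not kernel APIs.

```c
#include <stdint.h>
#include <stdbool.h>

/* Values of the example platform in this article (assumptions of the sketch). */
#define PAGE_OFFSET  UINT32_C(0xC0000000)  /* first lowmem virtual address  */
#define PHYS_OFFSET  UINT32_C(0x80000000)  /* first physical address of RAM */
#define LOWMEM_SIZE  UINT32_C(0x20000000)  /* 512MB lowmem window           */

/* True if vaddr falls inside the linearly mapped lowmem window. */
static inline bool is_lowmem_vaddr(uint32_t vaddr)
{
    return vaddr >= PAGE_OFFSET && (vaddr - PAGE_OFFSET) < LOWMEM_SIZE;
}

/* Linear mapping: the translation is a constant offset, no table lookup. */
static inline uint32_t lowmem_virt_to_phys(uint32_t vaddr)
{
    return vaddr - PAGE_OFFSET + PHYS_OFFSET;
}

/* Inverse of lowmem_virt_to_phys(). */
static inline uint32_t lowmem_phys_to_virt(uint32_t paddr)
{
    return paddr - PHYS_OFFSET + PAGE_OFFSET;
}
```

Because the offset never changes, a translation like this needs no page table updates after boot, which is the property that makes lowmem cheap to use.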
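The blue region, 0xe000 0000 ~ 0xf000 0000, is used for mapping highmem (high memory, i.e. zone[HighMem]); it is the 256MB range that the VMALLOC area occupies in the analysis below. Because the example board has 1GB of DDR, part of the physical address space has to be reached through high memory mappings.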
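The green region, 0xf000 0000 ~ 0xffc0 0000, is the IO mapping area. In kernel space, for example when writing a driver, we need to access the chip's registers (IO space). Some of the IO space is mapped dynamically through ioremap, which takes virtual addresses from the VMALLOC area; the rest is mapped statically at system initialization through iotable_init. The figure shows that about 200MB of the static IO mapping area is unused. Isn't that rather wasteful?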
The purple region holds no surprises; it simply follows the ARM default definitions.
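The region boundaries quoted above can be turned into a small table to check their sizes. The following C sketch is only an illustration: the struct and function names are hypothetical, and the boundaries are copied from the text of this article rather than read from a running kernel.

```c
#include <stdint.h>
#include <stddef.h>

/* One region of the kernel virtual address map: [start, end). */
struct vregion {
    const char *name;
    uint32_t start;   /* inclusive */
    uint32_t end;     /* exclusive */
};

/* Boundaries as quoted in the article for the example board. */
static const struct vregion kernel_map[] = {
    { "modules + pkmap (pink)", 0xBF800000u, 0xC0000000u },
    { "lowmem (orange)",        0xC0000000u, 0xE0000000u },
    { "highmem (blue)",         0xE0000000u, 0xF0000000u },
    { "static IO (green)",      0xF0000000u, 0xFFC00000u },
};

/* Size of a region in MB. */
static inline uint32_t region_size_mb(const struct vregion *r)
{
    return (r->end - r->start) >> 20;
}

/* Region containing vaddr, or NULL if vaddr is outside the table. */
static inline const struct vregion *find_region(uint32_t vaddr)
{
    for (size_t i = 0; i < sizeof(kernel_map) / sizeof(kernel_map[0]); i++)
        if (vaddr >= kernel_map[i].start && vaddr < kernel_map[i].end)
            return &kernel_map[i];
    return NULL;
}
```

Computed this way, the four regions are 8MB, 512MB, 256MB and 252MB, so the roughly 200MB of unused static IO space is most of the green window.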
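The figure below shows how the kernel virtual address space maps to actual physical addresses.
Now for the exciting part: let's see what is wrong with this mapping.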
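I actually hit a bug on this platform: during stress testing with monkey test, vmalloc began to fail after the system had been running for a long time. Even a call to vmalloc was failing, while plenty of physical memory was still free. Strange, isn't it?
[Error log] The system's graphics module fails when requesting 1MB of memory with vmalloc.
[Analysis]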
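1. First, look at the basic memory information at that moment. /proc/meminfo shows 156MB of physical memory still available, so memory is not exhausted. The VMALLOC virtual address range that vmalloc draws on still has 22MB free, which also looks sufficient. By design, vmalloc calls alloc_page() to take individual pages one at a time from the buddy system (that is, from the 2^0 free list). There are plenty of pages, so why does the request fail? vmalloc requires the virtual addresses to be contiguous; could it be that VMALLOC no longer has 1MB of contiguous virtual address space?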
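2. With that question in mind, we go on to analyze /proc/vmallocinfo.
/proc/vmallocinfo shows that VMALLOC is already in use up to 0xefeff000, so the largest contiguous range left is 0xf0000000 - 0xefeff000 = 0x101000. Remember how much memory we are asking for? That's right, 0x1a0000. This was the first time I had seen kernel virtual addresses run out. So why does meminfo still report 22MB of VMALLOC virtual address space free? Evidently this virtual address range has become heavily fragmented as well.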
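The gap between "22MB free in total" and "no hole large enough" can be shown with a short C sketch. It is a simplified model, not the kernel's allocator: the struct and function names are hypothetical, and the caller is assumed to pass the in-use ranges sorted by start address and non-overlapping.

```c
#include <stdint.h>
#include <stddef.h>

/* One in-use range of the VMALLOC area: [start, end). Hypothetical type. */
struct used_range {
    uint32_t start;
    uint32_t end;
};

/*
 * Largest free hole inside [area_start, area_end), given the in-use
 * ranges sorted by start address and not overlapping.
 */
static inline uint32_t largest_free_gap(const struct used_range *used, size_t n,
                                        uint32_t area_start, uint32_t area_end)
{
    uint32_t best = 0;
    uint32_t cursor = area_start;   /* end of the previous used range */

    for (size_t i = 0; i < n; i++) {
        if (used[i].start > cursor && used[i].start - cursor > best)
            best = used[i].start - cursor;
        if (used[i].end > cursor)
            cursor = used[i].end;
    }
    if (area_end > cursor && area_end - cursor > best)
        best = area_end - cursor;
    return best;
}

/* A request fits only if one single hole is at least as large as it. */
static inline int request_fits(uint32_t largest_gap, uint32_t request)
{
    return largest_gap >= request;
}
```

With the numbers from this case, the largest hole is 0x101000 bytes and the request is 0x1a0000 bytes, so the request cannot be placed regardless of how much free space the other holes add up to.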
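So the virtual address space is exhausted and we seem to be at a dead end. In the spirit of investigation, though, we still have to ask why so much of the VMALLOC range is in use; after all, we planned 256MB for it. And with this much physical memory free, why not call kmalloc or get_free_pages directly?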
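3. Next, look at how physical memory is distributed at that moment.
/proc/buddyinfo shows the overall allocation state of the buddy system, along with more information relevant to fragmentation management.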
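A quick word on pagetypeinfo: the kernel divides physical memory into zones; on my platform these are zone[Normal] and zone[HighMem]. Migrate types were designed to limit memory fragmentation; see reference [1] if they are unfamiliar. /proc/pagetypeinfo shows that the largest contiguous block available is 2^7 pages, i.e. 512KB. That cannot satisfy the graphics request, which also explains why graphics ends up using vmalloc so heavily.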
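The conversion between a buddy order and a size in bytes is simple enough to sketch in C. This assumes 4KB pages (PAGE_SHIFT = 12), which is an assumption of the sketch; the helper names are hypothetical, not kernel functions.

```c
#include <stdint.h>

#define PAGE_SHIFT 12   /* 4KB pages (assumption of this sketch) */

/* Bytes covered by one free block of the given buddy order (2^order pages). */
static inline uint64_t order_to_bytes(unsigned int order)
{
    return (uint64_t)1 << (order + PAGE_SHIFT);
}

/* Smallest order whose block is large enough to hold `bytes`. */
static inline unsigned int bytes_to_order(uint64_t bytes)
{
    unsigned int order = 0;

    while (order_to_bytes(order) < bytes)
        order++;
    return order;
}
```

Under that assumption an order-7 block is 512KB, while a physically contiguous allocation of 0x1a0000 bytes would need an order-9 block (2MB), two orders above what the buddy system could still offer.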
Output of /proc/buddyinfo.
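4. Conclusion
Based on the analysis above: graphics requests contiguous memory from the kernel's buddy system through get_free_pages(). After a while the buddy system becomes heavily fragmented and graphics can no longer obtain physically contiguous memory, so it falls back to vmalloc to get non-contiguous pages from the buddy system. Unfortunately the VMALLOC virtual address space is exhausted, so vmalloc fails even though plenty of physical memory is still free.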
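5. Revisiting the memory map
One problem here is that too little space was planned for lowmem. By default vmalloc allocates from zone[HighMem], which easily fragments highmem. Remember the kernel virtual mapping figure at the beginning? Didn't we have 200MB of virtual address space sitting unused? How nice it would be to map it to lowmem.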
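I modified the mapping as shown below. The biggest change is that lowmem grows from 512MB to 720MB, so the 200MB of previously unused virtual address space is now put to good use.
After the change, look at the buddy information again: the largest contiguous block that can be allocated is 2^15 pages = 128MB. This layout also improves memory utilization.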
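The list below collects some board-specific macro definitions, which determine how the kernel virtual address space is laid out. These days there is rarely a chance to bring up a new chip from scratch, so most people probably pay little attention to these definitions. They are nonetheless very important when studying memory layout, and I have collected them here for future reference. You can also try filling in these macros for your own board; the complete picture of the kernel address space mapping will then emerge.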
Board specific macro definition
Refer to [Documentation/arm/Porting]
Decompressor Symbols
Macro name
description
example
ZTEXTADDR
[arch/arm/boot/compressed/Makefile]
Start address of decompressor. There's no point in talking about virtual or physical addresses here, since the MMU will be off at the time when you call the decompressor code. You normally call the kernel at this address to start it booting. This doesn't have to be located in RAM, it can be in flash or other read-only or read-write addressable medium.
0x0
ZTEXTADDR := $(CONFIG_ZBOOT_ROM_TEXT)
CONFIG_ZBOOT_ROM_TEXT=0x0
ZBSSADDR
[arch/arm/boot/compressed/Makefile]
Start address of zero-initialised work area for the decompressor. This must be pointing at RAM. The decompressor will zero initialize this for you. Again, the MMU will be off.
0x0
ZBSSADDR := $(CONFIG_ZBOOT_ROM_BSS)
CONFIG_ZBOOT_ROM_BSS=0x0
ZRELADDR
[arch/arm/boot/Makefile]
This is the address where the decompressed kernel will be written, and eventually executed. The following constraint must be valid:
__virt_to_phys(TEXTADDR) == ZRELADDR
The initial part of the kernel is carefully coded to be position independent.
Note: the following conditions must always be true:
ZRELADDR == virt_to_phys(PAGE_OFFSET + TEXT_OFFSET)
0x81088000
ZRELADDR := $(zreladdr-y)
zreladdr-y := $(__ZRELADDR)
__ZRELADDR = TEXT_OFFSET + 0x80000000
[arch/arm/mach-pxa/Makefile.boot]
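The ZRELADDR constraint can be checked with a few lines of arithmetic. The C sketch below uses the example values from this table (ZRELADDR = 0x81088000, RAM starting at 0x80000000, PAGE_OFFSET = 0xC0000000); the function names are hypothetical and the TEXT_OFFSET value is derived from those numbers, not read from a kernel build.

```c
#include <stdint.h>

#define PAGE_OFFSET  UINT32_C(0xC0000000)
#define PHYS_OFFSET  UINT32_C(0x80000000)
#define ZRELADDR     UINT32_C(0x81088000)   /* example value from the table */

/* TEXT_OFFSET implied by __ZRELADDR = TEXT_OFFSET + 0x80000000. */
static inline uint32_t text_offset_from_zreladdr(uint32_t zreladdr)
{
    return zreladdr - PHYS_OFFSET;
}

/* Virtual address at which the decompressed kernel starts. */
static inline uint32_t textaddr(uint32_t text_offset)
{
    return PAGE_OFFSET + text_offset;
}

/* Checks ZRELADDR == virt_to_phys(PAGE_OFFSET + TEXT_OFFSET). */
static inline int zreladdr_is_consistent(uint32_t zreladdr, uint32_t text_offset)
{
    uint32_t phys = textaddr(text_offset) - PAGE_OFFSET + PHYS_OFFSET;

    return phys == zreladdr;
}
```

On these numbers the implied TEXT_OFFSET is 0x01088000, which puts the kernel text at virtual address 0xC1088000.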
INITRD_PHYS
Physical address to place the initial RAM disk. Only relevant if you are using the bootpImage stuff (which only works on the old struct param_struct).
INITRD_PHYS must be in RAM
Not defined
INITRD_VIRT
Virtual address of the initial RAM disk. The following constraint must be valid:
__virt_to_phys(INITRD_VIRT) == INITRD_PHYS
Not defined
PARAMS_PHYS
Physical address of the struct param_struct or tag list, giving the kernel various parameters about its execution environment.
PARAMS_PHYS must be within 4MB of ZRELADDR
Not defined
Kernel Symbols
PHYS_OFFSET
[arch/arm/include/asm/memory.h]
Physical start address of the first bank of RAM.
#define PHYS_OFFSET PLAT_PHYS_OFFSET
#define PLAT_PHYS_OFFSET UL(0x80000000)
[arch/arm/mach-pxa/include/mach/memory.h]
PAGE_OFFSET
[arch/arm/include/asm/memory.h]
Virtual start address of the first bank of RAM. During the kernel boot phase, virtual address PAGE_OFFSET will be mapped to physical address PHYS_OFFSET, along with any other mappings you supply. This should be the same value as TASK_SIZE.
CONFIG_PAGE_OFFSET = 0xC0000000
TASK_SIZE
[arch/arm/include/asm/memory.h]
The maximum size of a user process in bytes. Since user space always starts at zero, this is the maximum address that a user process can access+1. The user space stack grows down from this address.
Any virtual address below TASK_SIZE is deemed to be user process area, and therefore managed dynamically on a process by process basis by the kernel. I'll call this the user segment.
Anything above TASK_SIZE is common to all processes. I'll call this the kernel segment.
(In other words, you can't put IO mappings below TASK_SIZE, and hence PAGE_OFFSET).
CONFIG_PAGE_OFFSET - 0x01000000 = 0xBF000000
TASK_UNMAPPED_BASE
[arch/arm/include/asm/memory.h]
the lower boundary of the mmap VM area
CONFIG_PAGE_OFFSET/3 = 0x40000000
MODULES_VADDR
[arch/arm/include/asm/memory.h]
The module space lives between the addresses given by TASK_SIZE and PAGE_OFFSET - it must be within 32MB of the kernel text.
TEXT_OFFSET does not allow to use 16MB modules area as ARM32 branches to kernel may go out of range taking into account the kernel .text size
PAGE_OFFSET - 8*1024*1024 = 0xBF800000
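The remark about the 16MB module area becomes clearer with numbers. The C sketch below is a rough model under stated assumptions: it treats the reach of an ARM branch as exactly 32MB, takes TEXTADDR as 0xC1088000 (derived from the ZRELADDR example in this table), and uses hypothetical helper names.

```c
#include <stdint.h>

#define PAGE_OFFSET   UINT32_C(0xC0000000)
#define TEXTADDR      UINT32_C(0xC1088000)  /* derived from the ZRELADDR example */
#define BRANCH_RANGE  UINT32_C(0x02000000)  /* 32MB branch reach, approximated   */

/* Lowest module address for a module area of area_size bytes below PAGE_OFFSET. */
static inline uint32_t modules_vaddr(uint32_t area_size)
{
    return PAGE_OFFSET - area_size;
}

/*
 * Bytes of kernel text, counted from TEXTADDR, that a branch placed at the
 * very bottom of the module area can still reach. 0 means none.
 */
static inline uint32_t reachable_text_bytes(uint32_t area_size)
{
    uint32_t limit = modules_vaddr(area_size) + BRANCH_RANGE;

    return limit > TEXTADDR ? limit - TEXTADDR : 0;
}
```

With an 8MB area, a module at 0xBF800000 can reach the first 0x778000 bytes (roughly 7.5MB) of kernel text. With a 16MB area, a module at 0xBF000000 could not reach the kernel text at all, which is consistent with the note above.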
MODULES_END
[arch/arm/include/asm/memory.h]
The highmem pkmap virtual space shares the end of the module area.
0XBFE00000
#ifdef CONFIG_HIGHMEM
#define MODULES_END (PAGE_OFFSET - PMD_SIZE)
#else
#define MODULES_END (PAGE_OFFSET)
#endif
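The example values for the macros so far can be reproduced from CONFIG_PAGE_OFFSET alone. The C sketch below restates the formulas from this table in user space; PMD_SIZE = 2MB is an assumption that matches the 0xBFE00000 example value, and the two helper functions are hypothetical.

```c
#include <stdint.h>

#define CONFIG_PAGE_OFFSET  UINT32_C(0xC0000000)
#define PMD_SIZE            UINT32_C(0x00200000)  /* 2MB, assumed */

#define PAGE_OFFSET         CONFIG_PAGE_OFFSET
#define TASK_SIZE           (CONFIG_PAGE_OFFSET - UINT32_C(0x01000000))
#define TASK_UNMAPPED_BASE  (CONFIG_PAGE_OFFSET / 3)
#define MODULES_VADDR       (PAGE_OFFSET - 8u * 1024 * 1024)
#define MODULES_END         (PAGE_OFFSET - PMD_SIZE)   /* CONFIG_HIGHMEM case */
#define PKMAP_BASE          (PAGE_OFFSET - PMD_SIZE)

/* Virtual space left for modules once pkmap takes the last PMD. */
static inline uint32_t module_area_bytes(void)
{
    return MODULES_END - MODULES_VADDR;
}

/* Virtual space used by the pkmap window. */
static inline uint32_t pkmap_area_bytes(void)
{
    return PAGE_OFFSET - PKMAP_BASE;
}
```

On these values the pink region splits into 6MB for modules and 2MB for pkmap.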
TEXTADDR
Virtual start address of kernel, normally PAGE_OFFSET + 0x8000.
This is where the kernel image ends up. With the latest kernels, it must be located at 32768 bytes into a 128MB region. Previous kernels placed a restriction of 256MB here.
DATAADDR
Virtual address for the kernel data segment. Must not be defined when using the decompressor.
VMALLOC_START
VMALLOC_END
[arch/arm/mach-pxa/include/mach/vmalloc.h]
Virtual addresses bounding the vmalloc() area. There must not be any static mappings in this area; vmalloc will overwrite them. The addresses must also be in the kernel segment (see above). Normally, the vmalloc() area starts VMALLOC_OFFSET bytes above the last virtual RAM address (found using variable high_memory).
#define VMALLOC_END (0xf0000000UL)
The default vmalloc size is 128MB.
vmalloc_min = (VMALLOC_END - SZ_128M);
[defined in arch/arm/mm/mmu.c]
If the OSL passes a vmalloc size to the kernel, this default is overridden.
early_param("vmalloc", early_vmalloc);
[defined in arch/arm/mm/mmu.c]
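How the vmalloc size trades off against lowmem can be sketched with the values above. This C fragment is a simplified model with hypothetical helper names: it only performs the subtraction shown for vmalloc_min and ignores both the VMALLOC_OFFSET hole and any range checks the kernel applies to the parameter.

```c
#include <stdint.h>

#define PAGE_OFFSET  UINT32_C(0xC0000000)
#define VMALLOC_END  UINT32_C(0xF0000000)
#define SZ_1M        UINT32_C(0x00100000)

/* Lowest virtual address reserved for vmalloc when reserve_mb MB are requested. */
static inline uint32_t vmalloc_min_for(uint32_t reserve_mb)
{
    return VMALLOC_END - reserve_mb * SZ_1M;
}

/* Virtual space left below vmalloc_min for the lowmem linear mapping, in MB. */
static inline uint32_t lowmem_window_mb(uint32_t reserve_mb)
{
    return (vmalloc_min_for(reserve_mb) - PAGE_OFFSET) / SZ_1M;
}
```

With the 128MB default, vmalloc_min is 0xE8000000 and up to 640MB of virtual space is left for lowmem; with 256MB reserved, vmalloc_min drops to 0xE0000000 and the lowmem window shrinks to the 512MB seen on this board.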
VMALLOC_OFFSET
[arch/arm/include/asm/pgtable.h]
Offset normally incorporated into VMALLOC_START to provide a hole between virtual RAM and the vmalloc area. We do this to allow out of bounds memory accesses (eg, something writing off the end of the mapped memory map) to be caught. Normally set to 8MB.
#define VMALLOC_OFFSET (8*1024*1024)
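As an illustration of how the hole is formed, the C sketch below adds VMALLOC_OFFSET to high_memory and rounds the result down to a VMALLOC_OFFSET boundary. The function name is hypothetical and the rounding is my reading of how VMALLOC_START is usually derived; treat it as an assumption to verify against the headers of your own kernel tree.

```c
#include <stdint.h>

#define VMALLOC_OFFSET  (8u * 1024 * 1024)

/* Round (high_memory + VMALLOC_OFFSET) down to a VMALLOC_OFFSET boundary. */
static inline uint32_t vmalloc_start(uint32_t high_memory)
{
    return (high_memory + VMALLOC_OFFSET) & ~(VMALLOC_OFFSET - 1);
}
```

For high_memory = 0xE0000000 this gives 0xE0800000, a full 8MB hole. When high_memory is not 8MB aligned the hole is smaller, but the vmalloc area still starts strictly above the end of mapped RAM.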
CONSISTENT_DMA_SIZE
CONSISTENT_BASE
CONSISTENT_END
[arch/arm/include/asm/memory.h]
Size of DMA-consistent memory region.? Must be multiple of 2M, between 2MB and 14MB inclusive.
CONSISTENT_DMA_SIZE = 2MB
CONSISTENT_BASE = 0XFFC00000
CONSISTENT_END = 0XFFE00000
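The three CONSISTENT values are tied together by one subtraction plus the size rule quoted above. A minimal C sketch, with hypothetical helper names and CONSISTENT_END taken from the example column:

```c
#include <stdint.h>
#include <stdbool.h>

#define SZ_2M           UINT32_C(0x00200000)
#define CONSISTENT_END  UINT32_C(0xFFE00000)

/* Size rule from the description: a multiple of 2MB, from 2MB to 14MB inclusive. */
static inline bool consistent_dma_size_ok(uint32_t size)
{
    return size >= SZ_2M && size <= 7 * SZ_2M && (size % SZ_2M) == 0;
}

/* Base of the DMA-consistent region for a given size. */
static inline uint32_t consistent_base(uint32_t size)
{
    return CONSISTENT_END - size;
}
```

With the 2MB size used here the base comes out at 0xFFC00000, which is exactly where the green IO region ends in the layout described earlier.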
FIXADDR_START
FIXADDR_TOP
FIXADDR_SIZE
[arch/arm/include/asm/fixmap.h]
fixed virtual addresses
#define FIXADDR_START 0xfff00000UL
#define FIXADDR_TOP 0xfffe0000UL
#define FIXADDR_SIZE (FIXADDR_TOP - FIXADDR_START)
PKMAP_BASE
[arch/arm/include/asm/highmem.h]
0XBFE00000
#define PKMAP_BASE (PAGE_OFFSET - PMD_SIZE)