This article comes from HPMicro developer @Xusiwei1236 and describes how to run an edge AI framework on the HPM6750. Read on if you are interested.
--------------- The review follows ---------------
What is TFLM?
You have probably heard of TensorFlow, a machine learning library developed and open-sourced by Google that supports both model training and model inference.
The subject of this article, TFLM, stands for TensorFlow Lite for Microcontrollers. So what is TensorFlow Lite?
TensorFlow Lite (usually abbreviated TFLite) is a solution the TensorFlow team developed for deploying models to mobile devices. Put simply, it is the mobile version of TensorFlow. The TensorFlow website describes TFLite as follows:
"TensorFlow Lite is a set of tools that helps developers run models on mobile, embedded, and IoT devices to enable on-device machine learning."
TensorFlow Lite for Microcontrollers (TFLM), in turn, is the microcontroller version of TensorFlow Lite. The website describes it as follows:
"TensorFlow Lite for Microcontrollers (TFLM below) is an experimental port of TensorFlow Lite designed for microcontrollers and other devices with only kilobytes of memory. It runs directly on "bare metal" and requires no operating system support, no standard C/C++ libraries, and no dynamic memory allocation. The core runtime needs only 16 KB on a Cortex M3, and with enough operators to run a speech keyword detection model it takes only 22 KB."
The three share one lineage and all come from Google. TensorFlow supports both training and inference, whereas the other two support inference only. TFLite mainly targets mobile devices such as phones and tablets, while TFLM targets microcontrollers. Historically, both are offshoots of the TensorFlow project, and the three form a tree: TFLite branched off from TensorFlow, TFLM (tflite-micro) branched off from TFLite, and all three are now developed in parallel. For a long time the source code of all three was maintained in a single repository. In terms of directory nesting, TensorFlow contained the other two, and TFLite contained tflite-micro.
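TFLM in the HPM SDK
TFLM middleware
The HPM SDK integrates TFLM as middleware (similar to a library, but not built as a separate library), located in the hpm_sdk\middleware subdirectory.
The code in this subdirectory is trimmed down from the open-source TFLM project, with many unneeded files removed.
TFLM example
The HPM SDK also provides a TFLM example, located in the hpm_sdk\samples\tflm subdirectory.
The example code is adapted from the official person_detection example, with camera image capture and LCD result display added.
Since I do not have the matching camera and display on hand, this article does not use that example for the experiment.
Running the TFLM benchmark on the HPM6750
The following uses the person detection benchmark as an example to show how to run the TFLM benchmark on the HPM6750.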
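Adding the person detection benchmark source code to the HPM SDK environment
Follow these steps to add the person detection benchmark source files to the HPM SDK environment:
In the samples subdirectory of the HPM SDK, create a tflm_person_detect_benchmark directory, and create a src directory inside it;
From the tflite-micro directory described earlier, in which the person detection benchmark has already been run, copy the following files into the src directory: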
tensorflow\lite\micro\benchmarks\person_detection_benchmark.cc
tensorflow\lite\micro\benchmarks\micro_benchmark.h
tensorflow\lite\micro\examples\person_detection\model_settings.h
tensorflow\lite\micro\examples\person_detection\model_settings.cc
Create a testdata subdirectory under src, and copy all files from the following directory under tflite-micro into testdata:
tensorflow\lite\micro\tools\make\gen\linux_x86_64_default\genfiles\tensorflow\lite\micro\examples\person_detection\testdata
In person_detection_benchmark.cc, model_settings.cc, no_person_image_data.cc, and person_image_data.cc, change the file paths in some of the #include directives to match the relative paths after copying;
In person_detection_benchmark.cc, add a board_init(); line at the very start of the main function, and add #include "board.h" at the top of the file.
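The copy steps above can be scripted. The following is a minimal sketch, not part of the original write-up: the relative paths are the ones listed in the steps (written with forward slashes), while the function names and the two directory arguments are hypothetical. It only copies files; the #include edits and the board_init() change still have to be made by hand.

```python
from pathlib import Path
import shutil

# Files named in the steps above, relative to the tflite-micro checkout.
BENCHMARK_FILES = [
    "tensorflow/lite/micro/benchmarks/person_detection_benchmark.cc",
    "tensorflow/lite/micro/benchmarks/micro_benchmark.h",
    "tensorflow/lite/micro/examples/person_detection/model_settings.h",
    "tensorflow/lite/micro/examples/person_detection/model_settings.cc",
]

# Generated test data directory named in the steps above.
TESTDATA_DIR = (
    "tensorflow/lite/micro/tools/make/gen/linux_x86_64_default/genfiles/"
    "tensorflow/lite/micro/examples/person_detection/testdata"
)


def plan_copies(tflm_root, sample_dir):
    """Return (source, destination) pairs without writing anything."""
    tflm_root = Path(tflm_root)
    src_dir = Path(sample_dir) / "src"
    pairs = [(tflm_root / f, src_dir / Path(f).name) for f in BENCHMARK_FILES]
    testdata = tflm_root / TESTDATA_DIR
    # The generated directory only exists after the benchmark was built once.
    if testdata.is_dir():
        for f in sorted(testdata.iterdir()):
            if f.is_file():
                pairs.append((f, src_dir / "testdata" / f.name))
    return pairs


def copy_sources(tflm_root, sample_dir):
    """Copy every planned file, creating src/ and src/testdata/ as needed."""
    for src, dst in plan_copies(tflm_root, sample_dir):
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
```

Calling plan_copies first makes it possible to review the file list before copy_sources writes anything into the SDK tree.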
Adding the CMakeLists.txt and app.yaml files
Create a CMakeLists.txt file at the same level as src, with the following content:
cmake_minimum_required(VERSION 3.13)
set(CONFIG_TFLM 1)
find_package(hpm-sdk REQUIRED HINTS $ENV{HPM_SDK_BASE})
project(tflm_person_detect_benchmark)
set(CMAKE_CXX_STANDARD 11)
sdk_app_src(src/model_settings.cc)
sdk_app_src(src/person_detection_benchmark.cc)
sdk_app_src(src/testdata/no_person_image_data.cc)
sdk_app_src(src/testdata/person_image_data.cc)
sdk_app_inc(src)
sdk_ld_options("-lm")
sdk_ld_options("--std=c++11")
sdk_compile_definitions(__HPMICRO__)
sdk_compile_definitions(-DINIT_EXT_RAM_FOR_DATA=1)
# sdk_compile_options("-mabi=ilp32f")
# sdk_compile_options("-march=rv32imafc")
sdk_compile_options("-O2")
# sdk_compile_options("-O3")
set(SEGGER_LEVEL_O3 1)
generate_ses_project()
Create an app.yaml file at the same level as src, with the following content:
dependency:
- tflm
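Building and running the TFLM benchmark
Next comes the familiar part: build and run. First, generate the project with generate_project. Then connect the HPM6750 development board to the PC and open the newly generated project in Embedded Studio. Because this project pulls in the TFLM sources, it contains many files, so the Indexing step in the source navigator pane on the right takes a long time to finish.
After that, press F7 to build and F5 to debug the project.
Once the build finishes, first open a serial terminal and connect to the board's serial port at 115200 baud. After starting the debug session, simply continue execution, and the benchmark output appears in the serial terminal: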
==============================
hpm6750evkmini clock summary
==============================
cpu0: 816000000Hz
cpu1: 816000000Hz
axi0: 200000000Hz
axi1: 200000000Hz
axi2: 200000000Hz
ahb: 200000000Hz
mchtmr0: 24000000Hz
mchtmr1: 1000000Hz
xpi0: 133333333Hz
xpi1: 400000000Hz
dram: 166666666Hz
display: 74250000Hz
cam0: 59400000Hz
cam1: 59400000Hz
jpeg: 200000000Hz
pdma: 200000000Hz
==============================
----------------------------------------------------------------------
$$\ $$\ $$$$$$$\ $$\ $$\ $$\
$$ | $$ |$$ __$$\ $$$\ $$$ |\__|
$$ | $$ |$$ | $$ |$$$$\ $$$$ |$$\ $$$$$$$\ $$$$$$\ $$$$$$\
$$$$$$$$ |$$$$$$$ |$$\$$\$$ $$ |$$ |$$ _____|$$ __$$\ $$ __$$\
$$ __$$ |$$ ____/ $$ \$$$ $$ |$$ |$$ / $$ | \__|$$ / $$ |
$$ | $$ |$$ | $$ |\$ /$$ |$$ |$$ | $$ | $$ | $$ |
$$ | $$ |$$ | $$ | \_/ $$ |$$ |\$$$$$$$\ $$ | \$$$$$$ |
\__| \__|\__| \__| \__|\__| \_______|\__| \______/
----------------------------------------------------------------------
InitializeBenchmarkRunner took 114969 ticks (4 ms).
WithPersonDataIterations(1) took 10694521 ticks (445 ms)
DEPTHWISE_CONV_2D took 275798 ticks (11 ms).
DEPTHWISE_CONV_2D took 280579 ticks (11 ms).
CONV_2D took 516051 ticks (21 ms).
DEPTHWISE_CONV_2D took 139000 ticks (5 ms).
CONV_2D took 459646 ticks (19 ms).
DEPTHWISE_CONV_2D took 274903 ticks (11 ms).
CONV_2D took 868518 ticks (36 ms).
DEPTHWISE_CONV_2D took 68180 ticks (2 ms).
CONV_2D took 434392 ticks (18 ms).
DEPTHWISE_CONV_2D took 132918 ticks (5 ms).
CONV_2D took 843014 ticks (35 ms).
DEPTHWISE_CONV_2D took 33228 ticks (1 ms).
CONV_2D took 423288 ticks (17 ms).
DEPTHWISE_CONV_2D took 62040 ticks (2 ms).
CONV_2D took 833033 ticks (34 ms).
DEPTHWISE_CONV_2D took 62198 ticks (2 ms).
CONV_2D took 834644 ticks (34 ms).
DEPTHWISE_CONV_2D took 62176 ticks (2 ms).
CONV_2D took 838212 ticks (34 ms).
DEPTHWISE_CONV_2D took 62206 ticks (2 ms).
CONV_2D took 832857 ticks (34 ms).
DEPTHWISE_CONV_2D took 62194 ticks (2 ms).
CONV_2D took 832882 ticks (34 ms).
DEPTHWISE_CONV_2D took 16050 ticks (0 ms).
CONV_2D took 438774 ticks (18 ms).
DEPTHWISE_CONV_2D took 27494 ticks (1 ms).
CONV_2D took 974362 ticks (40 ms).
AVERAGE_POOL_2D took 2323 ticks (0 ms).
CONV_2D took 1128 ticks (0 ms).
RESHAPE took 184 ticks (0 ms).
SOFTMAX took 2249 ticks (0 ms).
NoPersonDataIterations(1) took 10694160 ticks (445 ms)
DEPTHWISE_CONV_2D took 274922 ticks (11 ms).
DEPTHWISE_CONV_2D took 281095 ticks (11 ms).
CONV_2D took 515380 ticks (21 ms).
DEPTHWISE_CONV_2D took 139428 ticks (5 ms).
CONV_2D took 460039 ticks (19 ms).
DEPTHWISE_CONV_2D took 275255 ticks (11 ms).
CONV_2D took 868787 ticks (36 ms).
DEPTHWISE_CONV_2D took 68384 ticks (2 ms).
CONV_2D took 434537 ticks (18 ms).
DEPTHWISE_CONV_2D took 133071 ticks (5 ms).
CONV_2D took 843202 ticks (35 ms).
DEPTHWISE_CONV_2D took 33291 ticks (1 ms).
CONV_2D took 423388 ticks (17 ms).
DEPTHWISE_CONV_2D took 62190 ticks (2 ms).
CONV_2D took 832978 ticks (34 ms).
DEPTHWISE_CONV_2D took 62205 ticks (2 ms).
CONV_2D took 834636 ticks (34 ms).
DEPTHWISE_CONV_2D took 62213 ticks (2 ms).
CONV_2D took 838212 ticks (34 ms).
DEPTHWISE_CONV_2D took 62239 ticks (2 ms).
CONV_2D took 832850 ticks (34 ms).
DEPTHWISE_CONV_2D took 62217 ticks (2 ms).
CONV_2D took 832856 ticks (34 ms).
DEPTHWISE_CONV_2D took 16040 ticks (0 ms).
CONV_2D took 438779 ticks (18 ms).
DEPTHWISE_CONV_2D took 27481 ticks (1 ms).
CONV_2D took 974354 ticks (40 ms).
AVERAGE_POOL_2D took 1812 ticks (0 ms).
CONV_2D took 1077 ticks (0 ms).
RESHAPE took 341 ticks (0 ms).
SOFTMAX took 901 ticks (0 ms).
WithPersonDataIterations(10) took 106960312 ticks (4456 ms)
NoPersonDataIterations(10) took 106964554 ticks (4456 ms)
As the output shows, on the HPM6750EVKMINI development board, 10 consecutive runs of the person detection model take 4456 ms in total, an average of 445.6 ms per run.
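The tick counts in the log can be checked against the millisecond figures. The sketch below assumes the benchmark timer ticks at the 24 MHz mchtmr0 rate printed in the clock summary; the article does not state this, but it is consistent with the logged values (10694521 ticks is reported as 445 ms). The function names are hypothetical, and the parser only handles lines of the form shown in the log.

```python
import re
from collections import defaultdict

# Assumption: one tick is one cycle of mchtmr0 (24 MHz in the clock summary).
TICKS_PER_SECOND = 24_000_000

# Matches e.g. "CONV_2D took 516051 ticks (21 ms)." and
# "WithPersonDataIterations(1) took 10694521 ticks (445 ms)".
LINE_RE = re.compile(r"^(\w+(?:\(\d+\))?) took (\d+) ticks \((\d+) ms\)")


def ticks_to_ms(ticks, ticks_per_second=TICKS_PER_SECOND):
    """Convert a tick count to milliseconds as a float."""
    return ticks * 1000 / ticks_per_second


def ticks_by_name(log_text):
    """Sum the tick counts of all matching log lines, grouped by name."""
    totals = defaultdict(int)
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            totals[match.group(1)] += int(match.group(2))
    return dict(totals)


if __name__ == "__main__":
    sample = """CONV_2D took 516051 ticks (21 ms).
DEPTHWISE_CONV_2D took 139000 ticks (5 ms).
CONV_2D took 459646 ticks (19 ms)."""
    for name, ticks in ticks_by_name(sample).items():
        print(f"{name}: {ticks} ticks, {ticks_to_ms(ticks):.2f} ms")
```

Summing ticks instead of the printed millisecond values avoids the rounding in the log, where each entry appears to be truncated to a whole millisecond.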
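Running the TFLM benchmark on the Raspberry Pi 3B+
Running the TFLM benchmark on the Raspberry Pi
On the Raspberry Pi 3B+, as on the PC, you can run the PC-side test command directly to obtain the benchmark results.
On the Raspberry Pi 3B+, for the image that contains a person, 10 consecutive runs of the person detection model take 4186 ms in total, an average of 418.6 ms per run. For the image without a person, 10 consecutive runs take 4190 ms, an average of 419 ms per run.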
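Comparison of benchmark results on the HPM6750, Raspberry Pi 3B+, and AMD R7 4800H
This section compares the average time taken to run the person detection model on the HPM6750EVKMINI development board, the Raspberry Pi 3B+, and the AMD R7 4800H.
In this TFLM person detection workload, the HPM6750EVKMINI and the Raspberry Pi 3B+ perform comparably. Although the HPM6750's 816 MHz CPU clock is lower than the 1.4 GHz of the BCM2837 Cortex-A53 on the Raspberry Pi 3B+, its single-core compute performance is not far behind.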
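One rough way to read these numbers is to normalize the average time by the CPU clock. The sketch below uses only the figures quoted in this article (445.6 ms at 816 MHz and 418.6 ms at 1.4 GHz); the function name is hypothetical, and the calculation ignores differences in instruction set, memory system, compiler settings, and operating system overhead, so it is an illustration and not a measurement.

```python
def cycles_per_inference(avg_ms, cpu_hz):
    """Estimate the CPU cycles spent on one inference from time and clock."""
    return avg_ms / 1000 * cpu_hz


if __name__ == "__main__":
    # Figures taken from the benchmark results quoted in the article.
    hpm6750 = cycles_per_inference(445.6, 816_000_000)
    rpi3b_plus = cycles_per_inference(418.6, 1_400_000_000)

    print(f"HPM6750:          {hpm6750 / 1e6:.1f} M cycles per inference")
    print(f"Raspberry Pi 3B+: {rpi3b_plus / 1e6:.1f} M cycles per inference")
    print(f"Time ratio (HPM6750 / Pi):  {445.6 / 418.6:.3f}")
    print(f"Cycle ratio (Pi / HPM6750): {rpi3b_plus / hpm6750:.2f}")
```

Under these assumptions the two boards differ by only a few percent in wall-clock time, while the estimated cycle count per inference comes out lower on the HPM6750, which is what the lower clock with a similar run time implies.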
Note that the TFLM benchmark on the Raspberry Pi 3B+ runs on a 64-bit Debian Linux distribution, whereas the test program on the HPM6750 runs directly on bare metal. The task scheduler in the operating system kernel takes away some of the CPU's compute capacity. This is therefore not a strictly controlled comparison, and the results are for reference only.
(Reference link for this article: http://m.eeworld.com.cn/bbs_thread-1208270-1-1.html)