
Research on depth-separable convolution accelerators based on FPGA
Affiliation: School of Instrument and Electronics, North University of China
    Abstract:

    A low-power depthwise separable convolution accelerator core based on FPGA is designed. Exploiting the commonality between pointwise (PW) and depthwise (DW) convolution, a fixed multiplier array computes both types of convolution by changing how feature and weight data are streamed into it, which maximizes DSP utilization. To address possible sign-bit overflow in 8-bit asymmetric quantization, the dual-multiplier structure is repackaged so that the sign bit is processed separately. A 7-stage intra-layer pipeline maintains the degree of data-processing parallelism in every cycle. The accelerator was deployed on a Zynq UltraScale+ series FPGA. Experiments show that the proposed structure increases network inference speed while reducing dependence on on-chip resources and overall power consumption: unmodified MobileNetV2 reaches an average throughput of 130.6 GOPS on the accelerator at an overall power of only 4.1 W, which meets the requirements of real-time edge computing. Compared with other hardware platforms, energy efficiency improves markedly; compared with accelerators of the same type on FPGA, the design has advantages in performance density (GOPS/LUT), power efficiency (GOPS/W), and DSP efficiency (GOPS/DSP).
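
    The commonality between DW and PW convolution that the abstract mentions can be illustrated with a minimal software sketch. It is not the paper's hardware design: the function names (`mac`, `depthwise_conv`, `pointwise_conv`), the 3x3 kernel size, stride 1, and the absence of padding are all assumptions made for illustration. Both layers call the same multiply-accumulate routine, and only the way features and weights are gathered into it differs.

```python
def mac(features, weights):
    """Multiply-accumulate over two equal-length sequences.

    Stands in for one pass through a fixed multiplier array (hypothetical model).
    """
    assert len(features) == len(weights)
    return sum(f * w for f, w in zip(features, weights))


def depthwise_conv(x, k):
    """Depthwise 3x3 convolution, stride 1, no padding (assumed settings).

    x: features indexed [channel][row][col]
    k: kernels indexed [channel][3][3], one kernel per channel
    """
    height, width = len(x[0]), len(x[0][0])
    out = []
    for c in range(len(x)):
        kernel = [k[c][i][j] for i in range(3) for j in range(3)]
        plane = []
        for r in range(height - 2):
            row = []
            for col in range(width - 2):
                # DW data flow: 9 values from ONE channel's spatial window
                window = [x[c][r + i][col + j] for i in range(3) for j in range(3)]
                row.append(mac(window, kernel))
            plane.append(row)
        out.append(plane)
    return out


def pointwise_conv(x, w):
    """Pointwise (1x1) convolution.

    x: features indexed [channel][row][col]
    w: weights indexed [out_channel][in_channel]
    """
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    out = []
    for m in range(len(w)):
        plane = []
        for r in range(height):
            # PW data flow: one value from EVERY channel at the same pixel
            plane.append([mac([x[c][r][col] for c in range(channels)], w[m])
                          for col in range(width)])
        out.append(plane)
    return out


# Tiny example: 2 channels of 3x3 features
x = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
     [[1, 1, 1], [1, 1, 1], [1, 1, 1]]]
k = [[[1, 1, 1]] * 3, [[2, 2, 2]] * 3]
dw = depthwise_conv(x, k)                       # [[[45]], [[18]]]
pw = pointwise_conv(dw, [[1, 1], [1, -1]])      # [[[63]], [[27]]]
```

    In this sketch the arithmetic unit never changes between the two layers; only the indexing that feeds it does, which is the property a fixed multiplier array with switchable input data flow would rely on.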

Cite this article

画芊昊, 李博, 杜宸罡. Research on depth-separable convolution accelerators based on FPGA [J]. Computer Measurement & Control, 2024, 32(5): 267-273.

History
  • Received: 2023-12-21
  • Last revised: 2024-01-08
  • Accepted: 2024-01-10
  • Published online: 2024-05-22