The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. ex. Some numerals are expressed as "XNUMX".
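GUINNESS (GUI based binarized neural network synthesizer) is an open-source tool flow for implementing binarized deep neural networks on FPGAs. It is GUI-based and covers both training on a GPU and inference on an FPGA. Because every operation is performed in the GUI, the software designer does not need to write scripts to define the network structure or the training behavior; specifying the hyperparameter values is enough. After training finishes, the tool automatically generates C++ code, from which the bitstream is synthesized using the Xilinx SDSoC system design tool flow. The tool flow therefore suits software programmers who are not familiar with FPGA design. In the tool flow, we modify the algorithms used for both training and inference on binarized CNN hardware. Because the hardware has limited bit precision, it lacks the minimal bias in training. In addition, the conventional batch normalization technique requires extra hardware for inference. Our modifications solve both problems. We implemented the VGG-11 benchmark CNN on the Digilent Inc. Zedboard. Compared with conventional binarized implementations on an FPGA, the classification accuracy was almost the same, while performance per power was 5.1 times better, performance per area was 8.0 times better, and performance per memory was 8.2 times better. We also compared the proposed FPGA design with CPU and GPU designs. Compared with the ARM Cortex-A57, it was 1776.3 times faster, dissipated 3.0 times less power, and achieved 5706.3 times better performance per power. Compared with the Maxwell GPU, it was 11.5 times faster, dissipated 7.3 times less power, and achieved 83.0 times better performance per power. The disadvantage of the FPGA-based design is that it requires additional time to synthesize the FPGA executable code. In our experiment, synthesis took three additional hours, and the total FPGA design took 75 hours. Because training the CNN dominates the total time, the synthesis overhead is comparatively minor.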
Hiroki NAKAHARA
Tokyo Institute of Technology
Haruyoshi YONEKAWA
Tokyo Institute of Technology
Tomoya FUJII
Tokyo Institute of Technology
Masayuki SHIMODA
Tokyo Institute of Technology
Shimpei SATO
Tokyo Institute of Technology
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Hiroki NAKAHARA, Haruyoshi YONEKAWA, Tomoya FUJII, Masayuki SHIMODA, Shimpei SATO, "GUINNESS: A GUI Based Binarized Deep Neural Network Framework for Software Programmers" in IEICE TRANSACTIONS on Information, vol. E102-D, no. 5, pp. 1003-1011, May 2019, doi: 10.1587/transinf.2018RCP0002.
Abstract: The GUINNESS (GUI based binarized neural network synthesizer) is an open-source tool flow for a binarized deep neural network toward FPGA implementation based on the GUI including both the training on the GPU and inference on the FPGA. Since all the operation is done on the GUI, the software designer is not necessary to write any scripts to design the neural network structure, training behavior, only specify the values for hyperparameters. After finishing the training, it automatically generates C++ codes to synthesis the bit-stream using the Xilinx SDSoC system design tool flow. Thus, our tool flow is suitable for the software programmers who are not familiar with the FPGA design. In our tool flow, we modify the training algorithms both the training and the inference for a binarized CNN hardware. Since the hardware has a limited number of bit precision, it lacks minimal bias in training. Also, for the inference on the hardware, the conventional batch normalization technique requires additional hardware. Our modifications solve these problems. We implemented the VGG-11 benchmark CNN on the Digilent Inc. Zedboard. Compared with the conventional binarized implementations on an FPGA, the classification accuracy was almost the same, the performance per power efficiency is 5.1 times better, as for the performance per area efficiency, it is 8.0 times better, and as for the performance per memory, it is 8.2 times better. We compare the proposed FPGA design with the CPU and the GPU designs. Compared with the ARM Cortex-A57, it was 1776.3 times faster, it dissipated 3.0 times lower power, and its performance per power efficiency was 5706.3 times better. Also, compared with the Maxwell GPU, it was 11.5 times faster, it dissipated 7.3 times lower power, and its performance per power efficiency was 83.0 times better. The disadvantage of our FPGA based design requires additional time to synthesize the FPGA executable codes. From the experiment, it consumed more three hours, and the total FPGA design took 75 hours. Since the training of the CNN is dominant, it is considerable.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2018RCP0002/_p
@ARTICLE{e102-d_5_1003,
author={Hiroki NAKAHARA and Haruyoshi YONEKAWA and Tomoya FUJII and Masayuki SHIMODA and Shimpei SATO},
journal={IEICE TRANSACTIONS on Information},
title={GUINNESS: A GUI Based Binarized Deep Neural Network Framework for Software Programmers},
year={2019},
volume={E102-D},
number={5},
pages={1003-1011},
abstract={The GUINNESS (GUI based binarized neural network synthesizer) is an open-source tool flow for a binarized deep neural network toward FPGA implementation based on the GUI including both the training on the GPU and inference on the FPGA. Since all the operation is done on the GUI, the software designer is not necessary to write any scripts to design the neural network structure, training behavior, only specify the values for hyperparameters. After finishing the training, it automatically generates C++ codes to synthesis the bit-stream using the Xilinx SDSoC system design tool flow. Thus, our tool flow is suitable for the software programmers who are not familiar with the FPGA design. In our tool flow, we modify the training algorithms both the training and the inference for a binarized CNN hardware. Since the hardware has a limited number of bit precision, it lacks minimal bias in training. Also, for the inference on the hardware, the conventional batch normalization technique requires additional hardware. Our modifications solve these problems. We implemented the VGG-11 benchmark CNN on the Digilent Inc. Zedboard. Compared with the conventional binarized implementations on an FPGA, the classification accuracy was almost the same, the performance per power efficiency is 5.1 times better, as for the performance per area efficiency, it is 8.0 times better, and as for the performance per memory, it is 8.2 times better. We compare the proposed FPGA design with the CPU and the GPU designs. Compared with the ARM Cortex-A57, it was 1776.3 times faster, it dissipated 3.0 times lower power, and its performance per power efficiency was 5706.3 times better. Also, compared with the Maxwell GPU, it was 11.5 times faster, it dissipated 7.3 times lower power, and its performance per power efficiency was 83.0 times better. The disadvantage of our FPGA based design requires additional time to synthesize the FPGA executable codes. From the experiment, it consumed more three hours, and the total FPGA design took 75 hours. Since the training of the CNN is dominant, it is considerable.},
keywords={},
doi={10.1587/transinf.2018RCP0002},
ISSN={1745-1361},
month={May},}
TY - JOUR
TI - GUINNESS: A GUI Based Binarized Deep Neural Network Framework for Software Programmers
T2 - IEICE TRANSACTIONS on Information
SP - 1003
EP - 1011
AU - Hiroki NAKAHARA
AU - Haruyoshi YONEKAWA
AU - Tomoya FUJII
AU - Masayuki SHIMODA
AU - Shimpei SATO
PY - 2019
DO - 10.1587/transinf.2018RCP0002
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E102-D
IS - 5
JA - IEICE TRANSACTIONS on Information
Y1 - May 2019
AB - The GUINNESS (GUI based binarized neural network synthesizer) is an open-source tool flow for a binarized deep neural network toward FPGA implementation based on the GUI including both the training on the GPU and inference on the FPGA. Since all the operation is done on the GUI, the software designer is not necessary to write any scripts to design the neural network structure, training behavior, only specify the values for hyperparameters. After finishing the training, it automatically generates C++ codes to synthesis the bit-stream using the Xilinx SDSoC system design tool flow. Thus, our tool flow is suitable for the software programmers who are not familiar with the FPGA design. In our tool flow, we modify the training algorithms both the training and the inference for a binarized CNN hardware. Since the hardware has a limited number of bit precision, it lacks minimal bias in training. Also, for the inference on the hardware, the conventional batch normalization technique requires additional hardware. Our modifications solve these problems. We implemented the VGG-11 benchmark CNN on the Digilent Inc. Zedboard. Compared with the conventional binarized implementations on an FPGA, the classification accuracy was almost the same, the performance per power efficiency is 5.1 times better, as for the performance per area efficiency, it is 8.0 times better, and as for the performance per memory, it is 8.2 times better. We compare the proposed FPGA design with the CPU and the GPU designs. Compared with the ARM Cortex-A57, it was 1776.3 times faster, it dissipated 3.0 times lower power, and its performance per power efficiency was 5706.3 times better. Also, compared with the Maxwell GPU, it was 11.5 times faster, it dissipated 7.3 times lower power, and its performance per power efficiency was 83.0 times better. The disadvantage of our FPGA based design requires additional time to synthesize the FPGA executable codes. From the experiment, it consumed more three hours, and the total FPGA design took 75 hours. Since the training of the CNN is dominant, it is considerable.
ER -