Recurrent neural networks (RNNs) have been proven effective for sequence-based tasks thanks to their capability to process temporal information. In real-world systems, deep RNNs are more widely used to solve complicated tasks such as large-scale speech recognition and machine translation. However, the implementation of deep RNNs on traditional hardware platforms is inefficient due to long-range temporal dependence and irregular computation patterns within RNNs. This inefficiency manifests itself in the proportional increase in the latency of RNN inference with respect to the number of layers of deep RNNs on CPUs and GPUs. Previous work has focused mostly on optimizing and accelerating individual RNN cells. To make deep RNN inference fast and efficient, we propose an accelerator based on a multi-FPGA platform called Flow-in-Cloud (FiC). In this work, we show that the parallelism provided by the multi-FPGA system can be taken advantage of to scale up the inference of deep RNNs, by partitioning a large model onto several FPGAs, so that the latency stays close to constant with respect to an increasing number of RNN layers. For single-layer and four-layer RNNs, our implementation achieves 31x and 61x speedup compared with an Intel CPU.
Yuxi SUN
Keio University
Hideharu AMANO
Keio University
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Yuxi SUN, Hideharu AMANO, "FiC-RNN: A Multi-FPGA Acceleration Framework for Deep Recurrent Neural Networks" in IEICE TRANSACTIONS on Information,
vol. E103-D, no. 12, pp. 2457-2462, December 2020, doi: 10.1587/transinf.2020PAP0003.
Abstract: Recurrent neural networks (RNNs) have been proven effective for sequence-based tasks thanks to their capability to process temporal information. In real-world systems, deep RNNs are more widely used to solve complicated tasks such as large-scale speech recognition and machine translation. However, the implementation of deep RNNs on traditional hardware platforms is inefficient due to long-range temporal dependence and irregular computation patterns within RNNs. This inefficiency manifests itself in the proportional increase in the latency of RNN inference with respect to the number of layers of deep RNNs on CPUs and GPUs. Previous work has focused mostly on optimizing and accelerating individual RNN cells. To make deep RNN inference fast and efficient, we propose an accelerator based on a multi-FPGA platform called Flow-in-Cloud (FiC). In this work, we show that the parallelism provided by the multi-FPGA system can be taken advantage of to scale up the inference of deep RNNs, by partitioning a large model onto several FPGAs, so that the latency stays close to constant with respect to increasing number of RNN layers. For single-layer and four-layer RNNs, our implementation achieves 31x and 61x speedup compared with an Intel CPU.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2020PAP0003/_p
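The abstract's key claim is that partitioning a deep RNN layer-wise across FPGAs keeps inference latency close to constant as layers are added. A minimal back-of-the-envelope sketch of why that holds is below, assuming one RNN layer per FPGA, a fixed per-timestep cell latency, and perfect overlap between layers; the function names, constants, and pipelining model are illustrative assumptions, not figures or code from the paper.

```python
# Illustrative latency model for layer-pipelined deep RNN inference.
# Assumptions (not from the paper): one RNN layer mapped to one FPGA,
# a fixed per-timestep cell latency t_cell, and perfect overlap
# between consecutive layers.

def single_device_latency(num_layers: int, seq_len: int, t_cell: float) -> float:
    """All layers on one device: every layer processes every timestep in turn."""
    return num_layers * seq_len * t_cell

def pipelined_latency(num_layers: int, seq_len: int, t_cell: float) -> float:
    """One layer per FPGA: layer l starts timestep t as soon as layer l-1
    finishes it, so the total is (seq_len + num_layers - 1) pipeline stages."""
    return (seq_len + num_layers - 1) * t_cell

if __name__ == "__main__":
    T, t_cell = 100, 1.0  # hypothetical sequence length and per-timestep cost
    for L in (1, 2, 4, 8):
        print(f"{L} layers: single-device {single_device_latency(L, T, t_cell):6.0f}  "
              f"pipelined {pipelined_latency(L, T, t_cell):6.0f}")
```

Under these assumptions the single-device latency grows linearly with the number of layers, while the pipelined latency grows only by one extra timestep per added layer, which is the near-constant scaling behavior the abstract describes.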
@ARTICLE{e103-d_12_2457,
author={Yuxi SUN and Hideharu AMANO},
journal={IEICE TRANSACTIONS on Information},
title={FiC-RNN: A Multi-FPGA Acceleration Framework for Deep Recurrent Neural Networks},
year={2020},
volume={E103-D},
number={12},
pages={2457-2462},
abstract={Recurrent neural networks (RNNs) have been proven effective for sequence-based tasks thanks to their capability to process temporal information. In real-world systems, deep RNNs are more widely used to solve complicated tasks such as large-scale speech recognition and machine translation. However, the implementation of deep RNNs on traditional hardware platforms is inefficient due to long-range temporal dependence and irregular computation patterns within RNNs. This inefficiency manifests itself in the proportional increase in the latency of RNN inference with respect to the number of layers of deep RNNs on CPUs and GPUs. Previous work has focused mostly on optimizing and accelerating individual RNN cells. To make deep RNN inference fast and efficient, we propose an accelerator based on a multi-FPGA platform called Flow-in-Cloud (FiC). In this work, we show that the parallelism provided by the multi-FPGA system can be taken advantage of to scale up the inference of deep RNNs, by partitioning a large model onto several FPGAs, so that the latency stays close to constant with respect to increasing number of RNN layers. For single-layer and four-layer RNNs, our implementation achieves 31x and 61x speedup compared with an Intel CPU.},
keywords={},
doi={10.1587/transinf.2020PAP0003},
ISSN={1745-1361},
month={December},}
TY - JOUR
TI - FiC-RNN: A Multi-FPGA Acceleration Framework for Deep Recurrent Neural Networks
T2 - IEICE TRANSACTIONS on Information
SP - 2457
EP - 2462
AU - Yuxi SUN
AU - Hideharu AMANO
PY - 2020
DO - 10.1587/transinf.2020PAP0003
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E103-D
IS - 12
JA - IEICE TRANSACTIONS on Information
Y1 - December 2020
AB - Recurrent neural networks (RNNs) have been proven effective for sequence-based tasks thanks to their capability to process temporal information. In real-world systems, deep RNNs are more widely used to solve complicated tasks such as large-scale speech recognition and machine translation. However, the implementation of deep RNNs on traditional hardware platforms is inefficient due to long-range temporal dependence and irregular computation patterns within RNNs. This inefficiency manifests itself in the proportional increase in the latency of RNN inference with respect to the number of layers of deep RNNs on CPUs and GPUs. Previous work has focused mostly on optimizing and accelerating individual RNN cells. To make deep RNN inference fast and efficient, we propose an accelerator based on a multi-FPGA platform called Flow-in-Cloud (FiC). In this work, we show that the parallelism provided by the multi-FPGA system can be taken advantage of to scale up the inference of deep RNNs, by partitioning a large model onto several FPGAs, so that the latency stays close to constant with respect to increasing number of RNN layers. For single-layer and four-layer RNNs, our implementation achieves 31x and 61x speedup compared with an Intel CPU.
ER -