The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. ex. Some numerals are expressed as "XNUMX".
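This paper presents a multimedia architecture extension design for a 200-MHz, 1.6-GOPS embedded RISC processor. The processor's datapath architecture is designed to execute data transfers in parallel with SIMD (single instruction stream, multiple data stream) arithmetic operations. Four SIMD parallel 16-bit MAC (multiply-accumulate) instructions are introduced with a symmetric rounding scheme that maximizes the accuracy of 16-bit accumulation. This parallel 16-bit MAC on a 64-bit datapath is shown to be used efficiently in DSP applications such as correlation and matrix-vector multiplication on the multimedia RISC processor. Using the parallel MAC instruction with the symmetric rounding scheme, a 2D-IDCT that satisfies IEEE 1180 can be implemented in 202 cycles.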
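The idea of a parallel 16-bit MAC with symmetric rounding can be illustrated with a small software model. The sketch below is not the processor's actual instruction semantics, which the abstract does not specify. It assumes four 16-bit lanes packed into the 64-bit datapath (64/16), interprets "symmetric rounding" as rounding half away from zero, and assumes Q15 scaling and 16-bit saturation. The names `round_symmetric`, `saturate16`, and `simd_mac16` are hypothetical.

```python
LANES = 4        # assumption: four 16-bit lanes in one 64-bit word (64 / 16)
FRAC_BITS = 15   # assumption: Q15 fixed-point scaling, not stated in the abstract


def round_symmetric(value: int, shift: int) -> int:
    """Right-shift by `shift` bits, rounding half away from zero.

    Positive and negative inputs of equal magnitude round to results of
    equal magnitude, so the rounding error has no bias toward +infinity.
    """
    half = 1 << (shift - 1)
    if value >= 0:
        return (value + half) >> shift
    return -((-value + half) >> shift)


def saturate16(value: int) -> int:
    """Clamp to the signed 16-bit range (saturation is an assumption)."""
    return max(-32768, min(32767, value))


def simd_mac16(acc, a, b, shift=FRAC_BITS):
    """Per-lane multiply-accumulate: acc[i] + round(a[i] * b[i] / 2**shift)."""
    assert len(acc) == len(a) == len(b) == LANES
    return [
        saturate16(c + round_symmetric(x * y, shift))
        for c, x, y in zip(acc, a, b)
    ]


if __name__ == "__main__":
    # 0.5 * 0.5 in Q15, its negation, and two half-way cases (+1.5 and -1.5).
    result = simd_mac16([0, 0, 0, 0], [16384, -16384, 3, -3], [16384] * 4)
    print(result)  # [8192, -8192, 2, -2]
```

For comparison, a plain "add half and shift" rounds -1.5 to -1 while rounding +1.5 to 2. That asymmetry accumulates as a bias over many MAC steps, which is the kind of error that accuracy tests for IDCT implementations are sensitive to.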
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Ichiro KURODA, Kouhei NADEHARA, "A Multimedia Architecture Extension for an Embedded RISC Processor," in IEICE TRANSACTIONS on Fundamentals, vol. E84-A, no. 9, pp. 2255-2260, September 2001.
Abstract: This paper presents a multimedia architecture extension design for a 200-MHz, 1.6-GOPS embedded RISC processor. The datapath architecture of the processor which realizes parallel execution of data transfer and SIMD (single instruction stream multiple data stream) parallel arithmetic operations is designed. Four SIMD parallel 16-bit MAC (multiply-accumulation) instructions are introduced with a symmetric rounding scheme which maximizes the accuracy of the 16-bit accumulation. This parallel 16-bit MAC on a 64-bit datapath is shown to be efficiently utilized for DSP applications such as the correlation and the matrix-vector multiplications in the multimedia RISC processor. By using the parallel MAC instruction with the symmetric rounding scheme, a 2D-IDCT which satisfies the IEEE1180 can be implemented in 202 cycles.
URL: https://global.ieice.org/en_transactions/fundamentals/10.1587/e84-a_9_2255/_p
@ARTICLE{e84-a_9_2255,
author={Ichiro KURODA and Kouhei NADEHARA},
journal={IEICE TRANSACTIONS on Fundamentals},
title={A Multimedia Architecture Extension for an Embedded RISC Processor},
year={2001},
volume={E84-A},
number={9},
pages={2255-2260},
abstract={This paper presents a multimedia architecture extension design for a 200-MHz, 1.6-GOPS embedded RISC processor. The datapath architecture of the processor which realizes parallel execution of data transfer and SIMD (single instruction stream multiple data stream) parallel arithmetic operations is designed. Four SIMD parallel 16-bit MAC (multiply-accumulation) instructions are introduced with a symmetric rounding scheme which maximizes the accuracy of the 16-bit accumulation. This parallel 16-bit MAC on a 64-bit datapath is shown to be efficiently utilized for DSP applications such as the correlation and the matrix-vector multiplications in the multimedia RISC processor. By using the parallel MAC instruction with the symmetric rounding scheme, a 2D-IDCT which satisfies the IEEE1180 can be implemented in 202 cycles.},
keywords={},
doi={},
ISSN={},
month={September},}
TY - JOUR
TI - A Multimedia Architecture Extension for an Embedded RISC Processor
T2 - IEICE TRANSACTIONS on Fundamentals
SP - 2255
EP - 2260
AU - Ichiro KURODA
AU - Kouhei NADEHARA
PY - 2001
DO -
JO - IEICE TRANSACTIONS on Fundamentals
SN -
VL - E84-A
IS - 9
JA - IEICE TRANSACTIONS on Fundamentals
Y1 - September 2001
AB - This paper presents a multimedia architecture extension design for a 200-MHz, 1.6-GOPS embedded RISC processor. The datapath architecture of the processor which realizes parallel execution of data transfer and SIMD (single instruction stream multiple data stream) parallel arithmetic operations is designed. Four SIMD parallel 16-bit MAC (multiply-accumulation) instructions are introduced with a symmetric rounding scheme which maximizes the accuracy of the 16-bit accumulation. This parallel 16-bit MAC on a 64-bit datapath is shown to be efficiently utilized for DSP applications such as the correlation and the matrix-vector multiplications in the multimedia RISC processor. By using the parallel MAC instruction with the symmetric rounding scheme, a 2D-IDCT which satisfies the IEEE1180 can be implemented in 202 cycles.
ER -