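This paper proposes and examines three in-memory shuffling methods designed to address problems in MapReduce shuffling caused by skewed data. Coupled Shuffle Architecture (CSA) employs a single pairwise all-to-all exchange to shuffle both blocks, which are the units of shuffle transfer, and meta-blocks, which contain the metadata of the corresponding blocks. Decoupled Shuffle Architecture (DSA) separates the shuffling of meta-blocks from the shuffling of blocks and applies a different all-to-all exchange algorithm to each, attempting to mitigate the impact of stragglers in strongly skewed distributions. Decoupled Shuffle Architecture with Skew-Aware Meta-Shuffle (DSA w/ SMS) autonomously determines the proper placement of blocks based on the memory consumption of each worker process. This approach targets extremely skewed situations in which some worker processes could exceed their node's memory limit. This study evaluates implementations of the three shuffling methods in our prototype in-memory MapReduce engine, which employs high-performance interconnects such as InfiniBand and Intel Omni-Path. Our results suggest that DSA w/ SMS is the only viable solution for extremely skewed data distributions. We also present a detailed investigation of the performance of CSA and DSA in various skew situations.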
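The pairwise all-to-all exchange named in the abstract can be illustrated with a small simulation. This is a minimal sketch of the textbook pairwise schedule (at step k, rank r sends to rank (r + k) mod p), not the authors' implementation. The function names `pairwise_schedule`, `all_to_all`, and `step_costs` are hypothetical, and the cost model (each step takes as long as its largest transfer) is a simplifying assumption used only to show why one oversized block can stall a whole step.

```python
from typing import Any, Dict, List, Tuple


def pairwise_schedule(num_workers: int) -> List[List[Tuple[int, int]]]:
    """Return, per step, the (sender, receiver) pairs of a pairwise exchange.

    Textbook schedule: at step k (1 <= k < p), rank r sends to (r + k) mod p.
    """
    return [
        [(rank, (rank + k) % num_workers) for rank in range(num_workers)]
        for k in range(1, num_workers)
    ]


def all_to_all(outboxes: List[List[Any]]) -> List[Dict[int, Any]]:
    """Simulate the exchange in a single process.

    outboxes[src][dst] is the payload that src holds for dst.
    The result inboxes[dst][src] is the payload dst received from src.
    """
    n = len(outboxes)
    inboxes: List[Dict[int, Any]] = [dict() for _ in range(n)]
    for rank in range(n):
        # A worker keeps the data destined for itself without communicating.
        inboxes[rank][rank] = outboxes[rank][rank]
    for step in pairwise_schedule(n):
        for src, dst in step:
            inboxes[dst][src] = outboxes[src][dst]
    return inboxes


def step_costs(sizes: List[List[int]]) -> List[int]:
    """Assumed cost model: a step lasts as long as its largest transfer."""
    return [
        max(sizes[src][dst] for src, dst in step)
        for step in pairwise_schedule(len(sizes))
    ]
```

Under this assumed model, a single skewed transfer determines the duration of the step it belongs to while the other workers in that step wait, which is the straggler effect the abstract refers to.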
Harunobu DAIKOKU
University of Tsukuba
Hideyuki KAWASHIMA
Keio University
Osamu TATEBE
University of Tsukuba
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Harunobu DAIKOKU, Hideyuki KAWASHIMA, Osamu TATEBE, "Skew-Aware Collective Communication for MapReduce Shuffling" in IEICE TRANSACTIONS on Information, vol. E102-D, no. 12, pp. 2389-2399, December 2019, doi: 10.1587/transinf.2019PAP0019.
Abstract: This paper proposes and examines the three in-memory shuffling methods designed to address problems in MapReduce shuffling caused by skewed data. Coupled Shuffle Architecture (CSA) employs a single pairwise all-to-all exchange to shuffle both blocks, units of shuffle transfer, and meta-blocks, which contain the metadata of corresponding blocks. Decoupled Shuffle Architecture (DSA) separates the shuffling of meta-blocks and blocks, and applies different all-to-all exchange algorithms to each shuffling process, attempting to mitigate the impact of stragglers in strongly skewed distributions. Decoupled Shuffle Architecture with Skew-Aware Meta-Shuffle (DSA w/ SMS) autonomously determines the proper placement of blocks based on the memory consumption of each worker process. This approach targets extremely skewed situations where some worker processes could exceed their node memory limitation. This study evaluates implementations of the three shuffling methods in our prototype in-memory MapReduce engine, which employs high performance interconnects such as InfiniBand and Intel Omni-Path. Our results suggest that DSA w/ SMS is the only viable solution for extremely skewed data distributions. We also present a detailed investigation of the performance of CSA and DSA in various skew situations.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2019PAP0019/_p
@ARTICLE{e102-d_12_2389,
author={Harunobu DAIKOKU and Hideyuki KAWASHIMA and Osamu TATEBE},
journal={IEICE TRANSACTIONS on Information},
title={Skew-Aware Collective Communication for MapReduce Shuffling},
year={2019},
volume={E102-D},
number={12},
pages={2389-2399},
abstract={This paper proposes and examines the three in-memory shuffling methods designed to address problems in MapReduce shuffling caused by skewed data. Coupled Shuffle Architecture (CSA) employs a single pairwise all-to-all exchange to shuffle both blocks, units of shuffle transfer, and meta-blocks, which contain the metadata of corresponding blocks. Decoupled Shuffle Architecture (DSA) separates the shuffling of meta-blocks and blocks, and applies different all-to-all exchange algorithms to each shuffling process, attempting to mitigate the impact of stragglers in strongly skewed distributions. Decoupled Shuffle Architecture with Skew-Aware Meta-Shuffle (DSA w/ SMS) autonomously determines the proper placement of blocks based on the memory consumption of each worker process. This approach targets extremely skewed situations where some worker processes could exceed their node memory limitation. This study evaluates implementations of the three shuffling methods in our prototype in-memory MapReduce engine, which employs high performance interconnects such as InfiniBand and Intel Omni-Path. Our results suggest that DSA w/ SMS is the only viable solution for extremely skewed data distributions. We also present a detailed investigation of the performance of CSA and DSA in various skew situations.},
keywords={},
doi={10.1587/transinf.2019PAP0019},
ISSN={1745-1361},
month={December}
}
TY - JOUR
TI - Skew-Aware Collective Communication for MapReduce Shuffling
T2 - IEICE TRANSACTIONS on Information
SP - 2389
EP - 2399
AU - Harunobu DAIKOKU
AU - Hideyuki KAWASHIMA
AU - Osamu TATEBE
PY - 2019
DO - 10.1587/transinf.2019PAP0019
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E102-D
IS - 12
JA - IEICE TRANSACTIONS on Information
Y1 - December 2019
AB - This paper proposes and examines the three in-memory shuffling methods designed to address problems in MapReduce shuffling caused by skewed data. Coupled Shuffle Architecture (CSA) employs a single pairwise all-to-all exchange to shuffle both blocks, units of shuffle transfer, and meta-blocks, which contain the metadata of corresponding blocks. Decoupled Shuffle Architecture (DSA) separates the shuffling of meta-blocks and blocks, and applies different all-to-all exchange algorithms to each shuffling process, attempting to mitigate the impact of stragglers in strongly skewed distributions. Decoupled Shuffle Architecture with Skew-Aware Meta-Shuffle (DSA w/ SMS) autonomously determines the proper placement of blocks based on the memory consumption of each worker process. This approach targets extremely skewed situations where some worker processes could exceed their node memory limitation. This study evaluates implementations of the three shuffling methods in our prototype in-memory MapReduce engine, which employs high performance interconnects such as InfiniBand and Intel Omni-Path. Our results suggest that DSA w/ SMS is the only viable solution for extremely skewed data distributions. We also present a detailed investigation of the performance of CSA and DSA in various skew situations.
ER -