The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. ex. Some numerals are expressed as "XNUMX".
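We developed a PC cluster system consisting of 100 PCs as a test bed for massively parallel query processing. Each PC uses a 200 MHz Pentium Pro CPU and is connected to the others through an ATM switch. Because query processing applications are insensitive to communication latency and mainly perform integer operations, an ATM-connected PC cluster can be considered a reasonable low-cost solution for high-performance database servers. As far as the authors know, however, no one had previously attempted to build a large-scale PC cluster for database applications. Although we used commodity components as much as possible, we developed the DBMS ourselves, because it is the key component for achieving high performance in parallel query processing and no existing system appeared to meet our requirements. On each PC node, a server program acting as a database kernel runs and processes queries in cooperation with the other nodes. The kernel was designed to execute pipelined operators and to handle large volumes of data efficiently, so as to achieve high performance on complex decision-support queries. To verify the feasibility of our approach, we ran the standard TPC-D benchmark on a 100 GB database and compared our system with commercial parallel systems. Overall, our system performed well enough to be competitive with the current TPC-D top records, even though it used no indices. For some heavy queries in the benchmark, those with high selectivity and joinability, our system performed much better. In addition, we applied transposed file organization to the database for further performance improvement. Transposed file organization vertically partitions the tuples, enabling attribute-by-attribute access to the relations. This significantly improved performance by reducing the amount of disk I/O and shifting the bottleneck to computation.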
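The effect of transposed file organization on disk I/O can be illustrated with a small sketch. This is not the authors' implementation: the schema, attribute names, and byte widths below are hypothetical, and the model assumes fixed-width values and full scans without indices. It only shows why reading attribute by attribute can cut the bytes scanned when a query touches a few attributes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    name: str
    width: int  # bytes per value (hypothetical fixed-width encoding)


# Hypothetical relation schema; names and widths are illustrative only.
SCHEMA = [
    Attribute("order_key", 4),
    Attribute("quantity", 4),
    Attribute("price", 8),
    Attribute("comment", 44),
]


def row_store_bytes(schema, num_tuples):
    """Bytes scanned when whole tuples are stored together (all attributes are read)."""
    return num_tuples * sum(a.width for a in schema)


def transposed_bytes(schema, num_tuples, needed):
    """Bytes scanned when each attribute is stored separately and only the needed ones are read."""
    widths = {a.name: a.width for a in schema}
    return num_tuples * sum(widths[name] for name in needed)


def transpose(rows, schema):
    """Vertically partition row tuples into one list of values per attribute."""
    return {a.name: [row[i] for row in rows] for i, a in enumerate(schema)}


if __name__ == "__main__":
    n = 1000
    print(row_store_bytes(SCHEMA, n))
    print(transposed_bytes(SCHEMA, n, ["quantity", "price"]))
```

In this hypothetical schema a tuple is 60 bytes wide, and a query that needs only quantity and price reads 12 of them, so the scan volume drops to one fifth. With less data to read, the time spent computing on each value becomes a larger share of the total, which is consistent with the bottleneck shift described above.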
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Takayuki TAMURA, Masato OGUCHI, Masaru KITSUREGAWA, "High Performance Parallel Query Processing on a 100 Node ATM Connected PC Cluster," in IEICE TRANSACTIONS on Information, vol. E82-D, no. 1, pp. 54-63, January 1999.
Abstract: We developed a PC cluster system which consists of 100 PCs as a test bed for massively parallel query processing. Each PC employs the 200 MHz Pentium Pro CPU and is connected with others through an ATM switch. Because the query processing applications are insensitive to the communication latency and mainly perform integer operations, the ATM connected PC cluster approach can be considered a reasonable solution for high performance database servers with low costs. However, there has been no challenge to construct large scale PC clusters for database applications, as far as the authors know. Though we employed commodity components as much as possible, we developed the DBMS itself, because that was a key component for obtaining high performance in parallel query processing, and there seemed no system which could meet our demand. On each PC node, a server program which acts as a database kernel is running to process the queries in cooperation with other nodes. The kernel was designed to execute pipelined operators and handle voluminous data efficiently, to achieve high performance on complex decision support type queries. We used the standard benchmark, TPC-D, on a 100 GB database to verify the feasibility of our approach, through comparison of our system with commercial parallel systems. As a whole, our system exhibited sufficiently high performance which was competitive with the current TPC-D top records, in spite of not using indices. For some heavy queries in the benchmark, which have high selectivity and joinability, our system performed much better. In addition, we applied transposed file organization to the database for further performance improvement. The transposed file organization vertically partitions the tuples, enabling attribute-by-attribute access to the relations. This resulted in significant performance improvement by reducing the amount of disk I/O and shifting the bottleneck to computation.
URL: https://global.ieice.org/en_transactions/information/10.1587/e82-d_1_54/_p
@ARTICLE{e82-d_1_54,
author={Takayuki TAMURA and Masato OGUCHI and Masaru KITSUREGAWA},
journal={IEICE TRANSACTIONS on Information},
title={High Performance Parallel Query Processing on a 100 Node ATM Connected PC Cluster},
year={1999},
volume={E82-D},
number={1},
pages={54-63},
abstract={We developed a PC cluster system which consists of 100 PCs as a test bed for massively parallel query processing. Each PC employs the 200 MHz Pentium Pro CPU and is connected with others through an ATM switch. Because the query processing applications are insensitive to the communication latency and mainly perform integer operations, the ATM connected PC cluster approach can be considered a reasonable solution for high performance database servers with low costs. However, there has been no challenge to construct large scale PC clusters for database applications, as far as the authors know. Though we employed commodity components as much as possible, we developed the DBMS itself, because that was a key component for obtaining high performance in parallel query processing, and there seemed no system which could meet our demand. On each PC node, a server program which acts as a database kernel is running to process the queries in cooperation with other nodes. The kernel was designed to execute pipelined operators and handle voluminous data efficiently, to achieve high performance on complex decision support type queries. We used the standard benchmark, TPC-D, on a 100 GB database to verify the feasibility of our approach, through comparison of our system with commercial parallel systems. As a whole, our system exhibited sufficiently high performance which was competitive with the current TPC-D top records, in spite of not using indices. For some heavy queries in the benchmark, which have high selectivity and joinability, our system performed much better. In addition, we applied transposed file organization to the database for further performance improvement. The transposed file organization vertically partitions the tuples, enabling attribute-by-attribute access to the relations. This resulted in significant performance improvement by reducing the amount of disk I/O and shifting the bottleneck to computation.},
keywords={},
doi={},
ISSN={},
month={January},}
TY - JOUR
TI - High Performance Parallel Query Processing on a 100 Node ATM Connected PC Cluster
T2 - IEICE TRANSACTIONS on Information
SP - 54
EP - 63
AU - Takayuki TAMURA
AU - Masato OGUCHI
AU - Masaru KITSUREGAWA
PY - 1999
DO -
JO - IEICE TRANSACTIONS on Information
SN -
VL - E82-D
IS - 1
JA - IEICE TRANSACTIONS on Information
Y1 - January 1999
AB - We developed a PC cluster system which consists of 100 PCs as a test bed for massively parallel query processing. Each PC employs the 200 MHz Pentium Pro CPU and is connected with others through an ATM switch. Because the query processing applications are insensitive to the communication latency and mainly perform integer operations, the ATM connected PC cluster approach can be considered a reasonable solution for high performance database servers with low costs. However, there has been no challenge to construct large scale PC clusters for database applications, as far as the authors know. Though we employed commodity components as much as possible, we developed the DBMS itself, because that was a key component for obtaining high performance in parallel query processing, and there seemed no system which could meet our demand. On each PC node, a server program which acts as a database kernel is running to process the queries in cooperation with other nodes. The kernel was designed to execute pipelined operators and handle voluminous data efficiently, to achieve high performance on complex decision support type queries. We used the standard benchmark, TPC-D, on a 100 GB database to verify the feasibility of our approach, through comparison of our system with commercial parallel systems. As a whole, our system exhibited sufficiently high performance which was competitive with the current TPC-D top records, in spite of not using indices. For some heavy queries in the benchmark, which have high selectivity and joinability, our system performed much better. In addition, we applied transposed file organization to the database for further performance improvement. The transposed file organization vertically partitions the tuples, enabling attribute-by-attribute access to the relations. This resulted in significant performance improvement by reducing the amount of disk I/O and shifting the bottleneck to computation.
ER -