The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. ex. Some numerals are expressed as "XNUMX".
Copyrights notice
The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. Copyrights notice
데이터 마이닝은 거대한 데이터베이스의 모든 데이터를 분석하여 데이터베이스 사용자에게 유용한 정보를 얻는 것입니다. 데이터 마이닝에서 잘 연구된 문제 중 하나는 막대한 양의 판매 거래가 포함된 장바구니 데이터베이스에서 의미 있는 연관 규칙을 검색하는 것입니다. 의미 있는 연관 규칙을 마이닝하는 문제는 먼저 모든 큰 항목 집합을 찾은 다음 큰 항목 집합에서 의미 있는 연관 규칙을 구성하는 것입니다. 이전 연구에서 우리는 주어진 크기의 큰 항목 집합이 존재하는지 여부를 결정하는 것이 NP-완전임을 보여주었습니다. 또한 우리는 다음과 같은 데이터베이스 하위 클래스를 제안했습니다. k- 모든 대규모 항목 집합을 효율적으로 찾을 수 있는 희소 데이터베이스. 직관적으로, k- 데이터베이스의 희소성은 크기의 항목 집합을 지원함을 의미합니다. k 또는 그 이상이 데이터베이스에서 충분히 낮습니다. 본 논문에서는 (의 개념을 소개한다.k,c)-희소성, 이는 다음보다 엄격하게 약합니다. k- 이전 작업의 희소성. 의 가치 c 희박함의 정도를 나타냅니다. (k,c)-희소성, 우리는 여전히 모든 대규모 항목 집합을 효율적으로 찾을 수 있는 더 큰 데이터베이스 하위 클래스를 제안합니다. 다음으로 지원에 대한 대안을 제안합니다. 각 측정값에 대해 항목 간 상관 관계를 나타내는 값이 지정된 임계값을 초과하는 경우 항목 집합을 고도로 동시 발생한다고 합니다. 본 논문에서 우리는 주어진 크기의 고도로 동시 발생하는 항목 집합이 존재하는지 여부를 결정하는 것으로 고도로 동시 발생하는 항목 집합 문제를 공식적으로 정의하고 문제가 어떤 척도로든 NP-완전하다는 것을 보여줍니다. 또한, (k,c)-희소성, 우리는 동시에 발생하는 모든 항목 집합을 효율적으로 찾을 수 있는 데이터베이스의 하위 클래스를 제안합니다.
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
부
Yeon-Dae KWON, Yasunori ISHIHARA, Shougo SHIMIZU, Minoru ITO, "Computational Complexity of Finding Highly Co-occurrent Itemsets in Market Basket Databases" in IEICE TRANSACTIONS on Fundamentals,
vol. E83-A, no. 12, pp. 2723-2735, December 2000, doi: .
Abstract: Data mining is to analyze all the data in a huge database and to obtain useful information for database users. One of the well-studied problems in data mining is the search for meaningful association rules in a market basket database which contains massive amounts of sales transactions. The problem of mining meaningful association rules is to find all the large itemsets first, and then to construct meaningful association rules from the large itemsets. In our previous work, we have shown that it is NP-complete to decide whether there exists a large itemset with a given size. Also, we have proposed a subclass of databases, called k-sparse databases, for which we can efficiently find all the large itemsets. Intuitively, k-sparsity of a database means that the supports of itemsets of size k or more are sufficiently low in the database. In this paper, we introduce the notion of (k,c)-sparsity, which is strictly weaker than the k-sparsity in our previous work. The value of c represents a degree of sparsity. Using (k,c)-sparsity, we propose a larger subclass of databases for which we can still efficiently find all the large itemsets. Next, we propose alternative measures to the support. For each measure, an itemset is called highly co-occurrent if the value indicating the correlation among the items exceeds a given threshold. In this paper, we define the highly co-occurrent itemset problem formally as deciding whether there exists a highly co-occurrent itemset with a given size, and show that the problem is NP-complete under whichever measure. Furthermore, based on the notion of (k,c)-sparsity, we propose subclasses of databases for which we can efficiently find all the highly co-occurrent itemsets.
URL: https://global.ieice.org/en_transactions/fundamentals/10.1587/e83-a_12_2723/_p
부
@ARTICLE{e83-a_12_2723,
author={Yeon-Dae KWON, Yasunori ISHIHARA, Shougo SHIMIZU, Minoru ITO, },
journal={IEICE TRANSACTIONS on Fundamentals},
title={Computational Complexity of Finding Highly Co-occurrent Itemsets in Market Basket Databases},
year={2000},
volume={E83-A},
number={12},
pages={2723-2735},
abstract={Data mining is to analyze all the data in a huge database and to obtain useful information for database users. One of the well-studied problems in data mining is the search for meaningful association rules in a market basket database which contains massive amounts of sales transactions. The problem of mining meaningful association rules is to find all the large itemsets first, and then to construct meaningful association rules from the large itemsets. In our previous work, we have shown that it is NP-complete to decide whether there exists a large itemset with a given size. Also, we have proposed a subclass of databases, called k-sparse databases, for which we can efficiently find all the large itemsets. Intuitively, k-sparsity of a database means that the supports of itemsets of size k or more are sufficiently low in the database. In this paper, we introduce the notion of (k,c)-sparsity, which is strictly weaker than the k-sparsity in our previous work. The value of c represents a degree of sparsity. Using (k,c)-sparsity, we propose a larger subclass of databases for which we can still efficiently find all the large itemsets. Next, we propose alternative measures to the support. For each measure, an itemset is called highly co-occurrent if the value indicating the correlation among the items exceeds a given threshold. In this paper, we define the highly co-occurrent itemset problem formally as deciding whether there exists a highly co-occurrent itemset with a given size, and show that the problem is NP-complete under whichever measure. Furthermore, based on the notion of (k,c)-sparsity, we propose subclasses of databases for which we can efficiently find all the highly co-occurrent itemsets.},
keywords={},
doi={},
ISSN={},
month={December},}
부
TY - JOUR
TI - Computational Complexity of Finding Highly Co-occurrent Itemsets in Market Basket Databases
T2 - IEICE TRANSACTIONS on Fundamentals
SP - 2723
EP - 2735
AU - Yeon-Dae KWON
AU - Yasunori ISHIHARA
AU - Shougo SHIMIZU
AU - Minoru ITO
PY - 2000
DO -
JO - IEICE TRANSACTIONS on Fundamentals
SN -
VL - E83-A
IS - 12
JA - IEICE TRANSACTIONS on Fundamentals
Y1 - December 2000
AB - Data mining is to analyze all the data in a huge database and to obtain useful information for database users. One of the well-studied problems in data mining is the search for meaningful association rules in a market basket database which contains massive amounts of sales transactions. The problem of mining meaningful association rules is to find all the large itemsets first, and then to construct meaningful association rules from the large itemsets. In our previous work, we have shown that it is NP-complete to decide whether there exists a large itemset with a given size. Also, we have proposed a subclass of databases, called k-sparse databases, for which we can efficiently find all the large itemsets. Intuitively, k-sparsity of a database means that the supports of itemsets of size k or more are sufficiently low in the database. In this paper, we introduce the notion of (k,c)-sparsity, which is strictly weaker than the k-sparsity in our previous work. The value of c represents a degree of sparsity. Using (k,c)-sparsity, we propose a larger subclass of databases for which we can still efficiently find all the large itemsets. Next, we propose alternative measures to the support. For each measure, an itemset is called highly co-occurrent if the value indicating the correlation among the items exceeds a given threshold. In this paper, we define the highly co-occurrent itemset problem formally as deciding whether there exists a highly co-occurrent itemset with a given size, and show that the problem is NP-complete under whichever measure. Furthermore, based on the notion of (k,c)-sparsity, we propose subclasses of databases for which we can efficiently find all the highly co-occurrent itemsets.
ER -