The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. For example, some numerals are expressed as "XNUMX".
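Class imbalance is one of the challenges faced in machine learning. Traditional classifiers have difficulty predicting minority-class data, and if imbalanced data is not processed, classifier performance drops sharply. To address the tendency of traditional classifiers to favor the majority class and ignore the minority class, we propose an imbalanced-data over-sampling method based on iterative self-organizing data analysis technique algorithm (ISODATA) clustering. ISODATA divides the minority class into sub-clusters, and each sub-cluster is over-sampled according to the sampling ratio, so that the over-sampled minority data preserves the within-class imbalance of the original minority data. The new data set, composed of the new minority-class data and the majority-class data, is then classified with SVM and Random Forest classifiers. Experiments on 12 KEEL datasets show that the method achieves better G-mean and F-value and improves classification accuracy.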
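The abstract outlines the pipeline (cluster the minority class with ISODATA, then over-sample each sub-cluster by a sampling ratio) but not ISODATA's parameters or how synthetic points are generated. The following is only a minimal sketch under stated assumptions, not the authors' implementation: (1) a simplified ISODATA with farthest-first initialization, removal of tiny clusters, a split step on even iterations and a merge step on odd iterations; (2) each sub-cluster receives synthetic samples in proportion to its size, which is one reading of "over-sampled according to the sampling ratio"; (3) synthetic points are SMOTE-style interpolations between two random members of the same sub-cluster (classic SMOTE instead picks among k nearest neighbors). All names and thresholds (`isodata`, `allocate`, `oversample`, `min_size`, `max_std`, `min_dist`) are hypothetical. Requires Python 3.8+ for `math.dist`.

```python
import math
import random
from typing import List, Sequence

Point = Sequence[float]


def _mean(points: Sequence[Point]) -> List[float]:
    """Component-wise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def _std(points: Sequence[Point], center: Sequence[float]) -> List[float]:
    """Per-dimension population standard deviation around `center`."""
    return [math.sqrt(sum((p[d] - c) ** 2 for p in points) / len(points))
            for d, c in enumerate(center)]


def _assign(data, centers):
    """Group each point with its nearest center (one list per center)."""
    groups = [[] for _ in centers]
    for p in data:
        nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        groups[nearest].append(p)
    return groups


def isodata(data, k_init=2, min_size=2, max_std=1.0, min_dist=1.5, max_iter=10):
    """Simplified ISODATA with hypothetical thresholds, not the paper's exact variant."""
    # Farthest-first initialization keeps this sketch deterministic.
    centers = [list(data[0])]
    while len(centers) < k_init:
        centers.append(list(max(data, key=lambda p: min(math.dist(p, c) for c in centers))))
    for it in range(max_iter):
        # Drop clusters smaller than min_size; their points are reassigned next pass.
        groups = [g for g in _assign(data, centers) if len(g) >= min_size] or [list(data)]
        centers = [_mean(g) for g in groups]
        if it % 2 == 0:
            # Split step: a cluster whose widest dimension is too spread out
            # becomes two centers offset by one standard deviation along it.
            split = []
            for g, c in zip(groups, centers):
                s = _std(g, c)
                d = max(range(len(s)), key=s.__getitem__)
                if s[d] > max_std and len(g) >= 2 * min_size:
                    lo, hi = list(c), list(c)
                    lo[d] -= s[d]
                    hi[d] += s[d]
                    split += [lo, hi]
                else:
                    split.append(c)
            centers = split
        elif len(centers) > 1:
            # Merge step: fuse the closest pair of centers if nearer than min_dist,
            # weighting each center by its cluster size.
            dmin, i, j = min((math.dist(centers[i], centers[j]), i, j)
                             for i in range(len(centers))
                             for j in range(i + 1, len(centers)))
            if dmin < min_dist:
                ni, nj = len(groups[i]), len(groups[j])
                merged = [(ni * a + nj * b) / (ni + nj)
                          for a, b in zip(centers[i], centers[j])]
                centers = [c for k, c in enumerate(centers) if k not in (i, j)] + [merged]
    return [g for g in _assign(data, centers) if g]


def allocate(sizes: Sequence[int], n_new: int) -> List[int]:
    """Share n_new synthetic samples among clusters in proportion to cluster size,
    using largest-remainder rounding so the counts sum exactly to n_new."""
    total = sum(sizes)
    raw = [n_new * s / total for s in sizes]
    counts = [int(r) for r in raw]
    order = sorted(range(len(sizes)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in order[: n_new - sum(counts)]:
        counts[i] += 1
    return counts


def oversample(minority, n_new, seed=0, **isodata_kwargs):
    """Cluster the minority class, then add SMOTE-style points inside each sub-cluster."""
    rng = random.Random(seed)
    clusters = isodata(minority, **isodata_kwargs)
    counts = allocate([len(c) for c in clusters], n_new)
    synthetic = []
    for cluster, k in zip(clusters, counts):
        for _ in range(k):
            a, b = rng.choice(cluster), rng.choice(cluster)
            gap = rng.random()
            # A point on segment a->b stays inside the sub-cluster's convex hull.
            synthetic.append([x + gap * (y - x) for x, y in zip(a, b)])
    return clusters, counts, synthetic
```

For example, with a minority class made of one 4-point group near (0, 0) and one 8-point group near (10, 10), `oversample(points, n_new=12, k_init=1)` first splits the single initial cluster into those two groups and then gives them 4 and 8 synthetic points, keeping the 1:2 within-class ratio. To fully balance a data set, `n_new` would typically be the majority count minus the minority count; the ratio the paper actually uses is not stated on this page.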
Zhenzhe LV
Yantai University
Qicheng LIU
Yantai University
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Zhenzhe LV, Qicheng LIU, "Imbalanced Data Over-Sampling Method Based on ISODATA Clustering" in IEICE TRANSACTIONS on Information, vol. E106-D, no. 9, pp. 1528-1536, September 2023, doi: 10.1587/transinf.2022EDP7190.
Abstract: Class imbalance is one of the challenges faced in the field of machine learning. It is difficult for traditional classifiers to predict the minority class data. If the imbalanced data is not processed, the effect of the classifier will be greatly reduced. Aiming at the problem that the traditional classifier tends to the majority class data and ignores the minority class data, imbalanced data over-sampling method based on iterative self-organizing data analysis technique algorithm(ISODATA) clustering is proposed. The minority class is divided into different sub-clusters by ISODATA, and each sub-cluster is over-sampled according to the sampling ratio, so that the sampled minority class data also conforms to the imbalance of the original minority class data. The new imbalanced data composed of new minority class data and majority class data is classified by SVM and Random Forest classifier. Experiments on 12 datasets from the KEEL datasets show that the method has better G-means and F-value, improving the classification accuracy.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2022EDP7190/_p
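The abstract reports G-means and F-value rather than plain accuracy, because a classifier that ignores the minority class can still score high accuracy. This page does not define the two metrics, so the sketch below assumes the definitions usual in imbalanced-learning work: G-mean = sqrt(sensitivity × specificity), and F-value = F1 computed with the minority class as the positive class. The `Confusion` class and its method names are illustrative only, not taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class Confusion:
    """Binary confusion counts; the minority class is treated as positive."""
    tp: int  # minority samples predicted as minority
    fp: int  # majority samples predicted as minority
    fn: int  # minority samples predicted as majority
    tn: int  # majority samples predicted as majority

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)

    def g_mean(self) -> float:
        """sqrt(sensitivity * specificity); 0.0 if either class is absent."""
        if self.tp + self.fn == 0 or self.tn + self.fp == 0:
            return 0.0
        sensitivity = self.tp / (self.tp + self.fn)  # recall on the minority class
        specificity = self.tn / (self.tn + self.fp)  # recall on the majority class
        return math.sqrt(sensitivity * specificity)

    def f_value(self, beta: float = 1.0) -> float:
        """F-beta on the minority class; 0.0 when no minority sample is detected."""
        if self.tp == 0:
            return 0.0
        precision = self.tp / (self.tp + self.fp)
        recall = self.tp / (self.tp + self.fn)
        b2 = beta ** 2
        return (1 + b2) * precision * recall / (b2 * precision + recall)
```

On a test set with 10 minority and 90 majority samples, a classifier that predicts "majority" for everything, `Confusion(tp=0, fp=0, fn=10, tn=90)`, reaches 0.9 accuracy yet has G-mean 0.0 and F-value 0.0. This is the failure mode that the two metrics expose.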
@ARTICLE{e106-d_9_1528,
author={Zhenzhe LV and Qicheng LIU},
journal={IEICE TRANSACTIONS on Information},
title={Imbalanced Data Over-Sampling Method Based on ISODATA Clustering},
year={2023},
volume={E106-D},
number={9},
pages={1528-1536},
abstract={Class imbalance is one of the challenges faced in the field of machine learning. It is difficult for traditional classifiers to predict the minority class data. If the imbalanced data is not processed, the effect of the classifier will be greatly reduced. Aiming at the problem that the traditional classifier tends to the majority class data and ignores the minority class data, imbalanced data over-sampling method based on iterative self-organizing data analysis technique algorithm(ISODATA) clustering is proposed. The minority class is divided into different sub-clusters by ISODATA, and each sub-cluster is over-sampled according to the sampling ratio, so that the sampled minority class data also conforms to the imbalance of the original minority class data. The new imbalanced data composed of new minority class data and majority class data is classified by SVM and Random Forest classifier. Experiments on 12 datasets from the KEEL datasets show that the method has better G-means and F-value, improving the classification accuracy.},
keywords={},
doi={10.1587/transinf.2022EDP7190},
ISSN={1745-1361},
month={September},}
TY  - JOUR
TI  - Imbalanced Data Over-Sampling Method Based on ISODATA Clustering
T2  - IEICE TRANSACTIONS on Information
SP  - 1528
EP  - 1536
AU  - Zhenzhe LV
AU  - Qicheng LIU
PY  - 2023
DO  - 10.1587/transinf.2022EDP7190
JO  - IEICE TRANSACTIONS on Information
SN  - 1745-1361
VL  - E106-D
IS  - 9
JA  - IEICE TRANSACTIONS on Information
Y1  - September 2023
AB  - Class imbalance is one of the challenges faced in the field of machine learning. It is difficult for traditional classifiers to predict the minority class data. If the imbalanced data is not processed, the effect of the classifier will be greatly reduced. Aiming at the problem that the traditional classifier tends to the majority class data and ignores the minority class data, imbalanced data over-sampling method based on iterative self-organizing data analysis technique algorithm(ISODATA) clustering is proposed. The minority class is divided into different sub-clusters by ISODATA, and each sub-cluster is over-sampled according to the sampling ratio, so that the sampled minority class data also conforms to the imbalance of the original minority class data. The new imbalanced data composed of new minority class data and majority class data is classified by SVM and Random Forest classifier. Experiments on 12 datasets from the KEEL datasets show that the method has better G-means and F-value, improving the classification accuracy.
ER  - 