In recent years, deep neural networks (DNNs) have made a significant impact on a variety of research fields and applications. One drawback of DNNs is that they require a huge amount of data for training. Since it is very expensive to ask experts to label the data, many non-expert data collection methods, such as web crawling, have been proposed. However, datasets created by non-experts often contain corrupted labels, and DNNs trained on such datasets are unreliable. Since DNNs have an enormous number of parameters, they tend to overfit to noisy labels, resulting in poor generalization performance. This problem is called Learning with Noisy Labels (LNL). Recent studies have shown that DNNs are robust to noisy labels in the early stage of learning, before overfitting to them, because DNNs learn simple patterns first. Therefore, DNNs tend to output the true labels for samples with noisy labels in the early stage of learning, and the number of false predictions for samples with noisy labels is higher than for samples with clean labels. Based on these observations, we propose a new sample selection approach for LNL that uses the number of false predictions. Our method periodically collects the records of false predictions during training and selects samples with a low number of false predictions from the recent records. Our method then iteratively alternates between sample selection and training a DNN model on the updated dataset. Since the model is trained on cleaner samples and records more accurate false predictions for sample selection, its generalization performance gradually improves. We evaluated our method on two benchmark datasets, CIFAR-10 and CIFAR-100, with synthetically generated noisy labels, and the obtained results are better than or comparable to state-of-the-art approaches.
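The abstract describes an iterative selection loop: record, for every sample, whether the model's prediction disagrees with its given label, keep the samples whose disagreement count over recent epochs is low, and retrain on that subset. The following is a minimal sketch of this idea in Python/NumPy, not the paper's exact algorithm; the window size, the keep_ratio hyper-parameter, and the helper functions train_one_epoch and predict_labels are illustrative assumptions.

    import numpy as np

    def false_prediction_counts(history, window):
        # history: list of boolean arrays (one per epoch), True where the model's
        # prediction disagreed with the given (possibly noisy) label.
        recent = np.stack(history[-window:], axis=0)   # shape: (window, n_samples)
        return recent.sum(axis=0)                      # per-sample false-prediction counts

    def select_samples(counts, keep_ratio):
        # Keep the samples with the fewest false predictions; keep_ratio is an
        # assumed hyper-parameter (e.g. an estimate of the clean-label fraction).
        n_keep = int(len(counts) * keep_ratio)
        return np.argsort(counts)[:n_keep]

    def train_with_sample_selection(model, x, y, n_epochs, window, keep_ratio,
                                    train_one_epoch, predict_labels):
        # train_one_epoch and predict_labels are stand-ins for an ordinary DNN
        # training step and an inference pass over the whole dataset.
        selected = np.arange(len(y))       # start from the full (noisy) dataset
        history = []
        for epoch in range(n_epochs):
            train_one_epoch(model, x[selected], y[selected])
            preds = predict_labels(model, x)           # predictions for all samples
            history.append(preds != y)                 # record false predictions
            if len(history) >= window:                 # periodically reselect samples
                counts = false_prediction_counts(history, window)
                selected = select_samples(counts, keep_ratio)
        return model, selected

Counting disagreements over a window of recent epochs, rather than a single epoch, smooths out fluctuations in the model's predictions before each reselection step.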
Yuichiro NOMURA
Hiroshima University
Takio KURITA
Hiroshima University
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Citation (plain text):
Yuichiro NOMURA, Takio KURITA, "Sample Selection Approach with Number of False Predictions for Learning with Noisy Labels" in IEICE TRANSACTIONS on Information,
vol. E105-D, no. 10, pp. 1759-1768, October 2022, doi: 10.1587/transinf.2022EDP7033.
Abstract: In recent years, deep neural networks (DNNs) have made a significant impact on a variety of research fields and applications. One drawback of DNNs is that they require a huge amount of data for training. Since it is very expensive to ask experts to label the data, many non-expert data collection methods, such as web crawling, have been proposed. However, datasets created by non-experts often contain corrupted labels, and DNNs trained on such datasets are unreliable. Since DNNs have an enormous number of parameters, they tend to overfit to noisy labels, resulting in poor generalization performance. This problem is called Learning with Noisy Labels (LNL). Recent studies have shown that DNNs are robust to noisy labels in the early stage of learning, before overfitting to them, because DNNs learn simple patterns first. Therefore, DNNs tend to output the true labels for samples with noisy labels in the early stage of learning, and the number of false predictions for samples with noisy labels is higher than for samples with clean labels. Based on these observations, we propose a new sample selection approach for LNL that uses the number of false predictions. Our method periodically collects the records of false predictions during training and selects samples with a low number of false predictions from the recent records. Our method then iteratively alternates between sample selection and training a DNN model on the updated dataset. Since the model is trained on cleaner samples and records more accurate false predictions for sample selection, its generalization performance gradually improves. We evaluated our method on two benchmark datasets, CIFAR-10 and CIFAR-100, with synthetically generated noisy labels, and the obtained results are better than or comparable to state-of-the-art approaches.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2022EDP7033/_p
BibTeX:
@ARTICLE{e105-d_10_1759,
author={Yuichiro NOMURA and Takio KURITA},
journal={IEICE TRANSACTIONS on Information},
title={Sample Selection Approach with Number of False Predictions for Learning with Noisy Labels},
year={2022},
volume={E105-D},
number={10},
pages={1759-1768},
abstract={In recent years, deep neural networks (DNNs) have made a significant impact on a variety of research fields and applications. One drawback of DNNs is that they require a huge amount of data for training. Since it is very expensive to ask experts to label the data, many non-expert data collection methods, such as web crawling, have been proposed. However, datasets created by non-experts often contain corrupted labels, and DNNs trained on such datasets are unreliable. Since DNNs have an enormous number of parameters, they tend to overfit to noisy labels, resulting in poor generalization performance. This problem is called Learning with Noisy Labels (LNL). Recent studies have shown that DNNs are robust to noisy labels in the early stage of learning, before overfitting to them, because DNNs learn simple patterns first. Therefore, DNNs tend to output the true labels for samples with noisy labels in the early stage of learning, and the number of false predictions for samples with noisy labels is higher than for samples with clean labels. Based on these observations, we propose a new sample selection approach for LNL that uses the number of false predictions. Our method periodically collects the records of false predictions during training and selects samples with a low number of false predictions from the recent records. Our method then iteratively alternates between sample selection and training a DNN model on the updated dataset. Since the model is trained on cleaner samples and records more accurate false predictions for sample selection, its generalization performance gradually improves. We evaluated our method on two benchmark datasets, CIFAR-10 and CIFAR-100, with synthetically generated noisy labels, and the obtained results are better than or comparable to state-of-the-art approaches.},
keywords={},
doi={10.1587/transinf.2022EDP7033},
ISSN={1745-1361},
month={October},}
RIS:
TY - JOUR
TI - Sample Selection Approach with Number of False Predictions for Learning with Noisy Labels
T2 - IEICE TRANSACTIONS on Information
SP - 1759
EP - 1768
AU - Yuichiro NOMURA
AU - Takio KURITA
PY - 2022
DO - 10.1587/transinf.2022EDP7033
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E105-D
IS - 10
JA - IEICE TRANSACTIONS on Information
Y1 - October 2022
AB - In recent years, deep neural networks (DNNs) have made a significant impact on a variety of research fields and applications. One drawback of DNNs is that they require a huge amount of data for training. Since it is very expensive to ask experts to label the data, many non-expert data collection methods, such as web crawling, have been proposed. However, datasets created by non-experts often contain corrupted labels, and DNNs trained on such datasets are unreliable. Since DNNs have an enormous number of parameters, they tend to overfit to noisy labels, resulting in poor generalization performance. This problem is called Learning with Noisy Labels (LNL). Recent studies have shown that DNNs are robust to noisy labels in the early stage of learning, before overfitting to them, because DNNs learn simple patterns first. Therefore, DNNs tend to output the true labels for samples with noisy labels in the early stage of learning, and the number of false predictions for samples with noisy labels is higher than for samples with clean labels. Based on these observations, we propose a new sample selection approach for LNL that uses the number of false predictions. Our method periodically collects the records of false predictions during training and selects samples with a low number of false predictions from the recent records. Our method then iteratively alternates between sample selection and training a DNN model on the updated dataset. Since the model is trained on cleaner samples and records more accurate false predictions for sample selection, its generalization performance gradually improves. We evaluated our method on two benchmark datasets, CIFAR-10 and CIFAR-100, with synthetically generated noisy labels, and the obtained results are better than or comparable to state-of-the-art approaches.
ER -