The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. ex. Some numerals are expressed as "XNUMX".
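This paper presents an extended Relief-F algorithm for nominal attribute estimation and applies it to small-document classification. Relief algorithms are general, successful instance-based feature-filtering algorithms for data classification and regression. Many improved Relief algorithms have been introduced to address redundant and irrelevant noisy features and the algorithms' limitations on multiclass datasets. However, they have rarely been applied to text classification, because the large number of features in multiclass datasets leads to high time complexity. With text feature filtering and classification in mind, we presented an extended Relief-F algorithm for numerical attribute estimation (E-Relief-F) in 2007, but we later found limitations and several problems with it. In this paper, we therefore introduce additional problems that Relief algorithms face in text feature filtering: the negative influence of computed similarities and weights caused by the small number of features in an instance, the absence of nearest hits and misses for some instances, and high time complexity. To solve these problems, we propose a new extended Relief-F algorithm for nominal attribute estimation (E-Relief-Fd) and apply it to small text-document classification. We ran experiments to evaluate the feature quality it estimates on various datasets, its application to classification, and its performance against existing Relief algorithms. The experimental results show that the new E-Relief-Fd algorithm performs better than previous Relief algorithms, including E-Relief-F.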
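To make the terms "nearest hit" and "nearest miss" in the abstract concrete, here is a minimal sketch of the basic Relief weight update for nominal attributes. It is not the E-Relief-F or E-Relief-Fd algorithm from the paper, whose details the abstract does not give. The function names, the toy term-presence data, the choice to visit every instance instead of sampling, and the choice to skip instances that lack a hit or a miss are all assumptions made for illustration.

```python
from typing import Hashable, List, Sequence, Tuple

Instance = Sequence[Hashable]


def diff(a: Hashable, b: Hashable) -> float:
    """Nominal attribute difference: 0 if the values match, 1 otherwise."""
    return 0.0 if a == b else 1.0


def distance(x: Instance, y: Instance) -> float:
    """Number of attributes on which two instances disagree."""
    return sum(diff(a, b) for a, b in zip(x, y))


def relief_nominal(X: List[Instance], y: List[Hashable]) -> Tuple[List[float], int]:
    """Basic Relief weights for nominal attributes (illustrative sketch).

    Returns (weights, skipped), where `skipped` counts instances that had
    no nearest hit or no nearest miss and so contributed no update.
    """
    n, d = len(X), len(X[0])
    weights = [0.0] * d
    used = 0
    for i in range(n):
        # Hits share the class of instance i; misses belong to another class.
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        if not hits or not misses:
            continue  # assumption: skip instances without a hit or a miss
        hit = min(hits, key=lambda j: distance(X[i], X[j]))
        miss = min(misses, key=lambda j: distance(X[i], X[j]))
        for f in range(d):
            # Reward attributes that differ from the nearest miss and
            # penalize attributes that differ from the nearest hit.
            weights[f] += diff(X[i][f], X[miss][f]) - diff(X[i][f], X[hit][f])
        used += 1
    if used:
        weights = [w / used for w in weights]
    return weights, n - used


# Hypothetical toy data: each tuple marks presence (1) or absence (0) of three terms.
docs = [(1, 0, 1), (1, 0, 0), (0, 1, 1), (0, 1, 0)]
labels = ["sports", "sports", "politics", "politics"]

weights, skipped = relief_nominal(docs, labels)
print(weights, skipped)  # prints [1.0, 1.0, -1.0] 0
```

In this toy data the first two terms separate the classes and receive the highest weight, while the third term varies within each class and is penalized. Adding a document whose class has no other member leaves it without a nearest hit, so the sketch skips it; this is the kind of situation the abstract lists as a problem for short text documents.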
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Heum PARK, Hyuk-Chul KWON, "Extended Relief-F Algorithm for Nominal Attribute Estimation in Small-Document Classification," in IEICE TRANSACTIONS on Information, vol. E92-D, no. 12, pp. 2360-2368, December 2009, doi: 10.1587/transinf.E92.D.2360.
Abstract: This paper presents an extended Relief-F algorithm for nominal attribute estimation, for application to small-document classification. Relief algorithms are general and successful instance-based feature-filtering algorithms for data classification and regression. Many improved Relief algorithms have been introduced as solutions to problems of redundancy and irrelevant noisy features and to the limitations of the algorithms for multiclass datasets. However, these algorithms have only rarely been applied to text classification, because the numerous features in multiclass datasets lead to great time complexity. Therefore, in considering their application to text feature filtering and classification, we presented an extended Relief-F algorithm for numerical attribute estimation (E-Relief-F) in 2007. However, we found limitations and some problems with it. Therefore, in this paper, we introduce additional problems with Relief algorithms for text feature filtering, including the negative influence of computation similarities and weights caused by a small number of features in an instance, the absence of nearest hits and misses for some instances, and great time complexity. We then suggest a new extended Relief-F algorithm for nominal attribute estimation (E-Relief-Fd) to solve these problems, and we apply it to small text-document classification. We used the algorithm in experiments to estimate feature quality for various datasets, its application to classification, and its performance in comparison with existing Relief algorithms. The experimental results show that the new E-Relief-Fd algorithm offers better performance than previous Relief algorithms, including E-Relief-F.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.E92.D.2360/_p
@ARTICLE{e92-d_12_2360,
author={Heum PARK and Hyuk-Chul KWON},
journal={IEICE TRANSACTIONS on Information},
title={Extended Relief-F Algorithm for Nominal Attribute Estimation in Small-Document Classification},
year={2009},
volume={E92-D},
number={12},
pages={2360-2368},
keywords={},
doi={10.1587/transinf.E92.D.2360},
ISSN={1745-1361},
month={December},}
TY - JOUR
TI - Extended Relief-F Algorithm for Nominal Attribute Estimation in Small-Document Classification
T2 - IEICE TRANSACTIONS on Information
SP - 2360
EP - 2368
AU - Heum PARK
AU - Hyuk-Chul KWON
PY - 2009
DO - 10.1587/transinf.E92.D.2360
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E92-D
IS - 12
JA - IEICE TRANSACTIONS on Information
Y1 - December 2009
ER -