The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. For example, some numerals are expressed as "XNUMX".
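Although classification techniques can be applied to automatically recognize unknown words in a language without word boundaries, they face the problem of unbalanced datasets, in which positive unknown-word candidates are far outnumbered by negative candidates. To address this problem, this paper presents a corpus-based approach that introduces a so-called group-based ranking evaluation technique into ensemble learning in order to generate a sequence of classification models that later collaborate to select the most probable unknown word from multiple candidates. Given a classification model, group-based ranking evaluation (GRE) is applied to construct the training dataset for learning the succeeding model, by weighting each candidate according to its rank and correctness when the candidates of an unknown word are considered as one group. Several experiments were conducted on a large Thai medical text to evaluate the performance of the proposed group-based ranking evaluation approach, namely V-GRE, against the conventional naive Bayes classifier and a vanilla version without ensemble learning. As a result, the proposed method achieved an accuracy of 90.93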
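The abstract describes group-based ranking evaluation (GRE) only in words; the paper's actual weighting formula is not given on this page. As a hypothetical illustration only, the Python sketch below shows one way the idea could look: all candidates of one unknown word form a group, each candidate is ranked inside its group by the current model's score, and "misranked" candidates (the correct word not at rank 1, or a wrong candidate ranked above the correct one) receive more weight when training the next model. The helper names (`rank_within_groups`, `gre_reweight`, `ensemble_select`), the AdaBoost-style `exp(±alpha)` update, and the weighted-sum vote used to let the models collaborate are our own assumptions, not the authors' V-GRE.

```python
import math
from collections import defaultdict


def rank_within_groups(scores, groups):
    """Return each candidate's 1-based rank inside its group (highest score = rank 1)."""
    members = defaultdict(list)
    for i, g in enumerate(groups):
        members[g].append(i)
    ranks = [0] * len(scores)
    for idxs in members.values():
        # Stable sort, so tied candidates keep their original order.
        ordered = sorted(idxs, key=lambda j: scores[j], reverse=True)
        for r, i in enumerate(ordered, start=1):
            ranks[i] = r
    return ranks


def gre_reweight(weights, scores, groups, labels, alpha):
    """Hypothetical group-based reweighting (not the paper's exact formula).

    A candidate is misranked if it is its group's correct word but not at rank 1,
    or if it is a wrong candidate ranked above its group's correct word.
    Misranked candidates are multiplied by exp(alpha), all others by exp(-alpha),
    and the weights are renormalized to sum to 1.
    """
    ranks = rank_within_groups(scores, groups)
    correct_rank = {g: ranks[i] for i, g in enumerate(groups) if labels[i]}
    updated = []
    for i, w in enumerate(weights):
        cr = correct_rank.get(groups[i])
        if cr is None:
            misranked = False  # group has no correct candidate: nothing to fix
        elif labels[i]:
            misranked = ranks[i] > 1
        else:
            misranked = ranks[i] < cr
        updated.append(w * math.exp(alpha if misranked else -alpha))
    total = sum(updated)
    return [w / total for w in updated]


def ensemble_select(model_scores, model_alphas, groups):
    """Pick, per group, the candidate with the highest alpha-weighted sum of model scores."""
    combined = [
        sum(a * s[i] for a, s in zip(model_alphas, model_scores))
        for i in range(len(groups))
    ]
    best = {}
    for i, g in enumerate(groups):
        if g not in best or combined[i] > combined[best[g]]:
            best[g] = i
    return best
```

With this binary rule, groups whose correct word is already ranked first lose weight, so the next model concentrates on the groups the current model gets wrong. A finer variant could scale the update by how far the correct candidate sits from rank 1; the rule above is simply the smallest choice that uses both rank and correctness.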
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Jakkrit TECHO, Cholwich NATTEE, Thanaruk THEERAMUNKONG, "A Corpus-Based Approach for Automatic Thai Unknown Word Recognition Using Boosting Techniques" in IEICE TRANSACTIONS on Information,
vol. E92-D, no. 12, pp. 2321-2333, December 2009, doi: 10.1587/transinf.E92.D.2321.
Abstract: While classification techniques can be applied to automatic unknown word recognition in a language without word boundaries, they face the problem of unbalanced datasets, where the number of positive unknown word candidates is much smaller than that of negative candidates. To solve this problem, this paper presents a corpus-based approach that introduces a so-called group-based ranking evaluation technique into ensemble learning in order to generate a sequence of classification models that later collaborate to select the most probable unknown word from multiple candidates. Given a classification model, the group-based ranking evaluation (GRE) is applied to construct a training dataset for learning the succeeding model, by weighting each of its candidates according to their ranks and correctness when the candidates of an unknown word are considered as one group. A number of experiments have been conducted on a large Thai medical text to evaluate the performance of the proposed group-based ranking evaluation approach, namely V-GRE, compared to the conventional naive Bayes classifier and our vanilla version without ensemble learning. As a result, the proposed method achieves an accuracy of 90.93
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.E92.D.2321/_p
@ARTICLE{e92-d_12_2321,
author={Jakkrit TECHO and Cholwich NATTEE and Thanaruk THEERAMUNKONG},
journal={IEICE TRANSACTIONS on Information},
title={A Corpus-Based Approach for Automatic Thai Unknown Word Recognition Using Boosting Techniques},
year={2009},
volume={E92-D},
number={12},
pages={2321-2333},
abstract={While classification techniques can be applied to automatic unknown word recognition in a language without word boundaries, they face the problem of unbalanced datasets, where the number of positive unknown word candidates is much smaller than that of negative candidates. To solve this problem, this paper presents a corpus-based approach that introduces a so-called group-based ranking evaluation technique into ensemble learning in order to generate a sequence of classification models that later collaborate to select the most probable unknown word from multiple candidates. Given a classification model, the group-based ranking evaluation (GRE) is applied to construct a training dataset for learning the succeeding model, by weighting each of its candidates according to their ranks and correctness when the candidates of an unknown word are considered as one group. A number of experiments have been conducted on a large Thai medical text to evaluate the performance of the proposed group-based ranking evaluation approach, namely V-GRE, compared to the conventional naive Bayes classifier and our vanilla version without ensemble learning. As a result, the proposed method achieves an accuracy of 90.93},
keywords={},
doi={10.1587/transinf.E92.D.2321},
ISSN={1745-1361},
month={December},}
TY  - JOUR
TI  - A Corpus-Based Approach for Automatic Thai Unknown Word Recognition Using Boosting Techniques
T2  - IEICE TRANSACTIONS on Information
SP  - 2321
EP  - 2333
AU  - Jakkrit TECHO
AU  - Cholwich NATTEE
AU  - Thanaruk THEERAMUNKONG
PY  - 2009
DO  - 10.1587/transinf.E92.D.2321
JO  - IEICE TRANSACTIONS on Information
SN  - 1745-1361
VL  - E92-D
IS  - 12
JA  - IEICE TRANSACTIONS on Information
Y1  - December 2009
AB  - While classification techniques can be applied to automatic unknown word recognition in a language without word boundaries, they face the problem of unbalanced datasets, where the number of positive unknown word candidates is much smaller than that of negative candidates. To solve this problem, this paper presents a corpus-based approach that introduces a so-called group-based ranking evaluation technique into ensemble learning in order to generate a sequence of classification models that later collaborate to select the most probable unknown word from multiple candidates. Given a classification model, the group-based ranking evaluation (GRE) is applied to construct a training dataset for learning the succeeding model, by weighting each of its candidates according to their ranks and correctness when the candidates of an unknown word are considered as one group. A number of experiments have been conducted on a large Thai medical text to evaluate the performance of the proposed group-based ranking evaluation approach, namely V-GRE, compared to the conventional naive Bayes classifier and our vanilla version without ensemble learning. As a result, the proposed method achieves an accuracy of 90.93
ER  - 