The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. For example, some numerals may appear as "XNUMX".
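This paper presents a new approach to the word spacing problem: mining reliable words from the Web and using them as additional resources. Conventional approaches to automatic word spacing train the parameters of a word spacing model on noise-free data. However, the insufficiency and irrelevancy of training examples are always the main bottleneck in automatic word spacing. To mitigate this data-sparseness problem, the paper proposes an algorithm that discovers reliable words on the Web to expand the vocabulary, and a model that uses those words as additional resources. The proposed approach is simple and practical to adapt to new domains. Experimental results show that it outperforms conventional word spacing approaches.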
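As a rough illustration of why an expanded vocabulary can help word spacing, the following toy sketch segments unspaced text with a small dictionary and dynamic programming. It is not the model or the word-discovery algorithm from the paper, whose details are not given here. The function name `space_words`, the cost values, and the English example vocabulary are all assumptions made for illustration.

```python
from typing import List, Set


def space_words(text: str, vocab: Set[str], max_len: int = 8) -> List[str]:
    """Split unspaced text into tokens, preferring known vocabulary words.

    Hypothetical cost model (an assumption, not taken from the paper):
    a known word costs 1, an unknown single character costs 2.
    Dynamic programming picks the cheapest segmentation.
    """
    n = len(text)
    best = [0.0] + [float("inf")] * n  # best[i]: minimum cost of text[:i]
    back = [0] * (n + 1)               # back[i]: start index of the last token
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            piece = text[j:i]
            if piece in vocab:
                cost = best[j] + 1     # known word
            elif i - j == 1:
                cost = best[j] + 2     # unknown single character
            else:
                continue               # unknown multi-character piece: not allowed
            if cost < best[i]:
                best[i], back[i] = cost, j
    # Walk the back-pointers from the end to recover the tokens.
    tokens: List[str] = []
    i = n
    while i > 0:
        tokens.append(text[back[i]:i])
        i = back[i]
    return tokens[::-1]


# Toy vocabulary standing in for words seen in training data (assumed).
base_vocab = {"word", "spacing", "is", "hard"}
# The same vocabulary after adding one word "discovered" elsewhere (assumed).
expanded_vocab = base_vocab | {"web"}

print(" ".join(space_words("webwordspacingishard", base_vocab)))
# w e b word spacing is hard
print(" ".join(space_words("webwordspacingishard", expanded_vocab)))
# web word spacing is hard
```

With the base vocabulary the unknown word falls apart into single characters. Once it is added, the same procedure spaces the sentence correctly, which is the intuition behind enlarging the vocabulary with reliable words. Python strings are Unicode, so the same sketch runs on Korean text unchanged.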
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Gumwon HONG, Jeong-Hoon LEE, Young-In SONG, Do-Gil LEE, Hae-Chang RIM, "Utilizing the Web for Automatic Word Spacing" in IEICE TRANSACTIONS on Information,
vol. E92-D, no. 12, pp. 2553-2556, December 2009, doi: 10.1587/transinf.E92.D.2553.
Abstract: This paper presents a new approach to word spacing problems by mining reliable words from the Web and using them as additional resources. Conventional approaches to automatic word spacing use noise-free data to train parameters for word spacing models. However, the insufficiency and irrelevancy of training examples are always the main bottleneck associated with automatic word spacing. To mitigate the data-sparseness problem, this paper proposes an algorithm to discover reliable words on the Web to expand the vocabularies and a model to utilize the words as additional resources. The proposed approach is very simple and practical to adapt to new domains. Experimental results show that the proposed approach achieves better performance compared to the conventional word spacing approaches.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.E92.D.2553/_p
@ARTICLE{e92-d_12_2553,
author={Gumwon HONG and Jeong-Hoon LEE and Young-In SONG and Do-Gil LEE and Hae-Chang RIM},
journal={IEICE TRANSACTIONS on Information},
title={Utilizing the Web for Automatic Word Spacing},
year={2009},
volume={E92-D},
number={12},
pages={2553-2556},
abstract={This paper presents a new approach to word spacing problems by mining reliable words from the Web and using them as additional resources. Conventional approaches to automatic word spacing use noise-free data to train parameters for word spacing models. However, the insufficiency and irrelevancy of training examples are always the main bottleneck associated with automatic word spacing. To mitigate the data-sparseness problem, this paper proposes an algorithm to discover reliable words on the Web to expand the vocabularies and a model to utilize the words as additional resources. The proposed approach is very simple and practical to adapt to new domains. Experimental results show that the proposed approach achieves better performance compared to the conventional word spacing approaches.},
keywords={},
doi={10.1587/transinf.E92.D.2553},
ISSN={1745-1361},
month={December},}
TY  - JOUR
TI  - Utilizing the Web for Automatic Word Spacing
T2  - IEICE TRANSACTIONS on Information
SP  - 2553
EP  - 2556
AU  - Gumwon HONG
AU  - Jeong-Hoon LEE
AU  - Young-In SONG
AU  - Do-Gil LEE
AU  - Hae-Chang RIM
PY  - 2009
DO  - 10.1587/transinf.E92.D.2553
JO  - IEICE TRANSACTIONS on Information
SN  - 1745-1361
VL  - E92-D
IS  - 12
JA  - IEICE TRANSACTIONS on Information
Y1  - December 2009
AB  - This paper presents a new approach to word spacing problems by mining reliable words from the Web and using them as additional resources. Conventional approaches to automatic word spacing use noise-free data to train parameters for word spacing models. However, the insufficiency and irrelevancy of training examples are always the main bottleneck associated with automatic word spacing. To mitigate the data-sparseness problem, this paper proposes an algorithm to discover reliable words on the Web to expand the vocabularies and a model to utilize the words as additional resources. The proposed approach is very simple and practical to adapt to new domains. Experimental results show that the proposed approach achieves better performance compared to the conventional word spacing approaches.
ER  - 