The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. ex. Some numerals are expressed as "XNUMX".
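Multi-slot information extraction, also known as frame extraction, is the task of identifying several related entities simultaneously. Most research on this task applies IE patterns (rules) to extract related entities from unstructured documents. A major obstacle is not knowing where the text portions containing the information of interest are located. The problem is harder for languages with ambiguous sentence boundaries, such as Thai. Applying IE rules to every reasonable text portion reduces the effect of this obstacle, but it introduces another problem: incorrect (unwanted) extractions. This paper presents a method for removing these incorrect extractions. In the method, extractions are represented as intuitionistic fuzzy sets (IFSs), and a similarity measure for IFSs is used to compute the distance between the IFS of an unclassified extraction and that of each already-classified extraction. The k-nearest-neighbor concept is then used to decide whether the unclassified extraction is correct. In experiments on various domains, the proposed technique improved extraction precision while satisfactorily preserving recall.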
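The filtering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which IFS similarity measure or which features are used, so this sketch assumes a normalized Hamming distance over membership, non-membership, and hesitation degrees, and a simple majority vote. The names `ifs_distance` and `knn_is_correct`, the feature encoding, and the default `k` are all hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# Assumed encoding: an IFS over n features is a sequence of
# (membership, non_membership) pairs with membership + non_membership <= 1.
IFS = Sequence[Tuple[float, float]]


def ifs_distance(a: IFS, b: IFS) -> float:
    """Normalized Hamming distance in [0, 1], including the hesitation term.

    This measure is a stand-in; the paper's own measure is not given here.
    """
    if len(a) != len(b) or not a:
        raise ValueError("IFSs must be non-empty and of equal length")
    total = 0.0
    for (mu_a, nu_a), (mu_b, nu_b) in zip(a, b):
        pi_a = 1.0 - mu_a - nu_a  # hesitation degree of a
        pi_b = 1.0 - mu_b - nu_b  # hesitation degree of b
        total += abs(mu_a - mu_b) + abs(nu_a - nu_b) + abs(pi_a - pi_b)
    return total / (2.0 * len(a))


def knn_is_correct(query: IFS, labeled: List[Tuple[IFS, bool]], k: int = 3) -> bool:
    """Majority vote among the k labeled extractions closest to `query`."""
    nearest = sorted(labeled, key=lambda item: ifs_distance(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes[True] > votes[False]
```

In use, `labeled` would hold extractions already judged correct (`True`) or incorrect (`False`), and an unclassified extraction would be kept only when `knn_is_correct` returns `True`. Ties count as incorrect in this sketch, which favors precision; that tie-breaking rule is also an assumption.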
Peerasak INTARAPAIBOON
Thammasat University
Thanaruk THEERAMUNKONG
Thammasat University
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Peerasak INTARAPAIBOON, Thanaruk THEERAMUNKONG, "An Application of Intuitionistic Fuzzy Sets to Improve Information Extraction from Thai Unstructured Text" in IEICE TRANSACTIONS on Information,
vol. E101-D, no. 9, pp. 2334-2345, September 2018, doi: 10.1587/transinf.2017EDP7423.
Abstract: Multi-slot information extraction, also known as frame extraction, is a task that identify several related entities simultaneously. Most researches on this task are concerned with applying IE patterns (rules) to extract related entities from unstructured documents. An important obstacle for the success in this task is unknowing where text portions containing interested information are. This problem is more complicated when involving languages with sentence boundary ambiguity, e.g. the Thai language. Applying IE rules to all reasonable text portions can degrade the effect of this obstacle, but it raises another problem that is incorrect (unwanted) extractions. This paper aims to present a method for removing these incorrect extractions. In the method, extractions are represented as intuitionistic fuzzy sets, and a similarity measure for IFSs is used to calculate distance between IFS of an unclassified extraction and that of each already-classified extraction. The concept of k nearest neighbor is adopted to design whether the unclassified extraction is correct or not. From the experiment on various domains, the proposed technique improves extraction precision while satisfactorily preserving recall.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2017EDP7423/_p
@ARTICLE{e101-d_9_2334,
author={Peerasak INTARAPAIBOON and Thanaruk THEERAMUNKONG},
journal={IEICE TRANSACTIONS on Information},
title={An Application of Intuitionistic Fuzzy Sets to Improve Information Extraction from Thai Unstructured Text},
year={2018},
volume={E101-D},
number={9},
pages={2334-2345},
abstract={Multi-slot information extraction, also known as frame extraction, is a task that identify several related entities simultaneously. Most researches on this task are concerned with applying IE patterns (rules) to extract related entities from unstructured documents. An important obstacle for the success in this task is unknowing where text portions containing interested information are. This problem is more complicated when involving languages with sentence boundary ambiguity, e.g. the Thai language. Applying IE rules to all reasonable text portions can degrade the effect of this obstacle, but it raises another problem that is incorrect (unwanted) extractions. This paper aims to present a method for removing these incorrect extractions. In the method, extractions are represented as intuitionistic fuzzy sets, and a similarity measure for IFSs is used to calculate distance between IFS of an unclassified extraction and that of each already-classified extraction. The concept of k nearest neighbor is adopted to design whether the unclassified extraction is correct or not. From the experiment on various domains, the proposed technique improves extraction precision while satisfactorily preserving recall.},
keywords={},
doi={10.1587/transinf.2017EDP7423},
ISSN={1745-1361},
month={September},}
TY - JOUR
TI - An Application of Intuitionistic Fuzzy Sets to Improve Information Extraction from Thai Unstructured Text
T2 - IEICE TRANSACTIONS on Information
SP - 2334
EP - 2345
AU - Peerasak INTARAPAIBOON
AU - Thanaruk THEERAMUNKONG
PY - 2018
DO - 10.1587/transinf.2017EDP7423
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E101-D
IS - 9
JA - IEICE TRANSACTIONS on Information
Y1 - September 2018
AB - Multi-slot information extraction, also known as frame extraction, is a task that identify several related entities simultaneously. Most researches on this task are concerned with applying IE patterns (rules) to extract related entities from unstructured documents. An important obstacle for the success in this task is unknowing where text portions containing interested information are. This problem is more complicated when involving languages with sentence boundary ambiguity, e.g. the Thai language. Applying IE rules to all reasonable text portions can degrade the effect of this obstacle, but it raises another problem that is incorrect (unwanted) extractions. This paper aims to present a method for removing these incorrect extractions. In the method, extractions are represented as intuitionistic fuzzy sets, and a similarity measure for IFSs is used to calculate distance between IFS of an unclassified extraction and that of each already-classified extraction. The concept of k nearest neighbor is adopted to design whether the unclassified extraction is correct or not. From the experiment on various domains, the proposed technique improves extraction precision while satisfactorily preserving recall.
ER -