The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations; for example, some numerals are expressed as "XNUMX".
Copyright notice
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Izumi SUZUKI, Yoshiki MIKAMI, Ario OHSATO, "Monotone Increasing Binary Similarity and Its Application to Automatic Document-Acquisition of a Category" in IEICE TRANSACTIONS on Information,
vol. E91-D, no. 11, pp. 2545-2551, November 2008, doi: 10.1093/ietisy/e91-d.11.2545.
Abstract: A technique is introduced that acquires documents belonging to the same category as a given short text. Treating the given text as a training document, the system marks the most similar document, or all sufficiently similar documents, in the document domain (or the entire Web). The system then adds the marked documents to the training set, learns the enlarged set, and repeats this process until no more documents are marked. Requiring the similarity to increase monotonically as the system learns enables it 1) to detect the correct point at which no documents remain to be marked and 2) to decide the threshold value that the classifier uses. In addition, when normalization is limited to dividing term weights by a p-norm of the weights, the linear classifier in which training documents are indexed in a binary manner is the only instance that satisfies the monotone increasing property. The feasibility of the proposed technique was confirmed by examining binary similarity on English and German documents randomly selected from the Web.
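The acquisition loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' method: the abstract does not give the exact weighting or threshold rule, so the sketch assumes (1) documents are indexed as binary term sets (whitespace tokens, lowercased), (2) the category is the union of terms from all training documents, (3) a candidate document's binary weights are divided by their p-norm, which for a binary vector is `len(doc) ** (1/p)`, and (4) a fixed, hypothetical `threshold` chosen by the caller (the paper instead derives the threshold from the monotone property). The names `tokenize`, `binary_similarity`, and `acquire` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set


def tokenize(text: str) -> FrozenSet[str]:
    # Hypothetical preprocessing: lowercase, split on whitespace, keep distinct
    # terms only (binary indexing: a term is either present or absent).
    return frozenset(text.lower().split())


def binary_similarity(doc: FrozenSet[str], category: Set[str], p: float = 2.0) -> float:
    # Assumed form: each document term has weight 1, divided by the p-norm of
    # the document's binary vector, i.e. len(doc) ** (1/p); the score is the
    # dot product with the binary category vector.
    if not doc:
        return 0.0
    return len(doc & category) / len(doc) ** (1.0 / p)


def acquire(seed_text: str, domain: Dict[str, str], threshold: float,
            p: float = 2.0) -> List[str]:
    docs = {doc_id: tokenize(text) for doc_id, text in domain.items()}
    category: Set[str] = set(tokenize(seed_text))  # training-set terms so far
    remaining = set(docs)
    acquired: List[str] = []
    while True:
        # Mark every remaining document that is "sufficiently similar".
        marked = sorted(d for d in remaining
                        if binary_similarity(docs[d], category, p) >= threshold)
        if not marked:
            break  # no more documents are marked, so the loop stops
        for d in marked:
            acquired.append(d)
            remaining.discard(d)
            # Learning only adds terms to the category, so under this
            # assumed similarity no document's score can decrease.
            category |= docs[d]
    return acquired


if __name__ == "__main__":
    domain = {
        "a": "monotone similarity for document acquisition",
        "b": "document acquisition from the web",
        "c": "web crawling at scale",
        "d": "recipes for chocolate cake",
    }
    print(acquire("monotone similarity", domain, threshold=0.8))
```

With the toy data, "a" is marked first, its terms pull in "b" on the next pass, and the loop stops once neither "c" nor "d" reaches the threshold. Because the category term set only grows, each document's score is non-decreasing across iterations in this sketch. That is the kind of monotone behavior the abstract relies on to know that no further documents will be marked.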
URL: https://global.ieice.org/en_transactions/information/10.1093/ietisy/e91-d.11.2545/_p
@ARTICLE{e91-d_11_2545,
author={Izumi SUZUKI and Yoshiki MIKAMI and Ario OHSATO},
journal={IEICE TRANSACTIONS on Information},
title={Monotone Increasing Binary Similarity and Its Application to Automatic Document-Acquisition of a Category},
year={2008},
volume={E91-D},
number={11},
pages={2545-2551},
abstract={A technique that acquires documents in the same category with a given short text is introduced. Regarding the given text as a training document, the system marks up the most similar document, or sufficiently similar documents, from among the document domain (or entire Web). The system then adds the marked documents to the training set to learn the set, and this process is repeated until no more documents are marked. Setting a monotone increasing property to the similarity as it learns enables the system to 1) detect the correct timing so that no more documents remain to be marked and to 2) decide the threshold value that the classifier uses. In addition, under the condition that the normalization process is limited to what term weights are divided by a p-norm of the weights, the linear classifier in which training documents are indexed in a binary manner is the only instance that satisfies the monotone increasing property. The feasibility of the proposed technique was confirmed through an examination of binary similarity and using English and German documents randomly selected from the Web.},
keywords={},
doi={10.1093/ietisy/e91-d.11.2545},
ISSN={1745-1361},
month={November},}
TY  - JOUR
TI  - Monotone Increasing Binary Similarity and Its Application to Automatic Document-Acquisition of a Category
T2  - IEICE TRANSACTIONS on Information
SP  - 2545
EP  - 2551
AU  - Izumi SUZUKI
AU  - Yoshiki MIKAMI
AU  - Ario OHSATO
PY  - 2008
DO  - 10.1093/ietisy/e91-d.11.2545
JO  - IEICE TRANSACTIONS on Information
SN  - 1745-1361
VL  - E91-D
IS  - 11
JA  - IEICE TRANSACTIONS on Information
Y1  - November 2008
AB  - A technique that acquires documents in the same category with a given short text is introduced. Regarding the given text as a training document, the system marks up the most similar document, or sufficiently similar documents, from among the document domain (or entire Web). The system then adds the marked documents to the training set to learn the set, and this process is repeated until no more documents are marked. Setting a monotone increasing property to the similarity as it learns enables the system to 1) detect the correct timing so that no more documents remain to be marked and to 2) decide the threshold value that the classifier uses. In addition, under the condition that the normalization process is limited to what term weights are divided by a p-norm of the weights, the linear classifier in which training documents are indexed in a binary manner is the only instance that satisfies the monotone increasing property. The feasibility of the proposed technique was confirmed through an examination of binary similarity and using English and German documents randomly selected from the Web.
ER  - 