Copyright notice
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Citation:
Kunihiko SADAKANE, Hiroshi IMAI, "Fast Algorithms for k-Word Proximity Search" in IEICE TRANSACTIONS on Fundamentals,
vol. E84-A, no. 9, pp. 2311-2318, September 2001.
Abstract: When we search a huge collection of documents, we often specify several keywords and use conjunctive queries to narrow the search results. Although the retrieved documents contain all the keywords, the positions of the keywords are usually not considered, so the results include some meaningless documents. It is therefore effective to rank documents according to the proximity of the keywords within them. This ranking can be regarded as a kind of text data mining. In this paper, we propose two algorithms for finding documents in which all given keywords appear in neighboring positions. One is based on a plane-sweep algorithm and the other on a divide-and-conquer approach. Both algorithms run in O(n log n) time, where n is the number of occurrences of the given keywords. We run the algorithms on a large collection of HTML files and verify their effectiveness.
URL: https://global.ieice.org/en_transactions/fundamentals/10.1587/e84-a_9_2311/_p
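The abstract describes a plane-sweep approach to k-word proximity: given the positions of every keyword occurrence in a document, find the tightest region containing all k keywords. The paper's exact procedure is not reproduced here; the following is a minimal, hypothetical sketch of the standard plane-sweep idea, where a heap repeatedly advances the leftmost occurrence while tracking the smallest window seen. This variant runs in O(n log k) for n total occurrences; the paper states an O(n log n) bound for its algorithms.

```python
import heapq

def min_cover_window(occurrences):
    """Plane-sweep sketch (illustrative, not the authors' code).

    occurrences: one sorted list of positions per keyword, all non-empty.
    Returns (lo, hi), the smallest window containing at least one
    occurrence of every keyword.
    """
    # Heap entries: (position, keyword index, index into that keyword's list).
    heap = [(lst[0], kw, 0) for kw, lst in enumerate(occurrences)]
    heapq.heapify(heap)
    hi = max(lst[0] for lst in occurrences)  # rightmost of the current picks
    best = (heap[0][0], hi)                  # initial window over first picks
    while True:
        pos, kw, idx = heapq.heappop(heap)   # leftmost occurrence in window
        if hi - pos < best[1] - best[0]:
            best = (pos, hi)                 # tighter window found
        if idx + 1 == len(occurrences[kw]):
            return best                      # keyword kw is exhausted
        nxt = occurrences[kw][idx + 1]       # slide this keyword rightward
        hi = max(hi, nxt)
        heapq.heappush(heap, (nxt, kw, idx + 1))
```

For example, with occurrence lists [1, 10], [4, 12], [5, 20] for three keywords, the tightest window is (1, 5). Ranking documents by the size of this window (smaller is better) implements the proximity ranking the abstract motivates.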
BibTeX:
@ARTICLE{e84-a_9_2311,
author={Kunihiko SADAKANE and Hiroshi IMAI},
journal={IEICE TRANSACTIONS on Fundamentals},
title={Fast Algorithms for k-Word Proximity Search},
year={2001},
volume={E84-A},
number={9},
pages={2311-2318},
abstract={When we search a huge collection of documents, we often specify several keywords and use conjunctive queries to narrow the search results. Although the retrieved documents contain all the keywords, the positions of the keywords are usually not considered, so the results include some meaningless documents. It is therefore effective to rank documents according to the proximity of the keywords within them. This ranking can be regarded as a kind of text data mining. In this paper, we propose two algorithms for finding documents in which all given keywords appear in neighboring positions. One is based on a plane-sweep algorithm and the other on a divide-and-conquer approach. Both algorithms run in O(n log n) time, where n is the number of occurrences of the given keywords. We run the algorithms on a large collection of HTML files and verify their effectiveness.},
month={September},
}
RIS:
TY - JOUR
TI - Fast Algorithms for k-Word Proximity Search
T2 - IEICE TRANSACTIONS on Fundamentals
SP - 2311
EP - 2318
AU - Kunihiko SADAKANE
AU - Hiroshi IMAI
PY - 2001
JO - IEICE TRANSACTIONS on Fundamentals
VL - E84-A
IS - 9
JA - IEICE TRANSACTIONS on Fundamentals
Y1 - September 2001
AB - When we search a huge collection of documents, we often specify several keywords and use conjunctive queries to narrow the search results. Although the retrieved documents contain all the keywords, the positions of the keywords are usually not considered, so the results include some meaningless documents. It is therefore effective to rank documents according to the proximity of the keywords within them. This ranking can be regarded as a kind of text data mining. In this paper, we propose two algorithms for finding documents in which all given keywords appear in neighboring positions. One is based on a plane-sweep algorithm and the other on a divide-and-conquer approach. Both algorithms run in O(n log n) time, where n is the number of occurrences of the given keywords. We run the algorithms on a large collection of HTML files and verify their effectiveness.
ER -