The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. ex. Some numerals are expressed as "XNUMX".
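In recent years, the number of spam emails has been increasing dramatically, and spam is recognized as a serious Internet threat. Most recent spam emails are sent by bots, which often operate together in the form of a botnet, and skillful spammers try to conceal their activities from spam analyzers and spam detection technology. In addition, most spam messages contain URLs that lure recipients to malicious Web servers in order to carry out various cyber attacks, such as malware infection and phishing. To cope with spam-based attacks, many efforts have been made to cluster spam emails based on the similarities between them. The spam clusters obtained in this way can be used to identify the infrastructure of spam-sending systems and malicious Web servers, to understand how they are grouped and correlated with each other, and to minimize the time needed to analyze Web pages. Therefore, it is very important to improve the accuracy of spam clustering as much as possible so that spam-based attacks can be analyzed more accurately. In this paper, we present an optimized spam clustering method, called O-means, based on the K-means clustering method, which is one of the most widely used clustering methods. By examining three weeks of spam collected at our SMTP server, we observed that the accuracy of the O-means clustering method is about 87%, which is superior to the previous clustering methods. In addition, we define 12 statistical features to compare the similarity between spam emails, and we determine a set of optimized features that makes the O-means clustering method more effective.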
Keywords: spam, clustering, feature, K-means clustering method
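The abstract states that O-means is built on the K-means clustering method but does not describe its optimizations or list its 12 statistical features. As a point of reference only, here is a minimal sketch of plain K-means (Lloyd's algorithm) over numeric feature vectors. It is not the authors' O-means; the function names, the two example features, and the data values are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def k_means(
    points: List[Vector], k: int, max_iter: int = 100, seed: int = 0
) -> Tuple[List[int], List[List[float]]]:
    """Plain Lloyd's K-means (not O-means); returns (labels, centroids)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    # Initialise centroids with k input points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments did not change, so the loop has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = mean(members)
    return labels, centroids


# Hypothetical 2-D feature vectors (e.g., number of URLs, body size in KB);
# illustrative values only, not the paper's 12 features or its spam data.
emails = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.1], [8.0, 9.0], [8.2, 9.1], [7.9, 8.8]]
labels, centroids = k_means(emails, k=2)
print(labels)
```

Because K-means starts from randomly chosen centroids, the numeric cluster IDs (and, on less well-separated data, the clusters themselves) can depend on `seed`; the optimizations that distinguish O-means are not reproduced here.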
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Jungsuk SONG, Daisuke INOUE, Masashi ETO, Hyung Chan KIM, Koji NAKAO, "O-means: An Optimized Clustering Method for Analyzing Spam Based Attacks" in IEICE TRANSACTIONS on Fundamentals,
vol. E94-A, no. 1, pp. 245-254, January 2011, doi: 10.1587/transfun.E94.A.245.
Abstract: In recent years, the number of spam emails has been dramatically increasing and spam is recognized as a serious internet threat. Most recent spam emails are being sent by bots which often operate with others in the form of a botnet, and skillful spammers try to conceal their activities from spam analyzers and spam detection technology. In addition, most spam messages contain URLs that lure spam receivers to malicious Web servers for the purpose of carrying out various cyber attacks such as malware infection, phishing attacks, etc. In order to cope with spam based attacks, there have been many efforts made towards the clustering of spam emails based on similarities between them. The spam clusters obtained from the clustering of spam emails can be used to identify the infrastructure of spam sending systems and malicious Web servers, and how they are grouped and correlate with each other, and to minimize the time needed for analyzing Web pages. Therefore, it is very important to improve the accuracy of the spam clustering as much as possible so as to analyze spam based attacks more accurately. In this paper, we present an optimized spam clustering method, called O-means, based on the K-means clustering method, which is one of the most widely used clustering methods. By examining three weeks of spam gathered in our SMTP server, we observed that the accuracy of the O-means clustering method is about 87% which is superior to the previous clustering methods. In addition, we define 12 statistical features to compare similarity between spam emails, and we determined a set of optimized features which makes the O-means clustering method more effective.
URL: https://global.ieice.org/en_transactions/fundamentals/10.1587/transfun.E94.A.245/_p
@ARTICLE{e94-a_1_245,
author={Jungsuk SONG and Daisuke INOUE and Masashi ETO and Hyung Chan KIM and Koji NAKAO},
journal={IEICE TRANSACTIONS on Fundamentals},
title={O-means: An Optimized Clustering Method for Analyzing Spam Based Attacks},
year={2011},
volume={E94-A},
number={1},
pages={245-254},
abstract={In recent years, the number of spam emails has been dramatically increasing and spam is recognized as a serious internet threat. Most recent spam emails are being sent by bots which often operate with others in the form of a botnet, and skillful spammers try to conceal their activities from spam analyzers and spam detection technology. In addition, most spam messages contain URLs that lure spam receivers to malicious Web servers for the purpose of carrying out various cyber attacks such as malware infection, phishing attacks, etc. In order to cope with spam based attacks, there have been many efforts made towards the clustering of spam emails based on similarities between them. The spam clusters obtained from the clustering of spam emails can be used to identify the infrastructure of spam sending systems and malicious Web servers, and how they are grouped and correlate with each other, and to minimize the time needed for analyzing Web pages. Therefore, it is very important to improve the accuracy of the spam clustering as much as possible so as to analyze spam based attacks more accurately. In this paper, we present an optimized spam clustering method, called O-means, based on the K-means clustering method, which is one of the most widely used clustering methods. By examining three weeks of spam gathered in our SMTP server, we observed that the accuracy of the O-means clustering method is about 87% which is superior to the previous clustering methods. In addition, we define 12 statistical features to compare similarity between spam emails, and we determined a set of optimized features which makes the O-means clustering method more effective.},
keywords={},
doi={10.1587/transfun.E94.A.245},
ISSN={1745-1337},
month={January},}
TY - JOUR
TI - O-means: An Optimized Clustering Method for Analyzing Spam Based Attacks
T2 - IEICE TRANSACTIONS on Fundamentals
SP - 245
EP - 254
AU - Jungsuk SONG
AU - Daisuke INOUE
AU - Masashi ETO
AU - Hyung Chan KIM
AU - Koji NAKAO
PY - 2011
DO - 10.1587/transfun.E94.A.245
JO - IEICE TRANSACTIONS on Fundamentals
SN - 1745-1337
VL - E94-A
IS - 1
JA - IEICE TRANSACTIONS on Fundamentals
Y1 - January 2011
AB - In recent years, the number of spam emails has been dramatically increasing and spam is recognized as a serious internet threat. Most recent spam emails are being sent by bots which often operate with others in the form of a botnet, and skillful spammers try to conceal their activities from spam analyzers and spam detection technology. In addition, most spam messages contain URLs that lure spam receivers to malicious Web servers for the purpose of carrying out various cyber attacks such as malware infection, phishing attacks, etc. In order to cope with spam based attacks, there have been many efforts made towards the clustering of spam emails based on similarities between them. The spam clusters obtained from the clustering of spam emails can be used to identify the infrastructure of spam sending systems and malicious Web servers, and how they are grouped and correlate with each other, and to minimize the time needed for analyzing Web pages. Therefore, it is very important to improve the accuracy of the spam clustering as much as possible so as to analyze spam based attacks more accurately. In this paper, we present an optimized spam clustering method, called O-means, based on the K-means clustering method, which is one of the most widely used clustering methods. By examining three weeks of spam gathered in our SMTP server, we observed that the accuracy of the O-means clustering method is about 87% which is superior to the previous clustering methods. In addition, we define 12 statistical features to compare similarity between spam emails, and we determined a set of optimized features which makes the O-means clustering method more effective.
ER -