The original paper is in English. Non-English content has been machine-translated and may contain typographical errors or mistranslations. ex. Some numerals are expressed as "XNUMX".
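When applying estimation methods, the issue of outliers is inevitable. Several studies have evaluated outlier elimination methods, but the extent of the outliers' influence has not been clarified. It is unclear whether we should always be sensitive to outliers, whether outliers should always be removed before estimation, and how much precaution is required when collecting project data. The goal of this study is therefore to provide a guideline that suggests how sensitively outliers should be handled. In the analysis, we experimentally added outliers to three datasets to analyze their influence. We varied the percentage of outliers, their extent (e.g., changing the actual effort from 100 to 200 person-hours when the extent was 100%), the variables containing outliers (e.g., adding outliers to function points or to effort), and the locations of the outliers in a dataset. Next, effort was estimated using these datasets. We used multiple linear regression analysis and analogy-based estimation to estimate the development effort. The experimental results indicate that the influence of outliers on estimation accuracy is non-trivial when the extent or percentage of outliers is considerable (i.e., 100% and 20%, respectively). In contrast, their influence is negligible when the extent and percentage are small (i.e., 50% and 10%, respectively). Moreover, in some cases, linear regression analysis was less affected by outliers than analogy-based estimation.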
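The experimental setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the regression uses a single predictor rather than multiple linear regression, and the analogy-based estimator is simplified to the mean effort of the k projects closest in size. The function names (`inject_outliers`, `fit_line`, `analogy_estimate`, `mean_abs_error`) and the choice of mean absolute error are assumptions made for illustration. The only detail taken from the abstract is that an extent of 100% turns an effort of 100 person-hours into 200.

```python
import random


def inject_outliers(values, percentage, extent, seed=0):
    """Return a copy of `values` with a fraction `percentage` of entries
    inflated by `extent` (extent=1.0 means +100%, so 100 becomes 200)."""
    rng = random.Random(seed)
    n_outliers = round(len(values) * percentage)
    chosen = set(rng.sample(range(len(values)), n_outliers))
    return [v * (1.0 + extent) if i in chosen else v
            for i, v in enumerate(values)]


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x (single predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def analogy_estimate(xs, ys, x_new, k=3):
    """Simplified analogy-based estimate: mean effort of the k projects
    whose size is closest to x_new."""
    nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - x_new))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def mean_abs_error(predictions, actuals):
    """Mean absolute difference between predictions and actual values."""
    return sum(abs(p - a) for p, a in zip(predictions, actuals)) / len(actuals)


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical projects: size in function points, effort in person-hours.
    sizes = [rng.uniform(50, 500) for _ in range(60)]
    efforts = [10 * s * rng.uniform(0.8, 1.2) for s in sizes]
    train_x, test_x = sizes[:40], sizes[40:]
    train_y, test_y = efforts[:40], efforts[40:]

    # Clean baseline, then the "small" and "considerable" settings
    # named in the abstract (percentage, extent).
    for percentage, extent in [(0.0, 0.0), (0.1, 0.5), (0.2, 1.0)]:
        noisy_y = inject_outliers(train_y, percentage, extent)
        intercept, slope = fit_line(train_x, noisy_y)
        regression = [intercept + slope * x for x in test_x]
        analogy = [analogy_estimate(train_x, noisy_y, x) for x in test_x]
        print(f"pct={percentage:.0%} extent={extent:.0%} "
              f"regression MAE={mean_abs_error(regression, test_y):.1f} "
              f"analogy MAE={mean_abs_error(analogy, test_y):.1f}")
```

Outliers are injected only into the training efforts, so the test projects stay clean and the error reflects how much the contaminated history distorts each estimator. Results on this toy data say nothing about the three datasets used in the study.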
Kenichi ONO
Nara Institute of Science and Technology
Masateru TSUNODA
Kindai University
Akito MONDEN
Okayama University
Kenichi MATSUMOTO
Nara Institute of Science and Technology
The copyright of the original papers published on this site belongs to IEICE. Unauthorized use of the original or translated papers is prohibited. See IEICE Provisions on Copyright for details.
Kenichi ONO, Masateru TSUNODA, Akito MONDEN, Kenichi MATSUMOTO, "Influence of Outliers on Estimation Accuracy of Software Development Effort" in IEICE TRANSACTIONS on Information, vol. E104-D, no. 1, pp. 91-105, January 2021, doi: 10.1587/transinf.2020MPP0005.
Abstract: When applying estimation methods, the issue of outliers is inevitable. The extent of their influence has not been clarified, though several studies have evaluated outlier elimination methods. It is unclear whether we should always be sensitive to outliers, whether outliers should always be removed before estimation, and what amount of precaution is required for collecting project data. Therefore, the goal of this study is to illustrate a guideline that suggests how sensitively we should handle outliers. In the analysis, we experimentally add outliers to three datasets, to analyze their influence. We modified the percentage of outliers, their extent (e.g., we varied the actual effort from 100 to 200 person-hours when the extent was 100%), the variables including outliers (e.g., adding outliers to function points or effort), and the locations of outliers in a dataset. Next, the effort was estimated using these datasets. We used multiple linear regression analysis and analogy based estimation to estimate the development effort. The experimental results indicate that the influence of outliers on the estimation accuracy is non-trivial when the extent or percentage of outliers is considerable (i.e., 100% and 20%, respectively). In contrast, their influence is negligible when the extent and percentage are small (i.e., 50% and 10%, respectively). Moreover, in some cases, the linear regression analysis was less affected by outliers than analogy based estimation.
URL: https://global.ieice.org/en_transactions/information/10.1587/transinf.2020MPP0005/_p
@ARTICLE{e104-d_1_91,
author={Kenichi ONO and Masateru TSUNODA and Akito MONDEN and Kenichi MATSUMOTO},
journal={IEICE TRANSACTIONS on Information},
title={Influence of Outliers on Estimation Accuracy of Software Development Effort},
year={2021},
volume={E104-D},
number={1},
pages={91-105},
abstract={When applying estimation methods, the issue of outliers is inevitable. The extent of their influence has not been clarified, though several studies have evaluated outlier elimination methods. It is unclear whether we should always be sensitive to outliers, whether outliers should always be removed before estimation, and what amount of precaution is required for collecting project data. Therefore, the goal of this study is to illustrate a guideline that suggests how sensitively we should handle outliers. In the analysis, we experimentally add outliers to three datasets, to analyze their influence. We modified the percentage of outliers, their extent (e.g., we varied the actual effort from 100 to 200 person-hours when the extent was 100%), the variables including outliers (e.g., adding outliers to function points or effort), and the locations of outliers in a dataset. Next, the effort was estimated using these datasets. We used multiple linear regression analysis and analogy based estimation to estimate the development effort. The experimental results indicate that the influence of outliers on the estimation accuracy is non-trivial when the extent or percentage of outliers is considerable (i.e., 100% and 20%, respectively). In contrast, their influence is negligible when the extent and percentage are small (i.e., 50% and 10%, respectively). Moreover, in some cases, the linear regression analysis was less affected by outliers than analogy based estimation.},
keywords={},
doi={10.1587/transinf.2020MPP0005},
ISSN={1745-1361},
month={January},}
TY - JOUR
TI - Influence of Outliers on Estimation Accuracy of Software Development Effort
T2 - IEICE TRANSACTIONS on Information
SP - 91
EP - 105
AU - Kenichi ONO
AU - Masateru TSUNODA
AU - Akito MONDEN
AU - Kenichi MATSUMOTO
PY - 2021
DO - 10.1587/transinf.2020MPP0005
JO - IEICE TRANSACTIONS on Information
SN - 1745-1361
VL - E104-D
IS - 1
JA - IEICE TRANSACTIONS on Information
Y1 - January 2021
AB - When applying estimation methods, the issue of outliers is inevitable. The extent of their influence has not been clarified, though several studies have evaluated outlier elimination methods. It is unclear whether we should always be sensitive to outliers, whether outliers should always be removed before estimation, and what amount of precaution is required for collecting project data. Therefore, the goal of this study is to illustrate a guideline that suggests how sensitively we should handle outliers. In the analysis, we experimentally add outliers to three datasets, to analyze their influence. We modified the percentage of outliers, their extent (e.g., we varied the actual effort from 100 to 200 person-hours when the extent was 100%), the variables including outliers (e.g., adding outliers to function points or effort), and the locations of outliers in a dataset. Next, the effort was estimated using these datasets. We used multiple linear regression analysis and analogy based estimation to estimate the development effort. The experimental results indicate that the influence of outliers on the estimation accuracy is non-trivial when the extent or percentage of outliers is considerable (i.e., 100% and 20%, respectively). In contrast, their influence is negligible when the extent and percentage are small (i.e., 50% and 10%, respectively). Moreover, in some cases, the linear regression analysis was less affected by outliers than analogy based estimation.
ER -