Please use this identifier to cite or link to this item: http://localhost/handle/Hannan/220196
Title: A Noise-Filtered Under-Sampling Scheme for Imbalanced Classification
Authors: Qi Kang;XiaoShuang Chen;SiSi Li;MengChu Zhou
Year: 2017
Publisher: IEEE
Abstract: Under-sampling is a popular data preprocessing method in dealing with class imbalance problems, with the purposes of balancing datasets to achieve a high classification rate and avoiding the bias toward majority class examples. It always uses full minority data in a training dataset. However, some noisy minority examples may reduce the performance of classifiers. In this paper, a new under-sampling scheme is proposed by incorporating a noise filter before executing resampling. In order to verify the efficiency, this scheme is implemented based on four popular under-sampling methods, i.e., Undersampling + Adaboost, RUSBoost, UnderBagging, and EasyEnsemble through benchmarks and significance analysis. Furthermore, this paper also summarizes the relationship between algorithm performance and imbalanced ratio. Experimental results indicate that the proposed scheme can improve the original undersampling-based methods with significance in terms of three popular metrics for imbalanced classification, i.e., the area under the curve, F-measure, and G-mean.
URI: http://localhost/handle/Hannan/220196
volume: 47
issue: 12
Pages: 4263-4274
Appears in Collections: 2017

Files in This Item:
File          Size     Format
7589046.pdf   3.12 MB  Adobe PDF
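
The abstract describes the scheme only at a high level: filter noisy minority examples first, then under-sample the majority class. The sketch below illustrates that order of operations and is not the authors' algorithm. The nearest-neighbour filter rule, the function names `filter_noisy_minority` and `random_under_sample`, the parameter `k`, and the toy data are all assumptions made for illustration; the paper's actual filter and its ensemble variants (RUSBoost, UnderBagging, EasyEnsemble) are not reproduced here.

```python
import random
from math import dist


def filter_noisy_minority(X, y, minority_label=1, k=3):
    """Drop minority points whose k nearest neighbours are all majority.

    Hypothetical filter rule, chosen only to illustrate the idea of
    cleaning the minority class before resampling.
    """
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != minority_label:
            keep.append(i)  # majority points are never filtered here
            continue
        others = [j for j in range(len(X)) if j != i]
        neighbours = sorted(others, key=lambda j: dist(xi, X[j]))[:k]
        if any(y[j] == minority_label for j in neighbours):
            keep.append(i)  # at least one minority neighbour: keep it
    return [X[i] for i in keep], [y[i] for i in keep]


def random_under_sample(X, y, minority_label=1, seed=0):
    """Keep all minority points and an equally sized random majority subset."""
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == minority_label]
    majority = [i for i, label in enumerate(y) if label != minority_label]
    sampled = rng.sample(majority, min(len(majority), len(minority)))
    keep = sorted(minority + sampled)
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data: six majority points near the origin, three minority points
# near (10, 10), and one minority point sitting inside the majority cluster.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0.2, 0.8),
     (10, 10), (10, 11), (11, 10), (0.4, 0.4)]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

X_clean, y_clean = filter_noisy_minority(X, y)          # noise filter first
X_bal, y_bal = random_under_sample(X_clean, y_clean)    # then under-sample
```

On this toy data the filter removes the isolated minority point at (0.4, 0.4), so the balanced set is built from three minority and three majority examples instead of four of each.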