Please use this identifier to cite or link to this item: http://localhost/handle/Hannan/210815
Title: An Ensemble Random Forest Algorithm for Insurance Big Data Analysis
Authors: Weiwei Lin;Ziming Wu;Longxin Lin;Angzhan Wen;Jin Li
Year: 2017
Publisher: IEEE
Abstract: Due to the imbalanced distribution of business data, missing user features, and many other reasons, directly using big data techniques on real-world business data tends to deviate from the business goals. It is difficult to model insurance business data with classification algorithms such as logistic regression and support vector machine (SVM). In this paper, we exploit a heuristic bootstrap sampling approach combined with an ensemble learning algorithm for large-scale insurance business data mining, and propose an ensemble random forest algorithm that uses the parallel computing capability and memory-cache mechanism optimized by Spark. We collected insurance business data from China Life Insurance Company to analyze potential customers using the proposed algorithm. We use F-Measure and G-mean to evaluate the performance of the algorithm. Experimental results show that the ensemble random forest algorithm outperformed SVM and other classification algorithms in both performance and accuracy on the imbalanced data, and that it is useful for improving the accuracy of product marketing compared to the traditional manual approach.
URI: http://localhost/handle/Hannan/210815
Volume: 5
Pages: 16568-16575
Appears in Collections: 2017

Files in This Item:
File          Size    Format
8007210.pdf   3.6 MB  Adobe PDF