Main Article Content

Abstract

Company bankruptcy is a serious problem because it can cause economic damage and wider social consequences. Predicting bankruptcy as early as possible is important because the prediction supports evaluation and planning to avoid it. Bankruptcy prediction is an imbalanced classification problem: the bankrupt class has far fewer samples than the non-bankrupt class. This study aims to produce a good classification model for predicting bankruptcy. Resampling that combines SMOTE and under sampling is applied to the training data to obtain a better classification model. The classification methods used are multilayer perceptron and complement naïve Bayes. Predictive performance was measured using recall, ROC AUC, and PR AUC. In the tests, SMOTE combined with under sampling substantially improved the multilayer perceptron model. Resampling also increased the recall and PR AUC scores of complement naïve Bayes. The best recall obtained was 95.45%, with complement naïve Bayes. The highest ROC AUC with resampling, 87.80%, was also obtained with complement naïve Bayes. It is therefore concluded that resampling with SMOTE and under sampling can produce good performance in detecting bankruptcy.
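The resampling step described in the abstract can be illustrated with a minimal sketch in pure Python. It is not the authors' implementation: the abstract does not state the sampling ratios, the number of neighbours, or the software used, so `over_ratio`, `under_ratio`, `k`, the helper names, the toy data, and the convention that label 1 means "bankrupt" are all assumptions made for illustration. SMOTE is shown in its basic form (interpolating between a minority sample and one of its nearest minority neighbours), and under sampling is shown as plain random selection from the majority class.

```python
import math
import random


def smote(minority, n_new, k=5, rng=None):
    """Basic SMOTE: build n_new synthetic points, each interpolated between
    a minority sample and one of its k nearest minority-class neighbours.
    Needs at least two minority samples."""
    rng = rng or random.Random(0)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position on the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def resample(X, y, over_ratio=0.5, under_ratio=1.0, k=5, seed=0):
    """Hypothetical helper: SMOTE followed by random under sampling.

    over_ratio  -- assumed minority/majority ratio reached by SMOTE
    under_ratio -- assumed minority/majority ratio after under sampling
    """
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == 1]  # 1 = bankrupt (assumed)
    majority = [x for x, label in zip(X, y) if label == 0]

    # Step 1: oversample the minority class with synthetic points.
    target = max(len(minority), int(over_ratio * len(majority)))
    minority = minority + smote(minority, target - len(minority), k, rng)

    # Step 2: randomly drop majority samples until the ratio is reached.
    n_keep = min(len(majority), int(len(minority) / under_ratio))
    majority = rng.sample(majority, n_keep)

    X_res = minority + majority
    y_res = [1] * len(minority) + [0] * len(majority)
    return X_res, y_res


# Toy training set (made-up data): 200 non-bankrupt and 10 bankrupt companies,
# each described by two invented features.
data_rng = random.Random(42)
X_train = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
X_train += [(data_rng.gauss(3, 1), data_rng.gauss(3, 1)) for _ in range(10)]
y_train = [0] * 200 + [1] * 10

X_res, y_res = resample(X_train, y_train)
print("bankrupt:", sum(y_res), "non-bankrupt:", len(y_res) - sum(y_res))
```

With these assumed ratios the toy training set goes from 10 bankrupt and 200 non-bankrupt samples to 100 of each. As in the study, only the training data would be resampled; the test data keeps its original class distribution so that recall, ROC AUC, and PR AUC reflect the real imbalance.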

Keywords

imbalanced dataset, bankruptcy prediction, SMOTE, under sampling, multilayer perceptron, complement naive Bayes

Article Details

How to Cite
Sabilla, W. I., & Bella Vista, C. (2021). Implementation of SMOTE and Under Sampling on Imbalanced Datasets for Predicting Company Bankruptcy. Jurnal Komputer Terapan, 7(2), 329–339. https://doi.org/10.35143/jkt.v7i2.5027
