Main Article Content

Abstract

Class imbalance is a significant challenge in machine learning and often degrades model performance. It is common in real-world data, where the majority class greatly outnumbers the minority class. A common remedy is oversampling, which balances the class distribution by adding synthetic samples to the minority class. The most popular oversampling technique is the Synthetic Minority Oversampling Technique (SMOTE), but it can produce synthetic data with limited diversity and may generate outliers. As an alternative, this study proposes combining Latin Hypercube Sampling (LHS) with k-Nearest Neighbors (k-NN) to improve classification performance on imbalanced datasets. The combination of LHS and k-NN is expected to produce higher-quality synthetic data and thereby improve the performance of classification models, as evaluated using the confusion matrix. The data come from online repositories such as KEEL, Kaggle, and UCI, as well as data on the specializations of vocational high school (SMK) students in Pekanbaru.
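
The abstract does not spell out how LHS and k-NN are combined, so the following is only a minimal sketch of one plausible reading, not the authors' algorithm. It assumes that candidate points are drawn by LHS inside the bounding box of the minority class and that a candidate is kept only if k-NN on the original data assigns it to the minority class. The function names (`latin_hypercube`, `knn_label`, `lhs_knn_oversample`) and the parameters `k`, `seed`, and `max_rounds` are hypothetical and chosen for illustration.

```python
import math
import random


def latin_hypercube(n_samples, n_dims, rng):
    """Return n_samples points in [0, 1)^n_dims with one point per stratum
    along every dimension (the defining property of LHS)."""
    columns = []
    for _ in range(n_dims):
        # One uniform draw inside each of the n_samples equal-width strata.
        column = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(column)  # pair strata randomly across dimensions
        columns.append(column)
    return [[columns[d][i] for d in range(n_dims)] for i in range(n_samples)]


def knn_label(point, X, y, k):
    """Majority label among the k nearest rows of X (Euclidean distance).
    Use an odd k with two classes to avoid tied votes."""
    nearest = sorted(range(len(X)), key=lambda i: math.dist(point, X[i]))[:k]
    votes = [y[i] for i in nearest]
    return max(set(votes), key=votes.count)


def lhs_knn_oversample(X, y, minority, n_new, k=3, seed=0, max_rounds=50):
    """Assumed scheme: LHS over the minority bounding box, filtered by k-NN."""
    rng = random.Random(seed)
    minority_rows = [x for x, label in zip(X, y) if label == minority]
    n_dims = len(minority_rows[0])
    lows = [min(r[d] for r in minority_rows) for d in range(n_dims)]
    highs = [max(r[d] for r in minority_rows) for d in range(n_dims)]

    accepted = []
    for _ in range(max_rounds):
        for u in latin_hypercube(n_new, n_dims, rng):
            # Rescale the unit-cube point to the minority bounding box.
            candidate = [lows[d] + u[d] * (highs[d] - lows[d])
                         for d in range(n_dims)]
            # Keep only candidates that k-NN places in the minority class.
            if knn_label(candidate, X, y, k) == minority:
                accepted.append(candidate)
                if len(accepted) == n_new:
                    return accepted
    return accepted  # may hold fewer than n_new points if the filter is strict
```

Calling `lhs_knn_oversample(X, y, minority=1, n_new=5)` returns up to five synthetic minority rows that can be appended to the training set before fitting a classifier. The k-NN filter is a design choice aimed at the outlier problem mentioned for SMOTE: a candidate that falls among majority samples is discarded instead of being added.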

Keywords

Imbalanced Class; Classification Performance; k-Nearest Neighbors; Latin Hypercube Sampling; Oversampling

Article Details

Author Biographies

Sapriadi Sapriadi, Institut Kesehatan Helvetia

Undergraduate (S1) Pharmacy program, Institut Kesehatan Helvetia; Information Systems Study Program, STMIK Logika Medan

Mardiah Nasution, Institut Kesehatan Helvetia

Institut Kesehatan Helvetia; Information Systems Study Program, STMIK Logika Medan

How to Cite

Sapriadi, S., & Nasution, M. (2024). OVERSAMPLING MENGGUNAKAN PENDEKATAN LATIN HYPERCUBE SAMPLING DAN K-NEAREST NEIGHBORS UNTUK MENINGKATKAN KINERJA KLASIFIKASI. Jurnal Komputer Terapan, 10(2), 98–110. Retrieved from https://jurnal.pcr.ac.id/index.php/jkt/article/view/6389
