Abstract

Stroke is a major global health concern, and preventing it requires understanding the factors that contribute to its occurrence. Age, body mass index (BMI), and average glucose level are regarded as critical factors in stroke etiology. This study applied exploratory data analysis (EDA) to investigate the relationships between variables in a stroke prediction dataset. The methodology comprised four stages: (1) dataset description, (2) data preprocessing, (3) exploratory data analysis, and (4) interpretation. Descriptive statistical analysis characterized the dataset's composition and variability, while preprocessing handled missing values and supported feature extraction. The exploratory analysis found significant relationships between stroke and age, hypertension, heart disease, and average glucose level, whereas BMI showed a weaker association with stroke. These findings contribute to a better understanding of stroke risk factors and may aid in developing more effective prevention strategies.
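
The four stages named in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy records are invented, the field names (`age`, `bmi`, `avg_glucose_level`, `stroke`) are hypothetical, and median imputation and the Pearson correlation stand in for whichever techniques the study actually used, which the abstract does not specify.

```python
from statistics import mean, median

# Invented toy records for illustration only; not the study's data.
records = [
    {"age": 67, "bmi": 36.6, "avg_glucose_level": 228.7, "stroke": 1},
    {"age": 80, "bmi": None, "avg_glucose_level": 105.9, "stroke": 1},
    {"age": 49, "bmi": 34.4, "avg_glucose_level": 171.2, "stroke": 1},
    {"age": 23, "bmi": 22.1, "avg_glucose_level": 85.3, "stroke": 0},
    {"age": 35, "bmi": 27.5, "avg_glucose_level": 92.0, "stroke": 0},
    {"age": 41, "bmi": None, "avg_glucose_level": 99.4, "stroke": 0},
]


def impute_median(rows, key):
    """Preprocessing: replace missing values of `key` with the observed median."""
    fill = median(r[key] for r in rows if r[key] is not None)
    return [{**r, key: fill if r[key] is None else r[key]} for r in rows]


def group_means(rows, key, target="stroke"):
    """Descriptive statistics: mean of `key` for each value of the target."""
    labels = sorted({r[target] for r in rows})
    return {lab: mean(r[key] for r in rows if r[target] == lab) for lab in labels}


def pearson(xs, ys):
    """Pearson correlation; with a 0/1 target this is the point-biserial r."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# Stage 2: handle missing BMI values (median imputation is an assumption).
clean = impute_median(records, "bmi")

# Stages 3 and 4: compare each feature across stroke groups and inspect
# its correlation with the stroke label.
target = [r["stroke"] for r in clean]
for feature in ("age", "bmi", "avg_glucose_level"):
    values = [r[feature] for r in clean]
    print(feature, group_means(clean, feature), round(pearson(values, target), 3))
```

On a real dataset the same comparisons would normally be paired with plots and significance tests; the toy numbers here carry no evidential weight.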

Keywords

Exploratory Data Analysis, Stroke, Descriptive Statistical Analysis, Risk Factor

Article Details

Author Biographies

Muhammad Ariful Furqon, Universitas Jember

Informatics, Universitas Jember

Nina Fadilah Najwa, Politeknik Caltex Riau

Information Systems, Politeknik Caltex Riau

Mohammad Zarkasi, Universitas Jember

Information Technology, Universitas Jember

Priza Pandunata, Universitas Jember

Information Technology, Universitas Jember

Gama Wisnu Fajariyanto, Universitas Jember

Informatics, Universitas Jember

How to Cite

Furqon, M. A., Najwa, N. F., Zarkasi, M., Pandunata, P., & Fajariyanto, G. W. (2024). Critical Exploratory Data Analysis on Stroke Prediction Dataset. Jurnal Komputer Terapan, 10(1), 67–77. https://doi.org/10.35143/jkt.v10i1.6307
