Critical Exploratory Data Analysis on Stroke Prediction Dataset

Muhammad Arif Ariful Furqon; Nina Fadilah Najwa; Mohamad Zarkasi; Priza Pandunata; Gama Wisnu Fajariyanto

doi:10.35143/jkt.v10i1.6307

Submitted: 2 May 2024

Accepted: 9 May 2024

Published: 14 June 2024

Abstract

Stroke is a significant global health concern, requiring an in-depth understanding of the complex factors contributing to its occurrence. Age, body mass index (BMI), and average glucose levels are critical factors in stroke etiology. This study employed exploratory data analysis techniques to investigate the relationships between variables in a stroke prediction dataset. The research methodology included (1) dataset description, (2) data preprocessing, (3) exploratory data analysis, and (4) interpretation. Descriptive statistical analysis provided insights into the dataset's composition and variability, while data preprocessing techniques handled missing values and facilitated feature extraction. The exploratory data analysis found significant relationships between stroke and age, hypertension, heart disease, and average glucose levels. BMI, by contrast, showed a weaker relationship with stroke. These findings contribute to a better understanding of the factors contributing to stroke risk and may aid in developing more effective prevention strategies.
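The four-step workflow named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the records are hypothetical toy values, the column names are assumed, and median imputation and Pearson correlation are stand-ins chosen for illustration, since the abstract does not state which imputation or association measures the study used.

```python
from statistics import mean, median, pstdev

# Hypothetical toy records (illustrative values only, not the study's dataset).
# Column names are assumptions made for this sketch.
records = [
    {"age": 67, "hypertension": 0, "heart_disease": 1, "avg_glucose_level": 228.7, "bmi": 36.6, "stroke": 1},
    {"age": 61, "hypertension": 0, "heart_disease": 0, "avg_glucose_level": 202.2, "bmi": None, "stroke": 1},
    {"age": 80, "hypertension": 1, "heart_disease": 1, "avg_glucose_level": 105.9, "bmi": 32.5, "stroke": 1},
    {"age": 25, "hypertension": 0, "heart_disease": 0, "avg_glucose_level": 85.3, "bmi": 22.1, "stroke": 0},
    {"age": 34, "hypertension": 0, "heart_disease": 0, "avg_glucose_level": 92.1, "bmi": 24.3, "stroke": 0},
    {"age": 45, "hypertension": 1, "heart_disease": 0, "avg_glucose_level": 110.4, "bmi": 27.8, "stroke": 0},
    {"age": 52, "hypertension": 0, "heart_disease": 0, "avg_glucose_level": 99.0, "bmi": 30.1, "stroke": 0},
    {"age": 30, "hypertension": 0, "heart_disease": 0, "avg_glucose_level": 78.5, "bmi": 21.0, "stroke": 0},
]


def impute_median(rows, column):
    """Preprocessing: return copies of rows with missing values in `column` set to the observed median."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = median(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def describe(rows, column):
    """Dataset description: basic descriptive statistics for one numeric column."""
    values = [r[column] for r in rows]
    return {
        "mean": mean(values),
        "median": median(values),
        "std": pstdev(values),  # population standard deviation
        "min": min(values),
        "max": max(values),
    }


def pearson(rows, x, y):
    """Exploratory analysis: Pearson correlation between columns x and y."""
    xs = [r[x] for r in rows]
    ys = [r[y] for r in rows]
    mx, my = mean(xs), mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = sum((a - mx) ** 2 for a in xs) ** 0.5
    sy = sum((b - my) ** 2 for b in ys) ** 0.5
    return cov / (sx * sy)


clean = impute_median(records, "bmi")
summary = {col: describe(clean, col) for col in ("age", "avg_glucose_level", "bmi")}
features = ("age", "hypertension", "heart_disease", "avg_glucose_level", "bmi")
associations = {col: pearson(clean, col, "stroke") for col in features}

# Interpretation: list features by strength of association with the stroke label.
for col, r in sorted(associations.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{col}: r = {r:.2f}")
```

On data this small the printed correlations carry no meaning; the sketch only shows the order of the steps. With a binary outcome such as stroke, group-wise comparisons or a chi-square test for the categorical features would be reasonable alternatives to a correlation coefficient.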

Keywords

Exploratory Data Analysis; Stroke; Statistical Descriptive Analysis; Risk Factor

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Copyright info for authors

1. Authors hold the copyright in any process, procedure, or article described in the work and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.

2. Authors retain publishing rights to reuse all or a portion of the work in a different work but cannot grant third-party requests for reprinting and republishing the work.

3. Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) as it can lead to productive exchanges, as well as earlier and greater citation of published work.

Author Biographies

Muhammad Ariful Furqon, Universitas Jember

Informatics, Universitas Jember

Nina Fadilah Najwa, Politeknik Caltex Riau

Information Systems, Politeknik Caltex Riau

Mohammad Zarkasi, Universitas Jember

Information Technology, Universitas Jember

Priza Pandunata, Universitas Jember

Information Technology, Universitas Jember

Gama Wisnu Fajariyanto, Universitas Jember

Informatics, Universitas Jember

How to Cite

Ariful Furqon, M. A., Najwa, N. F., Zarkasi, M., Pandunata, P., & Fajariyanto, G. W. (2024). Critical Exploratory Data Analysis on Stroke Prediction Dataset. Jurnal Komputer Terapan, 10(1), 67–77. https://doi.org/10.35143/jkt.v10i1.6307
