Application of Machine Learning to Analyze the Risk Factors of Stroke

Karan Keerthy¹

¹Briarcliff High School, 444 Pleasantville Rd, Briarcliff Manor, NY 10510, United States.

Abstract: In 2019, stroke was the second leading cause of death globally and a leading cause of disability-adjusted life years (DALYs). Studies have shown that 80% of second strokes are preventable through medication, a strict diet, and physical activity. Given the debilitating effects of stroke, its early detection is an important area of interest. This study therefore aims to identify key risk factors for stroke, to encourage the monitoring and lifestyle changes that can prevent its onset. To determine the most significant risk factors, an artificial neural network (ANN) model was built in Python with the Keras library. The model, which reached an accuracy of 92%, was then applied to different combinations of risk factors chosen with the SelectKBest function: a chi-squared feature selection strategy first selected the K best features from a combination of risk factors, and these features were then used to train the ANN to predict the presence of stroke. The performance of the trained model was reported as the area under the receiver operating characteristic (ROC) curve (AUC). Average glucose level, age, and BMI were identified as the most predictive risk factors of stroke; for this combination, ROC analysis yielded an AUC of 0.73, indicating that the model discriminates reasonably well between stroke and non-stroke cases. In addition to confirming the significance of risk factors frequently reported in the existing literature, such as average glucose level, age, BMI, smoking status, and hypertension, the model identifies occupation as the next most predictive risk factor for stroke, ranking above even heart disease. Thus, given patient information, preventive measures could be recommended on the basis of previously unidentified risk factors such as occupation, with the aim of avoiding the long-term effects of a potential stroke.
Keywords: Machine Learning, Risk Factors, Stroke.
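The chi-squared scoring behind SelectKBest can be illustrated with a short, dependency-free sketch. It re-implements the per-feature statistic in the form scikit-learn's `chi2` computes it (observed versus expected sums of each non-negative feature across classes) and then keeps the K highest-scoring features. The function names `chi2_scores` and `select_k_best` and the toy records are hypothetical illustrations, not the study's code or data.

```python
from typing import List, Sequence


def chi2_scores(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Per-feature chi-squared statistic, following the formula used by
    scikit-learn's chi2 (features must be non-negative)."""
    n_samples = len(X)
    classes = sorted(set(y))
    # Fraction of samples belonging to each class.
    class_prob = {c: sum(1 for label in y if label == c) / n_samples for c in classes}
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        if any(v < 0 for v in column):
            raise ValueError("chi-squared scoring requires non-negative features")
        total = sum(column)
        score = 0.0
        for c in classes:
            # Observed: sum of feature j over the samples labeled c.
            observed = sum(v for v, label in zip(column, y) if label == c)
            # Expected: that class's share of the sum if feature j ignored the label.
            expected = class_prob[c] * total
            if expected > 0:
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_k_best(names: Sequence[str], X, y, k: int) -> List[str]:
    """Return the names of the k features with the highest chi-squared score."""
    scores = chi2_scores(X, y)
    ranked = sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:k]]


# Hypothetical toy records: age (years), average glucose (mg/dL), hypertension (0/1).
names = ["age", "avg_glucose_level", "hypertension"]
X = [
    [30, 90, 0],
    [45, 100, 0],
    [50, 95, 1],
    [70, 200, 1],
    [80, 220, 1],
    [65, 180, 0],
]
y = [0, 0, 0, 1, 1, 1]  # 1 = stroke

print(select_k_best(names, X, y, k=2))  # ['avg_glucose_level', 'age']
```

Because the statistic is built from summed feature values, multiplying a feature by a constant multiplies its score by the same constant. With continuous inputs such as glucose and age, the ranking can therefore depend on measurement units, so it may be worth checking whether the selected features change after rescaling.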

Cite this article as: Karan Keerthy, Application of Machine Learning to Analyze the Risk Factors of Stroke, Int. J. Math. And Appl., vol. 10, no. 1, 2022, pp. 100-109.
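Since the abstract reports both an accuracy (92%) and an AUC (0.73), a small sketch may help show how the two metrics differ. The code below computes accuracy, and computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (ties counting one half), which equals the area under the ROC curve. The helper names and the toy labels and scores are hypothetical; in practice a library routine such as scikit-learn's `roc_auc_score` would typically be used instead.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of cases whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC via pairwise comparison: P(score of a positive > score of a negative),
    with ties counted as one half."""
    pos = [s for s, t in zip(y_score, y_true) if t == 1]
    neg = [s for s, t in zip(y_score, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for four patients (1 = stroke).
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75

# With a rare positive class, always predicting "no stroke" scores highly on
# accuracy but carries no ranking information (every pair is a tie).
y_rare = [1] + [0] * 19
always_negative = [0] * 20
print(accuracy(y_rare, always_negative))  # 0.95
print(roc_auc(y_rare, [0.0] * 20))  # 0.5
```

The pairwise loop is O(n_pos × n_neg), which is fine for illustration. The second example shows that a high accuracy can coexist with an AUC of 0.5 when one class is rare, which is one reason threshold-free measures such as AUC are often reported alongside accuracy.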