Shapley Value as an Aid to Biomedical Machine Learning: a Heart Disease Dataset Analysis

Cisotto G.; Gindullina E.; Badia L.
2022

Abstract

This paper investigates machine-learning-aided decision making for biomedical problems and how to improve it through meta-assessments of the most relevant features. Classification algorithms are usually trained and used on high-dimensional datasets (i.e., with a very large number of features), which is inefficient and costly. It would therefore be beneficial to identify the features that contribute most to assigning a category to a subject and, in particular, to diagnosing a pathological condition. Cooperative game theory offers useful support through the Shapley value, an index with desirable properties according to which the players (here, the input features) can be ranked. We apply this framework to a supervised machine learning scenario in which a random forest classifier is used for heart disease detection. Using a publicly available dataset, we identify the features most relevant to the decision, thus obtaining practical guidelines for a compact yet efficient description based on an analytical rationale.
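As a rough illustration of the ranking idea described in the abstract, the sketch below computes exact Shapley values for a tiny cooperative game in which the players stand for input features. It is not the authors' implementation. The feature labels `a`, `b`, `c`, the `TOY_GAIN` table, and the helper names `shapley_values` and `toy_value` are hypothetical, and the numbers are made up for illustration rather than taken from the paper or its dataset.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley value of each player.

    `value` maps a frozenset of players (a coalition) to a number,
    with the empty coalition worth 0.
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):  # k = size of the coalition S that p joins
            # Weight |S|! (n - |S| - 1)! / n! from the Shapley formula
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of p to coalition S
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


# Hypothetical toy game: invented "classification gains" for each subset of
# three features. Illustrative numbers only, NOT results from the paper.
TOY_GAIN = {
    frozenset(): 0.00,
    frozenset({"a"}): 0.10,
    frozenset({"b"}): 0.05,
    frozenset({"c"}): 0.20,
    frozenset({"a", "b"}): 0.12,
    frozenset({"a", "c"}): 0.26,
    frozenset({"b", "c"}): 0.22,
    frozenset({"a", "b", "c"}): 0.30,
}


def toy_value(coalition):
    return TOY_GAIN[frozenset(coalition)]


phi = shapley_values(["a", "b", "c"], toy_value)
# Rank features from most to least relevant according to their Shapley value
ranking = sorted(phi, key=phi.get, reverse=True)
```

By the efficiency property, the Shapley values sum to the value of the full coalition, which is a convenient sanity check. Exact enumeration grows exponentially with the number of features, so for a real classifier one would typically rely on an approximation or a model-specific method instead.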
Proceedings - 22nd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2022
ISBN: 978-1-6654-9956-9
Files in this item:
File: 2022_05_AI4Health.pdf
Access: open access
Type: Postprint (accepted version)
License: Free access
Size: 338.55 kB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3454919
Citations
  • PMC: n/a
  • Scopus: 12
  • Web of Science (ISI): 5
  • OpenAlex: n/a