Prediction of Italians’ Sentiment During the First COVID-19 Lockdown Through a Weighted Random Forest Balanced with SMOTE Algorithm

Belloni, Pietro; Silan, Margherita
2024

Abstract

During the first period of the COVID-19 lockdown in Italy, an online survey was distributed through social networks as part of the SEBCOV international study to investigate the impact of the pandemic on Italians’ everyday life. The final, optional question of the survey was open-ended; it solicited additional comments and garnered a remarkably high response rate. The responses, a particularly rich source of spontaneous insights into Italians’ feelings during this challenging period, were manually classified as expressing positive, negative, or neutral sentiment. In previous work, we analyzed the sentiment expressed in these open-ended responses. In this article, we use the answers to the closed survey questions to predict the sentiment of the participants who did not express it in the free text. To do so, we use a random forest model trained on data balanced via the Synthetic Minority Over-sampling TEchnique (SMOTE) algorithm. The sample is weighted by the inverse of the probability of answering the open question, previously estimated with a logistic model. The results allow us not only to predict the sentiment of subjects who did not express it, but also to observe which survey questions are most strongly associated with sentiment.
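Two building blocks named in the abstract, inverse probability weighting and SMOTE-style oversampling, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the toy data, and the choice of k are assumptions made for illustration only, and the logistic model and the random forest themselves are omitted. The response probabilities are simply taken as given, as if a logistic model had already estimated them.

```python
import math
import random


def inverse_probability_weights(response_probs):
    """Weight each respondent by 1 / P(answered the open question)."""
    return [1.0 / p for p in response_probs]


def smote(minority, n_synthetic, k=2, seed=0):
    """Basic SMOTE idea: build synthetic minority points by interpolating
    between a minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # Nearest neighbours of `base` among the other minority points
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy inputs: estimated response probabilities and
# feature vectors for the minority sentiment class.
probs = [0.5, 0.25, 0.8]
weights = inverse_probability_weights(probs)

minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote(minority, n_synthetic=4)
```

Respondents who were unlikely to answer the open question receive larger weights, so that the labelled subsample better resembles the full sample. Each synthetic point lies on a segment joining two observed minority points. How the weights are carried over to the synthetic observations is a design choice that this sketch does not address.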
Studies in Classification, Data Analysis, and Knowledge Organization
16th International Conference on the Statistical Analysis of Textual Data, JADT 2022
9783031559167
9783031559174

Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3540852