Prediction of Italians’ Sentiment During the First COVID-19 Lockdown Through a Weighted Random Forest Balanced with SMOTE Algorithm

Belloni, Pietro; Silan, Margherita

doi:10.1007/978-3-031-55917-4_18

During the first COVID-19 lockdown in Italy, an online survey was circulated through social networks as part of the SEBCOV international study to investigate the impact of the pandemic on Italians’ everyday lives. The final, optional question of the survey was open-ended, inviting additional comments, and it received a remarkably high response rate. These answers, a particularly rich source of spontaneous insight into Italians’ feelings during this challenging period, were manually classified as expressing positive, negative, or neutral sentiment. In previous work, we analyzed the sentiment expressed in these open-ended responses and obtained informative results on the sentiment of Italians during the lockdown. In this article, we use the answers to the other survey questions to predict the sentiment of the participants who did not express it in free text. To do so, we use a random forest model trained on data balanced with the Synthetic Minority Over-sampling TEchnique (SMOTE). The sample is weighted by the inverse of each participant's probability of answering the open question, estimated beforehand with a logistic regression model. The results allow us not only to predict the sentiment of participants who did not express it but also to identify which survey questions are most strongly associated with sentiment.
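The abstract compresses three steps: a logistic model for the probability of answering the open question, SMOTE oversampling of the under-represented sentiment classes, and a random forest fitted with inverse-probability weights. The Python sketch below shows one possible way to chain them with NumPy and scikit-learn; it is not the authors' code. The function names (`inverse_response_weights`, `smote_with_weights`, `fit_predict`) are hypothetical, SMOTE is hand-written rather than taken from a package, the rule that each synthetic point receives a weight interpolated between its two parent points is an assumption of this sketch, and the data are random numbers standing in for survey answers.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors


def inverse_response_weights(X, answered):
    """Weight = 1 / P(answered the open question | X), via logistic regression."""
    model = LogisticRegression(max_iter=1000).fit(X, answered)
    p = model.predict_proba(X)[:, 1]  # column 1 is P(answered == 1)
    return 1.0 / p


def smote_with_weights(X, y, w, k=5, rng=None):
    """Oversample every minority class up to the majority class size (SMOTE).

    Each synthetic point lies on the segment between a class member and one of
    its k nearest same-class neighbours. ASSUMPTION: its weight is interpolated
    with the same factor. Every class needs at least 2 members.
    """
    rng = np.random.default_rng(rng)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts, w_parts = [X], [y], [w]
    for c, n in zip(classes, counts):
        n_new = target - n
        if n_new == 0:
            continue
        idx = np.flatnonzero(y == c)
        Xc, wc = X[idx], w[idx]
        k_eff = min(k, n - 1)
        nn = NearestNeighbors(n_neighbors=k_eff + 1).fit(Xc)
        _, neigh = nn.kneighbors(Xc)  # column 0 is (normally) the point itself
        seeds = rng.integers(0, n, size=n_new)
        picks = neigh[seeds, rng.integers(1, k_eff + 1, size=n_new)]
        gap = rng.random(n_new)[:, None]  # interpolation factor in [0, 1)
        X_parts.append(Xc[seeds] + gap * (Xc[picks] - Xc[seeds]))
        w_parts.append(wc[seeds] + gap[:, 0] * (wc[picks] - wc[seeds]))
        y_parts.append(np.full(n_new, c))
    return np.vstack(X_parts), np.concatenate(y_parts), np.concatenate(w_parts)


def fit_predict(X, answered, sentiment, seed=0):
    """Fit the weighted forest on respondents, predict the non-respondents."""
    w = inverse_response_weights(X, answered)
    resp = answered == 1
    Xb, yb, wb = smote_with_weights(X[resp], sentiment[resp], w[resp], rng=seed)
    rf = RandomForestClassifier(n_estimators=200, random_state=seed)
    rf.fit(Xb, yb, sample_weight=wb)
    return rf, rf.predict(X[~resp])


# Toy data only: 4 random "survey answers"; answering depends on feature 0,
# sentiment (0 = negative, 1 = neutral, 2 = positive) on features 1 and 2.
rng = np.random.default_rng(42)
n = 600
X = rng.normal(size=(n, 4))
answered = (rng.random(n) < 1 / (1 + np.exp(-X[:, 0]))).astype(int)
score = X[:, 1] - X[:, 2]
sentiment = np.where(score > 1, 2, np.where(score < -1, 0, 1))
sentiment[answered == 0] = -1  # unobserved for non-respondents

rf, pred = fit_predict(X, answered, sentiment)
print(pred[:10], rf.feature_importances_.round(3))
```

The forest's `feature_importances_` play the role of the abstract's "which survey questions are most associated with the sentiment". These impurity-based scores can favour predictors with many distinct values, so permutation importance is a common cross-check; the sketch also omits any hold-out validation or hyperparameter tuning.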