
Analysis of Marie Skłodowska-Curie Actions (MSCA) evaluations and models for predicting the success of proposals

Ilaria Rodella; Andrea Sciandra; Arjuna Tuzzi
2024

Abstract

This study analyses Evaluation Summary Reports (ESRs) of Marie Skłodowska-Curie Actions (MSCA) Individual and Postdoctoral Fellowship proposals submitted through the University of Padua (Unipd) under Horizon 2020 and Horizon Europe between 2015 and 2022. The aim is to identify recurring strengths and weaknesses noted by evaluators and to recognise the most important and recurrent features of successful proposals. Nearly 400 ESRs were analysed using keyword extraction and correspondence analysis (CA) to map relationships between words and variables such as project success. Because CA alone did not clearly separate successful from unsuccessful proposals, machine learning was then applied, with the CA coordinates used as features to predict project outcomes. These models were compared with models using only textual features and with models based on transformers, specifically BERT contextualised embeddings. Results showed that using a Large Language Model (LLM) for text representation improved prediction accuracy compared to the other methods. However, this approach also highlighted challenges in interpretability and emphasised the need for explainable methods when the features are no longer words. Overall, the study provides insights for refining support services and training at Unipd, highlighting the effectiveness of LLMs in prediction while acknowledging the interpretive challenges associated with their use.
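The step from correspondence analysis to prediction features can be illustrated with a minimal sketch. It is not the authors' implementation: the report-by-keyword counts below are invented toy data, and the function name `correspondence_analysis` is hypothetical. The sketch computes CA by a singular value decomposition of the standardised residuals and returns the row (report) coordinates, which could then be passed to any classifier as numeric features.

```python
import numpy as np

def correspondence_analysis(counts, n_components=2):
    """Principal coordinates of rows and columns of a contingency table."""
    table = np.asarray(counts, dtype=float)
    P = table / table.sum()              # correspondence matrix
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    expected = np.outer(r, c)            # independence model
    S = (P - expected) / np.sqrt(expected)   # standardised residuals
    U, sigma, Vt = np.linalg.svd(S, full_matrices=False)
    k = n_components
    rows = (U[:, :k] * sigma[:k]) / np.sqrt(r)[:, None]
    cols = (Vt.T[:, :k] * sigma[:k]) / np.sqrt(c)[:, None]
    inertia = sigma[:k] ** 2             # inertia explained per axis
    return rows, cols, inertia

# Hypothetical toy data: 4 reports (rows) x 4 keywords (columns).
counts = np.array([
    [4, 1, 0, 2],
    [3, 2, 1, 0],
    [0, 1, 5, 3],
    [1, 0, 4, 4],
])

rows, cols, inertia = correspondence_analysis(counts)

# Each report is now a 2-dimensional vector; these coordinates are
# the kind of feature matrix a downstream classifier would receive.
print(rows)
print(inertia)
```

In this representation each report is reduced to a few interpretable axes defined by keyword usage, which is the trade-off the abstract contrasts with contextualised embeddings: the latter may predict better, but their dimensions are not tied to individual words.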
JADT 2024 Mots comptés, textes déchiffrés
JADT 2024 - 17th International Conference on Statistical Analysis of Textual Data
978-2-39061-471-5
978-2-39061-472-2
Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3516669