Critical Assessment of Protein Intrinsic Disorder Round 3 ‐ Predicting Disorder in the Era of Protein Language Models
Mehdiabadi, Mahta; Del Conte, Alessio; Nugnes, Maria Victoria; Aspromonte, Maria Cristina; Tosatto, Silvio C. E.; Piovesan, Damiano
2025
Abstract
Intrinsic disorder (ID) in proteins is a complex phenomenon, encompassing a continuum from entirely disordered regions to structured domains with flexible segments. The absence of a ground truth for all forms of disorder, combined with the possibility of structural transitions between ordered and disordered states under specific conditions, makes accurate prediction of ID especially challenging. The Critical Assessment of Protein Intrinsic Disorder (CAID) evaluates ID prediction methods using diverse benchmarks derived from DisProt, a manually curated database of experimentally validated annotations. This paper presents findings from the third round (CAID3), in which 24 new methods were assessed along with the predictors from previous rounds. Compared to CAID2, the top-performing methods in CAID3 demonstrated significant gains in average precision: over 31% improvement in predicting linker regions and 15% in disorder prediction. This round introduces a new binding sub-challenge focused on identifying binding regions within the known boundaries of intrinsically disordered regions (IDRs). The results indicate that this task remains challenging, highlighting the potential for improvement. The top-performing methods in CAID3 are mostly new and commonly use embeddings from protein language models (pLMs), underscoring the growing impact of pLMs in tackling the complexities of disordered proteins and advancing ID prediction.

File | Size | Format |
---|---|---|
2025_caid3.pdf (open access; type: Published (Publisher's Version of Record); license: Creative Commons) | 1.14 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.