Critical Assessment of Protein Intrinsic Disorder Round 3 - Predicting Disorder in the Era of Protein Language Models

Mehdiabadi, Mahta; Del Conte, Alessio; Nugnes, Maria Victoria; Aspromonte, Maria Cristina; Tosatto, Silvio C. E.; Piovesan, Damiano

doi:10.1002/prot.70045

Intrinsic disorder (ID) in proteins is a complex phenomenon, encompassing a continuum from entirely disordered regions to structured domains with flexible segments. The absence of a ground truth for all forms of disorder, combined with the possibility of structural transitions between ordered and disordered states under specific conditions, makes accurate prediction of ID especially challenging. The Critical Assessment of Protein Intrinsic Disorder (CAID) evaluates ID prediction methods using diverse benchmarks derived from DisProt, a manually curated database of experimentally validated annotations. This paper presents findings from the third round (CAID3), in which 24 new methods were assessed along with the predictors from previous rounds. Compared to CAID2, the top-performing methods in CAID3 demonstrated significant gains in average precision: an improvement of over 31% in predicting linker regions and of 15% in disorder prediction. This round introduces a new binding sub-challenge focused on identifying binding regions within known intrinsically disordered region (IDR) boundaries. The results indicate that this task remains challenging, leaving room for improvement. The top-performing methods in CAID3 are mostly new and commonly use embeddings from protein language models (pLMs), underscoring the growing impact of pLMs in tackling the complexities of disordered proteins and advancing ID prediction.
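
The gains above are reported in average precision. As a minimal sketch of how that metric can be computed for per-residue predictions, the Python function below sums precision weighted by the increase in recall at each distinct score threshold. This is the standard step-wise definition, not necessarily the exact procedure used in the CAID3 evaluation; the function name, the toy labels (1 = disordered, 0 = ordered), and the scores are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise average precision: sum over thresholds of (R_k - R_{k-1}) * P_k.

    labels: 1 for a positive residue (e.g. disordered), 0 otherwise.
    scores: predicted propensity per residue; higher means more likely positive.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positives")

    # Rank residues from highest to lowest predicted score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    true_pos = 0
    prev_recall = 0.0
    ap = 0.0
    i = 0
    while i < len(order):
        # Consume every residue tied at the current score as one threshold.
        j = i
        while j < len(order) and scores[order[j]] == scores[order[i]]:
            true_pos += labels[order[j]]
            j += 1
        recall = true_pos / n_pos
        precision = true_pos / j  # j residues are predicted positive so far
        ap += (recall - prev_recall) * precision
        prev_recall = recall
        i = j
    return ap


# Hypothetical toy protein of six residues (not CAID data).
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.2, 0.1]
print(round(average_precision(toy_labels, toy_scores), 3))  # 0.917
```

Residues with tied scores are treated as a single threshold, so the result does not depend on the order in which tied residues appear in the input.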