Critical assessment of protein intrinsic disorder prediction

Necci, M.; Piovesan, D.; Hoque, M. T.; Walsh, I.; Iqbal, S.; Vendruscolo, M.; Sormanni, P.; Wang, C.; Raimondi, D.; Sharma, R.; Zhou, Y.; Litfin, T.; Galzitskaya, O. V.; Lobanov, M. Y.; Vranken, W.; Wallner, B.; Mirabello, C.; Malhis, N.; Dosztanyi, Z.; Erdos, G.; Meszaros, B.; Gao, J.; Wang, K.; Hu, G.; Wu, Z.; Sharma, A.; Hanson, J.; Paliwal, K.; Callebaut, I.; Bitard-Feildel, T.; Orlando, G.; Peng, Z.; Xu, J.; Wang, S.; Jones, D. T.; Cozzetto, D.; Meng, F.; Yan, J.; Gsponer, J.; Cheng, J.; Wu, T.; Kurgan, L.; Promponas, V. J.; Tamana, S.; Marino-Buslje, C.; Martinez-Perez, E.; Chasapi, A.; Ouzounis, C.; Dunker, A. K.; Kajava, A. V.; Leclercq, J. Y.; Aykac-Fas, B.; Lambrughi, M.; Maiani, E.; Papaleo, E.; Chemes, L. B.; Alvarez, L.; Gonzalez-Foutel, N. S.; Iglesias, V.; Pujols, J.; Ventura, S.; Palopoli, N.; Benitez, G. I.; Parisi, G.; Bassot, C.; Elofsson, A.; Govindarajan, S.; Lamb, J.; Salvatore, M.; Hatos, A.; Monzon, A. M.; Bevilacqua, M.; Micetic, I.; Minervini, G.; Paladin, L.; Quaglia, F.; Leonardi, E.; Davey, N.; Horvath, T.; Kovacs, O. P.; Murvai, N.; Pancsa, R.; Schad, E.; Szabo, B.; Tantos, A.; Macedo-Ribeiro, S.; Manso, J. A.; Pereira, P. J. B.; Davidovic, R.; Veljkovic, N.; Hajdu-Soltesz, B.; Pajkos, M.; Szaniszlo, T.; Guharoy, M.; Lazar, T.; Macossay-Castillo, M.; Tompa, P.; Tosatto, S. C. E.

doi:10.1038/s41592-021-01117-3

Intrinsically disordered proteins defy the traditional protein structure–function paradigm and are challenging to study experimentally. Because a large part of our knowledge rests on computational predictions, it is crucial that their accuracy is high. The Critical Assessment of protein Intrinsic Disorder prediction (CAID) experiment was established as a community-based blind test to determine the state of the art in predicting intrinsically disordered regions and the subset of residues involved in binding. A total of 43 methods were evaluated on a dataset of 646 proteins from DisProt. The best methods use deep learning techniques and notably outperform physicochemical methods. The top disorder predictor has Fmax = 0.483 on the full dataset and Fmax = 0.792 after bona fide structured regions are filtered out. Disordered binding regions remain hard to predict, with Fmax = 0.231. Computing times among methods vary by up to four orders of magnitude.
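
The Fmax values quoted above are commonly understood as the maximum F1-score obtained when the decision threshold on a predictor's per-residue scores is varied. The following is a minimal sketch of that computation, not the assessment's actual evaluation code. The function name `f_max`, the binary label encoding (1 = disordered, 0 = ordered) and the choice of candidate thresholds (every distinct score) are assumptions made for illustration. Whether residues are pooled across proteins or scored per protein is left to the caller.

```python
from typing import Sequence, Tuple


def f_max(labels: Sequence[int], scores: Sequence[float]) -> Tuple[float, float]:
    """Return (best F1, threshold achieving it) over all distinct score values.

    labels: 1 for a positive (e.g. disordered) residue, 0 for a negative one.
    scores: predictor output per residue; higher means more likely positive.
    Illustrative helper only; names and conventions are assumptions.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")

    best_f1, best_threshold = 0.0, float("inf")
    for threshold in sorted(set(scores)):
        # A residue is predicted positive when its score reaches the threshold.
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        fn = sum(1 for y, s in zip(labels, scores) if s < threshold and y == 1)
        if tp == 0:
            continue  # F1 is zero (or undefined) without any true positive
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_f1, best_threshold = f1, threshold
    return best_f1, best_threshold


if __name__ == "__main__":
    # Toy example with made-up labels and scores, not data from the assessment.
    toy_labels = [1, 1, 0, 1, 0, 0]
    toy_scores = [0.9, 0.8, 0.7, 0.6, 0.2, 0.1]
    print(f_max(toy_labels, toy_scores))
```

Because the threshold is optimized on the evaluated data itself, Fmax describes the best achievable operating point of a predictor rather than its performance at the default cutoff it ships with.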