Data imputation and machine learning improve association analysis and genomic prediction for resistance to fish photobacteriosis in the gilthead sea bream
Bargelloni, L; Tassiello, O; Babbucci, M; Ferraresso, S; Franch, R; Montanucci, L; Carnier, P
2021
Abstract
Disease resistance is a key trait for breeding programs in aquaculture species. Here we re-analysed 2bRAD sequence data from two experimental challenges of gilthead sea bream with Photobacterium damselae subsp. piscicida. Using a high-quality reference genome, we carried out variant calling and data imputation with Beagle to obtain a large set of SNPs (80,744). This allowed the identification of eight novel QTLs for resistance to photobacteriosis across different chromosomes and revealed a highly polygenic genetic architecture. Bayesian regression approaches and machine learning methods (support vector machines and linear bagging) were compared to evaluate their relative performance in classifying individuals as susceptible or resistant. In both data sets, machine learning methods, particularly linear bagging, showed higher Matthews correlation coefficient (MCC) and accuracy values, with a 20-70% increase in prediction performance. Overall, machine learning methods should be explored in parallel with parametric regression approaches to increase the chances of highly effective genomic prediction.

File | Size | Format
---|---|---
1-s2.0-S2352513421000776-main.pdf (open access; published, publisher's version; Creative Commons license) | 2.41 MB | Adobe PDF
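The abstract compares classifiers by Matthews correlation coefficient (MCC) and accuracy. As a minimal sketch of how these two metrics are computed for a binary susceptible/resistant classification, the Python below derives both from confusion-matrix counts. The function names, the coding of resistant as 1 and susceptible as 0, and the toy labels are assumptions made for illustration only; they are not the authors' code or data.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task.

    `positive` is the label treated as the positive class
    (assumed here: 1 = resistant, 0 = susceptible).
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Fraction of individuals classified correctly."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient, in the range [-1, 1].

    Returns 0.0 when the denominator is zero (a common convention
    when one row or column of the confusion matrix is empty).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy labels, invented for illustration (not data from the study).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]

counts = confusion_counts(y_true, y_pred)
print(accuracy(*counts))        # 0.75
print(round(mcc(*counts), 3))   # 0.467
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the susceptible and resistant classes are unbalanced.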