MRF: a tool to overcome the barrier of inconsistent genome annotations and perform comparative genomics studies for the largest animal DNA virus

Peruzza L. (Writing – Review & Editing); 2023

Abstract

Background
The genome of the largest known animal virus, the white spot syndrome virus (WSSV), which is responsible for huge economic losses and loss of employment in aquaculture, suffers from inconsistent annotation nomenclature. The novel genome sequence, the circular genome and the variable genome length led to these inconsistencies. Because a vast body of knowledge has accumulated over the past two decades under inconsistent nomenclature, insights gained on one genome cannot easily be extended to other genomes. The present study therefore aims to perform comparative genomics studies in WSSV on a uniform nomenclature.

Methods
We combined the standard MUMmer tool with custom scripts to develop the missing regions finder (MRF), which documents the genome regions and coding sequences that are missing in virus genomes in comparison to a reference genome, using the annotation nomenclature of that reference. The procedure was implemented as a web tool and as a command-line interface. Using MRF, we documented the missing coding sequences in WSSV and explored their role in virulence through phylogenomics, machine learning models and homologous genes.

Results
We tabulated and depicted the missing genome regions, missing coding sequences and deletion hotspots in WSSV on a common annotation nomenclature and attempted to link them to virus virulence. We observed that ubiquitination, transcription regulation and nucleotide metabolism might be essential for WSSV pathogenesis, and that the structural proteins VP19, VP26 and VP28 are essential for virus assembly. A few minor structural proteins in WSSV would act as envelope glycoproteins. Using other virus cases, we also demonstrated the advantage of MRF in providing detailed graphic and tabular output in less time and in handling low-complexity, repeat-rich and highly similar regions of the genomes.

Conclusions
Pathogenic virus research benefits from tools that can directly indicate the genomic regions and coding sequences that are missing between isolates or strains. The analyses performed in this study provide an advance in finding the differences between virus genomes and in quickly identifying the important coding sequences and genomes that require early attention from researchers. To conclude, the approach implemented in MRF complements similarity-based tools in comparative genomics involving large, highly similar, length-varying and/or inconsistently annotated viral genomes.
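The core idea in the Methods can be illustrated with a small sketch. It is not the MRF implementation: it assumes that an aligner such as MUMmer has already produced the reference intervals that align to a query genome, and it reports the uncovered reference intervals and the reference coding sequences lying entirely inside them. All function names, coordinates and ORF labels below are hypothetical, and the sketch ignores genome circularity.

```python
from typing import Dict, List, Tuple

# 1-based, inclusive coordinates on the reference genome (assumed convention).
Interval = Tuple[int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or directly adjacent aligned intervals."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def missing_regions(ref_length: int, aligned: List[Interval]) -> List[Interval]:
    """Return reference intervals not covered by any aligned interval."""
    gaps: List[Interval] = []
    cursor = 1  # first reference position not yet known to be covered
    for start, end in merge_intervals(aligned):
        if start > cursor:
            gaps.append((cursor, start - 1))
        cursor = max(cursor, end + 1)
    if cursor <= ref_length:
        gaps.append((cursor, ref_length))
    return gaps


def missing_cds(cds: Dict[str, Interval], gaps: List[Interval]) -> List[str]:
    """Names of reference coding sequences lying entirely inside a gap."""
    return [
        name
        for name, (start, end) in cds.items()
        if any(g_start <= start and end <= g_end for g_start, g_end in gaps)
    ]


# Hypothetical toy data: a 1200 bp reference with three aligned blocks.
aligned_blocks = [(1, 100), (90, 250), (400, 1000)]
reference_cds = {"orf_a": (260, 390), "orf_b": (380, 450), "orf_c": (1100, 1150)}

gaps = missing_regions(1200, aligned_blocks)
absent = missing_cds(reference_cds, gaps)
```

Because the missing coding sequences are named from the reference annotation, results from several query genomes can be compared under one nomenclature, which is the point the abstract makes about uniform annotation.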
Files in this item:
File: 2023-Krishnan-MRF_ a tool to overcome the barr.pdf
Access: open access
Type: Published (publisher's version)
License: Creative Commons
Size: 3.96 MB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11577/3478725