Optimal entropic properties of SARS-CoV-2 RNA sequences
Formentin, Marco;Favretti, Marco
2024
Abstract
The response of the scientific community to the COVID-19 pandemic has generated a huge dataset (approx. 10^6 entries) of genome sequences, collected worldwide and spanning a relatively short time window. These unprecedented conditions, together with the certain identification of the reference viral genome sequence, allow for an original statistical study of mutations in the virus genome. In this paper, we compute the Shannon entropy of every sequence in the dataset, as well as the relative entropy and the mutual information between the reference sequence and the mutated ones. These functions, originally developed in information theory, measure the information content of a sequence and allow us to study the random character of the mutation mechanism in terms of its entropy and its information gain or loss. We show that this approach lets us recast in a new form known features of the SARS-CoV-2 mutation mechanism, such as the C→T bias, and also discover new optimal entropic properties of the mutation process, in the sense that the virus mutation mechanism closely tracks theoretically computable lower bounds for the entropy decrease and the information transfer.

File | Size | Format |
---|---|---|
rsos231369.pdf (open access; type: Published (publisher's version); license: Creative Commons) | 2.07 MB | Adobe PDF |
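The three quantities named in the abstract (Shannon entropy, relative entropy, mutual information) can be illustrated with a minimal Python sketch. The abstract does not state which estimators the paper uses, so everything here is an assumption: entropies are computed from single-nucleotide frequencies over the alphabet A, C, G, T, the mutated sequence is taken to be aligned site by site with the reference, and all function names are hypothetical.

```python
from collections import Counter
from math import log2

ALPHABET = "ACGT"  # assumption: ambiguous symbols such as N are ignored


def base_frequencies(seq):
    """Empirical single-nucleotide distribution of a sequence."""
    counts = Counter(b for b in seq.upper() if b in ALPHABET)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T symbols")
    return {b: counts[b] / total for b in ALPHABET}


def shannon_entropy(p):
    """H(p) = -sum_b p(b) log2 p(b), in bits."""
    return -sum(x * log2(x) for x in p.values() if x > 0)


def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q), in bits."""
    total = 0.0
    for b in ALPHABET:
        if p[b] > 0:
            if q[b] == 0:
                return float("inf")  # p puts mass where q has none
            total += p[b] * log2(p[b] / q[b])
    return total


def mutual_information(ref, mut):
    """I(X;Y) from the site-wise joint distribution of (ref base, mut base).

    Assumes ref and mut are aligned and of equal length.
    """
    pairs = [(a, b) for a, b in zip(ref.upper(), mut.upper())
             if a in ALPHABET and b in ALPHABET]
    n = len(pairs)
    if n == 0:
        raise ValueError("no comparable sites")
    joint = Counter(pairs)
    ref_counts = Counter(a for a, _ in pairs)
    mut_counts = Counter(b for _, b in pairs)
    return sum((c / n) * log2(c * n / (ref_counts[a] * mut_counts[b]))
               for (a, b), c in joint.items())


# Toy example (made-up sequences, not data from the paper):
reference = "ACGTACGTACGT"
mutated = "ATGTATGTACGT"  # two C-to-T substitutions

p_ref = base_frequencies(reference)
p_mut = base_frequencies(mutated)
print(shannon_entropy(p_ref), shannon_entropy(p_mut))
print(relative_entropy(p_mut, p_ref))
print(mutual_information(reference, mutated))
```

In this toy case the substitutions skew the base composition, so the entropy of the mutated sequence falls below that of the reference, and the mutual information is smaller than the reference entropy. Real genomes would need an alignment step and handling of gaps, which this sketch leaves out.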