The study of microbial communities is an emerging field that is revolutionizing many disciplines from ecology to medicine. The major problem when analyzing a metagenomic sample is to taxonomic annotate its reads in order to identify the species in the sample and their relative abundance. Many tools have been developed in the recent years, however the performance in terms of precision and speed are not always adequate for these very large datasets. In this work we present SKraken an efficient approach to accurately classify metagenomic reads against a set of reference genomes, e.g. the NCBI/RefSeq database. SKraken is based on k-mers statistics combined with the taxonomic tree. Given a set of target genomes SKraken is able to detect the most representative k-mers for each species, filtering out uninformative k-mers. The classification performance on several synthetic and real metagenomics datasets shows that SKraken achieves in most cases the best performances in terms of precision and ...

SKraken: Fast and sensitive classification of short metagenomic reads based on filtering uninformative k-mers

COMIN, MATTEO;MARCHIORI, DAVIDE
2017

Abstract

The study of microbial communities is an emerging field that is revolutionizing many disciplines from ecology to medicine. The major problem when analyzing a metagenomic sample is to taxonomic annotate its reads in order to identify the species in the sample and their relative abundance. Many tools have been developed in the recent years, however the performance in terms of precision and speed are not always adequate for these very large datasets. In this work we present SKraken an efficient approach to accurately classify metagenomic reads against a set of reference genomes, e.g. the NCBI/RefSeq database. SKraken is based on k-mers statistics combined with the taxonomic tree. Given a set of target genomes SKraken is able to detect the most representative k-mers for each species, filtering out uninformative k-mers. The classification performance on several synthetic and real metagenomics datasets shows that SKraken achieves in most cases the best performances in terms of precision and ...
2017
Proceedings of the 10th International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 3: BIOINFORMATICS,
8th International Conference on Bioinformatics Models, Methods and Algorithms, BIOINFORMATICS 2017 - Part of 10th International Joint Conference on Biomedical Engineering Systems and Technologies, BIOSTEC 2017
9789897582141
File in questo prodotto:
File Dimensione Formato  
61505.pdf

accesso aperto

Tipologia: Published (Publisher's Version of Record)
Licenza: Creative commons
Dimensione 828.6 kB
Formato Adobe PDF
828.6 kB Adobe PDF Visualizza/Apri
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11577/3227449
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 11
  • ???jsp.display-item.citation.isi??? 13
  • OpenAlex ND
social impact