Genome Physical Mapping with Next Generation Sequencing
DE PASCALE, FABIO;SCHIAVON, RICCARDO;VEZZI, ALESSANDRO;VALLE, GIORGIO
2012
Abstract
Next generation sequencing technology has improved considerably over the past few years, making the shotgun sequencing approach easier and more affordable. Short reads are particularly popular because they are easy and cheap to produce. However, their assembly generates a vast number of relatively short contigs, which require suitable physical maps and scaffolding procedures to be assembled further into a draft genomic sequence. Unfortunately, physical maps are still very difficult to produce and take little advantage of next generation sequencing technology. The aim of our project is to investigate whether this problem can be overcome. We chose Nannochloropsis gaditana as a test case: it is a unicellular alga that could be very useful in biofuel production because of its capacity to accumulate large amounts of lipids under particular growth conditions, and its genome is relatively small (32 Mb). We obtained a BAC library of more than 11,000 clones with an average insert size of 120 kb. This BAC library is the starting point of our method: by selecting random clones from the library we produced 32 pools, each representing about 40% of the genome. Each pool was fragmented by sonication and sequenced with a SOLiD 5500XL. High coverage of the genome was also produced by an independent shotgun project. Non-repeated sequences (tags) can be identified from their coverage, and each of them can be treated as a genetic marker. The presence or absence of a tag in each pool can then be analyzed: the closer two tags are in the genome, the more often they are expected to be present together in the same pools. By analyzing the presence/absence profile of each tag across the pools, it is possible to sort the tags according to their relative position, producing a high-density, high-quality physical map. We are currently analyzing 32 pools of 96 randomly fragmented BAC clones.
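The presence/absence profiles described above can be illustrated with a small simulation. This is only a minimal sketch under stated assumptions, not the procedure used in the project: the function names (`build_pools`, `presence_profile`, `hamming`) are hypothetical, BAC inserts are placed uniformly at random with a fixed length of 120 kb, and tag presence is read directly from the known insert coordinates, whereas in the real experiment it would have to be inferred from the read coverage of each pool.

```python
import random

def build_pools(genome_len, n_pools, bacs_per_pool, insert_len, rng):
    """Return one list of (start, end) insert intervals per pool (assumed uniform placement)."""
    pools = []
    for _ in range(n_pools):
        starts = [rng.randrange(genome_len - insert_len + 1) for _ in range(bacs_per_pool)]
        pools.append([(s, s + insert_len) for s in starts])
    return pools

def presence_profile(tag_pos, pools):
    """Tuple of 0/1 flags, one per pool: 1 if any BAC insert in that pool covers the tag."""
    return tuple(
        int(any(start <= tag_pos < end for start, end in pool))
        for pool in pools
    )

def hamming(profile_a, profile_b):
    """Number of pools in which exactly one of the two tags is present."""
    return sum(a != b for a, b in zip(profile_a, profile_b))

# Illustrative parameters taken from the abstract: 32 Mb genome, 32 pools of 96 BACs, 120 kb inserts.
rng = random.Random(0)
pools = build_pools(32_000_000, 32, 96, 120_000, rng)
reference = presence_profile(10_000_000, pools)
for offset in (1_000, 50_000, 5_000_000):
    other = presence_profile(10_000_000 + offset, pools)
    print(offset, hamming(reference, other))
```

Tags that lie close together should show few discordant pools, while distant tags should disagree about as often as two unrelated positions would. Sorting tags so that neighbours have similar profiles is the step that would turn such distances into a map; that ordering step is not shown here.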
A computer simulation indicated that 32 pools should be sufficient to produce a complete physical map of a genome equivalent to that of N. gaditana. If necessary, we could easily produce more pools. Larger genomes could also benefit from this approach, although the number of BACs per pool and the number of pools must be adjusted accordingly. The prospect of producing physical maps at low cost is very important for improving de novo assembly, and in particular for the scaffolding procedures that are now the limiting step of the entire process.
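How the number of pools and the number of BACs per pool interact can be explored with a simple probabilistic model. The sketch below is not the simulation mentioned above, whose details are not given here. It is a back-of-the-envelope calculation that assumes uniformly placed inserts of fixed length and ignores genome-end effects. The function names are hypothetical, and the default values are the figures quoted in the abstract. For two tags separated by a given distance, it estimates how many pools are expected to contain exactly one of them.

```python
def discordance_probability(distance, genome_len, insert_len, bacs_per_pool):
    """P(exactly one of two tags is present in a pool), under the assumptions stated above."""
    # Insert start positions that cover at least one of the two tags span
    # insert_len + min(distance, insert_len) bases.
    extra = min(distance, insert_len)
    p_miss_one = (1 - insert_len / genome_len) ** bacs_per_pool
    p_miss_both = (1 - (insert_len + extra) / genome_len) ** bacs_per_pool
    # P(A present, B absent) = P(B absent) - P(both absent); doubled by symmetry.
    return 2 * (p_miss_one - p_miss_both)

def expected_discordant_pools(distance, n_pools=32, genome_len=32_000_000,
                              insert_len=120_000, bacs_per_pool=96):
    """Expected number of pools in which the two tags disagree."""
    p = discordance_probability(distance, genome_len, insert_len, bacs_per_pool)
    return n_pools * p

for d in (0, 10_000, 60_000, 120_000, 5_000_000):
    print(d, round(expected_discordant_pools(d), 2))
```

In this model the expected discordance grows with distance up to one insert length and then levels off, so the profiles carry ordering information only for tags less than about one insert apart. Changing `genome_len`, `bacs_per_pool`, or `n_pools` gives a rough feel for how the design might be adjusted for larger genomes.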