
RESEARCH PRODUCT

Establishing gene models from the Pinus pinaster genome using gene capture and BAC sequencing

Francisco M. Cánovas, Pedro Seoane-Zonjic, Isabel Arrillaga, Rocío Bautista, Rafael A. Cañas, Josefa Gómez-Maldonado, Concepción Ávila, M. Gonzalo Claros, Noe Fernandez-Pozo

subject

0301 basic medicine; Chromosomes Artificial Bacterial; DNA Plant; Genomics; Biology; Maritime pine; Genome; 03 medical and health sciences; Gene capture; Genetics; Gene family; Genomic library; Gene; BAC; Gene Library; Models Genetic; Exons; Sequence Analysis DNA; Pinus; Introns; Gene structure; Promoter studies; 030104 developmental biology; Bioinformatic pipeline; Gene model construct; DNA microarray; Functional genomics; Genome Plant; Reference genome; Research Article; Biotechnology

description

Background: In the era of high-throughput DNA sequencing, assembling and understanding gymnosperm mega-genomes remains a challenge. Although drafts of three conifer genomes have recently been published, this number is too low to capture the full complexity of conifer genomes. Techniques that target specific genes can be used to establish gene models, which aid the assembly of gene-rich regions and provide information for comparing genomes and understanding functional evolution.

Results: In this study, gene capture technology combined with BAC isolation and sequencing was used as an experimental approach to establish de novo gene structures without a reference genome. Probes were designed for 866 maritime pine transcripts to sequence genes captured from genomic DNA. The gene models were constructed using GeneAssembler, a new bioinformatic pipeline, which reconstructed over 82% of the gene structures; a high proportion (85%) of the captured gene models contained sequences from the promoter regulatory region. In a parallel experiment, the P. pinaster BAC library was screened to isolate clones containing genes whose cDNA sequences were already available. BAC clones containing the asparagine synthetase, sucrose synthase and xyloglucan endotransglycosylase gene sequences were isolated and used in this study. The gene models derived from the gene capture approach were compared with the genomic sequences derived from the BAC clones. This combined approach is a particularly efficient way to capture the genomic structures of gene families with a small number of members.

Conclusions: The experimental approach used in this study is a valuable combined technique for studying genomic gene structures in species for which a reference genome is unavailable. It can be used to establish exon/intron boundaries in unknown gene structures, to reconstruct incomplete genes and to obtain promoter sequences for transcriptional studies. A bioinformatics algorithm (GeneAssembler) is also provided as a Ruby gem for this class of analyses.

Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-2490-z) contains supplementary material, which is available to authorized users.

10.1186/s12864-016-2490-z
http://dx.doi.org/10.1186/s12864-016-2490-z