RESEARCH PRODUCT

What is in a lichen? A metagenomic approach to reconstruct the holo-genome of Umbilicaria pustulata

Ingo Ebersberger, Thomas Hankeln, Imke Schmitt, Grande Fd, Bastian Greshake Tzovaras, Anne Bicker, Otte J, Seyed Yahya Anvar, Francisca H. I. D. Segers

subject

Trebouxia, Aposymbiotic, biology, Metagenomics, Shotgun sequencing, Horizontal gene transfer, Computational biology, biology.organism_classification, Lichen, Gene, Genome

description

Abstract: Lichens are valuable models in symbiosis research and promising sources of biosynthetic genes for biotechnological applications. Most lichenized fungi grow slowly, resist aposymbiotic cultivation, and are generally poor candidates for experimentation. Obtaining contiguous, high-quality genomes for such symbiotic communities is technically challenging. Here we present the first assembly of a lichen holo-genome from metagenomic whole-genome shotgun data comprising both PacBio long reads and Illumina short reads. The nuclear genomes of the two primary components of the lichen symbiosis – the fungus Umbilicaria pustulata (33 Mbp) and the green alga Trebouxia sp. (53 Mbp) – were assembled at contiguities comparable to single-species assemblies. The analysis of the read coverage pattern revealed a relative cellular abundance of approximately 20:1 (fungus:alga). Gap-free, circular sequences for all organellar genomes were obtained. The community of lichen-associated bacteria is dominated by Acidobacteriaceae, and the two largest bacterial contigs belong to the genus Acidobacterium. Gene set analyses showed no evidence of horizontal gene transfer from algae or bacteria into the fungal genome. Our data suggest a lineage-specific loss of a putative gibberellin-20-oxidase in the fungus, a gene fusion in the fungal mitochondrion, and a relocation of an algal chloroplast gene to the algal nucleus. A major technical obstacle during reconstruction of the holo-genome was that coverage differences among individual genomes surpassed three orders of magnitude. Moreover, we show that G/C-rich inverted repeats paired with non-random sequencing error in PacBio data can result in missing gene predictions. This likely poses a general problem for genome assemblies based on long reads.

https://doi.org/10.1101/810986