RESEARCH PRODUCT

EnvDB, a database for describing the environmental distribution of prokaryotic taxa.

Javier Tamames, Andrés Moya, Miguel Pignatelli

subject

geography; geography.geographical_feature_category; Database; business.industry; Distribution (economics); Sample (statistics); Wetland; Biology; computer.software_genre; Agricultural and Biological Sciences (miscellaneous); Taxon; Metagenomics; Abundance (ecology); GenBank; Taxonomic rank; business; computer; Ecology, Evolution, Behavior and Systematics

description

Summary: EnvDB is a database that classifies the environmental samples, and their associated 16S rDNA sequences, currently stored in GenBank. The samples were categorized in a three-level hierarchical classification of media: five upper-level categories (terrestrial, aquatic, thermal, host-associated and other) are subdivided into 20 intermediate categories (such as marine, marine sediments, freshwater, soil and gut) and 47 lower-level categories (for instance, soil is further subdivided into forest, agricultural, wetlands, grasslands, tropical, arid, etc.). Each sample was also characterized with nine environmental features: polluted, diseased (for clinical samples), acidic, alkaline, hot environment, cold environment, saline, anoxic and restricted (when the study focuses only on particular taxonomic groups). The classification of samples was aided by text-mining techniques, complemented by careful curation and completion by human experts. EnvDB currently includes 359 928 sequences from 3502 samples. The sequences were clustered at several identity levels to obtain operational taxonomic units (OTUs). Sequences and OTUs have been taxonomically assigned to the finest possible resolution using several procedures. The user can obtain information about sequences, OTUs, samples and environments, and can combine these tables through a flexible querying system that supports a wide range of queries. Thus, the user can easily inspect the presence and abundance of particular taxa in particular samples and environments. The database also lets users run analyses on their own data: users can submit their sequences and find the closest sequences or samples in the database. EnvDB can be accessed at http://metagenomics.uv.es/envDB.
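The classification described in the summary can be pictured as a small data model. The following is only an illustrative sketch, not EnvDB's actual schema or query interface: the class names, field names, the `taxon_abundance` helper and the toy records are all hypothetical, and placing soil under the terrestrial upper level is an assumption. It shows how a sample carrying a three-level environment label and a subset of the nine features could be filtered to total the abundance of a taxon.

```python
from dataclasses import dataclass, field

# The nine environmental features named in the summary (short labels are ours).
FEATURES = (
    "polluted", "diseased", "acidic", "alkaline", "hot",
    "cold", "saline", "anoxic", "restricted",
)


@dataclass(frozen=True)
class Environment:
    """One path through the three-level classification of media."""
    upper: str         # e.g. "terrestrial"
    intermediate: str  # e.g. "soil"
    lower: str         # e.g. "forest"


@dataclass
class Sample:
    sample_id: str                     # hypothetical identifier
    environment: Environment
    features: frozenset = frozenset()  # subset of FEATURES
    taxa_counts: dict = field(default_factory=dict)  # taxon -> sequence count

    def __post_init__(self):
        # Reject feature labels outside the nine listed above.
        unknown = set(self.features) - set(FEATURES)
        if unknown:
            raise ValueError(f"unknown features: {sorted(unknown)}")


def taxon_abundance(samples, taxon, intermediate=None, feature=None):
    """Sum the sequence counts of `taxon` over samples matching the filters."""
    total = 0
    for s in samples:
        if intermediate is not None and s.environment.intermediate != intermediate:
            continue
        if feature is not None and feature not in s.features:
            continue
        total += s.taxa_counts.get(taxon, 0)
    return total


# Toy records, invented for illustration only (not EnvDB data).
samples = [
    Sample("S1", Environment("terrestrial", "soil", "forest"),
           frozenset({"acidic"}), {"Acidobacteria": 12, "Proteobacteria": 30}),
    Sample("S2", Environment("terrestrial", "soil", "agricultural"),
           frozenset(), {"Acidobacteria": 5}),
    Sample("S3", Environment("terrestrial", "soil", "wetlands"),
           frozenset({"anoxic"}), {"Proteobacteria": 7}),
]

print(taxon_abundance(samples, "Acidobacteria", intermediate="soil"))  # 17
print(taxon_abundance(samples, "Proteobacteria", feature="anoxic"))    # 7
```

Storing the features as a set rather than nine separate boolean columns keeps the sketch short; a relational implementation could equally use one column per feature.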

10.1111/j.1758-2229.2009.00030.x
https://pubmed.ncbi.nlm.nih.gov/23765793