RESEARCH PRODUCT

DySC: software for greedy clustering of 16S rRNA reads.

Stefan Kramer, Bertil Schmidt, Zejun Zheng

subject

Statistics and Probability; Computer science; Sequence Analysis, RNA; 16S ribosomal RNA; Biochemistry; Computer Science Applications; Computational Mathematics; Software; Computational Theory and Mathematics; RNA, Ribosomal, 16S; Cluster Analysis; Metagenome; Data mining; Molecular Biology

description

Abstract

Summary: Pyrosequencing technologies are frequently used for sequencing the 16S ribosomal RNA marker gene for profiling microbial communities. Clustering of the produced reads is an important but time-consuming task. We present Dynamic Seed-based Clustering (DySC), a new tool based on the greedy clustering approach that uses a dynamic seeding strategy. Evaluations based on the normalized mutual information (NMI) criterion show that DySC produces higher-quality clusters than UCLUST and CD-HIT at a comparable runtime.

Availability and implementation: DySC, implemented in C, is available at http://code.google.com/p/dysc/ under the GNU GPL license.

Contact: bertil.schmidt@uni-mainz.de

Supplementary Information: Supplementary data are available at Bioinformatics online.
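As an illustration of the NMI criterion mentioned in the summary, the following is a minimal Python sketch that scores a predicted clustering against reference labels. It is not taken from DySC. The function names (`entropy`, `mutual_information`, `nmi`) are hypothetical, and the normalization by the geometric mean of the two entropies is an assumption: the abstract does not state which NMI variant was used, and other normalizations (for example, the arithmetic mean) are also common.

```python
from collections import Counter
from math import log, sqrt


def entropy(labels):
    """Shannon entropy (in nats) of a list of cluster labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(labels_a, labels_b):
    """Mutual information (in nats) between two labelings of the same items."""
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a,b) * log( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * log(c * n / (count_a[a] * count_b[b]))
    return mi


def nmi(labels_a, labels_b):
    """NMI normalized by sqrt(H(A) * H(B)); this normalization is an assumption."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("labelings must be non-empty and of equal length")
    h_a, h_b = entropy(labels_a), entropy(labels_b)
    if h_a == 0.0 or h_b == 0.0:
        # Degenerate case: at least one labeling puts everything in one cluster.
        # Convention chosen here: 1.0 if both are trivial, otherwise 0.0.
        return 1.0 if h_a == h_b else 0.0
    return mutual_information(labels_a, labels_b) / sqrt(h_a * h_b)


if __name__ == "__main__":
    reference = ["A", "A", "B", "B", "C", "C"]   # e.g. known taxon per read
    predicted = [0, 0, 1, 1, 1, 2]               # e.g. cluster ID per read
    print(f"NMI = {nmi(reference, predicted):.3f}")
```

Under this definition, NMI is 1 when the two labelings induce the same partition (regardless of how clusters are named) and 0 when they are statistically independent, which is why it is convenient for comparing the output of different clustering tools against a common reference.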

10.1093/bioinformatics/bts355
https://pubmed.ncbi.nlm.nih.gov/22730435