
RESEARCH PRODUCT

Data from: GIbPSs: a toolkit for fast and accurate analyses of genotyping-by-sequencing data without a reference genome

A. Hapke, D. Thiele

subject

medicine and health care; Medicine; RAD; paired-end sequencing; bioinformatics; Life sciences

description

Genotyping-by-sequencing (GBS) and related methods are increasingly used for studies of non-model organisms, at scales ranging from population genetics to phylogenetics. We present GIbPSs, a new toolkit for genotyping without a reference genome, applicable to data from various protocols such as RAD, double-digest RAD, GBS, and two-enzyme GBS. GIbPSs can handle paired-end GBS data and can assign reads from both strands of a restriction fragment to the same locus. GIbPSs is most suitable for population genetic and phylogeographic analyses. It avoids genotyping errors caused by indel variation by identifying and discarding affected loci. GIbPSs creates a genotype database that offers rich functionality for data filtering and for export in numerous formats. We performed comparative analyses of simulated and real GBS data with GIbPSs and another program, pyRAD, which accounts for indel variation by aligning homologous sequences. GIbPSs performed better than pyRAD in several respects: it required much less computation time and achieved higher genotyping accuracy. In analyses of real GBS data, GIbPSs retained fewer loci overall. It nevertheless delivered more complete genotype matrices, with greater locus overlap between individuals and more loci sampled in all individuals.
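
The comparison above refers to genotype-matrix completeness, locus overlap between individuals, and the number of loci sampled in all individuals. As an illustration only, the following Python sketch computes these quantities under generic definitions assumed here: completeness as the fraction of filled individual-by-locus cells, overlap as the number of loci genotyped in both members of a pair, and "sampled in all" as the intersection across individuals. These definitions, the function names, and the toy data are hypothetical; they are not taken from GIbPSs, pyRAD, or their export formats.

```python
from itertools import combinations
from typing import Dict, Optional, Set, Tuple

# Hypothetical layout: individual -> {locus_id: genotype string, or None if missing}.
GenotypeMatrix = Dict[str, Dict[str, Optional[str]]]


def genotyped_loci(matrix: GenotypeMatrix, individual: str) -> Set[str]:
    """Loci with a non-missing genotype for one individual."""
    return {locus for locus, gt in matrix[individual].items() if gt is not None}


def completeness(matrix: GenotypeMatrix) -> float:
    """Fraction of individual-by-locus cells that hold a genotype.

    A locus absent from an individual's row counts as a missing cell.
    """
    loci = set().union(*(row.keys() for row in matrix.values()))
    cells = len(matrix) * len(loci)
    if cells == 0:
        return 0.0
    filled = sum(len(genotyped_loci(matrix, ind)) for ind in matrix)
    return filled / cells


def pairwise_overlap(matrix: GenotypeMatrix) -> Dict[Tuple[str, str], int]:
    """Number of loci genotyped in both members of each pair of individuals."""
    return {
        (a, b): len(genotyped_loci(matrix, a) & genotyped_loci(matrix, b))
        for a, b in combinations(sorted(matrix), 2)
    }


def loci_in_all(matrix: GenotypeMatrix) -> Set[str]:
    """Loci genotyped in every individual."""
    sets = [genotyped_loci(matrix, ind) for ind in matrix]
    return set.intersection(*sets) if sets else set()


# Toy data for illustration only (not from the study).
example: GenotypeMatrix = {
    "ind1": {"L1": "A/G", "L2": "C/C", "L3": None},
    "ind2": {"L1": "A/A", "L2": None, "L3": "T/T"},
    "ind3": {"L1": "G/G", "L2": "C/T", "L3": "T/T"},
}

if __name__ == "__main__":
    print(round(completeness(example), 3))  # 7 of 9 cells filled
    print(pairwise_overlap(example))
    print(loci_in_all(example))
```

Under these assumed definitions, a program can retain fewer loci in total yet score higher on completeness and pairwise overlap, because those metrics depend on how consistently loci are genotyped across individuals rather than on the raw locus count.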

DOI: 10.5061/dryad.q2b49 (https://doi.org/10.5061/dryad.q2b49)