RESEARCH PRODUCT

panISa: ab initio detection of insertion sequences in bacterial genomes from short read sequence data.

Didier Hocquet, Charlotte Couchoud, Alexandre Meunier, Christophe Guyeux, Panisa Treepong, Benoît Valot

subject

0301 basic medicine; Statistics and Probability; Lineage (genetic); Computer science; Ab initio; Computational biology; Bacterial genome size; [INFO.INFO-SE]Computer Science [cs]/Software Engineering [cs.SE]; Biochemistry; Genome; [INFO.INFO-IU]Computer Science [cs]/Ubiquitous Computing; 03 medical and health sciences; [INFO.INFO-CR]Computer Science [cs]/Cryptography and Security [cs.CR]; [SDV.BBM.GTP]Life Sciences [q-bio]/Biochemistry Molecular Biology/Genomics [q-bio.GN]; Insertion sequence; Molecular Biology; Genomic organization; High-Throughput Nucleotide Sequencing; Sequence Analysis DNA; [SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM]; [SDV.MP.BAC]Life Sciences [q-bio]/Microbiology and Parasitology/Bacteriology; Pipeline (software); [INFO.INFO-MO]Computer Science [cs]/Modeling and Simulation; Computer Science Applications; Computational Mathematics; 030104 developmental biology; Computational Theory and Mathematics; [INFO.INFO-MA]Computer Science [cs]/Multiagent Systems [cs.MA]; DNA Transposable Elements; [INFO.INFO-ET]Computer Science [cs]/Emerging Technologies [cs.ET]; [INFO.INFO-DC]Computer Science [cs]/Distributed Parallel and Cluster Computing [cs.DC]; Genome Bacterial; Software

description

Abstract

Motivation: The advent of next-generation sequencing has boosted the analysis of bacterial genome evolution. Insertion sequence (IS) elements play a key role in prokaryotic genome organization and evolution, but their repetition within genomes complicates their detection from short-read data.

Results: PanISa is a software pipeline that identifies IS insertions ab initio in bacterial genomes from short-read data. It is a highly sensitive and precise tool based on the detection of read-mapping patterns at the insertion site. PanISa performs better than existing IS detection systems because it relies on a database-free approach. We applied it to a high-risk clone lineage of the pathogenic species Pseudomonas aeruginosa, and report 43 insertions of five different ISs (three of which are new) and a burst of ISPa1635 in a hypermutator isolate.

Availability and implementation: PanISa is implemented in Python and released as open-source software (GPL3) at https://github.com/bvalot/panISa.

Supplementary information: Supplementary data are available at Bioinformatics online.

10.1093/bioinformatics/bty479
https://pubmed.ncbi.nlm.nih.gov/29931098