Search results for "Software"

Showing 10 of 7,396 documents

The Power of Word-Frequency Based Alignment-Free Functions: a Comprehensive Large-Scale Experimental Analysis

2021

Abstract Motivation: Alignment-free (AF) distance/similarity functions are a key tool for sequence analysis. Experimental studies on real datasets abound and, to some extent, there are also studies of how well they control the false positive rate (Type I error). However, assessment of their power, i.e. their ability to identify true similarity, has been limited to some members of the D2 family. The corresponding experimental studies have concentrated on short sequences, a scenario no longer adequate for current applications, where sequence lengths may vary considerably. This state of the art is methodologically problematic, since information regarding a key feature such as power is either mi…

Keywords: Statistics and Probability, Sequence, Similarity (geometry), Settore INF/01 - Informatica, sequence analysis, Computer science, power statistics, Alignment-Free Genomic Analysis Big Data Software Platforms Bioinformatics Algorithms, Scale (descriptive set theory), Function (mathematics), computer.software_genre, Biochemistry, Computer Science Applications, Set (abstract data type), Computational Mathematics, Range (mathematics), Computational Theory and Mathematics, alignment-free functions, Data mining, Completeness (statistics), Molecular Biology, computer, Type I and type II errors
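
For orientation, the basic D2 statistic underlying the D2 family counts the k-mer (word) matches shared by two sequences; equivalently, it is the inner product of their k-mer count vectors. The Python sketch below assumes overlapping k-mers and no normalization; the function names are illustrative, and normalized variants such as D2* and D2S are not shown.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count the overlapping k-mers (words of length k) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(a: str, b: str, k: int) -> int:
    """D2 statistic: sum over shared k-mers w of count_a(w) * count_b(w)."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    return sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())


print(d2("ACGTACGT", "TACGTT", 3))
```

Here the shared 3-mers are ACG and CGT (each twice in the first sequence, once in the second) and TAC (once in each), so the call returns 5.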

Long read alignment based on maximal exact match seeds

2012

Abstract Motivation: The explosive growth of next-generation sequencing datasets poses a challenge to the mapping of reads to reference genomes in terms of alignment quality and execution speed. With the continuing progress of high-throughput sequencing technologies, read length is constantly increasing and many existing aligners are becoming inefficient as generated reads grow larger. Results: We present CUSHAW2, a parallelized, accurate, and memory-efficient long read aligner. Our aligner is based on the seed-and-extend approach and uses maximal exact matches as seeds to find gapped alignments. We have evaluated and compared CUSHAW2 to the three other long read aligners BWA-SW, Bowtie2 an…

Keywords: Statistics and Probability, Sequencing and Sequence Analysis, Theoretical computer science, Genomics, Biology, Biochemistry, Software, Humans, Molecular Biology, Alignment-free sequence analysis, Exact match, Supplementary data, Genome Human, business.industry, Chromosome Mapping, High-Throughput Nucleotide Sequencing, Sequence Analysis DNA, Original Papers, Computer Science Applications, Computational Mathematics, Computational Theory and Mathematics, Computer engineering, Scalability, business, Sequence Alignment, Algorithms, Bioinformatics
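
CUSHAW2 seeds its gapped alignments with maximal exact matches (MEMs): exact matches between read and reference that cannot be extended to the left or right. The sketch below only illustrates that definition with a brute-force scan; it is not CUSHAW2's seed-finding procedure, which this excerpt does not describe, and practical aligners locate MEMs through full-text indexes (such as suffix arrays or the FM-index) rather than by quadratic search. The function name and the `min_len` threshold are hypothetical.

```python
def maximal_exact_matches(read: str, ref: str, min_len: int):
    """Return (read_pos, ref_pos, length) for every MEM of at least min_len.

    Brute force, O(len(read) * len(ref)) start pairs: for illustration only.
    """
    mems = []
    for i in range(len(read)):
        for j in range(len(ref)):
            if read[i] != ref[j]:
                continue
            # Left-maximal: skip starts that could be extended one step left.
            if i > 0 and j > 0 and read[i - 1] == ref[j - 1]:
                continue
            # Extend rightwards while characters agree, which makes it right-maximal.
            length = 0
            while (i + length < len(read) and j + length < len(ref)
                   and read[i + length] == ref[j + length]):
                length += 1
            if length >= min_len:
                mems.append((i, j, length))
    return mems


print(maximal_exact_matches("ACGTTGCA", "TTACGTAGCA", 3))
```

With these toy inputs the two MEMs of length at least 3 are ACGT (read position 0, reference position 2) and GCA (read position 5, reference position 7); a seed-and-extend aligner would then extend such seeds into gapped alignments.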

Dimension reduction for time series in a blind source separation context using R

2021

Funding Information: The work of KN was supported by the CRoNoS COST Action IC1408 and the Austrian Science Fund P31881-N32. The work of ST was supported by the CRoNoS COST Action IC1408. The work of JV was supported by the Academy of Finland (grant 321883). We would like to thank the anonymous reviewers for their comments, which improved the paper and package considerably. Publisher Copyright: © 2021, American Statistical Association. All rights reserved. Multivariate time series observations are increasingly common in multiple fields of science, but the complex dependencies of such data often translate into intractable models with a large number of parameters. An alternative is given by first red…

Keywords: Statistics and Probability, Series (mathematics), Stochastic volatility, Computer science, blind source separation; supervised dimension reduction; R, signal processing, Dimensionality reduction, R, signal analysis, Context (language use), Covariance, Blind signal separation, QA273-280, time series analysis, R (programming language), Dimension (vector space), multivariate methods, Blind source separation, Statistics Probability and Uncertainty, Time series, Algorithm, Software, Supervised dimension reduction

DRUDIT: Web-based DRUgs DIscovery Tools to design small molecules as modulators of biological targets

2019

Abstract Motivation: New in silico tools to predict biological affinities for input structures are presented. The tools are implemented in the DRUDIT (DRUgs DIscovery Tools) web service. The DRUDIT biological finder module is based on molecular descriptors that are calculated by the MOLDESTO (MOLecular DEScriptors TOol) software module developed by the same authors, which is able to calculate more than one thousand molecular descriptors. At this stage, DRUDIT includes 250 biological targets, but new external targets can be added. This feature extends the application scope of DRUDIT to several fields. Moreover, two more functions are implemented: the multi- and on/off-target tasks. These tool…

Keywords: Statistics and Probability, Service (systems architecture), Polypharmacology, Computer science, In silico, Machine learning, computer.software_genre, 01 natural sciences, Biochemistry, biological target finder, drug discovery, Molecular descriptors, 03 medical and health sciences, Molecular descriptor, Settore BIO/10 - Biochimica, Web application, Computer Simulation, Molecular Biology, 030304 developmental biology, Settore ING-INF/05 - Sistemi Di Elaborazione Delle Informazioni, Internet, 0303 health sciences, business.industry, Small molecule, Settore CHIM/08 - Chimica Farmaceutica, 0104 chemical sciences, Computer Science Applications, 010404 medicinal & biomolecular chemistry, Computational Mathematics, Computational Theory and Mathematics, Biological target, The Internet, Artificial intelligence, business, computer, Software

Estimating growth charts via nonparametric quantile regression: a practical framework with application in ecology.

2013

We discuss a practical and effective framework to estimate reference growth charts via regression quantiles. Inequality constraints are used to ensure both monotonicity and non-crossing of the estimated quantile curves, and penalized splines are employed to model the nonlinear growth patterns with respect to age. A companion R package is presented and the relevant code is discussed to encourage the dissemination and application of the proposed methods.

Keywords: Statistics and Probability, Settore BIO/07 - Ecologia, Statistics::Theory, Ecology (disciplines), Nonparametric statistics, Monotonic function, Regression, Statistics::Computation, Quantile regression, Nonlinear system, R package, Statistics, Econometrics, Statistics::Methodology, Growth charts, Nonparametric regression quantiles, Penalized splines, P. oceanica modelling, R software, Statistics Probability and Uncertainty, Settore SECS-S/01 - Statistica, General Environmental Science, Mathematics, Quantile
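
Regression quantiles are obtained by minimizing the asymmetric "check" (pinball) loss instead of squared error. As a minimal illustration only, the sketch below fits the intercept-only case, where the minimizer is an empirical quantile of the data; it is not the paper's estimator, which additionally uses penalized splines in age and inequality constraints for monotonicity and non-crossing. The function names and data are hypothetical.

```python
def check_loss(u: float, tau: float) -> float:
    """Check (pinball) loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def constant_quantile(y: list[float], tau: float) -> float:
    """Minimize sum_i rho_tau(y_i - q) over q.

    The objective is convex and piecewise linear in q with kinks at the
    observations, so a minimizer is attained at one of them.
    """
    return min(y, key=lambda q: sum(check_loss(yi - q, tau) for yi in y))


data = [3.0, 1.0, 4.0, 1.5, 5.0, 9.0, 2.0]
print(constant_quantile(data, 0.5))   # sample median
print(constant_quantile(data, 0.25))  # 0.25 quantile
```

Replacing the constant `q` with a spline in age, and solving the resulting linear program under shape constraints, is what turns this idea into smooth, non-crossing growth curves.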

Overlap and diversity in antimicrobial peptide databases: Compiling a non-redundant set of sequences

2015

Abstract Motivation: The many antimicrobial peptide (AMP) databases developed to date are characterized by substantial overlap of data and similarity of sequences. Our goals are to analyze the level of redundancy across all available AMP databases and to use this information to build a new non-redundant sequence database. For this purpose, a new software tool is introduced. Results: A comparative study of 25 AMP databases reveals the overlap and diversity among them and the internal diversity within each database. The overlap analysis shows that only one database (Peptaibol) contains exclusive data, not present in any other, whereas all sequences in the LAMP_Patent database are inc…

Keywords: Statistics and Probability, Similarity (geometry), Computer science, Sequence analysis, Antimicrobial peptides, Peptaibol, Peptide, computer.software_genre, Procedures, Biochemistry, Set (abstract data type), chemistry.chemical_compound, Protein methods, Sequence Analysis Protein, Redundancy (engineering), Humans, Databases Protein, Molecular Biology, Antimicrobial cationic peptides, chemistry.chemical_classification, Sequence, Antimicrobial cationic peptide, Database, Sequence database, Computer Science Applications, Algorithm, Computational Mathematics, Chemistry, Protein database, Computational Theory and Mathematics, chemistry, Data mining, Nucleic acid database, Databases Nucleic Acid, computer, Software, Algorithms, Human
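
The overlap analysis described in this abstract can be illustrated for the simplest criterion, exact sequence identity. The sketch assumes each database is available as a Python set of sequence strings; the function name, database names and peptide strings are made up, and this excerpt does not say whether the authors' tool compares sequences by identity or by similarity.

```python
from itertools import permutations


def overlap_report(dbs: dict[str, set[str]]):
    """Exact-identity overlap analysis across several sequence databases."""
    # Sequences present in one database and in no other database.
    exclusive = {
        name: seqs - set().union(*(s for other, s in dbs.items() if other != name))
        for name, seqs in dbs.items()
    }
    # Ordered pairs (a, b) where every sequence of a also occurs in b.
    contained = [(a, b) for a, b in permutations(dbs, 2) if dbs[a] <= dbs[b]]
    # Non-redundant set: each distinct sequence is kept exactly once.
    non_redundant = set().union(*dbs.values())
    return exclusive, contained, non_redundant


# Toy, made-up data for illustration only.
dbs = {
    "DB_A": {"GLFDIVKK", "KWKLFKKI"},
    "DB_B": {"GLFDIVKK"},
    "DB_C": {"KWKLFKKI", "FLPIIAKL"},
}
exclusive, contained, non_redundant = overlap_report(dbs)
print(exclusive["DB_C"], contained, len(non_redundant))
```

In this toy case only DB_C holds an exclusive sequence, and DB_B is fully contained in DB_A, which mirrors the two kinds of findings the abstract reports.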

SKINK: a web server for string kernel based kink prediction in α-helices

2014

Abstract Motivation: The reasons for distortions from optimal α-helical geometry are largely unknown, but their influence on structural changes in proteins is significant. Hence, their prediction is a crucial problem in structural bioinformatics. Here, we present a new web server, called SKINK, for string kernel based kink prediction. Extending our previous study, we also annotate the most probable kink position in a given α-helix sequence. Availability and implementation: The SKINK web server is freely accessible at http://biows-inf.zdv.uni-mainz.de/skink. Moreover, SKINK is a module of the BALL software, also freely available at www.ballview.org. Contact: benny.kneissl@roche.com

Keywords: Statistics and Probability, Skink, Web server, Theoretical computer science, Computer science, Real-time computing, computer.software_genre, Biochemistry, Protein Structure Secondary, Structural bioinformatics, Software, Sequence Analysis Protein, String kernel, Position (vector), Ball (mathematics), Molecular Biology, Internet, Sequence, biology, business.industry, Computational Biology, Proteins, biology.organism_classification, Computer Science Applications, Computational Mathematics, Computational Theory and Mathematics, business, computer, Bioinformatics
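
The abstract does not say which string kernel SKINK uses. Purely as an illustration of the idea, the sketch below implements the weighted degree kernel, a standard position-aware string kernel for fixed-length sequence windows: it compares substrings at the same positions, so it can reflect where along a helix window the residues agree. The function name and the `max_degree` parameter are illustrative assumptions, not SKINK's interface.

```python
def weighted_degree_kernel(x: str, y: str, max_degree: int) -> float:
    """Weighted degree string kernel for two equal-length sequences.

    For each substring length d = 1..D, count the positions where x and y
    share the same length-d substring, and weight that count by
    beta_d = 2 * (D - d + 1) / (D * (D + 1)).
    """
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    D = max_degree
    total = 0.0
    for d in range(1, D + 1):
        beta = 2.0 * (D - d + 1) / (D * (D + 1))
        matches = sum(1 for pos in range(len(x) - d + 1)
                      if x[pos:pos + d] == y[pos:pos + d])
        total += beta * matches
    return total


print(weighted_degree_kernel("ALKAL", "ALKGL", 2))
```

A kernel like this can be plugged into a support vector machine to classify sequence windows; how SKINK actually builds its features is not described in this excerpt.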

kmcEx: memory-frugal and retrieval-efficient encoding of counted k-mers.

2018

Abstract Motivation: K-mers, along with their frequencies, have served as an elementary building block for error correction, repeat detection, multiple sequence alignment, genome assembly, etc., attracting intensive study of k-mer counting. However, the output of k-mer counters is itself large; very often it is too large to fit into main memory, which severely limits its usability. Results: We introduce a novel idea of encoding k-mers as well as their frequencies, achieving good memory savings and retrieval efficiency. Specifically, we propose a Bloom filter-like data structure to encode counted k-mers by coupled-bit arrays, one for k-mer representation and the other for frequency encoding. Exper…

Keywords: Statistics and Probability, Source code, Computer science, media_common.quotation_subject, 0206 medical engineering, Hash function, 02 engineering and technology, Biochemistry, 03 medical and health sciences, Encoding (memory), Molecular Biology, Time complexity, 030304 developmental biology, Block (data storage), media_common, 0303 health sciences, Sequence Analysis DNA, Data structure, Computer Science Applications, Computational Mathematics, Computational Theory and Mathematics, Error detection and correction, Algorithm, Sequence Alignment, 020602 bioinformatics, Algorithms, Software, Bioinformatics (Oxford, England)
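
The abstract describes encoding counted k-mers with a Bloom filter-like structure built from coupled bit arrays, one for k-mer representation and one for frequency encoding. The sketch below is a hypothetical design in that spirit, not kmcEx's actual encoding: a presence array plus one array per bit of a saturating count, all addressed by the same hashed positions. The class name, the hashing scheme (SHA-256 salted with the hash index) and the `count_bits` parameter are assumptions; bytes stand in for bits for readability.

```python
import hashlib


class CoupledBitArrays:
    """Hypothetical Bloom filter-like store for counted k-mers (not kmcEx's format)."""

    def __init__(self, m: int, num_hashes: int, count_bits: int):
        self.m = m
        self.num_hashes = num_hashes
        self.count_bits = count_bits
        self.presence = bytearray(m)                            # k-mer representation
        self.freq = [bytearray(m) for _ in range(count_bits)]   # one array per count bit

    def _positions(self, kmer: str):
        # Derive num_hashes positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{kmer}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, kmer: str, count: int) -> None:
        count = min(count, (1 << self.count_bits) - 1)  # saturate at the largest code
        for p in self._positions(kmer):
            self.presence[p] = 1
            for b in range(self.count_bits):
                if (count >> b) & 1:
                    self.freq[b][p] = 1

    def query(self, kmer: str) -> int:
        positions = list(self._positions(kmer))
        if not all(self.presence[p] for p in positions):
            return 0  # definitely absent
        # Count bit b is taken as 1 only if it is set at every hashed position.
        return sum(1 << b for b in range(self.count_bits)
                   if all(self.freq[b][p] for p in positions))


kmers = CoupledBitArrays(m=1 << 16, num_hashes=3, count_bits=4)
kmers.add("ACGTA", 5)
print(kmers.query("ACGTA"), kmers.query("TTTTT"))
```

Because hash collisions can only switch bits on, a stored k-mer's (saturated) count may be overestimated but never underestimated, and an absent k-mer may occasionally be reported as present: the usual Bloom filter trade-off of accuracy for memory.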

ArtiFuse—computational validation of fusion gene detection tools without relying on simulated reads

2019

Abstract Motivation: Gene fusions are an important class of transcriptional variants that can influence cancer development and can be predicted from RNA sequencing (RNA-seq) data by multiple existing tools. However, the real-world performance of these tools is unclear due to the lack of known positive and negative events, especially with regard to fusion genes in individual samples. Simulated reads are often used, but these cannot account for all technical biases in RNA-seq data generated from real samples. Results: Here, we present ArtiFuse, a novel approach that simulates fusion genes by sequence modification to the genomic reference and can therefore be applied to any RNA-seq dataset wit…

Keywords: Statistics and Probability, Source code, Sequence analysis, Computer science, media_common.quotation_subject, Value (computer science), Genomics, computer.software_genre, Biochemistry, Fusion gene, 03 medical and health sciences, 0302 clinical medicine, Software, Molecular Biology, Gene, 030304 developmental biology, media_common, 0303 health sciences, Sequence Analysis RNA, business.industry, High-Throughput Nucleotide Sequencing, RNA, Computer Science Applications, Computational Mathematics, Computational Theory and Mathematics, 030220 oncology & carcinogenesis, Benchmark (computing), Data mining, Gene Fusion, business, computer, Bioinformatics

Fully Bayesian Approach to Image Restoration with an Application in Biogeography

1994

Summary: A common method of studying biogeographical ranges is an atlas survey, in which the research area is divided into a square grid and the data consist of the squares where observations occur. Often the observations form only an incomplete map of the true range, and a method is required to decide whether the blank squares indicate true absence or merely a lack of study there. This is essentially an image restoration problem, but it has properties that make the common empirical Bayesian procedures inadequate. Most notably, the observed image is heavily degraded, causing difficulties in the estimation of spatial interaction, and the assessment of reliability of the restoration is emphasi…

Keywords: Statistics and Probability, Square tiling, Atlas (topology), Spatial interaction, Bayesian probability, Common method, computer.software_genre, Blank, Geography, Data mining, Statistics Probability and Uncertainty, Spatial analysis, computer, Image restoration, Applied Statistics
