Search results for "algorithm"

showing 10 items of 4887 documents

Overlap and diversity in antimicrobial peptide databases: Compiling a non-redundant set of sequences

2015

Abstract Motivation: The large variety of antimicrobial peptide (AMP) databases developed to date are characterized by a substantial overlap of data and similarity of sequences. Our goals are to analyze the levels of redundancy for all available AMP databases and use this information to build a new non-redundant sequence database. For this purpose, a new software tool is introduced. Results: A comparative study of 25 AMP databases reveals the overlap and diversity among them and the internal diversity within each database. The overlap analysis shows that only one database (Peptaibol) contains exclusive data, not present in any other, whereas all sequences in the LAMP_Patent database are inc…

Statistics and Probability; Similarity (geometry); Computer science; Sequence analysis; Antimicrobial peptides; Peptaibol; Peptide; computer.software_genre; Procedures; Biochemistry; Set (abstract data type); chemistry.chemical_compound; Protein methods; Sequence Analysis Protein; Redundancy (engineering); Humans; Databases Protein; Molecular Biology; Antimicrobial cationic peptides; chemistry.chemical_classification; Sequence; Antimicrobial cationic peptide; Database; Sequence database; Sequence analysis; Computer Science Applications; Algorithm; Computational Mathematics; Chemistry; Protein database; Computational Theory and Mathematics; chemistry; Data mining; Nucleic acid database; Databases Nucleic Acid; computer; Software; Algorithms; Human

researchProduct

kmcEx: memory-frugal and retrieval-efficient encoding of counted k-mers.

2018

Abstract Motivation: K-mers along with their frequency have served as an elementary building block for error correction, repeat detection, multiple sequence alignment, genome assembly, etc., attracting intensive studies in k-mer counting. However, the output of k-mer counters itself is large; very often, it is too large to fit into main memory, leading to highly narrowed usability. Results: We introduce a novel idea of encoding k-mers as well as their frequency, achieving good memory saving and retrieval efficiency. Specifically, we propose a Bloom filter-like data structure to encode counted k-mers by coupled-bit arrays, one for k-mer representation and the other for frequency encoding. Exper…

Statistics and Probability; Source code; Computer science; media_common.quotation_subject; 0206 medical engineering; Hash function; 02 engineering and technology; Biochemistry; 03 medical and health sciences; Encoding (memory); Molecular Biology; Time complexity; 030304 developmental biology; Block (data storage); media_common; 0303 health sciences; Sequence Analysis DNA; Data structure; Computer Science Applications; Computational Mathematics; Computational Theory and Mathematics; Error detection and correction; Algorithm; Sequence Alignment; 020602 bioinformatics; Algorithms; Software; Bioinformatics (Oxford, England)

researchProduct

Adaptive Modifications of Hypotheses After an Interim Analysis

2001

It is investigated how one can modify hypotheses in a trial after an interim analysis such that the type I error rate is controlled. If only a global statement is desired, a solution was given by Bauer (1989). For a general multiple testing problem, Kieser, Bauer and Lehmacher (1999) and Bauer and Kieser (1999) gave solutions, by means of which the initial set of hypotheses can be reduced after the interim analysis. The same techniques can be applied to obtain more flexible strategies, such as changing weights of hypotheses, changing an a priori order, or even including new hypotheses. It is emphasized that the application of these methods requires very careful planning of a trial as well as a c…

Statistics and Probability; Statement (computer science); Mathematical optimization; General Medicine; Interim analysis; Weighting; Multiple comparisons problem; A priori and a posteriori; Statistics Probability and Uncertainty; Set (psychology); Algorithm; Statistical hypothesis testing; Type I and type II errors; Mathematics; Biometrical Journal

researchProduct

The Induced Smoothed lasso: A practical framework for hypothesis testing in high dimensional regression.

2020

This paper focuses on hypothesis testing in lasso regression, when one is interested in judging statistical significance for the regression coefficients in a regression equation involving many covariates. To get reliable p-values, we propose a new lasso-type estimator relying on the idea of induced smoothing, which allows one to obtain an appropriate covariance matrix and Wald statistic relatively easily. Some simulation experiments reveal that our approach exhibits good performance when contrasted with recent inferential tools in the lasso framework. Two real data analyses are presented to illustrate the proposed framework in practice.

Statistics and Probability; Statistics::Theory; Induced smoothing; Epidemiology; Computer science; Feature selection; Wald test; 01 natural sciences; asthma research; Statistics::Machine Learning; 010104 statistics & probability; 03 medical and health sciences; Health Information Management; Lasso (statistics); Linear regression; sparse models; Statistics::Methodology; Computer Simulation; 0101 mathematics; sandwich formula; 030304 developmental biology; Statistical hypothesis testing; 0303 health sciences; Covariance matrix; lung function; Regression analysis; Statistics::Computation; sparse model; Research Design; Algorithm; Smoothing; variable selection; Statistical methods in medical research

researchProduct

Selecting the tuning parameter in penalized Gaussian graphical models

2019

Penalized inference of Gaussian graphical models is a way to assess the conditional independence structure in multivariate problems. In this setting, the conditional independence structure, corresponding to a graph, is related to the choice of the tuning parameter, which determines the model complexity or degrees of freedom. There has been little research on the degrees of freedom for penalized Gaussian graphical models. In this paper, we propose an estimator of the degrees of freedom in $\ell_1$-penalized Gaussian graphical models. Specifically, we derive an estimator inspired by the generalized information criterion and propose to use this estimator as the bias term for two informatio…

Statistics and Probability; Statistics::Theory; Kullback–Leibler divergence; Kullback-Leibler divergence; Computer science; Gaussian; Information Criteria; 010103 numerical & computational mathematics; Model complexity; Model selection; 01 natural sciences; Theoretical Computer Science; 010104 statistics & probability; symbols.namesake; Statistics::Machine Learning; Generalized information criterion; Entropy (information theory); Statistics::Methodology; Graphical model; 0101 mathematics; Penalized Likelihood Kullback-Leibler Divergence Model Complexity Model Selection Generalized Information Criterion.; Model selection; Estimator; Statistics::Computation; Computational Theory and Mathematics; Conditional independence; symbols; Penalized likelihood; Statistics Probability and Uncertainty; Settore SECS-S/01 - Statistica; Algorithm; Statistics and Computing

researchProduct

Design-based estimation for geometric quantiles with application to outlier detection

2010

Geometric quantiles are investigated using data collected from a complex survey. Geometric quantiles are an extension of univariate quantiles in a multivariate set-up that uses the geometry of multivariate data clouds. A very important application of geometric quantiles is the detection of outliers in multivariate data by means of quantile contours. A design-based estimator of geometric quantiles is constructed and used to compute quantile contours in order to detect outliers in both multivariate data and survey sampling set-ups. An algorithm for computing geometric quantile estimates is also developed. Under broad assumptions, the asymptotic variance of the quantile estimator is derived an…

Statistics and Probability; Statistics::Theory; TheoryofComputation_COMPUTATIONBYABSTRACTDEVICES; Statistics::Applications; ComputingMethodologies_SIMULATIONANDMODELING; Applied Mathematics; MathematicsofComputing_NUMERICALANALYSIS; Univariate; InformationSystems_DATABASEMANAGEMENT; Estimator; Statistics::Computation; Quantile regression; Horvitz–Thompson estimator; Computational Mathematics; Delta method; Computational Theory and Mathematics; TheoryofComputation_ANALYSISOFALGORITHMSANDPROBLEMCOMPLEXITY; Outlier; Consistent estimator; Statistics; Statistics::Methodology; Mathematics; Quantile; Computational Statistics & Data Analysis

researchProduct

On the stability and ergodicity of adaptive scaling Metropolis algorithms

2011

The stability and ergodicity properties of two adaptive random walk Metropolis algorithms are considered. Both algorithms adjust the scaling of the proposal distribution continuously based on the observed acceptance probability. Unlike the previously proposed forms of the algorithms, the adapted scaling parameter is not constrained within a predefined compact interval. The first algorithm is based on scale adaptation only, while the second one also incorporates covariance adaptation. A strong law of large numbers is shown to hold assuming that the target density is smooth enough and has either compact support or super-exponentially decaying tails.

Statistics and Probability; Stochastic approximation; Mathematics - Statistics Theory; Statistics Theory (math.ST); Law of large numbers; Multiple-try Metropolis; 01 natural sciences; Stability (probability); 010104 statistics & probability; Modelling and Simulation; 65C40 60J27 93E15 93E35; Adaptive Markov chain Monte Carlo; FOS: Mathematics; 0101 mathematics; Scaling; Metropolis algorithm; Mathematics; ta112; Applied Mathematics; 010102 general mathematics; Rejection sampling; Ergodicity; Probability (math.PR); ta111; Covariance; Random walk; Metropolis–Hastings algorithm; Modeling and Simulation; Algorithm; Stability; Mathematics - Probability; Stochastic Processes and their Applications

researchProduct

Test and power considerations for multiple endpoint analyses using sequentially rejective graphical procedures

2009

A variety of powerful test procedures are available for the analysis of clinical trials addressing multiple objectives, such as comparing several treatments with a control, assessing the benefit of a new drug for more than one endpoint, etc. However, some of these procedures have reached a level of complexity that makes it difficult to communicate the underlying test strategies to clinical teams. Graphical approaches have been proposed instead that facilitate the derivation and communication of Bonferroni-based closed test procedures. In this paper we give a coherent description of the methodology and illustrate it with a real clinical trial example. We further discuss suitable power measur…

Statistics and Probability; Test strategy; Endpoint Determination; Epidemiology; Computer science; Control (management); Analysis of clinical trials; Machine learning; computer.software_genre; symbols.namesake; Drug Therapy; Computer Graphics; Confidence Intervals; Humans; Multicenter Studies as Topic; Randomized Controlled Trials as Topic; business.industry; Variety (cybernetics); Test (assessment); Clinical trial; Bonferroni correction; Clinical Trials Phase III as Topic; Data Interpretation Statistical; Multiple comparisons problem; symbols; Artificial intelligence; business; Algorithm; computer; Statistics in Medicine

researchProduct

Bayesian Design of “Successful” Replications

2002

Replication of experiments is common in applied research. However, systematic studies of the goals and motivations of a “replication” are rare. As a consequence, there does not seem to be a precise notion of what a “success” when replicating means. This article discusses some of the possible goals for replication; this leads to different (but precise) notions of “success” when replicating. Bayesian hierarchical models allow for a flexible and explicit incorporation of the assumed relationship among the experiments. Bayesian predictive distributions are a natural tool to compute the probability of the replication being successful, and hence to design the replication so that the probability of…

Statistics and Probability; Theoretical computer science; General Mathematics; Bayesian probability; Hierarchical database model; Bayesian design; Probability of success; Noncentral t-distribution; Replication (statistics); Applied research; Statistics Probability and Uncertainty; Algorithm; Mathematics; Statistical hypothesis testing; The American Statistician

researchProduct

Basic networks: Definition and applications

2009

7 pages, 4 figures, 1 table.-- PMID: 19490867 [PubMed]

Statistics and Probability; Theoretical computer science; Interactome; Geodesic; interactome; Steiner tree problem; Models Biological; General Biochemistry Genetics and Molecular Biology; Graph; 03 medical and health sciences; symbols.namesake; Module; Protein Interaction Mapping; module; Animals; Steiner tree; 030304 developmental biology; Mathematics; Discrete mathematics; 0303 health sciences; Models Statistical; General Immunology and Microbiology; Applied Mathematics; 030302 biochemistry & molecular biology; General Medicine; graph; Graph; Modeling and Simulation; symbols; Neural Networks Computer; General Agricultural and Biological Sciences; Algorithms

researchProduct