Search results for "error"

showing 10 items of 1643 documents

Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases

2019

AbstractThe widespread occurrence of repetitive stretches of DNA in genomes of organisms across the tree of life imposes fundamental challenges for sequencing, genome assembly, and automated annotation of genes and proteins. This multi-level problem can lead to errors in genome and protein databases that are often not recognized or acknowledged. As a consequence, end users working with sequences with repetitive regions are faced with ‘ready-to-use’ deposited data whose trustworthiness is difficult to determine, let alone to quantify. Here, we provide a review of the problems associated with tandem repeat sequences that originate from different stages during the sequencing-assembly-annotatio…

FOS: Computer and information sciencesBioinformatics[SDV]Life Sciences [q-bio]Sequence assemblyGenomics[SDV.BC]Life Sciences [q-bio]/Cellular BiologyComputational biologyBiologyGenome03 medical and health sciencesAnnotation0302 clinical medicineTandem repeatGeneticsAnimalsSurvey and SummaryDatabases ProteinGeneComputingMilieux_MISCELLANEOUS030304 developmental biology0303 health sciencesEnd user572: BiochemieDNASequence Analysis DNAGenomics[SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM]WorkflowComputingMethodologies_PATTERNRECOGNITIONGadus morhuaTandem Repeat SequencesScientific Experimental Error[INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM]Databases Nucleic Acid030217 neurology & neurosurgery

researchProduct

Analyzing Learned Representations of a Deep ASR Performance Prediction Model

2018

This paper addresses a relatively new task: prediction of ASR performance on unseen broadcast programs. In a previous paper, we presented an ASR performance prediction system using CNNs that encode both text (ASR transcript) and speech, in order to predict word error rate. This work is dedicated to the analysis of speech signal embeddings and text embeddings learnt by the CNN while training our prediction model. We try to better understand which information is captured by the deep model and its relation with different conditioning factors. It is shown that hidden layers convey a clear signal about speech style, accent and broadcast type. We then try to leverage these 3 types of information …

FOS: Computer and information sciencesComputer Science - Computation and LanguageComputer scienceSpeech recognitionWord error rate02 engineering and technology010501 environmental sciences01 natural sciences[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL][INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL]0202 electrical engineering electronic engineering information engineeringPerformance predictionLeverage (statistics)020201 artificial intelligence & image processingComputation and Language (cs.CL)0105 earth and related environmental sciences

researchProduct

Accounting for Input Noise in Gaussian Process Parameter Retrieval

2020

Gaussian processes (GPs) are a class of Kernel methods that have shown to be very useful in geoscience and remote sensing applications for parameter retrieval, model inversion, and emulation. They are widely used because they are simple, flexible, and provide accurate estimates. GPs are based on a Bayesian statistical framework which provides a posterior probability function for each estimation. Therefore, besides the usual prediction (given in this case by the mean function), GPs come equipped with the possibility to obtain a predictive variance (i.e., error bars, confidence intervals) for each prediction. Unfortunately, the GP formulation usually assumes that there is no noise in the inpu…

FOS: Computer and information sciencesComputer Science - Machine LearningComputer sciencePosterior probability0211 other engineering and technologiesMachine Learning (stat.ML)02 engineering and technologyMachine Learning (cs.LG)symbols.namesakeStatistics - Machine LearningElectrical and Electronic EngineeringGaussian process021101 geological & geomatics engineeringPropagation of uncertaintyNoise measurementbusiness.industryFunction (mathematics)Geotechnical Engineering and Engineering GeologySea surface temperatureNoiseKernel methodsymbolsGlobal Positioning SystemErrors-in-variables modelsbusinessAlgorithmIEEE Geoscience and Remote Sensing Letters

researchProduct

Retrieval of aboveground crop nitrogen content with a hybrid machine learning method

2020

Abstract Hyperspectral acquisitions have proven to be the most informative Earth observation data source for the estimation of nitrogen (N) content, which is the main limiting nutrient for plant growth and thus agricultural production. In the past, empirical algorithms have been widely employed to retrieve information on this biochemical plant component from canopy reflectance. However, these approaches do not seek for a cause-effect relationship based on physical laws. Moreover, most studies solely relied on the correlation of chlorophyll content with nitrogen, and thus neglected the fact that most N is bound in proteins. Our study presents a hybrid retrieval method using a physically-base…

FOS: Computer and information sciencesComputer Science - Machine LearningHeteroscedasticity010504 meteorology & atmospheric sciencesMean squared errorEnMAP0211 other engineering and technologiesGaussian processes02 engineering and technologyManagement Monitoring Policy and LawQuantitative Biology - Quantitative Methods01 natural sciencesMachine Learning (cs.LG)symbols.namesakeHomoscedasticityEnMAPAgricultural monitoringComputers in Earth SciencesGaussian processQuantitative Methods (q-bio.QM)021101 geological & geomatics engineering0105 earth and related environmental sciencesEarth-Surface ProcessesMathematicsRemote sensing2. Zero hungerGlobal and Planetary ChangeInversionHyperspectral imagingImaging spectroscopyRadiative transfer modelingRegressionImaging spectroscopyFOS: Biological sciences[SDE]Environmental SciencessymbolsInternational Journal of Applied Earth Observation and Geoinformation

researchProduct

Human experts vs. machines in taxa recognition

2020

The step of expert taxa recognition currently slows down the response time of many bioassessments. Shifting to quicker and cheaper state-of-the-art machine learning approaches is still met with expert scepticism towards the ability and logic of machines. In our study, we investigate both the differences in accuracy and in the identification logic of taxonomic experts and machines. We propose a systematic approach utilizing deep Convolutional Neural Nets with the transfer learning paradigm and extensively evaluate it over a multi-pose taxonomic dataset with hierarchical labels specifically created for this comparison. We also study the prediction accuracy on different ranks of taxonomic hier…

FOS: Computer and information sciencesComputer Science - Machine Learninghahmontunnistus (tietotekniikka)Computer scienceClassification approachTaxonomic expert02 engineering and technologyneuroverkotcomputer.software_genreConvolutional neural networkQuantitative Biology - Quantitative MethodsField (computer science)Machine Learning (cs.LG)Machine learning approachesStatistics - Machine LearningAutomated approachDeep neural networks0202 electrical engineering electronic engineering information engineeringTaxonomic rankQuantitative Methods (q-bio.QM)Classification (of information)Artificial neural networksystematiikka (biologia)Prediction accuracyIdentification (information)koneoppiminenMulti-image dataBenchmark (computing)020201 artificial intelligence & image processingConvolutional neural networksComputer Vision and Pattern RecognitionClassification errorsMachine Learning (stat.ML)Machine learningState of the artElectrical and Electronic EngineeringTaxonomySupport vector machinesLearning systemsbusiness.industryNode (networking)020206 networking & telecommunicationsComputer circuitsHierarchical classificationConvolutionSupport vector machineFOS: Biological sciencesTaxonomic hierarchySignal ProcessingBiomonitoringBenchmark datasetsArtificial intelligencebusinesscomputertaksonitSoftware

researchProduct

Optimal one-shot quantum algorithm for EQUALITY and AND

2017

We study the computation complexity of Boolean functions in the quantum black box model. In this model our task is to compute a function $f:\{0,1\}\to\{0,1\}$ on an input $x\in\{0,1\}^n$ that can be accessed by querying the black box. Quantum algorithms are inherently probabilistic; we are interested in the lowest possible probability that the algorithm outputs incorrect answer (the error probability) for a fixed number of queries. We show that the lowest possible error probability for $AND_n$ and $EQUALITY_{n+1}$ is $1/2-n/(n^2+1)$.

FOS: Computer and information sciencesDiscrete mathematicsOne shotQuantum PhysicsGeneral Computer ScienceProbabilistic logicFOS: Physical sciencesFunction (mathematics)Computational Complexity (cs.CC)Computer Science - Computational ComplexityProbability of errorComputation complexityQuantum algorithmQuantum Physics (quant-ph)Boolean functionQuantumMathematics

researchProduct

Isometric Words Based on Swap and Mismatch Distance

2023

An edit distance is a metric between words that quantifies how two words differ by counting the number of edit operations needed to transform one word into the other one. A word f is said isometric with respect to an edit distance if, for any pair of f-free words u and v, there exists a transformation of minimal length from u to v via the related edit operations such that all the intermediate words are also f-free. The adjective 'isometric' comes from the fact that, if the Hamming distance is considered (i.e., only mismatches), then isometric words are connected with definitions of isometric subgraphs of hypercubes. We consider the case of edit distance with swap and mismatch. We compare it…

FOS: Computer and information sciencesFormal Languages and Automata Theory (cs.FL)Computer Science - Formal Languages and Automata TheorySwap and mismatch distance Isometric words Overlap with errors

researchProduct

A Bayesian Multilevel Random-Effects Model for Estimating Noise in Image Sensors

2020

Sensor noise sources cause differences in the signal recorded across pixels in a single image and across multiple images. This paper presents a Bayesian approach to decomposing and characterizing the sensor noise sources involved in imaging with digital cameras. A Bayesian probabilistic model based on the (theoretical) model for noise sources in image sensing is fitted to a set of a time-series of images with different reflectance and wavelengths under controlled lighting conditions. The image sensing model is a complex model, with several interacting components dependent on reflectance and wavelength. The properties of the Bayesian approach of defining conditional dependencies among parame…

FOS: Computer and information sciencesMean squared errorC.4Computer scienceBayesian probabilityG.3ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISIONInference02 engineering and technologyBayesian inferenceStatistics - Applications0202 electrical engineering electronic engineering information engineeringFOS: Electrical engineering electronic engineering information engineeringApplications (stat.AP)Electrical and Electronic EngineeringImage sensorI.4.1C.4; G.3; I.4.1Pixelbusiness.industryImage and Video Processing (eess.IV)020206 networking & telecommunicationsPattern recognitionStatistical modelElectrical Engineering and Systems Science - Image and Video ProcessingRandom effects modelNoise62P30 62P35 62F15 62J05Signal Processing020201 artificial intelligence & image processingComputer Vision and Pattern RecognitionArtificial intelligencebusinessSoftware

researchProduct

Heretical Mutiple Importance Sampling

2016

Multiple Importance Sampling (MIS) methods approximate moments of complicated distributions by drawing samples from a set of proposal distributions. Several ways to compute the importance weights assigned to each sample have been recently proposed, with the so-called deterministic mixture (DM) weights providing the best performance in terms of variance, at the expense of an increase in the computational cost. A recent work has shown that it is possible to achieve a trade-off between variance reduction and computational effort by performing an a priori random clustering of the proposals (partial DM algorithm). In this paper, we propose a novel "heretical" MIS framework, where the clustering …

FOS: Computer and information sciencesMean squared errorComputer scienceApplied MathematicsEstimator020206 networking & telecommunications02 engineering and technologyVariance (accounting)Statistics - Computation01 natural sciencesReduction (complexity)010104 statistics & probability[INFO.INFO-TS]Computer Science [cs]/Signal and Image ProcessingSignal Processing0202 electrical engineering electronic engineering information engineeringA priori and a posterioriVariance reduction0101 mathematicsElectrical and Electronic EngineeringCluster analysisAlgorithm[SPI.SIGNAL]Engineering Sciences [physics]/Signal and Image processingImportance samplingComputation (stat.CO)ComputingMilieux_MISCELLANEOUS

researchProduct

Unbiased Estimators and Multilevel Monte Carlo

2018

Multilevel Monte Carlo (MLMC) and unbiased estimators recently proposed by McLeish (Monte Carlo Methods Appl., 2011) and Rhee and Glynn (Oper. Res., 2015) are closely related. This connection is elaborated by presenting a new general class of unbiased estimators, which admits previous debiasing schemes as special cases. New lower variance estimators are proposed, which are stratified versions of earlier unbiased schemes. Under general conditions, essentially when MLMC admits the canonical square root Monte Carlo error rate, the proposed new schemes are shown to be asymptotically as efficient as MLMC, both in terms of variance and cost. The experiments demonstrate that the variance reduction…

FOS: Computer and information sciencesMonte Carlo methodWord error rate010103 numerical & computational mathematicsstochastic differential equationManagement Science and Operations ResearchStatistics - Computation01 natural sciences010104 statistics & probabilityStochastic differential equationstratificationSquare rootFOS: MathematicsApplied mathematics0101 mathematicsComputation (stat.CO)stokastiset prosessitMathematicsProbability (math.PR)ta111EstimatorVariance (accounting)unbiased estimatorsComputer Science ApplicationsMonte Carlo -menetelmät65C05 (Primary) 65C30 (Secondary)efficiencykerrostuneisuusVariance reductionunbiasemultilevel Monte CarlodifferentiaaliyhtälötMathematics - ProbabilityOperations Research

researchProduct