Search results for "Computer and Information Science"

showing 10 items of 1335 documents

Sorting suffixes of a text via its Lyndon Factorization

2013

Sorting the suffixes of a text plays a fundamental role in text algorithms. It is used, for instance, in the construction of the Burrows-Wheeler transform and the suffix array, both widely used in several fields of Computer Science. For this reason, much recent research has been devoted to finding new strategies for sorting suffixes effectively. In this paper we introduce a new methodology in which an important role is played by the Lyndon factorization: the local suffixes inside the factors detected by this factorization keep their mutual order when extended to suffixes of the whole word. This property suggests a versatile technique that easily can b…

FOS: Computer and information sciences; BWT; Lyndon Factorization; Settore INF/01 - Informatica; Sorting Suffixes; Lyndon Words; Suffix array; Computer Science - Data Structures and Algorithms; Data_FILES; Data Structures and Algorithms (cs.DS)
researchProduct
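
The order-preservation property described in the abstract can be checked on small inputs. The following is a minimal sketch, not the paper's sorting method: it computes the Lyndon factorization with Duval's algorithm (a standard technique swapped in here for illustration) and then compares, for every pair of positions inside one factor, the order of the local suffixes with the order of the corresponding suffixes of the whole text. The function names are hypothetical.

```python
def lyndon_factorization(text):
    """Split text into a non-increasing sequence of Lyndon words (Duval's algorithm)."""
    factors = []
    i, n = 0, len(text)
    while i < n:
        j, k = i + 1, i
        # Extend the current run of repeated Lyndon words as far as possible.
        while j < n and text[k] <= text[j]:
            k = i if text[k] < text[j] else k + 1
            j += 1
        step = j - k  # length of the Lyndon word found
        while i <= k:
            factors.append(text[i:i + step])
            i += step
    return factors


def local_order_is_preserved(text):
    """Check that suffixes starting inside one factor compare the same locally and globally."""
    start = 0
    for factor in lyndon_factorization(text):
        end = start + len(factor)
        for p in range(start, end):
            for q in range(p + 1, end):
                local = text[p:end] < text[q:end]   # suffixes of the factor only
                whole = text[p:] < text[q:]         # suffixes of the whole text
                if local != whole:
                    return False
        start = end
    return True


print(lyndon_factorization("banana"))      # ['b', 'an', 'an', 'a']
print(local_order_is_preserved("banana"))  # True
```

The quadratic check is only meant for experimenting with short strings; it is not an efficient suffix-sorting procedure.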

Helminth Microbiota Profiling Using Bacterial 16S rRNA Gene Amplicon Sequencing: From Sampling to Sequence Data Mining

2021

Symbiont microbial communities play important roles in animal biology and are thus considered integral components of metazoan organisms, including parasitic worms (helminths). Nevertheless, the study of helminth microbiomes has thus far been largely overlooked, and symbiotic relationships between helminths and their microbiomes have been only investigated in selected parasitic worms. Over the past decade, advances in next-generation sequencing technologies, coupled with their increased affordability, have spurred investigations of helminth-associated microbial communities aiming at enhancing current understanding of their fundamental biology and physiology, as well as of host-microbe intera…

FOS: Computer and information sciences; Bioinformatics; Computational biology; Biology; DNA sequencing; Symbiosis; Helminths; RNA Ribosomal 16S; parasitic diseases; Helminth; Animals; Data Mining; Microbiome; Gene; Bacterial 16S rRNA gene; Indirect life cycle; High-throughput sequencing; Microbiota; High-Throughput Nucleotide Sequencing; Genes rRNA; Schistosoma mansoni; Amplicon sequencing; Human genome; Sample collection; Worm-associated microbiome
researchProduct

Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases

2019

The widespread occurrence of repetitive stretches of DNA in genomes of organisms across the tree of life imposes fundamental challenges for sequencing, genome assembly, and automated annotation of genes and proteins. This multi-level problem can lead to errors in genome and protein databases that are often not recognized or acknowledged. As a consequence, end users working with sequences with repetitive regions are faced with ‘ready-to-use’ deposited data whose trustworthiness is difficult to determine, let alone to quantify. Here, we provide a review of the problems associated with tandem repeat sequences that originate from different stages during the sequencing-assembly-annotatio…

FOS: Computer and information sciences; Bioinformatics; [SDV]Life Sciences [q-bio]; Sequence assembly; Genomics; [SDV.BC]Life Sciences [q-bio]/Cellular Biology; Computational biology; Biology; Genome; 03 medical and health sciences; Annotation; 0302 clinical medicine; Tandem repeat; Genetics; Animals; Survey and Summary; Databases Protein; Gene; ComputingMilieux_MISCELLANEOUS; 030304 developmental biology; 0303 health sciences; End user; 572: Biochemie; DNA; Sequence Analysis DNA; [SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM]; Workflow; ComputingMethodologies_PATTERNRECOGNITION; Gadus morhua; Tandem Repeat Sequences; Scientific Experimental Error; [INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM]; Databases Nucleic Acid; 030217 neurology & neurosurgery
researchProduct

Extending the Tsetlin Machine With Integer-Weighted Clauses for Increased Interpretability

2020

Despite significant effort, building models that are both interpretable and accurate is an unresolved challenge for many pattern recognition problems. In general, rule-based and linear models lack accuracy, while deep learning interpretability is based on rough approximations of the underlying inference. Using a linear combination of conjunctive clauses in propositional logic, Tsetlin Machines (TMs) have shown competitive performance on diverse benchmarks. However, to do so, many clauses are needed, which impacts interpretability. Here, we address the accuracy-interpretability challenge in machine learning by equipping the TM clauses with integer weights. The resulting Integer Weighted TM (…

FOS: Computer and information sciences; Boosting (machine learning); Theoretical computer science; integer-weighted Tsetlin machine; General Computer Science; Computer science; Computer Science - Artificial Intelligence; 0206 medical engineering; Natural language understanding; Inference; 02 engineering and technology; computer.software_genre; 0202 electrical engineering electronic engineering information engineering; General Materials Science; Tsetlin machine; VDP::Teknologi: 500::Informasjons- og kommunikasjonsteknologi: 550; Interpretability; Artificial neural network; Learning automata; business.industry; Deep learning; General Engineering; interpretable machine learning; rule-based learning; interpretable AI; Propositional calculus; Support vector machine; Artificial Intelligence (cs.AI); TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES; XAI; Pattern recognition (psychology); 020201 artificial intelligence & image processing; lcsh:Electrical engineering. Electronics. Nuclear engineering; Artificial intelligence; business; lcsh:TK1-9971; computer; 020602 bioinformatics; Integer (computer science)
researchProduct

Novel Results on the Number of Runs of the Burrows-Wheeler-Transform

2021

The Burrows-Wheeler-Transform (BWT), a reversible string transformation, is one of the fundamental components of many current data structures in string processing. It is central in data compression, as well as in efficient query algorithms for sequence data, such as webpages, genomic and other biological sequences, or indeed any textual data. The BWT lends itself well to compression because its number of equal-letter-runs (usually referred to as $r$) is often considerably lower than that of the original string; in particular, it is well suited for strings with many repeated factors. In fact, much attention has been paid to the $r$ parameter as a measure of repetitiveness, especially to evalua…

FOS: Computer and information sciences; Burrows–Wheeler transform; Settore INF/01 - Informatica; Combinatorics on words; Formal Languages and Automata Theory (cs.FL); Computer science; String (computer science); Search engine indexing; Compressed data structures; Computer Science - Formal Languages and Automata Theory; String indexing; Data structure; Measure (mathematics); Burrows-Wheeler-Transform; Repetitiveness; Transformation (function); Computer Science - Data Structures and Algorithms; Data Structures and Algorithms (cs.DS); Algorithm; Data compression
researchProduct
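
To make the parameter $r$ from the abstract concrete, here is a minimal sketch that computes the BWT as the last column of the sorted rotations and counts equal-letter runs before and after the transform. It assumes the rotation-based definition without an end-of-string sentinel, and the quadratic construction is chosen for readability rather than efficiency; the function names are hypothetical.

```python
from itertools import groupby


def bwt(text):
    """Burrows-Wheeler transform: last column of the sorted rotations of text."""
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def count_runs(text):
    """Number of maximal equal-letter runs in text."""
    return sum(1 for _ in groupby(text))


word = "banana"
print(bwt(word))              # nnbaaa
print(count_runs(word))       # 6
print(count_runs(bwt(word)))  # 3
```

For this word the transform halves the number of runs, which illustrates why $r$ tends to be small for strings with many repeated factors.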

Adaptive learning of compressible strings

2020

Suppose an oracle knows a string $S$ that is unknown to us and that we want to determine. The oracle can answer queries of the form "Is $s$ a substring of $S$?". In 1995, Skiena and Sundaram showed that, in the worst case, any algorithm needs to ask the oracle $\sigma n/4 - O(n)$ queries in order to be able to reconstruct the hidden string, where $\sigma$ is the size of the alphabet of $S$ and $n$ its length, and gave an algorithm that spends $(\sigma-1)n+O(\sigma \sqrt{n})$ queries to reconstruct $S$. The main contribution of our paper is to improve the above upper bound in the context where the string is compressible. We first present a universal algorithm that, given a (computable) compre…

FOS: Computer and information sciences; Centroid decomposition; General Computer Science; String compression; Adaptive learning; Kolmogorov complexity; Context (language use); Data_CODINGANDINFORMATIONTHEORY; String reconstruction; Theoretical Computer Science; Combinatorics; String learning; Lempel-Ziv; Suffix tree; Integer; Computer Science - Data Structures and Algorithms; Order (group theory); Data Structures and Algorithms (cs.DS); Time complexity; Computer Science::Databases; Mathematics; Settore INF/01 - Informatica; Linear space; String (computer science); Substring; Bounded function
researchProduct
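
The query model in the abstract can be illustrated with a deliberately naive reconstruction. This is neither the Skiena and Sundaram algorithm nor the paper's compression-aware method: it simply extends a known substring to the right until it becomes a suffix of $S$, then to the left until nothing can be prepended. It uses at most $\sigma(n+2)$ queries. The helper names `make_oracle` and `reconstruct` are hypothetical.

```python
def make_oracle(hidden):
    """Return a substring-query function for hidden and a dict counting the queries."""
    counter = {"queries": 0}

    def is_substring(s):
        counter["queries"] += 1
        return s in hidden

    return is_substring, counter


def reconstruct(is_substring, alphabet):
    """Recover the hidden string using only substring queries."""
    known = ""
    # Phase 1: extend to the right. When no letter fits, every occurrence of
    # `known` ends the hidden string, so `known` is a suffix of it.
    extended = True
    while extended:
        extended = False
        for c in alphabet:
            if is_substring(known + c):
                known += c
                extended = True
                break
    # Phase 2: extend to the left. When no letter fits, `known` is the whole string.
    extended = True
    while extended:
        extended = False
        for c in alphabet:
            if is_substring(c + known):
                known = c + known
                extended = True
                break
    return known


oracle, counter = make_oracle("abracadabra")
print(reconstruct(oracle, "abcdr"))  # abracadabra
print(counter["queries"])
```

The query count of this sketch is within a constant factor of the $(\sigma-1)n$ term quoted above, but it makes no attempt to exploit compressibility.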

Adaptive Task Assignment in Online Learning Environments

2016

With the increasing popularity of online learning, intelligent tutoring systems are regaining attention. In this paper, we introduce adaptive algorithms for the personalized assignment of learning tasks to students so as to improve their performance in online learning environments. As the main contribution of this paper, we propose a novel Skill-Based Task Selector (SBTS) algorithm which is able to approximate a student's skill level based on their performance and consequently suggest adequate assignments. The SBTS is inspired by the class of multi-armed bandit algorithms. However, in contrast to standard multi-armed bandit approaches, the SBTS aims at acquiring two criteria related to stu…

FOS: Computer and information sciences; Class (computer programming); Computer science; business.industry; Computer Science - Artificial Intelligence; Node (networking); 05 social sciences; 050301 education; Contrast (statistics); 02 engineering and technology; Machine learning; computer.software_genre; Popularity; Intelligent tutoring system; Task (project management); Artificial Intelligence (cs.AI); 020204 information systems; 0202 electrical engineering electronic engineering information engineering; Selection (linguistics); ComputingMilieux_COMPUTERSANDEDUCATION; Adaptive learning; Artificial intelligence; business; 0503 education; computer
researchProduct

Popularity of patterns over $d$-equivalence classes of words and permutations

2020

Two words of the same length are d-equivalent if they have the same descent set and the same underlying alphabet. In particular, two permutations of the same length are d-equivalent if they have the same descent set. The popularity of a pattern in a set of words is the overall number of copies of the pattern within the words of the set. We show the far-from-trivial fact that two patterns are d-equivalent if and only if they are equipopular over any d-equivalence class, and this equipopularity does not follow obviously from a trivial equidistribution.

FOS: Computer and information sciences; Class (set theory); General Computer Science; Discrete Mathematics (cs.DM); 010102 general mathematics; 0102 computer and information sciences; 01 natural sciences; Popularity; Theoretical Computer Science; Combinatorics; Set (abstract data type); 010201 computation theory & mathematics; If and only if; [MATH.MATH-CO]Mathematics [math]/Combinatorics [math.CO]; FOS: Mathematics; Mathematics - Combinatorics; Combinatorics (math.CO); 0101 mathematics; Alphabet; ComputingMilieux_MISCELLANEOUS; Computer Science::Formal Languages and Automata Theory; Mathematics; Descent (mathematics); Computer Science - Discrete Mathematics
researchProduct
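
The definition of d-equivalence in the abstract can be sketched as follows. Two conventions here are assumptions not stated in the listing: descents are reported as 1-based positions $i$ with $w_i > w_{i+1}$, and the "underlying alphabet" is read as the set of letters that occur in the word. The function names are hypothetical, and pattern popularity is not computed because the abstract does not fix which notion of pattern occurrence is meant.

```python
from itertools import permutations


def descent_set(word):
    """1-based positions i such that word[i] > word[i+1]."""
    return {i + 1 for i in range(len(word) - 1) if word[i] > word[i + 1]}


def d_equivalent(u, v):
    """Same length, same set of letters, same descent set."""
    return (len(u) == len(v)
            and set(u) == set(v)
            and descent_set(u) == descent_set(v))


def d_classes(n):
    """Group the permutations of 1..n by descent set."""
    classes = {}
    for perm in permutations(range(1, n + 1)):
        classes.setdefault(frozenset(descent_set(perm)), []).append(perm)
    return classes


print(d_equivalent((2, 1, 3), (3, 1, 2)))  # True: both have descent set {1}
print(len(d_classes(3)))                   # 4
```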

Probabilistic entailment in the setting of coherence: The role of quasi conjunction and inclusion relation

2013

In this paper, by adopting a coherence-based probabilistic approach to default reasoning, we focus the study on the logical operation of quasi conjunction and the Goodman-Nguyen inclusion relation for conditional events. We recall that quasi conjunction is a basic notion for defining consistency of conditional knowledge bases. By deepening some results given in a previous paper we show that, given any finite family of conditional events F and any nonempty subset S of F, the family F p-entails the quasi conjunction C(S); then, given any conditional event E|H, we analyze the equivalence between p-entailment of E|H from F and p-entailment of E|H from C(S), where S is some nonempty subset of F.…

FOS: Computer and information sciences; Class (set theory); Goodman–Nguyen’s inclusion relation; QAND rule; Settore MAT/06 - Probabilita' E Statistica Matematica; Computer Science - Artificial Intelligence; Mathematics - Statistics Theory; Statistics Theory (math.ST); Logical consequence; Theoretical Computer Science; Artificial Intelligence; Quasi conjunction; FOS: Mathematics; Equivalence (measure theory); Mathematics; Event (probability theory); Discrete mathematics; Settore INF/01 - Informatica; Applied Mathematics; Probability (math.PR); goodman-nguyen inclusion relation; Probabilistic logic; Coherence (statistics); Conjunction (grammar); Greatest element; Artificial Intelligence (cs.AI); Probabilistic default reasoning; p-Entailment; Coherence; Software; Mathematics - Probability
researchProduct

On the Number of Closed Factors in a Word

2015

A closed word (a.k.a. periodic-like word or complete first return) is a word whose longest border does not have internal occurrences, or, equivalently, whose longest repeated prefix is not right special. We investigate the structure of closed factors of words. We show that a word of length $n$ contains at least $n+1$ distinct closed factors, and characterize those words having exactly $n+1$ closed factors. Furthermore, we show that a word of length $n$ can contain $\Theta(n^{2})$ many distinct closed factors.

FOS: Computer and information sciences; Closed word; Combinatorics on words; Complete return; Formal Languages and Automata Theory (cs.FL); Computer science; Computer Science (all); Structure (category theory); Computer Science - Formal Languages and Automata Theory; Rich word; Bitonic word; 68R15; Theoretical Computer Science; Combinatorics; Prefix; FOS: Mathematics; Mathematics - Combinatorics; Combinatorics (math.CO); Arithmetic; Word (computer architecture)
researchProduct
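
The border-based definition of a closed word in the abstract can be tested by brute force on short words. The sketch below assumes the usual convention that the empty word and single letters are closed, and that a longer word with an empty longest border is not closed; the function names are hypothetical and the enumeration of factors is cubic at best, so it is meant only for small experiments.

```python
def longest_border(word):
    """Longest proper prefix of word that is also a suffix (possibly empty)."""
    for k in range(len(word) - 1, 0, -1):
        if word[:k] == word[-k:]:
            return word[:k]
    return ""


def is_closed(word):
    """Closed: the longest border occurs only as a prefix and as a suffix."""
    if len(word) <= 1:
        return True  # convention assumed here
    border = longest_border(word)
    if not border:
        return False
    occurrences = sum(
        1 for i in range(len(word) - len(border) + 1)
        if word[i:i + len(border)] == border
    )
    return occurrences == 2  # prefix and suffix only, no internal occurrence


def closed_factors(word):
    """Set of distinct closed factors of word, including the empty word."""
    n = len(word)
    factors = {word[i:j] for i in range(n + 1) for j in range(i, n + 1)}
    return {f for f in factors if is_closed(f)}


print(sorted(closed_factors("abaab"), key=lambda f: (len(f), f)))
# ['', 'a', 'b', 'aa', 'aba', 'baab', 'abaab']
```

Here the word has length 5 and 7 distinct closed factors, consistent with the lower bound of $n+1$ stated in the abstract.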