Search results for "Data structures"

showing 10 items of 258 documents

Algorithmic paradigms for stability-based cluster validity and model selection statistical methods, with applications to microarray data analysis

2012

AbstractThe advent of high throughput technologies, in particular microarrays, for biological research has revived interest in clustering, resulting in a plethora of new clustering algorithms. However, model selection, i.e., the identification of the correct number of clusters in a dataset, has received relatively little attention. Indeed, although central for statistics, its difficulty is also well known. Fortunately, a few novel techniques for model selection, representing a sharp departure from previous ones in statistics, have been proposed and gained prominence for microarray data analysis. Among those, the stability-based methods are the most robust and best performing in terms of pre…

Settore INF/01 - InformaticaGeneral Computer Sciencebusiness.industryComputer scienceBioinformaticsModel selectionGeneral statisticsMachine learningcomputer.software_genreTheoretical Computer ScienceComputational biologyAnalysis of massive datasetsMachine learningCluster (physics)Algorithms and data structures General statistics Analysis of massive datasets Machine learning Computational biology BioinformaticsAlgorithms and data structuresAlgorithm designArtificial intelligenceCluster analysisbusinessCompleteness (statistics)computerComputer Science(all)Theoretical Computer Science
researchProduct

Indexed Two-Dimensional String Matching

2016

Settore INF/01 - InformaticaTwo-dimensional index data structuresString searching algorithm0102 computer and information sciences02 engineering and technologyApproximate string matching01 natural sciencesCombinatorics010201 computation theory & mathematicsIndex data structures for matrices or imageIndexing for matrices or image0202 electrical engineering electronic engineering information engineeringTwo-dimensional indexing for pattern matching020201 artificial intelligence & image processingString metricMathematics
researchProduct

A New Class of Searchable and Provably Highly Compressible String Transformations

2019

The Burrows-Wheeler Transform is a string transformation that plays a fundamental role for the design of self-indexing compressed data structures. Over the years, researchers have successfully extended this transformation outside the domains of strings. However, efforts to find non-trivial alternatives of the original, now 25 years old, Burrows-Wheeler string transformation have met limited success. In this paper we bring new lymph to this area by introducing a whole new family of transformations that have all the "myriad virtues" of the BWT: they can be computed and inverted in linear time, they produce provably highly compressible strings, and they support linear time pattern search direc…

Settore ING-INF/05 - Sistemi Di Elaborazione Delle InformazioniFOS: Computer and information sciences050101 languages & linguisticsBurrows-wheeler transformation; Combinatorics on words; Data indexing and compression000 Computer science knowledge general worksSettore INF/01 - InformaticaCombinatorics on words05 social sciences02 engineering and technologyData_CODINGANDINFORMATIONTHEORYComputer ScienceBurrows-wheeler transformationComputer Science - Data Structures and Algorithms0202 electrical engineering electronic engineering information engineering020201 artificial intelligence & image processing0501 psychology and cognitive sciencesData Structures and Algorithms (cs.DS)Data indexing and compressionCombinatorics on word
researchProduct

Identifying the k Best Targets for an Advertisement Campaign via Online Social Networks

2020

We propose a novel approach for the recommendation of possible customers (users) to advertisers (e.g., brands) based on two main aspects: (i) the comparison between On-line Social Network profiles, and (ii) neighborhood analysis on the On-line Social Network. Profile matching between users and brands is considered based on bag-of-words representation of textual contents coming from the social media, and measures such as the Term Frequency-Inverse Document Frequency are used in order to characterize the importance of words in the comparison. The approach has been implemented relying on Big Data Technologies, allowing this way the efficient analysis of very large Online Social Networks. Resul…

Social and Information Networks (cs.SI)FOS: Computer and information sciencesMatching (statistics)Social networkSettore INF/01 - Informaticabusiness.industryComputer scienceBig dataDatabases (cs.DB)AdvertisingComputer Science - Social and Information NetworksOnline Social Networks Social Advertising tf-idf Profile Matching.Term (time)Computer Science - Information RetrievalSet (abstract data type)Computer Science - DatabasesOrder (business)Computer Science - Data Structures and AlgorithmsData Structures and Algorithms (cs.DS)Social mediabusinessRepresentation (mathematics)Information Retrieval (cs.IR)
researchProduct

Clique Percolation Method: Memory Efficient Almost Exact Communities

2022

Automatic detection of relevant groups of nodes in large real-world graphs, i.e. community detection, has applications in many fields and has received a lot of attention in the last twenty years. The most popular method designed to find overlapping communities (where a node can belong to several communities) is perhaps the clique percolation method (CPM). This method formalizes the notion of community as a maximal union of $k$-cliques that can be reached from each other through a series of adjacent $k$-cliques, where two cliques are adjacent if and only if they overlap on $k-1$ nodes. Despite much effort CPM has not been scalable to large graphs for medium values of $k$. Recent work has sho…

Social and Information Networks (cs.SI)FOS: Computer and information sciencesPhysics - Physics and Society[INFO.INFO-SI] Computer Science [cs]/Social and Information Networks [cs.SI][PHYS.PHYS.PHYS-SOC-PH]Physics [physics]/Physics [physics]/Physics and Society [physics.soc-ph][INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS]FOS: Physical sciences[INFO.INFO-DS] Computer Science [cs]/Data Structures and Algorithms [cs.DS]Computer Science - Social and Information NetworksPhysics and Society (physics.soc-ph)[INFO.INFO-SI]Computer Science [cs]/Social and Information Networks [cs.SI]Computer Science - Information Retrieval[PHYS.PHYS.PHYS-SOC-PH] Physics [physics]/Physics [physics]/Physics and Society [physics.soc-ph][INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR]Computer Science - Data Structures and AlgorithmsData Structures and Algorithms (cs.DS)[INFO.INFO-IR] Computer Science [cs]/Information Retrieval [cs.IR]Information Retrieval (cs.IR)MathematicsofComputing_DISCRETEMATHEMATICS
researchProduct

Special factors and the combinatorics of suffix and factor automata

2011

AbstractThe suffix automaton (resp. factor automaton) of a finite word w is the minimal deterministic automaton recognizing the set of suffixes (resp. factors) of w. We study the relationships between the structure of the suffix and factor automata and classical combinatorial parameters related to the special factors of w. We derive formulae for the number of states of these automata. We also characterize the languages LSA and LFA of words having respectively suffix automaton and factor automaton with the minimal possible number of states.

Special factorGeneral Computer ScienceSpecial factorsFactor automatonBüchi automatonω-automatonTheoretical Computer ScienceCombinatoricsDeterministic automatonTwo-way deterministic finite automatonNondeterministic finite automatonComputer Science::Data Structures and AlgorithmsCombinatorics on wordStandard Sturmian wordsMathematicsDiscrete mathematicsCombinatorics on wordsDAWGPushdown automatonComputer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing)Nonlinear Sciences::Cellular Automata and Lattice GasesSuffix automatonProbabilistic automatonSuffix automatonComputer Science::Formal Languages and Automata TheoryComputer Science(all)Theoretical Computer Science
researchProduct

Adaptive reference-free compression of sequence quality scores

2014

Motivation: Rapid technological progress in DNA sequencing has stimulated interest in compressing the vast datasets that are now routinely produced. Relatively little attention has been paid to compressing the quality scores that are assigned to each sequence, even though these scores may be harder to compress than the sequences themselves. By aggregating a set of reads into a compressed index, we find that the majority of bases can be predicted from the sequence of bases that are adjacent to them and hence are likely to be less informative for variant calling or other applications. The quality scores for such bases are aggressively compressed, leaving a relatively small number at full reso…

Statistics and ProbabilityFOS: Computer and information sciencesComputer sciencemedia_common.quotation_subjectReference-freecomputer.software_genreBiochemistryDNA sequencingSet (abstract data type)Redundancy (information theory)BWTComputer Science - Data Structures and AlgorithmsCode (cryptography)AnimalsHumansQuality (business)Data Structures and Algorithms (cs.DS)Quantitative Biology - GenomicsCaenorhabditis elegansMolecular Biologymedia_commonGenomics (q-bio.GN)SequenceGenomeSettore INF/01 - Informaticareference-free compressionHigh-Throughput Nucleotide SequencingGenomicsSequence Analysis DNAData CompressioncompressionComputer Science ApplicationsComputational MathematicsComputational Theory and MathematicsFOS: Biological sciencesData miningquality scoreMetagenomicscomputerBWT; compression; quality score; reference-free compressionAlgorithmsReference genome
researchProduct

Repetitiveness Measures based on String Attractors and Burrows-Wheeler Transform: Properties and Applications

2023

String AttractorSettore INF/01 - InformaticaMeasure of repetitiveneBurrows-Wheeler TransformCompressed Data StructuresData CompressionCombinatorics on WordStringology
researchProduct

SimpleBIM: From full ifcOWL graphs to simplified building graphs

2016

International audience; Recent research in semantic web technologies for the built environment has resulted in several proposals to further improve information exchange among stakeholders from the domain. Most notable is the production of several OWL ontologies that allow to capture building data in RDF graphs. For example, an ifcOWL ontology allows to capture IFC data in an RDF graph. As the building data is now available in a semantic graph with an explicit formal basis, it can be restructured and simplified so that it more easily matches the different requirements associated with practical use case scenarios. In this paper, we investigate several proposals and technological approaches to…

Technology and Engineeringbuilding data[INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL][INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS]ifcOWL[INFO.INFO-DS] Computer Science [cs]/Data Structures and Algorithms [cs.DS][ INFO.INFO-CL ] Computer Science [cs]/Computation and Language [cs.CL]IFCBIMlinked data[ INFO.INFO-DS ] Computer Science [cs]/Data Structures and Algorithms [cs.DS][INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]OWL
researchProduct

Dictionary-symbolwise flexible parsing

2012

AbstractLinear-time optimal parsing algorithms are rare in the dictionary-based branch of the data compression theory. A recent result is the Flexible Parsing algorithm of Matias and Sahinalp (1999) that works when the dictionary is prefix closed and the encoding of dictionary pointers has a constant cost. We present the Dictionary-Symbolwise Flexible Parsing algorithm that is optimal for prefix-closed dictionaries and any symbolwise compressor under some natural hypothesis. In the case of LZ78-like algorithms with variable costs and any, linear as usual, symbolwise compressor we show how to implement our parsing algorithm in linear time. In the case of LZ77-like dictionaries and any symbol…

Theoretical computer scienceComputer science[INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS][INFO.INFO-DS] Computer Science [cs]/Data Structures and Algorithms [cs.DS]Data_CODINGANDINFORMATIONTHEORY0102 computer and information sciences02 engineering and technologycomputer.software_genre01 natural sciencesDirected acyclic graphTheoretical Computer ScienceConstant (computer programming)020204 information systemsEncoding (memory)Optimal parsing0202 electrical engineering electronic engineering information engineeringDiscrete Mathematics and CombinatoricsStringologySymbolwise text compressionTime complexityLossless compressionParsingSettore INF/01 - InformaticaDictionary-based compressionOptimal Parsing Lossless Data Compression DAGDirected acyclic graphPrefixComputational Theory and MathematicsText compression010201 computation theory & mathematicsAlgorithmcomputerBottom-up parsingData compressionJournal of Discrete Algorithms
researchProduct