Search results for "Computer Science - Data Structures and Algorithms"

showing 4 items of 64 documents

Identifying the k Best Targets for an Advertisement Campaign via Online Social Networks

2020

We propose a novel approach for recommending potential customers (users) to advertisers (e.g., brands) based on two main aspects: (i) comparison of Online Social Network profiles, and (ii) neighborhood analysis on the Online Social Network. Users and brands are matched using a bag-of-words representation of textual content from social media, with measures such as Term Frequency-Inverse Document Frequency (TF-IDF) characterizing the importance of words in the comparison. The approach has been implemented using Big Data technologies, which allows the efficient analysis of very large Online Social Networks. Resul…

Social and Information Networks (cs.SI); FOS: Computer and information sciences; Matching (statistics); Social network; Settore INF/01 - Informatica; business.industry; Computer science; Big data; Databases (cs.DB); Advertising; Computer Science - Social and Information Networks; Online Social Networks; Social Advertising; tf-idf; Profile Matching; Term (time); Computer Science - Information Retrieval; Set (abstract data type); Computer Science - Databases; Order (business); Computer Science - Data Structures and Algorithms; Data Structures and Algorithms (cs.DS); Social media; business; Representation (mathematics); Information Retrieval (cs.IR)
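
The profile-matching step described in the abstract above can be illustrated with a small sketch. It is not the authors' implementation: it covers only the bag-of-words/TF-IDF comparison (not the neighborhood analysis or the Big Data deployment), and the whitespace tokenizer, the cosine similarity measure, and the function names `tfidf_vectors`, `cosine`, and `top_k_users` are assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each document name to a sparse {term: tf-idf weight} dict.

    docs: dict of name -> raw text. Tokenization is naive lowercase
    whitespace splitting (an assumption of this sketch).
    """
    tokenized = {name: text.lower().split() for name, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))
    vectors = {}
    for name, tokens in tokenized.items():
        tf = Counter(tokens)
        vectors[name] = {
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors (0.0 if either is zero)."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k_users(brand_text, user_texts, k):
    """Return the names of the k users whose text best matches the brand."""
    docs = dict(user_texts)
    docs["__brand__"] = brand_text  # hypothetical reserved key for the brand
    vectors = tfidf_vectors(docs)
    brand_vec = vectors.pop("__brand__")
    scored = sorted(
        ((cosine(brand_vec, vec), name) for name, vec in vectors.items()),
        key=lambda pair: (-pair[0], pair[1]),
    )
    return [name for _, name in scored[:k]]
```

Calling `top_k_users("running shoes for marathon", {...}, 2)` on a handful of user texts returns the two user names whose vocabulary overlaps most with the brand text, with rarer shared words weighted more heavily.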

Clique Percolation Method: Memory Efficient Almost Exact Communities

2022

Automatic detection of relevant groups of nodes in large real-world graphs, i.e., community detection, has applications in many fields and has received a lot of attention in the last twenty years. The most popular method designed to find overlapping communities (where a node can belong to several communities) is perhaps the clique percolation method (CPM). This method formalizes the notion of community as a maximal union of $k$-cliques that can be reached from each other through a series of adjacent $k$-cliques, where two cliques are adjacent if and only if they overlap on $k-1$ nodes. Despite much effort, CPM has not scaled to large graphs for medium values of $k$. Recent work has sho…

Social and Information Networks (cs.SI); FOS: Computer and information sciences; Physics - Physics and Society; [INFO.INFO-SI] Computer Science [cs]/Social and Information Networks [cs.SI]; [PHYS.PHYS.PHYS-SOC-PH] Physics [physics]/Physics [physics]/Physics and Society [physics.soc-ph]; [INFO.INFO-DS] Computer Science [cs]/Data Structures and Algorithms [cs.DS]; FOS: Physical sciences; Computer Science - Social and Information Networks; Physics and Society (physics.soc-ph); Computer Science - Information Retrieval; [INFO.INFO-IR] Computer Science [cs]/Information Retrieval [cs.IR]; Computer Science - Data Structures and Algorithms; Data Structures and Algorithms (cs.DS); Information Retrieval (cs.IR); MathematicsofComputing_DISCRETEMATHEMATICS
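
The definition of a CPM community given in the abstract above (a maximal union of $k$-cliques connected through cliques that share $k-1$ nodes) can be made concrete with a brute-force sketch. This is only an illustration of the definition for tiny graphs, not the memory-efficient method the paper proposes; the function names `k_cliques` and `cpm_communities` are hypothetical, and the sketch assumes node labels are mutually comparable (for example, all integers).

```python
from itertools import combinations


def k_cliques(adj, k):
    """Enumerate all k-cliques by brute force (exponential; tiny graphs only)."""
    nodes = sorted(adj)
    return [
        c for c in combinations(nodes, k)
        if all(v in adj[u] for u, v in combinations(c, 2))
    ]


def cpm_communities(edges, k):
    """Return CPM communities as a list of node sets."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    cliques = k_cliques(adj, k)

    # Union-find over clique indices.
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Two k-cliques are adjacent iff they share a (k-1)-subset of nodes,
    # so merge every clique with the first clique seen for each subset.
    first_with_subset = {}
    for i, clique in enumerate(cliques):
        for subset in combinations(clique, k - 1):
            if subset in first_with_subset:
                parent[find(i)] = find(first_with_subset[subset])
            else:
                first_with_subset[subset] = i

    # A community is the union of the nodes of all cliques in one component.
    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return sorted(groups.values(), key=sorted)
```

With $k = 3$ on two triangles sharing an edge plus a third triangle attached at a single node, the first two triangles merge into one community while the third stays separate, and the shared node belongs to both communities.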

Adaptive reference-free compression of sequence quality scores

2014

Motivation: Rapid technological progress in DNA sequencing has stimulated interest in compressing the vast datasets that are now routinely produced. Relatively little attention has been paid to compressing the quality scores that are assigned to each sequence, even though these scores may be harder to compress than the sequences themselves. By aggregating a set of reads into a compressed index, we find that the majority of bases can be predicted from the sequence of bases that are adjacent to them and hence are likely to be less informative for variant calling or other applications. The quality scores for such bases are aggressively compressed, leaving a relatively small number at full reso…

Statistics and Probability; FOS: Computer and information sciences; Computer science; media_common.quotation_subject; Reference-free; computer.software_genre; Biochemistry; DNA sequencing; Set (abstract data type); Redundancy (information theory); BWT; Computer Science - Data Structures and Algorithms; Code (cryptography); Animals; Humans; Quality (business); Data Structures and Algorithms (cs.DS); Quantitative Biology - Genomics; Caenorhabditis elegans; Molecular Biology; media_common; Genomics (q-bio.GN); Sequence; Genome; Settore INF/01 - Informatica; reference-free compression; High-Throughput Nucleotide Sequencing; Genomics; Sequence Analysis DNA; Data Compression; compression; Computer Science Applications; Computational Mathematics; Computational Theory and Mathematics; FOS: Biological sciences; Data mining; quality score; Metagenomics; computer; Algorithms; Reference genome

Lightweight LCP construction for next-generation sequencing datasets

2012

The advent of "next-generation" DNA sequencing (NGS) technologies has meant that collections of hundreds of millions of DNA sequences are now commonplace in bioinformatics. Knowing the longest common prefix array (LCP) of such a collection would facilitate the rapid computation of maximal exact matches, shortest unique substrings and shortest absent words. CPU-efficient algorithms for computing the LCP of a string have been described in the literature, but require the presence in RAM of large data structures. This prevents such methods from being feasible for NGS datasets. In this paper we propose the first lightweight method that simultaneously computes, via sequential scans, the LCP and B…

Whole genome sequencing; Genomics (q-bio.GN); FOS: Computer and information sciences; Sequence; text indexes; massive datasets; Settore INF/01 - Informatica; Computer science; Computation; String (computer science); LCP array; Parallel computing; Data structure; DNA sequencing; Substring; BWT; LCP; FOS: Biological sciences; Computer Science - Data Structures and Algorithms; Quantitative Biology - Genomics; Data Structures and Algorithms (cs.DS); next-generation sequencing datasets
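
To make the objects named in the abstract above concrete, the sketch below computes the LCP array and the BWT of a single short string in the naive, fully in-memory way. It is the opposite of the lightweight, scan-based method the paper proposes and handles one string rather than a collection; it assumes a non-empty input ending in a unique sentinel `$` that sorts before every other character, and the function names are hypothetical.

```python
def suffix_array(s):
    """Start positions of the suffixes of s in lexicographic order (naive sort)."""
    return sorted(range(len(s)), key=lambda i: s[i:])


def lcp_array(s, sa):
    """lcp[r] = length of the longest common prefix of the suffixes
    at ranks r-1 and r; lcp[0] is defined as 0."""
    def common_prefix(i, j):
        n = 0
        while i + n < len(s) and j + n < len(s) and s[i + n] == s[j + n]:
            n += 1
        return n

    return [0] + [common_prefix(sa[r - 1], sa[r]) for r in range(1, len(sa))]


def bwt(s, sa):
    """Burrows-Wheeler transform: the character preceding each sorted suffix.
    For the suffix starting at 0, s[-1] wraps around to the sentinel."""
    return "".join(s[i - 1] for i in sa)


text = "banana$"
sa = suffix_array(text)       # [6, 5, 3, 1, 0, 4, 2]
print(lcp_array(text, sa))    # [0, 0, 1, 3, 0, 0, 2]
print(bwt(text, sa))          # annb$aa
```

Sorting whole suffixes takes time and memory far beyond what is feasible for collections of hundreds of millions of reads, which is the limitation the abstract attributes to RAM-resident approaches.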