Search results for "search engine"

showing 10 items of 121 documents

Large-scale compression of genomic sequence databases with the Burrows-Wheeler transform

2012

Motivation: The Burrows-Wheeler transform (BWT) is the foundation of many algorithms for compression and indexing of text data, but the cost of computing the BWT of very large string collections has prevented these techniques from being widely applied to the large sets of sequences often encountered as the outcome of DNA sequencing experiments. In previous work, we presented a novel algorithm that allows the BWT of human genome scale data to be computed on very moderate hardware, thus enabling us to investigate the BWT as a tool for the compression of such datasets. Results: We first used simulated reads to explore the relationship between the level of compression and the error rate, the leng…

FOS: Computer and information sciences; Statistics and Probability; Burrows–Wheeler transform; Computer science; Data_CODINGANDINFORMATIONTHEORY; Burrows-Wheeler transform; computer.software_genre; Biochemistry; Burrows-Wheeler transform; Data Compression; Next-generation sequencing; Computer Science - Data Structures and Algorithms; Escherichia coli; Code (cryptography); Humans; Overhead (computing); Data Structures and Algorithms (cs.DS); Computer Simulation; Quantitative Biology - Genomics; Molecular Biology; Genomics (q-bio.GN); Genome Human; String (computer science); Search engine indexing; Sorting; Genomics; Sequence Analysis DNA; Construct (python library); Data Compression; Computer Science Applications; Computational Mathematics; Computational Theory and Mathematics; FOS: Biological sciences; Next-generation sequencing; Data mining; Databases Nucleic Acid; computer; Algorithms; Data compression
researchProduct

CitySearcher: A City Search Engine For Interests

2017

We introduce CitySearcher, a vertical search engine that searches for cities when queried for an interest. Generally in search engines, utilization of semantics between words is favorable for performance improvement. Even though ambiguous query words have multiple semantic meanings, search engines can return diversified results to satisfy different users' information needs. But for CitySearcher, mismatched semantic relationships can lead to extremely unsatisfactory results. For example, the city Sale would incorrectly rank high for the interest shopping because of semantic interpretations of the words. Thus in our system, the main challenge is to eliminate the mismatched semantic relationsh…

Feature engineering; Word embedding; kaupungit; Computer science; Information needs; 02 engineering and technology; semanttinen web; Semantics; computer.software_genre; search engines; Search engine; semantic web; 020204 information systems; 0202 electrical engineering electronic engineering information engineering; hakuohjelmat; Word2vec; towns and cities; ta113; Information retrieval; business.industry; Rank (computer programming); Semantic search; suosittelujärjestelmät; Vertical search; 020201 artificial intelligence & image processing; Learning to rank; Artificial intelligence; recommender systems; business; computer; Natural language processing
researchProduct

Sorted deduplication: How to process thousands of backup streams

2016

The requirements of deduplication systems have changed in recent years. Early deduplication systems had to process dozens to hundreds of backup streams at the same time while today they are able to process hundreds to thousands of them. Traditional approaches rely on stream-locality, which supports parallelism, but which easily leads to many non-contiguous disk accesses, as each stream competes with all other streams for the available resources. This paper presents a new exact deduplication approach designed for processing thousands of backup streams at the same time on the same fingerprint index. The underlying approach destroys the traditionally exploited temporal chunk locality and cre…

File system; 020203 distributed computing; Computer science; Data domain; Fingerprint (computing); Search engine indexing; Sorting; 020206 networking & telecommunications; 02 engineering and technology; Parallel computing; computer.software_genre; Backup; Server; Data_FILES; 0202 electrical engineering electronic engineering information engineering; Data deduplication; computer; 2016 32nd Symposium on Mass Storage Systems and Technologies (MSST)
researchProduct

A two-armed bandit collective for hierarchical examplar based mining of frequent itemsets with applications to intrusion detection

2014

Published version of a chapter in the book: Transactions on Computational Collective Intelligence XIV. Also available from the publisher at: http://dx.doi.org/10.1007/978-3-662-44509-9_1 In this paper we address the above problem by posing frequent item-set mining as a collection of interrelated two-armed bandit problems. We seek to find itemsets that frequently appear as subsets in a stream of itemsets, with the frequency being constrained to support granularity requirements. Starting from a randomly or manually selected examplar itemset, a collective of Tsetlin automata based two-armed bandit players - one automaton for each item in the examplar - learns which items should be included in …

Finite-state machine; VDP::Technology: 500::Information and communication technology: 550::Computer technology: 551; Computational complexity theory; Data stream mining; Computer science; Nearest neighbor search; Search engine indexing; InformationSystems_DATABASEMANAGEMENT; Intrusion detection system; computer.software_genre; Cardinality; Anomaly detection; Data mining; computer
researchProduct

Hardware implementation of content based video indexing algorithms

2005

This paper focuses on the hardware implementation of content-based video indexing techniques using FPGA technology. We aim to propose hardware modules that can satisfy the requirements of constrained applications, such as real-time applications and complex applications that combine a large number of techniques in the same indexing system. We present two examples of micro-architectures related to the dominant colors descriptor and the compact color descriptor.

Focus (computing); Hardware modules; business.industry; Computer science; Content (measure theory); Search engine indexing; ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; business; Field-programmable gate array; Computer hardware; Microarchitecture; Content based retrieval; 2005 12th IEEE International Conference on Electronics, Circuits and Systems
researchProduct

Quality of Service Management on Multimedia Data Transformation into Serial Stories Using Movement Oriented Method

2011

Transforming multimedia data into serial stories, or storyboards, helps reduce the load on storage media and on indexing, sorting and searching systems. The Movement Oriented Method, which is under development, changes the form of multimedia data into serial stories. The method depends on the knowledge of each actor who uses it. Differences in the actors' knowledge during the transformation process raise complex issues, such as the sequence and which resulting story object could become the standard. Most seriously, the resulting stories may not match the original multimedia data. To solve this, the Standard Level Knowledge (SLK) in maintaining the quality of the story c…

General Computer Science; Multimedia; Process (engineering); Computer science; Quality of service; media_common.quotation_subject; Search engine indexing; Data transformation; Object (computer science); computer.software_genre; Transformation (function); Quality (business); computer; media_common; International Journal of Advanced Computer Science and Applications
researchProduct

Comparing DNA sequence collections by direct comparison of compressed text indexes

2012

Popular sequence alignment tools such as BWA convert a reference genome to an indexing data structure based on the Burrows-Wheeler Transform (BWT), from which matches to individual query sequences can be rapidly determined. However the utility of also indexing the query sequences themselves remains relatively unexplored. Here we show that an all-against-all comparison of two sequence collections can be computed from the BWT of each collection with the BWTs held entirely in external memory, i.e. on disk and not in RAM. As an application of this technique, we show that BWTs of transcriptomic and genomic reads can be compared to obtain reference-free predictions of splice junctions that have h…

Genomics (q-bio.GN); Sequence; Computer science; business.industry; Search engine indexing; Sequence alignment; Pattern recognition; Construct (python library); Data structure; Burrows-Wheeler Transform; Splice junctions; External memory; External memory; FOS: Biological sciences; Code (cryptography); Quantitative Biology - Genomics; Burrows-Wheeler Transform; Artificial intelligence; business; Splice junctions; Auxiliary memory; Reference genome
researchProduct

Facilitating Access to Health Web Pages with Different Language Complexity Levels

2019

The number of people looking for health information on the Internet is constantly growing. When searching for health information, different types of users, such as patients, clinicians or medical researchers, have different needs and should easily find the information they are looking for based on their specific requirements. However, generic search engines do not make any distinction among the users and, often, overload them with the provided amount of information. On the other hand, specific search engines mostly work on medical literature and specialized web sites are often not free and contain focused information built by hand. This paper presents a method to facilitate the search of he…

Health Information Seeking; 020205 medical informatics; Computer science; 02 engineering and technology; User requirements document; User Requirements; World Wide Web; 03 medical and health sciences; Search engine; 0302 clinical medicine; Structured Data on the Web; Web page; 0202 electrical engineering electronic engineering information engineering; Information retrieval; e-Health; Health Information Seeking; User Requirements; Language Complexity; Structured Data on the Web; 030212 general & internal medicine; Language complexity; Settore INF/01 - Informatica; business.industry; World Wide Web; Language Complexity; Work (electrical); Health; The Internet; E-Health; Health information; business; Medical literature
researchProduct

Today Is My Day: Analysis of the Awareness Campaigns’ Impact on Functional Diversity in the Press, on Google, and on Twitter

2021

Every day, people with functional diversity face different kinds of difficulties that pose a barrier to their social inclusion. These difficulties often go unnoticed by most citizens. Social networks are a powerful tool to sensitize the population. With this objective, different organizations such as associations, federations, foundations, and other institutions have promoted campaigns through the celebration of world days for different types of functional diversity. This research aims to monitor and analyze the impact of these social campaigns in Spain, including Asperger’s syndrome, rare diseases, Down syndrome, autism, hearing and visual impairment, cerebral palsy, dyslexia, ADHD, sp…

Health Toxicology and Mutagenesis; Visual impairment; Population; Internet privacy; Twitter; Face (sociological concept); 050801 communication & media studies; Article; Xarxes socials; 03 medical and health sciences; Functional diversity; 0302 clinical medicine; 0508 media and communications; medicine; press; Humans; 030212 general & internal medicine; Autistic Disorder; education; Adaptive behavior; education.field_of_study; awareness campaign; business.industry; 05 social sciences; media; Public Health Environmental and Occupational Health; Dyslexia; R; medicine.disease; functional diversity; Google; Search Engine; Spain; Dyscalculia; Autism; Medicine; medicine.symptom; Psychology; business; Social Media; International Journal of Environmental Research and Public Health
researchProduct

Indexing and Use of MeSH Descriptors in Home Hospitalization [Indización y uso de los Descriptores MeSH en Hospitalización a Domicilio]

2017

Objective: To analyze the use of Descriptors, as Major Topic, in the indexing of articles on home hospitalization in the MEDLINE database. Method: Descriptive cross-sectional study of the indexing records retrieved from the MEDLINE database (via PubMed) up to 2016. The term used as the main descriptor for the search was "Home Care Services, Hospital-Based". The sampling method was simple randomization without replacement, based on the total number of references obtained (sample size 386). Results: Significant differences were observed in the use of the Descriptors associated with home hospitalization. The compari…

Home hospitalization; Telemedicine; Information retrieval; Geography; Sample size determination; Search engine indexing; MeSH Descriptors; Subject (documents); Medline database; Cartography; Term (time); Hospital a Domicilio
researchProduct