Search results for "InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL"
showing 10 items of 53 documents
Multilingual Clustering of Streaming News
2018
Clustering news across languages enables efficient media monitoring by aggregating articles from multilingual sources into coherent stories. Doing so in an online setting allows scalable processing of massive news streams. To this end, we describe a novel method for clustering an incoming stream of multilingual documents into monolingual and crosslingual story clusters. Unlike typical clustering approaches that consider a small and known number of labels, we tackle the problem of discovering an ever-growing number of cluster labels in an online fashion, using real news datasets in multiple languages. Our method is simple to implement, computationally efficient and produces state-of-the-art …
PESI - a taxonomic backbone for Europe
2015
Reliable taxonomy underpins communication in all of biology, not least nature conservation and sustainable use of ecosystem resources. The flexibility of taxonomic interpretations, however, presents a serious challenge for end-users of taxonomic concepts. Users need standardised and continuously harmonised taxonomic reference systems, as well as high-quality and complete taxonomic data sets, but these are generally lacking for non-specialists. The solution is in dynamic, expertly curated web-based taxonomic tools. The Pan-European Species-directories Infrastructure (PESI) worked to solve this key issue by providing a taxonomic e-infrastructure for Europe. It strengthened the relevant social (…
Comparative Cytogenetics Allows the Reconstruction of Human Chromosome History: The Case of Human Chromosome 13
2019
Comparative cytogenetics permits the identification of human chromosomal homologies and rearrangements between species, allowing the reconstruction of the history of each human chromosome. The aim of this work is to review evolutionary aspects regarding human chromosome 13. Classic and molecular cytogenetics using comparative banding, chromosome painting, and bacterial artificial chromosome (BAC) mapping can help us formulate hypotheses about chromosome ancestral forms; more recently, sequence data have been integrated as well. Although human chromosome 13 has previously been shown to be conserved when compared to the ancestral primate chromosome, it shows a degree of rearrangement in some primate taxa; fu…
kmcEx
2019
Memory-frugal and retrieval-efficient encoding of counted k-mers.
Keywords given by authors of scientific articles in database descriptors
2007
Behavior-based personalization in web search
2016
Personalized search approaches tailor search results to users' current interests, so as to help improve the likelihood of a user finding relevant documents for their query. Previous work on personalized search focuses on using the content of the user's query and of the documents clicked to model the user's preference. In this paper we focus on a different type of signal: We investigate the use of behavioral information for the purpose of search personalization. That is, we consider clicks and dwell time for reranking an initially retrieved list of documents. In particular, we (i) investigate the impact of distributions of users and queries on document reranking; (ii) estimate the relevance …
Semantic Portal for Legislative Information
2006
Semantic portals enabled by Semantic Web technologies have been suggested to provide a point of access to an integrated body of information about some domain. In the area of e-Government there are multiple possible domains for semantic portals, one of them being legislative work. In this paper we propose a semantic portal based on a rich metadata repository to support the retrieval of legislative information. The portal provides process oriented semantic browsing capabilities. A prototype of the portal has been implemented for the retrieval of Finnish legislative information.
A Novel Approach to Improve the Accuracy of Web Retrieval
2010
General-purpose search engines take a very simple view of text documents: they treat them as bags of words. As a result, the semantics of documents is lost after indexing. In this paper, we introduce a novel approach to improve the accuracy of Web retrieval. We use WordNet and the WordNet SenseRelate All Words Software as the main tools to preserve the semantics of the sentences of documents and user queries. Nouns and verbs in WordNet are organized in tree hierarchies. Word meanings are represented by numbers that refer to nodes in the semantic tree. The meaning of each word in the sentence is calculated when the sentence is analyzed. The goal is to put each nou…
Using UDDI for Publishing Metadata of the Semantic Web
2006
Although UDDI does not provide support for semantic search, retrieval and storage, it is already accepted as an industrial standard, and a huge number of services already store their service specifications in UDDI. The objective of this paper is to analyze the possibilities and ways of using a UDDI registry to allow utilization of metadata encoded according to Semantic Web standards for semantic-based description, discovery and integration of web resources, in the context of the needs of two research projects: “Adaptive Services Grid” and “SmartResource”. We present an approach for mapping RDFS upper concepts to the UDDI data model using the tModel structure, which makes it possible to store semantically annotated reso…
Part-of-speech labeling for Reuters database
2015
Although the Vector Space Model used for document representation in information retrieval systems incorporates little knowledge, it continues to be used because of its computational cost, execution speed and simplicity. We try to improve this document representation by adding some syntactic information such as the parts of speech. In this paper, we have evaluated three different tagging algorithms in order to select the most suitable tagger for the Reuters dataset. We evaluated the taggers using only five parts of speech: noun, verb, adverb, adjective and others. We considered these particular tags to be the most representative for describin…