Search results for "indexing"
showing 10 items of 94 documents
A systematic analysis of duplicate records in Scopus
2015
In recent years, the Web of Science Core Collection and Scopus databases have become primary sources for conducting studies that evaluate scientific investigations. Such studies require that duplicate records be excluded to avoid errors of overrepresentation. To this end, we identify duplicate records in Scopus and examine their origins. The methodology adopted consists of identifying journals with duplicate records in Scopus, selecting and downloading bibliographic journal records, and identifying and analyzing the duplicate records. Duplicate records are found when articles published in a journal are incorrectly mapped by Scopus to this journal and to a different journal from the same publisher a…
A Novel Approach to Improve the Accuracy of Web Retrieval
2010
General-purpose search engines take a very simple view of text documents: they treat them as bags of words. As a result, the semantics of documents is lost after indexing. In this paper, we introduce a novel approach to improve the accuracy of Web retrieval. We utilize WordNet and the WordNet SenseRelate All Words software as the main tools to preserve the semantics of the sentences of documents and user queries. Nouns and verbs in WordNet are organized in tree hierarchies. Word meanings are represented by numbers that refer to nodes of the semantic tree. The meaning of each word in the sentence is calculated when the sentence is analyzed. The goal is to put each nou…
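The loss of semantics under a bag-of-words view can be illustrated with a minimal sketch. The function name `bag_of_words` and the two sample sentences are assumptions made for illustration; they do not come from the paper, and the whitespace tokenizer is far simpler than what a real search engine uses.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text, split on whitespace, and count tokens.

    Word order is discarded, which is exactly what loses the meaning.
    """
    return Counter(text.lower().split())


# Two sentences with opposite meanings (hypothetical examples).
a = bag_of_words("the dog bites the man")
b = bag_of_words("the man bites the dog")

# Both sentences map to the same multiset of words.
same_bag = a == b
print(same_bag)
```

Because the two bags compare equal, an index built only from such counts cannot tell the two sentences apart.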
Semantic retrieval: an approach to representing, searching and summarising text documents
2011
Nowadays, the internet is the major source of information for millions of people. There are many search tools available on the net, but finding appropriate text information is still difficult. The retrieval efficiency of the presently used systems cannot be significantly improved: the ‘bag of words’ interpretation loses the semantics of texts. We applied the functional approach to represent English text documents. It allows taking into account semantic relations between words when indexing documents and using ordinary English sentences as queries to a search engine. The proposed retrieval mechanisms return only highly relevant documents. They make it possible to generate content-aware summarie…
Automatic building of a visual interface for content-based multiresolution retrieval of paleontology images
2001
In this article we present research work in the field of content-based image retrieval in large databases applied to the paleontology image database of the Université de Bourgogne, Dijon, France, called “TRANS’TYFIPAL.” Our indexing method is based on multiresolution decomposition of database images using wavelets. For each family of paleontology images we try to find a model image that represents it. The K-means automatic classification algorithm divides the space of parameters into several clusters. A model image for each cluster is computed from the wavelet transform of each image of the cluster. Then a search tree is built to offer users a graphic interface for retrieving images. So …
Novel Indexing Method of Relations Between Salient Objects
2011
Since the last decade, images have been integrated into several application domains such as GIS, medicine, etc. This integration necessitates new managing methods particularly in image retrieval. Queries should be formulated using different types of features such as low-level features of images (histograms, color distribution, etc.), spatial and temporal relations between salient objects, semantic features, etc. In this chapter, we propose a novel method for identifying and indexing several types of relations between salient objects. Spatial relations are used here to show how our method can provide high expressive power to relations in comparison to the traditional methods.
Large Scale Knowledge Matching with Balanced Efficiency-Effectiveness Using LSH Forest
2017
Evolving Knowledge Ecosystems were proposed to approach the Big Data challenge, following the hypothesis that knowledge evolves in a way similar to biological systems. Therefore, the inner working of the knowledge ecosystem can be spotted from natural evolution. An evolving knowledge ecosystem consists of Knowledge Organisms, which form a representation of the knowledge, and the environment in which they reside. The environment consists of contexts, which are composed of so-called knowledge tokens. These tokens are ontological fragments extracted from information tokens, in turn, which originate from the streams of information flowing into the ecosystem. In this article we investigate the u…
Multimedia Retrieval by Means of Merge of Results from Textual and Content Based Retrieval Subsystems
2010
The main goal of this paper is to present our experiments in the ImageCLEF 2009 Campaign (photo retrieval task). In 2008 we proved empirically that Text-based Image Retrieval (TBIR) methods outperform Content-based Image Retrieval (CBIR) methods in the quality of results, so this time we developed several experiments in which the CBIR helps the TBIR. The main improvement to the TBIR system [6] is the named-entity sub-module. For the CBIR system [3], the number of low-level features has been increased from the 68 components used at ImageCLEF 2008 to 114 components, and only the Mahalanobis distance has been used. We propose an ad-hoc management of the topics delivered, and the generation of XML struct…
Some Results Using Different Approaches to Merge Visual and Text-Based Features in CLEF’08 Photo Collection
2009
This paper describes the participation of the MIRACLE team in the ImageCLEF Photographic Retrieval task of CLEF 2008. We succeeded in submitting 41 runs. The results obtained from text-based retrieval are better than those from content-based retrieval, as in previous experiments in the MIRACLE team campaigns [5, 6] using different software. Our main aim was to experiment with several merging approaches to fuse text-based and content-based retrieval results. We improved on the text-based baseline when applying one of the three merging algorithms, although visual results are lower than textual ones.
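The abstract does not say which three merging algorithms were tried. As a rough illustration of the general idea only, the sketch below uses one common late-fusion scheme: a weighted sum of min-max normalized scores. The function names, the `text_weight` value, and the sample scores are all assumptions made for illustration, not the MIRACLE team's method or data.

```python
def min_max(scores: dict) -> dict:
    """Rescale scores to [0, 1]; assumes a non-empty score dictionary."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse(text_scores: dict, visual_scores: dict, text_weight: float = 0.8) -> list:
    """Rank documents by a weighted sum of normalized text and visual scores.

    A document missing from one result list contributes 0 for that list.
    """
    t, v = min_max(text_scores), min_max(visual_scores)
    fused = {
        doc: text_weight * t.get(doc, 0.0) + (1 - text_weight) * v.get(doc, 0.0)
        for doc in set(t) | set(v)
    }
    # Sort by descending fused score, breaking ties by document id.
    return sorted(fused, key=lambda doc: (-fused[doc], doc))


# Hypothetical scores from a text-based and a content-based subsystem.
text_scores = {"img1": 2.0, "img2": 1.0, "img3": 0.0}
visual_scores = {"img2": 0.9, "img3": 0.1, "img4": 0.5}
ranking = fuse(text_scores, visual_scores)
print(ranking)
```

Giving the text scores the larger weight mirrors the abstract's observation that textual results are stronger than visual ones, while still letting the visual subsystem reorder or add documents.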
Bibliotēka un sabiedrība, 3
2002
The collection contains articles that examine new directions in the dissemination of information and in the work of libraries, as well as in the organisation and processing of information. Some articles analyse historical phenomena. The articles are arranged in four thematic sections: "The Book in the System of Values", "Studies in Library History", "Public Libraries in Contemporary Society" and "Problems of Processing Information Sources".
Enhanced query processing for NoSQL crowdsourcing systems
2014
In this paper, we provide a novel approach to effectively and efficiently support query processing tasks in NoSQL crowdsourcing systems. The idea of our method is to exploit the social knowledge available from reviews about products of any kind, freely provided by customers through specialized web sites. We thus define a NoSQL database system for large collections of product reviews, where queries can be expressed in terms of natural language sentences whose answers are modeled as lists of products ranked based on the relevance of reviews w.r.t. the natural language sentences. The best ranked products in the result list can be seen as the best hints for the user based on crowd opinio…