Search results for "Indexing"
showing 10 items of 94 documents
Large Scale Knowledge Matching with Balanced Efficiency-Effectiveness Using LSH Forest
2017
Evolving Knowledge Ecosystems were proposed to approach the Big Data challenge, following the hypothesis that knowledge evolves in a way similar to biological systems. Therefore, the inner workings of the knowledge ecosystem can be modeled on natural evolution. An evolving knowledge ecosystem consists of Knowledge Organisms, which form a representation of the knowledge, and the environment in which they reside. The environment consists of contexts, which are composed of so-called knowledge tokens. These tokens are ontological fragments extracted from information tokens, which, in turn, originate from the streams of information flowing into the ecosystem. In this article we investigate the u…
Scientific abstracts and plain language summaries in psychology: A comparison based on readability indices.
2020
Findings from psychological research are usually difficult to interpret for non-experts. Yet, non-experts resort to psychological findings to inform their decisions (e.g., whether to seek a psychotherapeutic treatment or not). Thus, the communication of psychological research to non-expert audiences has received increasing attention over the last years. Plain language summaries (PLS) are abstracts of peer-reviewed journal articles that aim to explain the rationale, methods, findings, and interpretation of a scientific study to non-expert audiences using non-technical language. Unlike media articles or other forms of accessible research summaries, PLS are usually written by the authors of th…
The 100 most-cited articles in orthodontics: A bibliometric study
2018
ABSTRACT Objectives: To identify and analyze the 100 most-cited articles in orthodontics indexed in the Web of Science Category of “Dental, Oral Surgery and Medicine” from 1946 to 2016. Materials and Methods: One hundred articles were identified in a search of the database of the ISI Web of Science and Journal Citation Reports, applying the truncated search term “orthodon*.” Records were manually refined and normalized to unify terms and to remove typographical, transcription, and/or indexing errors. Results: The 100 most-cited articles were published between 1946 and 2012, with numbers of citations ranging from 115 to 881. Of the 251 authors participating, 87.65% published a single work, wh…
Indexing Multimedia Learning Materials in Ultimate Course Search
2016
Multimedia is the main support for online learning materials, and the size of multimedia learning materials is growing with the popularity of online programs offered by universities. Ultimate Course Search (UCS) is a tool that aims to provide efficient search of course materials. UCS integrates slides, lecture videos and textbook content into a single platform with search capabilities. The keywords extracted from the textbook index and the PowerPoint slides are the basis of the indexing scheme. The slides are indexed on the keywords and the videos are indexed on the slides. The correspondence between the slides and video segments is established using the meta-data pr…
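The two-level scheme described above (keywords point to slides, slides point to video segments) can be sketched roughly as follows. This is only an illustration, not the UCS implementation: the function names, the slide identifiers, and the `(file, start_seconds, end_seconds)` segment format are all assumptions made for the example.

```python
from collections import defaultdict

def build_keyword_index(slide_keywords):
    """Map each lower-cased keyword to the set of slide ids that mention it."""
    index = defaultdict(set)
    for slide_id, keywords in slide_keywords.items():
        for keyword in keywords:
            index[keyword.lower()].add(slide_id)
    return index

def search(keyword, keyword_index, slide_to_segment):
    """Return (slide id, video segment) pairs for a keyword, sorted by slide id."""
    slides = sorted(keyword_index.get(keyword.lower(), ()))
    return [(slide_id, slide_to_segment.get(slide_id)) for slide_id in slides]

# Hypothetical data: keywords per slide, and the video segment showing each slide.
slide_keywords = {
    "s1": ["Hashing", "Index"],
    "s2": ["index", "B-tree"],
}
slide_to_segment = {
    "s1": ("lecture1.mp4", 0, 95),     # (file, start_seconds, end_seconds)
    "s2": ("lecture1.mp4", 95, 240),
}

keyword_index = build_keyword_index(slide_keywords)
results = search("index", keyword_index, slide_to_segment)
```

A keyword lookup never touches the videos directly: it resolves to slides first, and each slide then resolves to the segment in which it is shown.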
Languages with mismatches
2007
In this paper we study some combinatorial properties of a class of languages that represent sets of words occurring in a text S up to some errors. More precisely, we consider sets of words that occur in a text S with k mismatches in any window of size r. The study of this class of languages mainly focuses both on a parameter, called repetition index, and on the set of the minimal forbidden words of the language of factors of S with errors. The repetition index of a string S is defined as the smallest integer such that all strings of this length occur at most in a unique position of the text S up to errors. We prove that there is a strong relation between the repetition index of S an…
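As a small worked example of the definition above, the following sketch computes the repetition index for the error-free special case only (no mismatches allowed). It is a brute-force illustration, not the paper's method, and the function name `repetition_index_exact` is made up for this example.

```python
def repetition_index_exact(text: str) -> int:
    """Smallest length such that every factor of that length occurs at most once.

    Exact-match case only: the mismatch-tolerant variant studied in the
    abstract is not handled here.
    """
    n = len(text)
    for length in range(1, n + 1):
        seen = set()
        repeated = False
        for start in range(n - length + 1):
            factor = text[start:start + length]
            if factor in seen:
                repeated = True
                break
            seen.add(factor)
        if not repeated:
            return length
    return n  # only reached for the empty string, where n == 0

index_banana = repetition_index_exact("banana")
```

For "banana" the factor "ana" still occurs twice (at positions 1 and 3), while all factors of length 4 are distinct, so the exact repetition index is 4.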
Indexed Two-Dimensional String Matching
2016
Fauna Europaea: Hymenoptera - Apocrita (excl. Ichneumonoidea)
2015
Fauna Europaea provides a public web-service with an index of scientific names (including important synonyms) of all living European land and freshwater animals, their geographical distribution at country level (up to the Urals, excluding the Caucasus region), and some additional information. The Fauna Europaea project covers about 230,000 taxonomic names, including 130,000 accepted species and 14,000 accepted subspecies. This represents a huge effort by more than 400 contributing specialists throughout Europe and is a unique (standard) reference suitable for many users in science, government, industry, nature conservation and education. Hymenoptera is one of the four largest orders of inse…
Combining textual and visual cues for content-based image retrieval on the World Wide Web
2002
A system is proposed that combines textual and visual statistics in a single index vector for content-based search of a WWW image database. Textual statistics are captured in vector form using latent semantic indexing (LSI) based on text in the containing HTML document. Visual statistics are captured in vector form using color and orientation histograms. By using an integrated approach, it becomes possible to take advantage of possible statistical couplings between the content of the document (latent semantic content) and the contents of images (visual statistics). The combined approach allows improved performance in conducting content-based search. Search performance experiments are report…
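One simple way to picture a single index vector is to concatenate a textual vector with a visual histogram and compare documents by cosine similarity. The sketch below assumes exactly that: concatenation, the coarse RGB quantization, and all function names are illustrative choices rather than the system's actual design, the textual (LSI) vector is taken as given, and the orientation histogram is omitted.

```python
import math

def color_histogram(pixels, bins=4):
    """Normalized histogram over coarsely quantized (r, g, b) values in 0..255."""
    hist = [0.0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in pixels:
        idx = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[idx] += 1.0
    total = len(pixels) or 1  # avoid division by zero for an empty image
    return [count / total for count in hist]

def combine(text_vec, visual_vec):
    """Assumed combination rule: plain concatenation into one index vector."""
    return list(text_vec) + list(visual_vec)

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical inputs: a 2-dimensional text vector and a two-pixel red image.
text_vec = [0.5, 0.5]
visual_vec = color_histogram([(255, 0, 0), (255, 0, 0)], bins=2)
index_vec = combine(text_vec, visual_vec)
```

Because both kinds of statistics live in one vector, a single similarity computation reflects textual and visual agreement at once.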
Reverse-safe data structures for text indexing
2021
We introduce the notion of reverse-safe data structures. These are data structures that prevent the reconstruction of the data they encode (i.e., they cannot be easily reversed). A data structure D is called z-reverse-safe when there exist at least z datasets with the same set of answers as the ones stored by D. The main challenge is to ensure that D stores as many answers to useful queries as possible, is constructed efficiently, and has size close to the size of the original dataset it encodes. Given a text of length n and an integer z, we propose an algorithm which constructs a z-reverse-safe data structure that has size O(n) and answers pattern matching queries of length at most d optim…
Approximate Matching over Biological RDF Graphs
2012
In the last few years, the amount of biological interaction data discovered and stored in public databases (e.g., KEGG [2]) has considerably increased. For this purpose, RDF is a powerful representation for interactions (or pathways), since they can be modeled as directed graphs, often referred to as biological networks, where nodes represent cellular components and the (labeled or unlabeled) edges correspond to interactions among components. Often for a given organism some components are known to be linked by well-studied interactions. Such groups of components are called modules and they can be represented by sub-graphs in the corresponding biological network model. Today, one of the most impor…