Search results for " retrieval."

showing 10 items of 1102 documents

Multilingual Clustering of Streaming News

2018

Clustering news across languages enables efficient media monitoring by aggregating articles from multilingual sources into coherent stories. Doing so in an online setting allows scalable processing of massive news streams. To this end, we describe a novel method for clustering an incoming stream of multilingual documents into monolingual and crosslingual story clusters. Unlike typical clustering approaches that consider a small and known number of labels, we tackle the problem of discovering an ever-growing number of cluster labels in an online fashion, using real news datasets in multiple languages. Our method is simple to implement, computationally efficient and produces state-of-the-art …

FOS: Computer and information sciences; Computer Science - Computation and Language; Information retrieval; Computer science; InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL; 02 engineering and technology; Clustering; Media Monitoring; Computer Science - Information Retrieval; ComputingMethodologies_PATTERNRECOGNITION; Multilingual Methods; 0202 electrical engineering electronic engineering information engineering; 020201 artificial intelligence & image processing; Cluster analysis; Computation and Language (cs.CL); Information Retrieval (cs.IR)
researchProduct

Investigating label suggestions for opinion mining in German Covid-19 social media

2021

This work investigates the use of interactively updated label suggestions to improve the efficiency of gathering annotations for the task of opinion mining in German Covid-19 social media data. We develop guidelines to conduct a controlled annotation study with social science students and find that suggestions from a model trained on a small, expert-annotated dataset already lead to a substantial improvement, in terms of inter-annotator agreement (+.14 Fleiss' $\kappa$) and annotation quality, compared to students who do not receive any label suggestions. We further find that label suggestions from interactively trained models do not lead to an improvement over suggestions from a stat…

FOS: Computer and information sciences; Computer Science - Computation and Language; Information retrieval; Coronavirus disease 2019 (COVID-19); Computer science; media_common.quotation_subject; Sentiment analysis; language.human_language; Task (project management); German; Annotation; language; Quality (business); Social media; Transfer of learning; Computation and Language (cs.CL); media_common
researchProduct

Untrue.News: A New Search Engine For Fake Stories

2020

In this paper, we demonstrate Untrue News, a new search engine for fake stories. Untrue News is easy to use and offers useful features such as: a) a multi-language option combining fake stories from different countries and languages around the same subject or person; b) a user privacy protector, avoiding the filter bubble by employing a bias-free ranking scheme; and c) a collaborative platform that fosters the development of new tools for fighting disinformation. Untrue News relies on Elasticsearch, a new scalable analytic search engine based on the Lucene library that provides near real-time results. We demonstrate two key scenarios: the first related to a politician - looking how the cat…

FOS: Computer and information sciences; Computer Science - Computers and Society; Computers and Society (cs.CY); Information Retrieval (cs.IR); Computer Science - Information Retrieval
researchProduct

Focusing Knowledge-based Graph Argument Mining via Topic Modeling

2021

Decision-making usually takes five steps: identifying the problem, collecting data, extracting evidence, identifying pro and con arguments, and making decisions. Focusing on extracting evidence, this paper presents a hybrid model that combines latent Dirichlet allocation and word embeddings to obtain external knowledge from structured and unstructured data. We study the task of sentence-level argument mining, as arguments mostly require some degree of world knowledge to be identified and understood. Given a topic and a sentence, the goal is to classify whether a sentence represents an argument in regard to the topic. We use a topic model to extract topic- and sentence-specific evidence from…

FOS: Computer and information sciences; Computer Science - Machine Learning; Artificial Intelligence (cs.AI); Computer Science - Artificial Intelligence; Information Retrieval (cs.IR); Computer Science - Information Retrieval; Machine Learning (cs.LG)
researchProduct

Combining a Context Aware Neural Network with a Denoising Autoencoder for Measuring String Similarities

2018

Measuring similarities between strings is central to many established and fast-growing research areas including information retrieval, biology, and natural language processing. The traditional approach for string similarity measurements is to define a metric over a word space that quantifies and sums up the differences between characters in two strings. The state-of-the-art in the area has, surprisingly, not evolved much during the last few decades. The majority of the metrics are based on a simple comparison between character and character distributions without consideration for the context of the words. This paper proposes a string metric that encompasses similarities between strings bas…

FOS: Computer and information sciences; Computer Science - Machine Learning; Artificial Intelligence (cs.AI); Computer Science - Computation and Language; Computer Science - Artificial Intelligence; Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG); Computer Science - Information Retrieval
researchProduct

Transfer Learning with Convolutional Networks for Atmospheric Parameter Retrieval

2018

The Infrared Atmospheric Sounding Interferometer (IASI) on board the MetOp satellite series provides important measurements for Numerical Weather Prediction (NWP). Retrieving accurate atmospheric parameters from the raw data provided by IASI is a major challenge, but necessary in order to use the data in NWP models. The performance of statistical models is compromised by the extremely high spectral dimensionality and the high number of variables to be predicted simultaneously across the atmospheric column. All this poses a challenge for selecting and studying optimal models and processing schemes. Earlier work has shown non-linear models such as kernel methods and neural networks perform w…

FOS: Computer and information sciences; Computer Science - Machine Learning; Computer science; Feature extraction; 0211 other engineering and technologies; Transfer learning; FOS: Physical sciences; 02 engineering and technology; Atmospheric model; Infrared atmospheric sounding interferometer; computer.software_genre; Convolutional neural network; Machine Learning (cs.LG); 0202 electrical engineering electronic engineering information engineering; Infrared measurements; 021101 geological & geomatics engineering; Artificial neural network; Statistical model; Numerical weather prediction; Parameter retrieval; Physics - Atmospheric and Oceanic Physics; Kernel method; 13. Climate action; Atmospheric and Oceanic Physics (physics.ao-ph); Convolutional neural networks; 020201 artificial intelligence & image processing; Data mining; computer; Curse of dimensionality; IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium
researchProduct

Do-search -- a tool for causal inference and study design with multiple data sources

2020

Epidemiologic evidence is based on multiple data sources including clinical trials, cohort studies, surveys, registries, and expert opinions. Merging information from different sources opens up new possibilities for the estimation of causal effects. We show how causal effects can be identified and estimated by combining experiments and observations in real and realistic scenarios. As a new tool, we present do-search, a recently developed algorithmic approach that can determine the identifiability of a causal effect. The approach is based on do-calculus, and it can utilize data with nontrivial missing data and selection bias mechanisms. When the effect is identifiable, do-search outputs an i…

FOS: Computer and information sciences; Epidemiology; Computer science; media_common.quotation_subject; Information Storage and Retrieval; Machine learning; computer.software_genre; 01 natural sciences; Statistics - Applications; Methodology (stat.ME); 010104 statistics & probability; 03 medical and health sciences; 0302 clinical medicine; Humans; Applications (stat.AP); 030212 general & internal medicine; 0101 mathematics; Salt intake; Statistics - Methodology; media_common; Selection bias; business.industry; Nutrition Surveys; Missing data; Causality; Research Design; Causal inference; Meta-analysis; Survey data collection; Identifiability; Artificial intelligence; business; computer
researchProduct

Open Data Quality Evaluation: A Comparative Analysis of Open Data in Latvia

2020

Open data is now entering the mainstream: it is freely available to every stakeholder and is often used in business decision-making. It is important to be sure that the data is trustworthy and error-free, as quality problems can lead to huge losses. The research discusses how (open) data quality could be assessed. It also covers the main points that should be considered when developing a data quality management solution. One specific approach is applied to several Latvian open data sets. The research provides a step-by-step guide to analysing open data sets and summarizes its results. It is also shown that data quality can differ depending on the data supplier (centralized and decentralized d…

FOS: Computer and information sciences; General Computer Science; Computer science; media_common.quotation_subject; Stakeholder; Latvian; Databases (cs.DB); Statistics - Applications; Statistics - Computation; language.human_language; Computer Science - Information Retrieval; Computer Science - Computers and Society; Open data; Lead (geology); Computer Science - Databases; Risk analysis (engineering); Data quality; Computers and Society (cs.CY); language; Mainstream; Quality (business); Applications (stat.AP); Information Retrieval (cs.IR); Computation (stat.CO); media_common
researchProduct

Weakly Supervised Object Detection in Artworks

2018

We propose a method for the weakly supervised detection of objects in paintings. At training time, only image-level annotations are needed. This, combined with the efficiency of our multiple-instance learning method, enables one to learn new classes on-the-fly from globally annotated databases, avoiding the tedious task of manually marking objects. We show on several databases that dropping the instance-level annotations only yields mild performance losses. We also introduce a new database, IconArt, on which we perform detection experiments on classes that could not be learned on photographs, such as Jesus Child or Saint Sebastian. To the best of our knowledge, these are the first experimen…

FOS: Computer and information sciences; Information retrieval; Computer science; Computer Vision and Pattern Recognition (cs.CV); Computer Science - Computer Vision and Pattern Recognition; [INFO.INFO-CV]Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV]; 020207 software engineering; 02 engineering and technology; Object detection; Task (project management); Art History; Deep Learning; Weakly Supervised Learning; 0202 electrical engineering electronic engineering information engineering; 020201 artificial intelligence & image processing
researchProduct

Multiscale Information Decomposition: Exact Computation for Multivariate Gaussian Processes

2017

Exploiting the theory of state space models, we derive the exact expressions of the information transfer, as well as redundant and synergistic transfer, for coupled Gaussian processes observed at multiple temporal scales. All of the terms, constituting the frameworks known as interaction information decomposition and partial information decomposition, can thus be analytically obtained for different time scales from the parameters of the VAR model that fits the processes. We report the application of the proposed methodology firstly to benchmark Gaussian systems, showing that this class of systems may generate patterns of information decomposition characterized by prevalently redundant or sy…

FOS: Computer and information sciences; Information transfer; Computer science; Gaussian; Social Sciences; General Physics and Astronomy; Information theory; 01 natural sciences; 010305 fluids & plasmas; State space; Statistical physics; lcsh:Science; Multiscale entropy; lcsh:QC1-999; Interaction information; Mathematics and Statistics; symbols; Information dynamics; Multivariate time series analysis; Redundancy and synergy; State space models; Vector autoregressive models; Physics and Astronomy (all); Mathematics - Statistics Theory; lcsh:Astrophysics; Statistics Theory (math.ST); Statistics - Applications; Methodology (stat.ME); symbols.namesake; 0103 physical sciences; lcsh:QB460-466; FOS: Mathematics; Relevance (information retrieval); Applications (stat.AP); Transfer entropy; 010306 general physics; Gaussian process; Statistics - Methodology; State space model; Information dynamic; Vector autoregressive model; Settore ING-INF/06 - Bioingegneria Elettronica E Informatica; lcsh:Q; lcsh:Physics; Entropy
researchProduct