
RESEARCH PRODUCT

Towards A Twitter Observatory: A Multi-Paradigm Framework For Collecting, Storing And Analysing Tweets

Sergey Kirgizov, Eric Leclercq, Ian Basaille, Marinette Savonnet, Nadine Cullot

subject

[INFO] Computer Science [cs]; [INFO.INFO-IR] Computer Science [cs]/Information Retrieval [cs.IR]; [INFO.INFO-SI] Computer Science [cs]/Social and Information Networks [cs.SI]; [INFO.INFO-IT] Computer Science [cs]/Information Theory [cs.IT]; [INFO.INFO-DB] Computer Science [cs]/Databases [cs.DB]; [SPI.TRON] Engineering Sciences [physics]/Electronics; 02 engineering and technology; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; 020204 information systems; Computer science; Data science; knowledge discovery; Inductive reasoning; Twitter analysis; massive datasets; Polyglot; polyglot storage; Data model; Data modeling; Data independence; Data architecture; Software architecture; Data visualization; open source software

description

International audience. In this article we show how a multi-paradigm framework can fulfil the requirements of tweet analysis and reduce the waiting time for researchers who use computational resources and storage systems to support large-scale data analysis. The originality of our approach lies in combining data harvesting, data storage, data analysis and data visualisation in a single framework that supports inductive reasoning in multidisciplinary scientific research. Our main contribution is twofold: a polyglot storage system with a generic data model that supports logical data independence, and a set of tools for combining different types of algorithms so as to maximise knowledge extraction. We describe the software architecture of the framework and its generic data model, and we show how the framework has been used in major projects and which of its characteristics have been validated.
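The abstract does not detail the framework's internals. As a rough illustration of what logical data independence over polyglot storage can mean, the following is a minimal Python sketch under our own assumptions: the `Tweet` model, the `TweetStore` interface, the `RelationalStore` and `GraphStore` backends and the `audience` analysis are all hypothetical names, not the framework's actual API. An in-memory SQLite database stands in for a relational system and a Python dictionary stands in for a graph store. The only point shown is that an analysis written once against a generic model runs unchanged over either storage paradigm.

```python
import sqlite3
from abc import ABC, abstractmethod
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Tweet:
    # Hypothetical generic model: only the fields this sketch needs.
    tweet_id: int
    author: str
    text: str
    mentions: tuple = ()


class TweetStore(ABC):
    """Hypothetical common interface; analyses depend only on this."""

    @abstractmethod
    def add(self, tweet: Tweet) -> None: ...

    @abstractmethod
    def mentions_of(self, user: str) -> set:
        """Return the set of authors who mentioned `user`."""


class RelationalStore(TweetStore):
    # Relational paradigm: tweets and mentions kept in SQL tables (in-memory SQLite).
    def __init__(self) -> None:
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE tweet (id INTEGER PRIMARY KEY, author TEXT, text TEXT)")
        self.db.execute("CREATE TABLE mention (tweet_id INTEGER, target TEXT)")

    def add(self, tweet: Tweet) -> None:
        self.db.execute("INSERT INTO tweet VALUES (?, ?, ?)",
                        (tweet.tweet_id, tweet.author, tweet.text))
        self.db.executemany("INSERT INTO mention VALUES (?, ?)",
                            [(tweet.tweet_id, m) for m in tweet.mentions])

    def mentions_of(self, user: str) -> set:
        rows = self.db.execute(
            "SELECT DISTINCT t.author FROM tweet t "
            "JOIN mention m ON m.tweet_id = t.id WHERE m.target = ?",
            (user,),
        )
        return {author for (author,) in rows}


class GraphStore(TweetStore):
    # Graph paradigm: adjacency map from a mentioned user to the authors mentioning them.
    def __init__(self) -> None:
        self.mentioned_by = defaultdict(set)

    def add(self, tweet: Tweet) -> None:
        for target in tweet.mentions:
            self.mentioned_by[target].add(tweet.author)

    def mentions_of(self, user: str) -> set:
        # .get avoids creating empty entries for unknown users.
        return set(self.mentioned_by.get(user, set()))


def audience(store: TweetStore, user: str) -> int:
    # An analysis written once against the generic interface, whatever the backend.
    return len(store.mentions_of(user))


if __name__ == "__main__":
    tweets = [
        Tweet(1, "alice", "hi @bob", ("bob",)),
        Tweet(2, "carol", "@bob @alice", ("bob", "alice")),
        Tweet(3, "alice", "again @bob", ("bob",)),
    ]
    for store in (RelationalStore(), GraphStore()):
        for t in tweets:
            store.add(t)
        print(type(store).__name__, audience(store, "bob"))  # prints "RelationalStore 2", then "GraphStore 2"
```

Swapping the backend changes only which class is instantiated; `audience` is untouched. A real polyglot system would add more paradigms and route each query to whichever store suits it, which this sketch does not attempt.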

https://hal-univ-bourgogne.archives-ouvertes.fr/hal-01441580