RESEARCH PRODUCT
Modelling and development of a generic observatory to harvest and analyze big data
Annabelle Gillet
subject
Big Data; Stream processing; [INFO.INFO-OH] Computer Science [cs]/Other [cs.OH]; Tensors; Data models; Category Theory; Software Architectures
description
Big Data fascinate, both because of the value they hold, which can provide a significant advantage in decision-making, and because of the challenges that their exploitation represents. These challenges arise at several levels of analytics workflows. At the level of software architecture design, volume and velocity demand at least enough performance to handle the ingestion and storage of data. Data variety also has an impact, as several new storage systems have emerged, each one addressing a specific need. Polystores are systems that integrate this diversity in order to gain flexibility compared with data warehouses, which are now too rigid. However, this diversification comes at a cost: the difficulty of taking the various data models into account in analyses. This thesis is set in this context and proposes the Lambda+ Architecture, an architecture pattern that improves on the Lambda Architecture to make it suitable for processing Big Data while simultaneously supporting the correctness and real-time properties. Category theory is used as a formal basis to study the conservation of properties, and it opens new perspectives for software architectures that rely on compositions of components. The second contribution is the Tensor Data Model, a pivot model that acts as an overlay to polystores. Based on tensors, it adds the notion of schema to them, so as to benefit from data manipulation operators on top of tensorial operators, as well as from strong type safety and schema inference, with good performance. Each of these contributions has an implementation, and they are gathered into an observatory that aims to analyze social data from Twitter and to make the results available to business experts.
year | journal | country | edition | language |
---|---|---|---|---|
2021-01-01 | | | | |