Search results for "Big data"
showing 10 items of 311 documents
Statistical Learning Algorithms to Forecast the Equity Risk Premium in the European Union
2018
With the explosion of “Big Data”, the application of statistical learning models has become popular in many scientific areas as well as in marketing, finance and other business disciplines. Nonetheless, the literature on applying these learning algorithms to forecast the equity risk premium is still scarce. In this paper we investigate whether Classification and Regression Trees (CART) algorithms and several ensemble methods, such as bagging, random forests and boosting, improve upon traditional parametric models in forecasting the equity risk premium. In particular, we work with European Monetary Union data for a period that spans from the EMU foundation at the begin…
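To make the ensemble idea concrete, the following is a minimal sketch of bagging (bootstrap aggregation) over one-split regression trees, the simplest CART-style learner. It is an illustration under stated assumptions, not the paper's model: the single predictor, the toy data, the helper names (`fit_stump`, `predict_stump`, `bagged_forecast`) and the 50 bootstrap rounds are all hypothetical.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression tree (CART stump) by minimizing squared error.

    Returns (threshold, left_mean, right_mean); threshold is None when no
    split is possible, in which case both leaves predict the overall mean.
    """
    best = (float("inf"), None, mean(ys), mean(ys))
    pairs = sorted(zip(xs, ys))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if sse < best[0]:
            best = (sse, threshold, lm, rm)
    _, threshold, lm, rm = best
    return threshold, lm, rm


def predict_stump(stump, x):
    threshold, lm, rm = stump
    if threshold is None:
        return lm
    return lm if x <= threshold else rm


def bagged_forecast(xs, ys, x_new, n_models=50, seed=0):
    """Average the predictions of stumps fit on bootstrap resamples."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        stump = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        preds.append(predict_stump(stump, x_new))
    return mean(preds)


# Hypothetical data: one lagged predictor and the next-period premium (%).
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [-1.0, -0.8, -1.1, -0.9, 1.0, 1.2, 0.9, 1.1]
print(bagged_forecast(xs, ys, x_new=3.8))
```

Averaging over bootstrap resamples mainly reduces the variance of an unstable learner such as a tree; random forests additionally sample a random subset of features at each split, which a single-predictor toy cannot show.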
The state of Business Intelligence: in Finnish enterprises
2015
Business Intelligence (BI) has recently attracted interest in both the information technology and accounting fields of research. This is at least partly because organisations today have increasing amounts of data and information at their disposal and are attempting to reap benefits and competitive advantage from them. This study focuses on large Finnish enterprises and examines how they are applying business intelligence today. In particular, it scrutinises the process of transforming data into knowledge and how BI is utilised in decision making. The results indicate that organisations are perceiving benefits from utilising their BI processes and while the technological factors are o…
Gene Set to Diseases (GS2D): disease enrichment analysis on human gene sets with literature data
2016
Large sets of candidate genes derived from high-throughput biological experiments can be characterized by functional enrichment analysis. The analysis consists of comparing the functions of one gene set against those of a background gene set. Functions related to a significant number of genes in the gene set are then expected to be relevant. Web tools offering disease enrichment analysis on gene sets are often based on gene-disease associations from manually curated or experimental data that is accurate but does not cover all diseases discussed in the literature. Using associations automatically derived from literature data could be a cost-effective method to improve the coverage of disease…
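Over-representation of a disease among the query genes is commonly scored with a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). The sketch below assumes that statistic; this excerpt does not state which test or multiple-testing correction GS2D uses, and the function name and counts are hypothetical.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the background set
    K: background genes associated with the disease
    n: genes in the query gene set
    k: query genes associated with the disease
    """
    upper = min(n, K)
    total = comb(N, n)  # ways to draw n genes from the background
    # Sum the probabilities of drawing k or more disease-associated genes.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical counts: 20,000 background genes, 200 linked to a disease,
# and a 100-gene query set of which 10 are linked (about 1 expected by chance).
print(enrichment_p_value(k=10, n=100, K=200, N=20000))
```

In practice one p-value is computed per disease, so a correction such as Benjamini-Hochberg would normally be applied across all diseases tested.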
Computation Cluster Validation in the Big Data Era
2017
Data-driven class discovery, i.e., the inference of cluster structure in a dataset, is a fundamental task in Data Analysis, in particular for the Life Sciences. We provide a tutorial on the most common approaches used for that task, focusing on methodologies for predicting the number of clusters in a dataset. Although the methods we present apply to data in general, we offer a case study relevant to Microarray Data Analysis.
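As one concrete instance of predicting the number of clusters, the sketch below runs k-means for several candidate values of k and keeps the k with the highest mean silhouette width. The tutorial covers several approaches; k-means, the silhouette criterion, the farthest-first initialization, the candidate range 2 to 5 and the toy points are assumptions chosen for illustration.

```python
from math import dist


def kmeans(points, k, iters=100):
    """Lloyd's k-means with deterministic farthest-first initialization."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from its nearest existing center.
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def silhouette(points, labels):
    """Mean silhouette width; points in singleton clusters score 0."""
    scores = []
    for i, p in enumerate(points):
        groups = {}
        for j, q in enumerate(points):
            if j != i:
                groups.setdefault(labels[j], []).append(dist(p, q))
        own = groups.pop(labels[i], [])
        if not own or not groups:
            scores.append(0.0)  # singleton cluster, or no other cluster to compare
            continue
        a = sum(own) / len(own)  # mean distance within own cluster
        b = min(sum(d) / len(d) for d in groups.values())  # nearest other cluster
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def best_k(points, k_values=range(2, 6)):
    """Return the candidate k whose clustering has the highest mean silhouette."""
    return max(k_values, key=lambda k: silhouette(points, kmeans(points, k)))


# Three well-separated groups of four points each (made-up data).
pts = [(0, 0), (0, 1), (1, 0), (1, 1),
       (10, 0), (10, 1), (11, 0), (11, 1),
       (0, 10), (0, 11), (1, 10), (1, 11)]
print(best_k(pts))
```

The silhouette is an internal criterion (it uses only the data and the partition), which makes it applicable when no reference labels exist.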
Predictive and Evolutive Cross-Referencing for Web Textual Sources
2017
One of the main challenges in the domain of competitive intelligence is to harness large volumes of information from the web and extract the most valuable pieces of information. As the amount of information available on the web grows rapidly and is highly heterogeneous, this process becomes overwhelming for experts. To address this challenge, this paper presents a vision for a novel process that performs cross-referencing at web scale. This process uses a focused crawler and a semantic-based classifier to cross-reference textual items without expert intervention, based on Big Data and Semantic Web technologies. The system is described thoroughly, and interests of…
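A focused crawler differs from a breadth-first crawler in that it expands the most relevant-looking pages first and stops following links from off-topic pages. The sketch below illustrates that idea over an in-memory link graph; the graph, the page texts, the keyword-overlap relevance score and the 0.5 threshold are hypothetical stand-ins (the paper uses a semantic-based classifier, which is not reproduced here).

```python
import heapq


def relevance(text, topic_terms):
    """Toy relevance score: fraction of topic terms present in the text."""
    words = set(text.lower().split())
    return sum(t in words for t in topic_terms) / len(topic_terms)


def focused_crawl(seeds, links, pages, topic_terms, budget=10, threshold=0.5):
    """Best-first crawl: always fetch the highest-priority frontier URL next.

    links: url -> list of outgoing urls (stands in for link extraction)
    pages: url -> page text (stands in for fetching)
    Returns the URLs judged relevant, in the order they were fetched.
    """
    frontier = [(-1.0, url) for url in seeds]  # min-heap on negated priority
    heapq.heapify(frontier)
    seen = set(seeds)
    relevant = []
    fetched = 0
    while frontier and fetched < budget:
        _, url = heapq.heappop(frontier)
        fetched += 1
        score = relevance(pages.get(url, ""), topic_terms)
        if score < threshold:
            continue  # prune: do not expand links of off-topic pages
        relevant.append(url)
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                # Children inherit the parent page's score as their priority.
                heapq.heappush(frontier, (-score, nxt))
    return relevant


pages = {"a": "big data semantic web", "b": "semantic big data crawler",
         "c": "cooking recipes", "d": "big data analytics", "e": "big data semantic"}
links = {"a": ["b", "c"], "b": ["d"], "c": ["e"]}
print(focused_crawl(["a"], links, pages, ["big", "data", "semantic"]))
```

Page "e" is on-topic but never fetched, because it is only reachable through the off-topic page "c"; this trade-off between crawl cost and coverage is inherent to pruning.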
Table Compression
2016
Data compression techniques for massive tables are described. Related methodological results are also presented.
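A widely used observation, exploited for example by column-oriented storage, is that serializing a table column by column places similar values next to each other, which tends to help a general-purpose compressor. The sketch below illustrates only that idea, with `zlib` as a stand-in compressor and a made-up table; the specific techniques described in this entry are not given here, and the helper names are hypothetical. It assumes a non-empty table whose cells contain no commas or newlines.

```python
import zlib


def compress_rows(table):
    """Serialize row by row (CSV-like) and compress."""
    text = "\n".join(",".join(row) for row in table)
    return zlib.compress(text.encode(), 9)


def compress_columns(table):
    """Serialize column by column so similar values are adjacent, then compress."""
    text = "\n".join(",".join(col) for col in zip(*table))
    return zlib.compress(text.encode(), 9)


def decompress_columns(blob):
    """Invert compress_columns, returning the table as a list of rows."""
    columns = [line.split(",") for line in zlib.decompress(blob).decode().split("\n")]
    return [list(row) for row in zip(*columns)]


# Made-up log-like table: date, site code, status code.
table = [[f"2016-01-{d:02d}", "NYC" if d % 2 else "SFO", str(100 + d % 3)]
         for d in range(1, 29)]
print(len(compress_rows(table)), len(compress_columns(table)))
```

Which layout wins depends on the data, so the two sizes are printed rather than assumed.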
From Deep Learning to Deep University: Cognitive Development of Intelligent Systems
2018
Search is not only an instrument for finding intended information. The ability to search is a basic cognitive skill that helps people explore the world. It is largely based on personal intuition and creativity. However, due to the emerging big data challenge, people require new forms of training to develop or improve this ability. Current developments in Cognitive Computing and Deep Learning enable artificial systems to learn and gain human-like cognitive abilities. This means that the skill of searching efficiently and creatively within huge data spaces becomes one of the most important for cognitive systems aiming at autonomy. This skill cannot be pre-programmed; it requires learning…
Digitization, Epistemic Proximity, and the Education System: Insights from a Bibliometric Analysis
2021
Advances in IoT, AI, Cyber-Physical Systems, Computational Intelligence, and Big Data Analytics require organizations and their workforce to be able and willing to learn how to interact with digital technology. In organizations, coordination and cooperation between actors with expertise in business and technology is fundamental, but integration is hard without understanding the interlocutor's terminology and problems. Epistemic proximity becomes prominent, underlining the importance of an education focused on flexibility, willingness to cope with the unknown, and interdisciplinarity. The main goal of this work is to provide a perspective on how the education system is evolving to support org…
Big Data and Statistics: A Statistician's Perspective (Dades massives i estadística: La perspectiva d'un estadístic)
2014
Big data represents an unprecedented resource for tackling scientific, economic and social challenges, but it also increases the risk of drawing misleading conclusions. Examples include approaches that are based exclusively on data and neglect understanding of the phenomenon under study, that pursue an elusive and shifting target, that ignore critical problems in data collection, that summarize or “cook” the data inappropriately, and that confuse noise with signal. We will review some successful cases and illustrate how the principles of statistics can help extract more reliable information from data. We will also address the…
Querying and reasoning over large scale building data sets
2016
The architectural design and construction domains work on a daily basis with massive amounts of data. Properly managing, exchanging and exploiting these data is an ongoing challenge in this domain. This has resulted in large semantic RDF graphs that are to be combined with a significant number of other data sets (building product catalogues, regulation data, geometric point cloud data, simulation data, sensor data), making an already huge dataset even larger. Making these big data available at high performance and in the correct (intuitive) formats is therefore a major challenge in this domain. Yet, hardly any benchmark is avai…
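Querying an RDF graph comes down to matching triple patterns, and simple reasoning can first add inferred triples. The sketch below shows both on a tiny in-memory graph: a basic graph pattern matcher (variables start with `?`, as in SPARQL) and forward chaining of `rdfs:subClassOf` over `rdf:type`. The building terms (`ex:CurtainWall`, `ex:isPartOf`, etc.) are hypothetical, and at the scale discussed here an RDF store with a SPARQL engine would be used instead of this naive code.

```python
def infer_types(triples):
    """Forward-chain two RDFS rules until no new triples appear."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        sub = [(s, o) for s, p, o in facts if p == "rdfs:subClassOf"]
        new = set()
        for a, b in sub:
            for c, d in sub:
                if b == c:
                    new.add((a, "rdfs:subClassOf", d))  # transitivity (rdfs11)
        for s, p, o in facts:
            if p == "rdf:type":
                for a, b in sub:
                    if o == a:
                        new.add((s, "rdf:type", b))  # type propagation (rdfs9)
        if not new <= facts:
            facts |= new
            changed = True
    return facts


def match(pattern, triples, binding):
    """Yield copies of binding extended so that pattern matches a triple."""
    for triple in triples:
        b = dict(binding)
        ok = True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if b.setdefault(term, value) != value:
                    ok = False  # variable already bound to a different value
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield b


def query(patterns, triples):
    """Evaluate a conjunction of triple patterns (a basic graph pattern)."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for b2 in match(pattern, triples, b)]
    return bindings


triples = [
    ("ex:wall1", "rdf:type", "ex:CurtainWall"),
    ("ex:CurtainWall", "rdfs:subClassOf", "ex:Wall"),
    ("ex:Wall", "rdfs:subClassOf", "ex:BuildingElement"),
    ("ex:wall1", "ex:isPartOf", "ex:storey2"),
]
facts = infer_types(triples)
print(query([("?e", "rdf:type", "ex:BuildingElement"),
             ("?e", "ex:isPartOf", "?s")], facts))
```

Without the inference step the same query returns nothing, since `ex:wall1` is only typed as `ex:CurtainWall`; materializing such triples enlarges an already large dataset, which is one reason performance matters here.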