Search results for "Big data"
showing 10 items of 311 documents
A comparison of HDFS compact data formats: Avro versus Parquet
2017
In this paper, the file formats Avro and Parquet are compared with text formats to evaluate the performance of data queries. Different data query patterns have been evaluated. Cloudera’s open-source Apache Hadoop distribution CDH 5.4 was chosen for the experiments presented in this article. The results show that the compact data formats (Avro and Parquet) take up less storage space than plain text data formats because of their binary encoding and compression. Furthermore, data queries on the column-based format Parquet are faster than queries on text data formats and Avro. Article in English. Lithuanian title: HDFS glaustųjų duomenų formatų palyginimas: Avro prieš Parquet (Comparison of HDFS compact data formats: Avro versus Parquet)…
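One commonly cited reason a column-based format can answer such queries faster is that a query over one column does not need to read the others. The sketch below is a toy illustration of that idea only: plain Python lists stand in for storage layouts, the sample records and function names are invented for this example, and nothing here reproduces the actual Avro or Parquet formats or the paper's experiments.

```python
# Toy illustration only: plain Python structures stand in for storage layouts.
# This is NOT the Avro or Parquet format, just the row-vs-column idea.

# Hypothetical sample records (invented for this sketch).
records = [
    {"id": 1, "city": "Vilnius", "amount": 10.0},
    {"id": 2, "city": "Kaunas", "amount": 25.5},
    {"id": 3, "city": "Vilnius", "amount": 4.5},
]


def to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists."""
    return {key: [row[key] for row in rows] for key in rows[0]}


def sum_amount_row_layout(rows):
    """Row layout: each whole record is visited to reach one field."""
    values_touched = 0
    total = 0.0
    for row in rows:
        values_touched += len(row)  # every field of the record is read
        total += row["amount"]
    return total, values_touched


def sum_amount_column_layout(columns):
    """Column layout: only the 'amount' column is visited."""
    amounts = columns["amount"]
    return sum(amounts), len(amounts)


columns = to_columns(records)
row_total, row_touched = sum_amount_row_layout(records)
col_total, col_touched = sum_amount_column_layout(columns)
print(row_total, row_touched, col_total, col_touched)
```

Both layouts give the same sum, but the column layout touches one value per record instead of all three fields. Real formats add encoding, compression and metadata on top of this, so measured differences depend on the workload.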
The Datafication of Hate: Expectations and Challenges in Automated Hate Speech Monitoring.
2020
Laaksonen, S-M.; Haapoja, J.; Kinnunen, T.; Nelimarkka, M. & Pöyhtäri, R. (2020, accepted). The Datafication of Hate: Expectations and Challenges in Automated Hate Speech Monitoring. Frontiers in Big Data: Data Mining and Management / Critical Data and Algorithm Studies. doi:10.3389/fdata.2020.00003 Hate speech has been identified as a pressing problem in society and several automated approaches have been designed to detect and prevent it. This paper reports and reflects upon an action research setting consisting of multi-organizational collaboration conducted during Finnish municipal elections in 2017, wherein a technical infrastructure was designed to automatically monitor candidates' social media updates for hate speech. The setting allowed us to engage in a 2-fold investiga…
The Impact of Digital Media on Decision-Making and Democracy [Digitālo mediju ietekme uz lēmumu pieņemšanu un demokrātiju]
2018
Today, the progress of technologies such as artificial intelligence, combined with Big Data analysis, has made it possible to collect, maintain and analyze huge volumes of data. Progress in the digital domain affects not only the market economy but also political participation. Digital media can bring like-minded people together, encourage social activity, narrow gaps between social groups, increase political participation in a country and promote education. However, there are growing concerns that digital media platforms can be used to spread propaganda messages capable of attracting a large audience. Automated filtering mechanisms can be discriminatory in nature, …
FASTA/Q data compressors for MapReduce-Hadoop genomics: space and time savings made easy
2021
Background: Storage of genomic data is a major cost for the Life Sciences, effectively addressed via specialized data compression methods. For the same reasons of abundance in data production, the use of Big Data technologies is seen as the future for genomic data storage and processing, with MapReduce-Hadoop as leaders. Somewhat surprisingly, none of the specialized FASTA/Q compressors is available within Hadoop. Indeed, their deployment there is not exactly immediate. Such a State of the Art is problematic. Results: We provide major advances in two different directions. Methodologically, we propose two general methods, with the corresponding software, that make it very easy to deploy …
Can Google Trends and Wikipedia help traditional surveillance? A pilot study on measles
2019
Introduction: Cases of measles in some European countries are increasing. The aim of this study is to find the correlation between Google Trends and Wikipedia searches and the real number of cases notified. Materials and Methods: The data on Internet searches were obtained from Google Trends and Wikipedia. The reported cases of measles were selected from January 2013 until December 2018 for Google Trends and from July 2015 until December 2018 for Wikipedia. We selected data from four European countries: Italy, France, Germany and Romania. The data extracted from Wikipedia and Google Trends were shifted in time (lag), one month into the future and one month into the past. Cross-co…
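The lag-shifting step described in this abstract can be sketched as follows. This is a minimal illustration assuming a plain Pearson correlation between a search series and a case series offset by a whole number of months; the function names and the sample numbers are invented for this example, and the study's actual statistical procedure is not given in the truncated abstract.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def lagged_correlation(searches, cases, lag):
    """Correlate searches[t] with cases[t + lag].

    lag = +1 pairs each month's searches with the next month's cases;
    lag = -1 pairs each month's searches with the previous month's cases.
    """
    if lag > 0:
        x, y = searches[:-lag], cases[lag:]
    elif lag < 0:
        x, y = searches[-lag:], cases[:lag]
    else:
        x, y = searches, cases
    return pearson(x, y)


# Hypothetical monthly series (invented): cases repeat searches one month later.
searches = [1, 3, 2, 5, 4, 6]
cases = [0, 1, 3, 2, 5, 4]

for lag in (-1, 0, 1):
    print(lag, round(lagged_correlation(searches, cases, lag), 3))
```

In this invented series the correlation is highest at lag +1, which is the pattern one would look for if search activity tended to precede notified cases by a month.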
Digital epidemiology: assessment of measles infection through Google Trends mechanism in Italy.
2019
Introduction. The primary aim of this study is to evaluate the temporal correlation between Google Trends and the data on measles infection arising from the conventional surveillance system, as reported in the bulletin of the Istituto Superiore di Sanità (ISS). This study also aims to forecast the trends of reported infectious disease cases over time. Materials and Methods. The reported cases of measles were selected from January 2013 until October 2018. The data on Internet searches were obtained from Google Trends; the research data referring to the first 48 weeks of 2017 were aggregated on a weekly basis. The search volume provided by Google Trends has a relat…
Digital Discourse: What Is at Stake for (Applied) Linguistics? [Discours Numériques : Quels enjeux pour la linguistique (appliquée) ?]
2017
International audience; The talk presents what is potentially at stake, for research in (applied) linguistics, in the processing of new types of digital corpora arising from the growth of ICT, including their processing by the language industries. Drawing on three ongoing projects at the MSH de Dijon, the central part discusses the needs in fundamental linguistic research that emerge from these new issues.
SOCIAL NETWORKS, BIG DATA AND TRANSPORT PLANNING
2016
[EN] The characteristics of the people who are related or tied to an individual affect that individual's activity-travel behavior. That influence is especially associated with social and recreational activities, which are increasingly important. Collecting high-quality data from those social networks is very difficult using traditional travel surveys, because respondents are asked about their general social life, which is more demanding to remember than specific facts. On the other hand, there are currently different potential sources of transport data, which are characterized by the huge amount of information available, the velocity with which it is obtained and the variety of formats in which it is presented. This s…
Proposed use of a conversational agent for patient empowerment
2021
Empowerment is a process through which people acquire the necessary knowledge and self-awareness to understand their conditions and treatment options, make informed choices and self-manage their health conditions in daily life, in collaboration with medical professionals. Conversational Agents in healthcare could play an important role in the process of empowering a person but, so far, they have seldom been used for this purpose. This paper presents the basic principles and preliminary implementation of a conversational health agent for patient empowerment. It converses with the user in a "natural" way, collects health data from heterogeneous sources and provides the user wit…
A systematic review of SQL-on-Hadoop by using compact data formats
2016
Article also submitted for publication in Baltic J. Modern Computing (BJMC) on October 5, 2016.