Search results for "Big data"

showing 10 items of 311 documents

A comparison of HDFS compact data formats: Avro versus Parquet

2017

This paper compares the file formats Avro and Parquet with text formats to evaluate data query performance across different query patterns. The experiments use CDH 5.4, Cloudera’s open-source Apache Hadoop distribution. The results show that the compact data formats (Avro and Parquet) take up less storage space than plain text formats because of their binary encoding and compression. Furthermore, queries against the column-based format Parquet are faster than queries against text formats and Avro. Article in English. HDFS glaustųjų duomenų formatų palyginimas: Avro prieš Parquet…

Big Data | Computer science | Big data | Energy Engineering and Power Technology | 02 engineering and technology | Management Science and Operations Research | computer.software_genre | Column (database) | 020204 information systems | Data query | 0202 electrical engineering electronic engineering information engineering | HDFS | Database | business.industry | Plain text | Mechanical Engineering | computer.file_format | Avro | File format | Hive | Parquet | Data format | Hadoop | Binary data | 020201 artificial intelligence & image processing | business | computer | Mokslas – Lietuvos ateitis / Science – Future of Lithuania
researchProduct

The Datafication of Hate: Expectations and Challenges in Automated Hate Speech Monitoring.

2020

Laaksonen, S-M.; Haapoja, J.; Kinnunen, T.; Nelimarkka, M. & Pöyhtäri, R. (2020, accepted). Frontiers in Big Data: Data Mining and Management / Critical Data and Algorithm Studies. doi:10.3389/fdata.2020.00003 Hate speech has been identified as a pressing problem in society, and several automated approaches have been designed to detect and prevent it. This paper reports and reflects upon an action research setting consisting of multi-organizational collaboration conducted during the Finnish municipal elections in 2017, wherein a technical infrastructure was designed to automatically monitor candidates' social media updates for hate speech. The setting allowed us to engage in a 2-fold investiga…

Big Data | Computer science | hate speech | social media | 518 Media and communications | sosiaalinen media | monitorointi | 050801 communication & media studies | Social issues | 0508 media and communications | politiikka | datatiede | Artificial Intelligence | algoritmit | 050602 political science & public administration | Computer Science (miscellaneous) | Social media | algorithmic system | vihapuhe | Action research | Objectivity (science) | Original Research | lcsh:T58.5-58.64 | Datafication | Social phenomenon | lcsh:Information technology | tekstinlouhinta | 05 social sciences | Citizen journalism | 16. Peace & justice | 113 Computer and information sciences | Data science | 0506 political science | koneoppiminen | machine learning | Neutrality | data science | politics | Information Systems
researchProduct

The Impact of Digital Media on Decision-Making and Democracy [Digitālo mediju ietekme uz lēmumu pieņemšanu un demokrātiju]

2018

Today, progress in technologies such as artificial intelligence, combined with Big Data analysis, has made it possible to collect, maintain, and analyze enormous volumes of data. Progress in the digital domain affects not only the market economy but also political participation. Digital media can bring like-minded people together, encourage social activity, narrow the gaps between social groups, increase political participation in the country, and promote education; however, there is growing concern that digital media platforms can be used to spread propaganda messages capable of attracting a large audience. Automated filtering mechanisms can be discriminatory in nature, …

Big Data | Ekonomika | decision-making | microtargeting | digital media | filter bubbles
researchProduct

FASTA/Q data compressors for MapReduce-Hadoop genomics: space and time savings made easy

2021

Background: Storage of genomic data is a major cost for the Life Sciences, effectively addressed via specialized data compression methods. Because data are produced in such abundance, Big Data technologies are also seen as the future for genomic data storage and processing, with MapReduce-Hadoop as leaders. Somewhat surprisingly, none of the specialized FASTA/Q compressors is available within Hadoop. Indeed, their deployment there is not exactly immediate. Such a state of the art is problematic. Results: We provide major advances in two different directions. Methodologically, we propose two general methods, with the corresponding software, that make it very easy to deploy …

Big Data | FASTQ format | Computer science | Big data | 02 engineering and technology | computer.software_genre | lcsh:Computer applications to medicine. Medical informatics | Biochemistry | 03 medical and health sciences | Software | Structural Biology | Spark (mathematics) | 0202 electrical engineering electronic engineering information engineering | Data_FILES | MapReduce | MapReduce; hadoop; sequence analysis; data compression | Molecular Biology | lcsh:QH301-705.5 | 030304 developmental biology | File system | 0303 health sciences | Settore INF/01 - Informatica | Database | business.industry | Methodology Article | Applied Mathematics | Sequence analysis | Genomics | Data compression; Hadoop; MapReduce; Sequence analysis; Algorithms; Big Data; Data Compression; Genomics; Software | Computer Science Applications | lcsh:Biology (General) | Software deployment | Hadoop | Data compression | lcsh:R858-859.7 | 020201 artificial intelligence & image processing | State (computer science) | business | computer | Algorithms | Software | Data compression | BMC Bioinformatics
researchProduct

Can google trends and wikipedia help traditional surveillance? A pilot study on measles

2019

Introduction: Cases of measles are increasing in some European countries. The aim of this study is to find the correlation between Google Trends and Wikipedia searches and the actual number of notified cases. Materials and Methods: The data on Internet searches were obtained from Google Trends and Wikipedia. The reported cases of measles were selected from January 2013 until December 2018 for Google Trends and from July 2015 until December 2018 for Wikipedia. We selected data from four European countries: Italy, France, Germany and Romania. The data extracted from Wikipedia and Google Trends were shifted in time (lag), one month into the future and one month into the past. Cross-co…
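The one-month lag comparison described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' method: the abstract is truncated before it names the statistic, so a Pearson coefficient is assumed here, and the function names and monthly values are hypothetical, invented only for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def lagged_correlation(searches, cases, lag):
    """Correlate searches[t] with cases[t + lag].

    lag = +1: searches are compared with cases one month later.
    lag = -1: searches are compared with cases one month earlier.
    """
    if lag > 0:
        return pearson(searches[:-lag], cases[lag:])
    if lag < 0:
        return pearson(searches[-lag:], cases[:lag])
    return pearson(searches, cases)


# Hypothetical monthly series (not data from the study).
searches = [10, 20, 40, 30, 15, 10, 25, 50]
cases = [5, 11, 19, 41, 29, 16, 9, 26]

# Correlation at one month in the past, no shift, and one month in the future.
correlations = {lag: lagged_correlation(searches, cases, lag) for lag in (-1, 0, 1)}
best_lag = max(correlations, key=correlations.get)
```

In this toy series the reported cases follow the search volume by roughly one month, so the strongest correlation appears at the positive lag.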

Big Data | Internet | Romania | Medical Informatics Computing | Vaccine-preventable diseases Italy Germany France Romania Measles vaccine Big Data Internet Measles Medical Informatics Computing Medical Informatics | Pilot Projects | Europe | vaccine-preventable diseases | Italy | Germany | Humans | Original Article | France | Measles vaccine | Medical Informatics | Measles
researchProduct

Digital epidemiology: assessment of measles infection through Google Trends mechanism in Italy.

2019

Introduction. The primary aim of this study is to evaluate the temporal correlation between Google Trends and the data on measles infection from the conventional surveillance system, as reported in the bulletin of the Istituto Superiore di Sanità (ISS). The study also aims to forecast trends in reported infectious disease cases over time. Materials and Methods. The reported cases of measles were selected from January 2013 until October 2018. The data on Internet searches were obtained from Google Trends; the research data referring to the first 48 weeks of 2017 were aggregated on a weekly basis. The search volume provided by Google Trends has a relat…

Big Data | Internet | Time Factors | Databases Factual | Medical Informatics Computing | Measles Vaccine | Medical Informatic | Search Engine | Epidemiologic Studies | Italy | Measle | Vaccine-preventable diseases | Population Surveillance | Humans | Public Health | Epidemiologic Methods | Measles | Annali di igiene : medicina preventiva e di comunita
researchProduct

Digital Discourse: What Is at Stake for (Applied) Linguistics? [Discours Numériques : Quels enjeux pour la linguistique (appliquée) ?]

2017

International audience; This talk presents what is potentially at stake for research in (applied) linguistics in the processing, including by the language industries, of new types of digital corpora arising from the growth of ICT. Drawing on three ongoing projects at the MSH in Dijon, the central part discusses the needs for fundamental linguistic research that emerge from these new problems.

Big Data | Linked Data | Industries de la langue | [ SHS.LANGUE ] Humanities and Social Sciences/Linguistics | Sémantique | Linguistique appliquée | Linguistique de corpus | [SHS.LANGUE]Humanities and Social Sciences/Linguistics | Corpus | [SHS.LANGUE] Humanities and Social Sciences/Linguistics
researchProduct

SOCIAL NETWORKS, BIG DATA AND TRANSPORT PLANNING

2016

[EN] The characteristics of the people who are related or tied to an individual affect her activity-travel behavior. That influence is especially associated with social and recreational activities, which are increasingly important. Collecting high-quality data from those social networks is very difficult using traditional travel surveys, because respondents are asked about their general social life, which is more demanding to remember than specific facts. On the other hand, there are currently different potential sources of transport data, characterized by the huge amount of information available, the velocity with which it is obtained and the variety of formats in which it is presented. This s…

Big Data | Operations research | Transport Planning | Computer science | Big data | 02 engineering and technology | INGENIERÍA DEL TRANSPORTE | INGENIERIA E INFRAESTRUCTURA DE LOS TRANSPORTES | Social life | 0502 economics and business | 0202 electrical engineering electronic engineering information engineering | Transporte y movilidad 34807 / C - Máster universitario en sistemas inteligentes de transporte 2283sort | Recreation | 050210 logistics & transportation | Transportation planning | Social network | MINERVA project | business.industry | 05 social sciences | Data science | Variety (cybernetics) | Social Networks | Data quality | 020201 artificial intelligence & image processing | business | Libro de Actas CIT2016. XII Congreso de Ingeniería del Transporte
researchProduct

Proposed use of a conversational agent for patient empowerment

2021

Empowerment is a process through which people acquire the necessary knowledge and self-awareness to understand their conditions and treatment options, make informed choices and self-manage their health conditions in daily life, in collaboration with medical professionals. Conversational agents in healthcare could play an important role in empowering a person but, so far, they have seldom been used for this purpose. This paper presents the basic principles and preliminary implementation of a conversational health agent for patient empowerment. It converses with the user in a "natural" way, collects health data from heterogeneous sources and provides the user wit…

Big Data | Patient Empowerment | Settore INF/01 - Informatica | Patient Empowerment | Artificial Intelligence | Applied psychology | Conversational Agent | Digital Health | Dialog system | Psychology | computer.software_genre | computer | Tailored Health Communication
researchProduct

A systematic review of SQL-on-Hadoop by using compact data formats

2016

Article also submitted for publication in Baltic J. Modern Computing (BJMC) on October 5, 2016.

Big Data | SQL | HDFS | General Computer Science | Database | Computer science | business.industry | Big data | Avro | computer.software_genre | Parquet | World Wide Web | Hadoop | Systematic review | business | computer | computer.programming_language
researchProduct