Search results for "Base"
showing 10 items of 8362 documents
Big Data Processing in the ATLAS Experiment: Use Cases and Experience
2015
The physics goals of the next Large Hadron Collider run include high-precision tests of the Standard Model and searches for new physics. These goals require detailed comparison of data with computational models simulating the expected data behavior. To highlight the role that modeling and simulation play in future scientific discovery, we report on use cases and experience with a unified system built to process both real and simulated data of growing volume and variety.
Big Data in metagenomics: Apache Spark vs MPI.
2020
The progress of next-generation sequencing has led to the availability of massive data sets used by a wide range of applications in biology and medicine. This has sparked significant interest in using modern Big Data technologies to process this large amount of information in distributed memory clusters of commodity hardware. Several approaches based on solutions such as Apache Hadoop or Apache Spark have been proposed. These solutions allow developers to focus on the problem while the need to deal with low-level details, such as data distribution schemes or communication patterns among processing nodes, can be ignored. However, performance and scalability are also of high importance when…
A comparison of HDFS compact data formats: Avro versus Parquet
2017
In this paper, file formats such as Avro and Parquet are compared with text formats to evaluate the performance of data queries. Different data query patterns have been evaluated. Cloudera’s open-source Apache Hadoop distribution CDH 5.4 has been chosen for the experiments presented in this article. The results show that compact data formats (Avro and Parquet) take up less storage space than plain text data formats because of their binary data format and compression. Furthermore, data queries on the column-based data format Parquet are faster than on text data formats and Avro.
FASTA/Q data compressors for MapReduce-Hadoop genomics: space and time savings made easy
2021
Background: Storage of genomic data is a major cost for the Life Sciences, effectively addressed via specialized data compression methods. For the same reasons of abundance in data production, the use of Big Data technologies is seen as the future for genomic data storage and processing, with MapReduce-Hadoop as leaders. Somewhat surprisingly, none of the specialized FASTA/Q compressors is available within Hadoop. Indeed, their deployment there is not exactly immediate. Such a State of the Art is problematic. Results: We provide major advances in two different directions. Methodologically, we propose two general methods, with the corresponding software, that make it very easy to deploy …
Digital epidemiology: assessment of measles infection through Google Trends mechanism in Italy.
2019
Introduction. The primary aim of this study is to evaluate the temporal correlation between Google Trends and the data on measles infection arising from the conventional surveillance system, reported by the Istituto Superiore di Sanità's (ISS) bulletin. Moreover, this study is also aimed at forecasting the trends of the reported infectious diseases cases over time. Materials and Methods. The reported cases of measles were selected from January 2013 until October 2018. The data on Internet searches have been obtained from Google Trends; the research data referred to the first 48 weeks of year 2017 have been aggregated on a weekly basis. The search volume provided by Google Trends has a relat…
A systematic review of SQL-on-Hadoop by using compact data formats
2016
Article also submitted for publication in Baltic J. Modern Computing (BJMC) on October 5, 2016.
A distance metric on binary trees using lattice-theoretic measures
1990
A so-called height function, which is a strictly antitone supervaluation, is defined on binary trees. Via lattice-theoretic results and using the height function, we can define a distance metric on binary trees of size n which can be computed in expected time $O(n^{3/2})$.
A Practical Perspective: The Effect of Ligand Conformers on the Negative Image-Based Screening.
2019
Negative image-based (NIB) screening is a rigid molecular docking methodology that can also be employed in docking rescoring. During the NIB screening, a negative image is generated based on the target protein’s ligand-binding cavity by inverting its shape and electrostatics. The resulting NIB model is a drug-like entity or pseudo-ligand that is compared directly against ligand 3D conformers, as is done with a template compound in the ligand-based screening. This cavity-based rigid docking has been demonstrated to work with genuine drug targets in both benchmark testing and drug candidate/lead discovery. Firstly, the study explores in depth the applicability of different ligand 3D conformer…
Promoting Deoxygenation of Bio-Oil by Metal-Loaded Hierarchical ZSM-5 Zeolites
2016
3 figures, 5 tables, 1 scheme. This document is the Accepted Manuscript version of a Published Work that appeared in final form in ACS Sustainable Chemistry & Engineering, copyright © American Chemical Society after peer review and technical editing by the publisher. To access the final edited and published work see https://doi.org/10.1021/acssuschemeng.5b01606
A class-selective immunoassay for simultaneous analysis of anilinopyrimidine fungicides using a rationally designed hapten
2017
The development of multianalyte immunoassays constitutes a main research issue in the field of bioanalytical techniques. In the present study, class-specific antibodies against the three members of the anilinopyrimidine family of fungicides (pyrimethanil, cyprodinil and mepanipyrim) were raised by using a bioconjugate of a rationally designed hapten [5-(6-methyl-2-(phenylamino)pyrimidin-4-yl)pentanoic acid]. Highly sensitive immunoassays were developed for the generic determination of these compounds, using the competitive enzyme-linked immunosorbent assay (ELISA). Particularly, a direct antibody-coated competitive ELISA afforded identical sensitivity for the three anilinopyrimidines, with I…