Search results for "Database"

showing 10 items of 2136 documents

A comparison of HDFS compact data formats: Avro versus Parquet

2017

In this paper, file formats such as Avro and Parquet are compared with text formats to evaluate the performance of data queries. Different data query patterns have been evaluated. Cloudera’s open-source Apache Hadoop distribution CDH 5.4 was chosen for the experiments presented in this article. The results show that compact data formats (Avro and Parquet) take up less storage space than plain text data formats because of their binary encoding and compression. Furthermore, data queries on the column-based data format Parquet are faster than on text data formats and Avro. Article in English. Title in Lithuanian: HDFS glaustųjų duomenų formatų palyginimas: Avro prieš Parquet (A comparison of HDFS compact data formats: Avro versus Parquet)…

Big Data | Computer science | Big data | Energy Engineering and Power Technology | 02 engineering and technology | Management Science and Operations Research | computer.software_genre | Column (database) | 020204 information systems | Data query | 0202 electrical engineering electronic engineering information engineering | HDFS | Database | business.industry | Plain text | Mechanical Engineering | computer.file_format | Avro | File format | Hive | Parquet | Data format | Hadoop | Binary data | 020201 artificial intelligence & image processing | business | computer | Mokslas – Lietuvos ateitis / Science – Future of Lithuania

FASTA/Q data compressors for MapReduce-Hadoop genomics: space and time savings made easy

2021

Background: Storage of genomic data is a major cost for the Life Sciences, effectively addressed via specialized data compression methods. Because data are produced in such abundance, Big Data technologies are also seen as the future of genomic data storage and processing, with MapReduce-Hadoop as the leaders. Somewhat surprisingly, none of the specialized FASTA/Q compressors is available within Hadoop, and deploying them there is far from immediate. This state of the art is problematic. Results: We provide major advances in two different directions. Methodologically, we propose two general methods, with the corresponding software, that make it very easy to deploy …

Big Data | FASTQ format | Computer science | Big data | 02 engineering and technology | computer.software_genre | lcsh:Computer applications to medicine. Medical informatics | Biochemistry | 03 medical and health sciences | Software | Structural Biology | Spark (mathematics) | 0202 electrical engineering electronic engineering information engineering | Data_FILES | MapReduce | MapReduce; hadoop; sequence analysis; data compression | Molecular Biology | lcsh:QH301-705.5 | 030304 developmental biology | File system | 0303 health sciences | Settore INF/01 - Informatica | Database | business.industry | Methodology Article | Applied Mathematics | Sequence analysis | Genomics | Data compression; Hadoop; MapReduce; Sequence analysis; Algorithms; Big Data; Data Compression; Genomics; Software | Computer Science Applications | lcsh:Biology (General) | Software deployment | Hadoop | Data compression | lcsh:R858-859.7 | 020201 artificial intelligence & image processing | State (computer science) | business | computer | Algorithms | Software | Data compression | BMC Bioinformatics

Digital epidemiology: assessment of measles infection through Google Trends mechanism in Italy.

2019

Introduction. The primary aim of this study is to evaluate the temporal correlation between Google Trends and the data on measles infection arising from the conventional surveillance system, as reported in the bulletin of the Istituto Superiore di Sanità (ISS). The study also aims to forecast trends in reported infectious disease cases over time. Materials and Methods. The reported cases of measles were selected from January 2013 until October 2018. The data on Internet searches were obtained from Google Trends; the data referring to the first 48 weeks of 2017 were aggregated on a weekly basis. The search volume provided by Google Trends has a relat…

Big Data | Internet | Time Factors | Databases Factual | Medical Informatics Computing | Measles Vaccine | Medical Informatic | Search Engine | Epidemiologic Studies | Italy | Measle | Vaccine-preventable diseases | Population Surveillance | Humans | Public Health | Epidemiologic Methods | Measles | Annali di igiene : medicina preventiva e di comunita

A systematic review of SQL-on-Hadoop by using compact data formats

2016

Article also submitted for publication in Baltic J. Modern Computing (BJMC) on October 5, 2016.

Big Data | SQL | HDFS | General Computer Science | Database | Computer science | business.industry | Big data | Avro | computer.software_genre | Parquet | World Wide Web | Hadoop | Systematic review | business | computer | computer.programming_language

A distance metric on binary trees using lattice-theoretic measures

1990

A so-called height function, which is a strictly antitone supervaluation, is defined on binary trees. Via lattice-theoretic results and using the height function, we can define a distance metric on binary trees of size n which can be computed in expected time O(n^(3/2)).

Binary tree | Data structure | Random binary tree | Computer Science Applications | Theoretical Computer Science | Height function | Combinatorics | Tree structure | Lattice (order) | Signal Processing | Metric (mathematics) | Metric tree | Computer Science::Databases | Information Systems | Mathematics | Information Processing Letters

L’Erbario informatico del Dipartimento di Botanica di Catania [The computerized herbarium of the Department of Botany of Catania]

2004

The electronic Herbarium archives of the Department of Botany of Catania University are illustrated here. A Microsoft Access database has been developed for the digital cataloguing of specimens from historical and recent Herbarium collections. A simple interface allows fast consultation through quick ordering, retrieval and search of data across all records of the catalogue. Digital high-definition photographs of the exsiccata are included in the Herbarium archives and linked to the corresponding record.

Biodiversity Herbaria Database Catalogue

Sparse Manifold Clustering and Embedding to discriminate gene expression profiles of glioblastoma and meningioma tumors.

2013

The Sparse Manifold Clustering and Embedding (SMCE) algorithm was recently proposed for simultaneous clustering and dimensionality reduction of data on nonlinear manifolds using sparse representation techniques. In this work, the SMCE algorithm is applied to the differential discrimination of Glioblastoma and Meningioma tumors by means of their gene expression profiles. Our purpose was to evaluate the robustness of this nonlinear manifold technique in classifying gene expression profiles, which are characterized by the high dimensionality of their representations and the low discrimination power of most of the genes. To this end, we used SMCE to reduce the dimensionality of a preprocessed dataset of 35 single…

Bioinformatics | Health Informatics | Microarray data analysis | Robustness (computer science) | Databases Genetic | Cluster Analysis | Humans | Manifolds | Cluster analysis | Mathematics | Oligonucleotide Array Sequence Analysis | business.industry | Dimensionality reduction | Gene Expression Profiling | Computational Biology | Discriminant Analysis | Pattern recognition | Sparse approximation | Linear discriminant analysis | Manifold | Computer Science Applications | FISICA APLICADA | Embedding | Automatic classification | Artificial intelligence | business | Glioblastoma | Meningioma | Transcriptome | Algorithms | Curse of dimensionality | Computers in biology and medicine

A summary of genomic databases: overview and discussion

2009

In the last few years, both the amount of electronically stored biological data and the number of biological data repositories have grown significantly (today, more than eight hundred such repositories can be counted). In spite of the enormous amount of available resources, a user may be disoriented when searching for specific data. Thus, an accurate analysis of biological data and repositories turns out to be useful to obtain a systematic view of biological database structures, tools and contents and, eventually, to facilitate the access to and retrieval of such data. In this chapter, we propose an analysis of genomic databases, which are databases of fundamental importance for the research in bioinfo…

Biological data | Information retrieval | Computer science | Bioinformatics Biological Databases Analysis | Database schema | Biological database | Genomic databases

Evo-devo mechanisms underlying the continuum between homology and homoplasy

2015

The different manifestations of equivalence and similarity in structure throughout evolution suggest a continuous and hierarchical process that starts out with the origin of a morphological novelty, unit, or homologue. Once a morphological unit has originated, its properties change subsequently into variants that differ, in magnitude, from the original properties found in the common ancestor. We will look into the nature of morphological units and their degrees of modification, which will provide the starting point for restructuring the concept of “homology,” keeping the use of homology as the identity of an anatomical part, and homogeny, as the specific variation of that anatomical part du…

Biology | Anatomical part | Homology (biology) | Hierarchical database model | Evolutionary biology | Phenomenon | Convergent evolution | Genetics | Morphological novelty | Evolutionary developmental biology | Molecular Medicine | Animal Science and Zoology | Ecology Evolution Behavior and Systematics | Developmental Biology | Journal of Experimental Zoology Part B: Molecular and Developmental Evolution

EVpedia: a community web portal for extracellular vesicles research

2014

Motivation: Extracellular vesicles (EVs) are spherical bilayered proteolipids, harboring various bioactive molecules. Due to the complexity of the vesicular nomenclatures and components, online searches for EV-related publications and vesicular components are currently challenging. Results: We present an improved version of EVpedia, a public database for EV research. This community web portal contains a database of publications and vesicular components, identification of orthologous vesicular components, bioinformatic tools and a personalized function. EVpedia includes 6879 publications, 172 080 vesicular components from 263 high-throughput datasets, and has been accessed more tha…

Biomedical Research | Databases Factual | Computer science | Bioactive molecules | Medizin | Bioinformatics | Biochemistry | Mathematical Sciences | User-Computer Interface | Non-U.S. Gov't | database | computer.programming_language | PLASMA | MICROPARTICLES | Research Support Non-U.S. Gov't | bioinformatics | Biological Sciences | Original Papers | CANCER | Computer Science Applications | Identification (information) | Cell and molecular biology | Computational Mathematics | Computational Theory and Mathematics | PROTEOMIC ANALYSIS | MEMBRANE-VESICLES | EXPRESSION | Statistics and Probability | PROSTASOMES | Java | Bioinformatics | exosomes | Research Support | Extracellular vesicles | World Wide Web | Databases | DELIVERY | Information and Computing Sciences | Journal Article | Humans | Membrane vesicle | Molecular Biology | Factual | EXOSOMES | Computational Biology | CELLS | Database Management Systems | Extracellular Space | computer | Software