0000000000024013

AUTHOR

Mara Sorella

showing 1 related works from this author

Analyzing big datasets of genomic sequences: fast and scalable collection of k-mer statistics

2019

Abstract Background Distributed approaches based on the MapReduce programming paradigm have started to be proposed in the Bioinformatics domain, due to the large amount of data produced by the next-generation sequencing techniques. However, the use of MapReduce and related Big Data technologies and frameworks (e.g., Apache Hadoop and Spark) does not necessarily produce satisfactory results, in terms of both efficiency and effectiveness. We discuss how the development of distributed and Big Data management technologies has affected the analysis of large datasets of biological sequences. Moreover, we show how the choice of different parameter configurations and the careful engineering of the …

Data AnalysisFOS: Computer and information sciencesTime FactorsTime FactorComputer scienceStatistics as TopicBig dataApache Spark; distributed computing; performance evaluation; k-mer countinglcsh:Computer applications to medicine. Medical informaticsBiochemistryDomain (software engineering)Databases03 medical and health sciences0302 clinical medicineStructural BiologyComputer clusterStatisticsSpark (mathematics)Molecular Biologylcsh:QH301-705.5030304 developmental biology0303 health sciencesGenomeSettore INF/01 - InformaticaBase SequenceNucleic AcidApache Sparkbusiness.industryResearchApache Spark; Distributed computing; k-mer counting; Performance evaluation; Algorithms; Base Sequence; Software; Time Factors; Data Analysis; Databases Nucleic Acid; Genome; Statistics as TopicApplied Mathematicsk-mer countingDistributed computingComputer Science ApplicationsAlgorithmData AnalysiComputer Science - Distributed Parallel and Cluster Computinglcsh:Biology (General)030220 oncology & carcinogenesisScalabilityPerformance evaluationlcsh:R858-859.7Algorithm designDistributed Parallel and Cluster Computing (cs.DC)Databases Nucleic AcidbusinessAlgorithmsSoftware
researchProduct