AUTHOR

Francesco Palini

showing 4 related works from this author

The Power of Word-Frequency Based Alignment-Free Functions: a Comprehensive Large-Scale Experimental Analysis

2021

Abstract. Motivation: Alignment-free (AF) distance/similarity functions are a key tool for sequence analysis. Experimental studies on real datasets abound and, to some extent, there are also studies regarding their control of false positive rate (Type I error). However, assessment of their power, i.e. their ability to identify true similarity, has been limited to some members of the D2 family. The corresponding experimental studies have concentrated on short sequences, a scenario no longer adequate for current applications, where sequence lengths may vary considerably. Such a state of the art is methodologically problematic, since information regarding a key feature such as power is either mi…

Statistics and Probability; Sequence; Similarity (geometry); Settore INF/01 - Informatica; sequence analysis; Computer science; power statistics; Alignment-Free Genomic Analysis Big Data Software Platforms Bioinformatics Algorithms; Scale (descriptive set theory); Function (mathematics); computer.software_genre; Biochemistry; Computer Science Applications; Set (abstract data type); Computational Mathematics; Range (mathematics); Computational Theory and Mathematics; alignment-free functions; Data mining; Completeness (statistics); Molecular Biology; computer; Type I and type II errors
researchProduct

FASTA/Q data compressors for MapReduce-Hadoop genomics: space and time savings made easy

2021

Abstract. Background: Storage of genomic data is a major cost for the Life Sciences, effectively addressed via specialized data compression methods. For the same reasons of abundance in data production, the use of Big Data technologies is seen as the future for genomic data storage and processing, with MapReduce-Hadoop as leaders. Somewhat surprisingly, none of the specialized FASTA/Q compressors is available within Hadoop. Indeed, their deployment there is not exactly immediate. Such a state of the art is problematic. Results: We provide major advances in two different directions. Methodologically, we propose two general methods, with the corresponding software, that make it very easy to deploy …

Big Data; FASTQ format; Computer science; 02 engineering and technology; computer.software_genre; lcsh:Computer applications to medicine. Medical informatics; Biochemistry; 03 medical and health sciences; Software; Structural Biology; Spark (mathematics); 0202 electrical engineering electronic engineering information engineering; Data_FILES; MapReduce; Hadoop; Sequence analysis; Data compression; Molecular Biology; lcsh:QH301-705.5; 030304 developmental biology; File system; 0303 health sciences; Settore INF/01 - Informatica; Database; business.industry; Methodology Article; Applied Mathematics; Genomics; Algorithms; Computer Science Applications; lcsh:Biology (General); Software deployment; lcsh:R858-859.7; 020201 artificial intelligence & image processing; State (computer science); business; computer; BMC Bioinformatics
researchProduct

Alignment-free Genomic Analysis via a Big Data Spark Platform

2021

Abstract. Motivation: Alignment-free distance and similarity functions (AF functions, for short) are a well-established alternative to pairwise and multiple sequence alignments for many genomic, metagenomic and epigenomic tasks. Due to data-intensive applications, the computation of AF functions is a Big Data problem, with the recent literature indicating that the development of fast and scalable algorithms computing AF functions is a high-priority task. Somewhat surprisingly, despite the increasing popularity of Big Data technologies in computational biology, the development of a Big Data platform for those tasks has not been pursued, possibly due to its complexity. Results: We fill this impo…

FOS: Computer and information sciences; Statistics and Probability; sequence analysis; Computer science; 0206 medical engineering; Big data; 02 engineering and technology; Machine learning; computer.software_genre; Biochemistry; 03 medical and health sciences; Spark (mathematics); MapReduce; Molecular Biology; 030304 developmental biology; 0303 health sciences; Settore INF/01 - Informatica; business.industry; Bioinformatics High Performance Computing Compressed Data Structures; hadoop; Computer Science Applications; Computational Mathematics; Task (computing); Computer Science - Distributed Parallel and Cluster Computing; Computational Theory and Mathematics; Distributed Parallel and Cluster Computing (cs.DC); Artificial intelligence; business; computer; 020602 bioinformatics; Bioinformatics
researchProduct

Additional file 1 of FASTA/Q data compressors for MapReduce-Hadoop genomics: space and time savings made easy

2021

Additional file 1. Supplementary Material.

researchProduct