Search results for "Map"

showing 10 items of 3484 documents

FASTA/Q data compressors for MapReduce-Hadoop genomics: space and time savings made easy

2021

Background: Storage of genomic data is a major cost for the Life Sciences, effectively addressed via specialized data compression methods. For the same reasons of abundance in data production, the use of Big Data technologies is seen as the future for genomic data storage and processing, with MapReduce-Hadoop as leaders. Somewhat surprisingly, none of the specialized FASTA/Q compressors is available within Hadoop. Indeed, their deployment there is not exactly immediate. Such a State of the Art is problematic. Results: We provide major advances in two different directions. Methodologically, we propose two general methods, with the corresponding software, that make it very easy to deploy …

Big Data | FASTQ format | Computer science | Big data | 02 engineering and technology | computer.software_genre | lcsh:Computer applications to medicine. Medical informatics | Biochemistry | 03 medical and health sciences | Software | Structural Biology | Spark (mathematics) | 0202 electrical engineering electronic engineering information engineering | Data_FILES | MapReduce | MapReduce; hadoop; sequence analysis; data compression | Molecular Biology | lcsh:QH301-705.5 | 030304 developmental biology | File system | 0303 health sciences | Settore INF/01 - Informatica | Database | business.industry | Methodology Article | Applied Mathematics | Sequence analysis | Genomics | Data compression; Hadoop; MapReduce; Sequence analysis; Algorithms; Big Data; Data Compression; Genomics; Software | Computer Science Applications | lcsh:Biology (General) | Software deployment | Hadoop | Data compression | lcsh:R858-859.7 | 020201 artificial intelligence & image processing | State (computer science) | business | computer | Algorithms | Software | Data compression | BMC Bioinformatics
researchProduct

Deep learning and process understanding for data-driven Earth system science

2017

Machine learning approaches are increasingly used to extract patterns and insights from the ever-increasing stream of geospatial data, but current approaches may not be optimal when system behaviour is dominated by spatial or temporal context. Here, rather than amending classical machine learning, we argue that these contextual cues should be used as part of deep learning (an approach that is able to extract spatio-temporal features automatically) to gain further process understanding of Earth system science problems, improving the predictive ability of seasonal forecasting and modelling of long-range spatial connections across multiple timescales, for example. The next step will be a hybri…

Big Data | Time Factors | Process modeling | Geospatial analysis | 010504 meteorology & atmospheric sciences | Process (engineering) | 0208 environmental biotechnology | Big data | Geographic Mapping | 02 engineering and technology | computer.software_genre | Machine learning | 01 natural sciences | Pattern Recognition Automated | Data-driven | Deep Learning | Spatio-Temporal Analysis | Humans | Computer Simulation | Weather | 0105 earth and related environmental sciences | Multidisciplinary | business.industry | Deep learning | Uncertainty | Reproducibility of Results | Translating | Regression Psychology | 020801 environmental engineering | Earth system science | Knowledge | Pattern recognition (psychology) | Earth Sciences | Female | Seasons | Artificial intelligence | business | Psychology | Facial Recognition | computer | Forecasting | Nature
researchProduct

Salah Methnani’s Immigrato: Portrait of a Migrant as a Young Man

2012

Although after Unification Italy was predominantly a country of emigration, in recent years it has become a hub for migrants from different parts of the world. Among these migrants a group of writers has contributed to re-configuring Italy's national literary identity. Read by critics primarily as an autobiographical text with remarkable sociological value, Tunisian-born Salah Methnani’s Immigrato is, I argue, first and foremost a classic Bildungsroman. Salah, the 'immigrant' in the title, is the story's protagonist, point of view and leading metaphor. His Bildung follows a double path. On the one hand Italy is a country imagined through TV and books read in school, on the other it is the c…

Bildungsroman | City maps: Law and literature | Migrant literature | African migration to Europe | Migration and national identity
researchProduct

De novo design of protein kinase inhibitors by in silico identification of hinge region-binding fragments.

2013

Protein kinases constitute an attractive family of enzyme targets with high relevance to cell and disease biology. Small molecule inhibitors are powerful tools to dissect and elucidate the function of kinases in chemical biology research and to serve as potential starting points for drug discovery. However, the discovery and development of novel inhibitors remains challenging. Here, we describe a structure-based de novo design approach that generates novel, hinge-binding fragments that are synthetically feasible and can be elaborated to small molecule libraries. Starting from commercially available compounds, core fragments were extracted, filtered for pharmacophoric properties compatible w…

Binding Sites | Molecular Structure | Protein Conformation | Intracellular Signaling Peptides and Proteins | Articles | Protein Serine-Threonine Kinases | Crystallography X-Ray | MAP Kinase Kinase Kinases | Immediate-Early Proteins | CSK Tyrosine-Protein Kinase | Molecular Docking Simulation | Small Molecule Libraries | src-Family Kinases | Drug Design | Computer Simulation | Protein Kinase Inhibitors | ACS chemical biology
researchProduct

Mapping and determinism of soil microbial community distribution across an agricultural landscape.

2015

Open-access article; international audience. Despite the relevance of the landscape scale to the spatial patterning of microbial communities and to the relative influence of environmental parameters versus human activities, few investigations have been conducted at this scale. Here, we used a systematic grid to characterize the distribution of soil microbial communities at 278 sites across a monitored agricultural landscape of 13 km². Molecular microbial biomass was estimated by soil DNA recovery and bacterial diversity by 16S rRNA gene pyrosequencing. Geostatistics provided the first maps of microbial community at this scale and revealed a heterogeneous but spatially structured distribution…

Biodiversity | [SDV.SA.AGRO]Life Sciences [q-bio]/Agricultural sciences/Agronomy | Geostatistics | Environment | Microbiology | soil microbial ecology | Sciences de la Terre | Diversity index | [ SDV.SA.AGRO ] Life Sciences [q-bio]/Agricultural sciences/Agronomy | diversité microbienne | Soil pH | RNA Ribosomal 16S | écologie du sol | Biomass | biomasse microbienne | mapping | pratique culturale | Ecosystem | Soil Microbiology | paysage agricole | Original Research | 2. Zero hunger | Biomass (ecology) | communauté microbienne | environmental filters | Bacteria | Ecology | Microbiota | bacterial diversity | distribution spatiale | Agriculture | Biodiversity | Sequence Analysis DNA | 15. Life on land | landscape | Agricultural practices | Agronomy | Microbial population biology | Agricultural practices;bacterial diversity;environmental filters;landscape;mapping;soil microbial ecology | Earth Sciences | cartographie | Environmental science | Species evenness | Species richness | activité microbienne du sol | human activities | MicrobiologyOpen
researchProduct

Mapreduce in computational biology - A synopsis

2017

In the past 20 years, the Life Sciences have witnessed a paradigm shift in the way research is performed. Indeed, the computational part of biological and clinical studies has become central or is becoming so. Correspondingly, the amount of data that one needs to process, compare and analyze, has experienced an exponential growth. As a consequence, High Performance Computing (HPC, for short) is being used intensively, in particular in terms of multi-core architectures. However, recently and thanks to the advances in the processing of other scientific and commercial data, Distributed Computing is also being considered for Bioinformatics applications. In particular, the MapReduce paradigm, to…

Bioinformatic | Spark | 0301 basic medicine | Settore INF/01 - Informatica | Bioinformatics | Process (engineering) | Computer science | Computer Science (all) | Computational biology | bioinformatics; distributed computing; hadoop; MapReduce; spark; computer science (all) | Supercomputer | computer.software_genre | Distributed computing | 03 medical and health sciences | 030104 developmental biology | Exponential growth | Hadoop | Paradigm shift | Middleware (distributed applications) | Spark (mathematics) | MapReduce | computer
researchProduct

Mapreduce in computational biology via hadoop and spark

2017

Bioinformatics has a long history of software solutions developed on multi-core computing systems for solving computationally intensive problems. This option suffers from some issues that can be solved by shifting to Distributed Systems. In particular, the MapReduce computing paradigm, together with its implementations Hadoop and Spark, is becoming increasingly popular in the Bioinformatics field because it allows for virtually unlimited horizontal scalability while being easy to use. Here we provide a qualitative evaluation of some of the most significant MapReduce bioinformatics applications. We also focus on one of these applications to show the importance of correctly engineering an application to fully exploi…

Bioinformatic | Spark | Settore INF/01 - Informatica | Exploit | business.industry | Computer science | Bioinformatics | Distributed computing | Scalability | Algorithm engineering | Field (computer science) | Distributed computing | Software | Algorithm engineering; Bioinformatics; Distributed computing; Hadoop; MapReduce; Scalability; Spark | Hadoop | Spark (mathematics) | Scalability | Data-intensive computing | MapReduce | business | Implementation | Algorithm engineering
researchProduct

Segmental duplication associated with evolutionary instability of human chromosome 3p25.1

2005

Fluorescence in situ hybridization (FISH) of human bacterial artificial chromosome (BAC) clones to orangutan metaphase spreads localized a breakpoint between human chromosome 3p25.1 and orangutan chromosome 2 to a <30-kb interval. The inversion occurred in a relatively gene-rich region with seven genes within 500 kb. The underlying breakpoint is closely juxtaposed to validated genes; however, no functional gene has been disrupted by the evolutionary rearrangement. An approximately 21-kb DNA segment at the 3p25.1 breakpoint region has been duplicated intrachromosomally and interchromosomally to multiple regions in the orangutan and human genomes, providing additional evidence for the role …

Biology | Evolution Molecular | Chromosomal Instability | Gene Duplication | Yeasts | Chromosome regions | Genetics | medicine | Animals | Humans | Molecular Biology | In Situ Hybridization Fluorescence | Phylogeny | Genetics (clinical) | Segmental duplication | Genetics | Bacterial artificial chromosome | Gorilla gorilla | medicine.diagnostic_test | Chromosome Mapping | Karyotype | Chromosome 17 (human) | Karyotyping | Chromosomes Human Pair 3 | Chromosome 21 | Chromosome 22 | Fluorescence in situ hybridization | Cytogenetic and Genome Research
researchProduct

A Coclustering Approach for Mining Large Protein-Protein Interaction Networks

2012

Several approaches have been presented in the literature to cluster Protein-Protein Interaction (PPI) networks. They can be grouped into two main categories: those allowing a protein to participate in different clusters and those generating only nonoverlapping clusters. In both cases, a challenging task is to find a suitable compromise between the biological relevance of the results and a comprehensive coverage of the analyzed networks. Indeed, methods returning highly accurate results are often able to cover only small parts of the input PPI network, especially when poorly characterized networks are considered. We present a coclustering-based technique able to generate both overlapping and nonove…

Biology | computer.software_genre | Bioinformatics network analysis co-clustering | Task (project management) | Set (abstract data type) | Protein Interaction Mapping | Genetics | Cluster (physics) | Cluster Analysis | Humans | Relevance (information retrieval) | Protein Interaction Maps | Cluster analysis | Structure (mathematical logic) | Applied Mathematics | Proteins | protein-protein interaction networks | biological networks | ComputingMethodologies_PATTERNRECOGNITION | Cover (topology) | Co-clustering | Data mining | computer | Algorithms | Biological network | Biotechnology | IEEE/ACM Transactions on Computational Biology and Bioinformatics
researchProduct

Monitoring barley and corn growth from remote sensing data at field scale

2004

Vegetation indices have been used for operational quantitative monitoring of vegetation. Here, corn and barley cultures have been used to relate meaningful biophysical parameters such as dry biomass and Crop Growth Rate (CGR) to the well-established Normalized Difference Vegetation Index (NDVI). We explain these relationships using Light Use Efficiency (LUE) models, based on the positive relation between primary production and Absorbed Photosynthetically Active Radiation (APAR). In these models we introduce NDVI as a linear estimator of fAPAR. Experimental data over corn and barley show that dry biomass is linearly related to the Time-Integrated Value of the NDVI (TIND…

Biomass (ecology) | Photosynthetically active radiation | medicine | General Earth and Planetary Sciences | Environmental science | Stage (hydrology) | medicine.symptom | Scale (map) | Linear growth | Vegetation (pathology) | Normalized Difference Vegetation Index | Field (geography) | Remote sensing | International Journal of Remote Sensing
researchProduct