Search results for "computer.software_genre"

showing 10 items of 3858 documents

A comparison of HDFS compact data formats: Avro versus Parquet

2017

This paper compares the compact file formats Avro and Parquet with plain text formats to evaluate data query performance. Several data query patterns are evaluated. Cloudera’s open-source Apache Hadoop distribution CDH 5.4 was chosen for the experiments. The results show that the compact data formats (Avro and Parquet) take up less storage space than plain text formats because they store data in binary form and benefit from compression. Furthermore, queries against the column-based Parquet format are faster than queries against text formats and Avro. Article in English. HDFS glaustųjų duomenų formatų palyginimas: Avro prieš Parquet…

Big Data | Computer science | Big data | Energy Engineering and Power Technology | 02 engineering and technology | Management Science and Operations Research | computer.software_genre | Column (database) | 020204 information systems | Data query | 0202 electrical engineering electronic engineering information engineering | HDFS | Database | business.industry | Plain text | Mechanical Engineering | computer.file_format | Avro | File format | Hive | Parquet | Data format | Hadoop | Binary data | 020201 artificial intelligence & image processing | business | computer

Mokslas – Lietuvos ateitis / Science – Future of Lithuania
researchProduct

FASTA/Q data compressors for MapReduce-Hadoop genomics: space and time savings made easy

2021

Background: Storage of genomic data is a major cost for the Life Sciences, effectively addressed via specialized data compression methods. For the same reasons of abundance in data production, the use of Big Data technologies is seen as the future for genomic data storage and processing, with MapReduce-Hadoop as leaders. Somewhat surprisingly, none of the specialized FASTA/Q compressors is available within Hadoop. Indeed, their deployment there is not exactly immediate. Such a State of the Art is problematic. Results: We provide major advances in two different directions. Methodologically, we propose two general methods, with the corresponding software, that make it very easy to deploy …

Big Data | FASTQ format | Computer science | Big data | 02 engineering and technology | computer.software_genre | lcsh:Computer applications to medicine. Medical informatics | Biochemistry | 03 medical and health sciences | Software | Structural Biology | Spark (mathematics) | 0202 electrical engineering electronic engineering information engineering | Data_FILES | MapReduce | MapReduce; hadoop; sequence analysis; data compression | Molecular Biology | lcsh:QH301-705.5 | 030304 developmental biology | File system | 0303 health sciences | Settore INF/01 - Informatica | Database | business.industry | Methodology Article | Applied Mathematics | Sequence analysis | Genomics | Data compression; Hadoop; MapReduce; Sequence analysis; Algorithms; Big Data; Data Compression; Genomics; Software | Computer Science Applications | lcsh:Biology (General) | Software deployment | Hadoop | Data compression | lcsh:R858-859.7 | 020201 artificial intelligence & image processing | State (computer science) | business | computer | Algorithms | Software | Data compression

BMC Bioinformatics
researchProduct

Proposed use of a conversational agent for patient empowerment

2021

Empowerment is a process through which people acquire the necessary knowledge and self-awareness to understand their conditions and treatment options, make informed choices and self-manage their health conditions in daily life, in collaboration with medical professionals. Conversational Agents in healthcare could play an important role in the process of empowering a person but, so far, they have seldom been used for this purpose. This paper presents the basic principles and preliminary implementation of a conversational health agent for patient empowerment. It converses with the user in a "natural" way, collects health data from heterogeneous sources and provides the user wit…

Big Data | Patient Empowerment | Settore INF/01 - Informatica | Patient Empowerment | Artificial Intelligence | Applied psychology | Conversational Agent | Digital Health | Dialog system | Psychology | computer.software_genre | computer | Tailored Health Communication
researchProduct

A systematic review of SQL-on-Hadoop by using compact data formats

2016

Article also submitted for publication in Baltic J. Modern Computing (BJMC) on October 5, 2016.

Big Data | SQL | HDFS | General Computer Science | Database | Computer science | business.industry | Big data | Avro | computer.software_genre | Parquet | World Wide Web | Hadoop | Systematic review | business | computer | computer.programming_language
researchProduct

Deep learning and process understanding for data-driven Earth system science

2017

Machine learning approaches are increasingly used to extract patterns and insights from the ever-increasing stream of geospatial data, but current approaches may not be optimal when system behaviour is dominated by spatial or temporal context. Here, rather than amending classical machine learning, we argue that these contextual cues should be used as part of deep learning (an approach that is able to extract spatio-temporal features automatically) to gain further process understanding of Earth system science problems, improving the predictive ability of seasonal forecasting and modelling of long-range spatial connections across multiple timescales, for example. The next step will be a hybri…

Big Data | Time Factors | Process modeling | Geospatial analysis | 010504 meteorology & atmospheric sciences | Process (engineering) | 0208 environmental biotechnology | Big data | Geographic Mapping | 02 engineering and technology | computer.software_genre | Machine learning | 01 natural sciences | Pattern Recognition Automated | Data-driven | Deep Learning | Spatio-Temporal Analysis | Humans | Computer Simulation | Weather | 0105 earth and related environmental sciences | Multidisciplinary | business.industry | Deep learning | Uncertainty | Reproducibility of Results | Translating | Regression Psychology | 020801 environmental engineering | Earth system science | Knowledge | Pattern recognition (psychology) | Earth Sciences | Female | Seasons | Artificial intelligence | business | Psychology | Facial Recognition | computer | Forecasting

Nature
researchProduct

Cluster-based active learning for compact image classification

2010

In this paper, we consider active sampling to label pixels grouped with hierarchical clustering. The objective of the method is to match the data relationships discovered by the clustering algorithm with the user's desired class semantics. The former are represented as a complete tree to be pruned, and the latter are provided iteratively by the user. The proposed active learning algorithm searches for the pruning of the tree that best matches the labels of the sampled points. By choosing the part of the tree to sample from according to the current pruning's uncertainty, sampling is focused on the most uncertain clusters. This way, large clusters for which the class membership is already fixed are no longer…

Binary tree | Contextual image classification | business.industry | Active learning (machine learning) | Sampling (statistics) | Pattern recognition | computer.software_genre | Hierarchical clustering | Multiclass classification | Tree (data structure) | ComputingMethodologies_PATTERNRECOGNITION | Life Science | Artificial intelligence | Data mining | business | Cluster analysis | computer | Mathematics
researchProduct

On the Locality of Standard Search Operators in Grammatical Evolution

2014

Offspring should be similar to their parents and inherit their relevant properties. This general design principle of search operators in evolutionary algorithms is known either as locality or as geometry of search operators. It takes a geometric perspective on search operators and suggests that the distance between an offspring and its parents should be less than or equal to the distance between both parents. This paper examines the locality of standard search operators used in grammatical evolution (GE) and genetic programming (GP) for binary tree problems. Both standard GE and GP search operators suffer from low locality since a substantial number of search steps result in an o…

Binary tree | Theoretical computer science | business.industry | Perspective (graphical) | Locality | Evolutionary algorithm | Genetic programming | computer.software_genre | Random walk | Grammatical evolution | Artificial intelligence | business | computer | Natural language processing | Mathematics
researchProduct

MapReduce in computational biology - A synopsis

2017

In the past 20 years, the Life Sciences have witnessed a paradigm shift in the way research is performed. Indeed, the computational part of biological and clinical studies has become central or is becoming so. Correspondingly, the amount of data that one needs to process, compare and analyze has grown exponentially. As a consequence, High Performance Computing (HPC, for short) is being used intensively, in particular in terms of multi-core architectures. However, recently and thanks to the advances in the processing of other scientific and commercial data, Distributed Computing is also being considered for Bioinformatics applications. In particular, the MapReduce paradigm, to…

Bioinformatic | Spark | 0301 basic medicine | Settore INF/01 - Informatica | Bioinformatics | Process (engineering) | Computer science | Computer Science (all) | Computational biology | bioinformatics; distributed computing; hadoop; MapReduce; spark; computer science (all) | Supercomputer | computer.software_genre | Distributed computing | 03 medical and health sciences | 030104 developmental biology | Exponential growth | Hadoop | Paradigm shift | Middleware (distributed applications) | Spark (mathematics) | MapReduce | computer
researchProduct

New Trends in Graph Mining

2010

Searching for repeated features characterizing biological data is fundamental in computational biology. When biological networks are under analysis, the presence of repeated modules across the same network (or several distinct ones) is shown to be very relevant. Indeed, several studies prove that biological networks can often be understood in terms of coalitions of basic repeated building blocks, often referred to as network motifs. This work provides a review of the main techniques proposed for motif extraction from biological networks. In particular, the main intrinsic difficulties related to the problem are pointed out, along with solutions proposed in the literature to overcome them. Open ch…

Bioinformatics network analysis | Network motif | Biological data | Colored | Computer science | Graph (abstract data type) | Network science | Data mining | Motif (music) | computer.software_genre | computer | Biological network

International Journal of Knowledge Discovery in Bioinformatics
researchProduct

Natural Language Parsing

2009

Automatic natural language processing captures a lion’s share of the attention in open information management. In one way or another, many applications have to deal with natural language input. In this chapter the authors investigate the problem of natural language parsing from the perspective of biolinguistics. They argue that the human mind succeeds in the parsing task without the help of language-specific rules of parsing and language-specific rules of grammar. Instead, there is a universal parser incorporating a universal grammar. The main argument comes from language acquisition: Children cannot learn language-specific parsing rules by rule induction due to the complexity of unconstrain…

Biolinguistics | Computer science | 05 social sciences | Minimalism (technical communication) | Natural language parsing | computer.software_genre | 050105 experimental psychology | Linguistics | 03 medical and health sciences | TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES | 0302 clinical medicine | 0501 psychology and cognitive sciences | Minimalist program | computer | 030217 neurology & neurosurgery | Natural language
researchProduct