Search results for "ALGORITHM"

showing 10 items of 4887 documents

PINCoC: a Co-Clustering based Method to Analyze Protein-Protein Interaction Networks

2007

A novel technique to search for functional modules in a protein-protein interaction network is presented. The network is represented by the adjacency matrix associated with the undirected graph modelling it. The algorithm introduces the concept of quality of a sub-matrix of the adjacency matrix, and applies a greedy search technique for finding locally optimal solutions made of dense submatrices containing the maximum number of ones. An initial random solution, constituted by a single protein, is evolved to search for a locally optimal solution by adding/removing connected proteins that best contribute to improve the quality function. Experimental evaluations carried out on Saccharomyces Cerevis…

Biclustering; Mathematical optimization; Bioinformatics network analysis; Compact space; Interaction network; Block matrix; Function (mathematics); Adjacency matrix; Greedy algorithm; Algorithm; Protein protein interaction network; Mathematics
researchProduct
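
The greedy search described in the PINCoC abstract above can be illustrated with a small sketch. The abstract does not give the quality function, so the `quality` score below (internal edges minus a penalty for missing pairs) is a hypothetical stand-in, and the sketch grows one symmetric vertex set rather than a true co-cluster of rows and columns. The names `quality`, `greedy_module` and `penalty` are assumptions made for illustration only.

```python
from itertools import combinations


def quality(adj, members, penalty=1.0):
    """Hypothetical score: internal edges minus penalised missing pairs."""
    edges = sum(1 for u, v in combinations(sorted(members), 2) if v in adj[u])
    pairs = len(members) * (len(members) - 1) // 2
    return edges - penalty * (pairs - edges)


def greedy_module(adj, seed, penalty=1.0):
    """Grow a dense module from a single seed vertex by local moves."""
    members = {seed}
    best = quality(adj, members, penalty)
    while True:
        # Candidate moves: add a neighbour of the module, or drop a member.
        frontier = set().union(*(adj[u] for u in members)) - members
        moves = [members | {v} for v in frontier]
        if len(members) > 1:
            moves += [members - {v} for v in members]
        if not moves:
            return members
        candidate = max(moves, key=lambda m: quality(adj, m, penalty))
        score = quality(adj, candidate, penalty)
        if score <= best:
            # No move improves the score: a local optimum has been reached.
            return members
        members, best = candidate, score


# Toy network: a triangle 0-1-2 with a path 2-3-4 hanging off it.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
print(greedy_module(adj, seed=0))
```

Because the score must strictly increase at every step, the loop always terminates, but the result depends on the seed, which is why the abstract speaks of locally optimal solutions.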

A Collaborative Filtering Approach for Drug Repurposing

2022

A recommendation system is proposed based on the construction of Knowledge Graphs, where physical interaction between proteins and associations between drugs and targets are taken into account. The system suggests new targets for a given drug depending on how proteins are linked to each other in the graph. The framework adopted for the implementation of the proposed approach is Apache Spark, useful for loading, managing and manipulating data by means of appropriate Resilient Distributed Datasets (RDD). Moreover, the Alternating Least Squares (ALS) machine learning algorithm, a Matrix Factorization algorithm for distributed and parallel computing, is applied. Preliminary obtained results seem to…

Big Data technologies; Latent factors; Settore INF/01 - Informatica; Drugs; Machine learning algorithms
researchProduct

“The datafication and commodification of Italian schools during the Covid-19 crisis. Implications for policy and future research”

2022

Big Data and algorithms increasingly inform public policymaking and institutional practices, producing an impact on people’s everyday life. An emerging body of scholarly research—Critical Data Studies—has been working on this role shedding light on how society’s current platformisation is linked to a much longer privatization and reorganization of the public sector. This chapter intends to reflect on how the Covid-19 pandemic has dramatically accelerated these processes focusing on school education in particular. Health Big Data and apps have been crucial to take concrete measures to fight the pandemic, while platforms have helped organize vaccination rounds. Nevertheless, they have also be…

Big Data; algorithms; Covid-19; school education; platform society; Settore SPS/08 - Sociologia Dei Processi Culturali E Comunicativi
researchProduct

Big Data in metagenomics: Apache Spark vs MPI.

2020

The progress of next-generation sequencing has led to the availability of massive data sets used by a wide range of applications in biology and medicine. This has sparked significant interest in using modern Big Data technologies to process this large amount of information in distributed memory clusters of commodity hardware. Several approaches based on solutions such as Apache Hadoop or Apache Spark have been proposed. These solutions allow developers to focus on the problem while the need to deal with low level details, such as data distribution schemes or communication patterns among processing nodes, can be ignored. However, performance and scalability are also of high importance when…

Big DataComputer and Information SciencesScienceBig dataMessage Passing InterfaceParallel computingResearch and Analysis MethodsComputing MethodologiesComputing MethodologiesComputer ArchitectureComputer SoftwareDatabase and Informatics MethodsSoftwareSpark (mathematics)GeneticsMammalian GenomicsMultidisciplinarybusiness.industryApplied MathematicsSimulation and ModelingQRBiology and Life SciencesComputational BiologySoftware EngineeringGenomicsDNAGenomic DatabasesGenome AnalysisComputer HardwareSupercomputerBiological DatabasesAnimal GenomicsPhysical SciencesScalabilityEngineering and TechnologyMetagenomeMedicineDistributed memoryMetagenomicsbusinessMathematicsAlgorithmsGenome BacterialSoftwareResearch ArticlePLoS ONE
researchProduct

The Datafication of Hate: Expectations and Challenges in Automated Hate Speech Monitoring.

2020

Laaksonen, S-M.; Haapoja, J.; Kinnunen, T.; Nelimarkka, M. & Pöyhtäri, R. (2020, accepted). Frontiers in Big Data: Data Mining and Management / Critical Data and Algorithm Studies. doi:10.3389/fdata.2020.00003 Hate speech has been identified as a pressing problem in society and several automated approaches have been designed to detect and prevent it. This paper reports and reflects upon an action research setting consisting of multi-organizational collaboration conducted during Finnish municipal elections in 2017, wherein a technical infrastructure was designed to automatically monitor candidates' social media updates for hate speech. The setting allowed us to engage in a 2-fold investiga…

Big DataComputer sciencehate speechsocial media518 Media and communicationssosiaalinen mediamonitorointi050801 communication & media studiesSocial issues0508 media and communicationspolitiikkadatatiedeArtificial Intelligencealgoritmit050602 political science & public administrationComputer Science (miscellaneous)Social mediaalgorithmic systemvihapuheAction researchObjectivity (science)Original Researchlcsh:T58.5-58.64DataficationSocial phenomenonlcsh:Information technologytekstinlouhinta05 social sciencesCitizen journalism16. Peace & justice113 Computer and information sciencesData science0506 political sciencekoneoppiminenmachine learningNeutralitydata sciencepoliticsInformation Systems
researchProduct

FASTA/Q data compressors for MapReduce-Hadoop genomics: space and time savings made easy

2021

Background: Storage of genomic data is a major cost for the Life Sciences, effectively addressed via specialized data compression methods. For the same reasons of abundance in data production, the use of Big Data technologies is seen as the future for genomic data storage and processing, with MapReduce-Hadoop as leaders. Somewhat surprisingly, none of the specialized FASTA/Q compressors is available within Hadoop. Indeed, their deployment there is not exactly immediate. This state of the art is problematic. Results: We provide major advances in two different directions. Methodologically, we propose two general methods, with the corresponding software, that make it very easy to deploy …

Big DataFASTQ formatComputer scienceBig data02 engineering and technologycomputer.software_genrelcsh:Computer applications to medicine. Medical informaticsBiochemistry03 medical and health sciencesSoftwareStructural BiologySpark (mathematics)0202 electrical engineering electronic engineering information engineeringData_FILESMapReduceMapReduce; hadoop; sequence analysis; data compressionMolecular Biologylcsh:QH301-705.5030304 developmental biologyFile system0303 health sciencesSettore INF/01 - InformaticaDatabasebusiness.industryMethodology ArticleApplied MathematicsSequence analysisGenomicsData compression; Hadoop; MapReduce; Sequence analysis; Algorithms; Big Data; Data Compression; Genomics; SoftwareComputer Science Applicationslcsh:Biology (General)Software deploymentHadoopData compressionlcsh:R858-859.7020201 artificial intelligence & image processingState (computer science)businesscomputerAlgorithmsSoftwareData compressionBMC Bioinformatics
researchProduct

On Big Data: How should we make sense of them?

2020

The topic of Big Data is today extensively discussed, not only on the technical ground. This also depends on the fact that Big Data are frequently presented as allowing an epistemological paradigm shift in scientific research, which would be able to supersede the traditional hypothesis-driven method. In this piece, I critically scrutinize two key claims that are usually associated with this approach, namely, the fact that data speak for themselves, deflating the role of theories and models, and the primacy of correlation over causation. In so doing, I will also refer to a recent case history of data mining projects in the field of biomedicine, i.e. EXPOsOMICS. My intention is both to acknow…

Big Data; Value (ethics); causality; Multidisciplinary; data-driven science; Computer science; business.industry; Big data; epistemology; opacity of algorithm; Data science; end of theory; History and Philosophy of Science; Paradigm shift; Key (cryptography); Causation; Heuristics; business; Mètode Revista de difusió de la investigació
researchProduct

2013

Currently, a growing number of programs are becoming available in statistical software for multiple imputation of missing values. Among others, two algorithms are mainly implemented: Expectation Maximization (EM) and Multiple Imputation by Chained Equations (MICE). They have been shown to work well in large samples or when only small proportions of missing data are to be imputed. However, some researchers have begun to impute large proportions of missing data or to apply the method to small samples. A simulation was performed using MICE on datasets with 50, 100 or 200 cases and four or eleven variables. A varying proportion of data (3% - 63%) was set as missing completely at random and subsequent…

Binary response; Sample size determination; Statistics; Expectation–maximization algorithm; Econometrics; Main effect; Imputation (statistics); Missing data; Interaction; Logistic regression; Mathematics; Open Journal of Statistics
researchProduct

Fast Algorithms for Pseudoarboricity

2015

The densest subgraph problem, which asks for a subgraph with the maximum edges-to-vertices ratio d∗, is solvable in polynomial time. We discuss algorithms for this problem and the computation of a graph orientation with the lowest maximum indegree, which is equal to ⌈d∗⌉. This value also equals the pseudoarboricity of the graph. We show that it can be computed in O(|E|^(3/2) √(log log d∗)) time, and that better estimates can be given for graph classes where d∗ satisfies certain asymptotic bounds. These runtimes are achieved by accelerating a binary search with an approximation scheme, and a runtime analysis of Dinitz’s algorithm on flow networks where all arcs, except the source and sink arcs, hav…

Binary search algorithm; Computation; 0102 computer and information sciences; 02 engineering and technology; Orientation (graph theory); 01 natural sciences; Flow (mathematics); 010201 computation theory & mathematics; Log-log plot; TheoryofComputation_ANALYSISOFALGORITHMSANDPROBLEMCOMPLEXITY; 0202 electrical engineering electronic engineering information engineering; Graph (abstract data type); 020201 artificial intelligence & image processing; Unit (ring theory); Algorithm; Time complexity; MathematicsofComputing_DISCRETEMATHEMATICS; Mathematics; 2016 Proceedings of the Eighteenth Workshop on Algorithm Engineering and Experiments (ALENEX)
researchProduct
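
The identity stated in the pseudoarboricity abstract above (lowest maximum indegree of an orientation equals ⌈d∗⌉) can be checked on a small graph. The sketch below is not the paper's method: it swaps in an exponential brute force for d∗ and a plain augmenting-path assignment for the orientation test, wrapped in an ordinary binary search. The function names `densest_ratio`, `orientable` and `pseudoarboricity` are hypothetical, and the graph is assumed to be simple (no self-loops).

```python
from fractions import Fraction
from itertools import combinations
from math import ceil


def densest_ratio(n, edges):
    """Brute-force d*: max |E(S)| / |S| over non-empty vertex subsets S."""
    best = Fraction(0)
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            chosen = set(subset)
            inside = sum(1 for u, v in edges if u in chosen and v in chosen)
            best = max(best, Fraction(inside, size))
    return best


def orientable(n, edges, k):
    """True if each edge can point at one endpoint with all indegrees <= k."""
    load = [[] for _ in range(n)]  # load[v] = edges currently pointing at v

    def place(i, seen):
        for v in edges[i]:
            if v in seen:
                continue
            seen.add(v)
            if len(load[v]) < k:
                load[v].append(i)
                return True
            # v is full: try to re-route one of its edges to the other endpoint.
            for j in list(load[v]):
                if place(j, seen):
                    load[v].remove(j)
                    load[v].append(i)
                    return True
        return False

    return all(place(i, set()) for i in range(len(edges)))


def pseudoarboricity(n, edges):
    """Binary search for the smallest feasible maximum indegree."""
    lo, hi = 0, len(edges)  # indegree |E| is always sufficient
    while lo < hi:
        mid = (lo + hi) // 2
        if orientable(n, edges, mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


# Toy graph: the complete graph K4 on vertices 0..3 plus a pendant edge 3-4.
edges = list(combinations(range(4), 2)) + [(3, 4)]
print(densest_ratio(5, edges), pseudoarboricity(5, edges))
```

On this toy graph the densest subgraph is K4 itself, with 6 edges on 4 vertices, so the two quantities can be compared directly.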

A new compact formulation for the discrete p-dispersion problem

2017

This paper addresses the discrete p-dispersion problem (PDP), which is about selecting p facilities from a given set of candidates in such a way that the minimum distance between selected facilities is maximized. We propose a new compact formulation for this problem. In addition, we discuss two simple enhancements of the new formulation: Simple bounds on the optimal distance can be exploited to reduce the size and to increase the tightness of the model at a relatively low cost of additional computation time. Moreover, the new formulation can be further strengthened by adding valid inequalities. We present a computational study carried out over a set of large-scale test instances i…

Binary search algorithm; Mathematical optimization; 021103 operations research; Information Systems and Management; Line search; General Computer Science; 0211 other engineering and technologies; 0102 computer and information sciences; 02 engineering and technology; Management Science and Operations Research; Solver; 01 natural sciences; Industrial and Manufacturing Engineering; Facility location problem; Set (abstract data type); 010201 computation theory & mathematics; Modeling and Simulation; Programming paradigm; Integer programming; Algorithm; Standard model (cryptography); Mathematics; European Journal of Operational Research
researchProduct
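
The p-dispersion objective defined in the abstract above (choose p candidates so that the smallest pairwise distance is as large as possible) can be made concrete with a brute-force sketch. This is not the compact formulation the paper proposes; it simply enumerates every subset, which is only practical for tiny instances. The function name `p_dispersion`, the use of Euclidean distance, and the sample points are assumptions made for illustration, and p is assumed to be at least 2.

```python
from itertools import combinations
from math import dist


def p_dispersion(points, p):
    """Return the index subset of size p with the largest minimum distance."""

    def min_gap(subset):
        # Smallest pairwise Euclidean distance among the chosen candidates.
        return min(dist(points[i], points[j]) for i, j in combinations(subset, 2))

    best = max(combinations(range(len(points)), p), key=min_gap)
    return best, min_gap(best)


# Four candidate sites on a line; pick three of them.
points = [(0, 0), (1, 0), (2, 0), (10, 0)]
chosen, gap = p_dispersion(points, 3)
print(chosen, gap)
```

Any subset containing two adjacent sites from the left cluster has a minimum gap of 1, so the best choice skips the middle site.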