Search results for "ALGORITHM"
Showing 10 of 4887 documents
PINCoC: a Co-Clustering based Method to Analyze Protein-Protein Interaction Networks
2007
A novel technique to search for functional modules in a protein-protein interaction network is presented. The network is represented by the adjacency matrix associated with the undirected graph modelling it. The algorithm introduces the concept of quality of a sub-matrix of the adjacency matrix, and applies a greedy search technique for finding locally optimal solutions made of dense sub-matrices containing the maximum number of ones. An initial random solution, constituted by a single protein, is evolved to search for a locally optimal solution by adding/removing connected proteins that best contribute to improve the quality function. Experimental evaluations carried out on Saccharomyces cerevis…
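The greedy growth step described in this abstract can be sketched as follows. This is an illustrative simplification, not the authors' PINCoC implementation: the quality function below (ones in the induced sub-matrix per node, which rewards modules that are both dense and large) and the add-only search are assumptions for the sake of a runnable example; the paper's method also removes proteins during the search.

```python
def quality(adj, nodes):
    """Ones in the sub-matrix induced by `nodes`, per node (illustrative quality proxy)."""
    ones = sum(adj[i][j] for i in nodes for j in nodes if i != j)
    return ones / len(nodes)

def greedy_module(adj, seed):
    """Grow a module from a single seed protein while the quality improves."""
    current = {seed}
    improved = True
    while improved:
        improved = False
        best, best_q = None, quality(adj, current)
        for v in sorted(set(range(len(adj))) - current):
            if any(adj[v][u] for u in current):  # only consider connected proteins
                q = quality(adj, current | {v})
                if q > best_q:
                    best, best_q = v, q
        if best is not None:
            current.add(best)
            improved = True
    return current

# Toy adjacency matrix: proteins 0,1,2 form a dense triangle, 3 hangs off 0.
adj = [[0, 1, 1, 1],
       [1, 0, 1, 0],
       [1, 1, 0, 0],
       [1, 0, 0, 0]]
module = greedy_module(adj, 0)
```

Starting from protein 0, the search absorbs the triangle but rejects the weakly connected protein 3, because adding it lowers the quality score.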
A Collaborative Filtering Approach for Drug Repurposing
2022
A recommendation system is proposed based on the construction of Knowledge Graphs, where physical interactions between proteins and associations between drugs and targets are taken into account. The system suggests new targets for a given drug depending on how proteins are linked to each other in the graph. The framework adopted for the implementation of the proposed approach is Apache Spark, useful for loading, managing and manipulating data by means of appropriate Resilient Distributed Datasets (RDDs). Moreover, the Alternating Least Squares (ALS) machine learning algorithm, a matrix factorization algorithm for distributed and parallel computing, is applied. Preliminary results seem to…
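The ALS factorization named in this abstract can be illustrated on a small scale. This is a minimal single-machine sketch with NumPy, not the distributed Spark MLlib ALS the paper uses, and it fits all entries of a dense matrix rather than only the observed interactions, which real collaborative-filtering ALS would do; the drug-by-target matrix and all parameter values are invented for the example.

```python
import numpy as np

def als(R, k=2, n_iters=20, reg=0.1, seed=0):
    """Alternate ridge-regression solves: fix V to update U, then fix U to update V."""
    rng = np.random.default_rng(seed)
    n_drugs, n_targets = R.shape
    U = rng.standard_normal((n_drugs, k))
    V = rng.standard_normal((n_targets, k))
    I = reg * np.eye(k)
    for _ in range(n_iters):
        U = R @ V @ np.linalg.inv(V.T @ V + I)    # closed-form LS update for drug factors
        V = R.T @ U @ np.linalg.inv(U.T @ U + I)  # closed-form LS update for target factors
    return U, V

# Toy drug x target interaction matrix (1 = known association).
R = np.array([[1., 1., 0.],
              [1., 0., 1.]])
U, V = als(R)
scores = U @ V.T  # predicted drug-target affinities; large scores suggest repurposing candidates
```

Unobserved cells of `scores` can then be ranked to suggest new targets, which is the recommendation step the abstract describes.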
The datafication and commodification of Italian schools during the Covid-19 crisis. Implications for policy and future research
2022
Big Data and algorithms increasingly inform public policymaking and institutional practices, producing an impact on people’s everyday life. An emerging body of scholarly research—Critical Data Studies—has been working on this role shedding light on how society’s current platformisation is linked to a much longer privatization and reorganization of the public sector. This chapter intends to reflect on how the Covid-19 pandemic has dramatically accelerated these processes focusing on school education in particular. Health Big Data and apps have been crucial to take concrete measures to fight the pandemic, while platforms have helped organize vaccination rounds. Nevertheless, they have also be…
Big Data in metagenomics: Apache Spark vs MPI.
2020
The progress of next-generation sequencing has led to the availability of massive data sets used by a wide range of applications in biology and medicine. This has sparked significant interest in using modern Big Data technologies to process this large amount of information in distributed-memory clusters of commodity hardware. Several approaches based on solutions such as Apache Hadoop or Apache Spark have been proposed. These solutions allow developers to focus on the problem, ignoring low-level details such as data distribution schemes or communication patterns among processing nodes. However, performance and scalability are also of high importance when…
The Datafication of Hate: Expectations and Challenges in Automated Hate Speech Monitoring.
2020
Laaksonen, S-M.; Haapoja, J.; Kinnunen, T.; Nelimarkka, M. & Pöyhtäri, R. (2020, accepted). Frontiers in Big Data: Data Mining and Management / Critical Data and Algorithm Studies. doi:10.3389/fdata.2020.00003 Hate speech has been identified as a pressing problem in society, and several automated approaches have been designed to detect and prevent it. This paper reports and reflects upon an action research setting consisting of multi-organizational collaboration conducted during the Finnish municipal elections in 2017, wherein a technical infrastructure was designed to automatically monitor candidates' social media updates for hate speech. The setting allowed us to engage in a 2-fold investiga…
FASTA/Q data compressors for MapReduce-Hadoop genomics: space and time savings made easy
2021
Abstract Background Storage of genomic data is a major cost for the Life Sciences, effectively addressed via specialized data compression methods. For the same reasons of abundance in data production, the use of Big Data technologies is seen as the future for genomic data storage and processing, with MapReduce-Hadoop as leaders. Somewhat surprisingly, none of the specialized FASTA/Q compressors is available within Hadoop. Indeed, their deployment there is not exactly immediate. Such a state of the art is problematic. Results We provide major advances in two different directions. Methodologically, we propose two general methods, with the corresponding software, that make it very easy to deploy …
On Big Data: How should we make sense of them?
2020
The topic of Big Data is today extensively discussed, not only on the technical ground. This also depends on the fact that Big Data are frequently presented as allowing an epistemological paradigm shift in scientific research, which would be able to supersede the traditional hypothesis-driven method. In this piece, I critically scrutinize two key claims that are usually associated with this approach, namely, the fact that data speak for themselves, deflating the role of theories and models, and the primacy of correlation over causation. In so doing, I will also refer to a recent case history of data mining projects in the field of biomedicine, i.e. EXPOsOMICS. My intention is both to acknow…
2013
Currently, a growing number of programs become available in statistical software for multiple imputation of missing values. Among others, two algorithms are mainly implemented: Expectation Maximization (EM) and Multiple Imputation by Chained Equations (MICE). They have been shown to work well in large samples or when only small proportions of missing data are to be imputed. However, some researchers have begun to impute large proportions of missing data or to apply the method to small samples. A simulation was performed using MICE on datasets with 50, 100 or 200 cases and four or eleven variables. A varying proportion of data (3%–63%) was set as missing completely at random and subsequent…
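The chained-equations idea behind MICE can be sketched in a few lines. This is a deliberately stripped-down illustration, not the simulation software of the study: it runs a single deterministic chain with ordinary least squares, whereas real MICE draws imputations from predictive distributions and produces several completed datasets.

```python
import numpy as np

def chained_impute(X, n_rounds=10):
    """Fill NaNs column by column: regress each incomplete column on the
    others (fit on observed rows only) and replace its missing entries
    with the predictions, cycling until the values settle."""
    X = X.copy()
    miss = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    for j in range(X.shape[1]):          # initialize with column means
        X[miss[:, j], j] = col_means[j]
    for _ in range(n_rounds):
        for j in range(X.shape[1]):
            if not miss[:, j].any():
                continue
            others = np.delete(X, j, axis=1)
            A = np.hstack([others, np.ones((X.shape[0], 1))])  # add intercept
            obs = ~miss[:, j]
            coef, *_ = np.linalg.lstsq(A[obs], X[obs, j], rcond=None)
            X[miss[:, j], j] = A[miss[:, j]] @ coef
    return X

# Toy dataset: the second column is exactly twice the first, one value missing.
X = np.array([[1., 2.],
              [2., 4.],
              [3., 6.],
              [4., np.nan]])
completed = chained_impute(X)
```

On this toy dataset the regression recovers the exact linear relation, so the missing value converges to 8; with large missing proportions or tiny samples, as the study investigates, the regressions become unstable and the imputations degrade.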
Fast Algorithms for Pseudoarboricity
2015
The densest subgraph problem, which asks for a subgraph with the maximum edges-to-vertices ratio d∗, is solvable in polynomial time. We discuss algorithms for this problem and the computation of a graph orientation with the lowest maximum indegree, which is equal to ⌈d∗⌉. This value also equals the pseudoarboricity of the graph. We show that it can be computed in O(|E| √(log log d∗)) time, and that better estimates can be given for graph classes where d∗ satisfies certain asymptotic bounds. These runtimes are achieved by accelerating a binary search with an approximation scheme, and a runtime analysis of Dinitz's algorithm on flow networks where all arcs, except the source and sink arcs, hav…
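The exact algorithms discussed in this abstract rely on flow computations and binary search. As a simple point of reference for the quantity d∗ itself, the classic greedy peeling heuristic (Charikar's 2-approximation, not the paper's method) repeatedly deletes a minimum-degree vertex and remembers the densest intermediate graph:

```python
from collections import defaultdict

def peel_density(edges):
    """Greedy peeling: return a 2-approximation of the maximum
    edges-to-vertices ratio d* over all subgraphs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    nodes = set(adj)
    m = len(edges)
    best = m / len(nodes)
    while len(nodes) > 1:
        v = min(nodes, key=lambda x: len(adj[x]))  # peel a minimum-degree vertex
        m -= len(adj[v])
        for u in adj[v]:
            adj[u].discard(v)
        nodes.remove(v)
        best = max(best, m / len(nodes))           # track the densest prefix
    return best

# K4 (density 6/4 = 1.5) with a pendant vertex attached; the whole graph
# has density 7/5 = 1.4, so peeling the pendant first exposes the K4.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
d_approx = peel_density(edges)
```

On this example the heuristic happens to find the optimum d∗ = 1.5 exactly; in general it is only guaranteed to return at least d∗/2, which is why exact flow-based algorithms like those in the paper are needed for the pseudoarboricity ⌈d∗⌉.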
A new compact formulation for the discrete p-dispersion problem
2017
Abstract This paper addresses the discrete p-dispersion problem (PDP), which is about selecting p facilities from a given set of candidates in such a way that the minimum distance between selected facilities is maximized. We propose a new compact formulation for this problem. In addition, we discuss two simple enhancements of the new formulation: simple bounds on the optimal distance can be exploited to reduce the size and to increase the tightness of the model at a relatively low cost of additional computation time. Moreover, the new formulation can be further strengthened by adding valid inequalities. We present a computational study carried out over a set of large-scale test instances i…
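The objective of the p-dispersion problem can be made concrete with a small heuristic. The paper itself develops exact MIP formulations; the farthest-point greedy below is only an illustration of what "maximize the minimum distance between selected facilities" means, and the point set and distance function are invented for the example.

```python
def greedy_p_dispersion(points, p, dist):
    """Farthest-point greedy heuristic for p-dispersion: seed with the two
    farthest candidates, then repeatedly add the candidate whose minimum
    distance to the current selection is largest."""
    best_pair = max(((a, b) for a in points for b in points if a != b),
                    key=lambda ab: dist(*ab))
    chosen = list(best_pair)
    while len(chosen) < p:
        nxt = max((q for q in points if q not in chosen),
                  key=lambda q: min(dist(q, c) for c in chosen))
        chosen.append(nxt)
    return chosen

# Candidate facilities on a line; select p = 3 maximally spread points.
points = [0, 1, 2, 10]
chosen = greedy_p_dispersion(points, 3, lambda a, b: abs(a - b))
```

Here the greedy picks 0 and 10 first and then 2 rather than 1, since 2 is farther from the already-selected pair; unlike the paper's compact formulation, this heuristic carries no optimality guarantee.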