AUTHOR
Mārtiņš Opmanis
Application of Graph Clustering and Visualisation Methods to Analysis of Biomolecular Data
In this paper we present an approach based on the integrated use of graph clustering and visualisation methods for semi-supervised discovery of biologically significant features in biomolecular data sets. We describe several clustering algorithms that have been custom-designed for the analysis of biomolecular data. They feature an iterated two-step approach: initial computation of thresholds and other parameters used by the clustering algorithms, followed by identification of connected graph components and, if needed, adjustment of clustering parameters for processing individual subgraphs.
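The two-step idea in this abstract (threshold the graph, take connected components, then adjust parameters for individual subgraphs) can be illustrated with a minimal sketch. The abstract does not specify the actual algorithms, so everything below is an assumption made for illustration: the function names `components` and `iterated_clustering`, the weighted edge list, and the hypothetical parameters `max_size` and `step`.

```python
from collections import defaultdict


def components(nodes, edges, threshold):
    """Connected components using only edges with weight >= threshold."""
    adj = defaultdict(set)
    for u, v, w in edges:
        if w >= threshold:
            adj[u].add(v)
            adj[v].add(u)
    seen, result = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:  # iterative depth-first search
            u = stack.pop()
            comp.add(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        result.append(comp)
    return result


def iterated_clustering(nodes, edges, threshold, max_size, step):
    """Re-cluster components larger than max_size with a raised threshold.

    max_size and step are hypothetical tuning parameters, not from the paper.
    """
    if step <= 0 or max_size < 1:
        raise ValueError("step must be positive and max_size at least 1")
    clusters = []
    for comp in components(nodes, edges, threshold):
        if len(comp) <= max_size:
            clusters.append(comp)
        else:
            # Adjust the parameter for this subgraph only and repeat.
            sub_nodes = [n for n in nodes if n in comp]
            sub_edges = [(u, v, w) for u, v, w in edges
                         if u in comp and v in comp]
            clusters.extend(iterated_clustering(
                sub_nodes, sub_edges, threshold + step, max_size, step))
    return clusters


# Toy data: edge weights stand in for some similarity score.
nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b", 0.9), ("b", "c", 0.4), ("c", "d", 0.8)]
clusters = iterated_clustering(nodes, edges, threshold=0.3, max_size=2, step=0.2)
```

In this toy run the weak `b`-`c` edge holds four nodes together at the initial threshold, and raising the threshold for that subgraph alone splits it into two pairs, while the isolated node `e` is left as its own cluster.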
Variation in genomic landscape of clear cell renal cell carcinoma across Europe
The incidence of renal cell carcinoma (RCC) is increasing worldwide, and its prevalence is particularly high in some parts of Central Europe. Here we undertake whole-genome and transcriptome sequencing of clear cell RCC (ccRCC), the most common form of the disease, in patients from four different European countries with contrasting disease incidence to explore the underlying genomic architecture of RCC. Our findings support previous reports on frequent aberrations in the epigenetic machinery and PI3K/mTOR signalling, and uncover novel pathways and genes affected by recurrent mutations and abnormal transcriptome patterns including focal adhesion, components of extracellular matrix (ECM) and …
Characteristic Topological Features of Promoter Capture Hi-C Interaction Networks
Current Hi-C technologies for chromosome conformation capture make it possible to understand a broad spectrum of functional interactions between genome elements. Although significant progress has been made in the analysis of Hi-C data to identify biologically significant features, many questions still remain open. In this paper we describe analysis methods of Hi-C (specifically PCHi-C) interaction networks that are strictly focused on topological properties of these networks. The main questions we are trying to answer are: (1) can topological properties of interaction networks for different cell types alone be sufficient to distinguish between these types, and what the most important of such propert…
PASSIM – an open source software system for managing information in biomedical studies
Background: One of the crucial aspects of day-to-day laboratory information management is the collection, storage and retrieval of information about research subjects and biomedical samples. An efficient link between sample data and experiment results is absolutely imperative for a successful outcome of a biomedical study. Currently available software solutions are largely limited to large-scale, expensive commercial Laboratory Information Management Systems (LIMS). Acquiring such a LIMS can indeed bring laboratory information management to a higher level, but often implies a significant investment of time, effort and funds, which are not always available. There is a clear need for lightweig…
Graph-based network analysis of transcriptional regulation pattern divergence in duplicated yeast gene pairs
The genome and interactome of Saccharomyces cerevisiae have been characterized extensively over the course of the past few decades. However, despite many insights gained over the years, both functional studies and evolutionary analyses continue to reveal many complexities and confounding factors in the construction of reliable transcriptional regulatory network models. We present here a graph-based technique for comparing transcriptional regulatory networks based on network motif similarity for gene pairs. We construct interaction graphs for duplicated transcription factor pairs traceable to the ancestral whole-genome duplication as well as other paralogues in Saccharomyces cerevisiae. We c…
Integer Complexity: Experimental and Analytical Results
We consider the representation of natural numbers by arithmetical expressions using ones, addition, multiplication and parentheses. The (integer) complexity of n -- denoted by ||n|| -- is defined as the number of ones in the shortest expressions representing n. We very soon arrive at problems that are easy to formulate, but (it seems) extremely hard to solve. In this paper we present our attempts to explore the field by means of experimental mathematics. Having computed the values of ||n|| up to 10^12 we present our observations. One of them (if true) implies that there is an infinite number of Sophie Germain primes, and even that there is an infinite number of Cunningham chains of len…
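The definition of ||n|| can be illustrated with a small dynamic-programming table. This is a minimal sketch of the definition only, not the method used in the paper: it takes quadratic time and is practical only for small limits, far below the 10^12 mentioned in the abstract. The function name `integer_complexity` is chosen here for illustration.

```python
def integer_complexity(limit):
    """Return a list c with c[n] == ||n|| for 1 <= n <= limit (c[0] is unused)."""
    c = [0] * (limit + 1)
    if limit >= 1:
        c[1] = 1
    for n in range(2, limit + 1):
        best = n  # n = 1 + 1 + ... + 1 always works with n ones
        # Best split of n into a sum a + (n - a).
        for a in range(1, n // 2 + 1):
            best = min(best, c[a] + c[n - a])
        # Best split of n into a product d * (n // d).
        d = 2
        while d * d <= n:
            if n % d == 0:
                best = min(best, c[d] + c[n // d])
            d += 1
        c[n] = best
    return c


table = integer_complexity(30)
print(table[6], table[12], table[27])  # prints: 5 7 9
```

For example, 6 = (1+1)(1+1+1) needs five ones, and 27 = (1+1+1)(1+1+1)(1+1+1) needs nine.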
Integer Complexity: Experimental and Analytical Results II
We consider representing natural numbers by expressions using 1's, addition, multiplication and parentheses. $\left\| n \right\|$ denotes the minimum number of 1's in the expressions representing $n$. The logarithmic complexity $\left\| n \right\|_{\log}$ is defined as $\left\| n \right\|/{\log_3 n}$. The values of $\left\| n \right\|_{\log}$ are located in the segment $[3, 4.755]$, but almost nothing is known with certainty about the structure of this "spectrum" (are the values dense somewhere in the segment, etc.). We establish a connection between this problem and another difficult problem: the seemingly "almost random" behaviour of digits in the base 3 representations of the numbers $…
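As a small worked illustration of this definition (the two cases are chosen here for illustration and are not taken from the paper; $\left\| 3 \right\| = 3$ and $\left\| 2 \right\| = 2$ follow directly from the definition):

$$\left\| 3 \right\|_{\log} = \frac{\left\| 3 \right\|}{\log_3 3} = \frac{3}{1} = 3, \qquad \left\| 2 \right\|_{\log} = \frac{\left\| 2 \right\|}{\log_3 2} \approx \frac{2}{0.6309} \approx 3.170.$$

Both values lie in the segment $[3, 4.755]$, with $n = 3$ sitting exactly at its lower end.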
Root cause analysis of large scale application testing results
In this paper we present a new root cause analysis algorithm for discovering the most likely causes of the differences found in the testing results of two versions of the same software. The problematic points in the test and environment attribute hierarchies are presented to the user in a compact way, which in turn saves time on test result processing. We have proven that for clearly separated problem causes our algorithm gives an exact solution. Practical application of the described method is discussed.
Using Deep Learning to Extrapolate Protein Expression Measurements
Mass spectrometry (MS)-based quantitative proteomics experiments typically assay a subset of up to 60% of the ≈20 000 human protein coding genes. Computational methods for imputing the missing values using RNA expression data usually allow only for imputations of proteins measured in at least some of the samples. In silico methods for comprehensively estimating abundances across all proteins are still missing. Here, a novel method is proposed using deep learning to extrapolate the observed protein expression values in label-free MS experiments to all proteins, leveraging gene functional annotations and RNA measurements as key predictive attributes. This method is tested on four datasets, in…
Integer Complexity: Experimental and Analytical Results II
We consider representing natural numbers by expressions using only 1’s, addition, multiplication and parentheses. Let \( \left\| n \right\| \) denote the minimum number of 1’s in the expressions representing \(n\). The logarithmic complexity \( \left\| n \right\| _{\log } \) is defined to be \({ \left\| n \right\| }/{\log _3 n}\). The values of \( \left\| n \right\| _{\log } \) are located in the segment \([3, 4.755]\), but almost nothing is known with certainty about the structure of this “spectrum” (are the values dense somewhere in the segment?, etc.). We establish a connection between this problem and another difficult problem: the seemingly “almost random” behaviour of digits in the ba…
Fragile Correctness of Social Network Analysis
Draft version of the paper
Mobile phone data statistics as a dynamic proxy indicator in assessing regional economic activity and human commuting patterns
Graph-based characterisations of cell types and functionally related modules in promoter capture Hi-C Data
Pattern Identification by Factor Analysis for Regions with Similar Economic Activity Based on Mobile Communication Data
The study analyses regional economic activity in Latvia using Latvia Mobile Telephone (LMT) mobile communication data from July 2015 to January 2017. The call activity and the number of unique phone users in 119 Latvian counties and the biggest cities were analysed in two steps: first, the method of principal components was used to explain the variance in the data, and then exploratory factor analysis was applied. Three factors were identified that describe 87.5% of the total variance of the aggregated daily data. The first factor is related more to the regions with higher economic activity, the second and third factors capture, respectively, lower call activity during weekdays and are related t…
Topological structure analysis of chromatin interaction networks.
Background: Current Hi-C technologies for chromosome conformation capture make it possible to understand a broad spectrum of functional interactions between genome elements. Although significant progress has been made in the analysis of Hi-C data to identify biologically significant features, many questions still remain open, in particular regarding the potential biological significance of various topological features that are characteristic of chromatin interaction networks. Results: It has been previously observed that promoter capture Hi-C (PCHi-C) interaction networks tend to separate easily into well-defined connected components that can be related to certain biological functionality, however, …
Network motif-based analysis of regulatory patterns in paralogous gene pairs
Current high-throughput experimental techniques make it feasible to infer gene regulatory interactions at the whole-genome level with reasonably good accuracy. Such experimentally inferred regulatory networks have become available for a number of simpler model organisms such as S. cerevisiae, and others. The availability of such networks provides an opportunity to compare gene regulatory processes at the whole genome level, and in particular, to assess similarity of regulatory interactions for homologous gene pairs either from the same or from different species. We present here a new technique for analyzing the regulatory interaction neighborhoods of paralogous gene pairs. Our central focu…