Variation in genomic landscape of clear cell renal cell carcinoma across Europe
The incidence of renal cell carcinoma (RCC) is increasing worldwide, and its prevalence is particularly high in some parts of Central Europe. Here we undertake whole-genome and transcriptome sequencing of clear cell RCC (ccRCC), the most common form of the disease, in patients from four different European countries with contrasting disease incidence to explore the underlying genomic architecture of RCC. Our findings support previous reports on frequent aberrations in the epigenetic machinery and PI3K/mTOR signalling, and uncover novel pathways and genes affected by recurrent mutations and abnormal transcriptome patterns including focal adhesion, components of extracellular matrix (ECM) and …
PASSIM – an open source software system for managing information in biomedical studies
Background: One of the crucial aspects of day-to-day laboratory information management is the collection, storage and retrieval of information about research subjects and biomedical samples. An efficient link between sample data and experiment results is essential for a successful biomedical study. Currently available software solutions are largely limited to large-scale, expensive commercial Laboratory Information Management Systems (LIMS). Acquiring such a LIMS can indeed bring laboratory information management to a higher level, but often requires a significant investment of time, effort and funds, which are not always available. There is a clear need for lightweig…
A simple algorithm for drawing large graphs on small screens
Viewing a large graph in limited display space has traditionally been accomplished using either reduced scale rendering of the graph or by attaching scrollbars to a view window which shows only a small portion of the entire graph. Recent work, however, has concentrated on integrating a locally detailed view with a globally scaled view. We present an algorithm for constructing a view which smoothly integrates local detail and global context in a single view window and describe user interaction with such a display.
Learning of regular expressions by pattern matching
We consider the problem of restoring regular expressions from good examples. We describe a natural learning algorithm for obtaining a “plausible” regular expression from one example. The algorithm is based on finding the longest substring that can be matched by some part of the expression obtained so far. We believe that the algorithm to a certain extent mimics how humans guess regular expressions from the same sort of examples. We show that for regular expressions of bounded length successful learning takes time linear in the length of the example, provided that the example is “good”. Under certain natural restrictions the run-time of the learning algorithm is polynomial also in unsuccessf…
Dynamics of gene regulatory networks and their dependence on network topology and quantitative parameters – the case of phage λ
Background: Gene regulatory networks can be modelled in various ways depending on the level of detail required and the biological questions addressed. One of the earliest formalisms used for modelling is the Boolean network, although these models cannot describe most temporal aspects of a biological system. Differential equation models have also been used to model gene regulatory networks, but these frameworks tend to be too detailed for large models and many quantitative parameters might not be deducible in practice. Hybrid models bridge the gap between these two model classes; these are useful when concentration changes are important while the information about precise concentrations and binding…
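The Boolean network formalism mentioned in this abstract can be illustrated with a minimal sketch. It assumes synchronous updates and uses a hypothetical two-gene mutual-repression toggle; it is not the phage λ model or the hybrid method of the paper, and the names `step`, `attractors` and `toggle_rules` are invented for illustration.

```python
from itertools import product

def step(state, rules):
    """Apply every update rule simultaneously (synchronous update)."""
    return tuple(rule(state) for rule in rules)

def attractors(rules, n):
    """Enumerate all 2**n states and collect the cycles they fall into."""
    found = set()
    for start in product((0, 1), repeat=n):
        seen = {}              # state -> index at which it was first visited
        state = start
        while state not in seen:
            seen[state] = len(seen)
            state = step(state, rules)
        # States visited from the first repeated state onward form the cycle.
        first = seen[state]
        cycle = [s for s, i in seen.items() if i >= first]
        # Rotate so the smallest state comes first (canonical form).
        k = cycle.index(min(cycle))
        found.add(tuple(cycle[k:] + cycle[:k]))
    return found

# Hypothetical toggle switch: each gene is on exactly when the other is off.
toggle_rules = [
    lambda s: 1 - s[1],  # gene A
    lambda s: 1 - s[0],  # gene B
]

print(attractors(toggle_rules, 2))
```

For this toy network the search finds two fixed points, (0, 1) and (1, 0), and one two-state cycle between (0, 0) and (1, 1). The cycle is an artefact of synchronous updating, which is one example of the temporal limitations of Boolean models noted in the abstract.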
Efficient learning of regular expressions from good examples
We consider the problem of restoring regular expressions from expressive examples. We define the class of unambiguous regular expressions, the notion of the union number of an expression showing how many union operations can occur directly under any single iteration, and the notion of an expressive example. We present a polynomial time algorithm which tries to restore an unambiguous regular expression from one expressive example. We prove that if the union number of the expression is 0 or 1 and the example is long enough, then the algorithm correctly restores the original expression from one good example. The proof relies on original investigations in the theory of covering symbol sequences (wo…
Using Deep Learning to Extrapolate Protein Expression Measurements
Mass spectrometry (MS)-based quantitative proteomics experiments typically assay a subset of up to 60% of the ≈20 000 human protein-coding genes. Computational methods for imputing the missing values using RNA expression data usually allow only for imputations of proteins measured in at least some of the samples. In silico methods for comprehensively estimating abundances across all proteins are still missing. Here, a novel method is proposed using deep learning to extrapolate the observed protein expression values in label-free MS experiments to all proteins, leveraging gene functional annotations and RNA measurements as key predictive attributes. This method is tested on four datasets, in…
Discovering unbounded unions of regular pattern languages from positive examples
The problem of learning unions of certain pattern languages from positive examples is considered. We restrict to the regular patterns, i.e., patterns where each variable symbol can appear only once, and to the substring patterns, which is a subclass of regular patterns of the type xαy, where x and y are variables and α is a string of constant symbols. We present an algorithm that, given a set of strings, finds a good collection of patterns covering this set. The notion of a ‘good covering’ is defined as the most probable collection of patterns likely to be present in the examples, assuming a simple probabilistic model, or equivalently using the Minimum Description Length (MDL) principle. Ou…
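To make the notion of covering a set of strings by substring patterns xαy concrete, here is a rough sketch. It rests on two assumptions that do not come from the abstract: variables may be replaced by any string, including the empty one (under the non-erasing convention α would have to lie strictly inside the string), and constants are chosen by a plain greedy set-cover heuristic over substrings of a fixed length `k`. The greedy heuristic stands in for, and is not, the probabilistic/MDL criterion used in the paper; `matches` and `greedy_cover` are hypothetical names.

```python
def matches(alpha, s):
    """s belongs to the language of x·alpha·y iff alpha occurs in s
    (assuming variables may be substituted by the empty string)."""
    return alpha in s

def greedy_cover(strings, k):
    """Greedily pick length-k constants until every coverable string is covered.

    This is an ordinary set-cover heuristic, not the MDL-based selection
    described in the abstract.
    """
    remaining = set(strings)
    chosen = []
    while remaining:
        counts = {}
        for s in remaining:
            # Count each candidate once per string it occurs in.
            for sub in {s[i:i + k] for i in range(len(s) - k + 1)}:
                counts[sub] = counts.get(sub, 0) + 1
        if not counts:
            break  # leftover strings are shorter than k
        # Ties are broken alphabetically to keep the result deterministic.
        best = max(sorted(counts), key=counts.get)
        chosen.append(best)
        remaining = {s for s in remaining if not matches(best, s)}
    return chosen

sample = ["xxabcyy", "abczz", "qqdefq", "defdef"]
print(greedy_cover(sample, 3))
```

On this sample the heuristic selects the constants "abc" and "def", so the collection {x abc y, x def y} covers all four strings.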
Efficient algorithm for learning simple regular expressions from noisy examples
We present an efficient algorithm for finding approximate repetitions in a given sequence of characters. First, we define a class of simple regular expressions which are of star-height one and do not contain union operations, and a stochastic mutation process of a given length over a string of characters. Then, assuming that a given string of characters is obtained by applying the defined mutation process to some sufficiently long word generated by a simple regular expression, we try to restore the expression. We prove that, to within some reasonable accuracy, this is always possible if the length of the mutation process is bounded relative to the length of the example. We provide an algorithm by…
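The setting described here (a word generated by a union-free, star-height-one expression and then corrupted by a mutation process of bounded length) can be simulated with a short sketch. The abstract does not specify the mutation operations, so the mix of substitutions, insertions and deletions below is an assumption, as are the `(block, starred)` representation and the names `generate` and `mutate`. The sketch only produces noisy examples; it does not implement the restoration algorithm.

```python
import random

def generate(expr, reps):
    """Expand a simple expression given as (block, starred) pairs.

    Every starred block is repeated `reps` times; plain blocks appear once.
    """
    return "".join(block * reps if starred else block for block, starred in expr)

def mutate(word, length, alphabet, rng):
    """Apply `length` random point mutations (assumed: substitute, insert, delete)."""
    chars = list(word)
    for _ in range(length):
        op = rng.choice(("sub", "ins", "del"))
        if op == "ins" or not chars:
            # Insert a random symbol (also the fallback when the word is empty).
            chars.insert(rng.randrange(len(chars) + 1), rng.choice(alphabet))
        elif op == "sub":
            chars[rng.randrange(len(chars))] = rng.choice(alphabet)
        else:
            del chars[rng.randrange(len(chars))]
    return "".join(chars)

expr = [("ab", True), ("c", False), ("de", True)]  # hypothetical: (ab)*c(de)*
clean = generate(expr, 4)                          # 'ababababcdededede'
noisy = mutate(clean, 3, "abcde", random.Random(0))
print(clean, noisy)
```

Because each mutation changes the length by at most one, the noisy word differs in length from the clean word by at most the length of the mutation process, which is the quantity the abstract requires to be bounded relative to the example.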
Inductive synthesis of dot expressions
We consider the problem of the synthesis of algorithms by sample computations. We introduce a formal language, namely, the so-called dot expressions, which is based on a formalization of the intuitive notion of ellipsis (‘...’). Although formally dot expressions are simply a language describing sets of words, they can also be regarded as a programming language supporting quite a wide class of programs. Equivalence and asymptotic equivalence of dot expressions are defined and proved to be decidable. A formal example of a dot expression is defined so that it represents a sample computation of the program given by that dot expression. A system of s…
Additional file 1 of Dynamics of gene regulatory networks and their dependence on network topology and quantitative parameters – the case of phage λ
Software package implementing our proposed method of attractor analysis. It contains the source files, a user manual and the phage λ model described in this manuscript. The following subsections describe the files in the package. ModelDescription.txt: Definition of the phage λ model that is analysed in this paper. ModelConstraints.txt: File that specifies partial constraints for the orderings of binding site affinities. Here, the constraints are applicable to our phage λ model. HSM_graph_analysis.cpp: The main component of the software, which identifies all feasible states of a system. HSM_graph_analysis.h: The second component of the software for graph analysis. It is a C++ header file which contain…