Search results for "Informatics"
Showing 10 of 2542 documents
Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases
2019
The widespread occurrence of repetitive stretches of DNA in genomes of organisms across the tree of life imposes fundamental challenges for sequencing, genome assembly, and automated annotation of genes and proteins. This multi-level problem can lead to errors in genome and protein databases that are often not recognized or acknowledged. As a consequence, end users working with sequences with repetitive regions are faced with ‘ready-to-use’ deposited data whose trustworthiness is difficult to determine, let alone to quantify. Here, we provide a review of the problems associated with tandem repeat sequences that originate from different stages during the sequencing-assembly-annotatio…
Extending the Tsetlin Machine With Integer-Weighted Clauses for Increased Interpretability
2020
Despite significant effort, building models that are both interpretable and accurate is an unresolved challenge for many pattern recognition problems. In general, rule-based and linear models lack accuracy, while deep learning interpretability is based on rough approximations of the underlying inference. Using a linear combination of conjunctive clauses in propositional logic, Tsetlin Machines (TMs) have shown competitive performance on diverse benchmarks. However, to do so, many clauses are needed, which impacts interpretability. Here, we address the accuracy-interpretability challenge in machine learning by equipping the TM clauses with integer weights. The resulting Integer Weighted TM (…
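A minimal sketch of how integer-weighted clauses could combine into a class score, assuming the usual Tsetlin Machine convention that positive-polarity clauses vote for a class and negative-polarity clauses vote against it. The `Clause` type and `class_score` function are hypothetical illustrations of inference only; the learning procedure (Tsetlin automata and weight updates) is not shown.

```python
from dataclasses import dataclass


@dataclass
class Clause:
    include: list[int]      # indices of features that must be 1 (plain literals)
    include_neg: list[int]  # indices of features that must be 0 (negated literals)
    polarity: int           # +1 votes for the class, -1 votes against it
    weight: int             # integer weight attached to the clause (assumed form)

    def fires(self, x: list[int]) -> bool:
        """A conjunctive clause is true only if every included literal holds."""
        return all(x[i] == 1 for i in self.include) and all(
            x[i] == 0 for i in self.include_neg
        )


def class_score(clauses: list[Clause], x: list[int]) -> int:
    """Weighted vote: sum of polarity * weight over the clauses that fire on x."""
    return sum(c.polarity * c.weight for c in clauses if c.fires(x))


# Usage with hypothetical clauses over three binary features.
clauses = [Clause([0, 1], [], 1, 3), Clause([2], [], -1, 2)]
print(class_score(clauses, [1, 1, 0]))  # 3
print(class_score(clauses, [1, 1, 1]))  # 1
```

In this sketch a clause of weight 3 casts the same vote as three identical unweighted clauses, which is one plausible reading of how weighting can reduce the number of clauses a reader has to inspect.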
Using the Tsetlin Machine to Learn Human-Interpretable Rules for High-Accuracy Text Categorization With Medical Applications
2019
Medical applications challenge today's text categorization techniques by demanding both high accuracy and ease of interpretation. Although deep learning has provided a leap ahead in accuracy, this leap comes at the sacrifice of interpretability. To address this accuracy-interpretability challenge, we introduce, for the first time, a text categorization approach that leverages the recently introduced Tsetlin Machine. In brief, we represent the terms of a text as propositional variables. From these, we capture categories using simple propositional formulae, such as: if "rash" and "reaction" and "penicillin" then Allergy. The Tsetlin Machine learns these formulae from a labelled tex…
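The representation described in this abstract can be sketched as follows, assuming a simple lowercase word tokenizer and a fixed vocabulary (both hypothetical choices, not taken from the paper). Each vocabulary term becomes a propositional variable, and the example rule is a plain conjunction over those variables; how the Tsetlin Machine learns such rules is not shown.

```python
import re


def to_propositions(text: str, vocabulary: set[str]) -> dict[str, bool]:
    """Map each vocabulary term to True if it occurs as a word in the text."""
    # Assumed tokenization: lowercase runs of ASCII letters.
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return {term: term in tokens for term in vocabulary}


def allergy_rule(props: dict[str, bool]) -> bool:
    """The example formula: if "rash" and "reaction" and "penicillin" then Allergy."""
    return props["rash"] and props["reaction"] and props["penicillin"]


vocab = {"rash", "reaction", "penicillin", "fever"}
note = "Patient developed a rash; reaction to penicillin suspected."
print(allergy_rule(to_propositions(note, vocab)))  # True
```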
Multiscale analysis of information dynamics for linear multivariate processes.
2016
In the study of complex physical and physiological systems represented by multivariate time series, an issue of great interest is the description of the system dynamics over a range of different temporal scales. While information-theoretic approaches to the multiscale analysis of complex dynamics are being increasingly used, the theoretical properties of the applied measures are poorly understood. This study introduces for the first time a framework for the analytical computation of information dynamics for linear multivariate stochastic processes explored at different time scales. After showing that the multiscale processing of a vector autoregressive (VAR) process introduces a moving aver…
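As a rough illustration of multiscale processing, the sketch below rescales a series to time scale `tau` in two steps: a low-pass filter followed by downsampling. The rectangular (moving-average) filter is an assumption chosen for simplicity, and the paper's framework may use a different filter. Because each filtered sample averages several past inputs, rescaling an autoregressive process mixes in past noise terms, so the rescaled process is generally no longer purely autoregressive.

```python
def multiscale(x: list[float], tau: int) -> list[float]:
    """Rescale a series to time scale tau: moving-average filter, then downsample by tau."""
    if tau < 1:
        raise ValueError("tau must be >= 1")
    # Causal moving average of length tau: y[n] = mean(x[n-tau+1 .. n]),
    # computed only where a full window is available (n >= tau - 1).
    filtered = [sum(x[n - tau + 1 : n + 1]) / tau for n in range(tau - 1, len(x))]
    # Downsampling: keep every tau-th filtered value.
    return filtered[::tau]


print(multiscale([1, 2, 3, 4, 5, 6], 2))  # [1.5, 3.5, 5.5]
```

With this particular filter, the result coincides with averaging non-overlapping windows of length `tau`.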
A Proposed Access Control-Based Privacy Preservation Model to Share Healthcare Data in Cloud
2020
Healthcare data in cloud computing enables efficient patient treatment by letting healthcare providers share personal health data for medical consultation. At the same time, preserving the confidentiality of the data and of patients' identities is challenging. This paper presents an access control-based (AC) privacy preservation model for the mutual authentication of users and data owners in the proposed digital system. The proposed model offers a high security guarantee and high efficiency. The proposed digital system consists of four entities: user, data owner, cloud server, and key generation center (KGC). This approach makes…
Alignment-free Genomic Analysis via a Big Data Spark Platform
2021
Motivation: Alignment-free distance and similarity functions (AF functions, for short) are a well-established alternative to pairwise and multiple sequence alignments for many genomic, metagenomic and epigenomic tasks. Due to data-intensive applications, the computation of AF functions is a Big Data problem, with the recent literature indicating that the development of fast and scalable algorithms computing AF functions is a high-priority task. Somewhat surprisingly, despite the increasing popularity of Big Data technologies in computational biology, the development of a Big Data platform for those tasks has not been pursued, possibly due to its complexity. Results: We fill this impo…
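One classic alignment-free similarity is the D2 statistic, which counts matching k-mer pairs between two sequences. The sketch below computes it in plain Python for illustration only; it is not the paper's Spark-based platform, and the function names are hypothetical.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count all overlapping k-mers (substrings of length k) in a sequence."""
    return Counter(seq[i : i + k] for i in range(len(seq) - k + 1))


def d2(a: str, b: str, k: int) -> int:
    """D2 similarity: sum over k-mers w of count_a(w) * count_b(w)."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    # Counter returns 0 for k-mers absent from b, so they contribute nothing.
    return sum(n * cb[w] for w, n in ca.items())


print(d2("ACGT", "ACGA", 2))  # 2 (shared 2-mers: AC, CG)
```

When comparing all pairs in a large collection, each `d2` call is independent of the others, which makes the workload a natural fit for distributed frameworks.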
Random Walk in a N-cube Without Hamiltonian Cycle to Chaotic Pseudorandom Number Generation: Theoretical and Practical Considerations
2017
Designing a pseudorandom number generator (PRNG) is a difficult and complex task. Many recent works have considered chaotic functions as the basis for PRNGs: the quality of the output would indeed be an obvious consequence of some chaos properties. However, there is no direct argument leading from chaotic functions to a uniformly distributed output. Moreover, embedding such functions into a PRNG does not necessarily yield a chaotic output, which could be required for simulating certain chaotic behaviors. In a previous work, some of the authors have proposed the idea of walking in an N-cube where a balanced Hamiltonian cycle has been removed as the basis o…
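A heavily simplified sketch of a random walk in an N-cube with a Hamiltonian cycle removed, under several assumptions not taken from the paper. Vertices are N-bit integers; a hypothetical function `f` negates every bit except the one the cycle uses to leave the current vertex, and each step picks a bit `i` and sets `x_i := f(x)_i`. Picking the cycle's bit therefore leaves `x` unchanged, so the cycle's arcs are never traversed. The cycle here is the binary reflected Gray code, which is Hamiltonian but not balanced, and Python's `random.Random` stands in for whatever generator drives the walk.

```python
import random


def gray(k: int) -> int:
    """Binary reflected Gray code of k (consecutive codes differ in one bit)."""
    return k ^ (k >> 1)


def build_f(n: int) -> list[int]:
    """Hypothetical f on n-bit vertices: negate all bits except the Gray-cycle bit."""
    size = 1 << n
    mask = size - 1
    f = [0] * size
    for k in range(size):
        x, nxt = gray(k), gray((k + 1) % size)    # directed cycle arc x -> nxt
        cycle_bit = (x ^ nxt).bit_length() - 1    # the single bit the arc flips
        f[x] = (x ^ mask) ^ (1 << cycle_bit)      # keep that bit, negate the rest
    return f


def walk(n: int, steps: int, seed: int, x0: int = 0) -> int:
    """Iterate x_i := f(x)_i for a randomly chosen bit i at each step."""
    f = build_f(n)
    rng = random.Random(seed)  # stand-in driving generator (assumption)
    x = x0
    for _ in range(steps):
        bit = 1 << rng.randrange(n)
        x = (x & ~bit) | (f[x] & bit)  # copy bit i of f(x) into x
    return x


print(walk(4, 100, seed=1))  # some vertex in range(16)
```

A balanced cycle, as in the abstract, would additionally spread the removed arcs evenly across the N bit positions; constructing one is beyond this sketch.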
Using the transit of Venus to probe the upper planetary atmosphere
2015
The atmosphere of a transiting planet shields stellar radiation, providing us with a powerful method to estimate its size and density. In particular, because of their high ionization energy, atoms with high atomic number (Z) absorb short-wavelength radiation in upper atmospheric layers that are undetectable in visible-light observations. One implication is that the planet should appear larger during a primary transit observed in high-energy bands than in the optical band. The last Venus transit, in 2012, offered a unique opportunity to study this effect. The transit was monitored by the solar space observatories Hinode and the Solar Dynamics Observatory (SDO). We measure the radius of Venus duri…
Broad spectrum of Fabry disease manifestation in an extended Spanish family with a new deletion in the GLA gene
2012
Background. Fabry disease (FD) is an X-linked inherited disease caused by the absence or reduction of lysosomal α-galactosidase (Gla) activity. The enzymatic defect results in progressive impairment of cerebrovascular, renal and cardiac function. Female heterozygous mutation carriers are normally less severely affected than hemizygous males, which complicates diagnosis. Method. Patients were examined closely by renal biopsy, echocardiography, electrocardiography and MRI. Blood work and subsequent DNA analysis were carried out using approved protocols for PCR and sequencing. MLPA analysis was done to detect deletions within the GLA gene locus. Quantitative detection of glycolipids in patient p…
Chances and Challenges of Computational Data Gathering and Analysis
2015
Digital and social media, together with large available datasets, create new possibilities and challenges for research on perpetually developing online news ecosystems. This paper presents a novel computational technique for gathering and processing large quantities of data from Facebook. We demonstrate how to use this technique for detecting and analysing issue-attention cycles and news flows in Facebook groups and pages. Although the paper concentrates on a Finnish Facebook group as a case study, the demonstrated method can be used for gathering and analysing large data sets from various social network sites and national contexts. The paper also discusses Facebook pla…