Search results for "Synthetic data"
showing 10 items of 34 documents
Propagation of errors due to incorrect positions of sources and detectors in wave-field tomography
2004
Tomographic data processed by 2D inversion programs can produce fairly large distortions due to incorrect source and/or detector positions. This problem is very serious in high-frequency electromagnetic tomography (GPR), due to the dimensions of the transmitter and receiver antennae. The errors can be even larger when coupled antennae (receiver and transmitter inside the same box) are used and their positions are not clearly known. Similar errors can arise in seismic tomography, for instance when the mechanical connection between transducers and sample is defective. In this paper, the problem is studied using synthetic data calculated for different acquisition geometries. …
A new Multi-Layers Method to Analyze Gene Expression
2007
This paper presents a new Multi-Layers approach (called the Multi-Layers Model, MLM) for the analysis of stochastic signals and its application to the analysis of gene expression data. It consists of generating sub-samples from the input signal by applying a threshold technique based on cut-set optimal conditions. The MLM has been applied to synthetic and real microarray data for the identification of particular regions across DNA called nucleosomes and linkers. Nucleosomes are the fundamental repeating subunits of all eukaryotic chromatin, and their positioning provides useful information regarding the regulation of gene expression in eukaryotic cells. Results have shown a good rec…
Healthcare trajectory mining by combining multidimensional component and itemsets
2012
Sequential pattern mining is aimed at extracting correlations among temporal data. Many different methods have been proposed to enumerate either sequences of set-valued data (i.e., itemsets) or sequences containing multidimensional items. However, in real-world scenarios, data sequences are described as events of both multidimensional items and set-valued information. These rich heterogeneous descriptions cannot be exploited by traditional approaches. For example, in the healthcare domain, hospitalizations are defined as sequences of multidimensional attributes (e.g. Hospital or Diagnosis) associated with two sets, set of medical procedures (e.g. {Radiography, Appendectomy}) and…
Online Topology Identification from Vector Autoregressive Time Series
2019
Causality graphs are routinely estimated in social sciences, natural sciences, and engineering due to their capacity to efficiently represent the spatiotemporal structure of multivariate data sets in a format amenable to human interpretation, forecasting, and anomaly detection. A popular approach to mathematically formalize causality is based on vector autoregressive (VAR) models and constitutes an alternative to the well-known, yet usually intractable, Granger causality. Relying on such a VAR causality notion, this paper develops two algorithms with complementary benefits to track time-varying causality graphs in an online fashion. Their constant complexity per update also renders these a…
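As a rough illustration of the VAR causality notion mentioned in this abstract, the sketch below generates synthetic data from a first-order VAR model and reads a graph off the coefficient matrix. It is not the paper's online algorithm. The function names, the model order (1), the noise level, and the example matrix `A` are all assumptions made for illustration; the edge rule used here (an edge from node j to node i when the coefficient `A[i][j]` is nonzero) is the common textbook convention.

```python
import random


def simulate_var1(A, n_steps, noise_std=0.1, seed=0):
    """Generate synthetic data from y[t] = A @ y[t-1] + Gaussian noise."""
    rng = random.Random(seed)
    n = len(A)
    y = [0.0] * n  # start from the zero state
    series = []
    for _ in range(n_steps):
        y = [
            sum(A[i][j] * y[j] for j in range(n)) + rng.gauss(0, noise_std)
            for i in range(n)
        ]
        series.append(y)
    return series


def causality_edges(A, tol=1e-12):
    """Return directed edges (j, i): node j influences node i if A[i][j] != 0.

    Self-loops (the diagonal of A) are ignored.
    """
    return {
        (j, i)
        for i, row in enumerate(A)
        for j, a in enumerate(row)
        if i != j and abs(a) > tol
    }


# Hypothetical 3-node chain: node 0 drives node 1, node 1 drives node 2.
A = [
    [0.5, 0.0, 0.0],
    [0.4, 0.5, 0.0],
    [0.0, 0.3, 0.5],
]
series = simulate_var1(A, n_steps=500)
edges = causality_edges(A)
```

In a tracking setting the matrix `A` would be unknown and estimated from `series`; here it is given, so the edge set serves only as the ground truth that an estimator could be checked against.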
Pathway analysis of high-throughput biological data within a Bayesian network framework
2011
Motivation: Most current approaches to high-throughput biological data (HTBD) analysis perform either individual gene/protein analysis or gene/protein set enrichment analysis for a list of biologically relevant molecules. Bayesian Networks (BNs) capture linear and non-linear interactions, handle stochastic events accounting for noise, and focus on local interactions, which can be related to causal inference. Here, we describe for the first time an algorithm that models biological pathways as BNs and identifies pathways that best explain given HTBD by scoring fitness of each network. Results: The proposed method takes into account the connectivity and relatedness between nodes of the p…
Classification and Automated Interpretation of Spinal Posture Data Using a Pathology-Independent Classifier and Explainable Artificial Intelligence (…
2021
Clinical classification models are mostly pathology-dependent and, thus, are only able to detect pathologies they have been trained for. Research is needed regarding pathology-independent classifiers and their interpretation. Hence, our aim is to develop a pathology-independent classifier that provides prediction probabilities and explanations of the classification decisions. Spinal posture data of healthy subjects and various pathologies (back pain, spinal fusion, osteoarthritis), as well as synthetic data, were used for modeling. A one-class support vector machine was used as a pathology-independent classifier. The outputs were transformed into a probability distribution according to Plat…
Analysis of discrete and continuous distributions of ventilatory time constants from dynamic computed tomography.
2005
In this study, an algorithm was developed to measure the distribution of pulmonary time constants (TCs) from dynamic computed tomography (CT) data sets during a sudden airway pressure step up. Simulations with synthetic data were performed to test the methodology as well as the influence of experimental noise. Furthermore, the algorithm was applied to in vivo data. In five pigs, sudden changes in airway pressure were imposed during dynamic CT acquisition in healthy lungs and in a saline lavage ARDS model. The fractional gas content in the imaged slice (FGC) was calculated by density measurements for each CT image. Temporal variations of the FGC were analysed assuming a model with a continuous…
Issues in synthetic data generation for advanced manufacturing
2017
To have any chance of application in the real world, advanced manufacturing research in data analytics needs to explore and prove itself with real-world manufacturing data. Limited access to real-world data contrasts sharply with the need for data of varied types and in larger quantities for research. Use of virtual data is a promising approach to make up for the lack of access. This paper explores the issues, identifies challenges, and suggests requirements and desirable features in the generation of virtual data. These issues, requirements, and features can be used by researchers to build virtual data generators and gain experience that will provide data to data scientists while avoiding known or …
A one class classifier for Signal identification: a biological case study
2008
The paper describes an application of a one-class KNN to identify different signal patterns embedded in a structured noise background. The problem becomes harder whenever only one pattern is well represented in the signal; in such cases, one-class classifier techniques are more appropriate. The classification phase is applied after a preprocessing phase based on a Multi-Layer Model (MLM) that provides a preliminary signal segmentation in an interval feature space. The one-class KNN has been tested on synthetic data that simulate microarray data for the identification of nucleosomes and linker regions across DNA. Results have shown a good recognition rate on synthetic data for nucleosome and lin…
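The one-class KNN idea in this abstract can be sketched in a few lines, under stated assumptions. The abstract does not give the decision rule, so the version below uses one common variant: a point is accepted as belonging to the target class if the distance to its k-th nearest training sample is no larger than a threshold taken from a quantile of the same distance over the training set. The class name `OneClassKNN`, the values of `k` and `quantile`, and the 2-D Gaussian synthetic data are all hypothetical and do not reproduce the paper's MLM preprocessing or its microarray simulation.

```python
import math
import random


def kth_neighbor_distance(point, reference, k, skip_self=False):
    """Distance from `point` to its k-th nearest neighbor in `reference`."""
    dists = sorted(math.dist(point, r) for r in reference)
    if skip_self:
        dists = dists[1:]  # drop the zero distance of the point to itself
    return dists[k - 1]


class OneClassKNN:
    """Accept a point when its k-th neighbor distance is below a threshold."""

    def __init__(self, k=3, quantile=0.95):
        self.k = k
        self.quantile = quantile

    def fit(self, targets):
        # Only samples of the single well-represented class are needed.
        self.targets = list(targets)
        scores = sorted(
            kth_neighbor_distance(p, self.targets, self.k, skip_self=True)
            for p in self.targets
        )
        idx = min(len(scores) - 1, int(self.quantile * len(scores)))
        self.threshold = scores[idx]
        return self

    def is_target(self, point):
        score = kth_neighbor_distance(point, self.targets, self.k)
        return score <= self.threshold


# Synthetic target class: 200 points from a 2-D standard Gaussian.
rng = random.Random(0)
train = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
model = OneClassKNN(k=3).fit(train)

inside = model.is_target((0.0, 0.0))     # dense region of the target class
outside = model.is_target((10.0, 10.0))  # far from every training sample
```

Because the threshold is learned from the target class alone, no examples of the other patterns are needed at training time, which is the situation the abstract describes.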
Locality-sensitive hashing enables signal classification in high-throughput mass spectrometry raw data at scale
2021
Mass spectrometry is an important experimental technique in the field of proteomics. However, analysis of certain mass spectrometry data faces a combination of two challenges: First, even a single experiment produces a large amount of multi-dimensional raw data and, second, signals of interest are not single peaks but patterns of peaks that span along the different dimensions. The rapidly growing amount of mass spectrometry data increases the demand for scalable solutions. Existing approaches for signal detection are usually not well suited for processing large amounts of data in parallel or rely on strong assumptions concerning the signals' properties. In this study, it is shown that locali…
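For readers unfamiliar with locality-sensitive hashing, the following is a minimal sketch of one well-known family, random hyperplane hashing for cosine similarity. The abstract is truncated and does not state which hash family or feature representation the study uses, so this choice, the function names, the signature length, and the toy vectors are all assumptions made for illustration.

```python
import random


def make_hyperplanes(n_bits, dim, seed=0):
    """Draw `n_bits` random hyperplane normals in `dim` dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]


def signature(vec, planes):
    """One bit per hyperplane: which side of the plane the vector lies on."""
    return tuple(
        1 if sum(p * x for p, x in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )


def hamming(sig_a, sig_b):
    """Number of differing bits between two signatures."""
    return sum(a != b for a, b in zip(sig_a, sig_b))


def bucketize(vectors, planes):
    """Group named vectors by signature; each bucket can be processed alone."""
    buckets = {}
    for name, vec in vectors.items():
        buckets.setdefault(signature(vec, planes), []).append(name)
    return buckets


planes = make_hyperplanes(n_bits=16, dim=4)

# Toy intensity patterns (hypothetical): `scaled` has the same shape as `base`.
base = [1.0, 3.0, 2.0, 0.5]
scaled = [2.0 * x for x in base]
opposite = [-x for x in base]

buckets = bucketize({"base": base, "scaled": scaled, "opposite": opposite}, planes)
```

Vectors that point in similar directions agree on most bits, so they tend to share a bucket, while dissimilar vectors are separated. Hashing each item is independent of all the others, which is what makes the approach easy to run in parallel.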