Search results for "clustering"
showing 10 items of 446 documents
SpCLUST: Towards a fast and reliable clustering for potentially divergent biological sequences
2019
This paper presents SpCLUST, a new C++ package that takes a list of sequences as input, aligns them with MUSCLE, computes their similarity matrix in parallel and then performs the clustering. SpCLUST extends a previously released software package by integrating additional scoring matrices, which enables it to cover the clustering of amino-acid sequences. The similarity matrix is now computed in parallel according to the master/slave distributed architecture, using MPI. Performance analysis, carried out on two real datasets of 100 nucleotide sequences and 1049 amino-acid ones, shows that the resulting library substantially outperforms the original Python package. The proposed pac…
Retrospective Proteomic Screening of 100 Breast Cancer Tissues.
2017
The present investigation has been conducted on one hundred tissue fragments of breast cancer, collected and immediately cryopreserved following the surgical resection. The specimens were selected from patients with invasive ductal carcinoma of the breast, the most frequent and potentially aggressive type of mammary cancer, with the objective to increase the knowledge of breast cancer molecular markers potentially useful for clinical applications. The proteomic screening, by 2D-IPG and mass spectrometry, allowed us to identify two main classes of protein clusters: proteins expressed ubiquitously at high levels in all patients, and proteins expressed sporadically among the same patients. Wit…
FragClust and TestClust, two informatics tools for chemical structure hierarchical clustering analysis applied to lipidomics. The example of Alzheime…
2016
Lipidomic analysis is able to measure simultaneously thousands of compounds belonging to a few lipid classes. In each lipid class, compounds differ only by the acyl radical, ranging between C10:0 (capric acid) and C24:0 (lignoceric acid). Although some metabolites have a peculiar pathological role, more often compounds belonging to a single lipid class exert the same biological effect. Here, we present a lipidomics workflow that extracts the tandem mass spectrometry data from individual files and uses them to group compounds into structurally homogeneous clusters by chemical structure hierarchical clustering analysis (CHCA). The case-to-control peak area ratios of the metabolites are then a…
A clustering package for nucleotide sequences using Laplacian Eigenmaps and Gaussian Mixture Model.
2018
In this article, a new Python package for nucleotide sequence clustering is proposed. This package, freely available online, implements a Laplacian eigenmap embedding and a Gaussian Mixture Model for DNA clustering. It takes nucleotide sequences as input, and produces the optimal number of clusters along with a relevant visualization. Although we did not optimise the computational speed, our method still performs reasonably well in practice. Our focus was mainly on data analytics and accuracy and as a result, our approach outperforms the state of the art, even in the case of divergent sequences. Furthermore, an a priori knowledge on the number of clust…
Autoimmune polyglandular diseases.
2019
Autoimmune polyglandular diseases (APD) are defined as the presence of two autoimmune-induced endocrine failures. With respect to the significant morbidity and potential mortality of APD, the diagnostic objective is to detect APD at an early stage, with the advantage of less frequent complications, effective therapy and better prognosis. This requires that patients at risk be regularly screened for subclinical endocrinopathies prior to clinical manifestation. Regarding the time interval between manifestation of first and further endocrinopathies, regular and long-term follow-up is warranted. Quality of life and psychosocial status are poor in APD patients and involved relatives. Familial c…
Innovative Strategies to Develop Chemical Categories Using a Combination of Structural and Toxicological Properties.
2016
Interest is increasing in the development of non-animal methods for toxicological evaluations. These methods are, however, particularly challenging for complex toxicological endpoints such as repeated dose toxicity. European legislation, e.g., the European Union's Cosmetic Directive and REACH, demands the use of alternative methods. Frameworks, such as the Read-across Assessment Framework or the Adverse Outcome Pathway Knowledge Base, support the development of these methods. The aim of the project presented in this publication was to develop substance categories for a read-across with complex endpoints of toxicity based on existing databases. The basic conceptual approach was to combine str…
CUDA-enabled hierarchical ward clustering of protein structures based on the nearest neighbour chain algorithm
2015
Clustering of molecular systems according to their three-dimensional structure is an important step in many bioinformatics workflows. In applications such as docking or structure prediction, many algorithms initially generate large numbers of candidate poses (or decoys), which are then clustered to allow for subsequent computationally expensive evaluations of reasonable representatives. Since the number of such candidates can easily range from thousands to millions, performing the clustering on standard central processing units (CPUs) is highly time consuming. In this paper, we analyse and evaluate different approaches to parallelize the nearest neighbour chain algorithm to perform hierarc…
Multivariate statistical analysis of a large odorants database aimed at revealing similarities and links between odorants and odors
2017
The perception of odor is an important component of smell; the first step of odor detection, and the discrimination of structurally diverse odorants depends on their interactions with olfactory receptors (ORs). Indeed, the perception of an odor's quality results from a combinatorial coding, in which the deciphering remains a major challenge. Several studies have successfully established links between odors and odorants by categorizing and classifying data. Hence, the categorization of odors appears to be a promising way to manage odors. In the proposed study, we performed a computational analysis using odor descriptions of the odorants present in Flavor-Base 9th Edit…
Differentiating cancer cells reveal early large-scale genome regulation by pericentric domains.
2021
Finding out how cells prepare for fate change during differentiation commitment was our task. To address whether the constitutive pericentromere-associated domains (PADs) may be involved, we used a model system with known transcriptome data, MCF-7 breast cancer cells treated with the ErbB3 ligand heregulin (HRG), which induces differentiation and is used in the therapy of cancer. PAD-repressive heterochromatin (H3K9me3), centromere-associated-protein-specific, and active euchromatin (H3K4me3) antibodies, real-time PCR, acridine orange DNA structural test (AOT), and microscopic image analysis were applied. We found a two-step DNA unfolding after 15–20 and 60 min of HRG treatment, re…
Comparison of conventional descriptive analysis and a citation frequency-based descriptive method for odor profiling: An application to Burgundy Pino…
2010
The limitations of intensity scoring when describing the odor characteristics of a complex product have been documented in the literature. In the present work, the odor properties of 12 Burgundy Pinot noir wines were described by two independent panels performing, respectively, an intensity-based (conventional descriptive analysis) and a citation frequency-based method. Methods were compared according to three criteria: similarity of the sensory maps, control of panel performance and practical aspects. Intensity scoring and citation frequency data were analyzed, respectively, by Principal Components Analysis (PCA) and Correspondence Analysis (CA) followed by Hierarch…