Search results for "Cluster analysis"
Assessing statistical significance in multivariable genome wide association analysis
2016
Motivation: Although Genome Wide Association Studies (GWAS) genotype a very large number of single nucleotide polymorphisms (SNPs), the data are often analyzed one SNP at a time. The low predictive power of single SNPs, coupled with the high significance threshold needed to correct for multiple testing, greatly decreases the power of GWAS. Results: We propose a procedure in which all the SNPs are analyzed in a multiple generalized linear model, and we show its use for extremely high-dimensional datasets. Our method yields P-values for assessing significance of single SNPs or groups of SNPs while controlling for all other SNPs and the family wise error rate (FWER). Thus, our method tests whe…
ParDRe: faster parallel duplicated reads removal tool for sequencing studies
2016
This is a pre-copyedited, author-produced version of an article accepted for publication in Bioinformatics following peer review. The version of record is available online at: https://doi.org/10.1093/bioinformatics/btw038 Summary: Current next generation sequencing technologies often generate duplicated or near-duplicated reads that (depending on the application scenario) do not provide any interesting biological information but can increase memory requirements and computational time of downstream analysis. In this work we present ParDRe, a de novo parallel tool to remove duplicated and near-duplicated reads through the clustering of S…
The latent geometry of the human protein interaction network
2017
Motivation: A series of recently introduced algorithms and models advocates for the existence of a hyperbolic geometry underlying the network representation of complex systems. Since the human protein interaction network (hPIN) has a complex architecture, we hypothesized that uncovering its latent geometry could ease challenging problems in systems biology, translating them into measuring distances between proteins. Results: We embedded the hPIN to hyperbolic space and found that the inferred coordinates of nodes capture biologically relevant features, like protein age, function and cellular localization. This means that the representation of the hPIN in the two-dimensional hyperboli…
Inhabiting plant roots, nematodes, and truffles—Polyphilus, a new helotialean genus with two globally distributed species
2018
Fungal root endophytes, including the common group of dark septate endophytes (DSEs), represent different taxonomic groups and potentially diverse life strategies. In this study, we investigated two unidentified helotialean lineages found previously in a study of DSE fungi of semiarid grasslands, from several other sites, and collected recently from a pezizalean truffle ascoma and eggs of the cereal cyst nematode Heterodera filipjevi. The taxonomic positions and phylogenetic relationships of 21 isolates with different hosts and geographic origins were studied in detail. Four loci, namely, nuc rDNA ITS1-5.8S-ITS2 (internal transcribed spacer [ITS]), partial 28S nuc rDNA (28S), partial 18S nu…
Ultra-Fast Detection of Higher-Order Epistatic Interactions on GPUs
2017
Detecting higher-order epistatic interactions in Genome-Wide Association Studies (GWAS) remains a challenging task in the fields of genetic epidemiology and computer science. A number of algorithms have recently been proposed for epistasis discovery. However, they suffer from a high computational cost since statistical measures have to be evaluated for each possible combination of markers. Hence, many algorithms use additional filtering stages discarding potentially non-interacting markers in order to reduce the overall number of combinations to be examined. Among others, Mutual Information Clustering (MIC) is a common pre-processing filter for grouping markers into partitions using K-Means…
Co-regulation of paralog genes in the three-dimensional chromatin architecture.
2016
Paralog genes arise from gene duplication events during evolution, which often lead to similar proteins that cooperate in common pathways and in protein complexes. Consequently, paralogs show correlated gene expression, although the mechanisms of co-regulation remain unclear. In eukaryotes, genes are regulated in part by distal enhancer elements through looping interactions with gene promoters. These looping interactions can be measured by genome-wide chromatin conformation capture (Hi-C) experiments, which revealed self-interacting regions called topologically associating domains (TADs). We hypothesize that paralogs share common regulatory mechanisms to enable coordinated expression acco…
Full-automatic computer aided system for stem cell clustering using content-based microscopic image analysis
2017
Stem cells are undifferentiated cells that can differentiate into other cells, tissues and organs, and they play a very important role in biomedical treatments. Because of the importance of stem cells, in this paper we propose a fully automatic computer-aided clustering system to help scientists explore potential co-occurrence relations between cell differentiation and the cells' morphological information in phenotype. In the proposed system, a multi-stage Content-based Microscopic Image Analysis (CBMIA) framework is applied, including image segmentation, feature extraction, feature selection, feature fusion and clustering techniques. First, an Improved Supervised Normalized Cuts (IS…
Unexpected associated microalgal diversity in the lichen Ramalina farinacea is uncovered by pyrosequencing analyses
2017
The current literature reveals that the intrathalline coexistence of multiple microalgal taxa in lichens is more common than previously thought, and additional complexity is supported by the coexistence of bacteria and basidiomycete yeasts in lichen thalli. This replaces the old paradigm that lichen symbiosis occurs between a fungus and a single photobiont. The lichen Ramalina farinacea has proven to be a suitable model to study the multiplicity of microalgae in lichen thalli due to the constant coexistence of Trebouxia sp. TR9 and T. jamesii in long-distance populations. To date, studies involving phycobiont diversity within entire thalli are based on Sanger sequencing, but this method see…
Detection of temporal clusters of health care-associated infections or colonizations with Pseudomonas aeruginosa.
2016
We investigated temporal clusters of Pseudomonas aeruginosa cases between 2005 and 2014 in one French university hospital, overall and by ward, using the Kulldorff method. Clusters of positive water samples were also investigated at the whole hospital level. Our results suggest that water outlets are not closely involved in the occurrence of clusters of P. aeruginosa cases.
Low-cost scalable discretization, prediction and feature selection for complex systems
2019
The introduced data-driven tool allows simultaneous feature selection, model inference, and marked cost and quality gains.