Search results for "sequence"

showing 10 items of 3643 documents

Compression-based classification of biological sequences and structures via the Universal Similarity Metric: experimental assessment.

2007

Background: Similarity of sequences is a key mathematical notion for Classification and Phylogenetic studies in Biology. It is currently primarily handled using alignments. However, the alignment methods seem inadequate for post-genomic studies since they do not scale well with data set size and they seem to be confined only to genomic and proteomic sequences. Therefore, alignment-free similarity measures are actively pursued. Among those, USM (Universal Similarity Metric) has gained prominence. It is based on the deep theory of Kolmogorov Complexity and universality is its most novel striking feature. Since it can only be approximated via data compression, USM is a methodology rath…

Computer science; Algorismes; Prediction by partial matching; Compression dissimilarity; computer.software_genre; Biochemistry; Protein Structure Secondary; Phylogenetic studies; Structural Biology; Sequence Analysis Protein; Databases Protein; lcsh:QH301-705.5; Biological data; NCD; Applied Mathematics; Genomics; Classification; CD; Computer Science Applications; Benchmarking; :Informàtica::Informàtica teòrica [Àrees temàtiques de la UPC]; Universal compression dissimilarity; Area Under Curve; Metric (mathematics); lcsh:R858-859.7; Data mining; Algorithms; Data compression; Research Article; :Informàtica::Aplicacions de la informàtica::Bioinformàtica [Àrees temàtiques de la UPC]; Normalization (statistics); lcsh:Computer applications to medicine. Medical informatics; Bioinformatics Sequence Alignment Algorithms; Set (abstract data type); Similarity (network science); Normalized compression dissimilarity; Data compression (Computer science); Animals; Humans; Amino Acid Sequence; Molecular Biology; Biology; Dades -- Compressió (Informàtica); USM; Universal similarity metric; Proteins; UCD; Protein Structure Tertiary; Data set; Genòmica; Statistical classification; lcsh:Biology (General); ROC Curve; computer; Sequence Alignment; Software; BMC bioinformatics
researchProduct

Efficient and Accurate OTU Clustering with GPU-Based Sequence Alignment and Dynamic Dendrogram Cutting.

2015

De novo clustering is a popular technique to perform taxonomic profiling of a microbial community by grouping 16S rRNA amplicon reads into operational taxonomic units (OTUs). In this work, we introduce a new dendrogram-based OTU clustering pipeline called CRiSPy. The key idea used in CRiSPy to improve clustering accuracy is the application of an anomaly detection technique to obtain a dynamic distance cutoff instead of using the de facto value of 97 percent sequence similarity as in most existing OTU clustering pipelines. This technique works by detecting an abrupt change in the merging heights of a dendrogram. To produce the output dendrograms, CRiSPy employs the OTU hierarchical clusterin…

Computer science; Correlation clustering; Single-linkage clustering; Molecular Sequence Data; Machine learning; computer.software_genre; Pattern Recognition Automated; CURE data clustering algorithm; RNA Ribosomal 16S; Genetics; Computer Graphics; Cluster analysis; Base Sequence; business.industry; Applied Mathematics; Dendrogram; High-Throughput Nucleotide Sequencing; Pattern recognition; Signal Processing Computer-Assisted; Equipment Design; Hierarchical clustering; Equipment Failure Analysis; RNA Bacterial; Canopy clustering algorithm; Artificial intelligence; Hierarchical clustering of networks; business; computer; Sequence Alignment; Algorithms; Biotechnology; IEEE/ACM transactions on computational biology and bioinformatics
researchProduct

tbg - a new file format for genomic data

2021

Motivation: The question of determining whether a Single-Nucleotide Polymorphism (SNP) or a variant in general leads to a change in the amino acid sequence of a protein coding gene is often a laborious and time-consuming challenge. Here, we introduce the tbg file format for storing genomic data and tbg-tools, a user-friendly toolbox for the faster analysis of SNPs. The file format stores information for each nucleotide in each gene, allowing to predict which change in the amino acid sequence will be caused by a variant in the nucleotide sequence. Our new tool therefore has the potential to make biological sense of the unprecedented amount of genome-wide genetic variation that research…

Computer science; Genetic variation; Nucleic acid sequence; Single-nucleotide polymorphism; Computational biology; Line (text file); Python (programming language); File format; Peptide sequence; computer; Toolbox; computer.programming_language
researchProduct

pcaExplorer: an R/Bioconductor package for interacting with RNA-seq principal components

2019

Background: Principal component analysis (PCA) is frequently used in genomics applications for quality assessment and exploratory analysis in high-dimensional data, such as RNA sequencing (RNA-seq) gene expression assays. Despite the availability of many software packages developed for this purpose, an interactive and comprehensive interface for performing these operations is lacking. Results: We developed the pcaExplorer software package to enhance commonly performed analysis steps with an interactive and user-friendly application, which provides state saving as well as the automated creation of reproducible reports. pcaExplorer is implemented in R using the Shiny fra…

Computer science; Interface (computing); Shiny; Bioconductor; Principal component analysis; 610 Medizin; RNA-Seq; Genomics; lcsh:Computer applications to medicine. Medical informatics; Reproducible research; Bioconductor; Transcriptome; Exploratory data analysis; User-friendly; 610 Medical sciences; Gene expression; Humans; RNA-Seq; Gene; lcsh:QH301-705.5; Data Curation; Base Sequence; business.industry; Sequence Analysis RNA; R; RNA; Reproducibility of Results; lcsh:Biology (General); Principal component analysis; RNA; lcsh:R858-859.7; Software engineering; business; Software
researchProduct

A methodology to assess the intrinsic discriminative ability of a distance function and its interplay with clustering algorithms for microarray data …

2013

Background: Clustering is one of the most well known activities in scientific investigation and the object of research in many disciplines, ranging from statistics to computer science. Following Handl et al., it can be summarized as a three step process: (1) choice of a distance function; (2) choice of a clustering algorithm; (3) choice of a validation method. Although such a purist approach to clustering is hardly seen in many areas of science, genomic data require that level of attention, if inferences made from cluster analysis have to be of some relevance to biomedical research. Results: A procedure is proposed for the assessment of the discriminative ability of a distance functi…

Computer science; computer.software_genre; Biochemistry; symbols.namesake; Discriminative model; Structural Biology; Cluster Analysis; Relevance (information retrieval); Cluster analysis; Molecular Biology; Oligonucleotide Array Sequence Analysis; Clustering discriminative ability of a distance function external validation indices; Settore INF/01 - Informatica; Research; Applied Mathematics; Mutual information; Pearson product-moment correlation coefficient; Computer Science Applications; Hierarchical clustering; Euclidean distance; Range (mathematics); Metric (mathematics); symbols; Data mining; Transcriptome; computer; Algorithms; BMC Bioinformatics
researchProduct

Indexing a sequence for mapping reads with a single mismatch

2014

Mapping reads against a genome sequence is an interesting and useful problem in computational molecular biology and bioinformatics. In this paper, we focus on the problem of indexing a sequence for mapping reads with a single mismatch. We first focus on a simpler problem where the length of the pattern is given beforehand during the data structure construction. This version of the problem is interesting in its own right in the context of the next generation sequencing. In the sequel, we show how to solve the more general problem. In both cases, our algorithm can construct an efficient data structure in time and space and can answer subsequent queries in time. Here, n is the length of the s…

Computer science; genome sequence; General Mathematics; [INFO.INFO-DS]Computer Science [cs]/Data Structures and Algorithms [cs.DS]; General Physics and Astronomy; Context (language use); algorithms; computer.software_genre; Pattern matching; Sequence; Search engine indexing; General Engineering; Wildcard character; Articles; computer.file_format; Construct (python library); Data structure; mapping reads; pattern matching; ComputingMethodologies_DOCUMENTANDTEXTPROCESSING; Data mining; [INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM]; Focus (optics); mismatch; computer; Algorithm; indexing; Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences
researchProduct

Dynamic DNA Origami Devices: from Strand-Displacement Reactions to External-Stimuli Responsive Systems

2018

DNA nanotechnology provides an excellent foundation for diverse nanoscale structures that can be used in various bioapplications and materials research. Among all existing DNA assembly techniques, DNA origami proves to be the most robust one for creating custom nanoshapes. Since its invention in 2006, building from the bottom up using DNA advanced drastically, and therefore, more and more complex DNA-based systems became accessible. So far, the vast majority of the demonstrated DNA origami frameworks are static by nature; however, there also exist dynamic DNA origami devices that are increasingly coming into view. In this review, we discuss DNA origami nanostructures that exhibit controlled…

Computer science; mechanical movement; nanotekniikka; 02 engineering and technology; Review; 01 natural sciences; robotiikka; lcsh:Chemistry; chemistry.chemical_compound; DNA origami; Nanotechnology; DNA nanotechnology; lcsh:QH301-705.5; Spectroscopy; robotics; Physics; General Medicine; self-assembly; 021001 nanoscience & nanotechnology; Mechanical engineering; Computer Science Applications; Chemistry; Nanorobotics; 0210 nano-technology; Biotechnology; education; Nanotechnology; 010402 general chemistry; Medical sciences; Catalysis; DNA sequencing; Inorganic Chemistry; Displacement reactions; molecular devices; DNA nanotechnology; Animals; Humans; Physical and Theoretical Chemistry; Molecular Biology; Base Sequence; Organic Chemistry; Responsive systems; DNA; 0104 chemical sciences; Nanostructures; lcsh:Biology (General); lcsh:QD1-999; chemistry; Targeted drug delivery; Nucleic Acid Conformation; DNA origami; DNA; International Journal of Molecular Sciences
researchProduct

Molecular architecture of a toxin pore: a 15-residue sequence lines the transmembrane channel of staphylococcal alpha-toxin.

1996

Staphylococcus aureus alpha-toxin is a hydrophilic polypeptide of 293 amino acids that produces heptameric transmembrane pores. During assembly, the formation of a pre-pore precedes membrane permeabilization; the latter is linked to a conformational change in the oligomer. Here, 41 single-cysteine replacement toxin mutants were thiol-specifically labelled with the polarity-sensitive fluorescent probe acrylodan. After oligomerization on membranes, only the mutants with acrylodan attached to residues in the sequence 118-140 exhibited a marked blue shift in the fluorescence emission maximum, indicative of movement of the fluorophore to a hydrophobic environment. Within this region, two functio…

Conformational change; Staphylococcus aureus; Protein Conformation; Membrane lipids; Bacterial Toxins; Molecular Sequence Data; Biology; General Biochemistry Genetics and Molecular Biology; Cell membrane; Hemolysin Proteins; Protein structure; 2-Naphthylamine; medicine; Point Mutation; Amino Acid Sequence; Cysteine; Molecular Biology; Peptide sequence; Fluorescent Dyes; chemistry.chemical_classification; Binding Sites; General Immunology and Microbiology; Molecular Structure; General Neuroscience; Cell Membrane; Transmembrane protein; Amino acid; medicine.anatomical_structure; Membrane; Spectrometry Fluorescence; Biochemistry; chemistry; Liposomes; Biophysics; Mutagenesis Site-Directed; Research Article; The EMBO journal
researchProduct

Combined approach to atrial and ventricular function for assessment of diastole through MRI: Hypertrophic Cardiomyopathy (HCM) vs Healthy Controls (H…

2013

Congenital; Congenital Imaging sequences MR Cardiac; MR; Imaging sequences; Settore MED/36 - Diagnostica Per Immagini E Radioterapia; Cardiac
researchProduct

Finding essential features for tracking starfish in a video sequence

2004

The paper introduces a software system for detecting and tracking starfish in an underwater video sequence. The target of such a system is to help biologists in giving an estimate of the number of starfish present in a particular area of the sea-bottom. The nature of the input images is characterised by a low signal/noise ratio and by the presence of noisy background represented by pebbles; this makes the detection a non-trivial task. The procedure we use is a chain of several steps that starts from the extraction of the area of interest and ends with a classifier and a tracker providing the necessary information for counting the starfish present in the scene. © 2003 IEEE.

Contextual image classification; biology; Settore INF/01 - Informatica; Estimation theory; Computer science; business.industry; Starfish; Feature extraction; biology.organism_classification; Object detection; Computer vision; Artificial intelligence; Software system; Underwater; business; Classifier (UML); underwater video sequence starfish features extraction.
researchProduct