Search results for "algorithm."

showing 10 items of 4617 documents

The colored longest common prefix array computed via sequential scans

2018

Due to the increased availability of large datasets of biological sequences, tools for sequence comparison now rely increasingly on efficient alignment-free approaches. Most alignment-free approaches require computing statistics of the sequences in the dataset. Such computations become impractical in internal memory when very large collections of long sequences are considered. In this paper, we present a new conceptual data structure, the colored longest common prefix array (cLCP), that makes it possible to efficiently tackle several problems with an alignment-free approach. In fact, we show that such a data structure can be computed via sequential scans in semi-exter…

0301 basic medicine; FOS: Computer and information sciences; Alignment-free methods; Burrows–Wheeler transform; Computer science; Computation; Average common substring; 0206 medical engineering; Matching statistics; Scale (descriptive set theory); 02 engineering and technology; Theoretical Computer Science; 03 medical and health sciences; Computer Science - Data Structures and Algorithms; Data Structures and Algorithms (cs.DS); String (computer science); Computer Science (all); LCP array; Data structure; Substring; 030104 developmental biology; Pairwise comparison; Longest common prefix; Algorithm; 020602 bioinformatics

Q-nexus: a comprehensive and efficient analysis pipeline designed for ChIP-nexus

2016

Background: ChIP-nexus, an extension of the ChIP-exo protocol, can be used to map the borders of protein-bound DNA sequences at nucleotide resolution; it requires less input DNA and enables selective PCR duplicate removal using random barcodes. However, the use of random barcodes requires additional preprocessing of the mapping data, which complicates the computational analysis. To date, only a very limited number of software packages are available for the analysis of ChIP-exo data, and these have not yet been systematically tested and compared on ChIP-nexus data. Results: Here, we present a comprehensive software package for ChIP-nexus data that exploits the random barcodes for selective removal …

0301 basic medicine; FOS: Computer and information sciences; Duplication rates; Chromatin Immunoprecipitation; Bioinformatics; Pipeline (computing); 610; Biology; computer.software_genre; 600 Technology, medicine, applied sciences::610 Medicine and health; 03 medical and health sciences; Software; ChIP-nexus; Genetics; Preprocessor; Nucleotide Motifs; Library complexity; ChIP-exo; Protocol (science); Binding Sites; business.industry; fungi; Computational Biology; High-Throughput Nucleotide Sequencing; Reproducibility of Results; Chip; Data mapping; DNA-Binding Proteins; Algorithm; 030104 developmental biology; Data mining; business; Peak calling; computer; Algorithms; Protein Binding; Transcription Factors; Research Article; Biotechnology; BMC Genomics

Alignment-free sequence comparison using absent words

2018

Sequence comparison is a prerequisite to virtually all comparative genomic analyses. It is often realised by sequence alignment techniques, which are computationally expensive. This has led to increased research into alignment-free techniques, which are based on measures referring to the composition of sequences in terms of their constituent patterns. These measures, such as q-gram distance, are usually computed in time linear with respect to the length of the sequences. In this paper, we focus on the complementary idea: how two sequences can be efficiently compared based on information that does not occur in the sequences. A word is an absent word of some sequence if it does not oc…
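
As a rough illustration of the q-gram distance mentioned in this abstract, the sketch below assumes the commonly used definition: the sum, over all q-grams, of the absolute difference between their occurrence counts in the two sequences. The abstract itself does not spell out the definition, the function names are hypothetical, and this counting approach is not the authors' algorithm.

```python
from collections import Counter


def qgram_profile(seq: str, q: int) -> Counter:
    """Count every length-q substring (q-gram) of seq, overlaps included."""
    return Counter(seq[i:i + q] for i in range(len(seq) - q + 1))


def qgram_distance(x: str, y: str, q: int) -> int:
    """Assumed definition: sum of absolute count differences over all q-grams."""
    px, py = qgram_profile(x, q), qgram_profile(y, q)
    # A Counter returns 0 for q-grams it has never seen.
    return sum(abs(px[g] - py[g]) for g in px.keys() | py.keys())


if __name__ == "__main__":
    print(qgram_distance("ACGT", "ACGA", 2))
```

With q = 2, the two toy sequences share the q-grams AC and CG and differ in GT versus GA, so each of those two contributes 1 to the distance.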

0301 basic medicine; FOS: Computer and information sciences; Formal Languages and Automata Theory (cs.FL); Computer Science - Formal Languages and Automata Theory; Sequence alignment; 0102 computer and information sciences; Absent words; 01 natural sciences; Upper and lower bounds; Sequence comparison; Theoretical Computer Science; Combinatorics; 03 medical and health sciences; Computer Science - Data Structures and Algorithms; Data Structures and Algorithms (cs.DS); Circular words; Mathematics; Sequence; Settore INF/01 - Informatica; Process (computing); Computer Science Applications; 1707 Computer Vision and Pattern Recognition; q-grams; Composition (combinatorics); 030104 developmental biology; Computational Theory and Mathematics; Forbidden words; 010201 computation theory & mathematics; Focus (optics); Word (computer architecture); Information Systems; Integer (computer science)

Use of deep learning methods to translate drug-induced gene expression changes from rat to human primary hepatocytes

2020

In clinical trials, animal and cell line models are often used to evaluate the potential toxic effects of a novel compound or candidate drug before progressing to human trials. However, relating the results of animal and in vitro model exposures to relevant clinical outcomes in the human in vivo system still proves challenging, relying on often putative orthologs. In recent years, multiple studies have demonstrated that the repeated dose rodent bioassay, the current gold standard in the field, lacks sufficient sensitivity and specificity in predicting toxic effects of pharmaceuticals in humans. In this study, we evaluate the potential of deep learning techniques to translate the pattern of …

0301 basic medicine; Gene Expression; Gene Expression Regulation/drug effects; Pathology and Laboratory Medicine; Convolutional neural network; TOXICITY; Machine Learning; Time Measurement; 0302 clinical medicine; Medicine and Health Sciences; Measurement; Clinical Trials as Topic; Multidisciplinary; Artificial neural network; Pharmaceutics; QR; Metabolism and Genomics; TOXICOGENOMICS; 030220 oncology & carcinogenesis; Medicine; Engineering and Technology; Nutrition Metabolism and Genomics; Hepatocytes/drug effects; Algorithms; Research Article; Computer and Information Sciences; Clinical Trials as Topic/statistics & numerical data; Neural Networks; Genetic Toxicology; TOXICOLOGY; Science; Predictive Toxicology; Computational biology; Biology; Computer; 03 medical and health sciences; Dose Prediction Methods; Deep Learning; Artificial Intelligence; In vivo; Genetics; Life Science; Animals; Humans; Gene; Nutrition; business.industry; Biology and Life Sciences; Gold standard (test); REPRESENTATIONS; Rats; 030104 developmental biology; Gene Expression Regulation; Hepatocytes; Neural Networks Computer; business; Neuroscience

Measuring the clustering effect of BWT via RLE

2017

The Burrows–Wheeler Transform (BWT) is a reversible transformation that underlies several text compressors and many other tools used in Bioinformatics and Computational Biology. The BWT is not itself a compressor, but a transformation that performs a context-dependent permutation of the letters of the input text, which often creates runs of equal letters (clusters) longer than the ones in the original text; this is usually referred to as the “clustering effect” of BWT. In particular, from a combinatorial point of view, great attention has been given to the case in which the BWT produces the fewest clusters (cf. [5], [16], [21], [23]). In this paper we are concerned about t…
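
To make the "clustering effect" concrete, here is a minimal sketch that computes a BWT naively by sorting rotations and counts the runs of equal letters before and after the transform. It assumes a `$` end-of-text sentinel, which is a common textbook convention and may differ from the variant studied in the paper; the function names are hypothetical, and practical tools use suffix-array-based construction rather than sorting all rotations.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive BWT: sort all rotations of text + sentinel, take the last column."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel  # assumption: '$' sorts before every letter of the text
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def count_runs(s: str) -> int:
    """Number of maximal runs of equal letters, i.e. the length of the RLE."""
    if not s:
        return 0
    return 1 + sum(1 for a, b in zip(s, s[1:]) if a != b)


if __name__ == "__main__":
    text = "banana"
    transformed = bwt(text)
    print(transformed, count_runs(text + "$"), count_runs(transformed))
```

For `banana`, the text with its sentinel has 7 runs, while the transformed string `annb$aa` has 5, which is a small instance of letters being grouped into longer runs.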

0301 basic medicine; General Computer Science; Permutation; Computer Science (all); Binary number; 0102 computer and information sciences; Quantitative Biology::Genomics; 01 natural sciences; Upper and lower bounds; Theoretical Computer Science; Combinatorics; 03 medical and health sciences; 030104 developmental biology; Transformation (function); BWT; 010201 computation theory & mathematics; Run-length encoding; Computer Science::Data Structures and Algorithms; Cluster analysis; Primitive root modulo n; Word (computer architecture); Mathematics

Preselection statistics and Random Forest classification identify population informative single nucleotide polymorphisms in cosmopolitan and autochth…

2018

Commercial single nucleotide polymorphism (SNP) arrays have recently been developed for several species and can be used to identify informative markers to differentiate breeds or populations for several downstream applications. To identify the most discriminating genetic markers among thousands of genotyped SNPs, a few statistical approaches have been proposed. In this work, we compared several SNP preselection methods (Delta, Fst and principal component analysis (PCA)) in addition to Random Forest classification to analyse SNP data from six dairy cattle breeds, including cosmopolitan (Holstein, Brown and Simmental) and autochthonous Italian breeds raised in two different regions and …

0301 basic medicine; Genetic Markers; Linkage disequilibrium; Genotype; Population; Animal Identification Systems; SNP; Single-nucleotide polymorphism; Biology; Breeding; Polymorphism Single Nucleotide; SF1-1100; 03 medical and health sciences; Settore AGR/17 - Zootecnica Generale E Miglioramento Genetico; Animals; Bos taurus; Selection Genetic; education; Selection (genetic algorithm); Genetics; education.field_of_study; Principal Component Analysis; Random Forest; breed assignment; Animal Science and Zoology; 0402 animal and dairy science; 04 agricultural and veterinary sciences; Phenotypic trait; 040201 dairy & animal science; SNP genotyping; Animal culture; 030104 developmental biology; Phenotype; Italy; Cattle

Comparison of CRISPR and Marker-Based Methods for the Engineering of Phage T7

2020

This article belongs to the Section Bacterial Viruses.

0301 basic medicine; Genetic Markers; Viruses; 030106 microbiology; Mutant; lcsh:QR1-502; T7; Computational biology; Genome Viral; Biology; Genome; Article; lcsh:Microbiology; Bacteriophage; 03 medical and health sciences; Virology; Bacteriophage T7; CRISPR; Clustered Regularly Interspaced Short Palindromic Repeats; Gene; Selection (genetic algorithm); Gene Editing; QH; Viral Tail Proteins; biology.organism_classification; 3. Good health; QR; Tail fibres; 030104 developmental biology; Infectious Diseases; Lytic cycle; Mutation; CRISPR-Cas Systems; Homologous recombination; Genetics

Previously Undescribed Family Mutation in the JAG1 Gene as a Cause for Alagille Syndrome

2017

0301 basic medicine; Genetics; JAG1; Polymorphism Genetic; business.industry; Gastroenterology; Infant; 030105 genetics & heredity; medicine.disease; Alagille Syndrome; 03 medical and health sciences; 030104 developmental biology; Polymorphism (computer science); Mutation; Pediatrics Perinatology and Child Health; Mutation (genetic algorithm); medicine; Humans; Female; business; Gene; Jagged-1 Protein; Journal of Pediatric Gastroenterology & Nutrition

A Novel Role for CSRP1 in a Lebanese Family with Congenital Cardiac Defects

2017

Despite an obvious role for consanguinity in congenital heart disease (CHD), most studies fail to document a monogenic model of inheritance except for a few cases. We hereby describe a consanguineous Lebanese family of first-degree cousins with 7 conceived children: 2 died in utero of unknown causes, 3 have CHD, and 4 have polydactyly. The aim of the study is to unveil the genetic variant(s) causing these phenotypes using next-generation sequencing (NGS) technology. Targeted exome sequencing identified a heterozygous duplication in CSRP1 which leads to a potential frameshift mutation at position 154 of the protein. This mutation is inherited from the father, and segregates only with the CHD phen…

0301 basic medicine; Genetics; Polydactyly; lcsh:QH426-470; Consanguinity; Biology; medicine.disease; congenital heart disease; Frameshift mutation; 03 medical and health sciences; lcsh:Genetics; 030104 developmental biology; TRPS1; Gene duplication; Mutation (genetic algorithm); medicine; Molecular Medicine; Missense mutation; Exome; Genetics (clinical); Exome sequencing; Original Research; CSRP1; Frontiers in Genetics

Lost Strings in Genomes: What Sense Do They Make?

2017

We studied the sets of avoided strings observed over a family of genomes. The length of the minimal avoided string was found to rarely exceed 9 nucleotides, regardless of the phylogeny of the genome under consideration. The lists of avoided strings observed over the sets of (related) genomes have been analyzed. Very low correlation between the phylogeny and the set of those strings has been found.
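
The idea of a shortest avoided string can be sketched as follows. This assumes that an "avoided string" is simply a word over {A, C, G, T} that never occurs as a substring of the sequence, which may be narrower than the definition used in the paper (for example, it ignores the reverse-complement strand). The function name and the `max_k` cutoff are hypothetical.

```python
from itertools import product


def shortest_absent_words(seq: str, alphabet: str = "ACGT", max_k: int = 12):
    """Return (k, words): the smallest k with at least one absent k-mer,
    and all absent k-mers of that length; None if none is found up to max_k."""
    for k in range(1, max_k + 1):
        present = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        absent = [
            word
            for word in ("".join(p) for p in product(alphabet, repeat=k))
            if word not in present
        ]
        if absent:
            return k, absent
    return None


if __name__ == "__main__":
    k, words = shortest_absent_words("ACGTACGT")
    print(k, len(words), words[:3])
```

Enumerating all 4^k candidate words is only practical for small k, which fits the short lengths reported in the abstract; a genome-scale analysis would need a more economical index.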

0301 basic medicine; Genetics; animal structures; genetic structures; information science; String (physics); Genome; Combinatorics; Set (abstract data type); 03 medical and health sciences; 030104 developmental biology; 0302 clinical medicine; Phylogenetics; cardiovascular system; Low correlation; 030217 neurology & neurosurgery; Selection (genetic algorithm); Mathematics