Search results for "ALGORITHMS"
Showing 10 of 1716 documents
UVPAR: fast detection of functional shifts in duplicate genes.
2006
Abstract Background The imprint of natural selection on gene sequences is often difficult to detect. A plethora of methods have been devised to detect genetic changes due to selective processes. However, many of those methods depend heavily on underlying assumptions regarding the mode of change of DNA sequences and often require sophisticated mathematical treatments that make them computationally slow. The development of fast and effective methods to detect modifications in the selective constraints of genes is therefore of great interest. Results We describe UVPAR, a program designed to quickly test for changes in the functional constraints of duplicate genes. Starting with alignments of t…
Criminal networks analysis in missing data scenarios through graph distances.
2021
Data collected in criminal investigations may suffer from: (i) incompleteness, due to the covert nature of criminal organisations; (ii) incorrectness, caused by either unintentional data-collection errors or intentional deception by criminals; (iii) inconsistency, when the same information is collected into law enforcement databases multiple times, or in different formats. In this paper we analyse nine real criminal networks of different nature (i.e., Mafia networks, criminal street gangs and terrorist organizations) in order to quantify the impact of incomplete data and to determine which network type is most affected by it. The networks are firstly pruned following two specific methods: …
Analyzing big datasets of genomic sequences: fast and scalable collection of k-mer statistics
2019
Abstract Background Distributed approaches based on the MapReduce programming paradigm have started to be proposed in the Bioinformatics domain, due to the large amount of data produced by next-generation sequencing techniques. However, the use of MapReduce and related Big Data technologies and frameworks (e.g., Apache Hadoop and Spark) does not necessarily produce satisfactory results, in terms of both efficiency and effectiveness. We discuss how the development of distributed and Big Data management technologies has affected the analysis of large datasets of biological sequences. Moreover, we show how the choice of different parameter configurations and the careful engineering of the …
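As a sketch of the primitive behind this work, here is a minimal single-machine k-mer counter; the paper's contribution lies in scaling this kind of statistic to massive datasets via MapReduce-style frameworks, which this toy version makes no attempt at. Function name and inputs are illustrative.

```python
# Minimal k-mer counting sketch (illustrative; not the paper's pipeline).
from collections import Counter

def kmer_counts(sequence, k):
    """Count occurrences of every length-k substring of `sequence`."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))

counts = kmer_counts("ACGTACGT", 3)
# "ACG" and "CGT" each occur twice; "GTA" and "TAC" once
```

A distributed version would shard sequences across workers, emit (k-mer, 1) pairs in a map phase, and sum per k-mer in a reduce phase.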
Controlling false match rates in record linkage using extreme value theory
2011
Abstract Cleansing data of synonyms and homonyms is a relevant task in fields where high data quality is crucial, for example in disease registries and medical research networks. Record linkage provides methods for minimizing synonym and homonym errors, thereby improving data quality. We focus our attention on the case of homonym errors (in the following denoted as 'false matches'), in which records belonging to different entities are wrongly classified as equal. Synonym errors ('false non-matches') occur when a single entity maps to multiple records in the linkage result. They are not considered in this study because in our application domain they are not as crucial as false matches. Fa…
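The basic decision the paper reasons about can be sketched as a threshold rule on a similarity score: pairs above the threshold are declared links, and those that nonetheless belong to different entities are the false matches whose rate the paper controls via extreme value theory. The similarity measure and threshold below are illustrative stand-ins, not the paper's model.

```python
# Toy record-linkage decision rule (similarity measure and threshold are
# hypothetical; the paper's method is probabilistic, not shown here).
from difflib import SequenceMatcher

def is_match(rec_a, rec_b, threshold=0.9):
    """Declare two record strings a link when their similarity ratio
    meets a fixed threshold."""
    return SequenceMatcher(None, rec_a, rec_b).ratio() >= threshold

is_match("Jon Smith 1980", "John Smith 1980")  # near-duplicate: a link
is_match("Jon Smith 1980", "Jane Doe 1975")    # dissimilar: not a link
```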
New results for finding common neighborhoods in massive graphs in the data stream model
2008
Abstract We consider the problem of finding pairs of vertices that share large common neighborhoods in massive graphs. We give lower bounds for randomized, two-sided error algorithms that solve this problem in the data-stream model of computation. Our results correct and improve those of Buchsbaum, Giancarlo, and Westbrook [On finding common neighborhoods in massive graphs, Theoretical Computer Science, 299 (1–3) 707–718 (2004)].
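For reference, the problem itself is easy to state offline: for each vertex pair, intersect the neighbor sets. The naive computation below is only a definition sketch; the paper's lower bounds concern the far more constrained data-stream model, where the graph cannot be held in memory.

```python
# Naive offline computation of large common neighborhoods (definition
# sketch only; the streaming setting the paper studies is much harder).
from itertools import combinations

def large_common_neighborhoods(adj, threshold):
    """Return pairs (u, v) whose common neighborhood has at least
    `threshold` vertices, with the size of that neighborhood.
    `adj` maps each vertex to its set of neighbors."""
    return {
        (u, v): len(adj[u] & adj[v])
        for u, v in combinations(sorted(adj), 2)
        if len(adj[u] & adj[v]) >= threshold
    }

adj = {"a": {"c", "d"}, "b": {"c", "d"}, "c": {"a", "b"}, "d": {"a", "b"}}
# ("a", "b") share {"c", "d"}; ("c", "d") share {"a", "b"}
```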
Computer Science and Information Technologies: Automation of Information Processing (Datorzinātne un informācijas tehnoloģijas: Informācijas apstrādes automatizācija)
2004
The first volume in the new series "Automation of Information Processing" contains recent results of young researchers, most of them doctoral students at the University of Latvia. Though the topics of the papers are quite different, they are all centered around the problem of providing theory, methodology, development tools and a supporting environment for the development of information systems. All the papers in the volume are related to the most up-to-date issues in the respective area.
Local dimensionality reduction and supervised learning within natural clusters for biomedical data analysis
2006
Inductive learning systems have been successfully applied in a number of medical domains. Nevertheless, the effective use of these systems often requires data preprocessing before applying a learning algorithm. This is especially important for multidimensional heterogeneous data presented by a large number of features of different types. Dimensionality reduction (DR) is one commonly applied approach. The goal of this paper is to study the impact of natural clustering (clustering according to expert domain knowledge) on DR for supervised learning (SL) in the area of antibiotic resistance. We compare several data-mining strategies that apply DR by means of feature extraction or feature selection w…
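The "local" DR strategy described above can be sketched as applying dimensionality reduction separately inside each expert-defined cluster rather than once globally. The snippet below uses plain-NumPy PCA via SVD as the reduction step; the cluster labels, component count, and reduction method are illustrative assumptions, not the paper's exact setup.

```python
# Local dimensionality reduction sketch: PCA per natural cluster.
# Cluster labels are assumed to come from domain experts (hypothetical here).
import numpy as np

def local_pca(X, labels, n_components=2):
    """Project the rows of each cluster of X onto that cluster's own
    top principal components. Returns a dict: cluster label -> reduced data."""
    reduced = {}
    for c in np.unique(labels):
        Xc = X[labels == c]
        Xc = Xc - Xc.mean(axis=0)          # center within the cluster
        _, _, vt = np.linalg.svd(Xc, full_matrices=False)
        reduced[c] = Xc @ vt[:n_components].T
    return reduced
```

A supervised learner would then be trained per cluster on the locally reduced features.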
FABC: Retinal Vessel Segmentation Using AdaBoost
2010
This paper presents a method for automated vessel segmentation in retinal images. For each pixel in the field of view of the image, a 41-D feature vector is constructed, encoding information on the local intensity structure, spatial properties, and geometry at multiple scales. An AdaBoost classifier is trained on 789,914 gold-standard examples of vessel and nonvessel pixels, and then used to classify previously unseen images. The algorithm was tested on the public digital retinal images for vessel extraction (DRIVE) set, frequently used in the literature and consisting of 40 manually labeled images with a gold standard. Results were compared experimentally with those of eight algorithms as we…
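The per-pixel feature construction can be sketched as follows; the two toy features here (local mean and variance in a small window) stand in for the paper's 41-D multiscale descriptors, which this sketch does not reproduce.

```python
# Toy per-pixel feature extraction (stand-in for the paper's 41-D vectors).
import numpy as np

def pixel_features(image, r=1):
    """For each pixel, compute local mean and variance over a
    (2r+1) x (2r+1) window, clipped at the image border."""
    h, w = image.shape
    feats = np.zeros((h, w, 2))
    for i in range(h):
        for j in range(w):
            win = image[max(0, i - r):i + r + 1, max(0, j - r):j + r + 1]
            feats[i, j] = win.mean(), win.var()
    return feats
```

The resulting per-pixel vectors would then be fed to a boosted classifier trained on labeled vessel/nonvessel pixels.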
A completely automated CAD system for mass detection in a large mammographic database.
2006
Mass localization plays a crucial role in computer-aided detection (CAD) systems for the classification of suspicious regions in mammograms. In this article we present a completely automated classification system for the detection of masses in digitized mammographic images. The system we discuss consists of three processing levels: (a) Image segmentation for the localization of regions of interest (ROIs). This step relies on an iterative dynamic threshold algorithm able to select iso-intensity closed contours around gray-level maxima of the mammogram. (b) ROI characterization by means of textural features computed from the gray tone spatial dependence matrix (GTSDM), containing secon…
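As a rough illustration of step (a), a classic iterative (ISODATA-style) threshold update is sketched below. This is a generic stand-in, not the paper's iso-intensity contour-selection algorithm: it simply alternates between splitting pixels at the current threshold and recentering the threshold between the two class means.

```python
# Generic iterative threshold selection (illustrative stand-in for the
# paper's iterative dynamic threshold algorithm).
import numpy as np

def iterative_threshold(image, eps=0.5):
    """Iterate t <- midpoint of the means of the two pixel classes
    split at t, until the threshold stabilizes."""
    t = image.mean()
    while True:
        lo, hi = image[image <= t], image[image > t]
        t_new = (lo.mean() + hi.mean()) / 2
        if abs(t_new - t) < eps:
            return t_new
        t = t_new
```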
Fuzzy technique for microcalcifications clustering in digital mammograms
2012
Abstract Background Mammography has established itself as the most efficient technique for the identification of pathological breast lesions. Among the various types of lesions, microcalcifications are the most difficult to identify since they are quite small (0.1–1.0 mm) and often poorly contrasted against the image background. Within this context, Computer Aided Detection (CAD) systems could turn out to be very useful in breast cancer control. Methods In this paper we present a potentially powerful microcalcification cluster enhancement method applicable to digital mammograms. The segmentation phase employs a form filter, obtained from a LoG filter, to overcome the dependence on …
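The LoG (Laplacian-of-Gaussian) filter the segmentation phase builds on can be written out directly; the kernel below is the standard discrete LoG, with size and sigma chosen purely for illustration (the paper derives a modified "form filter" from it, which is not reproduced here).

```python
# Standard discrete Laplacian-of-Gaussian kernel (parameters illustrative).
import numpy as np

def log_kernel(size, sigma):
    """Sample the 2-D LoG function on a size x size grid centered at 0:
    LoG(x, y) ∝ (x^2 + y^2 - 2*sigma^2) * exp(-(x^2 + y^2) / (2*sigma^2))."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx**2 + yy**2
    g = np.exp(-r2 / (2 * sigma**2))
    return (r2 - 2 * sigma**2) / sigma**4 * g
```

Convolving a mammogram with such a kernel responds strongly to small blob-like structures near the kernel's scale, which is why LoG-derived filters suit tiny, low-contrast microcalcifications.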