Search results for "Data mining"

showing 10 items of 907 documents

The effect of automated taxa identification errors on biological indices

2017

In benthic macroinvertebrate biomonitoring systems, the target is to determine the status of ecosystems based on several biological indices. To increase cost-efficiency, computer-based taxa identification for image data has recently been developed. Taxa identification errors can, however, have strong effects on the indices and thus on the determination of the ecological status. In order to shift the biomonitoring process towards automated expert systems, we need a clear understanding of the bias caused by automation. In this paper, we examine eleven classification methods in the case of macroinvertebrate image data and show how their classification errors propagate into different biological…

Computer science; 02 engineering and technology; computer.software_genre; 01 natural sciences; Similarity; 010104 statistics & probability; Artificial Intelligence; Biomonitoring; 0202 electrical engineering electronic engineering information engineering; Ecosystem; 0101 mathematics; similarity; ta218; Invertebrate; ta112; General Engineering; error propagation [diversity]; Computer Science Applications; samanlaisuus; Taxon; diversity: error propagation; Benthic zone; biomonitoring; identification; 020201 artificial intelligence & image processing; Identification (biology); Data mining; Species richness; classification error; computer; Expert Systems with Applications

SCCF Parameter and Similarity Measure Optimization and Evaluation

2019

Neighborhood-based Collaborative Filtering (CF) is one of the most successful and widely used recommendation approaches; however, it suffers from major flaws, especially in sparse environments. Traditional similarity measures used by neighborhood-based CF to find similar users or items are not suitable for sparse datasets. Sparse Subspace Clustering and common liking rate in CF (SCCF), a recently published approach, proposed a tunable similarity measure oriented towards sparse datasets; however, its performance can still be improved and requires further analysis and investigation. In this paper, we propose and evaluate the performance of a new tuning mechanism, using the Mean Absolute Error (MA…

Computer science; 020206 networking & telecommunications; 02 engineering and technology; Recommender system; Similarity measure; computer.software_genre; Measure (mathematics); Similarity (network science); Subspace clustering; 0202 electrical engineering electronic engineering information engineering; Collaborative filtering; 020201 artificial intelligence & image processing; Data mining; computer; Selection (genetic algorithm); Overall efficiency

Reestimating a minimum acceptable geocoding hit rate for conducting a spatial analysis

2019

Geocoding consists of converting a textual description of a location into coordinates. Hence, a dataset of events has to be geocoded before a spatial analysis of the data can be performed. ...

Computer science; 05 social sciences; Geography Planning and Development; 0211 other engineering and technologies; 0507 social and economic geography; 02 engineering and technology; Library and Information Sciences; computer.software_genre; Geocoding; Hit rate; Data mining; 050703 geography; computer; 021101 geological & geomatics engineering; Information Systems; International Journal of Geographical Information Science

Large-scale random features for kernel regression

2015

Kernel methods constitute a family of powerful machine learning algorithms, which have found wide use in remote sensing and geosciences. However, kernel methods are still not widely adopted because of their high computational cost when dealing with large-scale problems, such as the inversion of radiative transfer models. This paper introduces the method of random kitchen sinks (RKS) for fast statistical retrieval of bio-geo-physical parameters. The RKS method makes it possible to approximate a kernel matrix with a set of random bases sampled from the Fourier domain. We extend their use to other bases, such as wavelets, stumps, and Walsh expansions. We show that kernel regression is now possible for data…

Computer science; 1900 General Earth and Planetary Sciences; computer.software_genre; Kernel (linear algebra); 10122 Institute of Geography; Kernel method; Wavelet; 1706 Computer Science Applications; Radiative transfer; Life Science; Kernel regression; Data mining; 910 Geography & travel; computer; 2015 IEEE International Geoscience and Remote Sensing Symposium (IGARSS)

Revisitation of Nonorthogonal Spin Adaptation in Coupled Cluster Theory.

2015

The benefits of what is alternatively called a nonorthogonally spin-adapted, spin-free, or orbital representation of the coupled cluster equations are discussed relative to orthogonally spin-adapted, spin-orbital, and spin-integrated theories. In particular, specific linear combinations of the orbital cluster amplitudes, denoted spin-summed amplitudes, are shown to reduce the number of contractions that must be explicitly performed and to simplify the expressions and their derivation. The computational efficiency of the spin-summed approach is discussed and compared to orthogonally spin-adapted and spin-integrated approaches. The spin-summed approach is shown to have significant computationa…

Computer science; Adaptation (eye); computer.software_genre; Computer Science Applications; Amplitude; Coupled cluster; Cluster (physics); Condensed Matter::Strongly Correlated Electrons; Data mining; Statistical physics; Physical and Theoretical Chemistry; Representation (mathematics); Linear combination; computer; Spin-½; Journal of chemical theory and computation

Hierarchical modeling for rare event detection and cell subset alignment across flow cytometry samples.

2013

Flow cytometry is the prototypical assay for multi-parameter single cell analysis, and is essential in vaccine and biomarker research for the enumeration of antigen-specific lymphocytes that are often found in extremely low frequencies (0.1% or less). Standard analysis of flow cytometry data relies on visual identification of cell subsets by experts, a process that is subjective and often difficult to reproduce. An alternative and more objective approach is the use of statistical models to identify cell subsets of interest in an automated fashion. Two specific challenges for automated analysis are to detect extremely low frequency event subsets without biasing the estimate by pre-processing…

Computer science; Adaptive Immunity; computer.software_genre; 0302 clinical medicine; Single-cell analysis; Enumeration; Biology (General); Immune Response; Event (probability theory); 0303 health sciences; Ecology; medicine.diagnostic_test; T Cells; Statistics; Flow Cytometry; 3. Good health; Computational Theory and Mathematics; Data model; Modeling and Simulation; Medicine; Data mining; Immunotherapy; Research Article; Tumor Immunology; QH301-705.5; Immune Cells; Immunology; Context (language use); Biostatistics; Models Biological; Flow cytometry; 03 medical and health sciences; Cellular and Molecular Neuroscience; Genetics; medicine; Humans; Sensitivity (control systems); Statistical Methods; Immunoassays; Molecular Biology; Biology; Ecology Evolution Behavior and Systematics; 030304 developmental biology; business.industry; Immunity; Reproducibility of Results; Pattern recognition; Statistical model; Immunologic Subspecialties; Lymphocyte Subsets; Immunologic Techniques; Clinical Immunology; Artificial intelligence; business; computer; Mathematics; 030215 immunology; PLoS computational biology

Compression-based classification of biological sequences and structures via the Universal Similarity Metric: experimental assessment.

2007

Similarity of sequences is a key mathematical notion for Classification and Phylogenetic studies in Biology. It is currently primarily handled using alignments. However, the alignment methods seem inadequate for post-genomic studies since they do not scale well with data set size and they seem to be confined only to genomic and proteomic sequences. Therefore, alignment-free similarity measures are actively pursued. Among those, USM (Universal Similarity Metric) has gained prominence. It is based on the deep theory of Kolmogorov Complexity, and universality is its most novel and striking feature. Since it can only be approximated via data compression, USM is a methodology rath…

Computer science; Algorismes; Prediction by partial matching; Compression dissimilarity; computer.software_genre; Biochemistry; Protein Structure Secondary; Phylogenetic studies; Structural Biology; Sequence Analysis Protein; Databases Protein; lcsh:QH301-705.5; Biological data; NCD; Applied Mathematics; Genomics; Classification; CD; Computer Science Applications; Benchmarking; :Informàtica::Informàtica teòrica [Àrees temàtiques de la UPC]; Universal compression dissimilarity; Area Under Curve; Metric (mathematics); lcsh:R858-859.7; Data mining; Algorithms; Data compression; Research Article; :Informàtica::Aplicacions de la informàtica::Bioinformàtica [Àrees temàtiques de la UPC]; Normalization (statistics); lcsh:Computer applications to medicine. Medical informatics; Bioinformatics Sequence Alignment Algorithms; Set (abstract data type); Similarity (network science); Normalized compression dissimilarity; Data compression (Computer science); Animals; Humans; Amino Acid Sequence; Molecular Biology; Biology; Dades -- Compressió (Informàtica); USM; Universal similarity metric; Proteins; UCD; Protein Structure Tertiary; Data set; Genòmica; Statistical classification; lcsh:Biology (General); ROC Curve; computer; Sequence Alignment; Software; BMC bioinformatics

Machine Learning Techniques for Intrusion Detection: A Comparative Analysis

2016

With the growth of the internet, the world has transformed into a global market, with all monetary and business activities being carried out online. Being the most important resource of the developing world, the internet is a vulnerable target and hence needs to be secured against malicious users. Since the Internet does not have a central surveillance component, attackers occasionally, using varied and evolving hacking techniques, discover a path to bypass the framework's security, and one such class of attacks is intrusion. An intrusion is the act of breaking into the framework by compromising the security arrangements that are in place. The techniq…

Computer science; Anomaly-based intrusion detection system; 02 engineering and technology; Intrusion detection system; IDS; Machine learning; computer.software_genre; [ INFO.INFO-CV ] Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV]; [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]; Machine Learning; Resource (project management); Component (UML); 0202 electrical engineering electronic engineering information engineering; ROC; Set (psychology); [ INFO.INFO-AI ] Computer Science [cs]/Artificial Intelligence [cs.AI]; False Positive; business.industry; ACM; [INFO.INFO-CV]Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV]; 020206 networking & telecommunications; Precision; Object (computer science); True Positive; Outlier; 020201 artificial intelligence & image processing; The Internet; Artificial intelligence; Data mining; business; computer

Combining conjunctive rule extraction with diffusion maps for network intrusion detection

2013

Network security and intrusion detection are important in the modern world where communication happens via information networks. Traditional signature-based intrusion detection methods cannot find previously unknown attacks. On the other hand, algorithms used for anomaly detection often have black box qualities that are difficult to understand for people who are not algorithm experts. Rule extraction methods create interpretable rule sets that act as classifiers. They have mostly been combined with already labeled data sets. This paper aims to combine unsupervised anomaly detection with rule extraction techniques to create an online anomaly detection framework. Unsupervised anomaly detectio…

Computer science; Anomaly-based intrusion detection system; Network security; intrusion detection; tunkeutumisen havaitseminen; Feature extraction; Diffusion map; diffusion map; Intrusion detection system; Machine learning; computer.software_genre; poikkeavuuden havaitseminen; Black box; tiedon louhinta; n-grammi; Cluster analysis; ta113; Training set; rule extraction; business.industry; n-gram; anomaly detection; diffuusiokartta; koneoppiminen; sääntöjen erottaminen; Anomaly detection; Artificial intelligence; Data mining; tiedonlouhinta; business; computer; 2013 IEEE Symposium on Computers and Communications (ISCC)

Vibrational spectroscopy provides a green tool for multi-component analysis

2010

Based on the literature published in the past decade, we focus on the possibilities offered by vibrational-spectroscopy-based techniques to make multi-component analysis of samples independently of their physical state. We discuss the main chemometric tools proposed for developing calibration models and solving problems derived from spectroscopic non-idealities (e.g., highly overlapped spectral bands or the presence of spectral non-linearity), and the benefits provided by vibrational-spectroscopy-based multi-component analysis in industry. Our main objective is to show that vibrational spectroscopy provides fast analytical methods that enable non-destructive analysis and permits, i…

Computer science; Calibration (statistics); Infrared spectroscopy; Mineralogy; Sample (statistics); Spectral bands; computer.software_genre; Analytical Chemistry; Chemometrics; Nonlinear system; Component analysis; Data mining; Focus (optics); computer; Spectroscopy; TrAC Trends in Analytical Chemistry