Search results for "DATA MINING"

showing 7 items of 907 documents

Scalable implementation of dependence clustering in Apache Spark

2017

This article proposes a scalable version of the Dependence Clustering algorithm, which belongs to the class of spectral clustering methods. The method is implemented in Apache Spark using GraphX API primitives. Moreover, a fast approximate diffusion procedure that enables spectral-clustering-type algorithms in the Spark environment is introduced. In addition, the proposed algorithm is benchmarked against spectral clustering. Results on real-life data indicate that the implementation scales well while still demonstrating good performance on densely connected graphs. Peer reviewed.

ta113; ta213; Apache Spark; Computer science; datasets; Correlation clustering; data mining; computer.software_genre; algorithms; Spectral clustering; Computational science; dependence clustering; Data stream clustering; CURE data clustering algorithm; Scalability; Spark (mathematics); algoritmit [algorithms]; Canopy clustering algorithm; Data mining; tiedonlouhinta [data mining]; Cluster analysis; clustering algorithms; computer; data processing; tietojenkäsittely [data processing]

researchProduct

Toward modernizing the systematic review pipeline in genetics: efficient updating via data mining

2012

Purpose: The aim of this study was to demonstrate that modern data mining tools can be used as one step in reducing the labor necessary to produce and maintain systematic reviews. Methods: We used four continuously updated, manually curated resources that summarize MEDLINE-indexed articles in entire fields using systematic review methods (PDGene, AlzGene, and SzGene for genetic determinants of Parkinson disease, Alzheimer disease, and schizophrenia, respectively; and the Tufts Cost-Effectiveness Analysis (CEA) Registry for cost-effectiveness analyses). In each data set, we trained a classification model on citations screened up until 2009. We then evaluated the ability of the model to class…

text classification; Technology Assessment Biomedical; Databases Factual; Computer science; Cost-Benefit Analysis; Review Literature as Topic; Hardware_PERFORMANCEANDRELIABILITY; Empirical Research; computer.software_genre; 03 medical and health sciences; 0302 clinical medicine; Meta-Analysis as Topic; Alzheimer Disease; Hardware_INTEGRATEDCIRCUITS; Data Mining; Humans; support vector machine; Original Research Article; 030212 general & internal medicine; Genetics (clinical); 030304 developmental biology; Genetics; 0303 health sciences; Parkinson Disease; Pipeline (software); 3. Good health; meta-analysis; machine learning; Schizophrenia; Data mining; Periodicals as Topic; computer; citation screening; Software; Genetics in Medicine

researchProduct

Information, Communications and Media Technologies for Sustainability: Constructing Data-Driven Policy Narratives

2021

This paper introduces the idea of data-driven narratives to examine how the use of information, communications, and media technologies (ICMTs) impacts the sustainable growth of economies. While ICMTs have regularly been advocated as a policy tool for growth and development, there is a research gap in empirical studies validating how such policies may be effective. This analysis is based on historical panel data from 39 economies across the developed North (19) and developing South (20). The industry-standard Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology was applied to construct narratives that weave extant theories with empirical data. The art of developing data-dri…

tieto- ja viestintätekniikka [information and communications technology]; lcsh:TJ807-830; Geography Planning and Development; tulotaso [income level]; lcsh:Renewable energy sources; taloudellinen kehitys [economic development]; CRISP-DM methodology; Management Monitoring Policy and Law; talouskasvu [economic growth]; Empirical research; Political science; Narrative; data analytics and modeling; lcsh:Environmental sciences; lcsh:GE1-350; Sustainable development; kestävä kehitys [sustainable development]; Renewable Energy Sustainability and the Environment; business.industry; lcsh:Environmental effects of industries and plants; taloudellinen kestävyys [economic sustainability]; analyysimenetelmät [analysis methods]; Public relations; sustainable development goals; kansainvälinen vertailu [international comparison]; Cross-Industry Standard Process for Data Mining (CRISP-DM); tavoitteet [goals]; lcsh:TD194-195; kehitysmaat [developing countries]; IT for development; data; Sustainability; Intermediation; tiedonlouhinta [data mining]; business; Sustainable growth rate; Construct (philosophy); Panel data; Sustainability

researchProduct

Ordered fuzzy rules generation based on incremental dataset

2021

This paper proposes a novel approach for building transparent knowledge-based systems by generating interpretable fuzzy rules that make it possible to present dependences between quantitative variables while accounting for uncertainty and the dynamics of their values. In the approach, IF-THEN rules are used to show the conditional relationship between ordered fuzzy numbers, which contain additional information about the tendencies of the variables' value changes. This paper elaborates an approach to mining ordered fuzzy rules from numerical data included in an incremental database. This approach develops the ability to record uncertainty and its change in the context of rapidly changing data. In additio…

uncertainty modeling; fuzzy set; Basis (linear algebra); Computer science; Inference; Value (computer science); Context (language use); computer.software_genre; Fuzzy logic; ordered fuzzy number; Knowledge-based systems; machine learning; ordered fuzzy rules; Fuzzy number; Production (economics); Data mining; rules generation; computer; 2021 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE)

researchProduct

Application of the Information Bottleneck method to discover user profiles in a Web store

2018

The paper deals with the problem of discovering groups of Web users with similar behavioral patterns on an e-commerce site. We introduce a novel approach to the unsupervised classification of user sessions, based on session attributes related to the user click-stream behavior, to gain insight into the characteristics of various user profiles. The approach uses the agglomerative Information Bottleneck (IB) algorithm. Based on log data for a real online store, the efficiency of the approach in terms of its ability to differentiate between buying and non-buying sessions was validated, indicating some possible practical applications of our method. Experiments performed for a number of session sampl…

unsupervised classification; Computer science; 02 engineering and technology; E-commerce; Customer profile; 020204 information systems; 0202 electrical engineering electronic engineering information engineering; e-commerce; Web store; Cluster analysis; User profile; Information retrieval; business.industry; customer profile; Behavioral pattern; Information bottleneck method; data mining; Computer Science Applications; machine learning; Computational Theory and Mathematics; Agglomerative Information Bottleneck; 020201 artificial intelligence & image processing; user profile; business; clustering; Information Systems; Journal of Organizational Computing and Electronic Commerce

researchProduct

Privacy-Preserving Aspects for Data Mining in ICT Services

The steady adoption of systems that profile users' behavior by collecting and critically interpreting as much information as possible about the likes and dislikes, interests, and habits of Internet users and consumers of generic services has rapidly made user profiling one of the hottest topics within the networking research community. Indeed, mining information about users' behavior is an advantage for both service providers and service customers: on one side, providers can improve their revenues by focusing on the most successful features of their services, while on the other side, users can enjoy services that more closely reflect their specific needs. There are many examples of user profiling applications. Inte…

user profiling; secure multi-party computation; Settore ING-INF/03 - Telecomunicazioni; secret sharing; data mining; privacy; clustering

researchProduct

Identifying the Sales Patterns of Online Stores with Time Series Clustering

2018

Electronic commerce, especially in the business-to-consumer (B2C) context, has for years been a popular research topic in information systems (IS). However, the prior research on the topic has traditionally been dominated by the consumer focus instead of the business focus of online stores. For example, whereas various segmentations exist for online consumers based on their purchase behaviour, no such segmentations have been developed for online stores based on their sales patterns. In this study, our objective is to address this gap in prior research by identifying the most typical sales patterns of online stores operating in the B2C context. By using self-organising maps (SOM) to analyse …

verkkokauppa (verkkoliiketoiminta) [online retail (online business)]; Series (mathematics); Computer science; verkkokauppa [online retail]; business-to-consumer; computer.software_genre; B2C; online stores; klusterit [clusters]; segmentointi [segmentation]; sales patterns; Segmentation; Data mining; Cluster analysis; computer; time series clustering

researchProduct