Search results for "DATA MINING"
Showing 7 items of 907 documents
Scalable implementation of dependence clustering in Apache Spark
2017
This article proposes a scalable version of the Dependence Clustering algorithm, which belongs to the class of spectral clustering methods. The method is implemented in Apache Spark using GraphX API primitives. Moreover, a fast approximate diffusion procedure that enables spectral-clustering-type algorithms in the Spark environment is introduced. In addition, the proposed algorithm is benchmarked against spectral clustering. Results of applying the method to real-life data support the conclusion that the implementation scales well while demonstrating good performance on densely connected graphs. Peer reviewed.
Toward modernizing the systematic review pipeline in genetics: efficient updating via data mining
2012
Purpose: The aim of this study was to demonstrate that modern data mining tools can be used as one step in reducing the labor necessary to produce and maintain systematic reviews. Methods: We used four continuously updated, manually curated resources that summarize MEDLINE-indexed articles in entire fields using systematic review methods (PDGene, AlzGene, and SzGene for genetic determinants of Parkinson disease, Alzheimer disease, and schizophrenia, respectively; and the Tufts Cost-Effectiveness Analysis (CEA) Registry for cost-effectiveness analyses). In each data set, we trained a classification model on citations screened up until 2009. We then evaluated the ability of the model to class…
Information, Communications and Media Technologies for Sustainability: Constructing Data-Driven Policy Narratives
2021
This paper introduces the idea of data-driven narratives to examine how the use of information, communications, and media technologies (ICMTs) impacts the sustainable growth of economies. While ICMTs have regularly been advocated as a policy tool for growth and development, there is a research gap in empirical studies validating how such policies may be effective. This analysis is based on historical panel data from 39 economies across the developed North (19) and developing South (20). The industry-standard Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology was applied to construct narratives that weave extant theories with empirical data. The art of developing data-dri…
Ordered fuzzy rules generation based on incremental dataset
2021
This paper proposes a novel approach for building transparent knowledge-based systems by generating interpretable fuzzy rules that represent dependencies between quantitative variables while accounting for uncertainty and the dynamics of their values. In the approach, IF-THEN rules are used to express the conditional relationship between ordered fuzzy numbers, which carry additional information about the trends in the variables' values. This paper elaborates an approach to mining ordered fuzzy rules from numerical data stored in an incremental database. The approach makes it possible to capture uncertainty and its changes in the context of rapidly changing data. In additio…
Application of the Information Bottleneck method to discover user profiles in a Web store
2018
The paper deals with the problem of discovering groups of Web users with similar behavioral patterns on an e-commerce site. We introduce a novel approach to the unsupervised classification of user sessions, based on session attributes related to the user click-stream behavior, to gain insight into the characteristics of various user profiles. The approach uses the agglomerative Information Bottleneck (IB) algorithm. Using log data from a real online store, the efficiency of the approach in terms of its ability to differentiate between buying and non-buying sessions was validated, indicating some possible practical applications of our method. Experiments performed for a number of session sampl…
Privacy-Preserving Aspects for Data Mining in ICT Services
The steady adoption of systems for profiling user behavior, which collect and critically interpret as much information as possible about the likes and dislikes, interests and habits of Internet users and consumers of generic services, has rapidly become one of the hottest topics within the networking research community. Indeed, mining information about user behavior is an advantage for both service providers and service customers: on one side, providers can improve their revenues by focusing on the most successful features of their services, while on the other side, users can enjoy services that more closely reflect their specific needs. There are many examples of user profiling applications. Inte…
Identifying the Sales Patterns of Online Stores with Time Series Clustering
2018
Electronic commerce, especially in the business-to-consumer (B2C) context, has for years been a popular research topic in information systems (IS). However, prior research on the topic has traditionally been dominated by a focus on consumers rather than on online stores as businesses. For example, whereas various segmentations exist for online consumers based on their purchase behaviour, no such segmentations have been developed for online stores based on their sales patterns. In this study, our objective is to address this gap in prior research by identifying the most typical sales patterns of online stores operating in the B2C context. By using self-organising maps (SOM) to analyse …