
AUTHOR

Junming Shao

Prototype-based learning on concept-drifting data streams

Data stream mining has attracted growing attention due to its wide range of emerging applications, such as target marketing, email filtering and network intrusion detection. In this paper, we propose a prototype-based classification model for evolving data streams, called SyncStream, which dynamically models time-changing concepts and makes predictions in a local fashion. Instead of learning a single model on a sliding window or relying on ensemble learning, SyncStream captures evolving concepts by dynamically maintaining a set of prototypes in a new data structure called the P-tree. The prototypes are obtained by error-driven representativeness learning and synchronization-inspired constrained clustering. To ide…

research product

Insight into Disrupted Spatial Patterns of Human Connectome in Alzheimer’s Disease via Subgraph Mining

Alzheimer’s disease (AD) is the most common cause of age-related dementia, which prominently affects the human connectome. In this paper, the authors focus on the question of how to identify disrupted spatial patterns of the human connectome in AD using a data mining framework. Using diffusion tractography, the human connectome of each individual subject was constructed from two diffusion-derived attributes, fiber density and fractional anisotropy, to represent the structural brain connectivity patterns. After frequent subgraph mining, an abnormal score was defined to identify disrupted subgraph patterns in patients. Experiments demonstrated that our data-driven approa…

research product

Discovering Aberrant Patterns of Human Connectome in Alzheimer's Disease via Subgraph Mining

Alzheimer's disease (AD) is the most common cause of age-related dementia, which prominently affects the human connectome. Diffusion weighted imaging (DWI) provides a promising, non-invasive way to explore the organization of white matter fiber tracts in the human brain. However, the immense amount of data from the millions of voxels in a raw diffusion map makes it difficult to extract usable knowledge. In this paper, we focus on the question of how we can identify disrupted spatial patterns of the human connectome in AD using a data mining framework. Using diffusion tractography, the human connectome of each individual subject was constructed from two diffusion-derived attributes: …

research product

Robust Synchronization-Based Graph Clustering

Complex graph data now arises in various fields such as social networks, protein-protein interaction networks and ecosystems. To reveal the underlying patterns in graphs, an important task is to partition them into several meaningful clusters. The question is: how can we find the natural partitions of a complex graph which truly reflect the intrinsic patterns? In this paper, we propose RSGC, a novel approach to graph clustering. The key philosophy of RSGC is to consider graph clustering as a dynamic process towards synchronization. Each vertex is viewed as an oscillator that interacts with other vertices according to the graph connection information. During the process towards synchro…
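The oscillator view described in this abstract can be illustrated with a minimal sketch. It is not the RSGC algorithm, whose details are not given here; it assumes a generic Kuramoto-style phase update in which each vertex is pulled toward the phases of its graph neighbors. The function names, the coupling strength, the step count and the toy graph (two disconnected triangles, chosen so the outcome is easy to verify) are all illustrative assumptions.

```python
import math

def synchronize(adjacency, phases, coupling=0.5, steps=50):
    """Kuramoto-style update (an assumption, not RSGC itself):
    each vertex's phase is pulled toward its neighbors' phases."""
    phases = list(phases)
    for _ in range(steps):
        updated = []
        for v, theta in enumerate(phases):
            neighbors = adjacency[v]
            if not neighbors:
                updated.append(theta)  # isolated vertex: nothing to interact with
                continue
            pull = sum(math.sin(phases[u] - theta) for u in neighbors)
            updated.append(theta + coupling * pull / len(neighbors))
        phases = updated  # synchronous update of all vertices
    return phases

def group_by_phase(phases, tol=1e-3):
    """Give the same label to vertices whose final phase is within tol
    of the first member of an existing group."""
    labels, representatives = [], []
    for theta in phases:
        for label, rep in enumerate(representatives):
            if abs(theta - rep) < tol:
                labels.append(label)
                break
        else:
            representatives.append(theta)
            labels.append(len(representatives) - 1)
    return labels

# Hypothetical toy graph: two triangles with no edge between them.
adjacency = {0: [1, 2], 1: [0, 2], 2: [0, 1],
             3: [4, 5], 4: [3, 5], 5: [3, 4]}
initial = [0.0, 1.0, 2.0, 0.5, 1.5, 2.5]  # interleaved starting phases

final = synchronize(adjacency, initial)
labels = group_by_phase(final)
print(labels)
```

In this toy case each triangle settles on its own common phase, so vertices that synchronize together receive the same label. On a connected graph the whole graph would eventually synchronize under this simple update, so a practical method needs a stopping or cluster-selection rule, which this sketch does not attempt to supply.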

research product

Scalable Clustering by Iterative Partitioning and Point Attractor Representation

Clustering very large datasets while preserving cluster quality remains a challenging data-mining task. In this paper, we propose an effective scalable clustering algorithm for large datasets that builds upon the concept of synchronization. The proposed algorithm, CIPA (Clustering by Iterative Partitioning and Point Attractor Representations), is capable of handling very large datasets by iteratively partitioning them into thousands of subsets and clustering each subset separately. Using dynamic clustering by synchronization, each subset is then represented by a set of point attractors and outliers. Finally, CIPA identifies the…

research product

Graph Clustering with Local Density-Cut

In this paper, we introduce a new graph clustering algorithm, called Dcut. The basic idea is to cast graph clustering as a local density-cut problem. To identify meaningful communities in a graph, a density-connected tree is first constructed in a local fashion. Building upon this intuitive, locally constructed density-connected tree, Dcut partitions a graph into multiple dense, tight-knit clusters effectively and efficiently. We have demonstrated that our method has several attractive benefits: (a) Dcut provides an intuitive criterion to evaluate the goodness of a graph clustering in a more precise way; (b) Building upon the density-connected tree, Dcut allows identifying high-quality cl…

research product