RESEARCH PRODUCT
A Windowing strategy for Distributed Data Mining optimized through GPUs
Nicandro Cruz-Ramírez, Alejandro Guerra-Hernández, Héctor-Gabriel Acosta-Mesa, Xavier Limón, Francisco Grimaldo
subject
Computer science; business.industry; Multi-agent system; Decision tree; Process (computing); Window (computing); 02 engineering and technology; Machine learning; computer.software_genre; Random forest; Tree (data structure); C4.5 algorithm; Artificial Intelligence; 020204 information systems; Signal Processing; 0202 electrical engineering electronic engineering information engineering; 020201 artificial intelligence & image processing; Computer Vision and Pattern Recognition; Artificial intelligence; Data mining; business; computer; Software
description
Abstract This paper introduces an optimized Windowing-based strategy for inducing decision trees in Distributed Data Mining scenarios. Windowing consists of selecting a sample of the available training examples (the window) to induce a decision tree with a standard algorithm, e.g., J48; finding the instances among the remaining training examples that this tree does not cover (counter examples); adding them to the window to induce a new tree; and repeating until a termination criterion is met. In this way, the number of training examples required to induce the tree is reduced considerably while the expected accuracy levels are maintained, at the cost of longer running time. Our proposed enhancements address this cost by searching for counter examples on GPUs and by further reducing the number of counter examples in the window. The resulting strategy is implemented in JaCa-DDM, our agents & artifacts tool for Distributed Data Mining. It keeps the benefits of Windowing while distributing the process, is faster than the traditional centralized approach, and in some cases performs similarly to Bagging and Random Forests. Experiments on data mining tasks are reported, including a case study on pixel-based segmentation for the detection of precancerous cervical lesions in medical images.
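The basic Windowing loop described in the abstract can be sketched as follows. This is a minimal, centralized illustration only: it does not reproduce JaCa-DDM, the GPU-based counter-example search, or the window-reduction enhancements. The tiny ID3-style learner stands in for J48, and all names (`induce_tree`, `predict`, `windowing`, `initial_fraction`, `max_rounds`) are hypothetical choices made for this example.

```python
import random
from collections import Counter


def induce_tree(examples, features):
    """Tiny ID3-style inducer for categorical features (a stand-in for J48)."""
    labels = [label for _, label in examples]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority  # leaf: a plain class label

    def split_errors(feature):
        # Examples misclassified if each branch predicts its majority label.
        groups = {}
        for x, label in examples:
            groups.setdefault(x[feature], []).append(label)
        return sum(len(g) - Counter(g).most_common(1)[0][1] for g in groups.values())

    best = min(features, key=split_errors)
    rest = [f for f in features if f != best]
    branches = {}
    for x, label in examples:
        branches.setdefault(x[best], []).append((x, label))
    children = {v: induce_tree(sub, rest) for v, sub in branches.items()}
    return (best, children, majority)  # internal node


def predict(tree, x):
    # Walk down internal nodes; unseen attribute values fall back to the majority.
    while isinstance(tree, tuple):
        feature, children, default = tree
        tree = children.get(x[feature], default)
    return tree


def windowing(examples, features, initial_fraction=0.2, max_rounds=10, seed=0):
    rng = random.Random(seed)
    pool = list(examples)
    rng.shuffle(pool)
    cut = max(1, int(len(pool) * initial_fraction))
    window, remaining = pool[:cut], pool[cut:]  # initial sample = the window
    tree = induce_tree(window, features)
    for _ in range(max_rounds):  # termination criterion: round limit ...
        counter_examples, covered = [], []
        for example in remaining:
            x, label = example
            (covered if predict(tree, x) == label else counter_examples).append(example)
        if not counter_examples:  # ... or no counter examples left
            break
        window = window + counter_examples  # grow the window
        remaining = covered
        tree = induce_tree(window, features)  # induce a new tree
    return tree, window


# Toy data (an assumption for illustration): 4 binary attributes,
# class = attribute 0 XOR attribute 1.
data = [((a, b, c, d), a ^ b) for a in (0, 1) for b in (0, 1) for c in (0, 1) for d in (0, 1)]
tree, window = windowing(data, features=[0, 1, 2, 3], max_rounds=len(data))
print(len(window), "of", len(data), "examples ended up in the window")
```

In this sketch the search for counter examples is a sequential pass over the remaining examples, which is the step the abstract reports moving to GPUs.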
year | journal | country | edition | language |
---|---|---|---|---|
2017-07-01 | Pattern Recognition Letters | | | |