RESEARCH PRODUCT

Incremental linear model trees on massive datasets

Andreas Hapfelmeier, Stefan Kramer, Jana Schmidt

subject

Class (computer programming); Computer science; Process (engineering); Computation; Linear model; Sampling (statistics); Machine learning; KISS principle; Data mining; Artificial intelligence; Online algorithm

description

The existence of massive datasets raises the need for algorithms that make efficient use of resources such as memory and computation time. Besides well-known approaches such as sampling, online algorithms are being recognized as good alternatives, as they often process datasets faster while using much less memory. The important class of algorithms that learn linear model trees online (incremental linear model trees, or ILMTs in the following) offers interesting options for regression tasks in this setting. However, surprisingly little is known about their performance, as there exists no large-scale evaluation on massive stationary datasets under equal conditions. Therefore, this paper examines their applicability to massive stationary datasets under various parameter settings. To reduce biases arising from the choice of programming language or from programming skills, all algorithms were reimplemented within the same framework and tested under the same conditions. Results on real-world datasets indicate that, for massive stationary datasets, parameter settings leading to complex models do not pay off: they yield at most a small accuracy gain at the cost of a much longer running time. Experimental evidence suggests that simple and fast algorithms perform best.

https://doi.org/10.1145/2480362.2480390