RESEARCH PRODUCT

Experimental Performance Evaluation of Cloud-Based Analytics-as-a-Service

Francesco Pace, Pietro Michiardi, Daniele Venzano, Marco Milanesio, Damiano Carra

subject

Service (business); FOS: Computer and information sciences; Distributed database; Computer science; business.industry; Reliability (computer networking); Distributed computing; Rank (computer programming); 020206 networking & telecommunications; Cloud computing; 02 engineering and technology; Data modeling; performance evaluation; storage; Computer Science - Distributed Parallel and Cluster Computing; Analytics; 0202 electrical engineering electronic engineering information engineering; Distributed Parallel and Cluster Computing (cs.DC); business

description

An increasing number of Analytics-as-a-Service solutions have recently emerged in the landscape of cloud-based services. These services allow flexible composition of compute and storage components to create powerful data ingestion and processing pipelines. This work is a first attempt at an experimental evaluation of the performance of analytic applications executed over a wide range of storage service configurations. We present an intuitive notion of data locality, which we use as a proxy to rank different service compositions in terms of expected performance. Through an empirical analysis, we dissect the performance achieved by analytic workloads and reveal problems caused by the impedance mismatch that arises in some configurations. Our work paves the way to a better understanding of modern cloud-based analytic services and their performance, both for their end users and for their providers.

https://dx.doi.org/10.48550/arxiv.1602.07919