A Flexible Framework for Accurate Simulation of Cloud In-Memory Data Stores

Pierangelo Di Sanzo, Francesco Quaglia, Bruno Ciciani, Alessandro Pellegrini, Diego Didona, Paolo Romano, Roberto Palmieri, and Sebastiano Peluso

Published in: Simulation Modelling Practice and Theory, 2015
pdf Download PDF

In-memory (transactional) data stores, also referred to as data grids, are recognized as a first-class data management technology for cloud platforms, thanks to their ability to match the elasticity requirements imposed by the pay-as-you-go cost model. On the other hand, determining how performance and reliability/availability of these systems vary as a function of configuration parameters, such as the amount of cache servers to be deployed, and the degree of in-memory replication of slices of data, is far from being a trivial task. Yet, it is an essential aspect of the provisioning process of cloud platforms, given that it has an impact on the amount of cloud resources that are planned for usage. To cope with the issue of predicting/analysing the behavior of different configurations of cloud in-memory data stores, in this article we present a flexible simulation framework offering skeleton simulation models that can be easily specialized in order to capture the dynamics of diverse data grid systems, such as those related to the specific (distributed) protocol used to provide data consistency and/or transactional guarantees. Besides its flexibility, another peculiar aspect of the framework lies in that it integrates simulation and machine-learning (black-box) techniques, the latter being used to capture the dynamics of the data-exchange layer (e.g. the message passing layer) across the cache servers. This is a relevant aspect when considering that the actual data-transport/networking infrastructure on top of which the data grid is deployed might be unknown, hence being not feasible to be modeled via white-box (namely purely simulative) approaches. We also provide an extended experimental study aimed at validating instances of simulation models supported by our framework against execution dynamics of real data grid systems deployed on top of either private or public cloud infrastructures. Particularly, our validation test-bed has been based on an industrial-grade open-source data grid, namely Infinispan by JBoss/Red-Hat, and a de-facto standard benchmark for NoSQL platforms, namely YCSB by Yahoo. The validation study has been conducted by relying on both public and private cloud systems, scaling the underlying infrastructure up to 100 (resp. 140) Virtual Machines for the public (resp. private) cloud case. Further, we provide some experimental data related to a scenario where our framework is used for on-line capacity planning and reconfiguration of the data grid system.

BibTeX Entry:

author = {Di Sanzo, Pierangelo and Quaglia, Francesco and Ciciani, Bruno and Pellegrini, Alessandro and Didona, Diego and Romano, Paolo and Palmieri, Roberto and Peluso, Sebastiano},
journal = {Simulation Modelling Practice and Theory},
title = {A Flexible Framework for Accurate Simulation of Cloud In-Memory Data Stores},
year = {2015},
issn = {1569-190X},
month = jul,
number = {2},
pages = {219--238},
volume = {58},
doi = {10.1016/j.simpat.2015.05.011},
publisher = {Elsevier},
series = {SIMPAT}