ROUGH DRAFT authorea.com/98265


Abstract

Elasticity is a key feature of cloud computing where resources are allocated and released according to user demands. Reactive auto scaling, in which the scaling actions take place just after meeting the triggering thresholds, suffers from several issues like risk of under provisioning at peak loads and over provisioning during other times. Proactive scaling solutions, where a user’s future resource demand can be forecast and necessary scaling actions enacted beforehand, can overcome these issues. Nevertheless, the effectiveness of such proactive scaling solutions depends on the accuracy of the prediction method(s) adopted. We propose a forecasting technique to enhance the accuracy of workload forecasting in cloud auto scalers. An ensemble workload prediction mechanism based on time series and machine learning techniques is proposed to make more accurate predictions on drastically different workload patterns. In this work, we initially evaluated several forecasting models for their applicability in forecasting different workload patterns. The proposed ensemble technique is then implemented using three well-known forecasting models and tested for three real-world workloads. Simulation results show that our ensemble method produces significantly lower forecast errors compared to the use of individual models and the prediction technique employed in Apache Stratos, an open source PaaS platform.

Keywords: cloud computing; auto scaling; workload prediction; time series analysis

# Introduction

Cloud computing has already gained ground with advantages like scalability, on-demand resource provisioning, and high availability. According to the NIST definition (Mell 2011), cloud computing refers to the delivery of computing resources, such as networks, servers, storage, applications, and platforms, over the network based on user demand. Based on this definition, it is evident that elasticity is a major requirement of any cloud platform, which is often achieved via an auto-scaling process. Auto scaling refers to the dynamic allocation and release of resources for a cloud application, in order to optimize its resource utilization while minimizing the cost, as well as achieving the desired Quality of Service (QoS) and availability goals (al. 2010), (Roy 2011).

If an application operates at the IaaS level, it is the responsibility of the application developer to deal with auto scaling. By contrast, if the application runs on top of the PaaS layer, the burden of auto scaling is removed from the developer and placed on the PaaS provider. Operating at the PaaS layer enables the auto scaling services to use high level, application-specific metrics such as the number of requests in flight, as well as low level, cloud-specific metrics such as CPU and memory utilization. PaaS-level auto scaling mainly has to deal with infrastructure resource management, as several applications share and compete for the resources simultaneously. Hence, it is difficult to auto scale at the PaaS level, as the same PaaS platform may run vastly different applications with different workload patterns.

Auto scaling can be achieved via either reactive or proactive approaches. In reactive, threshold-based auto scaling, users have to specify thresholds for workload metrics, and scaling would occur only after such thresholds are exceeded (Lorido-Botran 2014). In this approach, even though it is desirable to maintain high thresholds which would result in higher levels of resource utilization, thresholds should be sufficiently small to compensate for delays in formulating and executing scaling actions. Although this seems to be the most customizable approach from the user’s perspective, it has several weaknesses (Alipour 2014) such as the inability to adapt to workload patterns (e.g., thrashing during load fluctuations), low resource utilization and higher cost under smaller thresholds, and the risk of service/QoS degradation under larger thresholds and rapidly increasing workloads. Coming up with an optimum threshold in a reactive auto scaling system is nontrivial and it requires experience and expertise (e.g., awareness of traffic patterns and IaaS/PaaS parameters like pricing models and packages) for composing effective thresholds that can carry out scaling efficiently.
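The reactive rule described above can be sketched as a simple threshold check. This is a minimal illustration rather than the rule of any particular platform: the function name `reactive_decision`, the thresholds (0.75 and 0.30), and the load values are assumptions chosen for the example.

```python
def reactive_decision(utilization, instances, upper=0.75, lower=0.30, min_instances=1):
    """Return the instance count after one threshold check.

    utilization: observed metric in [0, 1], e.g. average CPU utilization.
    upper/lower: hypothetical scale-out and scale-in thresholds.
    """
    if utilization > upper:
        return instances + 1   # scale out only after the threshold is exceeded
    if utilization < lower and instances > min_instances:
        return instances - 1   # scale in when the load drops
    return instances           # inside the band: no action


# A load that oscillates across both thresholds makes the rule alternate
# between scaling out and scaling in (thrashing).
observed = [0.80, 0.25, 0.80, 0.25]
count = 2
history = []
for u in observed:
    count = reactive_decision(u, count)
    history.append(count)
```

With these made-up observations the instance count alternates between 3 and 2 on every check, which illustrates the thrashing problem: the rule reacts to each fluctuation because it has no view of where the workload is heading.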

In the proactive approach, resource requirements for the future time horizon are forecast based on the demand history. With accurate predictions, an application can make early scaling decisions so that when the workload reaches a particular level, the required resources have already been allocated. Consequently, resource utilization can be safely increased to a maximum level. However, the reliability of such an approach depends mainly on the accuracy of the predicted values. Because we forecast the future workload requirement (e.g., CPU utilization, memory consumption, and network request count) from historical data of the respective measures, time series forecasting methods are applicable for this requirement.
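To make the proactive idea concrete, the sketch below turns a workload forecast into an instance count ahead of time. It is an illustration under stated assumptions: the drift forecast is only a placeholder for a real forecasting model, and `capacity_per_instance`, `target_utilization`, and the demand values are hypothetical.

```python
import math


def drift_forecast(history, horizon):
    """Placeholder forecast: extend the average step of the history by `horizon` steps."""
    if len(history) < 2:
        return history[-1]
    slope = (history[-1] - history[0]) / (len(history) - 1)
    return history[-1] + slope * horizon


def instances_needed(forecast_load, capacity_per_instance, target_utilization=0.8):
    """Smallest instance count that keeps the forecast load at or below the target utilization."""
    usable = capacity_per_instance * target_utilization
    return max(1, math.ceil(forecast_load / usable))


requests_per_second = [100, 120, 140, 160]                   # made-up demand history
predicted = drift_forecast(requests_per_second, horizon=3)   # 160 + 20 * 3
planned = instances_needed(predicted, capacity_per_instance=50)
```

Here the forecast of 220 requests per second, divided by 40 usable requests per second per instance, yields six instances that can be started before the load arrives. An inaccurate forecast feeds directly into `planned`, which is why prediction accuracy dominates the reliability of this approach.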

We have identified the following challenges specific to workload prediction for auto scaling:

• A PaaS cloud system may be used to build different applications with vastly different workload patterns. Hence, the workload prediction model should not overfit to a specific workload pattern.

• As the workload dataset grows with time, the predictive model should evolve and continuously learn the latest workload characteristics.

• The workload predictor should be able to produce results within a bounded time.

• Given that the time horizon for the prediction should be chosen based on the physical constraints like uptime and graceful shutdown time of Virtual Machines (VMs), the predictor should be able to produce sufficiently accurate results over a sufficiently large time horizon.

The objective of this work is to come up with a prediction method that can be trained in real-time to capture the latest trends and provide sufficiently accurate results for drastically different workload patterns. First, we evaluate the ability of existing models to produce accurate real-time predictions against evolving datasets. In this evaluation, we tested time series forecasting methods, including statistical methods like ARIMA and the exponential model, machine learning models like neural networks, and a prediction method used in Apache Stratos, an open source PaaS framework. Through this, we demonstrate the limitations of existing solutions in accurately predicting real-world workload traces, and highlight the need for more accurate predictions. Next, we propose an ensemble technique which combines results from a neural network, ARIMA, and exponential models. Based on simulations with three publicly available workload traces (two real-world cloud workload datasets and the Google cluster dataset), we demonstrate that the proposed ensemble model outperforms each of the tested individual models.
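The paper's own combination scheme is introduced in Section 4. As a generic illustration of what combining several forecasts can look like, the sketch below weights each model by the inverse of its recent mean absolute error. This weighting rule, the function names, and the numbers are assumptions made for illustration, not the proposed method.

```python
def inverse_error_weights(recent_errors, eps=1e-9):
    """Weights proportional to 1 / (recent mean absolute error) of each model."""
    inverse = [1.0 / (e + eps) for e in recent_errors]
    total = sum(inverse)
    return [w / total for w in inverse]


def ensemble_forecast(predictions, recent_errors):
    """Weighted average of the individual model forecasts."""
    weights = inverse_error_weights(recent_errors)
    return sum(w * p for w, p in zip(weights, predictions))


# Made-up next-step forecasts and recent errors for three models.
predictions = [100.0, 110.0, 130.0]
recent_errors = [5.0, 10.0, 20.0]
combined = ensemble_forecast(predictions, recent_errors)
```

Because the weights are non-negative and sum to one, the combined value always lies between the smallest and largest individual forecast, and it moves toward whichever model has recently been most accurate.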

The rest of the paper is structured as follows: Section 2 outlines related work on cloud workload prediction and their limitations. Section 3 describes and evaluates several existing time series forecasting techniques while exploring their ability to provide accurate predictions on publicly available datasets with predictable patterns under real-time training scenarios. Section 4 introduces the proposed ensemble technique and prediction algorithm. Section 5 presents the performance analysis, and concluding remarks are presented in Section 6.

# Related Work

A significant body of research already exists in the workload prediction domain. Kupferman et al. (Kupferman 2009) applied first-order auto-regression to predict the request rate and found that its accuracy depends on several parameters such as the size of the input window and the horizon window. Exponential smoothing is also popular for prediction. Mi et al. (Mi 2010) used quadratic exponential smoothing against real workload traces such as World Cup 98 (Arlitt 1998), and reported good results. The Auto-Regressive Moving Average (ARMA) method is one of the dominant time series analysis techniques for workload and resource usage prediction. Roy et al. (Roy 2011) used a second order ARMA filter for workload prediction on the World Cup 98 traces and reported accurate results. Bunch et al. (Bunch 2012) used an exponential smoothing algorithm to forecast how many requests to expect and how many requests will be enqueued over the next $$t$$ seconds for an auto scaler in a PaaS cloud.
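For readers unfamiliar with the two basic building blocks mentioned above, the sketch below shows a first-order auto-regression fitted by least squares and simple exponential smoothing. It does not reproduce the exact configurations of the cited works (the quadratic smoothing and second order ARMA variants build on these ideas); the function names, the smoothing factor, and the sample series are assumptions for illustration.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1]; returns (c, phi)."""
    x_prev = series[:-1]
    x_next = series[1:]
    n = len(x_prev)
    mean_prev = sum(x_prev) / n
    mean_next = sum(x_next) / n
    cov = sum((a - mean_prev) * (b - mean_next) for a, b in zip(x_prev, x_next))
    var = sum((a - mean_prev) ** 2 for a in x_prev)
    phi = cov / var
    c = mean_next - phi * mean_prev
    return c, phi


def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after the last observation (the one-step forecast)."""
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


# Made-up request rates generated by x[t] = 7 + 0.5 * x[t-1].
request_rate = [10.0, 12.0, 13.0, 13.5, 13.75]

c, phi = fit_ar1(request_rate)
ar_forecast = c + phi * request_rate[-1]
ses_forecast = simple_exponential_smoothing(request_rate, alpha=0.5)
```

On this series the auto-regression recovers the generating coefficients and continues the rising pattern, while the smoothed level lags behind the latest observation. This is one reason why the choice of model and of parameters such as the input window matters for different workload patterns.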

Machine learning