Auto Scaling with Apache Stratos

Posted: May 19, 2014 in Uncategorized
What is Auto Scaling

An auto scaling system scales up automatically, spawning additional service instances during peak load, and scales down automatically, shutting down those additional instances during off-peak periods.

Auto Scaling parameters

The Apache Stratos PaaS framework provides built-in auto scaling based on three parameters: Requests In Flight, Load Average and Memory Consumption. Each parameter is described below.
Requests In Flight (RIF)
Requests In Flight is the number of requests that the Load Balancer (LB) has received but that have not yet been served. When the LB receives a request, it increments the RIF count by one and forwards the request to one of the service instances. When the service instance sends its response, the LB forwards that response back to the client that originally sent the request. After forwarding the response, the LB decrements the RIF count by one. In short, the LB increments the RIF count when a request is received and decrements it when the response is sent back to the client, so the RIF count is the number of requests still waiting to be served. A high RIF count means many requests are queued but not yet served, which is a good indication that the system needs additional resources to handle the incoming requests.
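
The counting described above can be sketched in a few lines of Java. This is only an illustration of the bookkeeping, not code from the Stratos load balancer; the class name RequestsInFlightCounter and its method names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper for illustration; not part of the Stratos code base.
final class RequestsInFlightCounter {
    // AtomicInteger keeps the count consistent when many threads handle requests.
    private final AtomicInteger inFlight = new AtomicInteger();

    // Call when the LB receives a request from a client.
    int onRequestReceived() {
        return inFlight.incrementAndGet();
    }

    // Call after the LB has forwarded the response back to the client.
    int onResponseSent() {
        return inFlight.decrementAndGet();
    }

    // Number of requests received but not yet answered.
    int current() {
        return inFlight.get();
    }
}
```

With three requests received and one response sent, current() reports two requests still in flight.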

Load Average

The Load Average value reflects the CPU usage of the service instances. A high Load Average means the CPU load on the service instances is high for some reason: either the instance is serving a large number of requests, or some other process is consuming the CPU. Regardless of the reason, high CPU usage indicates that the instances or nodes are not in a healthy condition to serve more incoming requests. In that case, additional resources or instances should be added to cater for future requests. It is time to scale up.

The formula used to calculate Load Average is: Load Average = (loadAvg/cores)*100
In the above formula, loadAvg refers to the load average of the instance over the last minute and cores refers to the number of CPU cores of the instance.

To be more specific, the Java code below calculates the load average percentage:

import java.lang.management.ManagementFactory;

// one-minute system load average; negative if the platform does not report it
double loadAvg = ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage();
// assume system cores = available cores to JVM
int cores = ManagementFactory.getOperatingSystemMXBean().getAvailableProcessors();
double loadAvgPercentage = (loadAvg / cores) * 100;

The loadAvg value above is the same as the first of the three load average numbers printed by the Linux shell commands "uptime" and "top".
$ uptime
18:24:39 up 4 days,  4:08,  8 users,  load average: 2.82, 2.63, 2.17
A detailed article on Linux load average can be found here.
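
As a worked example, the formula can be applied to the uptime output above. The sketch below assumes a 4-core instance, which the output does not state, and the class name LoadAverageSample is hypothetical. It also guards against the negative value that getSystemLoadAverage() returns on platforms where the load average is not available.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper for illustration; not part of the Stratos code base.
final class LoadAverageSample {
    // Formula from the text: (loadAvg / cores) * 100.
    static double toPercentage(double loadAvg, int cores) {
        if (cores <= 0) {
            throw new IllegalArgumentException("cores must be positive");
        }
        return (loadAvg / cores) * 100.0;
    }

    // Reads the live one-minute load average of this machine.
    // Returns -1 when the platform does not report a load average.
    static double currentPercentage() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        double loadAvg = os.getSystemLoadAverage();
        if (loadAvg < 0) {
            return -1.0;
        }
        return toPercentage(loadAvg, os.getAvailableProcessors());
    }
}
```

For the one-minute value 2.82 and the assumed 4 cores, toPercentage(2.82, 4) gives about 70.5.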

Memory Consumption

Memory Consumption is the average memory usage of the instances. Memory consumption may grow because of a high number of threads processing the incoming requests. High memory consumption on the nodes indicates that the system needs more nodes to serve incoming requests.

The formula used to calculate Memory Consumption is: Memory Consumption = (usedMemory/totalMemory)*100
In the above formula, usedMemory refers to the amount of memory currently in use and totalMemory refers to the total amount of memory available in the instance.
The Java code to calculate the memory consumption is shown below for better understanding.

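import java.lang.management.ManagementFactory;
// the physical memory getters live on the com.sun.management extension interface
import com.sun.management.OperatingSystemMXBean;

private static final int MB = 1024 * 1024;
OperatingSystemMXBean osBean = (OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
// sizes are reported in bytes; convert to whole megabytes
double totalMemory = (double)(osBean.getTotalPhysicalMemorySize() / MB);
double usedMemory = (double)((totalMemory - (osBean.getFreePhysicalMemorySize() / MB)));
double memoryConsumption = (usedMemory / totalMemory) * 100;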

totalMemory and usedMemory in the above formula correspond to the "total" and "used" values printed by the free shell command in Linux:

$ free -m
             total       used       free     shared    buffers     cached
Mem:          7687       6362       1324          0        177       1970
-/+ buffers/cache:       4214       3472
Swap:        15623       1376      14247

Accordingly, Memory Consumption = (6362/7687) * 100 ≈ 82.8, or about 82 when truncated to a whole number.
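
The same arithmetic can be checked with a small Java helper. This is a sketch only: the class name MemoryConsumptionSample is hypothetical, and the inputs are the "used" and "total" figures from the free output above.

```java
// Hypothetical helper for illustration; not part of the Stratos code base.
final class MemoryConsumptionSample {
    // Formula from the text: (usedMemory / totalMemory) * 100.
    static double toPercentage(double usedMb, double totalMb) {
        if (totalMb <= 0) {
            throw new IllegalArgumentException("totalMb must be positive");
        }
        return (usedMb / totalMb) * 100.0;
    }
}
```

Calling toPercentage(6362, 7687) yields roughly 82.76, and casting that result to int truncates it to 82.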

Auto Scaling Policy

When requests in flight, load average or memory consumption is high, the system should scale up (add more nodes), and when they are low, the system should scale down (remove the additional nodes). The question is what counts as HIGH and what counts as LOW. Stratos uses an Auto Scaling Policy to decide this. DevOps can deploy multiple auto scaling policies, and a user selects one at the time of subscription. An auto scaling policy defines the threshold value of each parameter, which is used to decide whether that parameter is high or low.
A sample auto scaling policy is shown below.

{
  "id": "autoscale-policy-1",
  "loadThresholds": {
    "requestsInFlight": {
      "average": 50
    },
    "memoryConsumption": {
      "average": 70
    },
    "loadAverage": {
      "average": 70
    }
  }
}

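
To make the role of the thresholds concrete, here is a deliberately simplified Java sketch of a threshold check. It is not the rule logic Stratos actually runs. The class ThresholdCheck, the Decision enum and the LOW_FRACTION constant are all hypothetical, and the rule "scale down only when every parameter is below half of its threshold" is an assumption made for illustration. The threshold values are taken from the sample policy above.

```java
// Hypothetical, simplified decision logic; not the Stratos rule engine.
final class ThresholdCheck {
    enum Decision { SCALE_UP, SCALE_DOWN, HOLD }

    // Threshold values taken from the sample policy.
    static final double RIF_THRESHOLD = 50;
    static final double MEMORY_THRESHOLD = 70;
    static final double LOAD_THRESHOLD = 70;

    // Assumption: a parameter counts as LOW below this fraction of its threshold.
    static final double LOW_FRACTION = 0.5;

    static Decision decide(double rif, double memory, double load) {
        // Any single parameter above its threshold is treated as HIGH.
        boolean anyHigh = rif > RIF_THRESHOLD
                || memory > MEMORY_THRESHOLD
                || load > LOAD_THRESHOLD;
        if (anyHigh) {
            return Decision.SCALE_UP;
        }
        // Scale down only when every parameter is clearly LOW.
        boolean allLow = rif < RIF_THRESHOLD * LOW_FRACTION
                && memory < MEMORY_THRESHOLD * LOW_FRACTION
                && load < LOAD_THRESHOLD * LOW_FRACTION;
        return allLow ? Decision.SCALE_DOWN : Decision.HOLD;
    }
}
```

Leaving a gap between the HIGH and LOW conditions is a design choice in this sketch: it keeps the system from adding and removing nodes repeatedly when a value hovers around its threshold.
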
An auto scaling policy can be deployed through the Stratos UI, the Stratos CLI, or by invoking the REST service hosted in the Stratos Manager, as shown below.

curl -X POST -H "Content-Type: application/json" -d @policyJson -k -u admin:admin https://<SM_HOST>:9443/stratos/admin/policy/autoscale

