Scaling in the Cloud
Authored on 19 February, 2025 by Naresh V
Introduction
The cloud is often described as infinite. Major cloud providers promise to provision thousands of servers in moments. Yet, behind the advertising, clouds are still bound by physical constraints. More importantly, cloud users must recognize that scaling "up", that is, adding more resources, has to be justified and necessary, because it translates directly into higher expenses. Scaling is not restricted to changing capacity based on end-user demand; it is also an important topic in system design discussions.
More hardware can do more work only if the software can make use of it. In other words, if the software is not designed to "scale", adding hardware will not improve performance. For brevity, however, we discuss only how some common hardware components can be scaled cost-effectively, as this falls under the responsibility of the IE teams.
Before jumping into the details of scaling, remember this golden rule: "More hardware will not always solve the problem, and at best, will solve it only temporarily."
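One standard way to put numbers on this rule is Amdahl's law. The article does not name it, so treat it as an outside model brought in for illustration: if only a fraction p of the work can run in parallel, the best speedup from n times the hardware is 1 / ((1 - p) + p / n). A minimal sketch, with the function name and the sample fraction chosen purely as assumptions:

```python
def amdahl_speedup(parallel_fraction: float, n: int) -> float:
    """Theoretical speedup on n workers when only parallel_fraction of the work scales."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part takes the same time no matter how much hardware is added.
    return 1.0 / (serial_fraction + parallel_fraction / n)


if __name__ == "__main__":
    # Illustrative value only: a workload assumed to be 90% parallelizable.
    for workers in (1, 2, 8, 64):
        print(workers, round(amdahl_speedup(0.9, workers), 2))
```

In this model a workload that is 90% parallelizable can never run more than 10 times faster, however many servers are added, which is one way to read the golden rule above.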
Objectives of Scaling
Common Scaling Strategies
Scaling strategies are highly dependent on the nature of the underlying application:
  1. Stateless Applications: Stateless applications can be scaled horizontally by adding more replicas of the application servers. In AWS, Auto Scaling adjusts the capacity of some core components, such as EC2 and RDS, based on CloudWatch metrics. EC2 Auto Scaling, specifically, changes the number of instances in Auto Scaling groups based on those metrics.
  2. Stateful Applications: Stateful applications are, in general, scaled vertically. This means that the underlying hardware has to be upgraded to a higher configuration to support more computational work. A change in configuration almost always results in planned downtime.
    Databases are commonly used stateful applications. Most databases use parallelism internally by using multiple threads which run on separate cores to process data faster.
    Use of Read Replicas: Some stateful applications can expand their read capacity horizontally. In this model, one instance in the cluster accepts writes (data changes) while the reader instances accept only reads. As this model inherently suffers from data hazards such as Write After Read (WaR), Read After Write (RaW) and Write After Write (WaW), applications must take precautions to safeguard against them.
  3. Distributed systems: Distributed systems like Cassandra and MySQL NDB Cluster support horizontal scaling of their writer instances. Thus, when new nodes join the cluster, the combined (write and read) capacity can grow more linearly than in a Primary-Replica model.
  4. Static files: Static files, or static assets, are files which are not user specific. They can be of any content type, and usually do not require a web application to be served. Examples of common static files include HTML, images, JavaScript and CSS files, fonts, audio and video. One characteristic of static files that deserves special attention is how frequently they are downloaded. If a page loads many static files, its load time increases, which can adversely affect user experience and SEO rankings. The simplest strategy to optimize the loading of static files is caching.
    If you choose to host the HTTP caching layer yourself (a self-hosted CDN), open-source software such as Nginx, Squid Cache and Varnish can be used. For organizations that prefer a SaaS caching solution, a commercial CDN is the answer. Commercial CDNs use proprietary technology to cache content at their Points of Presence (PoPs), or Edge Locations, to avoid a round trip to the origin server (you).
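The caching idea behind both self-hosted caches and commercial CDNs can be sketched as a small time-to-live (TTL) cache sitting in front of an origin. This is a toy model, not how Nginx, Squid, Varnish or any CDN is implemented; `TTLCache`, `fetch_from_origin`, the injected `clock` and the 60-second TTL are hypothetical names and values chosen for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Toy cache that serves a stored copy until it is older than ttl_seconds."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, bytes]] = {}  # path -> (stored_at, body)
        self.hits = 0
        self.misses = 0

    def get(self, path: str) -> bytes:
        now = self._clock()
        entry = self._store.get(path)
        if entry is not None and now - entry[0] < self._ttl:
            self.hits += 1
            return entry[1]
        # Missing or expired: go back to the origin and refresh the stored copy.
        self.misses += 1
        body = self._fetch(path)
        self._store[path] = (now, body)
        return body


if __name__ == "__main__":
    origin_calls = []

    def fetch_from_origin(path: str) -> bytes:  # hypothetical origin server
        origin_calls.append(path)
        return f"contents of {path}".encode()

    cache = TTLCache(fetch_from_origin, ttl_seconds=60)
    for _ in range(5):
        cache.get("/static/app.css")
    print(cache.hits, cache.misses, len(origin_calls))
```

Only the first request for a path reaches the origin while the copy is fresh; the rest are answered from the cache, which is the round trip the paragraph above describes avoiding.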
Scaling examples
Horizontally scaling VMs hosting stateless applications
In the diagram, we can see how Autoscaling frameworks add replicas of stateless applications to handle additional load.
Stateless applications do not store any dynamic data in their local storage. Hence, every replica of the application server can be considered identical. Therefore, requests can be served by any of these replicas.
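How an autoscaler might decide the replica count can be sketched with a simple proportional rule: scale the current count by the ratio of the observed metric to its target, then clamp the result to configured bounds. This is an illustrative approximation, not the exact algorithm used by EC2 Auto Scaling or any other provider; the function name, its parameters and the sample numbers are assumptions.

```python
import math


def desired_replicas(current: int, observed: float, target: float,
                     min_replicas: int, max_replicas: int) -> int:
    """Proportional scaling rule: keep the per-replica metric near its target."""
    if current < 1:
        raise ValueError("current must be at least 1")
    if target <= 0:
        raise ValueError("target must be positive")
    if min_replicas > max_replicas:
        raise ValueError("min_replicas cannot exceed max_replicas")
    # Round up so that the fleet is never sized below what the metric suggests.
    raw = math.ceil(current * observed / target)
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # Assumed example: 4 replicas averaging 90% CPU against a 60% target.
    print(desired_replicas(current=4, observed=90, target=60,
                           min_replicas=2, max_replicas=10))
```

The clamp to `max_replicas` matters for cost as much as for stability: it caps how much capacity, and therefore expense, a metric spike can trigger.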
If there is a fixed number of application servers, deployments can be rolled out to them in batches. While deploying to a server, the IE team must place it in maintenance mode and stop live traffic from reaching it. An alternative in cloud environments is to use disk snapshots of a base server to create replicas. Disk snapshots carry the risk of data inconsistencies if they are not taken correctly.
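The batch deployment procedure for a fixed fleet can be sketched as follows. The callbacks `drain`, `deploy` and `restore` are hypothetical stand-ins for whatever your load balancer and deployment tooling provide, and the server names and batch size are likewise assumptions.

```python
from typing import Callable, Iterator, List, Sequence


def batches(servers: Sequence[str], size: int) -> Iterator[List[str]]:
    """Split the fleet into consecutive groups of at most `size` servers."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(servers), size):
        yield list(servers[start:start + size])


def rolling_deploy(servers: Sequence[str], batch_size: int,
                   drain: Callable[[str], None],
                   deploy: Callable[[str], None],
                   restore: Callable[[str], None]) -> None:
    """Deploy batch by batch so the rest of the fleet keeps serving traffic."""
    for batch in batches(servers, batch_size):
        for server in batch:
            drain(server)    # maintenance mode: stop live traffic to this server
        for server in batch:
            deploy(server)   # install the new release
        for server in batch:
            restore(server)  # put the server back into rotation


if __name__ == "__main__":
    fleet = ["app-1", "app-2", "app-3", "app-4", "app-5"]  # hypothetical names
    rolling_deploy(fleet, 2,
                   drain=lambda s: print("drain  ", s),
                   deploy=lambda s: print("deploy ", s),
                   restore=lambda s: print("restore", s))
```

A smaller batch size keeps more capacity in service during the rollout at the cost of a longer deployment.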
Vertically Scaling EC2 instances hosting stateful applications
A resource is scaled vertically by upgrading it to a higher hardware configuration. Here, a 2-core, 4 GB memory instance is upgraded to an 8-core, 16 GB memory instance.
Vertical scaling is required in two scenarios: first, if the application is stateful and stores data on its local disk; second, if the application is stateless and horizontally scaled, but each scaling unit requires more resources to run correctly.
The below factors decide the appropriate choice of VM:
Scaling stateful applications horizontally using replicas
Multiple readers/replicas apply updates from the writer/primary DB continuously. Readers can offload read-only traffic from the writer. Although the figure shows readers in separate zones, it is possible to create multiple readers in the same zone.
Scaling is one benefit of replication, apart from data redundancy. It is possible to horizontally scale reader capacity by placing a load balancer before the replicas.
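Read/write splitting with a guard against stale reads can be sketched as below. The router sends writes to the primary, spreads reads across replicas in round-robin order (the role a load balancer would play), and pins a client's reads to the primary for a short window after that client writes. Pinning is one common precaution against read-after-write anomalies, not the only one. All names (`ReplicaRouter`, `pin_seconds`, the endpoint strings) are hypothetical, and the sketch assumes replication lag stays below the pin window.

```python
import itertools
import time
from typing import Callable, Dict, Sequence


class ReplicaRouter:
    """Route writes to the primary and reads to replicas, with read-your-writes pinning."""

    def __init__(self, primary: str, replicas: Sequence[str], pin_seconds: float = 2.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if not replicas:
            raise ValueError("at least one replica is required")
        self._primary = primary
        self._replicas = itertools.cycle(replicas)  # simple round-robin
        self._pin = pin_seconds
        self._clock = clock
        self._last_write: Dict[str, float] = {}     # client_id -> time of last write

    def route_write(self, client_id: str) -> str:
        self._last_write[client_id] = self._clock()
        return self._primary

    def route_read(self, client_id: str) -> str:
        last = self._last_write.get(client_id)
        if last is not None and self._clock() - last < self._pin:
            # The replicas may not have applied this client's write yet.
            return self._primary
        return next(self._replicas)


if __name__ == "__main__":
    router = ReplicaRouter("primary-db", ["replica-a", "replica-b"])
    print(router.route_read("alice"))   # served by a replica
    print(router.route_write("alice"))  # served by the primary
    print(router.route_read("alice"))   # pinned to the primary for pin_seconds
```

The trade-off is that pinned reads add load to the writer, so the window should be no longer than the replication lag you actually observe.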
Scaling static files performance using CDN
Illustration of how a CDN provider's edge locations fetch data from the origin server and, if required, cache the assets.
CDNs have geographically spread caching infrastructure, with hundreds of Points of Presence (PoPs).[1][2][3] Based on the configuration at the CDN and the Content-Type header, the CDN provider caches static content physically closest to the end user.
Caching content at the CDN reduces traffic at the origin, which lowers the resources that must be allocated there and hence the expenses. CDNs have very high throughput capacity, on the order of hundreds of Tbps. Thus, they can typically also help defend your site against DDoS attacks.
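The effect on origin traffic is simple arithmetic: the origin only sees the requests that miss the cache, that is, total requests multiplied by (1 - cache hit ratio). A minimal sketch, in which the function name and the sample figures are assumptions rather than measurements:

```python
def origin_requests(total_requests: int, cache_hit_ratio: float) -> int:
    """Requests that still reach the origin for a given CDN cache hit ratio."""
    if total_requests < 0:
        raise ValueError("total_requests cannot be negative")
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    return round(total_requests * (1.0 - cache_hit_ratio))


if __name__ == "__main__":
    # Assumed example: one million requests at a 75% cache hit ratio.
    print(origin_requests(1_000_000, 0.75))
```

Raising the hit ratio from 75% to 90% in this model cuts origin traffic by more than half, which is why cache configuration is often the cheapest scaling lever for static content.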
References
  1. https://www.cloudflare.com/en-in/network/
  2. https://www.fastly.com/network-map
  3. https://www.akamai.com/why-akamai/global-infrastructure