Scalable architecture patterns

What does scalability mean for systems and services?

A scalable system is one that can handle rapid changes to workloads and user demands. Scalability is the measure of how well that system responds to changes by adding or removing resources to meet demands. The architecture is the hardware, software, technology and best practices used to build the networks, applications, processes, and services that make up your entire system.

Your system, which includes the architecture, services, products, and everything that defines your brand, is considered scalable when:

  • It can add resources and scale up to seamlessly handle increased customer demand and larger workloads.
  • It can easily and seamlessly remove resources when demand and workloads decrease.

The idea is to build a system that can adjust capacity to meet constantly changing demand. The system needs to be highly accessible, and it needs to be available to all of your customers whenever and wherever they need it. 

For example, a well-designed, scalable website will function just as well whether one or thousands of users concurrently access it. There should not be any perceptible decrease in functionality as more users log on.

What is a scalability pattern?

Have you ever played with LEGO® bricks? Maybe you tried to build a structure without following the instructions only to have it collapse? But when you followed the meticulously illustrated instructions, you ended up with a solid structure that would collapse only if you deliberately pulled the bricks apart. The building techniques shown in the instructions have been tested and proven to be solutions to common structural problems encountered by many builders.

Architectural patterns in computing development are similar to the LEGO® building techniques found in the instructions. They are a set of development and programming techniques that have proven to solve common problems in computer system development. These patterns have good design structures, have well-defined properties, and have successfully solved problems in the past. 

But this does not mean that every scalability pattern will work for you. Your challenge is to select appropriate patterns and tailor them to solve problems that are unique to your system. Scalability patterns save you time because a lot of the work has been done for you.

What are some common scalability patterns?

There are several scalable architecture patterns to choose from. Here we’ll discuss some of the more common patterns used to solve architectural scaling problems. 

AKF scale cube

This is a three-dimensional model that defines three approaches to scaling along the X, Y, and Z axes.

X-axis scaling

The X-axis describes scaling through multiple instances of the same component. You do this by cloning or replicating a service, application, or a set of data behind a load balancer. So if you have N clones of an application running, each instance handles 1/N of the load.

X-axis scaling patterns are easy to implement and increase transaction scalability. But they can be costly to maintain because entire data sets are replicated across multiple servers, and the total memory spent on data caches grows with every instance you add.

Y-axis scaling

Scaling on the Y-axis is defined by the splitting or segmentation of dissimilar components into multiple macro or micro services along verb or noun boundaries. For example, a verb-based segment might define a service such as checkout. A noun-based segment might describe the shopping cart.

Each service can be scaled independently, which allows you to apply more resources only to the services that currently need them. To ensure high availability, each service should have its own, non-shared data set. 

The non-shared data sets help you to achieve fault isolation so you can quickly and easily diagnose and fix problems without scanning the whole system. But the Y-axis takes a long time to set up and is not very easy to implement.
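As a rough illustration of a Y-axis split, the sketch below routes requests to a service by the first segment of the request path and resizes one service without touching the others. The service names, instance names, and data store names are all hypothetical; a real system would use a service registry or API gateway instead of a dictionary.

```python
# Hypothetical registry: each functional split (Y-axis) has its own pool of
# instances and its own, non-shared data store. All names are illustrative.
SERVICES = {
    "checkout": {
        "instances": ["checkout-1", "checkout-2", "checkout-3"],
        "datastore": "checkout-db",
    },
    "cart": {
        "instances": ["cart-1"],
        "datastore": "cart-db",
    },
}


def route(path: str) -> dict:
    """Return the service entry that owns the first segment of a request path."""
    segment = path.strip("/").split("/")[0]
    if segment not in SERVICES:
        raise KeyError(f"no service owns /{segment}")
    return SERVICES[segment]


def scale(service: str, count: int) -> None:
    """Resize one service's pool; no other service is affected."""
    SERVICES[service]["instances"] = [f"{service}-{i}" for i in range(1, count + 1)]
```

Calling `scale("cart", 4)` grows only the cart pool, which is the point of the split: the checkout instances and the checkout data store stay exactly as they were.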

Z-axis scaling

While the Y-axis splits up dissimilar components, the Z-axis is used to address the segmentation of similar components in your system. Each server runs an identical copy of the code but only for a subset (or shard) of the data.

A common use of a Z-axis split is customer by geographic location or customer by type. For example, paying customers on a subscription news site might have unlimited, 24/7 access to all site content. Non-paying customers of the same site have access to this same data, but they might be limited to opening and reading only three or four articles per month. 

This setup can reduce operational costs because the data segments are smaller and require fewer storage resources. But Z-axis scaling takes a lot of time to design and implement, and it requires a lot of automation to reduce system overhead.
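A Z-axis split by geographic location might look something like the sketch below. Every shard would run the same code, and a lookup on a customer attribute decides which shard serves the request. The host names, the `region` field, and the fallback region are assumptions made for illustration only.

```python
# Hypothetical shard map: identical code runs on every shard, but each shard
# owns the data for one customer segment (here, a geographic region).
SHARDS = {
    "eu": "app-eu.example.internal",
    "us": "app-us.example.internal",
    "apac": "app-apac.example.internal",
}
DEFAULT_REGION = "us"  # assumed fallback when the region is missing or unknown


def shard_for(customer: dict) -> str:
    """Pick the shard that holds this customer's segment of the data."""
    region = customer.get("region", DEFAULT_REGION)
    return SHARDS.get(region, SHARDS[DEFAULT_REGION])
```

The same lookup would work for a split by customer type: replace the region keys with tiers such as paying and non-paying subscribers.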

Horizontal and vertical scalability patterns

You can scale up (vertically), or you can scale out (horizontally). 

Vertical scaling

Vertical scaling is when you increase the capability of a component, such as a server, to improve its performance. For example, as more traffic hits your server, its performance will decrease. Adding more RAM and storage drives increases the server’s performance so it can more easily handle increased traffic.

Vertical scaling is easy to implement, easy to maintain, and can save money because you upgrade the machines you already have instead of buying new servers. But a vertically scaled server can be a single point of failure that could result in significant downtime.

A company just starting out might use vertical scaling to help keep costs down. But the vertical approach will eventually reach RAM and storage limits, and you will need to add more resources to keep up with demand.

Horizontal scaling

Horizontal scaling is when you increase performance capacity by adding more of the same type of resources to your system. For example, instead of increasing one server’s capacity to improve its performance, you add more servers to the system. A load balancer helps to distribute the workload to different servers depending on their availability. This increases the overall performance of the system.

More computing resources means more fault tolerance and fewer risks of downtime. But adding servers and load balancers can be expensive. You should consider using a combination of on-premises resources and cloud resources to handle increased traffic.
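To make the scale-out decision concrete, here is a minimal sketch of the arithmetic: divide the expected load by what one server can handle and round up. The function name, the per-server capacity figures, and the minimum of two servers (so that one failure does not take the service down) are assumptions for illustration, not a sizing rule.

```python
import math


def servers_needed(
    requests_per_second: float,
    capacity_per_server: float,
    min_servers: int = 2,  # assumed floor to avoid a single point of failure
) -> int:
    """Estimate how many identical servers are needed for a given load."""
    if capacity_per_server <= 0:
        raise ValueError("capacity_per_server must be positive")
    return max(min_servers, math.ceil(requests_per_second / capacity_per_server))
```

With servers that each handle 200 requests per second, a load of 950 requests per second calls for five servers, and a quiet period of 100 requests per second falls back to the assumed minimum of two.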

Load balancing

Load balancers efficiently distribute user requests and workloads across a group of backend servers. The idea is to balance the work among various resources so that no single resource is overloaded. Load balancing helps your IT department to ensure the availability and scalability of your services.

A load balancer’s tasks include:

  • Discovering which resources are available in the system.
  • Checking resource health to determine whether the available resources are working properly and can handle the workload. If an available server has become corrupted, the load balancer should shut down the path to it and switch to another server without the user noticing any lag or downtime.
  • Determining which algorithm should be used to balance the work across multiple healthy backend servers.
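The health-checking task above can be sketched as follows. This is a simplified model, not how any particular load balancer works: the `Backend` class, the `record_check` function, and the threshold of three consecutive failures are all hypothetical.

```python
from dataclasses import dataclass

FAILURE_THRESHOLD = 3  # assumed: failed checks in a row before removal


@dataclass
class Backend:
    name: str
    healthy: bool = True
    consecutive_failures: int = 0


def record_check(backend: Backend, ok: bool) -> None:
    """Update a backend's health from the result of one health check."""
    if ok:
        backend.consecutive_failures = 0
        backend.healthy = True
        return
    backend.consecutive_failures += 1
    if backend.consecutive_failures >= FAILURE_THRESHOLD:
        backend.healthy = False


def available(backends: list) -> list:
    """Return only the backends that should currently receive traffic."""
    return [b for b in backends if b.healthy]
```

Requiring several failures in a row before removing a server is one way to avoid dropping a backend because of a single slow response. A successful check puts the server back into rotation.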

Common load balancing methods (algorithms) include:

  • Least connection method: Traffic is routed to the server that has the fewest active connections.
  • Least response time method: The load balancer measures the amount of time the server takes to respond to a health monitoring request. Traffic is sent to the healthiest server with the lowest response time. Some load balancers will consider active connections with this algorithm. 
  • Round robin method: Traffic is sent to the first available server regardless of its current workload and active connections. After that server receives and works on the request, the load balancer moves it to the bottom of the queue. The risk is that a server that receives processor-intensive requests can still be working hard on previous requests when it reaches the top of the queue again.
  • Hashing methods: The decision of which server will receive the request depends on a hash of data from the incoming packet. This data can include information such as IP address, port number, or domain name.
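Three of the methods above can be sketched in a few lines each. The function names and the choice of SHA-256 for hashing are assumptions made for this example; real load balancers implement these methods with more bookkeeping, such as weights and health state.

```python
import hashlib
from itertools import cycle


def round_robin(servers: list):
    """Return an iterator that hands out servers in a fixed rotation,
    regardless of how busy each one currently is."""
    return cycle(servers)


def least_connections(active: dict) -> str:
    """Pick the server with the fewest active connections.
    `active` maps server name to connection count; ties go to the first listed."""
    return min(active, key=active.get)


def hash_route(client_ip: str, servers: list) -> str:
    """Pick a server from a hash of request data (here, the client IP),
    so the same client keeps landing on the same server."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]
```

For example, `least_connections({"a": 5, "b": 2, "c": 7})` returns `"b"`, while `next()` on the `round_robin` iterator cycles through the list without looking at load. That is the weakness of round robin described above.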

Caching: Content Delivery Networks (CDN)

A CDN is a global network of servers that are used to optimize and speed up access to and distribution of static web properties. Static properties are things such as JavaScript, CSS, images, and other media files that don't change very often.

As much as 80% of a website might be made up of static content. For example, the videos available on a streaming service don’t change very often. Some of their videos become very popular and can get millions of hits every day. Offloading that static content to a CDN reduces the load on the origin server and moves the data closer to customers around the world, which makes it faster to access and highly available.
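The way a CDN edge server takes load off the origin can be modeled as a cache with an expiry time. The sketch below is a toy model under stated assumptions: the `EdgeCache` class, the one-hour default time to live, and the `fetch_from_origin` callback are hypothetical and do not reflect any specific CDN product.

```python
import time


class EdgeCache:
    """Toy model of one CDN edge server that caches static files."""

    def __init__(self, fetch_from_origin, ttl_seconds=3600.0, clock=time.monotonic):
        self._fetch = fetch_from_origin  # callable: path -> file body
        self._ttl = ttl_seconds          # assumed time to live per file
        self._clock = clock              # injectable so expiry can be tested
        self._store = {}                 # path -> (expires_at, body)
        self.origin_hits = 0             # how often the origin was contacted

    def get(self, path):
        now = self._clock()
        entry = self._store.get(path)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: the origin server is not touched
        body = self._fetch(path)  # miss or expired: go back to the origin
        self.origin_hits += 1
        self._store[path] = (now + self._ttl, body)
        return body
```

However many times a popular file is requested within its time to live, the origin is contacted only once per edge server.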


Microservices

Microservices are essentially a set of small applications that work together. Each microservice has its own purpose and responsibility, and different teams can develop each one independently of the others. Microservices don’t depend on each other to function, but they do need to be able to communicate with each other.

Microservices are easy to scale because you only need to scale those that currently need it. They can be deployed independently without coordination with various development teams. Microservices work well for web applications, rapid development and deployment, and teams that are spread out across the globe.

Microservices efficiently scale transactions and large data sets, and they help you to create fault isolation, which keeps your systems highly available. In addition, because large disjointed features can be broken up into smaller services, the complexity of your codebase is reduced.


Database sharding

Sharding is essentially splitting a large database into smaller, more manageable and scalable components. As a database gets bigger, more requests and transactions are made on it, which slows down the response time on database queries. A huge database can also be very costly to maintain.

A shard is an individual database partition. To spread the workload, these partitions can exist on and be spread across multiple distributed database servers. 

You might want to use sharding when your databases get too big because:

  • Smaller databases are easier to manage.
  • Smaller databases are faster and each individual shard can outperform a single large database.
  • Smaller databases scale more easily as new data shards can be created and distributed across multiple servers.
  • Sharding can reduce costs because you don’t need huge, expensive servers to host them.
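One common way to decide which shard holds a record is to hash the record's key. The sketch below uses in-memory dictionaries to stand in for separate database servers; the `ShardedStore` class and the use of SHA-256 are assumptions for illustration.

```python
import hashlib


class ShardedStore:
    """Toy key-value store that spreads records across several shards."""

    def __init__(self, shard_count: int):
        if shard_count < 1:
            raise ValueError("shard_count must be at least 1")
        # Each dict stands in for one database partition on its own server.
        self._shards = [{} for _ in range(shard_count)]

    def _shard_index(self, key: str) -> int:
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self._shards)

    def put(self, key: str, value) -> None:
        self._shards[self._shard_index(key)][key] = value

    def get(self, key: str):
        # Only one shard is consulted, however large the whole data set is.
        return self._shards[self._shard_index(key)].get(key)

    def shard_sizes(self) -> list:
        return [len(shard) for shard in self._shards]
```

One caveat with this simple modulo scheme is that changing the number of shards moves most keys to a different shard. Techniques such as consistent hashing are often used to limit that reshuffling.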

Implementing scalable architecture patterns should help your system maintain its performance as its input load increases.
