As noted, application maturity and scale tend to drive distributed multi-cluster adoption. The question is: why? What are the advantages and benefits of a distributed topology over centralized clusters? As importantly, does the breadth of distribution matter? Are organizations that adopt a multi-region, multi-provider approach gaining advantage over those that don’t? Let’s dive in.
Availability and Resiliency
Every year there are reports of wide swaths of the internet ‘going dark’ and taking down well-known brands and applications. Invariably, those issues are traced back to problems within a particular provider network. Distributed multi-cluster deployments help mitigate those risks.
By mirroring workloads across clusters, you increase availability and resiliency by eliminating single points of failure. At its most basic, this means using one cluster as a backup or failover should another cluster fail. As you distribute clusters beyond a single data center or cloud instance to stretch across clouds and providers, you further minimize risk: if a single endpoint within a provider network fails, or even an entire provider network goes down, your application can fail over to other endpoints or providers.
Avoiding reliance on a single vendor is an operational mantra for many organizations, making the ability to distribute workloads not only across locations but also across providers a key advantage. This multi-vendor approach improves pricing flexibility and ensures better continuity and quality of service. It also helps mitigate data lock-in, whereby it can become prohibitively expensive to migrate data off a provider’s network.
In fact, a distributed multi-cluster, multi-vendor approach even helps mitigate managed Kubernetes lock-in, ensuring you are not committed to a particular provider’s version of Kubernetes or to the proprietary extensions of its managed Kubernetes service.
Performance and Latency
According to a recent survey of IT decision makers by Lumen, 86% of organizations identify application latency as a key differentiator. And the single best way to reduce latency is to reduce geographic distance by physically placing applications closer to the user base.
Distributed multi-cluster Kubernetes facilitates this strategy, allowing organizations to use an edge topology to process data and application interactions closer to the source. This becomes especially important – in fact, arguably a necessity – for applications that have a global user base, where multi-region edge deployments can geographically distribute workloads to best reduce latency while efficiently managing resources (e.g. spinning up/down infrastructure to adapt to “follow the sun” or other regional and ad hoc traffic patterns).
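To make this concrete, here is a hedged sketch of region-aware placement: each regional or edge cluster runs its own copy of a workload, pinned to nodes in that region via the standard topology.kubernetes.io/region node label (the application name, image, and region value below are placeholders):

```yaml
# Hypothetical per-region Deployment: each regional/edge cluster applies its own
# copy, with pods constrained to nodes labeled for that region.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: storefront-eu                 # placeholder name for the EU copy
spec:
  replicas: 3
  selector:
    matchLabels:
      app: storefront
  template:
    metadata:
      labels:
        app: storefront
    spec:
      nodeSelector:
        topology.kubernetes.io/region: eu-west-1   # example region label value
      containers:
      - name: storefront
        image: registry.example.com/storefront:1.4.2   # placeholder image
        ports:
        - containerPort: 8080
```

Scaling replicas up or down per region, or tearing a regional cluster down entirely, is then a local decision, which is what makes “follow the sun” style adjustments practical.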
Organizations that elect a centralized approach are, by definition, treating users outside the primary geography as second-class citizens when it comes to application performance. In fact, this is the primary consideration when organizations choose a particular cloud region for deployment (e.g. AWS’s us-east-1): where are most of my users based? Users within or close to that region enjoy a premium experience. The unspoken corollary is that as customers get geographically further from that region, application performance, responsiveness, resilience and availability naturally degrade. In short, this is the default “good enough” cloud deployment for application workloads that cater largely to a local user base. As these applications mature and adoption broadens, this strategy becomes increasingly tenuous, and organizations find themselves compelled to move workloads to the edge.
A closely related factor: running multiple distributed clusters also improves an organization’s ability to fine-tune and scale workloads as needed, providing greater flexibility than centralized clusters. This scaling is required when an application can no longer handle additional requests effectively, whether due to steadily growing volume or episodic spikes.
It’s important to note that scaling can happen horizontally (scaling out), by adding more machines to the pool of resources, or vertically (scaling up), by adding more power in the form of CPU, RAM, storage, etc. to an existing machine. There are advantages to each, and there’s something to be said for combining both approaches.
A distributed multi-cluster topology facilitates this flexibility in scaling. Among other things, when workloads run on different clusters/providers/regions, it becomes significantly easier to identify which particular workloads require scaling (and whether they are best served by horizontal or vertical scaling), to understand whether that scaling is provider- or region-dependent, and to ensure adequate resource availability while minimizing load on backend services and databases.
Those familiar with Kubernetes will recognize that one of its strengths is its ability to autoscale resources effectively in response to real-time changes in demand. Kubernetes doesn’t support just one autoscaler or autoscaling approach, but three:
- Horizontal pod autoscaling – The Horizontal Pod Autoscaler (HPA) adds or removes pod replicas in direct response to changes in demand for an application (a minimal manifest sketch follows this list).
- Cluster autoscaling – Rather than changing the number of running pods in a cluster, the cluster autoscaler changes the number of nodes in a cluster. This helps manage the costs of running Kubernetes clusters across cloud and edge infrastructure.
- Vertical pod autoscaling – The Vertical Pod Autoscaler (VPA) increases or decreases the CPU and memory resource requests of pod containers, with the goal of matching cluster resource allotment to actual usage.
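As a minimal sketch of the first approach, the HPA below adjusts replica count to hold average CPU utilization near a target; the target Deployment name, replica bounds, and threshold are illustrative:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: storefront-hpa          # placeholder name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: storefront            # hypothetical Deployment to scale
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70  # add or remove replicas to hold ~70% average CPU
```

In a multi-cluster deployment, each cluster would run its own copy of an object like this, tuned to that cluster’s regional traffic pattern.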
Worth noting for those running centralized workloads on single clusters: a multi-cluster approach may become a requirement for scaling if an application threatens to hit the Kubernetes ceiling on nodes per cluster, currently 5,000.
Another advantage of a distributed multi-cluster strategy is fine-grained control of costly compute resources. As mentioned above, this can be time-dependent, where global resources are spun down after hours; provider-dependent, where workloads take advantage of lower-cost resources in other regions; or performance-dependent, where services that aren’t particularly sensitive to performance or latency run on less performant, and less costly, compute resources.
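As one hedged illustration of the performance-dependent case, a latency-insensitive batch Job could opt into a cheaper, interruptible node pool via a nodeSelector and a toleration; the node-pool label and taint below are hypothetical and would match however your clusters mark low-cost capacity:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: nightly-report              # placeholder: a batch job that tolerates delays
spec:
  template:
    spec:
      nodeSelector:
        node-pool: low-cost         # hypothetical label on cheaper/spot nodes
      tolerations:
      - key: "low-cost"             # hypothetical taint keeping general workloads off these nodes
        operator: "Exists"
        effect: "NoSchedule"
      containers:
      - name: report
        image: registry.example.com/reports:0.9.0   # placeholder image
      restartPolicy: Never
```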
A multi-cluster deployment physically and conceptually isolates workloads from each other, blocking communication and sharing of resources. There are several reasons this is desirable. First, it improves security, mitigating risk by limiting the ability of a security issue with one cluster to impact others. It can also have implications for compliance and data privacy (more on that below). It further ensures that the compute resources you intend for a specific cluster are, in fact, available to that cluster, and not consumed by a noisy neighbor.
But by far the most common rationale behind workload isolation is the desire to separate development, staging/testing and production environments. By separating clusters, any change to one cluster affects only that particular cluster. This improves issue identification and resolution, supports testing and debugging, and promotes experimentation and optimization. In short, you can ensure a change is working as expected on an isolated cluster before rolling it into production.
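In day-to-day terms, this separation often shows up as a kubeconfig with one context per environment cluster, so a change can be applied and verified against staging before the production context is ever touched. The cluster names, endpoints, and user entry below are placeholders:

```yaml
# Hypothetical kubeconfig excerpt: one context per isolated environment cluster.
apiVersion: v1
kind: Config
clusters:
- name: dev-cluster
  cluster:
    server: https://dev.k8s.example.com       # placeholder endpoints
- name: staging-cluster
  cluster:
    server: https://staging.k8s.example.com
- name: prod-cluster
  cluster:
    server: https://prod.k8s.example.com
users:
- name: deployer                               # placeholder credential entry
  user: {}
contexts:
- name: dev
  context: {cluster: dev-cluster, user: deployer}
- name: staging
  context: {cluster: staging-cluster, user: deployer}
- name: prod
  context: {cluster: prod-cluster, user: deployer}
current-context: staging    # e.g. validate a change here before switching to prod
```

Switching targets is then an explicit step (kubectl config use-context prod), which makes it harder to accidentally roll an untested change into production.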
As noted, this isolation is an inherent benefit of multi-cluster vs single cluster deployments, and not particularly dependent on how widely distributed those clusters are – you could have isolated stage/test/production clusters in a centralized hosting environment, for example. That said, when you distribute multiple clusters, this isolation can have important implications.
Compliance is a critical consideration for many organizations. It can be closely related to isolation – workload isolation is required for compliance with standards such as PCI-DSS, for example – but it also has broader implications in regard to distributed or multi-region workloads.
Here, the issue is that different countries and regions have their own rules and regulations governing data handling, and a distributed multi-cluster topology facilitates a region-by-region approach to compliance. A notable example is the European Union’s General Data Protection Regulation (GDPR), which governs data protection and privacy for any user residing in the European Economic Area (EEA). Importantly, it applies to any company, regardless of location, that processes and/or transfers the personal information of individuals residing in the EEA. GDPR may regulate, for instance, how and where data is stored, something that is readily addressed through multiple clusters.
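As one illustration of how data residency can be enforced at the cluster level, the cluster serving EEA users might define a StorageClass whose allowedTopologies restricts volume provisioning to EU zones; the provisioner and zone values below are examples and depend on the underlying provider:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: eu-resident-storage         # placeholder name
provisioner: ebs.csi.aws.com        # example CSI driver; substitute your provider's
volumeBindingMode: WaitForFirstConsumer
allowedTopologies:
- matchLabelExpressions:
  - key: topology.kubernetes.io/zone
    values:                         # volumes may only be provisioned in these EU zones
    - eu-west-1a
    - eu-west-1b
```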
Similarly, other regulatory or governance requirements might stipulate that certain types of data or workloads must remain on-premises; a distributed multi-cluster topology allows an organization to meet this requirement while simultaneously deploying clusters elsewhere to address other business needs (to the edge for lower latency, for example).
Image source: DLA Piper Data Protection Laws of the World interactive tool