Multi-Cluster Communication – a Multi-Mesh Approach
Current solutions for cross-cluster networking all use VPNs and a single control plane, with all the failure and latency problems that result. At Tetrate we’ve designed a scheme for using one Istio mesh per cluster to do cross-cluster routing. This post presents an open-source tool, Coddiwomple, which automates generation of the Istio config needed to enact this scheme.
For the purposes of this post, I’ll define a “cluster” to be any set of workloads connected at Layer 3. This describes a Kubernetes cluster and its overlay network; it also describes a bunch of EC2 instances in a VPC, and so on.
There is currently documentation for one way to enable inter-cluster communication on the Istio site. This approach uses a single Istio control plane across all of the clusters. In turn, that requires all of the clusters’ networks to be reachable from each other: either globally routable, VPN’d together, or (to “cheat”) placed in gcloud VPCs, which can all reach each other.
Cross-Cluster Routing, the Right Way
At Tetrate, we believe that there should be exactly one mesh per cluster, i.e. per connected IP network. Equivalently, we believe there should be one Istio control plane per cluster. Further, we believe these control planes shouldn’t co-ordinate with each other. That is, they shouldn’t sync config between themselves; rather, each one should have all the config it needs to get a request just to the next hop, much as the Internet is designed to be fault-tolerant. Our reasons for this are failure blast radius, cache locality, and avoiding the hard distributed systems problems of trying to provide strong consistency of service information across the globe. I won’t go into the technical details here, but my colleague Zack Butcher will soon provide a deep-dive in an accompanying post.
Clearly, network connectivity between clusters is still needed, but not between the workloads themselves. Minimally, only a connection between Istio egress and ingress gateways is needed.
Cross-Cluster Routing with Istio
So how would we configure the Istio mesh to do this? Let’s recap what the main networking resources in Istio do:
- ServiceEntry – says which (host) names exist (i.e. are allowed to be used) within the mesh
- Gateway – opens a port to the outside world (specifically, on the istio-ingressgateway service). Similar to an external ServiceEntry, Gateway says which names to expect, on a particular port, from outside the mesh.
- VirtualService – says how to route to those names. I.e. when a request addresses a hostname, look at the other headers and decide which workload to send it to. VirtualServices “bind” to gateways; their routing only applies to traffic crossing that Gateway. For traffic ingressing into the mesh, this is a named Gateway object; for internal traffic, it’s the magic (and default) gateway “mesh”.
To set up the behaviour we want, we will use those resources as follows. Taking the example of one service, foo, which we want to publish from one cluster, and call from another:
- In the calling cluster, a ServiceEntry which makes the foo.global name exist in Istio, and gives the remote ingress as the next-hop address (an alternative to binding a VirtualService directly).
- In the publishing cluster, a Gateway which exposes the service to the internet.
- Also in the publishing cluster, a VirtualService, bound to our Gateway, which understands that traffic on the gateway port, with host header foo.global, should be sent to the local Kubernetes Service foo.
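To make the three resources above more concrete, here is a minimal sketch of what they might look like. This is an illustration only, not Coddiwomple’s actual output: the resource names, the port (80), the `default` namespace, and the ingress address 203.0.113.10 are all assumptions made for the example.

```yaml
# --- Calling cluster ---
# Makes foo.global exist in this mesh and points it at the remote ingress.
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: foo-global            # hypothetical name
spec:
  hosts:
  - foo.global
  ports:
  - number: 80                # assumed port
    name: http
    protocol: HTTP
  resolution: STATIC
  endpoints:
  - address: 203.0.113.10     # placeholder: the publishing cluster's ingress IP
    ports:
      http: 80
---
# --- Publishing cluster ---
# Opens a port on the ingress gateway and says which name to expect on it.
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: foo-gateway           # hypothetical name
spec:
  selector:
    istio: ingressgateway     # label on the default Istio ingress gateway pods
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - foo.global
---
# Routes traffic arriving on that Gateway for foo.global to the local Service.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: foo-ingress           # hypothetical name
spec:
  hosts:
  - foo.global
  gateways:
  - foo-gateway
  http:
  - route:
    - destination:
        host: foo.default.svc.cluster.local   # assumes foo lives in "default"
        port:
          number: 80
```

The first resource would be applied to the calling cluster and the other two to the publishing cluster; neither control plane needs to know anything about the other beyond the next-hop address.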
.global names are meant to be just that: global. They should work in every cluster, even the one where the service is running, so that other services can address with that name from anywhere. To make this work, we can use two more Istio resources:
- Another ServiceEntry, making foo.global exist in the local (publishing) cluster
- Another VirtualService, this time bound to the gateway “mesh”, that routes internal traffic heading for that name to the local Kubernetes Service.
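As a rough sketch of those two extra resources in the publishing cluster (again with assumed names, port, and namespace, and not necessarily what Coddiwomple generates):

```yaml
# Makes foo.global a known name inside the publishing cluster's own mesh.
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: foo-global-local      # hypothetical name
spec:
  hosts:
  - foo.global
  ports:
  - number: 80                # assumed port
    name: http
    protocol: HTTP
  resolution: NONE            # the VirtualService below picks the destination
---
# Bound to the built-in "mesh" gateway, so it applies to sidecar-to-sidecar
# traffic rather than to traffic entering through the ingress gateway.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: foo-internal          # hypothetical name
spec:
  hosts:
  - foo.global
  gateways:
  - mesh
  http:
  - route:
    - destination:
        host: foo.default.svc.cluster.local   # assumes foo lives in "default"
        port:
          number: 80
```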
Remember also that these new .global names we’re asking our services to use don’t actually exist; they’re not Kubernetes Services, so they’re not in DNS. This will cause the services to error when they try to resolve them, as they’ll always do first before trying to send a request (unaware that Istio will just ignore the IP address they use). So one last thing we need to do is to augment the clusters’ DNS servers to respond with some arbitrary address for .global names. Implementations vary by environment, but you’ll see how this is done in GKE in the example below.
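On a cluster whose DNS is served by kube-dns, one possible way to do this is a stub domain that forwards every .global lookup to a resolver of your own. The sketch below is an assumption about how that could look, not a copy of the demo’s configuration: the address 10.0.0.10 is a placeholder for a hypothetical resolver that you would have to run yourself and that answers any .global query with some fixed address.

```yaml
# kube-dns reads its stub-domain configuration from this ConfigMap.
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-dns
  namespace: kube-system
data:
  # Forward all lookups under "global" to the resolver at 10.0.0.10
  # (placeholder address for a hypothetical resolver).
  stubDomains: |
    {"global": ["10.0.0.10"]}
```

The address returned for a .global name does not need to be meaningful, since the sidecar routes the request by its hostname rather than by the resolved IP.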
As you can see from the diagrams above, the Istio configuration needed to achieve the setup I’ve advocated for is complicated. It would be tedious to write the necessary resources (YAML files) out by hand. Thus we’ve written Coddiwomple, an open-source tool to auto-generate them, based on some simple descriptions of your clusters and services.
Coddiwomple (v.) Origin: English Slang Word. Definition: To travel in a purposeful manner towards a vague destination.
Using Coddiwomple itself should be fairly simple. There are comprehensive instructions on using Coddiwomple in its GitHub repo.
That guide assumes you have two or more Kubernetes clusters, with Istio installed and working. This infrastructure setup is reasonably involved, and is outside the scope of this post or the README. To run through a complete Coddiwomple example yourself, see this demo script, which uses a modified bookinfo split across two clusters as shown below:
Try it yourself!
Hopefully this post has convinced you that shared global control planes and old-fashioned VPNs are a bad way to achieve multi-cluster networking. I’ve presented a new approach, with isolated failure domains and no global consistency problems. This follows the same resilient design as the internet, where each network only knows about its next hop.