Service Continuity
This section provides additional details regarding service continuity in Liqo. It reports the main architectural design choices and the options to better handle possible failures of components of the multi-cluster setup (e.g., control plane, nodes, network, Liqo pods, etc.).
For simplicity, we consider a simple consumer-provider setup, where the consumer/local cluster offloads an application to a provider/remote cluster. Since a single peering is unidirectional and involves two clusters, all the following considerations can be extended to more complex setups involving bidirectional peerings and/or multiple clusters.
High-availability Liqo components
Liqo allows deploying the most critical Liqo components in high availability. This is achieved by running multiple replicas of the same component in an active/passive fashion. This ensures that, even in case of pod restarts or node failures, exactly one replica is always active while the remaining ones run on standby.
The supported components (pods) in high availability are listed below (a configuration sketch follows the list):
- liqo-controller-manager (active-passive): ensures the Liqo control plane logic is always enforced. The number of replicas is configurable through the Helm value controllerManager.replicas.
- wireguard gateway server and client (active-passive): ensures no cross-cluster connectivity downtime. The number of replicas is configurable through the Helm value networking.gatewayTemplates.replicas.
- webhook (active-passive): ensures the enforcement of Liqo resources is responsive, as at least one Liqo webhook pod is always active and reachable through its Service. The number of replicas is configurable through the Helm value webhook.replicas.
- virtual-kubelet (active-passive): improves VirtualNode responsiveness when the leading virtual-kubelet fails or is restarted. The number of replicas is configurable through the Helm value virtualKubelet.replicas.
- ipam (active-passive): ensures IP and Network management is always up and responsive. The number of replicas is configurable through the Helm value ipam.internal.replicas.
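As an example, the following is a minimal sketch of how these replica counts could be set at install or upgrade time. The release name (liqo), chart reference (liqo/liqo), namespace, and replica counts are assumptions: adapt them to your deployment.

```bash
# Sketch: run the critical Liqo components with two replicas each
# (release name "liqo", chart "liqo/liqo", and namespace "liqo" are assumptions)
helm upgrade --install liqo liqo/liqo \
  --namespace liqo --create-namespace \
  --set controllerManager.replicas=2 \
  --set networking.gatewayTemplates.replicas=2 \
  --set webhook.replicas=2 \
  --set virtualKubelet.replicas=2 \
  --set ipam.internal.replicas=2
```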
Resilience to worker node failures
This section describes scenarios where one or more worker nodes are unavailable/unhealthy, with all control planes ready and the cross-cluster network up and running.
Worker node failure on the local cluster
Pods running on the local cluster are scheduled on regular worker nodes and therefore their entire lifecycle is handled by Kubernetes as explained in the official guide.
Worker node failure on the remote cluster
Offloaded pods are scheduled on the virtual node in the local cluster and run on regular worker nodes in the remote cluster. As explained in the pod offloading section, the ShadowPod abstraction guarantees remote pod resiliency (hence, service continuity) in case of unavailability of the local cluster, enforcing the presence of the desired pod (scheduled on a regular worker node) without requiring the intervention of the originating cluster.
If a remote worker node becomes NotReady, the Kubernetes control plane marks all pods scheduled on that node for deletion, leaving them in a Terminating state indefinitely (until the node becomes ready again or a manual eviction is performed). Due to design choices in Liqo, a pod that is (1) offloaded, (2) Terminating, and (3) running on a failed node is not replaced by a new one on a healthy worker node, as it would be in vanilla Kubernetes. The consequence is that, in case of a remote worker node failure, the actual workload of a deployment (i.e., the number of replicas actively running) could be lower than the desired one.
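For reference, a manual eviction boils down to a standard force deletion performed against the cluster hosting the failed node. The node, pod, and namespace names below are placeholders.

```bash
# List the pods stuck in Terminating on the failed node (placeholder name)
kubectl get pods --all-namespaces --field-selector spec.nodeName=<failed-node> -o wide

# Force-delete a Terminating pod so that its controller can recreate it on a healthy node
kubectl delete pod <pod-name> -n <namespace> --force --grace-period=0
```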
Since Liqo v0.7.0, it is possible to overcome this issue: you can configure Liqo to make sure the expected workload is always running on the remote cluster by setting the Helm value controllerManager.config.enableNodeFailureController=true at install/upgrade time.
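For instance, a minimal sketch of how the flag could be enabled on an existing installation (the release name, chart reference, and namespace are assumptions; adapt them to your setup):

```bash
# Sketch: enable the node-failure controller on an existing Liqo installation
# (release name "liqo", chart "liqo/liqo", and namespace "liqo" are assumptions)
helm upgrade liqo liqo/liqo \
  --namespace liqo \
  --reuse-values \
  --set controllerManager.config.enableNodeFailureController=true
```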
This flag enables a custom Liqo controller that watches for offloaded pods that are Terminating and run on NotReady nodes; any pod matching all these conditions is force-deleted by the controller.
This way, the ShadowPod controller will enforce the presence of the remote pod by creating a new one on a healthy remote worker node, therefore ensuring the expected number of replicas is actively running on the remote cluster.
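To observe this behavior, one could check on the remote cluster that a replacement pod has been scheduled on a healthy node. The namespace below is a placeholder, and the shadowpods resource name assumes the ShadowPod CRD is registered with that plural.

```bash
# On the remote cluster: the ShadowPods enforcing the offloaded workload
kubectl get shadowpods -n <offloaded-namespace>

# On the remote cluster: the NODE column should point to a healthy worker node
kubectl get pods -n <offloaded-namespace> -o wide
```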
As explained in the pod reflection section, the local cluster has feedback on what is happening in the remote cluster: the remote pod status is propagated to the local pod, and the number of container restarts is incremented to account for possible deletions of the remote pod (e.g., when the Liqo controller force-deletes a Terminating pod on a failed node).
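On the local cluster, this feedback can be inspected with standard tooling; the namespace below is a placeholder.

```bash
# On the local cluster: STATUS and RESTARTS mirror the remote pod,
# while the NODE column shows the virtual node the pod is offloaded through
kubectl get pods -n <namespace> -o wide
```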
Warning
Enabling the controller can have some minor drawbacks: when the pod is force-deleted, the resource is removed from the Kubernetes API server. This means that, in the (rare) case the failed node becomes ready again without an OS restart, the containers of the pod will not be gracefully deleted, because their entry is no longer present in the API server. The side effect is that zombie processes associated with the pod will remain on the node until the next OS restart or a manual cleanup.