BGP Control Plane Health Check

In BGP mode without leader election, every kube-vip instance on every control-plane node announces the same VIP. Typically, an upstream BGP router uses ECMP (Equal-Cost Multi-Path) to distribute API traffic evenly across all announcing nodes.

If a node's kube-apiserver goes down while kubelet (and therefore the kube-vip static pod) remains running, kube-vip continues to advertise the BGP route. The upstream router keeps sending a share of API traffic to the dead apiserver, creating a black hole for that fraction of requests. This condition persists until either kube-vip is stopped or the kube-apiserver becomes healthy again.

kube-vip can poll a configurable HTTP(S) endpoint to gate the BGP route announcement. When the endpoint becomes unreachable or returns a non-200 status for a configurable number of consecutive checks, kube-vip withdraws the BGP route, removing the unhealthy node from the ECMP set. Once the endpoint recovers, the route is re-announced automatically.
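
The pass/fail rule described above can be sketched in a few lines of Python. This is only an illustration of the rule (HTTP 200 passes, anything else fails), not kube-vip's actual implementation, which is written in Go. The function name `probe` and its parameters are hypothetical.

```python
import http.client
import ssl
import urllib.error
import urllib.request


def probe(url, timeout_seconds=3.0, ca_path=None):
    """Return True only if a GET to `url` answers with HTTP 200 in time.

    Hypothetical helper for illustration; not part of kube-vip.
    """
    context = None
    if url.startswith("https://"):
        # With cafile=None the system trust store is used instead.
        context = ssl.create_default_context(cafile=ca_path)
    try:
        with urllib.request.urlopen(
            url, timeout=timeout_seconds, context=context
        ) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # 4xx/5xx statuses raise HTTPError (a URLError subclass); refused
        # connections, TLS verification failures, and timeouts land here too.
        return False
```

A caller would invoke `probe("https://localhost:6443/livez", ca_path="/etc/kubernetes/pki/ca.crt")` once per check period and feed the boolean result into the failure counter.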

The recommended endpoint is the Kubernetes API server's built-in /livez health check:

https://localhost:6443/livez

The health check is disabled by default. Set control_plane_health_check_address to enable it.

| Variable | Usage | Default |
| --- | --- | --- |
| control_plane_health_check_address | URL to poll (e.g. https://localhost:6443/livez). Empty disables the health check. | "" |
| control_plane_health_check_period_seconds | Seconds between health check requests | 5 |
| control_plane_health_check_timeout_seconds | Timeout in seconds for each HTTP request | 3 |
| control_plane_health_check_failure_threshold | Consecutive failures before the BGP route is withdrawn | 3 |
| control_plane_health_check_ca_path | Path to a PEM CA certificate for HTTPS verification. When empty, the system trust store is used. If your kube-apiserver cert uses a private CA, you'll need to point this to that CA's cert. | "" |

The same options are available as CLI flags for manifest generation:

| Flag | Usage | Default |
| --- | --- | --- |
| --controlPlaneHealthCheckAddress | URL to poll | "" |
| --controlPlaneHealthCheckPeriodSeconds | Seconds between checks | 5 |
| --controlPlaneHealthCheckTimeoutSeconds | Per-request timeout | 3 |
| --controlPlaneHealthCheckFailureThreshold | Failures before withdrawal | 3 |
| --controlPlaneHealthCheckCAPath | CA cert path for HTTPS | "" |

Once enabled, the health check drives the route announcement through the following stages:

  1. Startup -- kube-vip begins polling the health check address immediately. The BGP route is not announced until the first successful check. This prevents advertising a node whose apiserver hasn't started yet.
  2. Healthy -- each HTTP 200 response resets the consecutive failure counter. If the route was previously withdrawn, it is re-announced.
  3. Unhealthy -- any non-200 response, connection error, or timeout increments the failure counter. When the counter reaches control_plane_health_check_failure_threshold, the BGP route is withdrawn.
  4. Recovery -- the first successful check after a failure period resets the counter and re-announces the route.
  5. Shutdown -- on SIGTERM (e.g. when the static pod is stopped), kube-vip withdraws the BGP route before exiting, ensuring clean removal from the ECMP set. This means planned maintenance (draining a node, upgrading kubelet) also benefits from graceful route withdrawal.
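
The five stages above amount to a small state machine. The sketch below models them in Python to make the transitions concrete. It is an assumption-laden illustration, not kube-vip's code: the class `RouteGate` and its method names are hypothetical.

```python
class RouteGate:
    """Tracks consecutive failures and decides when to announce or withdraw.

    Hypothetical model of the documented behavior; not kube-vip's code.
    """

    def __init__(self, failure_threshold=3):
        self.failure_threshold = failure_threshold
        self.failures = 0
        self.announced = False  # Startup: nothing is advertised yet.

    def observe(self, healthy):
        """Feed one check result; return "announce", "withdraw", or None."""
        if healthy:
            # Healthy / Recovery: reset the counter, announce if needed.
            self.failures = 0
            if not self.announced:
                self.announced = True
                return "announce"
            return None
        # Unhealthy: count the failure, withdraw once the threshold is hit.
        self.failures += 1
        if self.announced and self.failures >= self.failure_threshold:
            self.announced = False
            return "withdraw"
        return None

    def shutdown(self):
        """Shutdown: withdraw the route if it is currently announced."""
        if self.announced:
            self.announced = False
            return "withdraw"
        return None


gate = RouteGate(failure_threshold=3)
actions = [gate.observe(ok) for ok in (False, True, False, False, False, True)]
print(actions)  # [None, 'announce', None, None, 'withdraw', 'announce']
```

The first failed check produces no action because nothing has been announced yet. The route is withdrawn only on the third consecutive failure and comes back on the very next success.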

Below is a kube-vip static pod manifest for BGP mode with the health check enabled.

When using HTTPS with a self-signed or private CA (typical for kubeadm clusters), you must set control_plane_health_check_ca_path to the path of the CA certificate (usually /etc/kubernetes/pki/ca.crt) and mount it into the pod. Without it, every health check fails TLS verification and kube-vip never announces the route.

apiVersion: v1
kind: Pod
metadata:
  name: kube-vip
  namespace: kube-system
spec:
  containers:
  - name: kube-vip
    image: ghcr.io/kube-vip/kube-vip:v1.1.2
    args:
    - manager
    env:
    - name: cp_enable
      value: "true"
    - name: vip_address
      value: "192.168.1.100"
    - name: vip_interface
      value: "eth0"
    - name: bgp_enable
      value: "true"
    - name: bgp_as
      value: "65000"
    - name: bgp_peers
      value: "192.168.1.1:65001::false"
    # Gate the BGP announcement on the local apiserver's /livez endpoint.
    - name: control_plane_health_check_address
      value: "https://localhost:6443/livez"
    # CA used to verify the apiserver certificate (mounted below).
    - name: control_plane_health_check_ca_path
      value: "/etc/kubernetes/pki/ca.crt"
    securityContext:
      capabilities:
        add:
        - NET_ADMIN
        - NET_RAW
    volumeMounts:
    - mountPath: /etc/kubernetes/pki/ca.crt
      name: health-check-ca
      readOnly: true
  hostNetwork: true
  volumes:
  # Expose the host's cluster CA certificate to the pod as a single file.
  - hostPath:
      path: /etc/kubernetes/pki/ca.crt
      type: File
    name: health-check-ca

This feature is designed for environments where kube-vip runs in BGP mode without leader election (every node announces the VIP).

The health check is not needed when:

  • Using leader election (ARP mode or BGP with vip_leaderelection=true), because only one node advertises the VIP at a time and the election mechanism handles failover.
  • Using the IPVS load balancer (lb_enable=true), which already has its own node health tracking.
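
The applicability rule above can be written as a small predicate over the environment settings. This is a hypothetical helper for reasoning about a configuration, assuming each setting is the string "true" or "false" as in the manifest. kube-vip does not expose such a function.

```python
def health_check_applies(env):
    """Return True for BGP mode without leader election or IPVS.

    `env` maps kube-vip environment variable names to string values.
    Hypothetical helper for illustration only.
    """
    def flag(name):
        # Unset settings are treated as "false".
        return env.get(name, "false").strip().lower() == "true"

    return (
        flag("bgp_enable")
        and not flag("vip_leaderelection")
        and not flag("lb_enable")
    )
```

For the manifest shown earlier, `health_check_applies({"bgp_enable": "true"})` returns True, while adding `"vip_leaderelection": "true"` makes it return False.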

The default values (period=5s, timeout=3s, threshold=3) mean a failing apiserver is removed from the ECMP set within approximately 15-18 seconds: 3 failed checks at 5-second intervals, plus up to 3 seconds for the final request to time out.

For faster detection, reduce the period and timeout (the threshold in this example stays at its default of 3):

- name: control_plane_health_check_period_seconds
  value: "1"
- name: control_plane_health_check_failure_threshold
  value: "3"
- name: control_plane_health_check_timeout_seconds
  value: "1"

This reduces detection time to approximately 3-4 seconds, but increases load on the apiserver.
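
The detection-time estimates in this section follow a simple model: the threshold multiplied by the period, plus at most one timeout. The sketch below reproduces that arithmetic and the resulting probe rate. It mirrors the reasoning in this document rather than kube-vip's internal timing, and both function names are hypothetical.

```python
def detection_window(period_seconds, timeout_seconds, failure_threshold):
    """Return (lower, upper) bounds in seconds for route withdrawal.

    Simplified model: `failure_threshold` checks spaced `period_seconds`
    apart, with up to one extra `timeout_seconds` for a hanging request.
    """
    lower = failure_threshold * period_seconds
    upper = lower + timeout_seconds
    return lower, upper


def checks_per_minute(period_seconds):
    """Probe requests each node sends to its apiserver per minute."""
    return 60 / period_seconds


print(detection_window(5, 3, 3))  # (15, 18) with the defaults
print(detection_window(1, 1, 3))  # (3, 4) with the faster settings
```

With the faster settings, each node sends 60 probes per minute instead of 12, which is the extra apiserver load mentioned above.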

Be careful not to set control_plane_health_check_failure_threshold too low. Transient connection problems or brief CPU spikes on the node can make a single health check fail. With a low threshold, such momentary failures cause route flapping: the BGP route is withdrawn prematurely and re-announced shortly afterwards, which destabilizes routing.
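
To see the effect of the threshold on flapping, the sketch below replays a sequence of check results and counts how often the route would be withdrawn. It follows the counter behavior described in this document. The function name and the sample sequence are made up for illustration.

```python
def count_withdrawals(results, failure_threshold):
    """Count route withdrawals for a sequence of check results.

    `results` is an iterable of booleans (True = HTTP 200).
    Hypothetical simulation; not kube-vip's code.
    """
    failures = 0
    announced = False
    withdrawals = 0
    for healthy in results:
        if healthy:
            failures = 0
            announced = True
        else:
            failures += 1
            if announced and failures >= failure_threshold:
                announced = False
                withdrawals += 1
    return withdrawals


# A mostly healthy apiserver with two short blips.
blips = [True, False, True, True, False, False, True, True]
print(count_withdrawals(blips, 1))  # 2
print(count_withdrawals(blips, 3))  # 0
```

With a threshold of 1, both blips withdraw the route. With the default of 3, neither blip lasts long enough to trigger a withdrawal.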