OxDEAD Unicornz

Have you ever seen so many?

GCE and Self-Hosted K8s 1.6 with Calico/Flannel: NoRouteCreated

There is an annoying bug in Kubernetes 1.6 running on GCE with Calico/Flannel networking operating via the CNI plugin interface. You may experience the same issue if the kubelet on your nodes runs with the options

--network-plugin=cni --cloud-provider=gce
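To check whether a given node is running with this combination, a small helper can inspect the kubelet command line. This is a sketch under assumptions: `uses_affected_flags` is a hypothetical name, it only recognizes the `--flag=value` spelling shown above (not `--flag value`), and the usage line assumes procps `ps`, as found on most Linux distributions.

```bash
#!/bin/bash
# Hypothetical helper: succeeds (exit 0) if the given kubelet command line
# contains both flags associated with the NoRouteCreated issue.
# Only the "--flag=value" form is recognized.
uses_affected_flags() {
    # Pad with spaces so each flag can be matched as a whole word.
    local args=" $1 "
    [[ $args == *" --network-plugin=cni "* && $args == *" --cloud-provider=gce "* ]]
}

# Example usage on a node:
# if uses_affected_flags "$(ps -o args= -C kubelet)"; then echo "node may be affected"; fi
```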

When a new node is added to the k8s cluster, it's recognized as 'Ready', but no pods except the Calico/Flannel pod get scheduled there.

It looks like Calico/Flannel sets up all the needed routes in GCE but fails to report this properly to the k8s master (or something along those lines), so the node is never marked as having a working network.

Here are related bugs:

One, Two, Three

If you check the status of the problem node, you'll see the NetworkUnavailable condition set to True:

kubectl get node <node_name> -o yaml
...
status:
  conditions:
  ...
  - type: NetworkUnavailable
    reason: NoRouteCreated
    status: "True"
...
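Checking nodes one at a time gets tedious on a bigger cluster. The sketch below lists every node whose NetworkUnavailable condition is "True". It assumes kubectl is configured against the cluster and supports JSONPath filter expressions (`?(@.type==...)`); `affected_nodes` is a hypothetical helper name.

```bash
#!/bin/bash
# Hypothetical helper: reads "name<TAB>status" lines on stdin and prints
# the names of nodes whose NetworkUnavailable status is "True".
# Nodes without that condition produce an empty status and are skipped.
affected_nodes() {
    awk -F'\t' '$2 == "True" { print $1 }'
}

# Example usage (one line per node: name, then NetworkUnavailable status):
# kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="NetworkUnavailable")].status}{"\n"}{end}' | affected_nodes
```

The output can also fill the hostnames array of the workaround script, e.g. `mapfile -t hostnames < <(kubectl get nodes -o jsonpath='...' | affected_nodes)`, instead of typing the names by hand.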

One possible workaround is to manually override the problem node's status, as suggested here.

You can run the following script from the k8s master node (it queries the k8s API on localhost). The array of hostnames has to be updated with the actual names of the problem nodes, of course. Review the code carefully and make sure you understand what it does before applying it to production!

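#!/bin/bash
set -euo pipefail

# Names of the problem nodes; replace with your own.
hostnames=(minion-0 minion-1 minion-2)
api=http://localhost:8080/api/v1/nodes

for node in "${hostnames[@]}"; do
    # Dump the node's current status object into a file.
    curl -sf "$api/$node/status" > a.json
    # Flatten to one line and replace the NetworkUnavailable condition object (GNU sed syntax).
    tr -d '\n' < a.json | sed 's/{[^}]\+NetworkUnavailable[^}]\+}/{"type": "NetworkUnavailable","status": "False","reason": "RouteCreated","message": "Manually set through k8s api"}/g' > b.json
    # Push the modified status back to the API server.
    curl -sf -X PUT "$api/$node/status" -H "Content-Type: application/json" -d @b.json
done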

It basically dumps the node status into a file, overrides the "NetworkUnavailable" condition, and pushes the node status back. Quite a dirty hack, I have to admit, but a working one.
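A possibly less fragile variant, which I have not tested on every version, sends a strategic merge patch to the node's status subresource instead of rewriting the whole status with sed. Node conditions are merged by their `type` field, so only the NetworkUnavailable entry should change. The helper names `network_patch` and `clear_network_unavailable` are hypothetical; the endpoint is the same localhost:8080 API used by the script above.

```bash
#!/bin/bash
# Insecure local API port, as in the workaround script.
api=http://localhost:8080/api/v1/nodes

# Hypothetical helper: prints a strategic merge patch that sets only the
# NetworkUnavailable condition (conditions are merged by "type").
network_patch() {
    printf '%s' '{"status":{"conditions":[{"type":"NetworkUnavailable","status":"False","reason":"RouteCreated","message":"Manually set through k8s api"}]}}'
}

# Hypothetical helper: sends the patch to the status subresource of one node.
clear_network_unavailable() {
    curl -sf -X PATCH "$api/$1/status" \
        -H "Content-Type: application/strategic-merge-patch+json" \
        -d "$(network_patch)"
}

# Example usage:
# for node in minion-0 minion-1 minion-2; do clear_network_unavailable "$node"; done
```

The patch approach avoids depending on the exact JSON layout returned by the API, which the sed expression relies on.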

Once the workaround is applied, pods should start being scheduled on the affected nodes.