Troubleshooting

Cluster API Troubleshooting

Verifying the pods

When a problem occurs with Cluster API, the first thing to do is check that all the pods involved are running.

Check whether these 4 pods have the STATUS Running.
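As a minimal sketch of this check, the helper below filters the output of kubectl get pods and keeps only the pods whose STATUS is not Running. The function name not_running_pods and its optional namespace pattern are illustrative and not part of Cluster API. The pattern shown in the comment assumes the controllers run in namespaces whose names start with capi- or capvcd-, which may differ in your installation.

```shell
# List pods whose STATUS column is not "Running".
# Optional $1: extended regex matched against the namespace (default: all namespaces).
# Example (assumed namespace prefixes): not_running_pods '^(capi|capvcd)-'
not_running_pods() {
  ns_pattern="${1:-.*}"
  # With -A and --no-headers the columns are:
  # NAMESPACE NAME READY STATUS RESTARTS AGE
  kubectl get pods -A --no-headers |
    awk -v pat="$ns_pattern" '$1 ~ pat && $4 != "Running" { print $1 "/" $2 ": " $4 }'
}
```

An empty output means every matching pod is Running. Pods of finished jobs (STATUS Completed) are also listed, since the filter only compares against Running.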

If one of these pods does not have a Running status, two things can be checked to get information about the issue:

  1. Describe the pod that is in error.
  2. Get the logs of the pod that is in error.

Example:
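The two steps can be sketched as a small helper that describes a pod and then prints the end of its logs. The function name inspect_pod and the default of 100 log lines are assumptions made for illustration. The namespace and pod name come from the output of kubectl get pods.

```shell
# Describe a pod, then print the tail of its logs.
# $1: namespace, $2: pod name, $3: number of log lines (default: 100)
inspect_pod() {
  ns="$1"; pod="$2"; lines="${3:-100}"
  # Step 1: the Events section at the end of the output often explains scheduling or image errors.
  kubectl describe pod "$pod" -n "$ns"
  # Step 2: the most recent log lines of the pod.
  kubectl logs "$pod" -n "$ns" --tail="$lines"
}
```

For a pod that keeps restarting, adding --previous to kubectl logs shows the logs of the container instance that crashed rather than the one that has just started.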

Check the VCD provider

If an issue occurs with the actions performed on vCloud Director during a cluster creation, upgrade or scaling operation, the problem may come from the Cluster API provider.


The corresponding pod is capvcd-controller-manager, inside the capvcd-system namespace.

Errors can be found in the logs:
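A possible way to pull only the relevant lines out of the provider logs is sketched below. It reads the logs of the capvcd-controller-manager deployment in the capvcd-system namespace (the names used by the commands in this section) and keeps the lines containing the word "error". The function name capvcd_errors and the one-hour default window are illustrative choices.

```shell
# Print only the log lines of the CAPVCD controller that mention an error.
# $1: how far back to look, in kubectl --since syntax (default: 1h)
capvcd_errors() {
  kubectl logs -n capvcd-system deployment/capvcd-controller-manager --since="${1:-1h}" |
    grep -i 'error'
}
```

For example, capvcd_errors 10m limits the search to the last ten minutes, which is handy right after a failed scaling operation.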

An option exists to display more logs about the communication between the provider and vCloud Director.

To enable it, run this command:

kubectl set env -n capvcd-system deployment/capvcd-controller-manager GOVCD_LOG_ON_SCREEN=true -oyaml

This option can be verbose, so do not forget to disable it once your diagnosis is complete. To do so, run:

kubectl set env -n capvcd-system deployment/capvcd-controller-manager GOVCD_LOG_ON_SCREEN-
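To confirm whether the option is currently active, the variables set on the deployment can be listed with the --list flag of kubectl set env. The sketch below wraps that call. The function names are hypothetical, and it assumes --list prints one NAME=value pair per line.

```shell
# Succeeds (exit status 0) when GOVCD_LOG_ON_SCREEN=true is set on the deployment.
govcd_logging_enabled() {
  kubectl set env -n capvcd-system deployment/capvcd-controller-manager --list |
    grep -q '^GOVCD_LOG_ON_SCREEN=true$'
}

# Print the current state, e.g. as a reminder at the end of a diagnosis.
govcd_logging_status() {
  if govcd_logging_enabled; then
    echo "GOVCD_LOG_ON_SCREEN is enabled"
  else
    echo "GOVCD_LOG_ON_SCREEN is disabled"
  fi
}
```

Running govcd_logging_status after the disable command is a quick way to make sure the verbose logging has really been turned off.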

Check the Cluster API Objects

Cluster API uses different types of objects to describe the Kubernetes clusters it manages.

The idea is to explore these objects step by step to find the one that shows an error in its status or in the output of describe.

Clusterapi-objects.png

Depending on whether the issue concerns worker nodes, master nodes or the overall cluster, use the above diagram to choose the objects involved.

  1. Get the objects to find their exact names and check their status.
  2. Describe the object that has an issue.

Repeat steps 1 and 2 for the other objects until you find the error.
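The walk through the objects can be sketched as a loop over resource kinds. The list below is an assumption based on the usual Cluster API and CAPVCD resources (Cluster, VCDCluster, KubeadmControlPlane, MachineDeployment, MachineSet, Machine, VCDMachine) and may not match the diagram exactly; adjust it to the objects relevant to your issue. The function name list_capi_objects is illustrative.

```shell
# Resource kinds to walk through, from the cluster down to individual machines.
# Assumed list; edit it to match the objects shown in the diagram.
CAPI_KINDS="clusters vcdclusters kubeadmcontrolplanes machinedeployments machinesets machines vcdmachines"

# Step 1 for every kind: list the objects of one namespace with their status columns.
# $1: namespace that holds the cluster definition
list_capi_objects() {
  ns="$1"
  for kind in $CAPI_KINDS; do
    echo "== $kind =="
    kubectl get "$kind" -n "$ns"
  done
}
```

Once an object looks wrong in that listing, step 2 is kubectl describe with the kind, the object name and the same -n namespace. If a short name such as clusters is ambiguous on your management cluster, the fully qualified form (for example clusters.cluster.x-k8s.io) can be used instead.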

Export logs script

VMware has created a script that exports the logs, together with some information about the cluster, into a tarball.

generate-k8s-log-bundle.sh

Node deployment troubleshooting

Check kubelet status

systemctl status kubelet

Journalctl

journalctl -xeu containerd

journalctl -xeu kubelet
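When the journal is long, it can help to restrict it to a time window and keep only the lines that mention an error or a failure. The helper below is a sketch of that idea; the name unit_errors, the one-hour default and the search pattern are illustrative choices.

```shell
# Show journal lines of a systemd unit that mention an error or a failure.
# $1: unit name (kubelet or containerd)
# $2: start of the time window, in journalctl --since syntax (default: 1 hour ago)
unit_errors() {
  journalctl -u "$1" --since "${2:-1 hour ago}" --no-pager |
    grep -iE 'error|fail'
}
```

For example, unit_errors kubelet "30 min ago" covers the last half hour of kubelet activity on the node.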


Several files can be inspected to troubleshoot an issue during deployment:

Cloud-init

/var/log/cloud-init-output.log

/var/log/capvcd/customization/status.log

/var/log/capvcd/customization/error.log
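A quick first pass over these three files can be sketched as a single search for error and failure messages. The function name scan_deploy_logs and the search pattern are assumptions for illustration. The files usually require root privileges to read.

```shell
# Default files: the cloud-init and CAPVCD customization logs listed above.
DEPLOY_LOGS="/var/log/cloud-init-output.log /var/log/capvcd/customization/status.log /var/log/capvcd/customization/error.log"

# Print "file:line:text" for every line mentioning an error or a failure.
# Arguments: files to scan (defaults to $DEPLOY_LOGS).
scan_deploy_logs() {
  [ "$#" -gt 0 ] || set -- $DEPLOY_LOGS
  # -H always prefixes the file name, -n adds the line number,
  # -s hides the messages about files that do not exist on this node.
  grep -HinsE 'error|fail' "$@"
}
```

The line numbers in the output make it easy to open the file at the right place and read the surrounding context.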

containerd

/var/log/containers/*