Troubleshooting
Cluster API Troubleshooting
Verifying the pods
When a problem occurs with Cluster API, the first thing to do is to check that all the pods involved are running.
Check that these 4 pods have the STATUS Running:
# kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
capi-kubeadm-bootstrap-system capi-kubeadm-bootstrap-controller-manager-7dc44947-hrmvc 1/1 Running 0 36m
capi-kubeadm-control-plane-system capi-kubeadm-control-plane-controller-manager-cb9d954f5-r8w54 1/1 Running 0 36m
capi-system capi-controller-manager-7594c7bc57-jr75r 1/1 Running 0 37m
capvcd-system capvcd-controller-manager-89758d745-kw4sm 1/1 Running 0 13s
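On a cluster with many pods, the visual check above can be replaced by a small filter. The following is a minimal sketch, assuming the default column layout of `kubectl get pods -A` (STATUS in the fourth column); the function name `not_running` is hypothetical and not part of any tool.

```shell
# Print "namespace/pod status" for every pod whose STATUS column is not "Running".
# Assumes the default columns: NAMESPACE NAME READY STATUS RESTARTS AGE.
not_running() {
  awk 'NR > 1 && $4 != "Running" { print $1 "/" $2 " " $4 }'
}

# Usage (requires access to the cluster):
#   kubectl get pods -A | not_running
```

An empty result means every listed pod reports Running. Pods of finished jobs (STATUS Completed) are also printed by this filter and can usually be ignored.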
If one of these pods does not have a Running status, two things can be checked to get information about the issue:
- Describe the pod in error
- Get the logs of the pod in error
Example:
# kubectl describe pod capi-controller-manager-7594c7bc57-jr75r -n capi-system
and
# kubectl logs capi-controller-manager-7594c7bc57-jr75r -n capi-system
Please note!
The k9s tool can also be used to browse the cluster, display logs and search through them. See the KaaS – Troubleshooting page for more information.
Check the VCD provider
If an issue occurs with the actions performed on vCloud Director during a cluster creation, upgrade or scaling operation, the problem may come from the Cluster API Provider.
The corresponding pod is capvcd-controller-manager, in the capvcd-system namespace.
Errors can be found in the logs:
# kubectl logs capvcd-controller-manager-89758d745-kw4sm -n capvcd-system
An option exists to display more logs about the communication between the provider and vCloud Director.
To enable it, run this command:
kubectl set env -n capvcd-system deployment/capvcd-controller-manager GOVCD_LOG_ON_SCREEN=true -oyaml
This option can be very verbose, so do not forget to disable it once your diagnosis is finished:
kubectl set env -n capvcd-system deployment/capvcd-controller-manager GOVCD_LOG_ON_SCREEN-
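Because the verbose mode is easy to leave enabled, the two commands above can be wrapped in a single on/off helper. This is only a sketch: the function name `govcd_logging` and the `KUBECTL` override variable are hypothetical, and the namespace, deployment and variable name are taken from the commands above.

```shell
# Toggle verbose govcd logging on the CAPVCD controller.
# KUBECTL may be overridden (for example KUBECTL=echo) to preview the command
# without touching the cluster.
govcd_logging() {
  case "$1" in
    on)
      "${KUBECTL:-kubectl}" set env -n capvcd-system deployment/capvcd-controller-manager GOVCD_LOG_ON_SCREEN=true
      ;;
    off)
      "${KUBECTL:-kubectl}" set env -n capvcd-system deployment/capvcd-controller-manager GOVCD_LOG_ON_SCREEN-
      ;;
    *)
      echo "usage: govcd_logging on|off" >&2
      return 1
      ;;
  esac
}

# Usage:
#   govcd_logging on    # enable, reproduce the issue, read the logs
#   govcd_logging off   # disable once the diagnosis is finished
```

Changing the environment of the deployment triggers a rollout, so the controller pod is recreated with a new name; list the pods again before reading its logs.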
Check the Cluster API Objects
Cluster API uses different types of objects to describe the Kubernetes cluster it manages.
The idea is to explore these objects step by step to find the one that reports an error, either in its status or in the output of describe.
[Diagram: Clusterapi-objects.png]
Depending on whether the issue concerns worker nodes, master nodes or the overall cluster, use the diagram above to choose the objects involved.
1. Get the objects to find the exact name and to check the status
# kubectl get MachineDeployment -A
2. Describe the object that has an issue
# kubectl describe MachineDeployment mycluster-workers-0 -n mynamespace
Repeat steps 1 and 2 for all the other objects until you find the error.
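The first step can be run over several object types in one pass. The sketch below is illustrative only: the list of kinds is an assumption (core Cluster API kinds plus the VCDCluster and VCDMachine kinds of the VCD provider) and should be adapted to the diagram and to the providers installed; `list_capi_objects` and the `KUBECTL` override are hypothetical names.

```shell
# Kinds to walk through, from the cluster down to individual machines.
# This list is an assumption; adjust it to your installation.
CAPI_KINDS="cluster kubeadmcontrolplane machinedeployment machineset machine vcdcluster vcdmachine"

# Run "kubectl get <kind> -A" for each kind, with a header per kind.
# KUBECTL may be overridden (for example KUBECTL=echo) to preview the commands.
list_capi_objects() {
  for kind in $CAPI_KINDS; do
    echo "== $kind =="
    "${KUBECTL:-kubectl}" get "$kind" -A
  done
}

# Usage (requires access to the management cluster):
#   list_capi_objects
```

Once an object with a suspicious status is spotted in this output, describe it as in step 2.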
Export logs script
VMware provides a script that exports logs and some information about the cluster into a tarball.
Please note!
Please check the content of the script before launching it!
export KUBECONFIG=[PATH_TO_YOUR_KUBECONFIG]
curl -s https://raw.githubusercontent.com/vmware/cloud-provider-for-cloud-director/[YOUR_CAPVCD_VERSION]/scripts/generate-k8s-log-bundle.sh | bash -
Node deployment troubleshooting
Check kubelet status
systemctl status kubelet
Journalctl
journalctl -xeu containerd
journalctl -xeu kubelet
Several files can be examined to troubleshoot an issue during deployment:
Cloud-init
/var/log/cloud-init-output.log
/var/log/capvcd/customization/status.log
/var/log/capvcd/customization/error.log
containerd
/var/log/containers/*
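To look at these files in one pass on a node, a small helper can print the end of each one. This is a minimal sketch: the function name `tail_logs` and the `LINES_TO_SHOW` variable are hypothetical, and the paths in the usage comment are the ones listed above.

```shell
# Print the last lines of each given log file; report files that are missing
# or not readable instead of failing.
tail_logs() {
  lines="${LINES_TO_SHOW:-20}"
  for f in "$@"; do
    if [ -r "$f" ]; then
      echo "== $f =="
      tail -n "$lines" "$f"
    else
      echo "== $f (missing or not readable) =="
    fi
  done
}

# Usage on a node (root privileges may be needed to read these files):
#   tail_logs /var/log/cloud-init-output.log \
#             /var/log/capvcd/customization/status.log \
#             /var/log/capvcd/customization/error.log
```

A missing status.log or error.log is itself a hint about how far the node customization went.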