Background
On 12 December 2022, the certificates of all the k8s components expired, which caused the CI tasks for the company's software builds to fail. However, after updating the certificates by following the Redmine wiki, I still could not get the CI working, so I restarted the master node (IP: 10.240.0.59).
The problem got even worse: after the reboot the kubelet service did not start, so none of the k8s service containers were running.
Troubleshooting
I tried to find the logs for errors, but this turned out to be tricky. Before restarting the master node, I tried the following command to retrieve the logs of the GitLab Runner pod, but it did not work: the pod contains two containers, so Kubernetes requires you to specify which container's logs to read.
Then I tried to get the names of the containers in the pod with
However, I could not get any information about the container names. Then I used
This listed "build helper", i.e. the names of the two containers in the pod, "build" and "helper". I then ran
But nothing was output. My guess is that the containers were not running anything at all (the pod was in the Pending state), so there were no container logs to output.
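For a pod with several containers, `kubectl logs` needs `-c <container>` (or `--all-containers=true`), and a container that has not started yet has no logs at all. The Python sketch below reads the pod object as JSON, maps each container to its current state (waiting, running or terminated) and builds one `kubectl logs` command per container. The helper names and the pod/namespace names in the usage are hypothetical; it assumes `kubectl` is on PATH with a working kubeconfig. For a Pending pod, the events shown by `kubectl describe pod` are usually more informative than logs.

```python
import json
import subprocess
from typing import Dict, List


def fetch_pod(pod_name: str, namespace: str) -> dict:
    """Return the pod object from `kubectl get pod ... -o json` (needs a working kubeconfig)."""
    out = subprocess.run(
        ["kubectl", "get", "pod", pod_name, "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def container_states(pod: dict) -> Dict[str, str]:
    """Map each container name in the pod spec to its state.

    A container status holds exactly one of "waiting", "running" or "terminated";
    containers without any status yet (common for Pending pods) map to "unknown".
    """
    names = [c["name"] for c in pod.get("spec", {}).get("containers", [])]
    statuses = {
        s["name"]: s.get("state", {})
        for s in pod.get("status", {}).get("containerStatuses", [])
    }
    return {name: next(iter(statuses.get(name, {})), "unknown") for name in names}


def logs_commands(pod_name: str, namespace: str, containers: List[str]) -> List[List[str]]:
    """Build one `kubectl logs` argv per container; -c selects the container."""
    return [["kubectl", "logs", pod_name, "-n", namespace, "-c", c] for c in containers]
```

Usage: `states = container_states(fetch_pod("runner-pod", "gitlab-runner"))` shows at a glance whether "build" and "helper" ever started, and `logs_commands("runner-pod", "gitlab-runner", list(states))` gives the per-container log commands.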
After that I decided to restart the whole master node (10.240.0.59), but unfortunately the kubelet service did not start up. Again, I tried to troubleshoot by checking the system log with
This turned up some useful log messages:
So the bootstrap-kubelet.conf really did not exist, and that is why the kubelet service could not start.
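As far as I understand the kubelet's behaviour, it only reaches for the bootstrap kubeconfig when its regular kubeconfig (kubelet.conf on kubeadm nodes) is missing or its client certificate is no longer valid, so an expired kubelet client certificate is a plausible reason it went looking for bootstrap-kubelet.conf at all. The Python sketch below checks both conditions. It assumes the `openssl` CLI is installed and that the kubelet client certificate sits at the kubeadm default /var/lib/kubelet/pki/kubelet-client-current.pem; `kubelet_will_bootstrap` is a hypothetical helper and a simplified model, not the kubelet's actual code.

```python
import ssl
import subprocess
import time
from pathlib import Path
from typing import Optional

# Assumed kubeadm default location of the kubelet's rotating client certificate.
KUBELET_CLIENT_CERT = "/var/lib/kubelet/pki/kubelet-client-current.pem"


def read_not_after(cert_path: str) -> str:
    """Return the 'notAfter=...' line printed by `openssl x509 -enddate -noout`."""
    return subprocess.run(
        ["openssl", "x509", "-enddate", "-noout", "-in", cert_path],
        check=True, capture_output=True, text=True,
    ).stdout.strip()


def not_after_days_left(openssl_line: str, now: Optional[float] = None) -> int:
    """Whole days until expiry; negative once the certificate has expired."""
    prefix = "notAfter="
    if not openssl_line.startswith(prefix):
        raise ValueError(f"unexpected openssl output: {openssl_line!r}")
    # ssl.cert_time_to_seconds turns e.g. "Dec 12 03:00:00 2023 GMT" into epoch seconds.
    expiry = ssl.cert_time_to_seconds(openssl_line[len(prefix):].strip())
    if now is None:
        now = time.time()
    return int((expiry - now) // 86400)


def kubelet_will_bootstrap(kubeconfig: Path, client_cert_days_left: int) -> bool:
    """Simplified model (hypothetical): the bootstrap kubeconfig is needed when
    kubelet.conf is missing or the kubelet client certificate has expired."""
    return not kubeconfig.is_file() or client_cert_days_left < 0
```

On the master node, `kubelet_will_bootstrap(Path("/etc/kubernetes/kubelet.conf"), not_after_days_left(read_not_after(KUBELET_CLIENT_CERT)))` returning True would be consistent with the error above. `not_after_days_left` works for the files under /etc/kubernetes/pki too, e.g. to confirm the one-year renewal.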
Solutions
First, update the certificates on ALL master nodes (in my case 10.240.0.59, 10.240.0.51 and 10.240.0.52). The command to check certificate expiration is
This command lists the certificate status of all k8s components and shows whether each one has expired. Renew the certificates with
Run the first command again to check the status; the expiration dates should now be one year later. Then renew the certificates manually with the following commands
Here --apiserver-advertise-address is the node IP (in my case 10.240.0.59). The script above backs up the current keys (in case anything goes wrong) and re-runs the certificate generation phase, then backs up all the files in "/etc/kubernetes" before re-running the kubeconfig generation phase.
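The procedure just described (back up, re-run the certs phase, move the old files aside, re-run the kubeconfig phase) can be sketched in Python. This is not the author's script; it assumes `kubeadm` is on PATH and that it runs as root on the control-plane node. As far as I know, kubeadm reuses certificate and kubeconfig files it already finds on disk, which is why old files are moved aside before a phase is re-run; leave ca.crt and ca.key in place, otherwise a brand-new CA would be generated. `backup_copy`, `move_aside`, `regeneration_plan` and `run` are hypothetical helper names, and `run` only prints the commands unless `dry_run=False`.

```python
import shutil
import subprocess
import time
from pathlib import Path
from typing import Iterable, List


def backup_copy(src: str, dest_root: str) -> Path:
    """Copy directory `src` to `<dest_root>/<name>-<timestamp>`; the original is untouched."""
    src_path = Path(src)
    dest = Path(dest_root) / f"{src_path.name}-{time.strftime('%Y%m%d-%H%M%S')}"
    shutil.copytree(src_path, dest)  # creates dest_root if needed
    return dest


def move_aside(directory: str, names: Iterable[str], dest: str) -> List[Path]:
    """Move the named files out of `directory` into `dest` so kubeadm regenerates them.

    Do NOT pass ca.crt/ca.key here. Missing names are skipped; returns new locations.
    """
    target = Path(dest)
    target.mkdir(parents=True, exist_ok=True)
    moved = []
    for name in names:
        src = Path(directory) / name
        if src.is_file():
            moved.append(Path(shutil.move(str(src), str(target / name))))
    return moved


def regeneration_plan(advertise_address: str) -> List[List[str]]:
    """The two kubeadm init phases re-run on the control-plane node, in order."""
    flag = f"--apiserver-advertise-address={advertise_address}"
    return [
        ["kubeadm", "init", "phase", "certs", "all", flag],
        ["kubeadm", "init", "phase", "kubeconfig", "all", flag],
    ]


def run(plan: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False (needs root)."""
    for argv in plan:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
```

For the node in this post, `run(regeneration_plan("10.240.0.59"))` prints the two commands without touching anything. To follow the order described above, run the first step, call `move_aside("/etc/kubernetes", ["admin.conf", "kubelet.conf", "controller-manager.conf", "scheduler.conf"], ...)`, then run the second step.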
After the reboot, copy the admin.conf generated in the kubeconfig generation phase to the kube configuration directory in your home directory (~/.kube/config, where kubectl looks by default)
Finally, check the status of the kubelet service with
References