Wednesday, 4 January 2023

Renew Kubernetes (k8s) certificates

Background

On 12 Dec 2022, the certificates of all the k8s components expired, which caused the CI tasks in the company software builds to fail. However, even after updating the certificates following the Redmine wiki, I still could not get CI working, so I restarted the master node (IP: 10.240.0.59).

The problem then got even worse: the kubelet service did not start up after the reboot, and none of the k8s service containers were running.

Troubleshooting

I tried to find errors in the logs, but that turned out to be problematic. Before restarting the master node, I tried the following command to retrieve the logs of the GitLab runner pod, but it did not work because there are 2 containers within this pod, and k8s requires us to specify the particular container whose logs we want to read.

kubectl logs -f -n gitlab runner-yngqxnkk-project-737-concurrent-0zwbtj

Then I tried to get the names of the containers within the pod with

kubectl describe pod -n gitlab runner-yngqxnkk-project-737-concurrent-0zwbtj | more

However, I could not find any information about the container names that way. Then I used

kubectl get pods -n gitlab runner-yngqxnkk-project-737-concurrent-0zwbtj -o jsonpath='{.spec.containers[*].name}'

It printed "build helper", which are the names of the two containers in the pod. I then ran

kubectl logs -f -n gitlab runner-yngqxnkk-project-737-concurrent-0zwbtj -c helper

But nothing was output. My guess is that the containers were not actually running anything (since the pod was in the Pending state), so there were no container logs to show.
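The container-name lookup and per-container log retrieval can be combined into one short loop (a sketch; the pod and namespace names are the ones from this incident, so adjust them to your cluster):

```shell
# Pod and namespace from this incident; adjust to your cluster.
NS=gitlab
POD=runner-yngqxnkk-project-737-concurrent-0zwbtj

# Read the container names from the pod spec, then dump each container's logs.
CONTAINERS=$(kubectl get pod -n "$NS" "$POD" -o jsonpath='{.spec.containers[*].name}')
for c in $CONTAINERS; do
  echo "--- logs for container: $c ---"
  kubectl logs -n "$NS" "$POD" -c "$c"
done

# Alternatively, kubectl can fetch logs from every container at once:
kubectl logs -n "$NS" "$POD" --all-containers=true
```

When a pod is stuck in Pending, the Events section of `kubectl describe pod` is usually more informative than the (empty) container logs.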

After that I decided to restart the whole master node (10.240.0.59), but sadly the kubelet service would not start up. I again tried to troubleshoot via the system logs:

systemctl status kubelet -l

journalctl -xeu kubelet

And found some useful log output:

kubeconfig: stat /etc/kubernetes/bootstrap-kubelet.conf: no such file or directory

So bootstrap-kubelet.conf really did not exist, and that was what prevented the kubelet service from starting.

Solutions

First, update the certificates on ALL master nodes (in my case: 10.240.0.59 / 51 / 52). The command to check certificate expiration is

sudo kubeadm alpha certs check-expiration

This command lists the certificate status of all k8s components and tells you whether each certificate has expired. (On newer kubeadm releases these subcommands were promoted out of alpha: kubeadm certs check-expiration and kubeadm certs renew all.) Renew the certificates with

sudo kubeadm alpha certs renew all

Check the status with the first command again; the expiration dates should now be pushed out by one year. Then manually regenerate the certificates and kubeconfig files with the commands below:

$ cd /etc/kubernetes/pki/

$ mv {apiserver.crt,apiserver-etcd-client.key,apiserver-kubelet-client.crt,front-proxy-ca.crt,front-proxy-client.crt,front-proxy-client.key,front-proxy-ca.key,apiserver-kubelet-client.key,apiserver.key,apiserver-etcd-client.crt} ~/

$ kubeadm init phase certs all --apiserver-advertise-address <IP>

$ cd /etc/kubernetes/

$ mv {admin.conf,controller-manager.conf,kubelet.conf,scheduler.conf} ~/

$ kubeadm init phase kubeconfig all

$ reboot

The apiserver-advertise-address is the node IP (in my case 10.240.0.59). The commands above back up the current certificates and keys to the home directory (in case any problems occur) and re-run the certificate generation phase, then back up the kubeconfig files in /etc/kubernetes before re-running the kubeconfig generation phase.
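Before rebooting, it is worth sanity-checking the regenerated certificates directly with openssl (a sketch of the check; the path is the standard kubeadm location):

```shell
# Print the validity window of the freshly generated API server certificate.
# The notAfter date should now be roughly one year in the future and should
# match what `kubeadm alpha certs check-expiration` reports.
sudo openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -dates
```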

After the reboot, copy the admin.conf generated in the kubeconfig phase to the kubectl configuration directory:

cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
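If kubectl is run as a non-root user, the standard kubeadm post-init steps also create the directory and fix the file's ownership (a sketch of those steps):

```shell
# Make sure the kubectl config directory exists, copy the regenerated
# admin.conf into it, and hand ownership to the current user.
mkdir -p "$HOME/.kube"
sudo cp -i /etc/kubernetes/admin.conf "$HOME/.kube/config"
sudo chown "$(id -u):$(id -g)" "$HOME/.kube/config"
```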

Check the status of the kubelet service with

sudo systemctl status kubelet.service
