Description
While rebooting after a machine config is applied, some worker nodes end up in NotReady status. This has been seen on OCP 4.17 and 4.18. A Red Hat support ticket was opened to track the issue: https://access.redhat.com/support/cases/#/case/04065543. The Red Hat team provided a workaround that brings the NotReady worker nodes back to Ready. According to the ticket, this appears to be a bug, and according to the Red Hat team it will be fixed in a future release.
Impact:
Worker nodes can be left in NotReady status when they reboot after a machine config is applied.
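To see which nodes are affected, something like the following sketch may help. The helper name `not_ready_nodes` is hypothetical, and the sketch assumes `oc` is logged in to the cluster and that the second column of `oc get nodes` is STATUS (which can read `NotReady,SchedulingDisabled` during a machine config rollout).

```shell
# Hypothetical helper: print the names of nodes whose STATUS contains NotReady.
# Assumes the default `oc get nodes` column layout (NAME STATUS ROLES AGE VERSION).
not_ready_nodes() {
  oc get nodes --no-headers | awk '$2 ~ /NotReady/ { print $1 }'
}

# Usage (on the provisioner node):
#   not_ready_nodes
```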
Solution:
According to https://access.redhat.com/support/cases/#/case/04065543, run the following on the worker node that is in NotReady status:
$ sudo systemctl status crio kubelet
$ sudo systemctl stop crio kubelet
$ sudo rm /var/lib/kubelet/pki/*
$ sudo systemctl start crio kubelet
• Note: If starting the crio and kubelet services hangs, run the following on the worker node:
$ sudo nmcli connection delete ovs-if-br-ex
$ sudo systemctl restart ovs-configuration.service
On the provisioner node, approve the new CSR as follows:
$ oc get csr
$ oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs --no-run-if-empty oc adm certificate approve
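A node typically requests a client certificate first and a serving certificate afterwards, so a second pending CSR may show up shortly after the first one is approved. The sketch below repeats the approval step above a few times to cover that case. The function names, the number of passes, and the delay are assumptions made for illustration and are not part of the Red Hat workaround.

```shell
# Print the names of CSRs that have no status yet (pending),
# using the same go-template as the command above.
pending_csrs() {
  oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}'
}

# Hypothetical helper: approve all pending CSRs, repeating for several passes.
#   $1 = number of passes (default 3, an assumed value)
#   $2 = seconds to wait between passes (default 10, an assumed value)
approve_pending_csrs() {
  passes="${1:-3}"
  delay="${2:-10}"
  pass=1
  while [ "$pass" -le "$passes" ]; do
    # CSR names contain no whitespace, so word splitting is safe here.
    for name in $(pending_csrs); do
      oc adm certificate approve "$name"
    done
    if [ "$pass" -lt "$passes" ]; then
      sleep "$delay"
    fi
    pass=$((pass + 1))
  done
}

# Usage (on the provisioner node):
#   approve_pending_csrs 3 10
```

Note that this approves every pending CSR in the cluster, just like the one-line command, so it is worth reviewing `oc get csr` first.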
Once the certificate is approved on the provisioner node, go back to the worker node and try starting the crio and kubelet services again:
$ sudo systemctl start crio kubelet
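To confirm that the workaround took effect, the node status can be checked from the provisioner node. The following is a minimal sketch using `oc wait`. The helper name `wait_node_ready`, the default timeout, and the node name in the usage line are hypothetical.

```shell
# Hypothetical helper: block until the given node reports the Ready condition.
#   $1 = node name
#   $2 = timeout passed to `oc wait` (default 300s, an assumed value)
wait_node_ready() {
  node="$1"
  timeout="${2:-300s}"
  oc wait --for=condition=Ready "node/$node" --timeout="$timeout"
}

# Usage (on the provisioner node; replace worker-0 with the affected node):
#   wait_node_ready worker-0 600s
```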