Rebooting worker nodes fails and leaves them NotReady #373

When a machine config is applied and the worker nodes reboot, some of them come back in NotReady status. This has been seen on OCP 4.17 and 4.18. A Red Hat support case was opened to track the issue: https://access.redhat.com/support/cases/#/case/04065543. The Red Hat team provided a workaround that brings the NotReady worker nodes back to Ready. According to the case, this appears to be a bug, and the Red Hat team expects it to be fixed in a future release.

**Impact:**
Worker nodes can remain in NotReady status after the reboot that follows a machine config being applied.
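To find which workers are affected after a machine config rollout, the STATUS column of `oc get nodes` can be filtered. This is a minimal sketch that assumes the default table output (NAME, STATUS, ROLES, AGE, VERSION); `list_notready_nodes` is a hypothetical helper name, not an `oc` feature.

```shell
#!/usr/bin/env bash
# list_notready_nodes (hypothetical helper): reads `oc get nodes` table output
# on stdin and prints the name of every node whose STATUS begins with NotReady.
# The prefix match also catches combined states such as
# "NotReady,SchedulingDisabled" (a cordoned node).
list_notready_nodes() {
  # NR > 1 skips the header row; $1 is NAME and $2 is STATUS.
  awk 'NR > 1 && $2 ~ /^NotReady/ { print $1 }'
}
```

Usage: `oc get nodes -l node-role.kubernetes.io/worker | list_notready_nodes`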

**Solution:**
According to https://access.redhat.com/support/cases/#/case/04065543, run the following on each worker node that is in NotReady status:
$ sudo systemctl status crio kubelet
$ sudo systemctl stop crio kubelet
$ sudo rm /var/lib/kubelet/pki/*
$ sudo systemctl start crio kubelet
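The four commands above can be collected into one function so that they run in order and stop at the first failure. This is a sketch, not part of the Red Hat workaround itself: `run`, `reset_kubelet_certs` and the `DRY_RUN` switch are hypothetical names, and it makes three small assumed changes. It adds `--no-pager` so `systemctl status` does not block in a pager, it expands the certificate glob inside a root shell (`sudo sh -c`) in case the invoking user cannot list `/var/lib/kubelet/pki`, and it uses `rm -f` so an already empty directory is not treated as an error.

```shell
#!/usr/bin/env bash
# run: executes its arguments, or only prints them when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# reset_kubelet_certs (hypothetical helper): the node-side workaround steps
# from the support case, run in order on a NotReady worker node.
reset_kubelet_certs() {
  # Show the current state; status exits non-zero for inactive units,
  # which is expected here, so it must not stop the procedure.
  run sudo systemctl status crio kubelet --no-pager || true
  run sudo systemctl stop crio kubelet || return
  # Expand the glob in a root shell so it still matches if the invoking user
  # cannot list /var/lib/kubelet/pki; -f tolerates an already empty directory.
  run sudo sh -c 'rm -f /var/lib/kubelet/pki/*' || return
  run sudo systemctl start crio kubelet
}
```

Running `DRY_RUN=1 reset_kubelet_certs` first prints the commands without executing them; `reset_kubelet_certs` then performs them.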

•	Note: If starting the crio and kubelet services hangs, run the following on the worker node:
$ sudo nmcli connection delete ovs-if-br-ex
$ sudo systemctl restart ovs-configuration.service
Then, on the provisioner node, approve the new CSR (certificate signing request):
$ oc get csr
$ oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs --no-run-if-empty oc adm certificate approve
Once the certificate is approved, go back to the worker node and start the crio and kubelet services again:
$ sudo systemctl start crio kubelet
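After the kubelet's client CSR is approved, the kubelet typically submits a second CSR for its serving certificate, so the approval command may need to run more than once before the node reports Ready. A minimal sketch of a loop for the provisioner node that keeps approving pending CSRs until the node's Ready condition is `True`, assuming `oc` is logged in with permission to approve CSRs; `approve_until_ready`, its arguments and the default budget of 20 attempts 15 seconds apart are hypothetical choices.

```shell
#!/usr/bin/env bash
# approve_until_ready NODE [ATTEMPTS] [INTERVAL_SECONDS]  (hypothetical helper)
# Repeatedly approves pending CSRs and stops once NODE reports Ready=True.
approve_until_ready() {
  local node=$1 attempts=${2:-20} interval=${3:-15} i=1 csr ready
  while [ "$i" -le "$attempts" ]; do
    # Same go-template as the approval command in this issue: list CSRs that
    # have no status yet, i.e. neither approved nor denied.
    oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' |
      while read -r csr; do
        if [ -n "$csr" ]; then
          oc adm certificate approve "$csr"
        fi
      done
    # Status of the node's Ready condition: "True", "False" or "Unknown".
    ready=$(oc get node "$node" -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}')
    if [ "$ready" = "True" ]; then
      echo "$node is Ready"
      return 0
    fi
    sleep "$interval"
    i=$((i + 1))
  done
  echo "$node is still not Ready after $attempts attempts" >&2
  return 1
}
```

Usage on the provisioner node: `approve_until_ready <worker-node-name>`. A non-zero exit means the node never reached Ready within the budget, which is the point to check the crio and kubelet services on the worker again.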