
Rebooting the worker nodes fail and make them NotReady #373

@GabyCT

Some worker nodes end up in NotReady status when they reboot while a machine config is applied. This has been seen on OCP 4.17 and 4.18. A Red Hat support ticket was opened to track this issue: https://access.redhat.com/support/cases/#/case/04065543. The Red Hat team provided a workaround to bring the NotReady worker nodes back to Ready. According to the ticket, this appears to be a bug, and the Red Hat team says it will be fixed in a future release.

Impact:
Worker nodes can be left in NotReady status after rebooting when a machine config is applied.

Solution:
According to https://access.redhat.com/support/cases/#/case/04065543, run the following on the worker node that is in NotReady status:
$ sudo systemctl status crio kubelet
$ sudo systemctl stop crio kubelet
$ sudo rm /var/lib/kubelet/pki/*
$ sudo systemctl start crio kubelet

• Note: If starting the crio and kubelet services hangs, run the following on the worker node:
$ sudo nmcli connection delete ovs-if-br-ex
$ sudo systemctl restart ovs-configuration.service
On the provisioner node, approve the new CSRs as follows:
$ oc get csr
$ oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs --no-run-if-empty oc adm certificate approve
Once the certificates are approved on the provisioner node, go back to the worker node and start the crio and kubelet services again:
$ sudo systemctl start crio kubelet
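
The CSR approval and a follow-up readiness check could be sketched as a small bash helper run from the provisioner node. This is only a sketch and is not part of the Red Hat workaround: the function names (`pending_csrs`, `approve_pending_csrs`, `node_is_ready`, `wait_for_ready`) and the node name `worker-0` are hypothetical, and the retry loop assumes that new CSRs may show up in more than one round after kubelet restarts. It uses a `while read` loop in place of `xargs`, with the same go-template as above.

```shell
#!/usr/bin/env bash
# Sketch only: approve pending CSRs and poll until a worker node reports Ready.
# Function names and the example node name are hypothetical.

# Print the names of CSRs that have no status yet (pending),
# using the same go-template as the workaround above.
pending_csrs() {
  oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}'
}

# Approve every pending CSR, one at a time.
approve_pending_csrs() {
  pending_csrs | while read -r name; do
    if [ -n "$name" ]; then
      oc adm certificate approve "$name"
    fi
  done
}

# Succeed only if the node's Ready condition is "True".
node_is_ready() {
  local status
  status=$(oc get node "$1" -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}')
  [ "$status" = "True" ]
}

# Approve CSRs and check readiness, retrying up to <attempts> times
# with <delay> seconds between attempts.
wait_for_ready() {
  local node="$1" attempts="${2:-10}" delay="${3:-30}"
  local i
  for ((i = 1; i <= attempts; i++)); do
    approve_pending_csrs
    if node_is_ready "$node"; then
      echo "$node is Ready"
      return 0
    fi
    sleep "$delay"
  done
  echo "$node is still NotReady after $attempts attempts" >&2
  return 1
}
```

After sourcing the file, `wait_for_ready worker-0` would approve pending CSRs and poll the node with the default 10 attempts and 30 second delay. If kubelet on the worker node is not running, the node will not become Ready regardless of how many CSRs are approved.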
