Description
I'm using the self-hosted assisted installer service to install Single Node OKD.
The assisted installer service is running in podman containers, as documented here
This method of doing a single-node install of OKD used to work, but it has started to fail recently (within the last 30 days or so).
The host registers with the installer service but gets stuck on an NTP synchronization failure, as seen in the attached screenshot.
Looking into the pod logs of the assisted installer service, I see this message:
```
level=error msg="Received step reply <ntp-synchronizer-392f0f02> from infra-env <ff4ce4b9-a3cd-4c50-b258-24cfbba8d1e3> host <68b15b04-5cb1-429f-9778-3c8727d0235d> exit-code <-1> stderr <chronyc exited with non-zero exit code 127: \nchronyc: error while loading shared libraries: libnettle.so.8: cannot open shared object file: No such file or directory\n> stdout <>" func=github.com/openshift/assisted-service/internal/bminventory.logReplyReceived file="/go/src/github.com/openshift/origin/internal/bminventory/inventory.go:2992" go-id=9762 host_id=68b15b04-5cb1-429f-9778-3c8727d0235d infra_env_id=ff4ce4b9-a3cd-4c50-b258-24cfbba8d1e3 pkg=Inventory request_id=6a4edac8-f290-4cb2-813e-f6a67ef9c50b
```
The relevant part of the message is: `chronyc: error while loading shared libraries: libnettle.so.8: cannot open shared object file: No such file or directory`
I believe the root cause is the change introduced by this commit.
The code change introduced by that commit mounts the chronyc binary from the underlying OS (on which the assisted-installer-agent container runs) into the /usr/bin directory inside the container. In my particular instance, that host OS is Fedora CoreOS 35.20220327.3.0. The problem is that the chronyc command is a dynamically linked ELF that depends on the libnettle.so.8 shared library... which isn't present in the container. The container does contain libnettle.so.6, though.
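For anyone wanting to confirm the same mismatch, the dynamic dependencies of the bind-mounted binary can be inspected from inside the agent container with `ldd /usr/bin/chronyc`. A small sketch of filtering that output for unresolved libraries (the sample `ldd` output below is illustrative, not captured from a real host; exact output depends on the host/container combination):

```shell
# Sketch: list the shared libraries the dynamic linker could not resolve.
# In the real case you would capture this with: ldd_output=$(ldd /usr/bin/chronyc)
# The sample below is illustrative only.
ldd_output='libnettle.so.8 => not found
libm.so.6 => /lib64/libm.so.6 (0x00007f0000000000)'

# Keep only the "not found" entries; the library name is the first field.
missing=$(printf '%s\n' "$ldd_output" | awk '/not found/ {print $1}')
echo "$missing"
```

Any library printed here exists on the host (where the binary was built against it) but not in the container image, which is exactly the libnettle.so.8 situation above.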
Anyway, IMO this [bind-mounting the chronyc command from the underlying OS] is a container anti-pattern.
Wouldn't it be a better approach to use the chronyc installed by `dnf install chrony` in the Dockerfile here, which is used to build the assisted installer agent container image?
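A minimal sketch of that alternative (the base image and package names here are assumptions for illustration, not the project's actual Dockerfile):

```dockerfile
# Sketch only: install chrony inside the image so chronyc and the shared
# libraries it links against (including libnettle) come from the same userland.
FROM quay.io/centos/centos:stream9
RUN dnf install -y chrony && dnf clean all
# chronyc is now /usr/bin/chronyc inside the image, with all of its
# dependencies resolvable in-image, so no bind mount of the host binary
# (and no matching host/container library versions) is needed.
```

This keeps the binary and its shared libraries versioned together by the image's own package manager, instead of coupling the container to whatever library versions the host OS happens to ship.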
@tsorya, could you have a look at the change introduced in that commit? It introduces a significant prerequisite: the shared libraries that the chronyc binary is dynamically linked against must also be present in the assisted installer agent container image. Is there a different approach?