Skip to content

Conversation

jlebon
Copy link
Member

@jlebon jlebon commented Aug 19, 2025

Description

RHCOS 9.6/10.1 ISOs will soon have that volume ID there. The change was done thinking it was low risk since it's mostly cosmetic but it turns out we check for this here.

We could revert it, but... hopefully this patch is enough to fix things.

How was this code tested?

make

Assignees

/cc @carbonin

Links

Checklist

  • Title and description added to both, commit and PR
  • Relevant issues have been associated
  • Reviewers have been listed
  • This change does not require a documentation update (docstring, docs, README, etc)
  • Does this change include unit tests (note that code changes require unit tests)

RHCOS 9.6/10.1 ISOs will soon have that volume ID there. The change was
done thinking it was low risk since it's mostly cosmetic but it turns
out we check for this here.

We could revert it, but... hopefully this patch is enough to fix things.
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Aug 19, 2025
@openshift-ci openshift-ci bot requested a review from carbonin August 19, 2025 14:08
@openshift-ci-robot
Copy link

@jlebon: This pull request explicitly references no jira issue.

In response to this:

Description

RHCOS 9.6/10.1 ISOs will soon have that volume ID there. The change was done thinking it was low risk since it's mostly cosmetic but it turns out we check for this here.

We could revert it, but... hopefully this patch is enough to fix things.

How was this code tested?

make

Assignees

/cc @carbonin

Links

Checklist

  • Title and description added to both, commit and PR
  • Relevant issues have been associated
  • Reviewers have been listed
  • This change does not require a documentation update (docstring, docs, README, etc)
  • Does this change include unit tests (note that code changes require unit tests)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Aug 19, 2025
Copy link

codecov bot commented Aug 19, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 59.48%. Comparing base (81f7727) to head (e73ad3e).

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #477      +/-   ##
==========================================
+ Coverage   59.32%   59.48%   +0.15%     
==========================================
  Files          27       27              
  Lines        1571     1577       +6     
==========================================
+ Hits          932      938       +6     
  Misses        478      478              
  Partials      161      161              
Files with missing lines Coverage Δ
pkg/imagestore/imagestore.go 73.10% <100.00%> (+0.69%) ⬆️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@carbonin
Copy link
Member

If we have a published image with this prefix available can you add it to the list in integration_test/images_test.go?

@carbonin
Copy link
Member

Also if this is something we specifically shouldn't be testing for as a way to identify these ISOs, maybe we should just remove it or switch to checking something more reliable?

@carbonin
Copy link
Member

FWIW the original bug was https://issues.redhat.com/browse/MGMT-12740

@jlebon
Copy link
Member Author

jlebon commented Aug 19, 2025

If we have a published image with this prefix available can you add it to the list in integration_test/images_test.go?

Ahhh nice. We don't have images on the mirrors yet with this new prefix. But it should get there in the next bootimage bump.

Also if this is something we specifically shouldn't be testing for as a way to identify these ISOs, maybe we should just remove it or switch to checking something more reliable?

Honestly it seems fine to me. Another way I guess is to check for the coreos.liveiso= karg using e.g. coreos-installer iso kargs show, but the code here doesn't seem to be using coreos-installer currently.

@carbonin
Copy link
Member

But it should get there in the next bootimage bump

What does this mean? Is this a "real" release or some kind of prerelease?
I'm just trying to sort out how quickly we need this out there for existing users.

@jlebon
Copy link
Member Author

jlebon commented Aug 19, 2025

But it should get there in the next bootimage bump

What does this mean? Is this a "real" release or some kind of prerelease? I'm just trying to sort out how quickly we need this out there for existing users.

It will eventually land in both 4.20 and 4.19 as we bump their respective bootimage references in rhcos.json in the installer.

@carbonin
Copy link
Member

Okay so it sounds like we don't want to wait on this fix then until we can get an image to use in integration tests.

If we want this backported (which it sounds like we really do) we'll also need a bug. I pinged the slack thread to see if the original reporter can create one.

@carbonin carbonin changed the title NO-ISSUE: Allow rhel-coreos- as a valid volume ID prefix MGMT-21574: Allow rhel-coreos- as a valid volume ID prefix Aug 20, 2025
@openshift-ci-robot
Copy link

openshift-ci-robot commented Aug 20, 2025

@jlebon: This pull request references MGMT-21574 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.20.0" version, but no target version was set.

In response to this:

Description

RHCOS 9.6/10.1 ISOs will soon have that volume ID there. The change was done thinking it was low risk since it's mostly cosmetic but it turns out we check for this here.

We could revert it, but... hopefully this patch is enough to fix things.

How was this code tested?

make

Assignees

/cc @carbonin

Links

Checklist

  • Title and description added to both, commit and PR
  • Relevant issues have been associated
  • Reviewers have been listed
  • This change does not require a documentation update (docstring, docs, README, etc)
  • Does this change include unit tests (note that code changes require unit tests)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 20, 2025
@openshift-ci-robot
Copy link

/test remaining-required

Scheduling tests matching the pipeline_run_if_changed parameter:
/test edge-e2e-ai-operator-ztp
/test edge-e2e-metal-assisted-4-20

Copy link

openshift-ci bot commented Aug 20, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: carbonin, jlebon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 20, 2025
@jlebon
Copy link
Member Author

jlebon commented Aug 20, 2025

/hold

We're going to revert the CoreOS change that caused this issue and tackle it differently. The volume ID will be back to rhcos. So this patch is no longer necessary.

Should it be merged anyway? You're free to do so but don't have to. It's unlikely right now we'll change it back again to rhel-coreos, but it's probably not a bad idea to be resilient to it. (A nice thing about rhel-coreos is that it's consistent with stream-coreos and fedora-coreos, but if we were to change it, it would be part of a larger change to all our artifact names, not just the ISO volume ID. And that seems unlikely to be worth it.)

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 20, 2025
@carbonin
Copy link
Member

Should it be merged anyway?

I think I'd rather look into a more stable way of ensuring the ISO is something we can work with.
Maybe we check for what we actually use (ignition info file or ignition file where we expect it to be)?
Or maybe there's some other detail we can look for?

Copy link

openshift-ci bot commented Aug 20, 2025

@jlebon: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

jlebon added a commit to jlebon/osbuild that referenced this pull request Aug 20, 2025
In osbuild#2148, we changed the logic to
generate the volume ID from the data in `/usr/lib/os-release` to sever
the reliance on metadata in the embedded treefile that will no longer
exist.

This had no effect in FCOS, but had an effect in RHCOS, which
went from a volume ID of `rhcos-...` to `rhel-coreos-...`.
This was considered harmless at the time, but in fact ended
up affecting the OpenShift Assisted Image Service. See e.g.
openshift/assisted-image-service#477 which
attempted to adapt that code. But in the end, it felt safer and less
work to just revert back to the previous volume ID. So here we are.

But we still don't want to go back to using the embedded treefile.
Instead, we now have access to the OS name to use as a label on the
container image. This label gets serialized into the aleph during the
creation of the metal image (via the `org.osbuild.ostree.aleph` stage)
which we have access here when mounting the metal image via loopback.

So pick it up from there and use that. But in case it's missing,
fallback to the previous logic rather than hard fail to make this easier
to ratchet in.
jlebon added a commit to jlebon/installer that referenced this pull request Aug 20, 2025
This reverts commit 702c738.

The live ISO in that bootimage set contains a change to the volume ID
which would break the Assisted Image Service:
openshift/assisted-image-service#477

We're working on pushing out new ISOs with the reverted volume ID in:
coreos/coreos-assembler#4276

But for now revert this to ensure it doesn't slip into the next z-stream
release.
jlebon added a commit to jlebon/osbuild that referenced this pull request Aug 21, 2025
In osbuild#2148, we changed the logic to
generate the volume ID from the data in `/usr/lib/os-release` to sever
the reliance on metadata in the embedded treefile that will no longer
exist.

This had no effect in FCOS, but had an effect in RHCOS, which
went from a volume ID of `rhcos-...` to `rhel-coreos-...`.
This was considered harmless at the time, but in fact ended
up affecting the OpenShift Assisted Image Service. See e.g.
openshift/assisted-image-service#477 which
attempted to adapt that code. But in the end, it felt safer and less
work to just revert back to the previous volume ID. So here we are.

But we still don't want to go back to using the embedded treefile.
Instead, we now have access to the OS name to use as a label on the
container image. This label gets serialized into the aleph during the
creation of the metal image (via the `org.osbuild.ostree.aleph` stage)
which we have access here when mounting the metal image via loopback.

So pick it up from there and use that. But in case it's missing,
fallback to the previous logic rather than hard fail to make this easier
to ratchet in.
bcl pushed a commit to jlebon/osbuild that referenced this pull request Aug 21, 2025
In osbuild#2148, we changed the logic to
generate the volume ID from the data in `/usr/lib/os-release` to sever
the reliance on metadata in the embedded treefile that will no longer
exist.

This had no effect in FCOS, but had an effect in RHCOS, which
went from a volume ID of `rhcos-...` to `rhel-coreos-...`.
This was considered harmless at the time, but in fact ended
up affecting the OpenShift Assisted Image Service. See e.g.
openshift/assisted-image-service#477 which
attempted to adapt that code. But in the end, it felt safer and less
work to just revert back to the previous volume ID. So here we are.

But we still don't want to go back to using the embedded treefile.
Instead, we now have access to the OS name to use as a label on the
container image. This label gets serialized into the aleph during the
creation of the metal image (via the `org.osbuild.ostree.aleph` stage)
which we have access here when mounting the metal image via loopback.

So pick it up from there and use that. But in case it's missing,
fallback to the previous logic rather than hard fail to make this easier
to ratchet in.
supakeen pushed a commit to osbuild/osbuild that referenced this pull request Aug 21, 2025
In #2148, we changed the logic to
generate the volume ID from the data in `/usr/lib/os-release` to sever
the reliance on metadata in the embedded treefile that will no longer
exist.

This had no effect in FCOS, but had an effect in RHCOS, which
went from a volume ID of `rhcos-...` to `rhel-coreos-...`.
This was considered harmless at the time, but in fact ended
up affecting the OpenShift Assisted Image Service. See e.g.
openshift/assisted-image-service#477 which
attempted to adapt that code. But in the end, it felt safer and less
work to just revert back to the previous volume ID. So here we are.

But we still don't want to go back to using the embedded treefile.
Instead, we now have access to the OS name to use as a label on the
container image. This label gets serialized into the aleph during the
creation of the metal image (via the `org.osbuild.ostree.aleph` stage)
which we have access here when mounting the metal image via loopback.

So pick it up from there and use that. But in case it's missing,
fallback to the previous logic rather than hard fail to make this easier
to ratchet in.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants