🐛 Operator operator-lifecycle-manager-packageserver is in Available=False state running in single-replica topology #3720
/test sanity |
|
@jianzhangbjz: No presubmit jobs available for operator-framework/operator-lifecycle-manager@master.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
/test verify |
|
@jianzhangbjz: No presubmit jobs available for operator-framework/operator-lifecycle-manager@master.
|
/test verify (pull_request) |
|
@jianzhangbjz: No presubmit jobs available for operator-framework/operator-lifecycle-manager@master.
|
Please address this failure in this PR.
Run make verify
./scripts/update_codegen.sh
Generating client code for 4 targets
Generating lister code for 4 targets
I1211 07:40:00.841698 4725 main.go:58] Completed successfully.
Generating informer code for 4 targets
I1211 07:40:02.735407 4806 main.go:58] Completed successfully.
Generating openapi code for 3 targets
I1211 07:40:06.508083 5035 openapi.go:759] [github.com/operator-framework/operator-lifecycle-manager/pkg/package-server/apis/operators/v1.CSVDescription] Annotations map[string]string: tag listType on type Map; only allowed on type Slice
I1211 07:40:06.546684 5035 openapi.go:759] [k8s.io/apimachinery/pkg/apis/meta/v1.Status] Details *k8s.io/apimachinery/pkg/apis/meta/v1.StatusDetails: tag listType on type Pointer; only allowed on type Slice
I1211 07:40:06.626928 5035 api_linter.go:43] Assembling file "/tmp/update_codegen.sh.api_violations.MxNXSw"
I1211 07:40:13.553859 5453 main.go:58] Completed successfully.
I1211 07:40:15.957573 5588 main.go:58] Completed successfully.
make diff
make[1]: Entering directory '/home/runner/work/operator-lifecycle-manager/operator-lifecycle-manager'
git diff --exit-code
make[1]: Leaving directory '/home/runner/work/operator-lifecycle-manager/operator-lifecycle-manager'
# Generate mocks and silence the following warning:
# WARNING: Invoking counterfeiter multiple times from "go generate" is slow.
# Consider using counterfeiter:generate directives to speed things up.
# See https://github.com/maxbrunsfeld/counterfeiter#step-2b---add-counterfeitergenerate-directives for more information.
# Set the "COUNTERFEITER_NO_GENERATE_WARNING" environment variable to suppress this message.
# golang.org/x/tools/imports
Error: ../../../hack/overlays/goimports_vendorlesspath.go:6:6: VendorlessPath redeclared in this block
Error: ../../../vendor/golang.org/x/tools/imports/forward.go:75:6: other declaration of VendorlessPath
pkg/api/wrappers/deployment_install_client.go:1: running "go": exit status 1
# golang.org/x/tools/imports
Error: ../../../hack/overlays/goimports_vendorlesspath.go:6:6: VendorlessPath redeclared in this block
Error: ../../../vendor/golang.org/x/tools/imports/forward.go:75:6: other declaration of VendorlessPath
pkg/controller/bundle/bundle_unpacker.go:323: running "go": exit status 1
...
... |
Force-pushed from 62a3848 to 1a4a9b2.
|
/hold waiting for the downstream test to pass. |
Force-pushed from b4d66b0 to 31dc300.
Force-pushed from 31dc300 to c4b4fc2.
|
/unhold since the downstream test passed. |
Force-pushed from 9082dd9 to 53d0634.
|
Hey @jianzhangbjz thanks for the PR! Would you consider re-titling this |
|
Sure. Updated it, thanks! |
// With the fix:
// - Single-replica + rollout + Unschedulable = expected disruption
// - CSV stays in current phase
// - ClusterOperator maintains Available=True ✅ Contract satisfied
We should not use emojis in the code.
Can you please fix that?
Sure, I've updated it.
…se state running in single-replica topology
Force-pushed from 53d0634 to 582e174.
camilamacedo86 left a comment:
I am OK with this.
/lgtm
@perdasilva @grokspawn @tmshort
We will need an approval here too.
@jianzhangbjz thank you a lot for raising this one.
|
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: camilamacedo86
The full list of commands accepted by this bot can be found here. |
// CRITICAL: In single-replica deployments during rollout, Unschedulable is EXPECTED
// due to PodAntiAffinity preventing new pod from scheduling while old pod is terminating.
// This is especially common in single-node clusters or control plane scenarios.
// Per OpenShift contract: "A component must not report Available=False during normal upgrade."
I think statements 294-296 are sufficient here and we do not need this line.
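The rule described in the quoted code comments (single replica + rollout in progress + Unschedulable pod = expected disruption) can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the types `deploymentStatus` and `podStatus`, the functions `rolloutInProgress` and `isExpectedDisruption`, and the rollout heuristic itself are all assumptions made for the example. The real change in pkg/controller/operators/olm/apiservices.go works against the actual Kubernetes Deployment and Pod objects and may decide differently.

```go
package main

import "fmt"

// deploymentStatus is a hypothetical, trimmed-down view of a Deployment.
// It is not the Kubernetes API type.
type deploymentStatus struct {
	DesiredReplicas    int32
	UpdatedReplicas    int32
	AvailableReplicas  int32
	Generation         int64
	ObservedGeneration int64
}

// podStatus is a hypothetical, trimmed-down view of a Pod.
type podStatus struct {
	Phase         string // "Pending", "Running", ...
	Unschedulable bool   // true when the scheduler reported Unschedulable
}

// rolloutInProgress is an assumed heuristic: the controller has not yet
// observed the latest spec, or not every desired replica is updated and
// available.
func rolloutInProgress(d deploymentStatus) bool {
	return d.ObservedGeneration < d.Generation ||
		d.UpdatedReplicas < d.DesiredReplicas ||
		d.AvailableReplicas < d.UpdatedReplicas
}

// isExpectedDisruption reports whether an Unschedulable pod should be
// treated as a normal, transient side effect of a single-replica rollout
// rather than as a real availability failure.
func isExpectedDisruption(d deploymentStatus, p podStatus) bool {
	singleReplica := d.DesiredReplicas == 1
	pendingUnschedulable := p.Phase == "Pending" && p.Unschedulable
	return singleReplica && rolloutInProgress(d) && pendingUnschedulable
}

func main() {
	newPod := podStatus{Phase: "Pending", Unschedulable: true}

	// Single replica, new pod not yet available: expected disruption.
	rollout := deploymentStatus{DesiredReplicas: 1, UpdatedReplicas: 1,
		AvailableReplicas: 0, Generation: 2, ObservedGeneration: 2}
	fmt.Println("single-replica rollout:", isExpectedDisruption(rollout, newPod))

	// Two replicas: the single-replica exemption does not apply.
	multi := rollout
	multi.DesiredReplicas = 2
	fmt.Println("multi-replica rollout:", isExpectedDisruption(multi, newPod))
}
```

Keeping the three conditions as separate named booleans makes it easier to see that an Unschedulable pod outside a rollout, or in a multi-replica deployment, is still treated as a genuine failure.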
})

t.Run("MUST NOT report Available=False during normal upgrade", func(t *testing.T) {
// OpenShift ClusterOperator Contract (MANDATORY):
Closing this PR following an internal discussion. |
During cluster upgrades, the operator-lifecycle-manager-packageserver ClusterOperator incorrectly reports Available=False for ~16 seconds, violating the OpenShift contract: A component must not report Available=False during the course of a normal upgrade.
PodAntiAffinity + Single-Node Cluster + Single-Replica Deployment
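During rolling updates in single-node control plane environments, PodAntiAffinity prevents the new pod from scheduling while the old pod is still terminating, so the new pod is temporarily Unschedulable.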
This is especially problematic in OpenShift SNO (Single Node OpenShift) environments.
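To make the scheduling conflict concrete, here is a toy model of a required anti-affinity rule on a single node. It is a sketch under stated assumptions, not the Kubernetes scheduler: the `pod` and `node` types, the `canSchedule` and `schedule` functions, and the pod and node names are all hypothetical, and the model simply assumes (as the description above states) that a terminating pod still blocks placement until it is gone.

```go
package main

import "fmt"

// pod is a hypothetical, minimal model of a pod.
type pod struct {
	Name        string
	App         string // label matched by the anti-affinity rule
	Terminating bool   // informational; a terminating pod still occupies the node
}

// node is a hypothetical, minimal model of a node and the pods on it.
type node struct {
	Name string
	Pods []pod
}

// canSchedule applies a required anti-affinity rule: p may not be placed
// on a node that already hosts a pod with the same App label.
func canSchedule(n node, p pod) bool {
	for _, existing := range n.Pods {
		if existing.App == p.App {
			return false
		}
	}
	return true
}

// schedule returns the first node that accepts p, or false when no node
// does (the pod would be reported as Unschedulable).
func schedule(nodes []node, p pod) (string, bool) {
	for _, n := range nodes {
		if canSchedule(n, p) {
			return n.Name, true
		}
	}
	return "", false
}

func main() {
	oldPod := pod{Name: "packageserver-old", App: "packageserver", Terminating: true}
	newPod := pod{Name: "packageserver-new", App: "packageserver"}

	// Single-node cluster: the only node still hosts the terminating pod.
	singleNode := []node{{Name: "node-0", Pods: []pod{oldPod}}}
	_, ok := schedule(singleNode, newPod)
	fmt.Println("single node, schedulable:", ok)

	// With a second, empty node the new pod has somewhere to go.
	twoNodes := append(singleNode, node{Name: "node-1"})
	name, ok := schedule(twoNodes, newPod)
	fmt.Println("two nodes, schedulable:", ok, "on", name)
}
```

In this model the new pod only becomes schedulable on the single node once the old pod is removed from `Pods`, which corresponds to the short window in which the ClusterOperator reported Available=False.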
Description of the change:
Enhanced pod disruption detection in pkg/controller/operators/olm/apiservices.go.
Motivation for the change:
To address https://issues.redhat.com/browse/OCPBUGS-67210
Architectural changes:
Testing remarks:
Reviewer Checklist
/doc
[FLAKE] are truly flaky and have an issue
Assisted-by: Claude Code