Skip to content

Comments

OCPCLOUD-3346: tls: use centralized TLS#1456

Open
RadekManak wants to merge 9 commits intoopenshift:mainfrom
RadekManak:feature-centralized-tsl-endpoints
Open

OCPCLOUD-3346: tls: use centralized TLS#1456
RadekManak wants to merge 9 commits intoopenshift:mainfrom
RadekManak:feature-centralized-tsl-endpoints

Conversation

@RadekManak
Copy link
Contributor

@RadekManak RadekManak commented Jan 26, 2026

  • Serve machine-api-operator metrics directly over HTTPS (:8443) using controller-runtime’s metrics server with delegated authn/authz (WithAuthenticationAndAuthorization), and remove the MAO kube-rbac-proxy sidecar.
  • Add TLS profile awareness for MAO metrics:
    • read APIServer/cluster TLS profile on startup,
    • configure min TLS/ciphers from that profile,
    • watch for TLS profile changes and trigger shutdown so the pod restarts with updated TLS settings.
  • Propagate the same TLS profile to controller kube-rbac-proxy sidecars (machine, machineset, mhc) by generating --tls-min-version and profile-derived --tls-cipher-suites args.
  • Update manifests accordingly:
    • deployment ports/volume mounts/env (METRICS_PORT=8443) for direct secure serving,
    • RBAC to watch config.openshift.io/apiservers.
  • Include supporting dependency/vendor updates and minor follow-ups:
    • dependency bumps (controller-runtime, openshift/api, openshift/client-go, etc.),
    • go-build.sh root-dir handling fix,
    • lint/import cleanup (pkg/webhooks/machine_webhook.go, context import updates).

Notes

  • Provider/controller metrics remain behind kube-rbac-proxy and continue using the existing namespace/metrics authorization model.
  • MAO direct /metrics auth uses delegated token/SAR checks; scraper access relies on existing cluster-monitoring prometheus-k8s cluster RBAC.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jan 26, 2026
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Jan 26, 2026

@RadekManak: This pull request references OCPCLOUD-3346 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

Summary

  • Replace the MAO metrics kube‑rbac‑proxy sidecar with direct HTTPS in the operator binary, using the serving cert mounted from the MAO secret.
  • Watch the APIServer TLS profile and trigger a controlled shutdown so MAO restarts and picks up TLS changes.
  • Propagate the APIServer TLS profile into machine-api-controllers kube‑rbac‑proxy args (cipher suites + min TLS), with unit coverage.

Details

  • Direct MAO metrics TLS
  • MAO now listens on :8443 and serves /metrics via ListenAndServeTLS using /etc/tls/private/tls.crt|tls.key.
  • The deployment drops the kube‑rbac‑proxy sidecar, mounts the serving cert into /etc/tls/private, and exposes port 8443.
  • RBAC is updated to allow reading apiservers for TLS profile fetch.
  • TLS profile reload
  • MAO fetches the APIServer TLS profile at startup and builds a tls.Config.
  • A config informer watches APIServer updates and triggers shutdown on profile changes.
  • Centralized proxy TLS for controllers
  • OperatorConfig now carries the TLS profile.
  • machine-api-controllers kube‑rbac‑proxy args are generated from the profile (--tls-cipher-suites, --tls-min-version),
  • Tests updated to include APIServer presence and TLS profile expectations; a focused test validates proxy TLS args.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@RadekManak
Copy link
Contributor Author

/assign @damdo

Copy link
Member

@damdo damdo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Looks good, a couple of minor points.

var tlsProfile *osconfigv1.TLSProfileSpec
apiServer, err := optr.osClient.ConfigV1().APIServers().Get(context.Background(), "cluster", metav1.GetOptions{})
if err != nil {
klog.Warningf("Failed to fetch APIServer, using default TLS profile: %v", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we return here?
What would happen otherwise?

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 27, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from damdo. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Remove the kube-rbac-proxy sidecar, mount the serving cert, and
restart the operator on APIServer TLS profile changes.
Capture the APIServer TLS profile in operator config and use it to
configure kube-rbac-proxy TLS args, with unit coverage.
Add unit tests to verify TLS configuration handling in
newKubeProxyContainer, including tests for TLS 1.2 with cipher suites
and TLS 1.3 without cipher suites.
@RadekManak RadekManak force-pushed the feature-centralized-tsl-endpoints branch from 4e665c0 to 24eed11 Compare February 18, 2026 14:55
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Feb 18, 2026

@RadekManak: This pull request references OCPCLOUD-3346 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

  • Serve machine-api-operator metrics directly over HTTPS (:8443) using controller-runtime’s metrics server with delegated authn/authz (WithAuthenticationAndAuthorization), and remove the MAO kube-rbac-proxy sidecar.
  • Add TLS profile awareness for MAO metrics:
  • read APIServer/cluster TLS profile on startup,
  • configure min TLS/ciphers from that profile,
  • watch for TLS profile changes and trigger shutdown so the pod restarts with updated TLS settings.
  • Propagate the same TLS profile to controller kube-rbac-proxy sidecars (machine, machineset, mhc) by generating --tls-min-version and profile-derived --tls-cipher-suites args.
  • Update manifests accordingly:
  • deployment ports/volume mounts/env (METRICS_PORT=8443) for direct secure serving,
  • RBAC to watch config.openshift.io/apiservers.
  • Include supporting dependency/vendor updates and minor follow-ups:
  • dependency bumps (controller-runtime, openshift/api, openshift/client-go, etc.),
  • go-build.sh root-dir handling fix,
  • lint/import cleanup (pkg/webhooks/machine_webhook.go, context import updates).

Notes

  • Provider/controller metrics remain behind kube-rbac-proxy and continue using the existing namespace/metrics authorization model.
  • MAO direct /metrics auth uses delegated token/SAR checks; scraper access relies on existing cluster-monitoring prometheus-k8s cluster RBAC.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@RadekManak RadekManak force-pushed the feature-centralized-tsl-endpoints branch from 24eed11 to 6b35a05 Compare February 18, 2026 15:25
Copy link
Member

@damdo damdo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this Radek!
I left a bunch of comments but it mostly looks good! TY

return
}
klog.Fatal(server.ListenAndServe())

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are restarting anyway so it does not really matter, but I agree that is more correct if we got multiple updates before shutdown. This way, the second event won't log the shutdown again.

return metricsserver.NewServer(metricsserver.Options{
BindAddress: metricsAddr,
SecureServing: true,
FilterProvider: filters.WithAuthenticationAndAuthorization,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have the right RBAC defined for the MAO Deployment's service account to do this?

WithAuthenticationAndAuthorization provides a metrics.Filter for authentication and authorization. Metrics will be authenticated (via TokenReviews) and authorized (via SubjectAccessReviews) with the kube-apiserver. For the authentication and authorization the controller needs a ClusterRole with the following rules: * apiGroups: authentication.k8s.io, resources: tokenreviews, verbs: create * apiGroups: authorization.k8s.io, resources: subjectaccessreviews, verbs: create
To scrape metrics e.g. via Prometheus the client needs a ClusterRole with the following rule: * nonResourceURLs: "/metrics", verbs: get

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move TLS configuration computation from per-container to once per batch
in newKubeProxyContainers. This avoids redundant processing when creating
multiple kube-rbac-proxy containers with the same TLS profile.
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 23, 2026

@RadekManak: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-metal-ipi-ovn-ipv6 5d8f69d link true /test e2e-metal-ipi-ovn-ipv6

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants