4.17.0-okd-scos.2 UPI: Stuck in final stages #29

Open
deas opened this issue Jan 21, 2025 · 3 comments

deas commented Jan 21, 2025

During a UPI install of 4.17.0-okd-scos.2 on vCenter, we get stuck in the final stages:

NAME                                       VERSION             AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.17.0-okd-scos.2   False       False         True       127m    APIServicesAvailable: PreconditionNotReady...
baremetal                                  4.17.0-okd-scos.2   True        False         False      126m    
cloud-controller-manager                   4.17.0-okd-scos.2   True        False         False      129m    
cloud-credential                                               True        False         False      134m    
cluster-autoscaler                         4.17.0-okd-scos.2   True        False         False      126m    
config-operator                            4.17.0-okd-scos.2   True        False         False      127m    
console                                                                                                     
control-plane-machine-set                  4.17.0-okd-scos.2   True        False         False      126m    
csi-snapshot-controller                    4.17.0-okd-scos.2   True        False         False      115m    
dns                                        4.17.0-okd-scos.2   True        False         False      115m    
etcd                                       4.17.0-okd-scos.2   True        False         False      118m    
image-registry                                                                                              
ingress                                                        False       True          True       127m    The "default" ingress controller reports Available=False: IngressControllerUnavailable: One or more status conditions indicate unavailable: DeploymentAvailable=False (DeploymentUnavailable: The deployment has Available status condition set to False (reason: MinimumReplicasUnavailable) with message: Deployment does not have minimum availability.)
insights                                   4.17.0-okd-scos.2   True        False         False      121m    
kube-apiserver                             4.17.0-okd-scos.2   True        False         False      108m    
kube-controller-manager                    4.17.0-okd-scos.2   True        False         True       117m    GarbageCollectorDegraded: error fetching rules: Get "https://thanos-querier.openshift-monitoring.svc:9091/api/v1/rules": dial tcp 10.42.118.129:9091: connect: connection refused
kube-scheduler                             4.17.0-okd-scos.2   True        False         False      118m    
kube-storage-version-migrator              4.17.0-okd-scos.2   True        False         False      115m    
machine-api                                4.17.0-okd-scos.2   True        False         False      126m    
machine-approver                           4.17.0-okd-scos.2   True        False         False      127m    
machine-config                             4.17.0-okd-scos.2   True        False         False      126m    
marketplace                                4.17.0-okd-scos.2   True        False         False      126m    
monitoring                                                     False       True          True       99m     UpdatingAlertmanager: reconciling Alertmanager Route failed: creating Route object failed: the server could not find the requested resource (post routes.route.openshift.io), UpdatingThanosQuerier: reconciling Thanos Querier Route failed: creating Route object failed: the server could not find the requested resource (post routes.route.openshift.io), UpdatingConsolePluginComponents: reconciling Console Plugin failed: waiting for ConsolePlugin failed: context deadline exceeded: creating ConsolePlugin object failed: the server could not find the requested resource (post consoleplugins.console.openshift.io), UpdatingPrometheus: reconciling Prometheus API Route failed: creating Route object failed: the server could not find the requested resource (post routes.route.openshift.io), UpdatingPrometheus: Prometheus "openshift-monitoring/k8s": failed to get: prometheuses.monitoring.coreos.com "k8s" not found
network                                    4.17.0-okd-scos.2   True        False         False      127m    
node-tuning                                4.17.0-okd-scos.2   True        False         False      110m    
openshift-apiserver                        4.17.0-okd-scos.2   False       False         False      127m    APIServicesAvailable: PreconditionNotReady
openshift-controller-manager               4.17.0-okd-scos.2   True        False         False      115m    
openshift-samples                                                                                           
operator-lifecycle-manager                 4.17.0-okd-scos.2   True        False         False      126m    
operator-lifecycle-manager-catalog         4.17.0-okd-scos.2   True        False         False      126m    
operator-lifecycle-manager-packageserver   4.17.0-okd-scos.2   True        False         False      54m     
service-ca                                 4.17.0-okd-scos.2   True        False         False      127m    
storage                                    4.17.0-okd-scos.2   True        False         False      127m 
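
A table like the one above can be narrowed down to the operators that need attention by filtering it. The following is a minimal sketch, not part of the original report: `unhealthy_operators` is a hypothetical helper name, and it assumes the plain-text column layout shown above, where the first `True`/`False` value on a row is the AVAILABLE column and DEGRADED sits two columns after it.

```shell
# Print operators that are not Available=True, that are Degraded=True,
# or that report no status at all. Reads the plain-text table from stdin.
# unhealthy_operators is a hypothetical helper name for illustration.
unhealthy_operators() {
  awk '$1 != "NAME" && NF > 0 {
    # Rows with only a name (e.g. console) have not reported any status yet.
    if (NF == 1) { print $1 ": no status reported"; next }
    # VERSION may be empty, so locate AVAILABLE as the first True/False field.
    for (i = 2; i <= NF; i++) {
      if ($i == "True" || $i == "False") {
        if ($i != "True" || $(i + 2) == "True")
          print $1 ": Available=" $i " Degraded=" $(i + 2)
        break
      }
    }
  }'
}
```

It could be used as `kubectl get clusteroperators | unhealthy_operators`. Parsing the text table is fragile; it is chosen here only because it works on output that has already been pasted somewhere.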

I'm not quite sure about the ordering or how the failures cascade.

However:

ingress and monitoring fail because the Route CRD does not exist (yet?).

openshift-apiserver appears to depend on authentication, which fails because the oauth-openshift service endpoint has no backing deployment or pods:

    Message:               IngressStateEndpointsDegraded: No subsets found for the endpoints of oauth-server
OAuthServerServiceEndpointAccessibleControllerDegraded: Get "https://10.42.249.232:443/healthz": dial tcp 10.42.249.232:443: connect: connection refused
OAuthServerServiceEndpointsEndpointAccessibleControllerDegraded: oauth service endpoints are not ready
    Reason:                IngressStateEndpoints_MissingSubsets::OAuthServerServiceEndpointAccessibleController_SyncError::OAuthServerServiceEndpointsEndpointAccessibleController_SyncError
❯ kubectl -n openshift-authentication get svc oauth-openshift
NAME              TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
oauth-openshift   ClusterIP   10.42.249.232   <none>        443/TCP   133m

❯ kubectl -n openshift-authentication get ep oauth-openshift
NAME              ENDPOINTS   AGE
oauth-openshift   <none>      133m

❯ kubectl -n openshift-authentication get deployment
No resources found in openshift-authentication namespace.

Any ideas how to push things forward appreciated.

@nate-duke

Check on the ingress pods in the openshift-ingress namespace.
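
A minimal sketch of that check, assuming the default IngressController (whose deployment is normally named `router-default`) and a `kubectl` context pointed at the cluster. `inspect_ingress` is a hypothetical helper name, not something from this thread.

```shell
# List the router pods and show recent log lines from the default router.
# Assumes the default IngressController, deployed as router-default.
# inspect_ingress is a hypothetical helper name for illustration.
inspect_ingress() {
  # Pod status, restart counts and node placement.
  kubectl -n openshift-ingress get pods -o wide
  # Last N log lines (default 50) from one pod of the router deployment.
  kubectl -n openshift-ingress logs deployment/router-default --tail="${1:-50}"
}
```

Calling `inspect_ingress 200` would show the last 200 log lines instead of the default 50.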

deas commented Jan 21, 2025

@nate-duke The startup probe fails, so the pods crashloop. I'm pretty sure this is due to the missing Route CRD. The log reads:

I0121 17:25:13.107450       1 template.go:560] "msg"="starting router" "logger"="router" "version"="majorFromGit: \nminorFromGit: \ncommitFromGit: a33f2b6a\nversionFromGit: v0.0.0-unknown\ngitTreeState: dirty\nbuildDate: 2024-08-08T23:29:16Z\n"
I0121 17:25:13.108931       1 metrics.go:156] "msg"="router health and metrics port listening on HTTP and HTTPS" "address"="0.0.0.0:1936" "logger"="metrics"
I0121 17:25:13.111514       1 router.go:217] "msg"="creating a new template router" "logger"="template" "writeDir"="/var/lib/haproxy"
I0121 17:25:13.111565       1 router.go:302] "msg"="router will coalesce reloads within an interval of each other" "interval"="5s" "logger"="template"
I0121 17:25:13.111859       1 router.go:372] "msg"="watching for changes" "logger"="template" "path"="/etc/pki/tls/private"
I0121 17:25:13.111897       1 router.go:283] "msg"="router is including routes in all namespaces" "logger"="router"
W0121 17:25:13.116568       1 reflector.go:547] github.com/openshift/router/pkg/router/controller/factory/factory.go:124: failed to list *v1.Route: the server could not find the requested resource (get routes.route.openshift.io)
E0121 17:25:13.116599       1 reflector.go:150] github.com/openshift/router/pkg/router/controller/factory/factory.go:124: Failed to watch *v1.Route: failed to list *v1.Route: the server could not find the requested resource (get routes.route.openshift.io)
I0121 17:25:13.122312       1 reflector.go:359] Caches populated for *v1.Service from github.com/openshift/router/pkg/router/template/service_lookup.go:33
I0121 17:25:13.124527       1 reflector.go:359] Caches populated for *v1.EndpointSlice from github.com/openshift/router/pkg/router/controller/factory/factory.go:124
I0121 17:25:14.170359       1 healthz.go:255] backend-http,has-synced check failed: healthz
[-]backend-http failed: backend reported failure
[-]has-synced failed: Router not synced
...
W0121 17:27:06.185478       1 reflector.go:547] github.com/openshift/router/pkg/router/controller/factory/factory.go:124: failed to list *v1.Route: the server could not find the requested resource (get routes.route.openshift.io)
E0121 17:27:06.185574       1 reflector.go:150] github.com/openshift/router/pkg/router/controller/factory/factory.go:124: Failed to watch *v1.Route: failed to list *v1.Route: the server could not find the requested resource (get routes.route.openshift.io)
I0121 17:27:07.169910       1 healthz.go:255] backend-http,has-synced check failed: healthz
[-]backend-http failed: backend reported failure
[-]has-synced failed: Router not synced
I0121 17:27:08.170048       1 healthz.go:255] backend-http,has-synced check failed: healthz
[-]backend-http failed: backend reported failure
[-]has-synced failed: Router not synced
I0121 17:27:09.169788       1 healthz.go:255] backend-http,has-synced check failed: healthz
[-]backend-http failed: backend reported failure
[-]has-synced failed: Router not synced
I0121 17:27:10.169833       1 healthz.go:255] backend-http,has-synced check failed: healthz
[-]backend-http failed: backend reported failure
[-]has-synced failed: Router not synced
I0121 17:27:11.169032       1 healthz.go:255] backend-http,has-synced check failed: healthz
[-]backend-http failed: backend reported failure
[-]has-synced failed: Router not synced
I0121 17:27:12.169456       1 healthz.go:255] backend-http,has-synced check failed: healthz
[-]backend-http failed: backend reported failure
[-]has-synced failed: Router not synced
I0121 17:27:13.170126       1 healthz.go:255] backend-http,has-synced check failed: healthz
[-]backend-http failed: backend reported failure
[-]has-synced failed: Router not synced
E0121 17:27:13.177189       1 factory.go:130] failed to sync cache for *v1.Route shared informer
I0121 17:27:13.178203       1 template.go:844] "msg"="Shutdown requested, waiting 45s for new connections to cease" "logger"="router"
E0121 17:27:13.179582       1 haproxy.go:418] can't scrape HAProxy: dial unix /var/lib/haproxy/run/haproxy.sock: connect: no such file or directory
I0121 17:27:13.204322       1 router.go:669] "msg"="router reloaded" "logger"="template" "output"=" - Checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"
I0121 17:27:58.181186       1 template.go:846] "msg"="Instructing the template router to terminate" "logger"="router"
I0121 17:27:58.189618       1 router.go:669] "msg"="router reloaded" "logger"="template" "output"=" - Shutting down\n"
I0121 17:27:58.189648       1 template.go:850] "msg"="Shutdown complete, exiting" "logger"="router"

deas commented Jan 22, 2025

Digging a bit deeper, it all appears to be related to missing CRDs.

The authentication operator does not create the backing deployment because the OAuthClient resource type is missing (and likely more). It does not crashloop, though.
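
One way to test the missing-CRD theory: on a standard OpenShift/OKD cluster, `route.openshift.io` and `oauth.openshift.io` are, as far as I know, served through the API aggregation layer (by openshift-apiserver and oauth-apiserver) rather than defined as CRDs. A "could not find the requested resource" error may therefore point at an unavailable APIService instead of a missing CRD. The following is a minimal sketch under that assumption, with `kubectl` pointed at the cluster. `api_group_status` is a hypothetical helper name.

```shell
# Report whether an API group has an APIService registered and, if so,
# whether that APIService is Available.
# api_group_status is a hypothetical helper name for illustration.
api_group_status() {
  group="$1"            # e.g. route.openshift.io
  version="${2:-v1}"    # API version, defaults to v1
  # Read the status of the Available condition; empty if the object is absent.
  status="$(kubectl get apiservice "${version}.${group}" \
    -o jsonpath='{.status.conditions[?(@.type=="Available")].status}' 2>/dev/null)"
  if [ -z "$status" ]; then
    echo "${group}: no APIService ${version}.${group} found"
  else
    echo "${group}: APIService Available=${status}"
  fi
}
```

For the failures in this thread, the calls would be `api_group_status route.openshift.io` and `api_group_status oauth.openshift.io`. If an APIService exists but is not Available, the pods backing it (for example in the openshift-apiserver namespace) would be the next thing to look at.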
