storageusers pod in CrashLoopBackOff mode after upgrade #855

Open
euh2 opened this issue Jan 30, 2025 · 6 comments
@euh2

euh2 commented Jan 30, 2025

After upgrading from 5.0.3 to 7.0.0, the storageusers pod remains in CrashLoopBackOff. The logs complain about needing NATS, but NATS is running and healthy. Does anyone have an idea about this?

> k logs -f storageusers-846b49d6f9-kcrfh
2025-01-30T18:07:03Z INF no probe provided, reverting to default (OK) endpoint=/healthz line=github.com/owncloud/ocis/v2/ocis-pkg/service/debug/service.go:27 service=storage-users
2025-01-30T18:07:03Z INF registering external service com.owncloud.api.storage-users-2c687af4-9288-4b9c-bc90-1132b24db07f@10.233.230.223:9157 line=github.com/owncloud/ocis/v2/ocis-pkg/registry/register.go:19 service=storage-users
2025-01-30T18:07:03Z INF host info: storageusers-846b49d6f9-kcrfh line=github.com/cs3org/reva/[email protected]/cmd/revad/runtime/runtime.go:85 service=storage-users
2025-01-30T18:07:03Z INF running on 8 cpus line=github.com/cs3org/reva/[email protected]/cmd/revad/runtime/runtime.go:178 service=storage-users
2025-01-30T18:07:03Z INF pidfile saved at: /tmp/revad-storage-users-3d341290-da97-4089-ad67-6cf99530fc17.pid line=github.com/cs3org/reva/[email protected]/cmd/revad/internal/grace/grace.go:187 pkg=grace service=storage-users
2025-01-30T18:07:03Z WRN missing or incomplete nats configuration. Events will not be published. line=github.com/cs3org/reva/[email protected]/internal/http/services/dataprovider/dataprovider.go:84 pkg=rhttp service=storage-users
2025-01-30T18:07:03Z INF rgrpc: grpc service enabled: storageprovider line=github.com/cs3org/reva/[email protected]/pkg/rgrpc/rgrpc.go:228 pkg=rgrpc service=storage-users
2025-01-30T18:07:03Z INF rgrpc: chaining grpc unary interceptor prometheus with priority 100 line=github.com/cs3org/reva/[email protected]/pkg/rgrpc/rgrpc.go:343 pkg=rgrpc service=storage-users
2025-01-30T18:07:03Z INF rgrpc: chaining grpc unary interceptor eventsmiddleware with priority 200 line=github.com/cs3org/reva/[email protected]/pkg/rgrpc/rgrpc.go:343 pkg=rgrpc service=storage-users
2025-01-30T18:07:03Z INF grpc server listening at tcp:0.0.0.0:9157 line=github.com/cs3org/reva/[email protected]/pkg/rgrpc/rgrpc.go:192 pkg=rgrpc service=storage-users
2025-01-30T18:07:04Z ERR need event stream for async file processing line=github.com/cs3org/reva/[email protected]/pkg/storage/utils/decomposedfs/decomposedfs.go:256 pkg=rhttp service=storage-users
2025-01-30T18:07:04Z ERR error starting the http server error="http service dataprovider could not be started,: need nats for async file processing" line=github.com/cs3org/reva/[email protected]/cmd/revad/runtime/runtime.go:198 service=storage-users
2025-01-30T18:07:04Z INF pid file "/tmp/revad-storage-users-3d341290-da97-4089-ad67-6cf99530fc17.pid" got removed line=github.com/cs3org/reva/[email protected]/cmd/revad/internal/grace/grace.go:95 pkg=grace service=storage-users
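
For what it's worth, one way to inspect which NATS-related settings the crashing pod actually sees (a sketch; the pod name is taken from the log above, and the grep pattern is just a guess at the relevant variable names):

> kubectl get pod storageusers-846b49d6f9-kcrfh -o yaml | grep -B1 -iE 'nats|events'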
@wkloucek
Contributor

After upgrading from 5.0.3 to 7.0.0

How did you upgrade? Do you use a chart that has 7.0.0 in the Chart.yaml -> appVersion? If not, the chart you're using probably doesn't ship this version.
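
For example, roughly like this (a sketch, assuming a local checkout of this repository and a release named ocis; the chart path is an assumption):

> grep -E '^(version|appVersion)' charts/ocis/Chart.yaml
> helm history ocis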

@euh2
Author

euh2 commented Jan 31, 2025

After upgrading from 5.0.3 to 7.0.0

How did you upgrade? Do you use a chart that has 7.0.0 in the Chart.yaml -> appVersion? If not, the chart you're using probably doesn't ship this version.

Yes! I pulled the latest from this repository, so the Chart.yaml shows 7.0.0, and helm history ocis shows app version 7.0.0.

@wkloucek
Contributor

Are you using the builtin NATS or an external one?

@euh2
Copy link
Author

euh2 commented Jan 31, 2025

Are you using the builtin NATS or an external one?

I use NATS like one of the examples here in this repository. My helmfile.yaml may answer your question. It's deployed to its own namespace, so kind of external, but used exclusively by OCIS.

@wkloucek
Contributor

wkloucek commented Feb 3, 2025

I use NATS like one of the examples here in this repository. My helmfile.yaml may answer your question. It's deployed to its own namespace, so kind of external, but used exclusively by OCIS.

The configuration actually looks fine.

Could you please execute the following command to ensure that the relevant pods are on 7.0.0:

kubectl get pods --selector='!batch.kubernetes.io/job-name,app.kubernetes.io/instance=ocis' -o jsonpath="{.items[*].spec.containers[*].image}" |\
tr -s '[[:space:]]' '\n' |\
sort |\
uniq -c

It would also be interesting to see whether your output is similar (I have replicas set to 2, so the same output appears twice):

    ~  kubectl get pods -l app=storageusers -o yaml | grep -B1 nats                                                                       ✔  garden-420505--de-instncs-0001-external ⎈ 
      - name: MICRO_REGISTRY
        value: nats-js-kv
      - name: MICRO_REGISTRY_ADDRESS
        value: nats.my-namespace.svc.cluster.local:4222
      - name: OCIS_EVENTS_ENDPOINT
        value: nats.my-namespace.svc.cluster.local:4222
      - name: OCIS_EVENTS_CLUSTER
        value: nats
--
      - name: OCIS_CACHE_STORE
        value: nats-js-kv
      - name: OCIS_CACHE_STORE_NODES
        value: nats.my-namespace.svc.cluster.local:4222
--
      - name: MICRO_REGISTRY
        value: nats-js-kv
      - name: MICRO_REGISTRY_ADDRESS
        value: nats.my-namespace.svc.cluster.local:4222
      - name: OCIS_EVENTS_ENDPOINT
        value: nats.my-namespace.svc.cluster.local:4222
      - name: OCIS_EVENTS_CLUSTER
        value: nats
--
      - name: OCIS_CACHE_STORE
        value: nats-js-kv
      - name: OCIS_CACHE_STORE_NODES
        value: nats.my-namespace.svc.cluster.local:4222

@euh2
Author

euh2 commented Feb 3, 2025

Could you please execute the following command to ensure that the relevant pods are on 7.0.0:

kubectl get pods --selector='!batch.kubernetes.io/job-name,app.kubernetes.io/instance=ocis' -o jsonpath="{.items[*].spec.containers[*].image}" |\
tr -s '[[:space:]]' '\n' |\
sort |\
uniq -c

Yes. It looks like all of them are updated.

30 owncloud/ocis:7.0.0

And the second output is similar to yours:

      - name: MICRO_REGISTRY
        value: nats-js-kv
      - name: MICRO_REGISTRY_ADDRESS
        value: nats.ocis-nats.svc.cluster.local:4222
      - name: OCIS_EVENTS_ENDPOINT
        value: nats.ocis-nats.svc.cluster.local:4222
--
      - name: OCIS_CACHE_STORE
        value: nats-js-kv
      - name: OCIS_CACHE_STORE_NODES
        value: nats.ocis-nats.svc.cluster.local:4222

I also checked the connection to nats.ocis-nats.svc.cluster.local:4222 and received a response, which is a good thing, I guess:

> kubectl run curlpod --image=curlimages/curl -ti -- sh
If you don't see a command prompt, try pressing enter.
~ $ curl -v nats.ocis-nats.svc.cluster.local:4222
* Host nats.ocis-nats.svc.cluster.local:4222 was resolved.
* IPv6: (none)
* IPv4: 10.96.123.122
*   Trying 10.96.123.122:4222...
* Connected to nats.ocis-nats.svc.cluster.local (10.96.123.122) port 4222
* using HTTP/1.x
> GET / HTTP/1.1
> Host: nats.ocis-nats.svc.cluster.local:4222
> User-Agent: curl/8.11.1
> Accept: */*
>
* Received HTTP/0.9 when not allowed
* closing connection #0
curl: (1) Received HTTP/0.9 when not allowed
~ $
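
The HTTP/0.9 error itself is expected, by the way: NATS does not speak HTTP on port 4222, it greets every TCP client with its own protocol banner, which curl refuses to parse. A raw TCP connection shows this directly (a sketch; the pod name is arbitrary and busybox's nc is assumed to be sufficient):

> kubectl run ncpod --image=busybox -ti --rm -- nc nats.ocis-nats.svc.cluster.local 4222

A reachable server should immediately print an INFO {...} JSON banner.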

I suppose NATS acts as a kv-store here. But does it depend on persisted state? Can I delete the NATS PVCs and recreate NATS from scratch (see the sketch after the logs below)? Maybe the kv-store became corrupted in some way during the update to 7.0.0, although the logs look fine to me:

> kubectl -n ocis-nats logs -l app.kubernetes.io/component=nats
Defaulted container "nats" out of: nats, reloader
Defaulted container "nats" out of: nats, reloader
Defaulted container "nats" out of: nats, reloader
[7] 2025/02/03 11:27:26.073835 [WRN] Catchup for stream '$OCIS > KV_service-registry' resetting first sequence: 386508 on catchup request
[7] 2025/02/03 11:27:26.132553 [INF] JetStream cluster new stream leader for '$OCIS > KV_eventhistory'
[7] 2025/02/03 11:27:26.355371 [INF] JetStream cluster new stream leader for '$OCIS > KV_ids-storage-users'
[7] 2025/02/03 11:27:26.550106 [INF] JetStream cluster new consumer leader for '$OCIS > main-queue > userlog'
[7] 2025/02/03 11:27:27.066558 [INF] JetStream cluster new metadata leader: nats-2/nats
[7] 2025/02/03 11:27:27.641533 [INF] JetStream cluster new consumer leader for '$OCIS > main-queue > search'
[7] 2025/02/03 11:27:27.689822 [INF] JetStream cluster new stream leader for '$OCIS > KV_postprocessing'
[7] 2025/02/03 11:27:28.489216 [INF] JetStream cluster new consumer leader for '$OCIS > main-queue > frontend'
[7] 2025/02/03 11:27:29.294879 [INF] JetStream cluster new consumer leader for '$OCIS > main-queue > activitylog'
[7] 2025/02/03 11:27:35.960745 [INF] JetStream cluster new consumer leader for '$OCIS > KV_service-registry > atTLH0de'
[7] 2025/02/03 11:27:19.999821 [WRN] RAFT [yrzKKRBu - C-R3F-VcxU0MuI] Detected another leader with higher term, will stepdown
[7] 2025/02/03 11:27:20.005924 [WRN] RAFT [yrzKKRBu - S-R3F-GQ0lBcwu] Detected another leader with higher term, will stepdown
[7] 2025/02/03 11:27:20.009586 [WRN] RAFT [yrzKKRBu - S-R3M-tPuEdTd1] Detected another leader with higher term, will stepdown
[7] 2025/02/03 11:27:20.029709 [WRN] RAFT [yrzKKRBu - S-R3M-eSXnkVG4] Detected another leader with higher term, will stepdown
[7] 2025/02/03 11:27:20.153287 [INF] 10.233.205.70:45004 - rid:300 - Route connection created
[7] 2025/02/03 11:27:20.154363 [INF] 10.233.205.70:45004 - rid:300 - Router connection closed: Duplicate Route
[7] 2025/02/03 11:27:24.855415 [INF] JetStream cluster new stream leader for '$OCIS > KV_activitylog'
[7] 2025/02/03 11:27:24.858442 [INF] JetStream cluster new stream leader for '$OCIS > KV_settings-cache'
[7] 2025/02/03 11:27:26.100417 [INF] JetStream cluster new stream leader for '$OCIS > KV_ocis-pkg'
[7] 2025/02/03 11:27:36.483472 [INF] JetStream cluster new consumer leader for '$OCIS > KV_service-registry > beIVHnry'
[7] 2025/02/03 11:27:26.074897 [INF] Catchup for stream '$OCIS > KV_service-registry' complete
[7] 2025/02/03 11:27:26.119206 [INF] JetStream cluster new stream leader for '$OCIS > KV_userlog'
[7] 2025/02/03 11:27:26.165808 [INF] JetStream cluster new consumer leader for '$OCIS > main-queue > postprocessing'
[7] 2025/02/03 11:27:26.188788 [INF] JetStream cluster new consumer leader for '$OCIS > main-queue > graph'
[7] 2025/02/03 11:27:26.958469 [INF] JetStream cluster new stream leader for '$OCIS > KV_cache-roles'
[7] 2025/02/03 11:27:27.061159 [INF] Self is new JetStream cluster metadata leader
[7] 2025/02/03 11:27:28.034532 [INF] JetStream cluster new stream leader for '$OCIS > main-queue'
[7] 2025/02/03 11:27:28.238499 [INF] JetStream cluster new consumer leader for '$OCIS > main-queue > jsoncs3sharemanager'
[7] 2025/02/03 11:27:29.882287 [INF] JetStream cluster new stream leader for '$OCIS > KV_storage-system'
[7] 2025/02/03 11:27:35.968347 [INF] JetStream cluster new consumer leader for '$OCIS > KV_service-registry > 0x9QNsBr'
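
To answer the PVC question above with a sketch: a full wipe would look roughly like this, assuming the chart deploys NATS as a StatefulSet named nats with three replicas, and guessing at the PVC label (check with kubectl -n ocis-nats get pvc first). The oCIS services should recreate their streams and KV buckets on reconnect, but any queued events in flight would be lost:

> kubectl -n ocis-nats scale statefulset nats --replicas=0
> kubectl -n ocis-nats delete pvc -l app.kubernetes.io/name=nats
> kubectl -n ocis-nats scale statefulset nats --replicas=3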
