# Update ceph-with-rook.md #11120

base: main

@@ -100,6 +100,188 @@
ceph-bucket       rook-ceph.ceph.rook.io/bucket    Delete   Immediate
ceph-filesystem   rook-ceph.cephfs.csi.ceph.com    Delete   Immediate   true   77m

## 🔧 Single Node Setup Instructions

If you're deploying on a **single worker node**, follow these steps to set up Rook Ceph with Talos Linux:

---

### Additional Requirements

To run a Ceph cluster reliably with Talos Linux and Rook, the worker node needs **at least 4 disks**:

* **1 disk** for the Talos OS
* **3 dedicated, unused disks** for Ceph OSDs

> ❗ Rook will not function properly without **at least three available and empty disks** for Ceph.

---

### 1. Enable the Ceph Kernel Module

Talos ships with the `nbd` kernel module, but it is not loaded by default and must be enabled explicitly.

**Create a patch file** (`patch.values.yaml`):

```yaml
machine:
  kernel:
    modules:
      - name: nbd
```

**Apply the kernel module patch** (replace `192.168.178.79` with your node's IP):

```shell
talosctl -n 192.168.178.79 patch mc --patch @patch.values.yaml
```
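
As an optional sanity check (not part of the original steps), you can confirm the module is actually loaded by reading `/proc/modules` on the node; this assumes `talosctl read` is available in your Talos version:

```shell
# Confirm the nbd module shows up in the node's loaded modules
talosctl -n 192.168.178.79 read /proc/modules | grep nbd
```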

**Review discussion on lines +121 to +135:**

> **Reviewer:** The krbd kernel module works by default. Is there a reason to change to rbd-nbd, and does the Rook Ceph cluster choose the nbd module over krbd if it is available?
>
> **Author:** When I installed Ceph, it reported an error that nbd was not enabled, so I enabled it. nbd is more modern and offers a much better way to snapshot volumes without ending up with inconsistent journals: https://engineering.salesforce.com/mapping-kubernetes-ceph-volumes-the-rbd-nbd-way-21f7c4161f04/
>
> **Reviewer:** That post seems fairly old, since it references Ceph Jewel and CentOS 7. Although rbd-nbd could provide earlier access to newer features, there might be a performance impact compared to the krbd module. I've provisioned multiple Talos servers (1.9.x–1.10.x) with Rook Ceph and did not need to load additional kernel modules. Which versions of Talos and Rook are you using? Maybe I can spin up a quick test to confirm this.
>
> **Author:** Talos: v1.10.3
>
> **Author:** values.yaml: …
>
> **Reviewer:** Yes, you need to wipe the disks. You can see that a disk probably still belongs to a previous Ceph cluster by looking at the osd-prepare logs. Follow these instructions to wipe them: https://rook.io/docs/rook/latest-release/Getting-Started/ceph-teardown/#zapping-devices
>
> **Author:** I am stuck even before that phase, in "configuring MONs".
>
> **Reviewer:** Please read the documentation link provided above (scroll to the top); you need to clean up the …
>
> **Author:** Thanks, this was it. Maybe the docs should also mention that when no new OSDs are created and the cluster is stuck in the "configuring MONs" phase, one needs to do that.
>
> **Contributor:** One should probably understand Rook and Ceph before installing it; I don't think any more info than what already exists is necessary in the Talos docs. The Rook documentation is very clear on what to do if you're reusing disks that already contained OSDs or MONs from other clusters.

---

### 2. Install Rook Operator

**Add the Helm repository**:

```shell
helm repo add rook-release https://charts.rook.io/release
```

**Install the Rook Operator**:

```shell
helm install --create-namespace --namespace rook-ceph rook-ceph rook-release/rook-ceph
```

**Label the namespace for privileged pods**:

```shell
kubectl label namespace rook-ceph pod-security.kubernetes.io/enforce=privileged
```
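
Before creating the cluster, it helps to wait for the operator pod to reach `Running`. This quick check is not part of the original instructions and assumes the operator deployment carries the usual `app=rook-ceph-operator` label:

```shell
# Wait for the Rook operator pod to come up before creating the cluster
kubectl -n rook-ceph get pods -l app=rook-ceph-operator --watch
```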

---

### 3. Inspect Available Disks

Check which disks are available:

```shell
talosctl get disks -n 192.168.178.79
```

> ✅ You need to identify **three unused devices** (e.g., `/dev/sdb`, `/dev/sdc`, `/dev/sdd`) for use by Ceph.
> If necessary, create or attach new disks (virtual or physical) that Ceph can consume.
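
If one of the candidate disks previously belonged to another Ceph cluster, Rook will skip it; see the zapping instructions in the Rook teardown documentation referenced in the review discussion above. Recent Talos releases also ship a wipe subcommand, but the exact invocation below is an assumption — verify it with `talosctl wipe --help` before running anything destructive:

```shell
# Hypothetical sketch: wipe a previously used data disk so Rook can claim it
# (destructive; double-check the device name and the subcommand on your Talos version)
talosctl -n 192.168.178.79 wipe disk sdb
```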

---

### 4. Prepare Custom `values.yaml` for Single-Node Ceph

Replace all instances of `talos-mec-lba` below with the **actual name of your worker node**, which you can get from:

```shell
kubectl get nodes -o wide
```

Here is an example `values.yaml` configuration:

```yaml
storage:
  useAllNodes: false
  useAllDevices: true
  config:
    allowMultiplePerNode: true
  nodes:
    - name: talos-mec-lba # 🔁 Replace with your actual node name

placement:
  all:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values:
                  - talos-mec-lba # 🔁 Replace with your actual node name
    tolerations:
      - key: "node-role.kubernetes.io/control-plane"
        operator: "Exists"
        effect: "NoSchedule"

cephClusterSpec:
  mon:
    count: 1
    allowMultiplePerNode: true
  mgr:
    count: 1
    allowMultiplePerNode: true
  mds:
    count: 0
    allowMultiplePerNode: true
  rgw:
    count: 0
    allowMultiplePerNode: true
  crashCollector:
    disable: true
  dashboard:
    enabled: true
  pool:
    replicated:
      size: 1
      minSize: 1

cephCSI:
  csiCephFS:
    provisionerReplicas: 1
    pluginReplicas: 1
    placement:
      podAntiAffinity: null
  csiRBD:
    provisionerReplicas: 1
    pluginReplicas: 1
    placement:
      podAntiAffinity: null
```
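
If you want to see every option the chart exposes and compare against its defaults, you can dump the default values to a file (assuming Helm 3 and the `rook-release` repo added in step 2):

```shell
# Dump the rook-ceph-cluster chart defaults for comparison
helm show values rook-release/rook-ceph-cluster > default-values.yaml
```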

---

### 5. Install Rook Ceph Cluster with Custom Configuration

```shell
helm install --create-namespace --namespace rook-ceph rook-ceph-cluster \
  -f values.yaml \
  --set operatorNamespace=rook-ceph \
  rook-release/rook-ceph-cluster
```
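
Provisioning takes several minutes. The checks below are not part of the original instructions and assume the usual Rook resources and labels (e.g. `app=rook-ceph-osd-prepare`); if no OSDs appear and the cluster sits in the mon-configuration phase, the osd-prepare logs usually explain why (for example, disks that are not empty):

```shell
# Watch the overall cluster phase reported by the operator
kubectl -n rook-ceph get cephcluster

# Confirm mon, mgr, and OSD pods come up
kubectl -n rook-ceph get pods

# If no OSDs are created, inspect the osd-prepare job logs
kubectl -n rook-ceph logs -l app=rook-ceph-osd-prepare --tail=100
```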

---

### 6. ✅ Test PVC Provisioning

Create a test `PersistentVolumeClaim` to verify that Ceph RBD block storage is working. Save the following as `test-pvc.yaml`:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ceph-block-pvc-23
  namespace: default
spec:
  storageClassName: ceph-block # your Ceph RBD StorageClass name
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```

Apply the PVC:

```shell
kubectl apply -f test-pvc.yaml
```

Check if the volume has been successfully provisioned:

```shell
kubectl get pvc ceph-block-pvc-23
```

> ✅ If the PVC shows `STATUS: Bound`, your Ceph cluster is working correctly.
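
Once the PVC binds, you can remove the test claim again; with the `Delete` reclaim policy shown in the StorageClass listing above, the backing volume is cleaned up along with it:

```shell
# Remove the test claim (the bound volume is deleted with it)
kubectl delete pvc ceph-block-pvc-23 -n default
```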

## Talos Linux Considerations

It is important to note that a Rook Ceph cluster saves cluster information directly onto the node (by default `dataDirHostPath` is set to `/var/lib/rook`).
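
If you ever tear the cluster down and rebuild it on the same node, stale state under `dataDirHostPath` will prevent the new mons from forming (this came up in the review discussion above). A quick way to check for leftovers, assuming `talosctl list` is available in your Talos version:

```shell
# Inspect the Rook data directory on the node for leftovers from a previous cluster
talosctl -n 192.168.178.79 list /var/lib/rook
```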

---

**Review comments on the pull request:**

> **Reviewer:** We don't usually support/modify old Talos documentation; you should probably take this to `website/content/v1.11`.
>
> **Reviewer:** Also, this is a very niche use case which shouldn't be used in general (single-node setup). I would probably move this to another document, probably in the `advanced/` folder, and link it from here.
>
> **Comment:** It would also be great if copying a section did not copy the CLI output currently shown in the docs, or if the CLI output were not included at all.