On a fresh install of Kubernetes with Ceph CSI installed either through Helm or manually, creating a PVC works, but during creation of a Pod that uses the PVC, the following is the result:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 3m10s default-scheduler 0/5 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/5 nodes are available: 5 Preemption is not helpful for scheduling.
Normal Scheduled 3m8s default-scheduler Successfully assigned default/mariadb-0 to node02
Normal SuccessfulAttachVolume 3m7s attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-80ce90f2-1401-4ad5-a213-97b5221f99bd"
Warning FailedMount 59s kubelet MountVolume.MountDevice failed for volume "pvc-80ce90f2-1401-4ad5-a213-97b5221f99bd" : rpc error: code = DeadlineExceeded desc = context deadline exceeded
Warning FailedMount 27s (x6 over 58s) kubelet MountVolume.MountDevice failed for volume "pvc-80ce90f2-1401-4ad5-a213-97b5221f99bd" : rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-e159aef1-a240-4680-b557-50da7c676ec0 already exists
The associated RBD Image has been properly created and doesn't have any watchers.
rbd --pool k8s ls
csi-vol-e159aef1-a240-4680-b557-50da7c676ec0
rbd --pool k8s status csi-vol-e159aef1-a240-4680-b557-50da7c676ec0
Watchers: none
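For completeness, the image metadata and any stale locks can be checked as well (output omitted here):
rbd --pool k8s info csi-vol-e159aef1-a240-4680-b557-50da7c676ec0
rbd --pool k8s lock ls csi-vol-e159aef1-a240-4680-b557-50da7c676ec0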
Environment details
ceph version 18.2.2
kubectl:
Client Version: v1.30.4
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.30.4
The Kubernetes cluster was deployed using Kubespray v1.26.0 (5 nodes).
Logs
csi-rbdplugin daemon on the host the Pod/PVC are deployed on; the following is mostly repeated every 2 minutes:
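(For reference, this log was pulled from the csi-rbdplugin pod running on node02, roughly like this; the namespace and pod name are placeholders and depend on how ceph-csi was deployed:)
kubectl -n ceph-csi-rbd get pods -o wide | grep node02
kubectl -n ceph-csi-rbd logs csi-rbdplugin-xxxxx -c csi-rbdplugin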
PVC:
Name: datadir-mariadb-0
Namespace: default
StorageClass: csi-rbd-sc
Status: Bound
Volume: pvc-80ce90f2-1401-4ad5-a213-97b5221f99bd
Labels: app=mariadb
Annotations: pv.kubernetes.io/bind-completed: yes
pv.kubernetes.io/bound-by-controller: yes
volume.beta.kubernetes.io/storage-provisioner: rbd.csi.ceph.com
volume.kubernetes.io/storage-provisioner: rbd.csi.ceph.com
Finalizers: [kubernetes.io/pvc-protection]
Capacity: 1Gi
Access Modes: RWO
VolumeMode: Filesystem
Used By: mariadb-0
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Provisioning 11m rbd.csi.ceph.com_csi-rbdplugin-provisioner-6c57fbb44-f9kz9_96c49c4d-617e-4057-8b6f-4fd935313263 External provisioner is provisioning volume for claim "default/datadir-mariadb-0"
Normal ProvisioningSucceeded 11m rbd.csi.ceph.com_csi-rbdplugin-provisioner-6c57fbb44-f9kz9_96c49c4d-617e-4057-8b6f-4fd935313263 Successfully provisioned volume pvc-80ce90f2-1401-4ad5-a213-97b5221f99bd
Normal ExternalProvisioning 11m persistentvolume-controller Waiting for a volume to be created either by the external provisioner 'rbd.csi.ceph.com' or manually by the system administrator. If volume creation is delayed, please verify that the provisioner is running and correctly registered.
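(The PVC details above and the Pod details below were collected with kubectl describe:)
kubectl describe pvc datadir-mariadb-0
kubectl describe pod mariadb-0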
POD:
Name: mariadb-0
Namespace: default
Priority: 0
Service Account: default
Node: node02/10.1.6.20
Start Time: Wed, 18 Sep 2024 17:58:48 +0200
Labels: app=mariadb
apps.kubernetes.io/pod-index=0
controller-revision-hash=mariadb-7595bc849
statefulset.kubernetes.io/pod-name=mariadb-0
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: StatefulSet/mariadb
Containers:
mariadb:
Container ID:
Image: mariadb
Image ID:
Port: 3306/TCP
Host Port: 0/TCP
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment:
MARIADB_ROOT_PASSWORD: <set to the key 'mariadb-root-password' in secret 'mariadb-secret'> Optional: false
Mounts:
/var/lib/mysql/ from datadir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-tcdxw (ro)
Conditions:
Type Status
PodReadyToStartContainers False
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
datadir:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: datadir-mariadb-0
ReadOnly: false
kube-api-access-tcdxw:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 12m default-scheduler 0/5 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/5 nodes are available: 5 Preemption is not helpful for scheduling.
Normal Scheduled 12m default-scheduler Successfully assigned default/mariadb-0 to node02
Normal SuccessfulAttachVolume 12m attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-80ce90f2-1401-4ad5-a213-97b5221f99bd"
Warning FailedMount 9m55s kubelet MountVolume.MountDevice failed for volume "pvc-80ce90f2-1401-4ad5-a213-97b5221f99bd" : rpc error: code = DeadlineExceeded desc = context deadline exceeded
Warning FailedMount 101s (x11 over 9m54s) kubelet MountVolume.MountDevice failed for volume "pvc-80ce90f2-1401-4ad5-a213-97b5221f99bd" : rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-e159aef1-a240-4680-b557-50da7c676ec0 already exists
Kubelet Logs node02:
Sep 18 18:10:49 node02 kubelet[139431]: E0918 18:10:49.956810 139431 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/csi/rbd.csi.ceph.com^0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-c32d2900-5956-4b25-a9a7-8cb78c8e7f9a podName: nodeName:}" failed. No retries permitted until 2024-09-18 18:12:51.95677111 +0200 CEST m=+3319.080343137 (durationBeforeRetry 2m2s). Error: UnmountDevice failed for volume "pvc-405d8d73-f657-4109-b335-206e225082c4" (UniqueName: "kubernetes.io/csi/rbd.csi.ceph.com^0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-c32d2900-5956-4b25-a9a7-8cb78c8e7f9a") on node "node02" : kubernetes.io/csi: attacher.UnmountDevice failed: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-c32d2900-5956-4b25-a9a7-8cb78c8e7f9a already exists
Sep 18 18:11:14 node02 kubelet[139431]: I0918 18:11:14.720217 139431 reconciler_common.go:220] "operationExecutor.MountVolume started for volume \"pvc-80ce90f2-1401-4ad5-a213-97b5221f99bd\" (UniqueName: \"kubernetes.io/csi/rbd.csi.ceph.com^0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-e159aef1-a240-4680-b557-50da7c676ec0\") pod \"mariadb-0\" (UID: \"38620a05-30ee-4fde-9302-cbfbe56cce77\") " pod="default/mariadb-0"
Sep 18 18:11:14 node02 kubelet[139431]: I0918 18:11:14.720366 139431 operation_generator.go:622] "MountVolume.WaitForAttach entering for volume \"pvc-80ce90f2-1401-4ad5-a213-97b5221f99bd\" (UniqueName: \"kubernetes.io/csi/rbd.csi.ceph.com^0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-e159aef1-a240-4680-b557-50da7c676ec0\") pod \"mariadb-0\" (UID: \"38620a05-30ee-4fde-9302-cbfbe56cce77\") DevicePath \"csi-0c9c090f55d5970d212b95bd0fd652bbe925e4de7bce71d55e3f3fa42a5f96bb\"" pod="default/mariadb-0"
Sep 18 18:11:14 node02 kubelet[139431]: I0918 18:11:14.723657 139431 operation_generator.go:632] "MountVolume.WaitForAttach succeeded for volume \"pvc-80ce90f2-1401-4ad5-a213-97b5221f99bd\" (UniqueName: \"kubernetes.io/csi/rbd.csi.ceph.com^0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-e159aef1-a240-4680-b557-50da7c676ec0\") pod \"mariadb-0\" (UID: \"38620a05-30ee-4fde-9302-cbfbe56cce77\") DevicePath \"csi-0c9c090f55d5970d212b95bd0fd652bbe925e4de7bce71d55e3f3fa42a5f96bb\"" pod="default/mariadb-0"
Sep 18 18:11:14 node02 kubelet[139431]: E0918 18:11:14.734334 139431 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/csi/rbd.csi.ceph.com^0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-e159aef1-a240-4680-b557-50da7c676ec0 podName: nodeName:}" failed. No retries permitted until 2024-09-18 18:13:16.734308848 +0200 CEST m=+3343.857880875 (durationBeforeRetry 2m2s). Error: MountVolume.MountDevice failed for volume "pvc-80ce90f2-1401-4ad5-a213-97b5221f99bd" (UniqueName: "kubernetes.io/csi/rbd.csi.ceph.com^0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-e159aef1-a240-4680-b557-50da7c676ec0") pod "mariadb-0" (UID: "38620a05-30ee-4fde-9302-cbfbe56cce77") : rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-e159aef1-a240-4680-b557-50da7c676ec0 already exists
Sep 18 18:11:32 node02 kubelet[139431]: I0918 18:11:32.977910 139431 kubelet_getters.go:218] "Pod status updated" pod="kube-system/kube-controller-manager-node02" status="Running"
Sep 18 18:11:32 node02 kubelet[139431]: I0918 18:11:32.977985 139431 kubelet_getters.go:218] "Pod status updated" pod="kube-system/kube-scheduler-node02" status="Running"
Sep 18 18:11:32 node02 kubelet[139431]: I0918 18:11:32.977999 139431 kubelet_getters.go:218] "Pod status updated" pod="kube-system/kube-apiserver-node02" status="Running"
Sep 18 18:12:10 node02 kubelet[139431]: E0918 18:12:10.962928 139431 pod_workers.go:1298] "Error syncing pod, skipping" err="unmounted volumes=[datadir], unattached volumes=[], failed to process volumes=[]: context deadline exceeded" pod="default/mariadb-0" podUID="38620a05-30ee-4fde-9302-cbfbe56cce77"
Sep 18 18:12:23 node02 kubelet[139431]: I0918 18:12:23.962381 139431 util.go:30] "No sandbox for pod can be found. Need to start a new one" pod="default/mariadb-0"
Sep 18 18:12:32 node02 kubelet[139431]: I0918 18:12:32.978313 139431 kubelet_getters.go:218] "Pod status updated" pod="kube-system/kube-scheduler-node02" status="Running"
Sep 18 18:12:32 node02 kubelet[139431]: I0918 18:12:32.978376 139431 kubelet_getters.go:218] "Pod status updated" pod="kube-system/kube-controller-manager-node02" status="Running"
Sep 18 18:12:32 node02 kubelet[139431]: I0918 18:12:32.978394 139431 kubelet_getters.go:218] "Pod status updated" pod="kube-system/kube-apiserver-node02" status="Running"
Sep 18 18:12:52 node02 kubelet[139431]: I0918 18:12:52.056991 139431 reconciler_common.go:282] "operationExecutor.UnmountDevice started for volume \"pvc-405d8d73-f657-4109-b335-206e225082c4\" (UniqueName: \"kubernetes.io/csi/rbd.csi.ceph.com^0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-c32d2900-5956-4b25-a9a7-8cb78c8e7f9a\") on node \"node02\" "
Sep 18 18:12:52 node02 kubelet[139431]: E0918 18:12:52.060694 139431 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/csi/rbd.csi.ceph.com^0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-c32d2900-5956-4b25-a9a7-8cb78c8e7f9a podName: nodeName:}" failed. No retries permitted until 2024-09-18 18:14:54.060670256 +0200 CEST m=+3441.184242273 (durationBeforeRetry 2m2s). Error: UnmountDevice failed for volume "pvc-405d8d73-f657-4109-b335-206e225082c4" (UniqueName: "kubernetes.io/csi/rbd.csi.ceph.com^0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-c32d2900-5956-4b25-a9a7-8cb78c8e7f9a") on node "node02" : kubernetes.io/csi: attacher.UnmountDevice failed: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-c32d2900-5956-4b25-a9a7-8cb78c8e7f9a already exists
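On node02 itself, the kernel rbd client can also be inspected directly, e.g. to see whether the image ever got mapped and whether the kernel logged any rbd/libceph errors (output omitted here):
rbd device list
dmesg | grep -iE 'rbd|libceph'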
Exec of RBD Test from within Provisioner:
Same command from within the daemons doesn't return anything.
TCP Connection from within csi-rbdplugin daemons:
TCP Connection from within csi-rbdplugin-provisioner pods:
As it stands, I can't find anything that would explain the issue. Networking should not be an issue either, as no traffic gets filtered and the Kubernetes cluster is connected to the Ceph cluster network, using a user with the following caps:
caps mgr = "profile rbd pool=k8s"
caps mon = "profile rbd"
caps osd = "profile rbd pool=k8s"
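For reference, the RBD test and the TCP checks from within the csi pods were done roughly like this (the namespace, pod names and monitor address are placeholders, and the rbd auth/monitor flags are omitted):
kubectl -n ceph-csi-rbd exec -it csi-rbdplugin-provisioner-xxxxx -c csi-rbdplugin -- rbd --pool k8s ls
kubectl -n ceph-csi-rbd exec -it csi-rbdplugin-xxxxx -c csi-rbdplugin -- bash -c 'echo > /dev/tcp/<mon-ip>/6789 && echo "mon reachable"'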