CSI-RBD fails to mount PVC, gets stuck in ContainerCreating #4851

Open
fdriessler opened this issue Sep 18, 2024 · 1 comment

fdriessler commented Sep 18, 2024

On a fresh install of Kubernetes with Ceph CSI installed either through Helm or manually, creating a PVC works, but when a Pod is created using that PVC, the following is the result:
Events:

  Type     Reason                  Age                From                     Message
  ----     ------                  ----               ----                     -------
  Warning  FailedScheduling        3m10s              default-scheduler        0/5 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/5 nodes are available: 5 Preemption is not helpful for scheduling.
  Normal   Scheduled               3m8s               default-scheduler        Successfully assigned default/mariadb-0 to node02
  Normal   SuccessfulAttachVolume  3m7s               attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-80ce90f2-1401-4ad5-a213-97b5221f99bd"
  Warning  FailedMount             59s                kubelet                  MountVolume.MountDevice failed for volume "pvc-80ce90f2-1401-4ad5-a213-97b5221f99bd" : rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Warning  FailedMount             27s (x6 over 58s)  kubelet                  MountVolume.MountDevice failed for volume "pvc-80ce90f2-1401-4ad5-a213-97b5221f99bd" : rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-e159aef1-a240-4680-b557-50da7c676ec0 already exists
The associated RBD image has been created properly and doesn't have any watchers:
rbd --pool k8s ls
csi-vol-e159aef1-a240-4680-b557-50da7c676ec0
rbd --pool k8s status csi-vol-e159aef1-a240-4680-b557-50da7c676ec0
Watchers: none
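Node-side checks that could be run on node02 (these are standard rbd/kernel commands, nothing ceph-csi specific; the rbd CLI may not be installed on the node itself, in which case lsblk works as a fallback):

# list kernel RBD mappings on the node, if any
rbd device list        # or: lsblk | grep rbd

# kernel-side messages from the rbd/libceph modules (mon timeouts, auth errors, ...)
dmesg | grep -iE 'libceph|rbd'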

Environment details

ceph version 18.2.2
kubectl:
Client Version: v1.30.4
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.30.4

The Kubernetes cluster was deployed using Kubespray v1.26.0, 5 nodes.

Logs

Logs from the csi-rbdplugin daemon on the host the pod/PVC are deployed on; the following is repeated roughly every 2 minutes:

I0918 16:05:32.880127  145233 utils.go:240] ID: 197 GRPC call: /csi.v1.Identity/Probe
I0918 16:05:32.880181  145233 utils.go:241] ID: 197 GRPC request: {}
I0918 16:05:32.880204  145233 utils.go:247] ID: 197 GRPC response: {}
I0918 16:06:32.924934  145233 utils.go:240] ID: 198 GRPC call: /csi.v1.Identity/Probe
I0918 16:06:32.924982  145233 utils.go:241] ID: 198 GRPC request: {}
I0918 16:06:32.925005  145233 utils.go:247] ID: 198 GRPC response: {}
I0918 16:06:45.759271  145233 utils.go:240] ID: 199 GRPC call: /csi.v1.Node/NodeGetCapabilities
I0918 16:06:45.759333  145233 utils.go:241] ID: 199 GRPC request: {}
I0918 16:06:45.759501  145233 utils.go:247] ID: 199 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":3}}},{"Type":{"Rpc":{"type":5}}}]}
I0918 16:06:45.760723  145233 utils.go:240] ID: 200 Req-ID: 0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-c32d2900-5956-4b25-a9a7-8cb78c8e7f9a GRPC call: /csi.v1.Node/NodeUnstageVolume
I0918 16:06:45.760805  145233 utils.go:241] ID: 200 Req-ID: 0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-c32d2900-5956-4b25-a9a7-8cb78c8e7f9a GRPC request: {"staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/rbd.csi.ceph.com/421629782cf76dd64c3f8a6b881194643170681f0539815d6727fd987d6e4a80/globalmount","volume_id":"0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-c32d2900-5956-4b25-a9a7-8cb78c8e7f9a"}
E0918 16:06:45.760844  145233 nodeserver.go:977] ID: 200 Req-ID: 0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-c32d2900-5956-4b25-a9a7-8cb78c8e7f9a an operation with the given Volume ID 0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-c32d2900-5956-4b25-a9a7-8cb78c8e7f9a already exists
E0918 16:06:45.760862  145233 utils.go:245] ID: 200 Req-ID: 0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-c32d2900-5956-4b25-a9a7-8cb78c8e7f9a GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-c32d2900-5956-4b25-a9a7-8cb78c8e7f9a already exists
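As I understand it, the repeated Aborted "an operation with the given Volume ID ... already exists" errors mean the node plugin's internal per-volume lock is still held by an earlier NodeStageVolume/NodeUnstageVolume call that never completed (consistent with the earlier DeadlineExceeded). A sketch of clearing the stuck lock by restarting the node plugin pod on node02; the namespace and label below are assumptions and depend on how the chart/manifests were installed:

# find the csi-rbdplugin DaemonSet pod running on node02 (namespace/label may differ)
kubectl -n ceph-csi-rbd get pods -l app=csi-rbdplugin -o wide | grep node02
# restarting it releases the in-memory operation lock so the mount can be retried
kubectl -n ceph-csi-rbd delete pod <csi-rbdplugin-pod-on-node02>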

Logs from the csi-rbdplugin-provisioner during the same activity:

I0918 15:55:42.171850       1 utils.go:241] ID: 52 GRPC request: {}
I0918 15:55:42.171869       1 utils.go:247] ID: 52 GRPC response: {}
I0918 15:56:42.202198       1 utils.go:240] ID: 53 GRPC call: /csi.v1.Identity/Probe
I0918 15:56:42.202265       1 utils.go:241] ID: 53 GRPC request: {}
I0918 15:56:42.202297       1 utils.go:247] ID: 53 GRPC response: {}
I0918 15:57:42.209383       1 utils.go:240] ID: 54 GRPC call: /csi.v1.Identity/Probe
I0918 15:57:42.209420       1 utils.go:241] ID: 54 GRPC request: {}
I0918 15:57:42.209434       1 utils.go:247] ID: 54 GRPC response: {}
I0918 15:58:36.240716       1 utils.go:240] ID: 55 Req-ID: 0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-c32d2900-5956-4b25-a9a7-8cb78c8e7f9a GRPC call: /csi.v1.Controller/DeleteVolume
I0918 15:58:36.240845       1 utils.go:241] ID: 55 Req-ID: 0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-c32d2900-5956-4b25-a9a7-8cb78c8e7f9a GRPC request: {"secrets":"***stripped***","volume_id":"0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-c32d2900-5956-4b25-a9a7-8cb78c8e7f9a"}
I0918 15:58:36.244109       1 omap.go:89] ID: 55 Req-ID: 0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-c32d2900-5956-4b25-a9a7-8cb78c8e7f9a got omap values: (pool="k8s", namespace="", name="csi.volume.c32d2900-5956-4b25-a9a7-8cb78c8e7f9a"): map[csi.imageid:91a17484734de8 csi.imagename:csi-vol-c32d2900-5956-4b25-a9a7-8cb78c8e7f9a csi.volname:pvc-405d8d73-f657-4109-b335-206e225082c4 csi.volume.owner:default]
I0918 15:58:36.303882       1 rbd_util.go:640] ID: 55 Req-ID: 0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-c32d2900-5956-4b25-a9a7-8cb78c8e7f9a rbd: delete csi-vol-c32d2900-5956-4b25-a9a7-8cb78c8e7f9a-temp using mon 10.10.10.60:6789,10.10.10.70:6789,10.10.10.80:6789,10.10.10.90:6789, pool k8s
I0918 15:58:36.313956       1 controllerserver.go:1059] ID: 55 Req-ID: 0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-c32d2900-5956-4b25-a9a7-8cb78c8e7f9a deleting image csi-vol-c32d2900-5956-4b25-a9a7-8cb78c8e7f9a
I0918 15:58:36.313982       1 rbd_util.go:640] ID: 55 Req-ID: 0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-c32d2900-5956-4b25-a9a7-8cb78c8e7f9a rbd: delete csi-vol-c32d2900-5956-4b25-a9a7-8cb78c8e7f9a using mon 10.10.10.60:6789,10.10.10.70:6789,10.10.10.80:6789,10.10.10.90:6789, pool k8s
I0918 15:58:36.335423       1 rbd_util.go:682] ID: 55 Req-ID: 0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-c32d2900-5956-4b25-a9a7-8cb78c8e7f9a rbd: adding task to remove image "k8s/csi-vol-c32d2900-5956-4b25-a9a7-8cb78c8e7f9a" with id "91a17484734de8" from trash
I0918 15:58:36.343132       1 rbd_util.go:706] ID: 55 Req-ID: 0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-c32d2900-5956-4b25-a9a7-8cb78c8e7f9a rbd: successfully added task to move image "k8s/csi-vol-c32d2900-5956-4b25-a9a7-8cb78c8e7f9a" with id "91a17484734de8" to trash
I0918 15:58:36.346922       1 omap.go:126] ID: 55 Req-ID: 0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-c32d2900-5956-4b25-a9a7-8cb78c8e7f9a removed omap keys (pool="k8s", namespace="", name="csi.volumes.default"): [csi.volume.pvc-405d8d73-f657-4109-b335-206e225082c4]
I0918 15:58:36.346994       1 utils.go:247] ID: 55 Req-ID: 0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-c32d2900-5956-4b25-a9a7-8cb78c8e7f9a GRPC response: {}
I0918 15:58:36.369004       1 utils.go:240] ID: 56 Req-ID: 0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-c32d2900-5956-4b25-a9a7-8cb78c8e7f9a GRPC call: /csi.v1.Controller/DeleteVolume
I0918 15:58:36.369063       1 utils.go:241] ID: 56 Req-ID: 0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-c32d2900-5956-4b25-a9a7-8cb78c8e7f9a GRPC request: {"secrets":"***stripped***","volume_id":"0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-c32d2900-5956-4b25-a9a7-8cb78c8e7f9a"}
E0918 15:58:36.369766       1 omap.go:80] ID: 56 Req-ID: 0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-c32d2900-5956-4b25-a9a7-8cb78c8e7f9a omap not found (pool="k8s", namespace="", name="csi.volume.c32d2900-5956-4b25-a9a7-8cb78c8e7f9a"): rados: ret=-2, No such file or directory
W0918 15:58:36.369796       1 voljournal.go:736] ID: 56 Req-ID: 0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-c32d2900-5956-4b25-a9a7-8cb78c8e7f9a unable to read omap keys: pool or key missing: key not found: rados: ret=-2, No such file or directory
E0918 15:58:36.375990       1 rbd_journal.go:691] ID: 56 Req-ID: 0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-c32d2900-5956-4b25-a9a7-8cb78c8e7f9a failed to get image id k8s/csi-vol-c32d2900-5956-4b25-a9a7-8cb78c8e7f9a: Failed as image not found (internal RBD image not found)
I0918 15:58:36.377004       1 rbd_util.go:682] ID: 56 Req-ID: 0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-c32d2900-5956-4b25-a9a7-8cb78c8e7f9a rbd: adding task to remove image "k8s/csi-vol-c32d2900-5956-4b25-a9a7-8cb78c8e7f9a" with id "91a17484734de8" from trash
I0918 15:58:36.379044       1 rbd_util.go:706] ID: 56 Req-ID: 0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-c32d2900-5956-4b25-a9a7-8cb78c8e7f9a rbd: successfully added task to move image "k8s/csi-vol-c32d2900-5956-4b25-a9a7-8cb78c8e7f9a" with id "91a17484734de8" to trash
I0918 15:58:36.382825       1 omap.go:126] ID: 56 Req-ID: 0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-c32d2900-5956-4b25-a9a7-8cb78c8e7f9a removed omap keys (pool="k8s", namespace="", name="csi.volumes.default"): [csi.volume.]
I0918 15:58:36.382873       1 utils.go:247] ID: 56 Req-ID: 0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-c32d2900-5956-4b25-a9a7-8cb78c8e7f9a GRPC response: {}
I0918 15:58:42.173368       1 utils.go:240] ID: 57 GRPC call: /csi.v1.Identity/Probe
I0918 15:58:42.173447       1 utils.go:241] ID: 57 GRPC request: {}
I0918 15:58:42.173474       1 utils.go:247] ID: 57 GRPC response: {}
I0918 15:58:47.076040       1 utils.go:240] ID: 58 Req-ID: pvc-80ce90f2-1401-4ad5-a213-97b5221f99bd GRPC call: /csi.v1.Controller/CreateVolume
I0918 15:58:47.076350       1 utils.go:241] ID: 58 Req-ID: pvc-80ce90f2-1401-4ad5-a213-97b5221f99bd GRPC request: {"capacity_range":{"required_bytes":1073741824},"name":"pvc-80ce90f2-1401-4ad5-a213-97b5221f99bd","parameters":{"clusterID":"ec49375b-20d9-4792-8a51-c9fbcec206d6","csi.storage.k8s.io/pv/name":"pvc-80ce90f2-1401-4ad5-a213-97b5221f99bd","csi.storage.k8s.io/pvc/name":"datadir-mariadb-0","csi.storage.k8s.io/pvc/namespace":"default","imageFeatures":"layering","pool":"k8s"},"secrets":"***stripped***","volume_capabilities":[{"AccessType":{"Mount":{"fs_type":"ext4","mount_flags":["discard"]}},"access_mode":{"mode":1}}]}
I0918 15:58:47.076564       1 rbd_util.go:1341] ID: 58 Req-ID: pvc-80ce90f2-1401-4ad5-a213-97b5221f99bd setting disableInUseChecks: false image features: [layering] mounter: rbd
I0918 15:58:47.078229       1 omap.go:89] ID: 58 Req-ID: pvc-80ce90f2-1401-4ad5-a213-97b5221f99bd got omap values: (pool="k8s", namespace="", name="csi.volumes.default"): map[]
I0918 15:58:47.083934       1 omap.go:159] ID: 58 Req-ID: pvc-80ce90f2-1401-4ad5-a213-97b5221f99bd set omap keys (pool="k8s", namespace="", name="csi.volumes.default"): map[csi.volume.pvc-80ce90f2-1401-4ad5-a213-97b5221f99bd:e159aef1-a240-4680-b557-50da7c676ec0])
I0918 15:58:47.085435       1 omap.go:159] ID: 58 Req-ID: pvc-80ce90f2-1401-4ad5-a213-97b5221f99bd set omap keys (pool="k8s", namespace="", name="csi.volume.e159aef1-a240-4680-b557-50da7c676ec0"): map[csi.imagename:csi-vol-e159aef1-a240-4680-b557-50da7c676ec0 csi.volname:pvc-80ce90f2-1401-4ad5-a213-97b5221f99bd csi.volume.owner:default])
I0918 15:58:47.085467       1 rbd_journal.go:491] ID: 58 Req-ID: pvc-80ce90f2-1401-4ad5-a213-97b5221f99bd generated Volume ID (0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-e159aef1-a240-4680-b557-50da7c676ec0) and image name (csi-vol-e159aef1-a240-4680-b557-50da7c676ec0) for request name (pvc-80ce90f2-1401-4ad5-a213-97b5221f99bd)
I0918 15:58:47.085560       1 rbd_util.go:431] ID: 58 Req-ID: pvc-80ce90f2-1401-4ad5-a213-97b5221f99bd rbd: create k8s/csi-vol-e159aef1-a240-4680-b557-50da7c676ec0 size 1024M (features: [layering]) using mon 10.10.10.60:6789,10.10.10.70:6789,10.10.10.80:6789,10.10.10.90:6789
I0918 15:58:47.085592       1 rbd_util.go:1589] ID: 58 Req-ID: pvc-80ce90f2-1401-4ad5-a213-97b5221f99bd setting image options on k8s/csi-vol-e159aef1-a240-4680-b557-50da7c676ec0
I0918 15:58:47.100804       1 controllerserver.go:784] ID: 58 Req-ID: pvc-80ce90f2-1401-4ad5-a213-97b5221f99bd created image k8s/csi-vol-e159aef1-a240-4680-b557-50da7c676ec0 backed for request name pvc-80ce90f2-1401-4ad5-a213-97b5221f99bd
I0918 15:58:47.116438       1 omap.go:159] ID: 58 Req-ID: pvc-80ce90f2-1401-4ad5-a213-97b5221f99bd set omap keys (pool="k8s", namespace="", name="csi.volume.e159aef1-a240-4680-b557-50da7c676ec0"): map[csi.imageid:91a174e1064178])
I0918 15:58:47.164609       1 utils.go:247] ID: 58 Req-ID: pvc-80ce90f2-1401-4ad5-a213-97b5221f99bd GRPC response: {"volume":{"capacity_bytes":1073741824,"volume_context":{"clusterID":"ec49375b-20d9-4792-8a51-c9fbcec206d6","imageFeatures":"layering","imageName":"csi-vol-e159aef1-a240-4680-b557-50da7c676ec0","journalPool":"k8s","pool":"k8s"},"volume_id":"0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-e159aef1-a240-4680-b557-50da7c676ec0"}}
I0918 15:59:42.171656       1 utils.go:240] ID: 59 GRPC call: /csi.v1.Identity/Probe
I0918 15:59:42.171704       1 utils.go:241] ID: 59 GRPC request: {}
I0918 15:59:42.171724       1 utils.go:247] ID: 59 GRPC response: {}
I0918 16:00:42.208527       1 utils.go:240] ID: 60 GRPC call: /csi.v1.Identity/Probe
I0918 16:00:42.208572       1 utils.go:241] ID: 60 GRPC request: {}
I0918 16:00:42.208587       1 utils.go:247] ID: 60 GRPC response: {}
I0918 16:01:42.171613       1 utils.go:240] ID: 61 GRPC call: /csi.v1.Identity/Probe
I0918 16:01:42.171652       1 utils.go:241] ID: 61 GRPC request: {}

PVC:

Name:          datadir-mariadb-0
Namespace:     default
StorageClass:  csi-rbd-sc
Status:        Bound
Volume:        pvc-80ce90f2-1401-4ad5-a213-97b5221f99bd
Labels:        app=mariadb
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: rbd.csi.ceph.com
               volume.kubernetes.io/storage-provisioner: rbd.csi.ceph.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      1Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Used By:       mariadb-0
Events:
  Type    Reason                 Age   From                                                                                             Message
  ----    ------                 ----  ----                                                                                             -------
  Normal  Provisioning           11m   rbd.csi.ceph.com_csi-rbdplugin-provisioner-6c57fbb44-f9kz9_96c49c4d-617e-4057-8b6f-4fd935313263  External provisioner is provisioning volume for claim "default/datadir-mariadb-0"
  Normal  ProvisioningSucceeded  11m   rbd.csi.ceph.com_csi-rbdplugin-provisioner-6c57fbb44-f9kz9_96c49c4d-617e-4057-8b6f-4fd935313263  Successfully provisioned volume pvc-80ce90f2-1401-4ad5-a213-97b5221f99bd
  Normal  ExternalProvisioning   11m   persistentvolume-controller                                                                      Waiting for a volume to be created either by the external provisioner 'rbd.csi.ceph.com' or manually by the system administrator. If volume creation is delayed, please verify that the provisioner is running and correctly registered.

POD:

Name:             mariadb-0
Namespace:        default
Priority:         0
Service Account:  default
Node:             node02/10.1.6.20
Start Time:       Wed, 18 Sep 2024 17:58:48 +0200
Labels:           app=mariadb
                  apps.kubernetes.io/pod-index=0
                  controller-revision-hash=mariadb-7595bc849
                  statefulset.kubernetes.io/pod-name=mariadb-0
Annotations:      <none>
Status:           Pending
IP:               
IPs:              <none>
Controlled By:    StatefulSet/mariadb
Containers:
  mariadb:
    Container ID:   
    Image:          mariadb
    Image ID:       
    Port:           3306/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:
      MARIADB_ROOT_PASSWORD:  <set to the key 'mariadb-root-password' in secret 'mariadb-secret'>  Optional: false
    Mounts:
      /var/lib/mysql/ from datadir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-tcdxw (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   False 
  Initialized                 True 
  Ready                       False 
  ContainersReady             False 
  PodScheduled                True 
Volumes:
  datadir:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  datadir-mariadb-0
    ReadOnly:   false
  kube-api-access-tcdxw:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                    From                     Message
  ----     ------                  ----                   ----                     -------
  Warning  FailedScheduling        12m                    default-scheduler        0/5 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/5 nodes are available: 5 Preemption is not helpful for scheduling.
  Normal   Scheduled               12m                    default-scheduler        Successfully assigned default/mariadb-0 to node02
  Normal   SuccessfulAttachVolume  12m                    attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-80ce90f2-1401-4ad5-a213-97b5221f99bd"
  Warning  FailedMount             9m55s                  kubelet                  MountVolume.MountDevice failed for volume "pvc-80ce90f2-1401-4ad5-a213-97b5221f99bd" : rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Warning  FailedMount             101s (x11 over 9m54s)  kubelet                  MountVolume.MountDevice failed for volume "pvc-80ce90f2-1401-4ad5-a213-97b5221f99bd" : rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-e159aef1-a240-4680-b557-50da7c676ec0 already exists

Kubelet logs from node02:

Sep 18 18:10:49 node02 kubelet[139431]: E0918 18:10:49.956810  139431 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/csi/rbd.csi.ceph.com^0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-c32d2900-5956-4b25-a9a7-8cb78c8e7f9a podName: nodeName:}" failed. No retries permitted until 2024-09-18 18:12:51.95677111 +0200 CEST m=+3319.080343137 (durationBeforeRetry 2m2s). Error: UnmountDevice failed for volume "pvc-405d8d73-f657-4109-b335-206e225082c4" (UniqueName: "kubernetes.io/csi/rbd.csi.ceph.com^0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-c32d2900-5956-4b25-a9a7-8cb78c8e7f9a") on node "node02" : kubernetes.io/csi: attacher.UnmountDevice failed: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-c32d2900-5956-4b25-a9a7-8cb78c8e7f9a already exists
Sep 18 18:11:14 node02 kubelet[139431]: I0918 18:11:14.720217  139431 reconciler_common.go:220] "operationExecutor.MountVolume started for volume \"pvc-80ce90f2-1401-4ad5-a213-97b5221f99bd\" (UniqueName: \"kubernetes.io/csi/rbd.csi.ceph.com^0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-e159aef1-a240-4680-b557-50da7c676ec0\") pod \"mariadb-0\" (UID: \"38620a05-30ee-4fde-9302-cbfbe56cce77\") " pod="default/mariadb-0"
Sep 18 18:11:14 node02 kubelet[139431]: I0918 18:11:14.720366  139431 operation_generator.go:622] "MountVolume.WaitForAttach entering for volume \"pvc-80ce90f2-1401-4ad5-a213-97b5221f99bd\" (UniqueName: \"kubernetes.io/csi/rbd.csi.ceph.com^0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-e159aef1-a240-4680-b557-50da7c676ec0\") pod \"mariadb-0\" (UID: \"38620a05-30ee-4fde-9302-cbfbe56cce77\") DevicePath \"csi-0c9c090f55d5970d212b95bd0fd652bbe925e4de7bce71d55e3f3fa42a5f96bb\"" pod="default/mariadb-0"
Sep 18 18:11:14 node02 kubelet[139431]: I0918 18:11:14.723657  139431 operation_generator.go:632] "MountVolume.WaitForAttach succeeded for volume \"pvc-80ce90f2-1401-4ad5-a213-97b5221f99bd\" (UniqueName: \"kubernetes.io/csi/rbd.csi.ceph.com^0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-e159aef1-a240-4680-b557-50da7c676ec0\") pod \"mariadb-0\" (UID: \"38620a05-30ee-4fde-9302-cbfbe56cce77\") DevicePath \"csi-0c9c090f55d5970d212b95bd0fd652bbe925e4de7bce71d55e3f3fa42a5f96bb\"" pod="default/mariadb-0"
Sep 18 18:11:14 node02 kubelet[139431]: E0918 18:11:14.734334  139431 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/csi/rbd.csi.ceph.com^0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-e159aef1-a240-4680-b557-50da7c676ec0 podName: nodeName:}" failed. No retries permitted until 2024-09-18 18:13:16.734308848 +0200 CEST m=+3343.857880875 (durationBeforeRetry 2m2s). Error: MountVolume.MountDevice failed for volume "pvc-80ce90f2-1401-4ad5-a213-97b5221f99bd" (UniqueName: "kubernetes.io/csi/rbd.csi.ceph.com^0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-e159aef1-a240-4680-b557-50da7c676ec0") pod "mariadb-0" (UID: "38620a05-30ee-4fde-9302-cbfbe56cce77") : rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-e159aef1-a240-4680-b557-50da7c676ec0 already exists
Sep 18 18:11:32 node02 kubelet[139431]: I0918 18:11:32.977910  139431 kubelet_getters.go:218] "Pod status updated" pod="kube-system/kube-controller-manager-node02" status="Running"
Sep 18 18:11:32 node02 kubelet[139431]: I0918 18:11:32.977985  139431 kubelet_getters.go:218] "Pod status updated" pod="kube-system/kube-scheduler-node02" status="Running"
Sep 18 18:11:32 node02 kubelet[139431]: I0918 18:11:32.977999  139431 kubelet_getters.go:218] "Pod status updated" pod="kube-system/kube-apiserver-node02" status="Running"
Sep 18 18:12:10 node02 kubelet[139431]: E0918 18:12:10.962928  139431 pod_workers.go:1298] "Error syncing pod, skipping" err="unmounted volumes=[datadir], unattached volumes=[], failed to process volumes=[]: context deadline exceeded" pod="default/mariadb-0" podUID="38620a05-30ee-4fde-9302-cbfbe56cce77"
Sep 18 18:12:23 node02 kubelet[139431]: I0918 18:12:23.962381  139431 util.go:30] "No sandbox for pod can be found. Need to start a new one" pod="default/mariadb-0"
Sep 18 18:12:32 node02 kubelet[139431]: I0918 18:12:32.978313  139431 kubelet_getters.go:218] "Pod status updated" pod="kube-system/kube-scheduler-node02" status="Running"
Sep 18 18:12:32 node02 kubelet[139431]: I0918 18:12:32.978376  139431 kubelet_getters.go:218] "Pod status updated" pod="kube-system/kube-controller-manager-node02" status="Running"
Sep 18 18:12:32 node02 kubelet[139431]: I0918 18:12:32.978394  139431 kubelet_getters.go:218] "Pod status updated" pod="kube-system/kube-apiserver-node02" status="Running"
Sep 18 18:12:52 node02 kubelet[139431]: I0918 18:12:52.056991  139431 reconciler_common.go:282] "operationExecutor.UnmountDevice started for volume \"pvc-405d8d73-f657-4109-b335-206e225082c4\" (UniqueName: \"kubernetes.io/csi/rbd.csi.ceph.com^0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-c32d2900-5956-4b25-a9a7-8cb78c8e7f9a\") on node \"node02\" "
Sep 18 18:12:52 node02 kubelet[139431]: E0918 18:12:52.060694  139431 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/csi/rbd.csi.ceph.com^0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-c32d2900-5956-4b25-a9a7-8cb78c8e7f9a podName: nodeName:}" failed. No retries permitted until 2024-09-18 18:14:54.060670256 +0200 CEST m=+3441.184242273 (durationBeforeRetry 2m2s). Error: UnmountDevice failed for volume "pvc-405d8d73-f657-4109-b335-206e225082c4" (UniqueName: "kubernetes.io/csi/rbd.csi.ceph.com^0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-c32d2900-5956-4b25-a9a7-8cb78c8e7f9a") on node "node02" : kubernetes.io/csi: attacher.UnmountDevice failed: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0024-ec49375b-20d9-4792-8a51-c9fbcec206d6-000000000000001e-c32d2900-5956-4b25-a9a7-8cb78c8e7f9a already exists
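The kubelet is also unable to finish UnmountDevice for the previous volume (pvc-405d8d73..., csi-vol-c32d2900...) on the same node. A quick check on node02 for leftover staged mounts from that volume (plain mount/findmnt commands, nothing ceph-csi specific):

# anything still mapped/mounted under the CSI staging paths?
findmnt | grep rbd.csi.ceph.com
mount | grep rbd
ls /var/lib/kubelet/plugins/kubernetes.io/csi/rbd.csi.ceph.com/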

Exec of an rbd test from within the provisioner pod:

kubectl exec csi-rbdplugin-provisioner-6c57fbb44-48kj9 -- rbd ls --id kubernetes --key xxxxxxxxxxxxxxxxxxxxxx -m=10.10.10.60:6789 --pool k8s
csi-vol-e159aef1-a240-4680-b557-50da7c676ec0

The same command from within the csi-rbdplugin daemon pods doesn't return anything.
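For comparison, the equivalent invocation against the node plugin pod on node02; the namespace and the container name csi-rbdplugin are assumptions, and timeout/--debug-ms are additions so the command gives up and shows where the client stalls instead of hanging silently:

kubectl -n ceph-csi-rbd exec <csi-rbdplugin-pod-on-node02> -c csi-rbdplugin -- \
  timeout 30 rbd ls --id kubernetes --key xxxxxxxxxxxxxxxxxxxxxx -m 10.10.10.60:6789 --pool k8s --debug-ms 1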

TCP connection test from within the csi-rbdplugin daemon pods:

[root@node01 /]# cat < /dev/tcp/10.10.10.60/6789
ceph v027▒�


<�


�^C
[root@node01 /]# cat < /dev/tcp/10.10.10.60/3300
ceph v2

TCP connection test from within the csi-rbdplugin-provisioner pods:

[root@csi-rbdplugin-provisioner-6c57fbb44-48kj9 /]# cat < /dev/tcp/10.10.10.60/6789
ceph v027▒�


<�&


�^C
[root@csi-rbdplugin-provisioner-6c57fbb44-48kj9 /]# cat < /dev/tcp/10.10.10.60/3300
ceph v2
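Both tests only exercise the monitor ports (6789 answers with the legacy "ceph v027" banner, 3300 with the msgr2 "ceph v2" banner), while rbd ls also has to read from the OSDs, so OSD reachability from the worker nodes could still be the missing piece. A rough way to check, with addresses/ports as placeholders to be taken from ceph osd dump:

# on a mon/admin host: list OSD addresses
ceph osd dump | grep '^osd\.'
# from inside a csi-rbdplugin pod: use a real OSD address/port (usually in the 6800-7300 range)
cat < /dev/tcp/<osd-ip>/<osd-port>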

As it stands, I can't find anything that would cause these issues. Networking should not be a problem either: no traffic is filtered, and the Kubernetes cluster is connected to the cluster network of Ceph, using a user with the following caps:
caps mgr = "profile rbd pool=k8s"
caps mon = "profile rbd"
caps osd = "profile rbd pool=k8s"
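For completeness, the caps as stored on the cluster can be double-checked with the following (client.kubernetes matches the --id used in the rbd test above):

ceph auth get client.kubernetes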

Madhu-1 (Collaborator) commented Sep 20, 2024

> Same Command from within the Daemons doesn't return anything.

This looks to be the issue. Can you please check why this command is not working from the plugin container on the node?
