kubevirt VM pod stuck in terminating state #4417
What's the cgroups/CPU-manager-related config that you provide to the other k8s distros? I suspect there are differences from the config used by k0s due to the aforementioned #4319. One way to pin this down to these settings would be to copy k0s' settings to the other distros and see if they start failing as well. |
Not sure if this answers your question, but I'm passing these kubelet args to k0s, and the same/similar ones to k3s + rke2 (where I was testing previously):
By "k0s' settings" are you referring to cgroups? I'm not too familiar with how to check/set those - could you provide some specific instruction? |
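(For readers following along who are also unfamiliar with checking these: a generic sketch for inspecting a node's cgroup setup. The kubelet config path assumes k0s's default data dir, as shown in the dumps below.)

```shell
# Which cgroup hierarchy is mounted: "cgroup2fs" means the unified v2
# hierarchy, "tmpfs" means the legacy v1 layout.
stat -fc %T /sys/fs/cgroup/

# kubelet's cgroup driver, if set explicitly in its config file; when the
# key is absent, kubelet defaults to the cgroupfs driver.
grep -i cgroupdriver /var/lib/k0s/kubelet-config.yaml 2>/dev/null \
  || echo "cgroupDriver not set explicitly (kubelet default: cgroupfs)"
```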
I've started a k0s worker with the flags you provided. This is how k0s starts kubelet and containerd:

k0s-4417-worker-0:~$ xargs -n1 -0 < /proc/$(pidof k0s)/cmdline
/usr/local/bin/k0s
worker
--data-dir=/var/lib/k0s
--kubelet-extra-args=--cpu-manager-policy=static --kube-reserved=cpu=1000m,memory=2000Mi --system-reserved=cpu=500m,memory=1000Mi --memory-manager-policy=Static --topology-manager-policy=restricted --topology-manager-scope=pod --reserved-memory=0:memory=1550Mi;1:memory=1550Mi
--token-file=/etc/k0s/k0stoken
k0s-4417-worker-0:~$ sudo /usr/local/bin/k0s version
v1.30.0+k0s.0
k0s-4417-worker-0:~$ xargs -n1 -0 < /proc/$(pidof kubelet)/cmdline
/var/lib/k0s/bin/kubelet
--root-dir=/var/lib/k0s/kubelet
--cpu-manager-policy=static
--kube-reserved=cpu=1000m,memory=2000Mi
--system-reserved=cpu=500m,memory=1000Mi
--reserved-memory=0:memory=1550Mi;1:memory=1550Mi
--config=/var/lib/k0s/kubelet-config.yaml
--kubeconfig=/var/lib/k0s/kubelet.conf
--containerd=/run/k0s/containerd.sock
--memory-manager-policy=Static
--v=1
--topology-manager-policy=restricted
--topology-manager-scope=pod
--runtime-cgroups=/system.slice/containerd.service
--cert-dir=/var/lib/k0s/kubelet/pki
k0s-4417-worker-0:~$ xargs -n1 -0 < /proc/$(pidof containerd)/cmdline
/var/lib/k0s/bin/containerd
--root=/var/lib/k0s/containerd
--state=/run/k0s/containerd
--address=/run/k0s/containerd.sock
--log-level=info
--config=/etc/k0s/containerd.toml

/var/lib/k0s/kubelet-config.yaml:
apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
anonymous: {}
webhook:
cacheTTL: 0s
x509:
clientCAFile: /var/lib/k0s/pki/ca.crt
authorization:
webhook:
cacheAuthorizedTTL: 0s
cacheUnauthorizedTTL: 0s
cgroupsPerQOS: true
clusterDNS:
- 10.96.0.10
clusterDomain: cluster.local
containerRuntimeEndpoint: unix:///run/k0s/containerd.sock
cpuManagerReconcilePeriod: 0s
eventRecordQPS: 0
evictionPressureTransitionPeriod: 0s
failSwapOn: false
fileCheckFrequency: 0s
httpCheckFrequency: 0s
imageMaximumGCAge: 0s
imageMinimumGCAge: 0s
kind: KubeletConfiguration
kubeReservedCgroup: system.slice
kubeletCgroups: /system.slice/containerd.service
logging:
flushFrequency: 0
options:
json:
infoBufferSize: "0"
text:
infoBufferSize: "0"
verbosity: 0
memorySwap: {}
nodeStatusReportFrequency: 0s
nodeStatusUpdateFrequency: 0s
resolvConf: /etc/resolv.conf
rotateCertificates: true
runtimeRequestTimeout: 0s
serverTLSBootstrap: true
shutdownGracePeriod: 0s
shutdownGracePeriodCriticalPods: 0s
streamingConnectionIdleTimeout: 0s
syncFrequency: 0s
tlsCipherSuites:
- TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
- TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
- TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256
- TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
- TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
- TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
tlsMinVersion: VersionTLS12
volumePluginDir: /usr/libexec/k0s/kubelet-plugins/volume/exec
volumeStatsAggPeriod: 0s

/etc/k0s/containerd.toml:
# k0s_managed=true
# This is a placeholder configuration for k0s managed containerd.
# If you wish to override the config, remove the first line and replace this file with your custom configuration.
# For reference see https://github.com/containerd/containerd/blob/main/docs/man/containerd-config.toml.5.md
version = 2
imports = [
"/run/k0s/containerd-cri.toml",
]

/run/k0s/containerd-cri.toml:
Note that the below file should reflect the containerd default settings, modulo Version = 2
[plugins]
[plugins."io.containerd.grpc.v1.cri"]
cdi_spec_dirs = ["/etc/cdi", "/var/run/cdi"]
device_ownership_from_security_context = false
disable_apparmor = false
disable_cgroup = false
disable_hugetlb_controller = true
disable_proc_mount = false
disable_tcp_service = true
drain_exec_sync_io_timeout = "0s"
enable_cdi = false
enable_selinux = false
enable_tls_streaming = false
enable_unprivileged_icmp = false
enable_unprivileged_ports = false
ignore_deprecation_warnings = []
ignore_image_defined_volumes = false
image_pull_progress_timeout = "5m0s"
image_pull_with_sync_fs = false
max_concurrent_downloads = 3
max_container_log_line_size = 16384
netns_mounts_under_state_dir = false
restrict_oom_score_adj = false
sandbox_image = "registry.k8s.io/pause:3.9"
selinux_category_range = 1024
stats_collect_period = 10
stream_idle_timeout = "4h0m0s"
stream_server_address = "127.0.0.1"
stream_server_port = "0"
systemd_cgroup = false
tolerate_missing_hugetlb_controller = true
unset_seccomp_profile = ""
[plugins."io.containerd.grpc.v1.cri".cni]
bin_dir = "/opt/cni/bin"
conf_dir = "/etc/cni/net.d"
conf_template = ""
ip_pref = ""
max_conf_num = 1
setup_serially = false
[plugins."io.containerd.grpc.v1.cri".containerd]
default_runtime_name = "runc"
disable_snapshot_annotations = true
discard_unpacked_layers = false
ignore_blockio_not_enabled_errors = false
ignore_rdt_not_enabled_errors = false
no_pivot = false
snapshotter = "overlayfs"
[plugins."io.containerd.grpc.v1.cri".containerd.default_runtime]
base_runtime_spec = ""
cni_conf_dir = ""
cni_max_conf_num = 0
container_annotations = []
pod_annotations = []
privileged_without_host_devices = false
privileged_without_host_devices_all_devices_allowed = false
runtime_engine = ""
runtime_path = ""
runtime_root = ""
runtime_type = ""
sandbox_mode = ""
snapshotter = ""
[plugins."io.containerd.grpc.v1.cri".containerd.default_runtime.options]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
base_runtime_spec = ""
cni_conf_dir = ""
cni_max_conf_num = 0
container_annotations = []
pod_annotations = []
privileged_without_host_devices = false
privileged_without_host_devices_all_devices_allowed = false
runtime_engine = ""
runtime_path = ""
runtime_root = ""
runtime_type = "io.containerd.runc.v2"
sandbox_mode = "podsandbox"
snapshotter = ""
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
BinaryName = ""
CriuImagePath = ""
CriuPath = ""
CriuWorkPath = ""
IoGid = 0
IoUid = 0
NoNewKeyring = false
NoPivotRoot = false
Root = ""
ShimCgroup = ""
SystemdCgroup = false
[plugins."io.containerd.grpc.v1.cri".containerd.untrusted_workload_runtime]
base_runtime_spec = ""
cni_conf_dir = ""
cni_max_conf_num = 0
container_annotations = []
pod_annotations = []
privileged_without_host_devices = false
privileged_without_host_devices_all_devices_allowed = false
runtime_engine = ""
runtime_path = ""
runtime_root = ""
runtime_type = ""
sandbox_mode = ""
snapshotter = ""
[plugins."io.containerd.grpc.v1.cri".containerd.untrusted_workload_runtime.options]
[plugins."io.containerd.grpc.v1.cri".image_decryption]
key_model = "node"
[plugins."io.containerd.grpc.v1.cri".registry]
config_path = ""
[plugins."io.containerd.grpc.v1.cri".registry.auths]
[plugins."io.containerd.grpc.v1.cri".registry.configs]
[plugins."io.containerd.grpc.v1.cri".registry.headers]
[plugins."io.containerd.grpc.v1.cri".registry.mirrors]
[plugins."io.containerd.grpc.v1.cri".x509_key_pair_streaming]
tls_cert_file = ""
tls_key_file = ""

You might want to compare these settings to the other distros. If you aren't using NLLB in your cluster, you can also stop the k0s worker and then start containerd and kubelet manually with the above flags. Then you can experiment to see which settings make it behave badly. The hardcoded, non-overridable settings in k0s are |
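As a side note on the flags above: with the static Memory Manager policy, kubelet requires the per-NUMA `--reserved-memory` values to sum to kube-reserved + system-reserved + the hard eviction threshold for memory (100Mi by default). A quick sanity check of the k0s numbers shown above (a sketch):

```shell
# Verify the Memory Manager reservation math for the flags shown above:
# sum(--reserved-memory per NUMA node) must equal
# kube-reserved + system-reserved + hard eviction threshold for memory.
kube_reserved=2000     # --kube-reserved=...,memory=2000Mi
system_reserved=1000   # --system-reserved=...,memory=1000Mi
eviction_hard=100      # kubelet default for memory.available
reserved=$((1550 + 1550))   # --reserved-memory=0:memory=1550Mi;1:memory=1550Mi
expected=$((kube_reserved + system_reserved + eviction_hard))
echo "reserved=${reserved}Mi expected=${expected}Mi"
```

Here both come out to 3100Mi, so the k0s flags at least satisfy kubelet's own consistency check.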
Thanks for the example commands. I've compared your output against my k0s worker and it seems to match up; I couldn't see any differences that look relevant to the issue. I've also run the same/similar commands on my other host, which is running rke2 (single-node controller/worker):

[root@bne-lab-vr-2 ~]# rke2 -v
rke2 version v1.29.3+rke2r1 (1c82f7ed292c4ac172692bb82b13d20733909804)
go version go1.21.8 X:boringcrypto
[root@bne-lab-vr-2 ~]# cat /etc/rancher/rke2/config.yaml
cni:
- multus
- canal
kubelet-arg:
- "cpu-manager-policy=static"
- "kube-reserved=cpu=1000m,memory=2000Mi"
- "system-reserved=cpu=500m,memory=1000Mi"
- "memory-manager-policy=Static"
- "topology-manager-policy=restricted"
- "topology-manager-scope=pod"
- "reserved-memory=0:memory=1500Mi;1:memory=1500Mi"
disable:
- rke2-snapshot-controller
- rke2-snapshot-controller-crd
- rke2-snapshot-validation-webhook
[root@bne-lab-vr-2 ~]# xargs -n1 -0 < /proc/$(pidof kubelet)/cmdline
kubelet
--volume-plugin-dir=/var/lib/kubelet/volumeplugins
--file-check-frequency=5s
--sync-frequency=30s
--address=0.0.0.0
--anonymous-auth=false
--authentication-token-webhook=true
--authorization-mode=Webhook
--cgroup-driver=systemd
--client-ca-file=/var/lib/rancher/rke2/agent/client-ca.crt
--cloud-provider=external
--cluster-dns=10.43.0.10
--cluster-domain=cluster.local
--container-runtime-endpoint=unix:///run/k3s/containerd/containerd.sock
--containerd=/run/k3s/containerd/containerd.sock
--cpu-manager-policy=static
--eviction-hard=imagefs.available<5%,nodefs.available<5%
--eviction-minimum-reclaim=imagefs.available=10%,nodefs.available=10%
--fail-swap-on=false
--feature-gates=CloudDualStackNodeIPs=true
--healthz-bind-address=127.0.0.1
--hostname-override=bne-lab-vr-2.i.megaport.com
--kube-reserved=cpu=1000m,memory=2000Mi
--kubeconfig=/var/lib/rancher/rke2/agent/kubelet.kubeconfig
--memory-manager-policy=Static
--node-ip=10.8.55.32
--node-labels=
--pod-infra-container-image=index.docker.io/rancher/mirrored-pause:3.6
--pod-manifest-path=/var/lib/rancher/rke2/agent/pod-manifests
--read-only-port=0
--reserved-memory=0:memory=1500Mi;1:memory=1500Mi
--resolv-conf=/etc/resolv.conf
--serialize-image-pulls=false
--system-reserved=cpu=500m,memory=1000Mi
--tls-cert-file=/var/lib/rancher/rke2/agent/serving-kubelet.crt
--tls-private-key-file=/var/lib/rancher/rke2/agent/serving-kubelet.key
--topology-manager-policy=restricted
--topology-manager-scope=pod
[root@bne-lab-vr-2 ~]# xargs -n1 -0 < /proc/$(pidof containerd)/cmdline
containerd
-c
/var/lib/rancher/rke2/agent/etc/containerd/config.toml
-a
/run/k3s/containerd/containerd.sock
--state
/run/k3s/containerd
--root
/var/lib/rancher/rke2/agent/containerd

/var/lib/rancher/rke2/agent/etc/containerd/config.toml:
# File generated by rke2. DO NOT EDIT. Use config.toml.tmpl instead.
version = 2
[plugins."io.containerd.internal.v1.opt"]
path = "/var/lib/rancher/rke2/agent/containerd"
[plugins."io.containerd.grpc.v1.cri"]
stream_server_address = "127.0.0.1"
stream_server_port = "10010"
enable_selinux = true
enable_unprivileged_ports = true
enable_unprivileged_icmp = true
sandbox_image = "index.docker.io/rancher/mirrored-pause:3.6"
[plugins."io.containerd.grpc.v1.cri".containerd]
snapshotter = "overlayfs"
disable_snapshot_annotations = true
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true
[plugins."io.containerd.grpc.v1.cri".registry]
config_path = "/var/lib/rancher/rke2/agent/etc/containerd/certs.d"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes."crun"]
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes."crun".options]
BinaryName = "/usr/bin/crun"
SystemdCgroup = true

Something else that I thought may be relevant is to compare the systemd service file for each:

/etc/systemd/system/k0sworker.service:
[Unit]
Description=k0s - Zero Friction Kubernetes
Documentation=https://docs.k0sproject.io
ConditionFileIsExecutable=/usr/local/bin/k0s
After=network-online.target
Wants=network-online.target
[Service]
StartLimitInterval=5
StartLimitBurst=10
ExecStart=/usr/local/bin/k0s worker --data-dir=/var/lib/k0s --kubelet-extra-args='--cpu-manager-policy=static\x20--kube-reserved=cpu=1000m,memory=2000Mi\x20--system-reserved=cpu=500m,memory=1000Mi\x20--memory-manager-policy=Static\x20--topology-manager-policy=restricted\x20--topology-manager-scope=pod\x20--reserved-memory=0:memory=1550Mi;1:memory=1550Mi\x20--node-ip=192.168.1.10' --labels=feature.node.kubernetes.io/network-sriov.capable=true,openebs.io/zfs=true --token-file=/etc/k0s/k0stoken
RestartSec=120
Delegate=yes
KillMode=process
LimitCORE=infinity
TasksMax=infinity
TimeoutStartSec=0
LimitNOFILE=999999
Restart=always
[Install]
WantedBy=multi-user.target

/usr/lib/systemd/system/rke2-server.service:
[Unit]
Description=Rancher Kubernetes Engine v2 (server)
Documentation=https://github.com/rancher/rke2#readme
Wants=network-online.target
After=network-online.target
Conflicts=rke2-agent.service
[Install]
WantedBy=multi-user.target
[Service]
Type=notify
EnvironmentFile=-/etc/default/%N
EnvironmentFile=-/etc/sysconfig/%N
EnvironmentFile=-/usr/lib/systemd/system/%N.env
KillMode=process
Delegate=yes
LimitNOFILE=1048576
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity
TimeoutStartSec=0
Restart=always
RestartSec=5s
ExecStartPre=/bin/sh -xc '! /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service'
ExecStartPre=-/sbin/modprobe br_netfilter
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/bin/rke2 server
ExecStopPost=-/bin/sh -c "systemd-cgls /system.slice/%n | grep -Eo '[0-9]+ (containerd|kubelet)' | awk '{print $1}' | xargs -r kill"
This is getting beyond my experience/capability 😮 |
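For what it's worth, one visible difference between the two dumps above is the cgroup driver: rke2 passes `--cgroup-driver=systemd` to kubelet and sets `SystemdCgroup = true` in containerd, while the k0s configs show `SystemdCgroup = false` (i.e. the cgroupfs driver). A sketch for grepping this on each node (paths from the outputs above; the `|| true` guards just keep the sketch from aborting on a missing path):

```shell
# On the k0s node: containerd runc options and the kubelet config file.
grep -ri systemdcgroup /run/k0s/containerd-cri.toml 2>/dev/null || true
grep -i cgroupdriver /var/lib/k0s/kubelet-config.yaml 2>/dev/null || true

# On the rke2 node: containerd config and the live kubelet command line.
grep -ri systemdcgroup /var/lib/rancher/rke2/agent/etc/containerd/config.toml 2>/dev/null || true
tr '\0' '\n' < /proc/"$(pidof kubelet)"/cmdline 2>/dev/null | grep -- --cgroup-driver || true
```

Whether this difference is related to the stuck pod is an open question, but it is one of the few cgroup-relevant settings that differ between the two setups.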
I've tried the latest k0s |
The issue is marked as stale since no activity has been recorded in 30 days |
Maybe there will be time to sort this out at some point... |
Before creating an issue, make sure you've checked the following:
Platform
Version
v1.30.0+k0s.0
Sysinfo
k0s sysinfo
What happened?
I'm using Kubevirt to run VMs with k0s. I've found that certain Kubevirt configuration causes the 'virt-launcher' pod to fail to terminate - it gets stuck in 'Terminating' state e.g.
By trial and error, I've discovered that the specific configuration that causes this problem is the KubeVirt feature
isolateEmulatorThread
documented here: https://kubevirt.io/user-guide/virtual_machines/dedicated_cpu_resources/#requesting-dedicated-cpu-for-qemu-emulator
When I set
isolateEmulatorThread: true
the problem occurs. Note: this setting is used in conjunction with
dedicatedCpuPlacement: true
however, if I specify only the latter, the issue does not occur. This problem only seems to happen with k0s (I've tested against other k8s distros and have not seen this problem).
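For illustration only (this is not the reporter's manifest, which is attached below as VM.yaml), the relevant stanza of a KubeVirt VirtualMachine spec looks roughly like this; names and sizes are made up:

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: example-vm            # hypothetical name
spec:
  running: true
  template:
    spec:
      domain:
        cpu:
          cores: 2
          dedicatedCpuPlacement: true   # alone: VM terminates fine
          isolateEmulatorThread: true   # combined with the above: virt-launcher
                                        # pod gets stuck in Terminating on k0s
        memory:
          guest: 2Gi
        devices: {}
```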
Steps to reproduce
isolateEmulatorThread: true
VirtualMachine
k8s resource

Expected behavior
The VM should terminate and all associated resources should be removed from k8s.
Actual behavior
No response
Screenshots and logs
k0scontroller log output: https://gist.github.com/ianb-mp/588ef41ec05e695bc183c61726257278#file-k0scontroller-log
k0sworker log output: https://gist.github.com/ianb-mp/588ef41ec05e695bc183c61726257278#file-k0sworker-log
Here is an example minimal VM manifest to reproduce the issue:
VM.yaml
Additional context
This may be related to #4319 as it involves k8s CPU Manager
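Since the CPU Manager is implicated, the kubelet resource-manager checkpoint files on the worker may be worth capturing while a pod is stuck (a sketch; the path assumes k0s's --root-dir=/var/lib/k0s/kubelet shown earlier):

```shell
# Dump the kubelet CPU/Memory Manager checkpoint files. k0s keeps kubelet
# state under /var/lib/k0s/kubelet; the loop tolerates missing files.
for f in cpu_manager_state memory_manager_state; do
  echo "== $f =="
  cat "/var/lib/k0s/kubelet/$f" 2>/dev/null || echo "(not found on this host)"
done
```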