kubevirt VM pod stuck in terminating state #4417
What's the cgroups/CPU-manager-related config that you provide to the other k8s distros? I suspect there are differences from the config used by k0s due to the aforementioned #4319. One way to pin this down to these settings would be to copy k0s' settings to the other distros and see if they start failing as well. |
Not sure if this answers your question, but I'm passing these kubelet args to k0s, and the same/similar ones to k3s + rke2 (where I was testing previously):
By "k0s' settings" are you referring to cgroups? I'm not too familiar with how to check/set those - could you provide some specific instruction? |
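(For readers following along who are also unfamiliar with checking these: a generic sketch for inspecting a node's cgroup setup. The kubelet config path assumes k0s's default data dir, as shown in the dumps below.)

```shell
# Which cgroup hierarchy is mounted: "cgroup2fs" means the unified v2
# hierarchy, "tmpfs" means the legacy v1 layout.
stat -fc %T /sys/fs/cgroup/

# kubelet's cgroup driver, if set explicitly in its config file; when the
# key is absent, kubelet defaults to the cgroupfs driver.
grep -i cgroupdriver /var/lib/k0s/kubelet-config.yaml 2>/dev/null \
  || echo "cgroupDriver not set explicitly (kubelet default: cgroupfs)"
```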
I've started a k0s worker with the flags you provided. This is how k0s starts kubelet and containerd:

k0s-4417-worker-0:~$ xargs -n1 -0 < /proc/$(pidof k0s)/cmdline
/usr/local/bin/k0s
worker
--data-dir=/var/lib/k0s
--kubelet-extra-args=--cpu-manager-policy=static --kube-reserved=cpu=1000m,memory=2000Mi --system-reserved=cpu=500m,memory=1000Mi --memory-manager-policy=Static --topology-manager-policy=restricted --topology-manager-scope=pod --reserved-memory=0:memory=1550Mi;1:memory=1550Mi
--token-file=/etc/k0s/k0stoken
k0s-4417-worker-0:~$ sudo /usr/local/bin/k0s version
v1.30.0+k0s.0
k0s-4417-worker-0:~$ xargs -n1 -0 < /proc/$(pidof kubelet)/cmdline
/var/lib/k0s/bin/kubelet
--root-dir=/var/lib/k0s/kubelet
--cpu-manager-policy=static
--kube-reserved=cpu=1000m,memory=2000Mi
--system-reserved=cpu=500m,memory=1000Mi
--reserved-memory=0:memory=1550Mi;1:memory=1550Mi
--config=/var/lib/k0s/kubelet-config.yaml
--kubeconfig=/var/lib/k0s/kubelet.conf
--containerd=/run/k0s/containerd.sock
--memory-manager-policy=Static
--v=1
--topology-manager-policy=restricted
--topology-manager-scope=pod
--runtime-cgroups=/system.slice/containerd.service
--cert-dir=/var/lib/k0s/kubelet/pki
k0s-4417-worker-0:~$ xargs -n1 -0 < /proc/$(pidof containerd)/cmdline
/var/lib/k0s/bin/containerd
--root=/var/lib/k0s/containerd
--state=/run/k0s/containerd
--address=/run/k0s/containerd.sock
--log-level=info
--config=/etc/k0s/containerd.toml

/var/lib/k0s/kubelet-config.yaml:
apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
anonymous: {}
webhook:
cacheTTL: 0s
x509:
clientCAFile: /var/lib/k0s/pki/ca.crt
authorization:
webhook:
cacheAuthorizedTTL: 0s
cacheUnauthorizedTTL: 0s
cgroupsPerQOS: true
clusterDNS:
- 10.96.0.10
clusterDomain: cluster.local
containerRuntimeEndpoint: unix:///run/k0s/containerd.sock
cpuManagerReconcilePeriod: 0s
eventRecordQPS: 0
evictionPressureTransitionPeriod: 0s
failSwapOn: false
fileCheckFrequency: 0s
httpCheckFrequency: 0s
imageMaximumGCAge: 0s
imageMinimumGCAge: 0s
kind: KubeletConfiguration
kubeReservedCgroup: system.slice
kubeletCgroups: /system.slice/containerd.service
logging:
flushFrequency: 0
options:
json:
infoBufferSize: "0"
text:
infoBufferSize: "0"
verbosity: 0
memorySwap: {}
nodeStatusReportFrequency: 0s
nodeStatusUpdateFrequency: 0s
resolvConf: /etc/resolv.conf
rotateCertificates: true
runtimeRequestTimeout: 0s
serverTLSBootstrap: true
shutdownGracePeriod: 0s
shutdownGracePeriodCriticalPods: 0s
streamingConnectionIdleTimeout: 0s
syncFrequency: 0s
tlsCipherSuites:
- TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
- TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
- TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256
- TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
- TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
- TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
tlsMinVersion: VersionTLS12
volumePluginDir: /usr/libexec/k0s/kubelet-plugins/volume/exec
volumeStatsAggPeriod: 0s

/etc/k0s/containerd.toml:
# k0s_managed=true
# This is a placeholder configuration for k0s managed containerd.
# If you wish to override the config, remove the first line and replace this file with your custom configuration.
# For reference see https://github.com/containerd/containerd/blob/main/docs/man/containerd-config.toml.5.md
version = 2
imports = [
"/run/k0s/containerd-cri.toml",
]

/run/k0s/containerd-cri.toml:
Note that the below file should reflect the containerd default settings, modulo Version = 2
[plugins]
[plugins."io.containerd.grpc.v1.cri"]
cdi_spec_dirs = ["/etc/cdi", "/var/run/cdi"]
device_ownership_from_security_context = false
disable_apparmor = false
disable_cgroup = false
disable_hugetlb_controller = true
disable_proc_mount = false
disable_tcp_service = true
drain_exec_sync_io_timeout = "0s"
enable_cdi = false
enable_selinux = false
enable_tls_streaming = false
enable_unprivileged_icmp = false
enable_unprivileged_ports = false
ignore_deprecation_warnings = []
ignore_image_defined_volumes = false
image_pull_progress_timeout = "5m0s"
image_pull_with_sync_fs = false
max_concurrent_downloads = 3
max_container_log_line_size = 16384
netns_mounts_under_state_dir = false
restrict_oom_score_adj = false
sandbox_image = "registry.k8s.io/pause:3.9"
selinux_category_range = 1024
stats_collect_period = 10
stream_idle_timeout = "4h0m0s"
stream_server_address = "127.0.0.1"
stream_server_port = "0"
systemd_cgroup = false
tolerate_missing_hugetlb_controller = true
unset_seccomp_profile = ""
[plugins."io.containerd.grpc.v1.cri".cni]
bin_dir = "/opt/cni/bin"
conf_dir = "/etc/cni/net.d"
conf_template = ""
ip_pref = ""
max_conf_num = 1
setup_serially = false
[plugins."io.containerd.grpc.v1.cri".containerd]
default_runtime_name = "runc"
disable_snapshot_annotations = true
discard_unpacked_layers = false
ignore_blockio_not_enabled_errors = false
ignore_rdt_not_enabled_errors = false
no_pivot = false
snapshotter = "overlayfs"
[plugins."io.containerd.grpc.v1.cri".containerd.default_runtime]
base_runtime_spec = ""
cni_conf_dir = ""
cni_max_conf_num = 0
container_annotations = []
pod_annotations = []
privileged_without_host_devices = false
privileged_without_host_devices_all_devices_allowed = false
runtime_engine = ""
runtime_path = ""
runtime_root = ""
runtime_type = ""
sandbox_mode = ""
snapshotter = ""
[plugins."io.containerd.grpc.v1.cri".containerd.default_runtime.options]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
base_runtime_spec = ""
cni_conf_dir = ""
cni_max_conf_num = 0
container_annotations = []
pod_annotations = []
privileged_without_host_devices = false
privileged_without_host_devices_all_devices_allowed = false
runtime_engine = ""
runtime_path = ""
runtime_root = ""
runtime_type = "io.containerd.runc.v2"
sandbox_mode = "podsandbox"
snapshotter = ""
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
BinaryName = ""
CriuImagePath = ""
CriuPath = ""
CriuWorkPath = ""
IoGid = 0
IoUid = 0
NoNewKeyring = false
NoPivotRoot = false
Root = ""
ShimCgroup = ""
SystemdCgroup = false
[plugins."io.containerd.grpc.v1.cri".containerd.untrusted_workload_runtime]
base_runtime_spec = ""
cni_conf_dir = ""
cni_max_conf_num = 0
container_annotations = []
pod_annotations = []
privileged_without_host_devices = false
privileged_without_host_devices_all_devices_allowed = false
runtime_engine = ""
runtime_path = ""
runtime_root = ""
runtime_type = ""
sandbox_mode = ""
snapshotter = ""
[plugins."io.containerd.grpc.v1.cri".containerd.untrusted_workload_runtime.options]
[plugins."io.containerd.grpc.v1.cri".image_decryption]
key_model = "node"
[plugins."io.containerd.grpc.v1.cri".registry]
config_path = ""
[plugins."io.containerd.grpc.v1.cri".registry.auths]
[plugins."io.containerd.grpc.v1.cri".registry.configs]
[plugins."io.containerd.grpc.v1.cri".registry.headers]
[plugins."io.containerd.grpc.v1.cri".registry.mirrors]
[plugins."io.containerd.grpc.v1.cri".x509_key_pair_streaming]
tls_cert_file = ""
tls_key_file = ""

You might want to compare these settings to the other distros. If you aren't using NLLB in your cluster, you can also stop the k0s worker and then start containerd and kubelet manually with the above flags. Then you can experiment to see which settings make it behave badly. The hardcoded, non-overridable settings in k0s are |
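As a side note on the flags above: with the static Memory Manager policy, kubelet requires the per-NUMA `--reserved-memory` values to sum to kube-reserved + system-reserved + the hard eviction threshold for memory (100Mi by default). A quick sanity check of the k0s numbers shown above (a sketch):

```shell
# Verify the Memory Manager reservation math for the flags shown above:
# sum(--reserved-memory per NUMA node) must equal
# kube-reserved + system-reserved + hard eviction threshold for memory.
kube_reserved=2000     # --kube-reserved=...,memory=2000Mi
system_reserved=1000   # --system-reserved=...,memory=1000Mi
eviction_hard=100      # kubelet default for memory.available
reserved=$((1550 + 1550))   # --reserved-memory=0:memory=1550Mi;1:memory=1550Mi
expected=$((kube_reserved + system_reserved + eviction_hard))
echo "reserved=${reserved}Mi expected=${expected}Mi"
```

Here both come out to 3100Mi, so the k0s flags at least satisfy kubelet's own consistency check.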
Thanks for the example commands. I've compared your output against my k0s worker and it seems to match up; I couldn't see any differences that look relevant to the issue. I've also run the same/similar commands on my other host, which is running rke2 (single-node controller/worker):

[root@bne-lab-vr-2 ~]# rke2 -v
rke2 version v1.29.3+rke2r1 (1c82f7ed292c4ac172692bb82b13d20733909804)
go version go1.21.8 X:boringcrypto
[root@bne-lab-vr-2 ~]# cat /etc/rancher/rke2/config.yaml
cni:
- multus
- canal
kubelet-arg:
- "cpu-manager-policy=static"
- "kube-reserved=cpu=1000m,memory=2000Mi"
- "system-reserved=cpu=500m,memory=1000Mi"
- "memory-manager-policy=Static"
- "topology-manager-policy=restricted"
- "topology-manager-scope=pod"
- "reserved-memory=0:memory=1500Mi;1:memory=1500Mi"
disable:
- rke2-snapshot-controller
- rke2-snapshot-controller-crd
- rke2-snapshot-validation-webhook
[root@bne-lab-vr-2 ~]# xargs -n1 -0 < /proc/$(pidof kubelet)/cmdline
kubelet
--volume-plugin-dir=/var/lib/kubelet/volumeplugins
--file-check-frequency=5s
--sync-frequency=30s
--address=0.0.0.0
--anonymous-auth=false
--authentication-token-webhook=true
--authorization-mode=Webhook
--cgroup-driver=systemd
--client-ca-file=/var/lib/rancher/rke2/agent/client-ca.crt
--cloud-provider=external
--cluster-dns=10.43.0.10
--cluster-domain=cluster.local
--container-runtime-endpoint=unix:///run/k3s/containerd/containerd.sock
--containerd=/run/k3s/containerd/containerd.sock
--cpu-manager-policy=static
--eviction-hard=imagefs.available<5%,nodefs.available<5%
--eviction-minimum-reclaim=imagefs.available=10%,nodefs.available=10%
--fail-swap-on=false
--feature-gates=CloudDualStackNodeIPs=true
--healthz-bind-address=127.0.0.1
--hostname-override=bne-lab-vr-2.i.megaport.com
--kube-reserved=cpu=1000m,memory=2000Mi
--kubeconfig=/var/lib/rancher/rke2/agent/kubelet.kubeconfig
--memory-manager-policy=Static
--node-ip=10.8.55.32
--node-labels=
--pod-infra-container-image=index.docker.io/rancher/mirrored-pause:3.6
--pod-manifest-path=/var/lib/rancher/rke2/agent/pod-manifests
--read-only-port=0
--reserved-memory=0:memory=1500Mi;1:memory=1500Mi
--resolv-conf=/etc/resolv.conf
--serialize-image-pulls=false
--system-reserved=cpu=500m,memory=1000Mi
--tls-cert-file=/var/lib/rancher/rke2/agent/serving-kubelet.crt
--tls-private-key-file=/var/lib/rancher/rke2/agent/serving-kubelet.key
--topology-manager-policy=restricted
--topology-manager-scope=pod
[root@bne-lab-vr-2 ~]# xargs -n1 -0 < /proc/$(pidof containerd)/cmdline
containerd
-c
/var/lib/rancher/rke2/agent/etc/containerd/config.toml
-a
/run/k3s/containerd/containerd.sock
--state
/run/k3s/containerd
--root
/var/lib/rancher/rke2/agent/containerd

/var/lib/rancher/rke2/agent/etc/containerd/config.toml:
# File generated by rke2. DO NOT EDIT. Use config.toml.tmpl instead.
version = 2
[plugins."io.containerd.internal.v1.opt"]
path = "/var/lib/rancher/rke2/agent/containerd"
[plugins."io.containerd.grpc.v1.cri"]
stream_server_address = "127.0.0.1"
stream_server_port = "10010"
enable_selinux = true
enable_unprivileged_ports = true
enable_unprivileged_icmp = true
sandbox_image = "index.docker.io/rancher/mirrored-pause:3.6"
[plugins."io.containerd.grpc.v1.cri".containerd]
snapshotter = "overlayfs"
disable_snapshot_annotations = true
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = true
[plugins."io.containerd.grpc.v1.cri".registry]
config_path = "/var/lib/rancher/rke2/agent/etc/containerd/certs.d"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes."crun"]
runtime_type = "io.containerd.runc.v2"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes."crun".options]
BinaryName = "/usr/bin/crun"
SystemdCgroup = true

Something else that I thought may be relevant is to compare the systemd service file for each:

/etc/systemd/system/k0sworker.service:
[Unit]
Description=k0s - Zero Friction Kubernetes
Documentation=https://docs.k0sproject.io
ConditionFileIsExecutable=/usr/local/bin/k0s
After=network-online.target
Wants=network-online.target
[Service]
StartLimitInterval=5
StartLimitBurst=10
ExecStart=/usr/local/bin/k0s worker --data-dir=/var/lib/k0s --kubelet-extra-args='--cpu-manager-policy=static\x20--kube-reserved=cpu=1000m,memory=2000Mi\x20--system-reserved=cpu=500m,memory=1000Mi\x20--memory-manager-policy=Static\x20--topology-manager-policy=restricted\x20--topology-manager-scope=pod\x20--reserved-memory=0:memory=1550Mi;1:memory=1550Mi\x20--node-ip=192.168.1.10' --labels=feature.node.kubernetes.io/network-sriov.capable=true,openebs.io/zfs=true --token-file=/etc/k0s/k0stoken
RestartSec=120
Delegate=yes
KillMode=process
LimitCORE=infinity
TasksMax=infinity
TimeoutStartSec=0
LimitNOFILE=999999
Restart=always
[Install]
WantedBy=multi-user.target

/usr/lib/systemd/system/rke2-server.service:
[Unit]
Description=Rancher Kubernetes Engine v2 (server)
Documentation=https://github.com/rancher/rke2#readme
Wants=network-online.target
After=network-online.target
Conflicts=rke2-agent.service
[Install]
WantedBy=multi-user.target
[Service]
Type=notify
EnvironmentFile=-/etc/default/%N
EnvironmentFile=-/etc/sysconfig/%N
EnvironmentFile=-/usr/lib/systemd/system/%N.env
KillMode=process
Delegate=yes
LimitNOFILE=1048576
LimitNPROC=infinity
LimitCORE=infinity
TasksMax=infinity
TimeoutStartSec=0
Restart=always
RestartSec=5s
ExecStartPre=/bin/sh -xc '! /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service'
ExecStartPre=-/sbin/modprobe br_netfilter
ExecStartPre=-/sbin/modprobe overlay
ExecStart=/usr/bin/rke2 server
ExecStopPost=-/bin/sh -c "systemd-cgls /system.slice/%n | grep -Eo '[0-9]+ (containerd|kubelet)' | awk '{print $1}' | xargs -r kill"
This is getting beyond my experience/capability 😮 |
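For what it's worth, one visible difference between the two dumps above is the cgroup driver: rke2 passes `--cgroup-driver=systemd` to kubelet and sets `SystemdCgroup = true` in containerd, while the k0s configs show `SystemdCgroup = false` (i.e. the cgroupfs driver). A sketch for grepping this on each node (paths from the outputs above; the `|| true` guards just keep the sketch from aborting on a missing path):

```shell
# On the k0s node: containerd runc options and the kubelet config file.
grep -ri systemdcgroup /run/k0s/containerd-cri.toml 2>/dev/null || true
grep -i cgroupdriver /var/lib/k0s/kubelet-config.yaml 2>/dev/null || true

# On the rke2 node: containerd config and the live kubelet command line.
grep -ri systemdcgroup /var/lib/rancher/rke2/agent/etc/containerd/config.toml 2>/dev/null || true
tr '\0' '\n' < /proc/"$(pidof kubelet)"/cmdline 2>/dev/null | grep -- --cgroup-driver || true
```

Whether this difference is related to the stuck pod is an open question, but it is one of the few cgroup-relevant settings that differ between the two setups.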
I've tried the latest k0s |
The issue is marked as stale since no activity has been recorded in 30 days |
Maybe there will be time to sort this out at some point... |
Before creating an issue, make sure you've checked the following:
Platform
Version
v1.30.0+k0s.0
Sysinfo
k0s sysinfo
What happened?
I'm using Kubevirt to run VMs with k0s. I've found that certain Kubevirt configuration causes the 'virt-launcher' pod to fail to terminate - it gets stuck in 'Terminating' state e.g.
By trial and error, I've discovered that the specific configuration that causes this problem is the KubeVirt feature
isolateEmulatorThread
documented here: https://kubevirt.io/user-guide/virtual_machines/dedicated_cpu_resources/#requesting-dedicated-cpu-for-qemu-emulator
When I set
isolateEmulatorThread: true
the problem occurs. Note: this setting is used in conjunction with
dedicatedCpuPlacement: true
however, if I specify only the latter, the issue does not occur. This problem only seems to happen with k0s (I've tested against other k8s distros and have not seen this problem).
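For illustration only (this is not the reporter's manifest, which is attached below as VM.yaml), the relevant stanza of a KubeVirt VirtualMachine spec looks roughly like this; names and sizes are made up:

```yaml
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: example-vm            # hypothetical name
spec:
  running: true
  template:
    spec:
      domain:
        cpu:
          cores: 2
          dedicatedCpuPlacement: true   # alone: VM terminates fine
          isolateEmulatorThread: true   # combined with the above: virt-launcher
                                        # pod gets stuck in Terminating on k0s
        memory:
          guest: 2Gi
        devices: {}
```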
Steps to reproduce
isolateEmulatorThread: true
VirtualMachine
k8s resource

Expected behavior
The VM should terminate and all associated resources should be removed from k8s.
Actual behavior
No response
Screenshots and logs
k0scontroller log output: https://gist.github.com/ianb-mp/588ef41ec05e695bc183c61726257278#file-k0scontroller-log
k0sworker log output: https://gist.github.com/ianb-mp/588ef41ec05e695bc183c61726257278#file-k0sworker-log
Here is an example minimal VM manifest to reproduce the issue:
VM.yaml
Additional context
This may be related to #4319 as it involves k8s CPU Manager
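Since the CPU Manager is implicated, the kubelet resource-manager checkpoint files on the worker may be worth capturing while a pod is stuck (a sketch; the path assumes k0s's --root-dir=/var/lib/k0s/kubelet shown earlier):

```shell
# Dump the kubelet CPU/Memory Manager checkpoint files. k0s keeps kubelet
# state under /var/lib/k0s/kubelet; the loop tolerates missing files.
for f in cpu_manager_state memory_manager_state; do
  echo "== $f =="
  cat "/var/lib/k0s/kubelet/$f" 2>/dev/null || echo "(not found on this host)"
done
```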