0/1 nodes are available: 1 Insufficient cpu. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod. #2114

kokibas · 2024-08-06T13:27:43Z

✋ I have searched the open/closed issues and my issue is not listed.

I am trying to deploy a Spark application on Kubernetes using the Spark Operator, but I'm encountering an issue related to CPU allocation for my executor pods. Below is my current SparkApplication YAML configuration
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
name: spark-test-workorderpnhz
namespace: default
spec:
type: Scala
mode: cluster
image: "10.123.13.133:8082/repository/spark-test:latest"
imagePullSecrets: ["nexus-registry-secret"]
mainClass: org.example.WorkorderPnhz
mainApplicationFile: "local:///app/app.jar"
sparkVersion: "3.1.1"
restartPolicy:
type: Never
driver:
cores: 1
coreLimit: "1200m"
memory: "512m"
labels:
version: 3.1.1
serviceAccount: spark-operator-spark
volumeMounts:
- name: spark-conf-volume-driver
mountPath: /opt/spark/conf
javaOptions: "-Dconfig.file=/opt/spark/conf/application.conf"
podName: "spark-test-workorderpnhz"
envVars:
MY_ENV_VAR: "value"
executor:
cores: 1
coreLimit: "1200m"
instances: 1
memory: "512m"
labels:
version: 3.1.1
volumeMounts:
- name: spark-conf-volume-exec
mountPath: /opt/spark/conf
envVars:
SPARK_EXECUTOR_MEMORY: "512Mi"
SPARK_EXECUTOR_CORES: "1"
SPARK_EXECUTOR_INSTANCES: "2"
SPARK_EXECUTOR_MEMORYOVERHEAD: "500m"
dynamicAllocation:
enabled: true
initialExecutors: 1
maxExecutors: 2
minExecutors: 1
sparkConf:
"spark.dynamicAllocation.executorIdleTimeout": "60s"
"spark.shuffle.service.enabled": "true"
volumes:
- name: spark-conf-volume-driver
configMap:
name: spark-drv-conf-map
- name: spark-conf-volume-exec
configMap:
name: spark-exec-conf-map

When I try to deploy this configuration, I receive the following error message:
0/1 nodes are available: 1 Insufficient cpu. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.
I have specified coreRequest and coreLimit for both the driver and executor, but the pods still seem to require more CPU resources than available.
The executor is set to use 1 core and a coreLimit of 1200m, but the node can't accommodate this.
I am using Spark Operator version v1beta2-1.4.2-3.5.0 with Kubernetes.

How can I adjust my SparkApplication YAML to ensure the executor pods are scheduled successfully with the available CPU resources? Are there any best practices for configuring CPU requests and limits for Spark executors on Kubernetes to avoid resource insufficiency errors?

ChenYi015 · 2024-08-07T06:22:30Z

@kokibas The error message indicates that your Kubernetes cluster doesn't have enough CPU resources to schedule the Spark pods. You can use kubectl top node to checkout the node capacity. You need to scale your Kubernetes cluster or you can reduce the CPU request for your spark pods if possible.

kokibas · 2024-08-07T06:30:56Z

@ChenYi015 Thank you for the response. I ran kubectl top node, and here are the results:
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY% kuber.spark.io 1168m 14% 11189Mi 70%
As you can see, the node is using 1168m CPU (14%) and 11189Mi memory (70%).

In addition, my application logs show the following message:
Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources. It seems like the cluster might not have enough resources allocated to run the application, especially regarding memory. Could this be causing the issue? What would be the best way to resolve this, given the current resource usage?

I have also tried to reduce the CPU consumption for the Spark executors by setting coreRequest and coreLimit to lower values (e.g., 400m and 800m), but they still seem to use 1 CPU each. I want to ensure that the executors are not over-allocating resources unnecessarily. Could there be an issue with how Spark Operator is interpreting these values?

ChenYi015 · 2024-08-07T06:40:13Z

@kokibas Currently, the driver.cores and executor.cores must be integers, that means driver.cores and executor.cores must be at least 1. So you will need at least 2 cpu core (i.e. 2000m) to run the application, but your cluster only have 1168m cpu, so you must scale up your cluster.

ChenYi015 · 2024-08-07T06:46:23Z

@kokibas Spark operator will exec spark-submit inside it, and when you run spark-submit --help, you will find only integers are supported to specify num of cores for driver and executors.

kokibas · 2024-08-07T06:49:59Z

@ChenYi015 Thank you for the clarification. I understand now that the driver.cores and executor.cores must be set to integer values, requiring at least 1 CPU core each. Given my current cluster's available CPU resources (1168m), it seems I don't have enough to meet the 2 CPU core requirement for the Spark application.

I will work on scaling up my cluster to ensure that enough CPU resources are available. If you have any further advice on optimizing resource usage or scaling best practices, I'd appreciate it!

kokibas added the question Further information is requested label Aug 6, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

0/1 nodes are available: 1 Insufficient cpu. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod. #2114

0/1 nodes are available: 1 Insufficient cpu. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod. #2114

kokibas commented Aug 6, 2024

ChenYi015 commented Aug 7, 2024

kokibas commented Aug 7, 2024 •

edited

Loading

ChenYi015 commented Aug 7, 2024

ChenYi015 commented Aug 7, 2024

kokibas commented Aug 7, 2024

0/1 nodes are available: 1 Insufficient cpu. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod. #2114

0/1 nodes are available: 1 Insufficient cpu. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod. #2114

Comments

kokibas commented Aug 6, 2024

ChenYi015 commented Aug 7, 2024

kokibas commented Aug 7, 2024 • edited Loading

ChenYi015 commented Aug 7, 2024

ChenYi015 commented Aug 7, 2024

kokibas commented Aug 7, 2024

kokibas commented Aug 7, 2024 •

edited

Loading