0/1 nodes are available: 1 Insufficient cpu. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod. #2114
Comments
@kokibas The error message indicates that your Kubernetes cluster doesn't have enough CPU resources to schedule the Spark pods. You can use kubectl top node to check how much CPU is actually available on your nodes.
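For reference, a minimal way to check how much CPU each node has free (assuming kubectl is pointed at the right cluster, and metrics-server is installed for kubectl top):

# Current CPU/memory usage per node (requires metrics-server)
kubectl top node

# Allocatable CPU and the requests already scheduled on a node
# (<node-name> is a placeholder for the actual node name)
kubectl describe node <node-name> | grep -A 8 "Allocated resources"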
@ChenYi015 Thank you for the response. I ran kubectl top node, and here are the results: In addition, my application logs show the following message: I have also tried to reduce the CPU consumption for the Spark executors by setting coreRequest and coreLimit to lower values (e.g., 400m and 800m), but they still seem to use 1 CPU each. I want to ensure that the executors are not over-allocating resources unnecessarily. Could there be an issue with how Spark Operator is interpreting these values?
@kokibas Currently, the driver.cores and executor.cores fields only accept integer values, so each driver and executor pod will request at least 1 CPU core.
@kokibas Spark operator will exec spark-submit to submit the application, and the cores values are passed through as spark.driver.cores and spark.executor.cores, which must be positive integers.
@ChenYi015 Thank you for the clarification. I understand now that the driver.cores and executor.cores must be set to integer values, requiring at least 1 CPU core each. Given my current cluster's available CPU resources (1168m), it seems I don't have enough to meet the 2 CPU core requirement for the Spark application. I will work on scaling up my cluster to ensure that enough CPU resources are available. If you have any further advice on optimizing resource usage or scaling best practices, I'd appreciate it!
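As a quick sanity check before scaling up, the CPU actually requested by this application's pods can be compared against the node's allocatable CPU. A sketch, assuming the operator's default sparkoperator.k8s.io/app-name label is present on the driver and executor pods:

# CPU requests of the driver and executor pods for this application
# (label selector is an assumption based on the operator's default pod labels)
kubectl get pods -n default -l sparkoperator.k8s.io/app-name=spark-test-workorderpnhz \
  -o custom-columns=NAME:.metadata.name,CPU_REQUEST:.spec.containers[*].resources.requests.cpu

# Allocatable CPU per node, for comparison
kubectl get nodes -o custom-columns=NAME:.metadata.name,ALLOCATABLE_CPU:.status.allocatable.cpu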
I am trying to deploy a Spark application on Kubernetes using the Spark Operator, but I'm encountering an issue related to CPU allocation for my executor pods. Below is my current SparkApplication YAML configuration:
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-test-workorderpnhz
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: "10.123.13.133:8082/repository/spark-test:latest"
  imagePullSecrets: ["nexus-registry-secret"]
  mainClass: org.example.WorkorderPnhz
  mainApplicationFile: "local:///app/app.jar"
  sparkVersion: "3.1.1"
  restartPolicy:
    type: Never
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    labels:
      version: 3.1.1
    serviceAccount: spark-operator-spark
    volumeMounts:
      - name: spark-conf-volume-driver
        mountPath: /opt/spark/conf
    javaOptions: "-Dconfig.file=/opt/spark/conf/application.conf"
    podName: "spark-test-workorderpnhz"
    envVars:
      MY_ENV_VAR: "value"
  executor:
    cores: 1
    coreLimit: "1200m"
    instances: 1
    memory: "512m"
    labels:
      version: 3.1.1
    volumeMounts:
      - name: spark-conf-volume-exec
        mountPath: /opt/spark/conf
    envVars:
      SPARK_EXECUTOR_MEMORY: "512Mi"
      SPARK_EXECUTOR_CORES: "1"
      SPARK_EXECUTOR_INSTANCES: "2"
      SPARK_EXECUTOR_MEMORYOVERHEAD: "500m"
  dynamicAllocation:
    enabled: true
    initialExecutors: 1
    maxExecutors: 2
    minExecutors: 1
  sparkConf:
    "spark.dynamicAllocation.executorIdleTimeout": "60s"
    "spark.shuffle.service.enabled": "true"
  volumes:
    - name: spark-conf-volume-driver
      configMap:
        name: spark-drv-conf-map
    - name: spark-conf-volume-exec
      configMap:
        name: spark-exec-conf-map
When I try to deploy this configuration, I receive the following error message:
0/1 nodes are available: 1 Insufficient cpu. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.
I have specified coreRequest and coreLimit for both the driver and executor, but the pods still seem to require more CPU resources than available.
The executor is set to use 1 core and a coreLimit of 1200m, but the node can't accommodate this.
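For what it's worth, this is how the pending executor pod's scheduling events and effective CPU request can be inspected (the pod name below is illustrative; the real name comes from kubectl get pods):

# Scheduler events and effective resources of the pending executor pod
# (pod name is illustrative)
kubectl describe pod spark-test-workorderpnhz-exec-1 -n default

# Only the container resource requests/limits
kubectl get pod spark-test-workorderpnhz-exec-1 -n default -o jsonpath='{.spec.containers[0].resources}'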
I am using Spark Operator version v1beta2-1.4.2-3.5.0 with Kubernetes.
How can I adjust my SparkApplication YAML to ensure the executor pods are scheduled successfully with the available CPU resources? Are there any best practices for configuring CPU requests and limits for Spark executors on Kubernetes to avoid resource insufficiency errors?