
Allow setting ephemeral-storage requests and limits on Pods #1942

Open · bscaleb opened this issue Mar 27, 2024 · 7 comments
bscaleb commented Mar 27, 2024

I would like to be able to set spec.containers[].resources.limits.ephemeral-storage and spec.containers[].resources.requests.ephemeral-storage on the driver/executor Pods spawned by my SparkApplication. As far as I can tell, that is not currently possible. A user asks for the same thing in #1546.

My use case is that I am running SparkApplication pods on Amazon EKS Fargate. The root volume of all Fargate Pods used to be fixed at 20GiB but now supports variable requests and limits up to 175GiB. These requests and limits are controlled by the two Pod spec values I mentioned above.
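
For reference, the end state I'm after on each driver/executor container is the standard Kubernetes ephemeral-storage stanza (the sizes below are only illustrative):

resources:
  requests:
    ephemeral-storage: "20Gi"
  limits:
    ephemeral-storage: "100Gi"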

cinesia commented Apr 12, 2024

Hi @bscaleb,
I faced a similar problem setting requests and limits on ephemeral storage. There is a solution, even though it is a bit tricky.

  1. First, define and create a pod-template.yaml ConfigMap with the desired ephemeral-storage quotas, and apply the manifests to the cluster. You can define a file for each ephemeral-storage configuration.
apiVersion: v1
kind: ConfigMap
metadata:
  name: spark-driver-pod-template-configmap
  namespace: default
data:
  template.yaml: |
    apiVersion: v1
    kind: Pod
    spec:
      containers:
      - name: spark-kubernetes-driver
        resources:
          requests:
            ephemeral-storage: "2Gi"
          limits:
            ephemeral-storage: "4Gi"
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: spark-executor-pod-template-configmap
  namespace: default
data:
  template.yaml: |
    apiVersion: v1
    kind: Pod
    spec:
      containers:
      - name: spark-kubernetes-executor
        resources:
          requests:
            ephemeral-storage: "2Gi"
          limits:
            ephemeral-storage: "4Gi"
  2. Install the Spark Operator, mounting the ConfigMaps as volumes using the Helm values below.
spark-operator:
  webhook:
    enable: true
  volumes:
    - name: spark-driver-pod-template-configmap
      configMap:
        name: spark-driver-pod-template-configmap
        items:
          - key: template.yaml
            path: spark-driver-pod-template.yaml
    - name: spark-executor-pod-template-configmap
      configMap:
        name: spark-executor-pod-template-configmap
        items:
          - key: template.yaml
            path: spark-executor-pod-template.yaml
  volumeMounts:
    - name: spark-driver-pod-template-configmap
      mountPath: /opt/spark/conf/spark-driver-pod-template.yaml
      subPath: spark-driver-pod-template.yaml
    - name: spark-executor-pod-template-configmap
      mountPath: /opt/spark/conf/spark-executor-pod-template.yaml
      subPath: spark-executor-pod-template.yaml
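
With the values above saved as values.yaml, the operator is installed the usual Helm way; the release and chart names below are placeholders, not something prescribed in this thread:

helm upgrade --install spark-operator spark-operator/spark-operator -f values.yaml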
  3. Apply the SparkApplication with the pod template configuration described in the Spark documentation: https://spark.apache.org/docs/latest/running-on-kubernetes.html#pod-template.
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: pyspark-pi
  namespace: default
spec:
  type: Python
  pythonVersion: "3"
  mode: cluster
  image: "<SPARK-OPERATOR-IMAGE" # <-- Change with a existing image !!!
  imagePullPolicy: Always
  mainApplicationFile: local:///opt/spark/examples/src/main/python/pi.py
  sparkVersion: "3.4.0"
  sparkConf:
    spark.kubernetes.driver.podTemplateFile: /opt/spark/conf/spark-driver-pod-template.yaml # <-- Must match the mountPath of the volumeMounts
    spark.kubernetes.executor.podTemplateFile: /opt/spark/conf/spark-executor-pod-template.yaml # <-- Must match the mountPath of the volumeMounts
  restartPolicy:
    type: OnFailure
    onFailureRetries: 3
    onFailureRetryInterval: 10
    onSubmissionFailureRetries: 5
    onSubmissionFailureRetryInterval: 20
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    labels:
      version: 3.1.1
    serviceAccount: spark
  executor:
    cores: 1
    instances: 1
    memory: "512m"
    labels:
      version: 3.1.1

Do not pay attention to the SparkApplication itself; only the spec.sparkConf section matters here.

Feel free to ask any question.
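
One quick way to verify the template was applied is to inspect the running driver pod (standard kubectl; the pod name is a placeholder):

kubectl get pod <driver-pod-name> -o jsonpath='{.spec.containers[0].resources}'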

bscaleb (Author) commented Apr 12, 2024

Hi @cinesia, thanks for your response! The best solution I could find was fairly similar. The main difference is that I built my own Docker image for the spark-operator with the pod template files copied onto it, then deployed the spark-operator chart with that image. Mounting them via ConfigMaps is probably a bit nicer than that.

The issue with both of these solutions is that they still rely on changes to the spark-operator deployment every time an individual SparkApplication in the cluster needs a new pod template file. Ideally the SparkApplication would be able to either pass this pod template directly to the operator, or pass arbitrary Pod spec values, which would support setting these specific ephemeral-storage values.


This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

bscaleb (Author) commented Jul 24, 2024

This shouldn't be marked stale as it is still an issue that would be very nice to see resolved.

SamWheating commented Jul 30, 2024

FWIW I think that this would require some changes in apache/spark, since the existing resource request flags are just passed into the spark-submit command.

For example, the spec.driver.coreRequest value set on the SparkApplication object results in the spark.kubernetes.driver.request.cores=<value> arg being added to the spark-submit command built by the operator.

A similar argument doesn't exist for setting the ephemeralStorage requests/limits in Spark at the moment, so any work would have to start by adding the option there.
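
To make the pass-through concrete (the config names below are real Spark options; the values are illustrative):

# SparkApplication field           ->  spark-submit argument
spec.driver.coreRequest: "500m"    ->  --conf spark.kubernetes.driver.request.cores=500m
spec.driver.coreLimit: "1200m"     ->  --conf spark.kubernetes.driver.limit.cores=1200m
# There is no spark.kubernetes.driver.request.ephemeral-storage (or limit)
# option in Spark, so the operator has nothing equivalent to pass through.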

Until then, I think the only option is using the podTemplate settings as suggested above.

jacobsalway (Member) commented Jul 31, 2024

@SamWheating is correct that this isn't supported by Spark itself, but it could also be handled by the operator's mutating webhook: an ephemeral storage field could be added to the SparkApplication spec and patched into the pod spec.
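
A hypothetical sketch of what that could look like (the ephemeralStorage field below does not exist in the CRD; it is only illustrative):

spec:
  driver:
    ephemeralStorage:   # hypothetical field, not a real operator API
      request: "2Gi"
      limit: "4Gi"

The webhook would then translate this into resources.requests.ephemeral-storage and resources.limits.ephemeral-storage on the driver container.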

bscaleb (Author) commented Sep 9, 2024

Should be solved by #2141, I think.
