
Unable to Use Environment Variable and Custom Configuration in Spark Operator for Spark Application #2138

Open
potlurip opened this issue Aug 21, 2024 · 4 comments
Labels
question Further information is requested

Comments

@potlurip

Issue Description:

I'm unable to use an environment variable containing a password in my Spark application's configuration when deploying with the Spark Operator.

Approaches Taken:

Environment Variable Approach:

  1. Using Environment Variable in Spark Configuration:
    • I tried to pass the password as an environment variable and then attempted to access this variable within the Spark configuration.
    • Example:
      apiVersion: "sparkoperator.k8s.io/v1beta2"
      kind: SparkApplication
      metadata:
        name: example-spark-app
        namespace: default
      spec:
        ...
        driver:
          env:
            - name: PASSWORD
              valueFrom:
                secretKeyRef:
                  name: my-secret
                  key: password
        executor:
          env:
            - name: PASSWORD
              valueFrom:
                secretKeyRef:
                  name: my-secret
                  key: password
        sparkConf:
          "spark.myapp.password": "$(PASSWORD)"
        ...
    • The Spark application was unable to resolve and use the password in the configuration.
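For context, Kubernetes only performs `$(VAR)` expansion inside a container's `command`, `args`, and `env` fields; values under `spec.sparkConf` are handed to `spark-submit` verbatim, so the driver receives the literal string `$(PASSWORD)`. One place where expansion does work is a dependent environment variable defined later in the same `env` list (the `JDBC_URL` name below is hypothetical):

```yaml
driver:
  env:
    - name: PASSWORD
      valueFrom:
        secretKeyRef:
          name: my-secret
          key: password
    # Kubernetes expands $(PASSWORD) here because this is an env value
    # and PASSWORD is defined earlier in the same env list.
    - name: JDBC_URL
      value: "jdbc:postgresql://db:5432/app?password=$(PASSWORD)"
```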

Alternative Approaches:

  1. Adding spark-defaults.conf in /opt/spark/conf:

    • I added a spark-defaults.conf file in the /opt/spark/conf directory with a single property for the password using init-container.
    • Example content:
      spark.myapp.password=my_password_value
    • However, the Spark Operator overwrites the conf directory.
  2. Using Spark ConfigMap:

    • I created a ConfigMap containing the Spark configuration and specified it using spec.sparkConfigMap.
    • Example ConfigMap:
      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: my-spark-config
        namespace: default
      data:
        spark-defaults.conf: |
          spark.myapp.password=my_password_value
    • The ConfigMap successfully created a configuration file in the /etc/spark/conf directory, but the application continued to read configurations from /opt/spark/conf/spark.properties, ignoring the file generated by the ConfigMap.

Expected Behavior:

  • The Spark application should be able to read and use the environment variable in the Spark configuration.
  • Alternatively, it should be able to read the configurations from the ConfigMap file in the /etc/spark/conf directory.

Actual Behavior:

  • The Spark application ignores the environment variable set in the configuration.
  • The application ignores the configuration file created by the ConfigMap in /etc/spark/conf.

Potential Solution:

I considered the option of creating an additional Spark configuration file and configuring the Spark application to use both configuration files. However, I'm unable to find a way to achieve this within the Spark Operator setup.
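One way to approximate this without operator support is to merge an extra properties file inside the application itself at startup. A minimal sketch (the helper name and the idea of parsing a Java-style `.properties` file into a dict are mine, not from the Spark Operator):

```python
def load_properties(path):
    """Parse a Java-style .properties file into a dict of strings."""
    props = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blank lines and comments.
            if not line or line.startswith("#"):
                continue
            key, _, value = line.partition("=")
            props[key.strip()] = value.strip()
    return props

# The returned dict could then be applied when building the session, e.g.:
#   builder = SparkSession.builder
#   for k, v in load_properties("/etc/spark/extra/extra.conf").items():
#       builder = builder.config(k, v)
```

This sidesteps the operator overwriting `/opt/spark/conf`, because the extra file is read by application code, not by `spark-submit`.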

@potlurip potlurip added the question Further information is requested label Aug 21, 2024
@ChenYi015
Contributor

@potlurip Could you provide more details, e.g. the Helm chart version and how you installed the chart? Did you enable the webhook server?

@potlurip
Author

Helm Chart Version:

  • Version: v1beta2-1.3.8-3.1.1
  • The Spark Operator was installed using Helm with the following command: helm install

Webhook Server:

  • Yes, the webhook server was enabled.

Kubernetes Version:

  • Version: 1.26

@ChenYi015
Contributor

ChenYi015 commented Aug 23, 2024

Spark operator internally calls spark-submit, and it should be noted that using environment variables in Spark conf is not supported by spark-submit. One possible approach is to hard-code the application password into the spec.sparkConf, although this method is not secure. Alternatively, you could modify your application to fetch the password from environment variables, which is a more secure practice.
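The last suggestion amounts to something like the following in application code (a minimal sketch; the helper name is mine). The application reads the variable injected by the `secretKeyRef` entries directly, instead of going through `sparkConf`:

```python
import os

def get_password(var="PASSWORD"):
    # PASSWORD is injected into the driver/executor pods via the
    # secretKeyRef entries in the SparkApplication spec, so the
    # application reads it from the environment, not from Spark conf.
    value = os.environ.get(var)
    if value is None:
        raise RuntimeError(f"{var} environment variable is not set")
    return value
```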

@ha2hi
Contributor

ha2hi commented Sep 4, 2024

How about using Kubernetes Secret?

  1. Create the Secret
  • Example
kubectl create secret generic myapp-secrets \
  --from-literal=PASSWORD=<YOUR_PASSWORD>
  2. Reference the Secret as environment variables in the SparkApplication
  • Example
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: example-spark-app
  namespace: default
spec:
  ...
  driver:
    env:
      - name: PASSWORD
        valueFrom:
          secretKeyRef:
            name: myapp-secrets
            key: PASSWORD
  executor:
    env:
      - name: PASSWORD
        valueFrom:
          secretKeyRef:
            name: myapp-secrets
            key: PASSWORD
  sparkConf:
    "spark.myapp.password": "myapp-secrets:PASSWORD"
  ...
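Note that a `spark.myapp.password` value set this way is still just a literal string as far as Spark is concerned. If the goal is only to inject the secret into the pods as an environment variable, Spark on Kubernetes can also do that through `sparkConf` directly, using its documented `secretKeyRef` properties; the application still reads the value from the environment, not from the conf:

```yaml
sparkConf:
  "spark.kubernetes.driver.secretKeyRef.PASSWORD": "myapp-secrets:PASSWORD"
  "spark.kubernetes.executor.secretKeyRef.PASSWORD": "myapp-secrets:PASSWORD"
```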
