Use cases
Solution for webhook issue with NetApp Supported Apache Spark on OpenShift
To use port 443, additional permissions may be required for the pod and/or container. Therefore, in the Operator YAML, you can change the default webhook port from 443 to any port above 1024.
Below is an example using port 9443:
- Navigate to NetApp Supported Apache Spark in the OpenShift installer.
- Under the Spark Operator tab, if you are creating the Spark Operator from scratch, simply change the webhook port to your desired port while creating the operator.
If the Spark Operator has already been created, you can instead change the webhook port in the Spark Operator’s YAML from 443 to 9443 and save it. This will update the existing operator, so the final value will be spec.spark-operator.webhook.port: 9443.
Below is the code snippet:
```yaml
spec:
  spark-operator:
    webhook:
      enable: true
      jobTTLSecondsAfterFinished: 60
      port: 9443
```
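As a quick check after the change, you can verify that the operator’s webhook service now exposes the new port. The namespace spark-operator below is only a placeholder for wherever the operator is installed:

```
# Placeholder namespace; replace with the namespace the operator runs in.
oc -n spark-operator get svc
oc get mutatingwebhookconfigurations
```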
Use a custom service account for the NetApp Spark Operator
By default, there is one service account. To create your own custom service account, follow these steps:
- Navigate to the Service Account section under User Management. Create a new service account and update the name in the YAML file accordingly.
- If you wish to create the service account in a different namespace, make sure to change the namespace field as well. Ensure that the namespace has been created beforehand.
- Create a role binding or cluster role binding with the required permissions and bind it to the service account, as sketched after this list.
- Navigate to the Roles section under User Management, create the role, and add the required access for this role.
- Navigate to the Role Binding section under User Management. Create a binding and add the necessary details. Select the role and the service account that was created earlier.
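The same objects can also be created declaratively. The following is only a minimal sketch: the names spark-custom-sa, spark-custom-role, and spark-custom-rolebinding, the namespace spark-apps, and the rules list are placeholders to adjust for your own workloads.

```yaml
# Hypothetical example: custom service account with a namespaced role and role binding.
# All names, the namespace, and the rules are placeholders; adjust to your requirements.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark-custom-sa
  namespace: spark-apps
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: spark-custom-role
  namespace: spark-apps
rules:
  - apiGroups: [""]
    resources: ["pods", "services", "configmaps", "persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "create", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: spark-custom-rolebinding
  namespace: spark-apps
subjects:
  - kind: ServiceAccount
    name: spark-custom-sa
    namespace: spark-apps
roleRef:
  kind: Role
  name: spark-custom-role
  apiGroup: rbac.authorization.k8s.io
```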
Access your data via an S3 bucket
By default, our base image has all of the necessary JAR files to connect to an AWS S3 bucket. To connect to an S3 bucket in an Apache Spark application, you need to add the following in the sparkConf section:
```yaml
sparkConf:
  spark.hadoop.fs.s3a.impl: org.apache.hadoop.fs.s3a.S3AFileSystem
  spark.hadoop.fs.s3a.endpoint: s3.<aws-bucket-region>.amazonaws.com
  spark.hadoop.fs.s3a.aws.credentials.provider: com.amazonaws.auth.EnvironmentVariableCredentialsProvider
```
Different values of spark.hadoop.fs.s3a.aws.credentials.provider can be used to authenticate against the S3 bucket; the options covered here are the Environment Variable Credentials Provider and the Web Identity Token Credentials Provider.
Environment Variable Credentials Provider:
The Environment Variable Credentials Provider lets you store the authentication credentials in a Kubernetes secret instead of as properties in the Spark configuration.
- The following configuration needs to be added:

```yaml
spark.hadoop.fs.s3a.aws.credentials.provider: com.amazonaws.auth.EnvironmentVariableCredentialsProvider
```

- To create a secret with AWS credentials, you will need to:
a. Navigate to the Secrets section under Workloads. Click “Create”, select “Key/value secret”, and add the credential details. (A declarative alternative is sketched after the snippet in step b.)
b. Add the following environment variables to the driver configuration. Apply the same variables to the executor configuration as well.
```yaml
driver:
  env:
    - name: AWS_ACCESS_KEY_ID
      valueFrom:
        secretKeyRef:
          name: <your-secrets-name>
          key: AWS_ACCESS_KEY_ID
    - name: AWS_SECRET_ACCESS_KEY
      valueFrom:
        secretKeyRef:
          name: <your-secrets-name>
          key: AWS_SECRET_ACCESS_KEY
```
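If you prefer to create the secret from a manifest rather than the console, the following is a minimal sketch; the name my-aws-credentials and the namespace spark-apps are placeholders, and the secret must live in the same namespace as the Spark application pods:

```yaml
# Hypothetical example of the secret from step a; name and namespace are placeholders.
# Kubernetes stores stringData values base64-encoded under data.
apiVersion: v1
kind: Secret
metadata:
  name: my-aws-credentials
  namespace: spark-apps
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: <your-access-key-id>
  AWS_SECRET_ACCESS_KEY: <your-secret-access-key>
```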
Web Identity Token Credentials Provider:
The Web Identity Token Credentials Provider can be used to authenticate via a service account that has the appropriate IAM role and permissions. The role’s trust policy must include the OpenID Connect (OIDC) provider of your OpenShift cluster.
Refer to the service account section above if you want to use a custom service account.
- The following configuration needs to be added:
```yaml
spark.hadoop.fs.s3a.region: 'us-west-2'
spark.hadoop.fs.s3a.aws.credentials.provider: 'com.amazonaws.auth.WebIdentityTokenCredentialsProvider'
```

- Create an IAM OIDC provider for your cluster.

a) Use the oc command to retrieve the ISSUER_URL from the OpenShift authentication configuration. This command uses jq to parse the JSON output and extract the serviceAccountIssuer:

```
oc get authentication.config.openshift.io cluster -ojson | jq -r .spec.serviceAccountIssuer
```

b) Use OpenSSL to connect to the ISSUER_URL and extract the thumbprint. This command connects to the URL, retrieves the certificate, and extracts the fingerprint:

```
echo | openssl s_client -connect oidc.op1.openshiftapps.com:443 2>/dev/null | openssl x509 -fingerprint -noout | cut -d= -f2 | sed 's/://g'
```

c) In most cases, an OIDC provider is already present, for example in a ROSA cluster or a classic ROSA cluster with STS enabled at the time of cluster creation. If an IAM OIDC provider does not exist for your cluster, create one as follows:

```
aws iam create-open-id-connect-provider --url <ISSUER URL FROM STEP a> --client-id-list sts.amazonaws.com --thumbprint-list <THUMBPRINT FROM STEP b>
```

- Create an IAM role with a trust policy and the necessary bucket permissions, and associate it with the OpenShift service account by adding annotations. Refer to the AWS Service Account documentation to create it.

a) Navigate to the Service Account section under User Management, select the service account, and click Edit annotations.

b) Enter the key and value in the corresponding fields, then press “Save”. (A sketch of the resulting annotated service account is shown at the end of this section.)

- Configure your Spark application to use the custom service account. Ensure the service account has the necessary roles to access the pods. Add the following to the configuration file:

```yaml
driver:
  serviceAccount: <service account name>
  env:
    - name: AWS_REGION
      value: <aws-region>
```
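For reference, the annotation added in the console steps above typically looks like the following on the service account. This is only a sketch: the key eks.amazonaws.com/role-arn is the one consumed by the AWS pod identity webhook (commonly present on ROSA/STS clusters), and the role ARN, account ID, and namespace are placeholders, so confirm the exact key and value against the AWS documentation referenced above.

```yaml
# Hypothetical example: service account annotated with the IAM role created earlier.
# The annotation key assumes the AWS pod identity webhook; names and ARN are placeholders.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: <service account name>
  namespace: spark-apps
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::<aws-account-id>:role/<your-s3-access-role>
```

If the executors also need the region variable or the same identity, mirror the serviceAccount and env entries under the executor section of the application spec.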