Use cases
Solution for webhook issue with NetApp Supported Apache Spark on OpenShift
To use port 443, additional permissions may be required for the pod and/or container. Therefore, in the Operator YAML, you can change the default webhook port from 443 to any port above 1024.
Below is an example using port 9443:
- Navigate to NetApp Supported Apache Spark in the OpenShift installer.
- Under the Spark Operator tab, if you are creating the Spark Operator from scratch, simply change the webhook port to your desired port while creating the operator.
If the Spark Operator has already been created, you can instead change the webhook port in the Spark Operator’s YAML from 443 to 9443 and save it. This will update the existing operator, so the final value will be spec.spark-operator.webhook.port: 9443.
Below is the code snippet:
```yaml
spec:
  spark-operator:
    webhook:
      enable: true
      jobTTLSecondsAfterFinished: 60
      port: 9443
```
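As a quick check after the change, you can verify that the operator’s webhook service now exposes the new port. The namespace spark-operator below is only a placeholder for wherever the operator is installed:

```
# Placeholder namespace; replace with the namespace the operator runs in.
oc -n spark-operator get svc
oc get mutatingwebhookconfigurations
```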
Use a custom service account for the NetApp Spark Operator
By default, there is one service account. To create your own custom service account, follow these steps:
- Navigate to the Service Account section under User Management. Create a new service account and update the name in the YAML file accordingly.
- If you wish to create the service account in a different namespace, make sure to change the namespace field as well. Ensure that the namespace has been created beforehand.
- Create a role binding or cluster role binding with the required permissions and bind it to the service account, as sketched after this list.
- Navigate to the Roles section under User Management, create the role, and add the required access for this role.
- Navigate to the Role Binding section under User Management. Create a binding and add the necessary details. Select the role and the service account that was created earlier.
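The same objects can also be created declaratively. The following is only a minimal sketch: the names spark-custom-sa, spark-custom-role, and spark-custom-rolebinding, the namespace spark-apps, and the rules list are placeholders to adjust for your own workloads.

```yaml
# Hypothetical example: custom service account with a namespaced role and role binding.
# All names, the namespace, and the rules are placeholders; adjust to your requirements.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark-custom-sa
  namespace: spark-apps
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: spark-custom-role
  namespace: spark-apps
rules:
  - apiGroups: [""]
    resources: ["pods", "services", "configmaps", "persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "create", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: spark-custom-rolebinding
  namespace: spark-apps
subjects:
  - kind: ServiceAccount
    name: spark-custom-sa
    namespace: spark-apps
roleRef:
  kind: Role
  name: spark-custom-role
  apiGroup: rbac.authorization.k8s.io
```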
Access your data via an S3 bucket
By default, our base image has all of the necessary JAR files to connect to an AWS S3 bucket. To connect to an S3 bucket in an Apache Spark application, you need to add the following in the sparkConf section:
```yaml
sparkConf:
  spark.hadoop.fs.s3a.impl: org.apache.hadoop.fs.s3a.S3AFileSystem
  spark.hadoop.fs.s3a.endpoint: s3.<aws-bucket-region>.amazonaws.com
  spark.hadoop.fs.s3a.aws.credentials.provider: com.amazonaws.auth.EnvironmentVariableCredentialsProvider
```
Different values of spark.hadoop.fs.s3a.aws.credentials.provider can be used to authenticate against the S3 bucket; the options covered here are the Environment Variable Credentials Provider and the Web Identity Token Credentials Provider.
Environment Variable Credentials Provider:
The Environment Variable Credentials Provider lets you store the authentication credentials in a Kubernetes secret instead of as properties in the Spark configuration.
- The following configuration needs to be added:

```yaml
spark.hadoop.fs.s3a.aws.credentials.provider: com.amazonaws.auth.EnvironmentVariableCredentialsProvider
```

- To create a secret with AWS credentials, you will need to:
a. Navigate to the Secrets section under Workloads. Click “Create”, select “Key/value secret”, and add the credential details. (A declarative alternative is sketched after the snippet in step b.)
b. Add the following environment variables to the driver configuration. Apply the same variables to the executor configuration as well.
```yaml
driver:
  env:
    - name: AWS_ACCESS_KEY_ID
      valueFrom:
        secretKeyRef:
          name: <your-secrets-name>
          key: AWS_ACCESS_KEY_ID
    - name: AWS_SECRET_ACCESS_KEY
      valueFrom:
        secretKeyRef:
          name: <your-secrets-name>
          key: AWS_SECRET_ACCESS_KEY
```
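If you prefer to create the secret from a manifest rather than the console, the following is a minimal sketch; the name my-aws-credentials and the namespace spark-apps are placeholders, and the secret must live in the same namespace as the Spark application pods:

```yaml
# Hypothetical example of the secret from step a; name and namespace are placeholders.
# Kubernetes stores stringData values base64-encoded under data.
apiVersion: v1
kind: Secret
metadata:
  name: my-aws-credentials
  namespace: spark-apps
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: <your-access-key-id>
  AWS_SECRET_ACCESS_KEY: <your-secret-access-key>
```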
Web Identity Token Credentials Provider:
The Web Identity Token Credentials Provider can be used to authenticate via a service account that has the appropriate IAM role and permissions. The role’s trust policy must include the OpenID Connect (OIDC) provider of your OpenShift cluster.
Refer to the service account section above if you want to use a custom service account.
- The following configuration needs to be added:
```yaml
spark.hadoop.fs.s3a.region: 'us-west-2'
spark.hadoop.fs.s3a.aws.credentials.provider: 'com.amazonaws.auth.WebIdentityTokenCredentialsProvider'
```

- Create an IAM OIDC provider for your cluster.

a) Use the oc command to retrieve the ISSUER_URL from the OpenShift authentication configuration. This command uses jq to parse the JSON output and extract the serviceAccountIssuer:

```
oc get authentication.config.openshift.io cluster -ojson | jq -r .spec.serviceAccountIssuer
```

b) Use OpenSSL to connect to the ISSUER_URL and extract the thumbprint. This command connects to the URL, retrieves the certificate, and extracts the fingerprint:

```
echo | openssl s_client -connect oidc.op1.openshiftapps.com:443 2>/dev/null | openssl x509 -fingerprint -noout | cut -d= -f2 | sed 's/://g'
```

c) In most cases, an OIDC provider is already present, for example in a ROSA cluster or a classic ROSA cluster with STS enabled at the time of cluster creation. If an IAM OIDC provider does not exist for your cluster, create one as follows:

```
aws iam create-open-id-connect-provider --url <ISSUER URL FROM STEP a> --client-id-list sts.amazonaws.com --thumbprint-list <THUMBPRINT FROM STEP b>
```

- Create an IAM role with a trust policy and the necessary bucket permissions, and associate it with the OpenShift service account by adding annotations. Refer to the AWS Service Account documentation to create it.

a) Navigate to the Service Account section under User Management, select the service account, and click Edit annotations.

b) Enter the key and value in the corresponding fields, then press “Save”. (A sketch of the resulting annotated service account is shown at the end of this section.)

- Configure your Spark application to use the custom service account. Ensure the service account has the necessary roles to access the pods. Add the following to the configuration file:

```yaml
driver:
  serviceAccount: <service account name>
  env:
    - name: AWS_REGION
      value: <aws-region>
```
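For reference, the annotation added in the console steps above typically looks like the following on the service account. This is only a sketch: the key eks.amazonaws.com/role-arn is the one consumed by the AWS pod identity webhook (commonly present on ROSA/STS clusters), and the role ARN, account ID, and namespace are placeholders, so confirm the exact key and value against the AWS documentation referenced above.

```yaml
# Hypothetical example: service account annotated with the IAM role created earlier.
# The annotation key assumes the AWS pod identity webhook; names and ARN are placeholders.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: <service account name>
  namespace: spark-apps
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::<aws-account-id>:role/<your-s3-access-role>
```

If the executors also need the region variable or the same identity, mirror the serviceAccount and env entries under the executor section of the application spec.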