SDK Usage for Spark Jobs

IOMETE's Python SDK lets you interact with the IOMETE platform from Python code, so you can integrate platform functionality into your own projects and automate tasks.

This document shows how to use the iomete-sdk Python module to create, update, and delete Spark jobs, and to submit, list, and cancel job runs.

If you have any questions or feedback, please do not hesitate to reach out to our support team.

Installation

To follow these examples, make sure you have the following dependency installed:

pip install iomete-sdk

Prerequisites

You also need a valid IOMETE API token and a workspace ID. In these examples, the API token is read from an environment variable and the workspace ID is set directly in the code.
The Resources section below links to the documentation on how to create an API token. The workspace ID and endpoint (for cURL examples) are shown on the Workspace Settings page of the IOMETE platform.
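
Reading both values from the environment keeps credentials out of source code. The following is a minimal sketch of that idea, not part of the SDK: the helper name `load_iomete_settings` and the variable name `IOMETE_WORKSPACE_ID` are assumptions made for illustration, while `TEST_TOKEN` is the variable name used in the examples in this document.

```python
import os


def load_iomete_settings(env=None):
    """Return (api_token, workspace_id) from an environment-style mapping.

    IOMETE_WORKSPACE_ID is a hypothetical variable name chosen for this sketch.
    """
    env = os.environ if env is None else env
    required = ("TEST_TOKEN", "IOMETE_WORKSPACE_ID")
    # Fail early with a clear message instead of passing None to the client.
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["TEST_TOKEN"], env["IOMETE_WORKSPACE_ID"]
```

Calling `load_iomete_settings()` with no argument reads from the real process environment; passing a dictionary instead is convenient in tests.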

SDK Usage Examples

Initialization

First, import the required libraries and create a SparkJobApiClient instance to interact with the IOMETE platform:

import os
import uuid
from iomete_sdk import SparkJobApiClient

TEST_TOKEN = os.environ.get("TEST_TOKEN")
WORKSPACE_ID = "pceh7-816"

job_client = SparkJobApiClient(
    workspace_id=WORKSPACE_ID,
    api_key=TEST_TOKEN,
)

Creating a Job

Here's an example of how to create a new job:

job_payload = {
    "name": "test-job",
    "template": {
        "sparkVersion": "3.2.1",
        "mainApplicationFile": "local:///opt/spark/examples/jars/spark-examples_2.12-3.2.1-iomete.jar",
        "mainClass": "org.apache.spark.examples.SparkPi",
        "arguments": ["10"]
    }
}
job_create_response = job_client.create_job(payload=job_payload)

print(job_create_response)

This will create a new job with a name, a specific Spark version, main application file, main class, and arguments.
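
When several jobs share the same shape, it can help to build the payload with a small function. The sketch below is not part of the SDK; `build_job_payload` is a hypothetical helper that produces only the fields shown in the payload above, and it converts arguments to strings on the assumption that the platform expects string arguments, as in the `"10"` example.

```python
def build_job_payload(name, spark_version, main_application_file, main_class, arguments=()):
    """Assemble a job payload with the same fields as the example in this section."""
    return {
        "name": name,
        "template": {
            "sparkVersion": spark_version,
            "mainApplicationFile": main_application_file,
            "mainClass": main_class,
            # Assumption: arguments are passed as strings, as in the example payload.
            "arguments": [str(arg) for arg in arguments],
        },
    }


payload = build_job_payload(
    name="test-job",
    spark_version="3.2.1",
    main_application_file="local:///opt/spark/examples/jars/spark-examples_2.12-3.2.1-iomete.jar",
    main_class="org.apache.spark.examples.SparkPi",
    arguments=[10],
)
print(payload["template"]["arguments"])  # ['10']
```

The resulting dictionary can be passed to create_job as the payload argument.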

Tip

All Spark job APIs accept either the job ID or the job name, so you can use either one in the request URL to interact with a job. For example, both
.../api/v1/workspaces/pceh7-816/jobs/{job_id} and
.../api/v1/workspaces/pceh7-816/jobs/{job_name} work. For simplicity, we use job_name = test-job in the examples.
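
As a rough illustration of the tip above, the path portion of such a URL can be built the same way for an ID or a name. `job_path` is a hypothetical helper, not an SDK function, and the host part of the URL is deliberately left out because it depends on your endpoint.

```python
from urllib.parse import quote


def job_path(workspace_id, job_ref):
    """Build the path of a job URL; job_ref may be a job ID or a job name."""
    # Percent-encode both parts so names with spaces or slashes stay one path segment.
    return "/api/v1/workspaces/{}/jobs/{}".format(
        quote(workspace_id, safe=""),
        quote(job_ref, safe=""),
    )


print(job_path("pceh7-816", "test-job"))  # /api/v1/workspaces/pceh7-816/jobs/test-job
```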

Updating a Job

To update a job, you can use the update_job method. Here's an example of how to update a job to add a schedule:

cron_schedule = "0 0 */1 * *"

update_payload = job_create_response.copy()
update_payload["schedule"] = cron_schedule
job_update_response = job_client.update_job(job_id=job_create_response["id"], payload=update_payload)
print(job_update_response)

This will update the job created earlier with a schedule that runs it once every day.
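
To see why this expression means "once every day", it helps to label its fields. The sketch below assumes the standard five-field cron layout (minute, hour, day of month, month, day of week), which the expression appears to follow; `split_cron` is a hypothetical helper and does not validate the field values.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def split_cron(expression):
    """Label the fields of a five-field cron expression."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError("expected 5 cron fields, got {}".format(len(parts)))
    return dict(zip(CRON_FIELDS, parts))


fields = split_cron("0 0 */1 * *")
print(fields)
# {'minute': '0', 'hour': '0', 'day_of_month': '*/1', 'month': '*', 'day_of_week': '*'}
```

Minute 0 and hour 0 select midnight, and `*/1` in the day-of-month field matches every day, so the job is scheduled once per day at 00:00.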

Deleting a Job

To delete a job, use the delete_job_by_id method:

job_client.delete_job_by_id(job_id=job_create_response["id"])

This will delete the job created earlier using its ID. The remaining examples operate on that job, so delete it only after you have finished with them.

Running a Job

To submit a job run, use the submit_job_run method. This will submit a new run for the job created earlier:

job_run_response = job_client.submit_job_run(job_id=job_create_response["id"], payload={})
print(job_run_response)

To override part of the configuration, for example the arguments, pass the new values in the payload:

job_run_response = job_client.submit_job_run(job_id=job_create_response["id"], payload={
    "arguments": ["arg1", "arg2"]
})
print(job_run_response)
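
The two calls above differ only in the payload: an empty dictionary for no overrides, or an "arguments" list when overriding. A small hypothetical helper, sketched below and not part of the SDK, can produce either form. Only the "arguments" key is handled because it is the only override shown in this document.

```python
def build_run_payload(arguments=None):
    """Return {} for a plain run, or an arguments override when values are given."""
    if arguments is None:
        return {}
    # Assumption: arguments are sent as strings, as in the examples above.
    return {"arguments": [str(arg) for arg in arguments]}


print(build_run_payload())                  # {}
print(build_run_payload(["arg1", "arg2"]))  # {'arguments': ['arg1', 'arg2']}
```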

Getting Job Runs

To get the runs for a specific job, use the get_job_runs method:

job_runs = job_client.get_job_runs(job_id=job_create_response["id"])
print(job_runs)

This will return a list of runs for the job.

Cancelling a Job Run

To cancel a job run, use the cancel_job_run method:

job_client.cancel_job_run(job_id=job_create_response["id"], run_id=job_run_response["id"])

This will cancel the job run submitted earlier.
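
The individual steps can be combined into one sequence in which the job is deleted last, even if an earlier step fails. The sketch below runs against `FakeSparkJobClient`, an in-memory stand-in written for this example so that it executes without network access. It mirrors only the method names used in this document, and its return values (dictionaries with an "id" key) are assumptions about the response shape. `run_lifecycle` is likewise a hypothetical helper.

```python
class FakeSparkJobClient:
    """In-memory stand-in that records which methods were called."""

    def __init__(self):
        self.calls = []

    def create_job(self, payload):
        self.calls.append("create_job")
        return {"id": "job-1", **payload}

    def submit_job_run(self, job_id, payload):
        self.calls.append("submit_job_run")
        return {"id": "run-1"}

    def get_job_runs(self, job_id):
        self.calls.append("get_job_runs")
        return [{"id": "run-1"}]

    def cancel_job_run(self, job_id, run_id):
        self.calls.append("cancel_job_run")

    def delete_job_by_id(self, job_id):
        self.calls.append("delete_job_by_id")


def run_lifecycle(client, job_payload):
    """Create a job, submit a run, list runs, cancel the run, then delete the job."""
    job = client.create_job(payload=job_payload)
    try:
        run = client.submit_job_run(job_id=job["id"], payload={})
        runs = client.get_job_runs(job_id=job["id"])
        client.cancel_job_run(job_id=job["id"], run_id=run["id"])
        return runs
    finally:
        # Clean up the job even if one of the calls above raises.
        client.delete_job_by_id(job_id=job["id"])


fake = FakeSparkJobClient()
runs = run_lifecycle(fake, {"name": "test-job"})
print(fake.calls)
# ['create_job', 'submit_job_run', 'get_job_runs', 'cancel_job_run', 'delete_job_by_id']
```

Because `run_lifecycle` relies only on the method names shown in this document, the same function should accept the job_client instance created in the Initialization section in place of the fake.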

Resources