
Spark Connect

Introduced in Apache Spark 3.4, Spark Connect is a decoupled client-server architecture that allows remote connectivity to Spark clusters using the DataFrame API and unresolved logical plans as the protocol. The separation between client and server allows Spark and its open ecosystem to be leveraged from anywhere: it can be embedded in modern data applications, IDEs, notebooks, and programming languages.


Create a new Cluster

1. Go to the Spark Connect page and click the button to create a new cluster.

Clusters | IOMETE

2. Give the new cluster a name under Name.

Create Spark Cluster | IOMETE

3. Select a driver under the Node driver section. Read more about node types.

Cluster Node Driver | IOMETE
Node driver

The Spark driver runs continuously until the cluster is stopped manually. The driver is responsible for managing executors/workers and client connections. If it is stopped, no connections can be established to the cluster.


4. Select an executor under the Node executor section. Read more about Spark executors.

Cluster executor select | IOMETE

5. Enter the executor count under the Executor count section.

Below these inputs, a real-time preview of Total CPU and Total Memory is provided. This information helps you make informed decisions about the selection of Node Executors and the number of Executors. It ensures that you allocate sufficient resources to meet the demands of your workload.
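The preview above amounts to simple multiplication. The sketch below illustrates it; the node size used (4 CPUs, 16 GB per executor) is a hypothetical example, not an actual IOMETE node type.

```python
def cluster_totals(executor_cpu: int, executor_memory_gb: int, executor_count: int) -> tuple[int, int]:
    """Total CPU and memory contributed by a cluster's executors."""
    return executor_cpu * executor_count, executor_memory_gb * executor_count

# Example: three executors of a hypothetical 4 CPU / 16 GB node type.
cpu, mem = cluster_totals(executor_cpu=4, executor_memory_gb=16, executor_count=3)
print(cpu, mem)  # 12 48
```

Comparing these totals against your expected workload is what the real-time preview is for.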

Cluster executor | IOMETE
Node executor

Executors are responsible for executing queries. They are scaled up and down automatically based on the auto-suspend parameter.

Keep auto-suspend on to minimize cluster costs.


6. Select a volume under the Volume section. Read more about volumes.

Cluster volume select | IOMETE

7. Set Auto suspend under the Auto suspend section.

You can disable the Auto suspend functionality by clicking the checkbox on the left side.

Cluster auto suspend | IOMETE
info

Executors will be scaled down after the specified period of inactivity and scaled up automatically on demand (scale-up takes around 10-15 seconds). It is recommended to keep auto-suspend on to minimize monthly costs.


8. After adding a description to the optional Description field, click the button to create the cluster.

Cluster Description | IOMETE

🎉🎉🎉 Tadaa! The newly created test-cluster details view is shown.

Cluster details

Cluster Details | IOMETE
  1. Navigation buttons:
     - Spark UI - takes you to the Spark Jobs information.
     - Edit - takes you to the cluster editing form.
     - Terminate / Start / Restart - buttons to terminate, start, and restart the cluster.

  2. For detailed information on statuses, please refer to the Lakehouse Status documentation section.

  3. General information.

  4. In the Connections section, you can copy the endpoint for the connection.

  5. Events - in this section you can check your cluster's Start/Terminate events.

    Cluster events | IOMETE
  6. Delete - this button deletes the cluster.