Spark Connect
Spark Connect, introduced in Apache Spark 3.4, is a decoupled client-server architecture that allows remote connectivity to Spark clusters using the DataFrame API and unresolved logical plans as the protocol. The separation between client and server allows Spark and its open ecosystem to be leveraged from everywhere: it can be embedded in modern data applications, IDEs, notebooks, and programming languages.
Create a new Cluster
1. Go to the Spark Connect page and click the Create button.
![Clusters | IOMETE](/resources/assets/images/clusters-fc20b2a1a448bde5eccf2d78ec341c4d.png)
![Clusters | IOMETE](/resources/assets/images/clusters-dark-c4bae3ff3dc1f6f50555991f8d017434.png)
2. Give the new cluster a name under Name.
![Create Spark Cluster | IOMETE](/resources/assets/images/cluster-create-ff233d8aeea003d820d5110a188e21b7.png)
![Create Spark Cluster | IOMETE](/resources/assets/images/cluster-create-dark-6ede6389384901263c41f3b49b0eaf29.png)
3. Select a driver under the Node driver section. Read more about node types.
![Cluster Node Driver | IOMETE](/resources/assets/images/cluster-driver-select-93959f991e5d56b644be66f1245e9e90.png)
![Cluster Node Driver | IOMETE](/resources/assets/images/cluster-driver-select-dark-5fd52d4abe24d4a6aefe49bebd25d95a.png)
The Spark driver runs continuously until the cluster is stopped manually. The driver is responsible for managing executors/workers and client connections. If the driver is stopped, no connections can be established to the cluster.
4. Select an executor under the Node executor section. Read more about Spark executors.
![Cluster executor select | IOMETE](/resources/assets/images/cluster-executor-select-4a356df15eeada0e3e3a127b0e8cb42e.png)
![Cluster executor select | IOMETE](/resources/assets/images/cluster-executor-select-dark-23794c6e4979ce9cf498b0a465669cb2.png)
5. Enter the executor count under the Executor count section.
Below these inputs, a real-time preview of Total CPU and Total Memory is provided. This information helps you make informed decisions about the selection of Node Executors and the number of Executors. It ensures that you allocate sufficient resources to meet the demands of your workload.
![Cluster executor | IOMETE](/resources/assets/images/cluster-executor-621cb760d10645b276e42bcd6e4f517d.png)
![Cluster executor | IOMETE](/resources/assets/images/cluster-executor-dark-5934ce784f08dedece261935968cead2.png)
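The preview arithmetic is simple multiplication. The sketch below uses made-up node sizes (2 CPU / 8 GB executors), and it assumes the totals cover executors only, with the driver counted separately; treat both as illustrative assumptions rather than IOMETE's exact formula.

```python
# Illustrative calculation of the Total CPU / Total Memory preview.
# Node sizes are hypothetical examples, and the driver is assumed to be
# counted separately from these totals.
executor_cpu = 2        # cores per executor (example value)
executor_memory_gb = 8  # GB per executor (example value)
executor_count = 3

total_cpu = executor_cpu * executor_count
total_memory_gb = executor_memory_gb * executor_count

print(f"Total CPU: {total_cpu} cores")        # Total CPU: 6 cores
print(f"Total Memory: {total_memory_gb} GB")  # Total Memory: 24 GB
```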
Executors are responsible for executing queries. They are scaled up and down automatically based on the auto-suspend parameter.
Keep auto-suspend enabled to minimize cluster costs.
6. Select a volume under the Volume section. Read more about volumes.
![Cluster volume select | IOMETE](/resources/assets/images/lakehouse-volume-select-e81ec4e5d812a931f89cba4fadc2c563.png)
7. Set Auto suspend under the Auto suspend section.
Auto suspend can be disabled by unchecking the checkbox on the left side.
![Cluster auto suspend | IOMETE](/resources/assets/images/cluster-auto-suspend-724689155bcd8385d2b34cd17e7e79c8.png)
![Cluster auto suspend | IOMETE](/resources/assets/images/cluster-auto-suspend-dark-f3b22e79c58f6152deac77258bc98a51.png)
Executors are scaled down after the specified period of inactivity and scaled back up automatically on demand (scale-up takes around 10-15 seconds). It is recommended to keep auto-suspend on to minimize monthly costs.
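The behavior above amounts to a simple idle-time check, sketched below; the function name and the check itself are hypothetical illustrations, not IOMETE internals.

```python
import time

# Illustrative sketch of the auto-suspend check: if the cluster has been
# idle longer than the configured threshold, executors are scaled down.
# Names and logic are hypothetical, not IOMETE internals.
def should_suspend(last_activity_ts: float, auto_suspend_seconds: int, now: float) -> bool:
    return (now - last_activity_ts) >= auto_suspend_seconds

now = time.time()
# Idle for 20 minutes with a 15-minute auto-suspend setting -> scale down.
print(should_suspend(now - 20 * 60, 15 * 60, now))  # True
# Last activity 5 minutes ago -> keep executors running.
print(should_suspend(now - 5 * 60, 15 * 60, now))   # False
```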
8. Add a description to the optional description field and click the Create button.
🎉🎉🎉 Tadaa! The details view of the newly created test-cluster is shown.
Cluster details
![Cluster Details | IOMETE](/resources/assets/images/cluster-details-6dae50fe440d53ad644d0e0bff44239c.png)
![Cluster Details | IOMETE](/resources/assets/images/cluster-details-dark-436b3610a50b30da79fa743eebbdfe9c.png)
- Navigation buttons:
  - Spark UI - opens the Spark Jobs information page.
  - Edit - opens the cluster editing form.
  - Terminate / Start / Restart - terminate, start, or restart the cluster.
- For detailed information on statuses, please refer to the Lakehouse Status documentation section.
- General information.
- Connections - in this section, you can copy the endpoint for the connection.
- Events - in this section, you can check your cluster's Start/Terminate events.
- Delete - this button makes it simple to delete the cluster.