Spark Connect Clusters
In Apache Spark 3.4 and above, Spark Connect introduces a decoupled client-server architecture that allows remote connectivity to Spark clusters using the DataFrame API and unresolved logical plans as the protocol. The separation between client and server allows Spark and its open ecosystem to be leveraged from everywhere: it can be embedded in modern data applications, IDEs, notebooks, and programming languages.
Create a new Connect Cluster
1. Go to the Spark Connect page and click the Create button.
2. Give the new cluster a name under Name.
3. Select a driver under the Node driver section. Learn how to create a custom Node type.
The Node driver runs continuously, managing executors/workers and connections until manually stopped. If it stops, no new connections to the cluster can be made. It acts as the control center, orchestrating all tasks.
4. Select the type of executor from the Node executor section and enter the number of executors in the Executor count section. Below these inputs, you'll see a real-time preview of total CPU and memory, which helps you choose the right number and type of executors and ensure you allocate enough resources for your workload. Read more about Spark executors.
The Node Executor is responsible for executing queries and processing data. It scales automatically based on the auto-suspend parameter, ensuring efficient resource usage.
5. Select a volume under the Volume section. Read more about volumes.
6. Set Auto suspend under the Auto suspend section. Checking the checkbox on the left disables the Auto suspend functionality.
Executors are scaled down after the specified period of inactivity and scaled up automatically on demand (scale-up takes around 10-15 seconds). It is recommended to keep auto-suspend on to minimize monthly costs.
7. Resource tags are custom key/value pairs designed to help categorize and organize IOMETE resources. They provide a flexible and convenient way to manage resources by associating them with meaningful metadata.
🎉 Tadaa! The newly created test-cluster details view is shown.
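The Total CPU and memory preview from step 4 is simple multiplication of the executor count by the per-executor resources. A minimal sketch, using hypothetical node sizes that are illustrative and not actual IOMETE node types:

```python
# Sketch: how a "Total CPU / memory" preview can be derived.
# Node sizes are hypothetical examples, not actual IOMETE node types.

NODE_TYPES = {
    "small": {"cpu": 2, "memory_gb": 8},
    "medium": {"cpu": 4, "memory_gb": 16},
}

def cluster_totals(executor_type: str, executor_count: int) -> dict:
    """Multiply a node type's resources by the executor count."""
    node = NODE_TYPES[executor_type]
    return {
        "total_cpu": node["cpu"] * executor_count,
        "total_memory_gb": node["memory_gb"] * executor_count,
    }

print(cluster_totals("medium", 3))
# {'total_cpu': 12, 'total_memory_gb': 48}
```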
Cluster details
The Cluster Detail View in our application provides a comprehensive overview and management options for a specific cluster instance.
Navigation buttons
The header of the Detail View includes the following elements:
- Spark UI link: This link redirects users to the Spark UI for real-time metrics and logs.
- Configure: Opens the configuration settings for the cluster, enabling users to modify its parameters and settings.
- Start: Starts the cluster instance if it is not already running. If the instance is already running, this button is replaced with Restart and Terminate.
- Restart: Restarts the cluster instance to apply new configurations or resolve issues by stopping and then starting it.
- Terminate: This button stops the cluster instance and terminates all associated processes and jobs. You can start the instance again if needed.
- Delete: Permanently deletes the cluster instance.
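The Start, Restart, and Terminate buttons describe a simple cluster lifecycle. A minimal sketch of those state transitions (state and action names are illustrative, not part of any IOMETE API):

```python
# Sketch of the cluster lifecycle implied by the buttons above.
# States and transitions are illustrative, not an IOMETE API.

TRANSITIONS = {
    ("terminated", "start"): "running",
    ("running", "restart"): "running",      # stop, then start again
    ("running", "terminate"): "terminated",
}

def apply_action(state: str, action: str) -> str:
    """Return the next state, or raise if the button isn't available."""
    if (state, action) not in TRANSITIONS:
        raise ValueError(f"cannot {action} a {state} cluster")
    return TRANSITIONS[(state, action)]

state = "terminated"
state = apply_action(state, "start")      # -> "running"
state = apply_action(state, "restart")    # -> "running"
state = apply_action(state, "terminate")  # -> "terminated"
print(state)
```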
General information
Under the header, a card displays the following information about the Spark Connect cluster.
Auto suspend
By default, scaling up usually takes 1 to 2 minutes, depending on various factors like the cloud provider's response time and resource availability.
In cloud environments, you can utilize IOMETE to establish a hot pool of preconfigured resources. This markedly accelerates the scaling process, reducing the scale-up time to a mere 10 to 15 seconds. Contact support to learn more about this feature.
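To illustrate why keeping auto-suspend on reduces monthly costs, here is a back-of-the-envelope comparison. The hourly rate and usage hours are made-up figures for illustration only:

```python
# Back-of-the-envelope cost comparison for auto-suspend.
# The hourly rate and usage hours are made-up illustration figures.

RATE_PER_EXECUTOR_HOUR = 0.50   # hypothetical $/hour per executor
EXECUTORS = 4
HOURS_IN_MONTH = 730

def monthly_cost(active_hours: float) -> float:
    """Cost for the hours executors are actually running."""
    return RATE_PER_EXECUTOR_HOUR * EXECUTORS * active_hours

always_on = monthly_cost(HOURS_IN_MONTH)   # executors never suspend
with_suspend = monthly_cost(8 * 22)        # ~8 busy hours over 22 workdays
print(always_on, with_suspend)
# 1460.0 352.0
```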