Node Types

This documentation explains how to configure node types for Spark drivers and executors. We also cover the internal implementation details and provide guidelines for optimal configuration.

Overview

To view available node types, go to the Settings menu and click the Node Types tab. Here you can reconfigure an existing node type or create a new one.

[Screenshot: Node types]

Create new node

When creating a new node template, you'll see the following configuration options.

| Field | Description |
| --- | --- |
| Name | Must be unique. |
| Description | What users will see when selecting a node type. |
| CPU in millicores | CPU limit for the Spark pod. For 1 CPU, enter 1000. |
| Memory | Memory limit for the Spark pod, in MiB. For 1 GiB, enter 1024. |
| Available for | Driver and Executor (at least one option must be checked). |
| Resource tags | Custom name/value pairs that you can assign to IOMETE resources, enabling you to categorize and organize them effectively. |
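
The CPU and Memory fields expect Kubernetes-style units (millicores and MiB). As a quick reference, here is a small Python sketch, purely illustrative rather than any IOMETE API, that converts the more familiar vCPU/GiB values into the units the form expects:

```python
def to_millicores(vcpus: float) -> int:
    """Convert vCPUs to Kubernetes millicores (1 CPU = 1000m)."""
    return int(vcpus * 1000)

def to_mib(gib: float) -> int:
    """Convert GiB to MiB (1 GiB = 1024 MiB)."""
    return int(gib * 1024)

# A node type with 2 vCPUs and 8 GiB of memory:
print(to_millicores(2))  # 2000 -> value for "CPU in millicores"
print(to_mib(8))         # 8192 -> value for "Memory"
```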
[Screenshot: Node type create]

After creating node templates, you can use them in Lakehouses, Spark Connect clusters, and Spark Jobs. Let's navigate to the Lakehouse create page. Here, you'll find options for Node driver and Node executor, which include node type selections. You can also view the Total CPU and Memory calculated from the selected node types and the executor count.

[Screenshot: Lakehouse node type selection]

Internal Implementation

When configuring a Spark resource (such as a Lakehouse, Spark Job, or Spark Connect cluster), IOMETE automatically sets the following Spark configuration parameters based on the selected node type. The system calculates these parameters for both the driver and the executors from the node's CPU and a set of internal factors:

Default Spark Configuration Parameters (Driver)

| Property | Default | Description |
| --- | --- | --- |
| spark.driver.cores | CPU * 1.5 | The number of Spark driver cores, set to the node's CPU multiplied by the core factor (1.5). |
| spark.kubernetes.driver.request.cores | CPU | The Kubernetes CPU request, set to match the exact CPU of the selected node. |
| spark.kubernetes.driver.limit.cores | CPU * 1.0 | The Kubernetes CPU limit, set to the node's CPU multiplied by the limit factor (1.0). |

Default Spark Configuration Parameters (Executors)

| Property | Default | Description |
| --- | --- | --- |
| spark.executor.cores | CPU * 1.5 | The number of Spark executor cores, set to the node's CPU multiplied by the core factor (1.5). |
| spark.kubernetes.executor.request.cores | CPU | The Kubernetes CPU request, set to match the exact CPU of the selected node. |
| spark.kubernetes.executor.limit.cores | CPU * 1.0 | The Kubernetes CPU limit, set to the node's CPU multiplied by the limit factor (1.0). |
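
To make the arithmetic concrete, here is a minimal Python sketch of the documented formulas. It is an illustration, not IOMETE's actual implementation, and truncating the core count to a whole number is an assumption:

```python
CORE_FACTOR = 1.5   # default core factor
LIMIT_FACTOR = 1.0  # default limit factor

def derived_spark_conf(node_cpus: float, role: str) -> dict:
    """Derive the default Spark settings for a node type.

    `role` is "driver" or "executor"; the formulas are identical for both.
    """
    return {
        f"spark.{role}.cores": int(node_cpus * CORE_FACTOR),               # CPU * 1.5
        f"spark.kubernetes.{role}.request.cores": node_cpus,               # exact node CPU
        f"spark.kubernetes.{role}.limit.cores": node_cpus * LIMIT_FACTOR,  # CPU * 1.0
    }

# For a 2-CPU node type selected as the executor:
print(derived_spark_conf(2, "executor"))
# {'spark.executor.cores': 3,
#  'spark.kubernetes.executor.request.cores': 2,
#  'spark.kubernetes.executor.limit.cores': 2.0}
```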

Customizing Spark Configuration in IOMETE

IOMETE lets you customize these configuration parameters for Spark resources globally or per job template: you can modify the core and limit factors, or set static values if needed.

Globally Modifying Core and Limit Factors

The following properties can be used to modify the default multiplier factors globally, or for specific job templates or Lakehouses:

| Property | Default Value | Description |
| --- | --- | --- |
| spark.iomete.driver.core.factor | 1.5 | Factor applied to the node's CPU to calculate the number of Spark driver cores. |
| spark.iomete.driver.core.limit.factor | 1.0 | Factor applied to the node's CPU to set the Kubernetes CPU limit for the driver. |
| spark.iomete.executor.core.factor | 1.5 | Factor applied to the node's CPU to calculate the number of Spark executor cores. |
| spark.iomete.executor.core.limit.factor | 1.0 | Factor applied to the node's CPU to set the Kubernetes CPU limit for executors. |

Example
For a node with 2 CPUs, if the core limit factors are set to 2.0, the final value of spark.kubernetes.driver.limit.cores or spark.kubernetes.executor.limit.cores will be 4 CPUs.
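
Such a customization is just a set of Spark properties. Here is the example above expressed as a plain Python dict of properties (how the dict is attached to a job template or Lakehouse depends on where you configure it):

```python
# Core limit factors set to 2.0 for a 2-CPU node type. With these values,
# spark.kubernetes.driver.limit.cores and spark.kubernetes.executor.limit.cores
# both resolve to 2 * 2.0 = 4 CPUs.
custom_factors = {
    "spark.iomete.driver.core.limit.factor": "2.0",
    "spark.iomete.executor.core.limit.factor": "2.0",
}
```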

Overriding with Static Values

If you want to set static values for cores and limits, you can override the default factors by using the following properties:

| Property | Example | Description |
| --- | --- | --- |
| spark.iomete.driver.cores | 4 | Static value for the number of Spark driver cores, overriding the factor-based calculation. |
| spark.iomete.driver.core.limit | 1000m | Static value for the Kubernetes CPU limit of the Spark driver. |
| spark.iomete.executor.cores | 6 | Static value for the number of Spark executor cores, overriding the factor-based calculation. |
| spark.iomete.executor.core.limit | 2000m | Static value for the Kubernetes CPU limit of Spark executors. |
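
The static overrides are likewise plain Spark properties. A sketch combining the example values from the table above (they are independent examples, not a tuned profile):

```python
static_overrides = {
    "spark.iomete.driver.cores": "4",            # fixed driver core count
    "spark.iomete.driver.core.limit": "1000m",   # fixed driver CPU limit
    "spark.iomete.executor.cores": "6",          # fixed executor core count
    "spark.iomete.executor.core.limit": "2000m", # fixed executor CPU limit
}
```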

Priority of Configuration

The final configuration is determined based on the following priority:

  1. Override values: If static override values are provided (e.g., spark.iomete.driver.cores), they take the highest priority.
  2. Factor properties: If override values are not set, IOMETE will use the custom factors (e.g., spark.iomete.driver.core.factor).
  3. Default values: If neither overrides nor factors are provided, IOMETE will use the default factors (1.5 for cores and 1.0 for limits).
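
This priority order can be summarized in a few lines of Python. The sketch below covers the driver core count only (the executor properties resolve the same way); it mirrors the documented rules rather than IOMETE's actual code, and truncation to a whole core count is an assumption:

```python
DEFAULT_CORE_FACTOR = 1.5

def resolve_driver_cores(conf: dict, node_cpus: float) -> int:
    """Resolve spark.driver.cores: override > custom factor > default factor."""
    override = conf.get("spark.iomete.driver.cores")
    if override is not None:  # 1. static override wins
        return int(override)
    # 2. custom factor if set, else 3. the default factor (1.5)
    factor = float(conf.get("spark.iomete.driver.core.factor", DEFAULT_CORE_FACTOR))
    return int(node_cpus * factor)

print(resolve_driver_cores({}, 2))                                          # 3
print(resolve_driver_cores({"spark.iomete.driver.core.factor": "2.0"}, 2))  # 4
print(resolve_driver_cores({"spark.iomete.driver.cores": "6"}, 2))          # 6
```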

Performance Benchmarking

Using less than 1 CPU (for example, 300m) can significantly slow down Spark jobs due to CPU throttling. The benchmark results below demonstrate how performance varies across CPU configurations.

The benchmark was performed on the TPC-DS 1 GB dataset. For demonstration purposes, only TPC-DS queries 1, 2, 3, and 4 were executed. Results are displayed in seconds.

[Chart: Benchmark results]
  • For efficient performance, it is recommended to use at least 1 vCPU for both driver and executor nodes.
  • Higher CPU allocations for executors (e.g., 2 vCPUs, 4 vCPUs) will provide better performance, especially for compute-intensive tasks.