Skip to main content

Volumes

IOMETE provides flexibility to customize the Kubernetes Volume types used in Spark workloads. Volumes are a critical part of Spark infrastructure—they power shuffle operations, RDD/DataFrame caching, disk spill, and more.

Because Spark heavily relies on volumes for performance and stability, the speed and reliability of Spark applications or compute clusters are only as strong as the underlying volume configuration.


Volume list

To see the available volumes, go to the Settings menu and click on the Volumes tab.

Volumes | IOMETEVolumes | IOMETE

Create volume

Platform admins can create new type of volumes. To create a new volume go to admin console, click the button.

Volumes | IOMETEVolumes | IOMETE

In the Create Volume screen, you can choose from four volume types:

  • Host Path: Mounts a directory from the host (node) machine’s filesystem directly into the pod.
  • On Demand PVC: Dynamically creates and mounts a persistent volume claim for each pod.
  • TMPFS: Provides a temporary, memory-backed filesystem inside the container (fast but ephemeral).
  • EmptyDir: Creates a temporary directory on the host (node) machine’s filesystem, which lasts only for the lifetime of the pod.

Host Path

A HostPath volume in Kubernetes is a special type of volume that mounts a file or directory from the host node’s filesystem directly into a pod.

However, HostPath comes with important limitations and risks:

  • No storage limits: You cannot enforce quotas on how much space the pod consumes.
  • No automatic cleanup: If the application (e.g., a compute cluster or Spark job) terminates without removing its data, those files remain on the node, which can eventually lead to node disk-full issues.
  • Security concerns: Since HostPath exposes parts of the host’s filesystem, misconfiguration or misuse can compromise the node’s integrity.

Because of these drawbacks, for most workloads, it is recommended to use safer alternatives such as EmptyDir, PersistentVolumeClaims (PVCs), or tmpfs.

Configuration:

  • Provide directory on host(node) to be mounted to the pods.
    Host Path create | IOMETEHost Path create | IOMETE
  • Resource tags are custom name/value pairs that you can assign to IOMETE resources. These tags enable you to categorize and organize your resources effectively.

On Demand PVC

An On-Demand PersistentVolumeClaim (PVC) in Kubernetes is a way for pods to request persistent storage that is automatically provisioned by a StorageClass. Instead of pre-creating volumes, the cluster dynamically allocates the required storage from the underlying infrastructure (e.g., local disks, cloud block storage, or CSI drivers) when the PVC is created.

On Demand PVC offers several advantages:

  • Automatic provisioning: Volumes are created and bound to PVCs on demand, removing the need for manual PersistentVolume(PV) management.
  • Storage guarantees: You can define the requested size and access mode, ensuring predictable resource allocation.
  • Lifecycle management: Depending on the reclaimPolicy of the StorageClass, volumes can be automatically deleted or retained when the PVC is removed.
  • Isolation & safety: Unlike HostPath, each PVC-backed volume is isolated, and quotas can prevent uncontrolled disk usage.

Configuration:

  • Storage class name: Name for storage class. Available list of storage classes can be retrieved with kubectl get storageclass command.

  • Max size: Maximum storage capacity requested in a PVC per executor.

    On Demand PVC create | IOMETEOn Demand PVC create | IOMETE

TMPFS

TMPFS volumes stores data in the node’s RAM instead of on disk. This means the volume is created when a pod starts and exists only while the pod is running. All data is lost when the pod is removed.

Key characteristics:

  • Memory-backed: Data is stored in RAM (tmpfs), offering very fast read/write performance compared to disk.
  • Ephemeral: The volume and its contents are automatically deleted when the pod stops or restarts.
  • Space limits: By default, usage is limited by the pod’s memory requests/limits, preventing it from consuming all node memory.
  • Use cases: Ideal for temporary, high-speed scratch space such as caches, buffers, or intermediate computation results.

For workloads that need fast but ephemeral storage, tmpfs-backed emptyDir provides a safe and efficient option without touching the host’s filesystem or requiring persistent storage.

⚠️ Warning: Since TMPFS uses the container's memory, enabling it for Spark's local storage will increase the memory usage of your Spark pods. To accommodate this, you may need to adjust the memory overhead allocated to Spark's driver and executor pods by configuring spark.{driver,executor}.memoryOverheadFactor. This ensures that your Spark workloads have enough memory to handle both their regular tasks and the additional overhead introduced by using TMPFS.

Configuration:

  • Max size: The maximum storage limit for the volume, specified in units such as GiB.
    This value acts as an upper bound, not a reservation — the actual available storage depends on the node’s memory capacity.
    On Demand PVC create | IOMETEOn Demand PVC create | IOMETE

EmptyDir

EmptyDir is a temporary storage space that Kubernetes creates when a pod is assigned to a node. The volume initially starts empty and is deleted automatically when the pod is removed. By default, data is stored on the node’s backing storage (e.g., local SSD, HDD, or node filesystem).

Key characteristics:

  • Ephemeral: The data lives only for the lifetime of the pod. Once the pod is deleted or rescheduled, all contents are lost.
  • Disk-backed: Data is written to the node’s disk, making it slower than TMPFS but suitable for larger temporary data.
  • Storage capacity: The size is limited by the available disk space on the node. Per-pod usage limit can be defined.
  • Use cases: Good for temporary files, caching, scratch space, or intermediate results that do not need to survive beyond the pod’s lifecycle.

⚠️ Warning: Since EmptyDir volumes use the node’s local storage, excessive consumption can fill the node’s disk and cause other workloads to fail. Always monitor ephemeral storage usage or configure pod-level ephemeral storage limits.

Configuration:

  • Max size: The maximum storage limit for the volume, specified in units such as GiB.
    This value acts as an upper bound, not a reservation — the actual available storage depends on the node’s capacity.
    If omitted, the volume can grow without a defined limit (constrained only by the node’s available storage).

    On Demand PVC create | IOMETEOn Demand PVC create | IOMETE

Volume Types Comparison

Volume TypeStorageAutomatic CleanupPerf.Pod Usage LimitRisks
HostPathNode FSFastSecurity, disk leaks
EmptyDirNode FSFastNode storage full
TMPFSRAMFastestPod OOM
On-Demand PVC StorageClassVaries (usually slower than others)Slow/stuck provisioning under high load

Using volumes

You can utilize node types in Lakehouses, Spark Connect Clusters and Spark Jobs. Let's navigate to the Lakehouse create page. Here, you'l find options for Volume which include volume selections.

Lakehouse Volume select | IOMETELakehouse Volume select | IOMETE

Delete volume

To delete a volume, locate it in the volumes list, click the "Edit" button, and then click the button below the inputs. Afterward, you'll receive a confirmation message; click "Yes, delete" to confirm the deletion.

Node type delete | IOMETENode type delete | IOMETE

Resources: