
Compute Clusters

A compute cluster provides dedicated CPU and memory resources, powered by Apache Spark, for executing queries. Table data is stored separately in S3-compatible object storage. This separation allows multiple clusters to access the same data while keeping compute environments fully isolated. IOMETE uses the Apache Iceberg table format to support reliable ACID transactions.

Because storage and compute are decoupled, you can right-size each cluster for its specific workload, whether batch ETL, interactive analytics, or a dedicated BI connection. When a cluster is no longer needed, you can shut it down to stop compute costs.


Viewing the Cluster List

The Cluster List page shows all compute clusters you have permission to access, along with their current state. Open it by selecting Compute in the left sidebar.

Each row represents one cluster and includes the following columns:

  • Name: Opens the cluster detail page. The cluster ID appears below the name (hover to copy).
  • Driver: Displays the driver status (STARTING, ACTIVE, STOPPED, FAILED) and its node type. Sortable by status.
  • Executor: Shows executor state, such as Running 2/4, along with the executor node type. Displays Single node for single-node clusters.
  • Namespace: The Kubernetes namespace where the cluster runs. Sortable.
  • Auto scaling: Displays the idle timeout for auto-suspend. Shows Single node for single-node clusters.
  • Image: Configured Docker image name. Hidden by default. Use the column selector to display it.
  • Actions: Ellipsis menu with actions based on the cluster's current state. See Managing a Compute Cluster.

Filtering the List

Use the controls above the table to filter results:

  • Namespace. Filter clusters by deployment namespace.
  • Status. Filter by driver state (Starting, Active, Stopped, Failed).
  • Search. Match clusters by name or cluster ID.
Compute cluster list with filters | IOMETE

Creating a Cluster

Create a separate cluster for each workload so resources remain isolated and predictable. The setup typically takes about a minute.

  1. Go to the Compute page.
  2. Click New Compute Cluster in the top-right corner.
  3. Complete the configuration across the six tabs: General, Configurations, Dependencies, Docker settings, Tags, and Review & Create.
  4. Open Review & Create, verify the summary, then click Create.

You can move between tabs using Previous and Next, or by selecting a tab directly. The Next button validates the current tab before proceeding. If validation fails, the tab shows a red exclamation mark and you must fix the errors before continuing.

General Tab

The General tab defines the core configuration of the cluster.

  • Name (required): A unique name using lowercase letters, numbers, and hyphens. It must start and end with a letter or number. This value cannot be changed after creation.

    Naming Constraints

    Maximum 53 characters. Pattern: ^[a-z0-9]([-a-z0-9]*[a-z0-9])?$.

  • Description (optional): A short explanation of the cluster's purpose.

  • Bundle (required if resource-level access control is enabled): Associates the cluster with a resource bundle that defines access permissions. Hidden when resource-level access control is disabled. Like the name, this cannot be changed later.

  • Namespace (required): The Kubernetes namespace where the cluster will be deployed. Only namespaces available to your account are shown.

  • Deployment type: Choose between:

    • Multi-node (default): Uses separate driver and executor pods.
    • Single-node: Runs only the Spark driver. Executor-related fields and Auto scaling are hidden.
  • Driver node (required): The node type assigned to the Spark driver. The driver coordinates executors and handles incoming connections.

  • Executor node (required for multi-node): The node type used for executor pods.

  • Executor count (required for multi-node): The maximum number of executor pods; you can also set a minimum, which cannot exceed the maximum. Default is 1.

  • Use spot instances (optional): Enables spot or preemptible instances for executor pods to reduce cost. Disabled by default.

  • Volume (optional): Attach a persistent volume. See Volumes for configuration details.

  • Auto scaling (multi-node only): Enabled by default. Executors scale down to zero after the configured idle period and scale back up when a query runs. Idle timeout options range from 1 minute to 3 hours. Default is 30 minutes. Select Disabled to keep executors running continuously.

    Keep Auto Scaling Enabled

    Only executors in the Running state are billed. Scale-up takes 10 to 15 seconds with a hot pool, or 1 to 2 minutes otherwise.
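
The naming constraints above can be checked with a short sketch. The `validate_cluster_name` helper is illustrative only, not part of any IOMETE SDK; it simply applies the documented pattern and length limit.

```python
import re

# Pattern and length limit from the naming constraints above.
NAME_PATTERN = re.compile(r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?$")
MAX_NAME_LENGTH = 53

def validate_cluster_name(name: str) -> bool:
    """Return True if the name satisfies the documented constraints:
    lowercase letters, numbers, and hyphens; starts and ends with a
    letter or number; at most 53 characters."""
    return len(name) <= MAX_NAME_LENGTH and NAME_PATTERN.match(name) is not None

print(validate_cluster_name("etl-nightly-01"))  # True
print(validate_cluster_name("-bad-start"))      # False (leading hyphen)
print(validate_cluster_name("Bad-Caps"))        # False (uppercase)
```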

Create compute cluster -- General tab | IOMETE

Configurations Tab

The Configurations tab lets you tune Spark behavior, inject secrets, and set JVM options without rebuilding a Docker image.

  • Environment variables: Key-value pairs injected at runtime. Supports plain text and secret-backed values.
  • Spark config: Standard Spark properties (for example, spark.executor.memoryOverhead = 512m). Also supports secret-backed values.
  • Arguments: Command-line arguments passed to the Spark application.
  • Java options: JVM flags for driver and executor processes (for example, -XX:+UseG1GC).
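
Spark size properties such as spark.executor.memoryOverhead accept values with binary suffixes (k, m, g, t). The sketch below shows how such a value resolves to bytes; it is illustrative only (Spark does this parsing internally, and for a bare number Spark's default unit varies by property, whereas this sketch treats it as bytes).

```python
def spark_size_to_bytes(value: str) -> int:
    """Convert a Spark-style size string like '512m' or '2g' to bytes.

    Suffixes are binary multiples: k = KiB, m = MiB, g = GiB, t = TiB.
    A bare number is treated as bytes in this sketch.
    """
    units = {"k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}
    value = value.strip().lower()
    if value and value[-1] in units:
        return int(value[:-1]) * units[value[-1]]
    return int(value)

print(spark_size_to_bytes("512m"))  # 536870912
```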
Create compute cluster -- Configurations tab | IOMETE

Dependencies Tab

The Dependencies tab loads external JARs, Python packages, and Maven artifacts at Spark startup.

  • Jar file locations: URLs or paths to JAR files on the classpath (for example, https://repo.example.com/my-udf.jar).
  • Files: URLs or paths to additional files available at runtime.
  • PY file locations: Paths to Python files (.py, .egg, or .zip) for PySpark (for example, local:///app/package.egg).
  • Maven packages: Maven coordinates resolved at startup (for example, org.apache.spark:spark-avro_2.13:3.5.0).
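
Fields like these commonly correspond to the standard Spark dependency properties (spark.jars, spark.files, spark.submit.pyFiles, spark.jars.packages), each of which takes a comma-separated list. The mapping below is an assumption for illustration, not a statement of how IOMETE wires the form:

```python
def dependencies_to_spark_conf(jars=(), files=(), py_files=(), packages=()):
    """Assemble comma-separated Spark dependency properties.

    Assumed mapping from the form fields to standard Spark properties;
    empty fields are omitted from the result.
    """
    mapping = {
        "spark.jars": jars,
        "spark.files": files,
        "spark.submit.pyFiles": py_files,
        "spark.jars.packages": packages,
    }
    return {key: ",".join(values) for key, values in mapping.items() if values}

conf = dependencies_to_spark_conf(
    jars=["https://repo.example.com/my-udf.jar"],
    packages=["org.apache.spark:spark-avro_2.13:3.5.0"],
)
print(conf["spark.jars.packages"])  # org.apache.spark:spark-avro_2.13:3.5.0
```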
Create compute cluster -- Dependencies tab | IOMETE

Docker Settings Tab

This tab lets you override the default Spark runtime image. The Docker image field (optional) lists images from your registered Docker registries. See Private Docker Registry for setup details.

Create compute cluster -- Docker Settings tab | IOMETE

Tags Tab

Add Resource tags (key-value metadata pairs) to categorize the cluster. Tags appear in the cluster detail view and help with cost allocation or operational filtering.

Create compute cluster -- Tags tab | IOMETE

Review & Create Tab

The Review & Create tab displays a read-only summary of your configuration. Review each section carefully. To make changes, select any tab to return and update the settings. When everything looks correct, click Create.

If creation succeeds, IOMETE provisions and starts the cluster, then redirects you to its detail page. If the cluster name is already in use, you’re returned to the General tab with a validation error prompting you to choose a different name. If resource quotas are exceeded, the form highlights the affected fields with error messages.

Create compute cluster -- Review & Create tab | IOMETE

Viewing a Compute Cluster

The cluster detail page is where you monitor status, manage lifecycle actions, and connect external tools. To open it, click a cluster name in the list.

The page title displays Compute: {name}, and the breadcrumb shows Compute > {name}.

The header includes state-aware action buttons (see Managing a Compute Cluster) and two monitoring links:

  • Spark Metrics UI. Opens the Spark metrics dashboard. Always available.
  • Spark UI. Opens the live Spark web interface. Enabled only when the driver state is ACTIVE.

If another user deletes the cluster while you are viewing the page, a yellow banner appears stating: This compute has been deleted.

Details Tab

The Details tab shows the current state and configuration of the cluster.

Compute Section

  • Displays identity fields: ID and Name.
  • Shows the Driver state badge.
  • When the state is FAILED, a tooltip explains the failure reason.
  • Lists resource settings, including driver and executor node types, executor counts, Volume, and Auto scaling timeout.
  • For single-node clusters, executor-related fields are hidden.

Metadata Section

  • Namespace
  • Created by user and timestamp
  • Tags
  • Description
Compute cluster detail -- Details tab | IOMETE

Connections Tab

The Connections tab has ready-to-use snippets for connecting BI tools and applications. Click a connection type card to reveal its configuration.

Available types: Python, JDBC, DBT, Tableau, Power BI, Superset, Metabase, Redash, and Spark Connect. If the Arrow Flight module is enabled, Arrow Flight also appears.

The SQLAlchemy connection URL follows this format:

iomete://{userId}:{accessToken}@{host}{port}/{db}?lakehouse={lakehouseName}

The HTTP path depends on whether a namespace is configured:

  • With namespace: data-plane/{namespace}/lakehouse/{name}
  • Without namespace: lakehouse/{name}
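
The URL format and path rules above can be sketched as a small helper. This is illustrative only: the function names are invented here, and a colon is assumed between host and port.

```python
def http_path(name, namespace=None):
    """Build the HTTP path; the namespace segment appears only when one is set."""
    if namespace:
        return f"data-plane/{namespace}/lakehouse/{name}"
    return f"lakehouse/{name}"

def sqlalchemy_url(user_id, access_token, host, port, db, lakehouse):
    """Format the SQLAlchemy connection URL shown above.

    Assumes host and port are joined with a colon.
    """
    return f"iomete://{user_id}:{access_token}@{host}:{port}/{db}?lakehouse={lakehouse}"

print(http_path("reporting", namespace="analytics"))
# data-plane/analytics/lakehouse/reporting
print(http_path("reporting"))
# lakehouse/reporting
```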
Compute cluster detail -- Connections tab | IOMETE

Logs Tab

The Logs tab streams Spark driver logs in real time. Use the time range selector to narrow the window, and click Download to save them as spark-driver-logs.txt.

When per-executor logging is enabled, an Instance dropdown appears above the log viewer so you can inspect individual pod logs.

Compute cluster detail -- Logs tab | IOMETE

Kubernetes Events Tab

The Kubernetes events tab lists events for the cluster's pods and highlights warnings. The tab badge shows warning and total counts (for example, 2 / 15) so you can spot problems at a glance. Kubernetes retains events for one hour by default.

Compute cluster detail -- Kubernetes events tab | IOMETE

Activity Tab

The Activity tab logs every start and terminate event for the cluster. Each row shows the Action, Time, and User who triggered it. Results paginate at 20 rows per page.

Compute cluster detail -- Activity tab | IOMETE

Configuration Tab

The Configuration tab lists every active Spark key-value pair (both your custom values and IOMETE system defaults) in read-only form.

Compute cluster detail -- Configuration tab | IOMETE

Cluster States

Understanding cluster states helps you predict billing, diagnose failures, and pick the right action.

Driver States

The driver moves through four states during its lifecycle. The page refreshes automatically when the state changes.

  • STARTING: Driver pod is booting. Takes 1 to 2 minutes, or 10 to 15 seconds with a hot pool. Available actions: Terminate, Restart. Not billed.
  • ACTIVE: Driver is running and accepting connections. The Spark UI link becomes active. Available actions: Terminate, Restart. Billed.
  • STOPPED: Driver is offline. No connections accepted. Available actions: Start, Configure. Not billed.
  • FAILED: Driver crashed or didn't start. Check the Details tab for the error. Available actions: Terminate, Restart, Configure. Not billed.

A newly created cluster enters STARTING automatically. It moves to ACTIVE once the driver is ready, or to FAILED if a deployment error occurs.
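
The states above can be captured as a small lookup, useful when scripting against the lifecycle. This is a sketch: the state and action names are taken from this page, not from a published API.

```python
# Available actions and billing per driver state, per the table above.
DRIVER_STATES = {
    "STARTING": {"actions": {"Terminate", "Restart"}, "billed": False},
    "ACTIVE":   {"actions": {"Terminate", "Restart"}, "billed": True},
    "STOPPED":  {"actions": {"Start", "Configure"}, "billed": False},
    "FAILED":   {"actions": {"Terminate", "Restart", "Configure"}, "billed": False},
}

def can(action: str, state: str) -> bool:
    """Check whether an action is available in the given driver state."""
    return action in DRIVER_STATES[state]["actions"]

print(can("Start", "STOPPED"))     # True
print(can("Configure", "ACTIVE"))  # False (terminate first)
```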

Executor States (Multi-Node Only)

Both the Details tab and the Executor column in the cluster list show executor state. IOMETE hides this for single-node clusters.

  • No running executors: All executors scaled to zero (auto-suspend kicked in). They scale up when a query arrives.
  • Running N/M: N executors are active out of M configured.
  • Scaling N/M: N executors are pending, waiting for Kubernetes resources.
  • Running N/M + Scaling P/M: A mix of active and pending executors. Load is increasing.

IOMETE only bills for executors in the Running state. Executors scaled to zero don't incur compute charges.
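
The display strings above follow a simple rule. A hedged sketch of how they might be derived from executor counts (illustrative only; not IOMETE's actual rendering code):

```python
def executor_display(running: int, scaling: int, configured: int) -> str:
    """Render the executor state display used in the cluster list."""
    if running == 0 and scaling == 0:
        return "No running executors"
    parts = []
    if running:
        parts.append(f"Running {running}/{configured}")
    if scaling:
        parts.append(f"Scaling {scaling}/{configured}")
    return " + ".join(parts)

print(executor_display(2, 0, 4))  # Running 2/4
print(executor_display(2, 1, 4))  # Running 2/4 + Scaling 1/4
print(executor_display(0, 0, 4))  # No running executors
```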

Managing a Compute Cluster

Once a cluster exists, you control its lifecycle from two places: the detail page header and the ellipsis menu on each list row.

Cluster detail page header showing Restart, Terminate, and Configure action buttons | IOMETE

Configuring a Cluster

To reconfigure a cluster, the driver must be STOPPED or FAILED. While the cluster is running, the Configure button shows a tooltip asking you to terminate first.

Configure button disabled with tooltip: Terminate the compute before configuring | IOMETE

Configure opens the same six-tab form used during creation, pre-populated with current values. Name and Bundle are read-only in edit mode. After you make changes, review them on the Review & Save tab and click Save.

After saving, the cluster may show an amber Restart required label in the list. Starting or restarting the cluster applies the new settings and clears the label.

Starting a Cluster

Click Start in the header or the list row ellipsis menu. There's no confirmation dialog. The driver moves from STOPPED to STARTING, then to ACTIVE. Start is only enabled when the driver is STOPPED with no other operation pending.

Restarting a Cluster

Click Restart in the header or the list row ellipsis menu, then confirm with Yes, restart it. The driver cycles through STOPPED, STARTING, and ACTIVE. This action is available when the driver is ACTIVE, STARTING, or FAILED.

Restart confirmation popover with Cancel and Yes, restart it buttons | IOMETE
Restart Is Not Atomic

If the start phase fails after a successful terminate, the cluster stays STOPPED. Check the Logs and Kubernetes events tabs to diagnose the failure.

Terminating a Cluster

Click Terminate in the header or the list row ellipsis menu, then confirm with Yes, terminate it. The driver transitions to STOPPED, all active connections drop, and executor state clears. Available when the driver is ACTIVE, STARTING, or FAILED.

Unlike restart, termination leaves the cluster stopped. Start it again manually when ready.

Terminate confirmation popover with Cancel and Yes, terminate it buttons | IOMETE

Deleting a Cluster

Deletion permanently removes the cluster and its configuration. Data in cloud object storage is not affected.

  1. Select Delete from the detail page or list row ellipsis menu.
  2. In the confirmation modal, type the exact cluster name.
  3. Click Delete. The button stays disabled until the typed name matches.
Deletion Is Permanent

You cannot undo this action. IOMETE permanently removes the cluster configuration and drops any active connections.

Delete cluster confirmation modal | IOMETE

Access Permissions

Permissions are granted to users or groups and enforced at two levels:

  • Domain level
    The Create Compute permission allows a user to create new clusters. Administrators assign this either directly through member permissions or indirectly through a domain bundle. See Domain Authorization for configuration details.

  • Resource level
    Per-cluster permissions (VIEW, EXECUTE, CONSUME, UPDATE, DELETE) are inherited from the cluster’s assigned resource bundle. The CONSUME permission allows a user to submit queries against the cluster. The cluster list displays only clusters where you have at least VIEW permission. See Resource Bundles to manage bundle-based access control.

Explore these guides for features referenced on this page.

  • Node Types: Create and manage node types for driver and executor pods.
  • Volumes: Attach persistent volumes via the Volume field on the General tab.
  • Secrets: Reference secret values in environment variables and Spark configuration.
  • Private Docker Registry: Register Docker registries so their images appear in the Docker settings tab.
  • Domain Authorization: Manage domain-level permissions for users and groups.
  • Resource Bundles: Control per-resource access through bundle permissions.