Virtual Lakehouses
A virtual lakehouse is a cluster of compute resources, such as CPU and memory, that performs query processing. Table data files are stored in cloud object storage (S3), which acts as shared storage: multiple virtual lakehouse clusters can use the same data while keeping their compute isolated. IOMETE uses Apache Spark as the data lakehouse query engine, with ACID support.
In production environments, workloads often need to be isolated, for example, to keep batch ETL jobs from slowing down ad-hoc analytical queries. Because data is decoupled from the virtual lakehouse and shared across clusters, you can create multiple lakehouse clusters to isolate workloads and turn clusters on or off as needed to save costs. Cluster size can be chosen to match the requirements and workload.
Create a new Lakehouse
1. Go to Lakehouses and click the Create New button.

2. Give the new lakehouse a name under Name.

3. Under the Type section, choose a type.

The type defines the maximum number of executors/workers that Spark can scale up to. Read more about Spark executors here.
4. Under the Driver section, select a driver.

The Spark driver runs continuously until the lakehouse is stopped manually. The driver is responsible for managing executors/workers and connections. If it is stopped, no connections can be established to the lakehouse.
5. Under the Executor section, select an executor.

Executors are responsible for executing the queries. They are scaled up and down automatically based on the auto-scale parameter. Keep auto-scale on to minimize lakehouse costs.
6. Under the Auto scale section, set the auto-scale time.

Executors are scaled down after the specified period of inactivity and scaled up again automatically on demand (scale-up takes around 10-15 seconds). It is recommended to keep auto-scale on to minimize monthly costs.
Clicking the checkbox on the left side disables auto-scale.

7. Click the Create button after adding a description to the optional description field.

🎉🎉🎉 Tadaa! The details view of the newly created lakehouse (here, test-lakehouse) is shown.
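The auto-scale behavior described in step 6 can be illustrated with a small toy model. This is only a sketch for building intuition, not IOMETE's implementation: the class name `AutoScaleSketch`, its fields, and the assumption that all executors are released at once when the inactivity period elapses are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AutoScaleSketch:
    """Toy model of auto-scale (hypothetical, not IOMETE's actual logic)."""

    max_executors: int      # upper bound given by the lakehouse Type
    idle_timeout_s: float   # inactivity period configured under Auto scale
    executors: int = 0      # executors currently running
    idle_s: float = 0.0     # seconds elapsed since the last query

    def on_query(self, executors_needed: int) -> int:
        """A query arrives: scale up on demand, capped at the Type's maximum."""
        self.idle_s = 0.0
        wanted = min(executors_needed, self.max_executors)
        self.executors = max(self.executors, wanted)
        return self.executors

    def on_idle(self, seconds: float) -> int:
        """Time passes without queries: release executors after the timeout.

        Assumption: all executors are released at once; the driver is not
        modeled here because it keeps running until stopped manually.
        """
        self.idle_s += seconds
        if self.idle_s >= self.idle_timeout_s:
            self.executors = 0
        return self.executors


# Example: a Type that allows 4 executors and a 10-minute inactivity period.
lakehouse = AutoScaleSketch(max_executors=4, idle_timeout_s=600)
lakehouse.on_query(executors_needed=10)  # capped at the maximum of 4
lakehouse.on_idle(300)                   # timeout not reached yet
lakehouse.on_idle(300)                   # timeout reached, executors released
```

The model makes the trade-off visible: a shorter inactivity period releases executors sooner and lowers cost, while the next query after a scale-down pays the scale-up delay.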

- Navigation buttons:
  - Spark UI: opens the Spark jobs information.
  - Edit: opens the editing form.
  - Terminate / Start: stops or starts the lakehouse.
- General information: the lakehouse's general information.
- Lakehouse statuses: for more details about lakehouse statuses, click here.
- Connections details: shows the connection details for the lakehouse, for example Python, JDBC, and other connections.
- Audit logs: shows the lakehouse's start/stop logs.
- Delete: removes the lakehouse.
Lakehouse Statuses
To understand the lakehouse cluster's statuses, we first need to understand the cluster's components. A lakehouse cluster comprises a driver and executors.
- Driver: the gateway that accepts and keeps connections, plans executions, and orchestrates executors.
- Executors: the components that do the actual processing.
Statuses
A lakehouse cluster can be in one of the following statuses:
Status | Description | Driver | Executors | Accepting connections | Cost charging |
---|---|---|---|---|---|
Stopped | The cluster is completely turned off. | Not running | No executors running | No | No |
Pending | The cluster was started manually and is waiting for resources for the driver. | Not running, waiting for resources | No executors running | No | No |
Suspended | Occurs when auto-scale is enabled on the cluster. When the cluster has no workload, it scales down and turns off the executors to avoid charges, so only the driver is running. When the driver receives a query, it starts executors to handle the processing. | Running | No executors running | Yes | Only for the driver |
Scaling-up | Occurs when auto-scale is enabled on the cluster. The cluster scales up executors based on workload needs, up to the maximum cluster size. | Running | Zero or more already running, and new executors are being started | Yes | For the driver and already running executors |
Running | The cluster is running. | Running | Running | Yes | For the driver and running executors |
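For scripts that need to reason about these statuses, the table above can be mirrored as a small lookup structure. The following is a minimal sketch that only restates the table: the names `LakehouseStatus`, `StatusInfo`, `STATUS_TABLE`, and `can_connect` are hypothetical, and the status strings are taken from the table rather than from any IOMETE API.

```python
from dataclasses import dataclass
from enum import Enum


class LakehouseStatus(Enum):
    """Status names as they appear in the table (not an official API)."""

    STOPPED = "Stopped"
    PENDING = "Pending"
    SUSPENDED = "Suspended"
    SCALING_UP = "Scaling-up"
    RUNNING = "Running"


@dataclass(frozen=True)
class StatusInfo:
    driver_running: bool
    accepts_connections: bool
    charged_for: str  # what is billed while the cluster is in this status


# One entry per table row; the values restate the table columns.
STATUS_TABLE = {
    LakehouseStatus.STOPPED: StatusInfo(False, False, "nothing"),
    LakehouseStatus.PENDING: StatusInfo(False, False, "nothing"),
    LakehouseStatus.SUSPENDED: StatusInfo(True, True, "driver only"),
    LakehouseStatus.SCALING_UP: StatusInfo(
        True, True, "driver and already running executors"
    ),
    LakehouseStatus.RUNNING: StatusInfo(
        True, True, "driver and running executors"
    ),
}


def can_connect(status_name: str) -> bool:
    """Return True if a cluster in this status accepts connections."""
    return STATUS_TABLE[LakehouseStatus(status_name)].accepts_connections


# Example: a suspended cluster still accepts connections, a pending one does not.
suspended_ok = can_connect("Suspended")
pending_ok = can_connect("Pending")
```

Note that connections are accepted exactly when the driver is running, which is why a suspended cluster with no executors can still take a query and then scale up.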