Data Lakehouse Features
The best-performing SQL Data Lakehouse service, combining data warehouse functionality with data lake flexibility across all of your data. Run all SQL and BI applications at scale with up to 10x better price performance.
- BI - business intelligence workloads that require handling a high volume of concurrent requests
- Exploratory SQL
- SQL ETL/ELT (for example, using DBT or custom backend applications)
- Data science and ML - Prepare (clean/enrich/transform) training data, build feature stores, etc.
- Collect and build a centralized data lake for your whole organization
Separation of compute and storage brings greater flexibility and cost savings to organizations planning to monetize their data using big data and advanced analytics
Leveraging modern, battle-tested open-source engines:
- Apache Spark
- Apache Iceberg (Storage Format)
- Ensuring the highest data reliability and integrity through ACID transaction support
- Ensuring the highest data reliability and integrity through schema enforcement and governance
- Multi-Cluster Lakehouse for workload isolation
- Enjoy blazing-fast performance - query petabytes of data in seconds
- Enjoy the benefit of unlimited scaling backed by AWS compute and storage capacity
- One source of truth - keep all your structured and unstructured data in one place
AWS S3 provides outstanding durability and unlimited scalability for the data. Your data is stored in your AWS S3 bucket in the open standard Apache Parquet format. Data is compressed by 5-20x, which translates into equivalent monetary savings.
Full ANSI SQL compatibility.
Run any ANSI SQL-compatible code on IOMETE without modification.
Write custom functions in Scala/Java and use them in SQL just like built-in functions (see the sketch below).
- UDFs: User-Defined Functions (UDFs) are user-programmable routines that act on one row.
- UDAFs: User-Defined Aggregate Functions (UDAFs) are user-programmable routines that act on multiple rows at once and return a single aggregated value as a result.
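As an illustration, here is a minimal sketch of how such functions are typically defined with Spark's Scala API; the function, column, and table names are illustrative, not part of IOMETE's API.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{udf, udaf}
import org.apache.spark.sql.expressions.Aggregator
import org.apache.spark.sql.{Encoder, Encoders}

val spark = SparkSession.builder().getOrCreate()

// UDF: operates on one row at a time
val withTax = udf((amount: Double) => amount * 1.18)
spark.udf.register("with_tax", withTax)

// UDAF: aggregates many rows into a single value (geometric mean here)
object GeoMean extends Aggregator[Double, (Double, Long), Double] {
  def zero: (Double, Long) = (1.0, 0L)
  def reduce(b: (Double, Long), a: Double): (Double, Long) = (b._1 * a, b._2 + 1)
  def merge(b1: (Double, Long), b2: (Double, Long)): (Double, Long) = (b1._1 * b2._1, b1._2 + b2._2)
  def finish(b: (Double, Long)): Double = math.pow(b._1, 1.0 / b._2)
  def bufferEncoder: Encoder[(Double, Long)] = Encoders.tuple(Encoders.scalaDouble, Encoders.scalaLong)
  def outputEncoder: Encoder[Double] = Encoders.scalaDouble
}
spark.udf.register("geo_mean", udaf(GeoMean))

// Both can now be used in SQL like built-in functions
spark.sql("SELECT customer_id, geo_mean(with_tax(amount)) FROM sales GROUP BY customer_id").show()
```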
The IOMETE Lakehouse works on data stored in cheap and scalable cloud storage provided by the three major cloud vendors.
- This means IOMETE can handle all types of data (structured, semi-structured, and unstructured).
- It can also handle everything from AI to BI.
The future is open. Vendor lock-in and proprietary data formats slow down innovation. A single company cannot out-innovate a global community of innovators. Even the most regulated industries realize that open source is the best way to foster innovation, recruit and retain the best talent, and future-proof a technology platform.
Open
Built on two well-known open-source technologies, Apache Spark and Apache Iceberg.
Your data stays in your own cloud accounts in an open format, using the open standard ORC/Parquet file formats. No proprietary file formats.
IOMETE is built on open standards to ensure your data is secure while being universally and easily accessible.
Apache Spark and Apache Iceberg are already adopted by world-leading companies such as Apple, Netflix, Alibaba, and Adobe.
Running multiple compute clusters on shared data allows you to isolate your compute workloads by team or use case without duplicating data.
Use separate clusters for your BI and ETL workloads, so those workloads will not affect each other's performance while still accessing the same shared data.
Define dedicated clusters for each team (sales, marketing, engineering, etc.) to give each its own compute budget, so teams can do their work without affecting other teams' compute resources. As your organization grows, the system can scale horizontally.
You can easily see the history of your previously run queries in the SQL editor and profile the query plan using an intuitive UI.
Query history also keeps the result of each query for 30 days, so you can compare today's result with the result you got when you ran the same query yesterday.
Each lakehouse cluster has a defined size, which indicates the number of executors in the cluster. Executors are the compute nodes where the actual execution happens. The higher the cluster size (executor count), the greater the cluster's total performance.
Lakehouse clusters support the following sizes:
- XSmall (1 executor)
- Small (2 executors)
- Medium (4 executors)
- Large (8 executors)
- XLarge (16 executors)
- Gold (32 executors)
- Platinum (64 executors)
- Diamond (128 executors)
IOMETE Time Travel enables accessing historical data (i.e. data that has been changed or deleted) at any point, with no time limitation. Think of it as your magical undo button; it serves as a powerful tool for performing the following tasks:
- Restoring data-related objects (tables, schemas, and databases) that might have been accidentally or intentionally deleted.
- Duplicating and backing up data from key points in the past.
- Analyzing data usage/manipulation over specified periods of time.
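For example, with Apache Iceberg tables a past state can be queried directly. The sketch below assumes a Spark session; the table name, timestamp, and snapshot id are illustrative.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// Query the table as it looked at a specific point in time
spark.sql("SELECT * FROM sales.orders TIMESTAMP AS OF '2023-06-01 00:00:00'").show()

// Or pin the query to a specific Iceberg snapshot id
spark.sql("SELECT * FROM sales.orders VERSION AS OF 4348512839410423108").show()

// Restore an accidentally deleted or overwritten state into a new table
spark.sql(
  """CREATE TABLE sales.orders_restored AS
    |SELECT * FROM sales.orders TIMESTAMP AS OF '2023-06-01 00:00:00'
  """.stripMargin)
```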
Exceptional data infrastructure, set up in minutes
Read structured/semi-structured files from any location without moving data to IOMETE.
Key benefits
- ETL-less access to your data, wherever it lives!
- Analyze data without moving it into IOMETE.
- Automatically infer the schema on table creation. No need to manually declare all columns and types.
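A minimal sketch of what this looks like with Spark's Scala API, assuming illustrative S3 paths; the schema is inferred automatically from the files.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// Read Parquet files in place; the schema is inferred from the file metadata
val events = spark.read.parquet("s3a://my-raw-bucket/events/2023/")

// Semi-structured JSON works the same way; columns and types are inferred
val clicks = spark.read.json("s3a://my-raw-bucket/clicks/")

// Query the external data with SQL without copying it into the lakehouse
events.createOrReplaceTempView("raw_events")
spark.sql("SELECT event_type, count(*) FROM raw_events GROUP BY event_type").show()
```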
Read data from other databases using JDBC within IOMETE.
Supported databases:
- MySQL
- PostgreSQL
- Oracle
- Microsoft SQL Server
- ETL-less access to your data, wherever it lives!
- Analyze data without moving it into IOMETE.
- Automatically infer the schema on table creation. No need to manually declare all columns and types.
- Fast reads, as IOMETE reads the data in a distributed manner.
- Bi-directional connection: read from and write to the source database.
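As an illustration, here is a sketch of a distributed JDBC read and write-back using Spark's Scala API; the host, credentials, and table names are placeholders.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// Read a PostgreSQL table in parallel without copying it into the lakehouse first
val orders = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://pg-host:5432/shop")   // placeholder connection
  .option("dbtable", "public.orders")
  .option("user", "reader")
  .option("password", sys.env("PG_PASSWORD"))
  .option("numPartitions", "8")                            // distributed, parallel read
  .option("partitionColumn", "id")
  .option("lowerBound", "1")
  .option("upperBound", "1000000")
  .load()

// Bi-directional: write aggregated results back to the source database
orders.groupBy("customer_id").count()
  .write.format("jdbc")
  .option("url", "jdbc:postgresql://pg-host:5432/shop")
  .option("dbtable", "public.order_counts")
  .option("user", "writer")
  .option("password", sys.env("PG_PASSWORD"))
  .mode("append")
  .save()
```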
Read data from NoSQL databases within IOMETE.
Supported databases:
- MongoDB
- Cassandra
- AWS DocumentDB (MongoDB Compatible)
- ElasticSearch
- ETL-less access to your data, wherever it lives!
- Analyze data without moving it into IOMETE.
- Automatically infer the schema on table creation. No need to manually declare all columns and types.
- Fast reads, as IOMETE reads the data in a distributed manner.
- Bi-directional connection: read from and write to the source database.
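A sketch of a MongoDB read and write-back, assuming the MongoDB Spark connector (v10+) is available; the connection URI, database, and collection names are placeholders.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// Read a MongoDB collection; the connector infers the schema by sampling documents
val users = spark.read
  .format("mongodb")
  .option("connection.uri", "mongodb://mongo-host:27017")   // placeholder connection
  .option("database", "crm")
  .option("collection", "users")
  .load()

users.createOrReplaceTempView("mongo_users")
spark.sql("SELECT country, count(*) FROM mongo_users GROUP BY country").show()

// Bi-directional: write filtered results back to another collection
users.filter("active = true")
  .write.format("mongodb")
  .option("connection.uri", "mongodb://mongo-host:27017")
  .option("database", "crm")
  .option("collection", "active_users")
  .mode("append")
  .save()
```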
Using the provided JDBC, ODBC, and Python drivers, you can connect from your backend applications to the Lakehouse Clusters. The experience is the same as with other operational database connections like MySQL.
Supported languages
- Java
- Python
- Scala
- Kotlin
- Node.js
- Ruby
- Go
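For example, a Scala or Java backend can use the standard java.sql API with the provided JDBC driver. The connection URL, credentials, and table name below are placeholders, not IOMETE's actual endpoint format; copy the real connection string from your IOMETE console.

```scala
import java.sql.DriverManager

object LakehouseQuery extends App {
  // Placeholder URL and credentials - replace with the connection string from the IOMETE console
  val url  = "jdbc:<driver>://<lakehouse-endpoint>:<port>/<database>"
  val conn = DriverManager.getConnection(url, "analytics_user", sys.env("LAKEHOUSE_PASSWORD"))

  try {
    val stmt = conn.createStatement()
    val rs   = stmt.executeQuery("SELECT order_date, sum(amount) FROM sales.orders GROUP BY order_date")
    while (rs.next()) {
      println(s"${rs.getString(1)} -> ${rs.getDouble(2)}")
    }
  } finally {
    conn.close()
  }
}
```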
IOMETE provides a dedicated DBT adapter that offers native integration with Apache Iceberg and the whole IOMETE ecosystem.
IOMETE provides integrations with all major BI platforms.
Currently supported BI platforms:
- Metabase
- Tableau
- Looker
- Power BI
- Apache Superset
Connect from notebook instances to IOMETE Lakehouse clusters using the JDBC/Python drivers.
The maximum number of executors that can be running at the same time across all jobs.
A worksheet is a document that allows you to save, search, and share SQL statements.
With our built-in query editor, you can easily query large data sets from an intuitive interface. With auto-complete and syntax highlighting, writing SQL couldn't get any easier.
- Organize data by creating data labels and tags at the column, row, or cell level
- Easily find data with advanced discovery and search functionality
For example, in the IOMETE web interface, a user connects by clicking the IdP option on the login page:
- If they have already been authenticated by the IdP, they are immediately granted access to IOMETE.
- If they have not yet been authenticated by the IdP, they are taken to the IdP interface where they authenticate, after which they are granted access to IOMETE.