Introduction

Welcome to IOMETE's documentation! 👋

Here you'll find comprehensive guides and documentation to get you up and running with IOMETE quickly and easily.

The IOMETE lakehouse platform

The platform

  • Modern lakehouse built on top of Apache Iceberg and Apache Spark.
  • Includes:
    • Lakehouse
    • Spark Jobs
    • SQL editor
    • Advanced data catalog
    • Jupyter Notebook

IOMETE's value proposition

IOMETE offers a comprehensive and versatile data lakehouse platform, merging the capabilities of data lakes and data warehouses into a single, integrated solution. The platform supports both cloud and on-premise deployments, providing a cloud-like data analytics experience in on-premise environments.

Platform features

IOMETE's blend of advanced features and flexible architecture makes it suitable for a wide range of data management, analytics, and visualization tasks. It effectively consolidates data lakes and warehouses, streamlines data preparation, and enables real-time analytics, machine learning, predictive modeling, data visualization, and time-series analytics.

Object Storage: MinIO provides durable, highly available object storage, configured for optimal performance with CPU, RAM, disk speed, and network specifications taken into account.
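Because MinIO exposes an S3-compatible API, any S3 client can read and write lakehouse objects directly. The following is a minimal sketch using boto3; the endpoint URL, credentials, and bucket name are placeholders, not values shipped with IOMETE.

```python
# Minimal sketch: writing and reading an object on a MinIO bucket through its
# S3-compatible API. Endpoint, credentials, and bucket are placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://minio.example.internal:9000",  # hypothetical MinIO endpoint
    aws_access_key_id="MINIO_ACCESS_KEY",
    aws_secret_access_key="MINIO_SECRET_KEY",
)

# Upload a small object and read it back.
s3.put_object(Bucket="lakehouse", Key="samples/hello.txt", Body=b"hello lakehouse")
obj = s3.get_object(Bucket="lakehouse", Key="samples/hello.txt")
print(obj["Body"].read().decode())
```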

Compute Cluster with Kubernetes: Kubernetes orchestrates IOMETE services, including Lakehouse and Spark job clusters. These clusters are elastically scalable, automatically adjusting based on load to maximize resource utilization.

Lakehouse Platform: Powered by Apache Spark and Apache Iceberg, it offers a robust SQL interface for data exploration and analytics. Features include ACID-compliant operations, data versioning, and time travel.
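To illustrate what ACID writes and time travel look like in practice, here is a minimal PySpark sketch using standard Iceberg SQL. It assumes a Spark session already configured with an Iceberg catalog (as in an IOMETE lakehouse); the catalog, table name, and snapshot id are illustrative only.

```python
# Minimal sketch of ACID writes and Iceberg time travel through Spark SQL.
# Table names and the snapshot id are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg-time-travel-demo").getOrCreate()

spark.sql("CREATE TABLE IF NOT EXISTS demo.orders (id BIGINT, amount DOUBLE) USING iceberg")
spark.sql("INSERT INTO demo.orders VALUES (1, 19.99), (2, 5.50)")

# Each committed write produces a snapshot that can be inspected later.
spark.sql("SELECT snapshot_id, committed_at FROM demo.orders.snapshots").show()

# Time travel: query the table as of a snapshot id or a timestamp.
spark.sql("SELECT * FROM demo.orders VERSION AS OF 1234567890123456789").show()    # placeholder snapshot id
spark.sql("SELECT * FROM demo.orders TIMESTAMP AS OF '2024-01-01 00:00:00'").show()
```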

Query Federation Engine: Allows querying data from multiple sources, including relational and NoSQL databases, and flat files, without complex ETL pipelines.
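As a generic Spark sketch of the federation idea (not IOMETE-specific syntax), the example below joins a lakehouse table with a table read directly from PostgreSQL over JDBC, with no ETL pipeline in between. The JDBC URL, credentials, and table names are placeholders.

```python
# Generic sketch: join an Iceberg lakehouse table with an external JDBC table.
# Connection details and table names are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("federation-demo").getOrCreate()

customers = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://pg.example.internal:5432/crm")
    .option("dbtable", "public.customers")
    .option("user", "reporting")
    .option("password", "REDACTED")
    .load()
)
customers.createOrReplaceTempView("crm_customers")

# Join the external table against a lakehouse table in a single query.
spark.sql("""
    SELECT c.name, SUM(o.amount) AS total_spent
    FROM demo.orders o
    JOIN crm_customers c ON c.id = o.id
    GROUP BY c.name
""").show()
```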

Spark Job Service: Facilitates running and monitoring Spark Jobs, including Spark Streaming for real-time data processing.
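The sketch below shows the kind of Structured Streaming job this service runs and monitors: it reads a built-in rate source, aggregates over time windows, and writes to the console. The source, sink, and trigger interval are illustrative only and carry no IOMETE-specific configuration.

```python
# Minimal Structured Streaming job: rate source -> windowed count -> console sink.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("streaming-job-demo").getOrCreate()

events = spark.readStream.format("rate").option("rowsPerSecond", 10).load()

counts = events.groupBy(F.window("timestamp", "30 seconds")).count()

query = (
    counts.writeStream.outputMode("complete")
    .format("console")
    .trigger(processingTime="30 seconds")
    .start()
)
query.awaitTermination()
```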

Notebook Service: Provides an environment for running ad-hoc queries with results streamed back from the IOMETE cluster.

Data Catalog: A centralized repository for dataset metadata, enhancing collaboration and resource sharing.

Central Data Access Control: Manages access control policies across all datasets, including selective masking of sensitive fields, and is built on Apache Ranger.

Built-in SQL Editor: Features syntax highlighting and auto-completion for writing SQL queries.

Integration with External Tools: Supports integration with tools like Apache Airflow and BI tools, enhancing extensibility and existing infrastructure compatibility.

BI Tool Integration: Works seamlessly with major BI tools like Tableau, PowerBI, and Looker for deep BI experiences.

DBT Plugin: Facilitates SQL-based data pipelines, enhancing data integration workflows.

Data Ingestion & ETL: Integrates with data ingestion tools like Airbyte and Singer for efficient data import.

Orchestration: Integrates with Apache Airflow for advanced orchestration capabilities.
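As a hedged sketch of what such orchestration can look like, the DAG below schedules a nightly Spark job using the stock SparkSubmitOperator from Airflow's Apache Spark provider (Airflow 2.4+ assumed). The connection id, DAG id, and script path are placeholders rather than IOMETE-specific settings.

```python
# Sketch of an Airflow DAG that submits a PySpark script on a daily schedule.
# dag_id, application path, and conn_id are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(
    dag_id="nightly_lakehouse_job",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    run_job = SparkSubmitOperator(
        task_id="run_spark_job",
        application="/opt/jobs/aggregate_orders.py",  # hypothetical script path
        conn_id="spark_default",
    )
```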

Data Catalog & Lineage: Offers robust data cataloging and integrates with platforms like Amundsen, Atlas, Talend, and Informatica for metadata and lineage collection.

Exceptional Performance: Engine-level optimizations and modern technologies deliver high performance, as evidenced by benchmark results showing significant advantages over competing platforms.

Security Features: Includes robust authentication, authorization, encryption, and advanced monitoring/logging for data protection.

Cloud-Native Architecture: Offers cloud-like elasticity, scalability, and ease of deployment, both in the cloud and on-premise.