Here you'll find comprehensive guides and documentation to get you up and running with IOMETE quickly and easily.
The IOMETE lakehouse platform
- Modern lakehouse built on top of Apache Iceberg and Apache Spark.
- Spark Jobs
- SQL editor
- Advanced data catalog
- Jupyter Notebook
IOMETE's extreme value proposition
IOMETE offers a comprehensive and versatile data lakehouse platform, merging the capabilities of data lakes and data warehouses into a single, integrated solution. The platform is suitable for both cloud and on-premise deployments, providing a cloud-like data analytics experience in on-premise environments.
IOMETE's blend of advanced features and flexible architecture makes it suitable for a wide range of data management, analytics, and visualization tasks. It effectively consolidates data lakes and warehouses, streamlines data preparation, and enables real-time analytics, machine learning, predictive modeling, data visualization, and time-series analytics.
Object Storage: MinIO provides durable, available object storage and is configured for optimal performance, taking into account CPU, RAM, disk speed, and network specifications.
Compute Cluster with Kubernetes: Kubernetes orchestrates IOMETE services, including Lakehouse and Spark job clusters. These clusters are elastically scalable, automatically adjusting based on load to maximize resource utilization.
Lakehouse Platform: Powered by Apache Spark and Apache Iceberg, it offers a robust SQL interface for data exploration and analytics. Features include ACID-compliant operations, data versioning, and time travel.
Query Federation Engine: Allows querying data from multiple sources, including relational and NoSQL databases, and flat files, without complex ETL pipelines.
Spark Job Service: Facilitates running and monitoring Spark Jobs, including Spark Streaming for real-time data processing.
Notebook Service: Provides an environment for running ad-hoc queries with results streamed back from the IOMETE cluster.
Data Catalog: A centralized repository for dataset metadata, enhancing collaboration and resource sharing.
Central Data Access Control: Manages access control policies across all datasets, including selective masking for sensitive fields. Built on Apache Ranger.
Built-in SQL Editor: Features syntax highlighting and auto-completion for writing SQL queries.
Integration with External Tools: Supports integration with tools like Apache Airflow and BI tools, improving extensibility and compatibility with existing infrastructure.
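The time travel feature listed under Lakehouse Platform can be illustrated with a small sketch. It assumes Spark 3.3 or later with Apache Iceberg, where the `TIMESTAMP AS OF` and `VERSION AS OF` clauses are available in Spark SQL. The helper name `time_travel_query` and the table name `analytics.orders` are hypothetical and not part of any IOMETE API; the function only builds the query text, which you would then run in the SQL editor or a Spark session.

```python
from datetime import datetime
from typing import Optional


def time_travel_query(table: str,
                      as_of: Optional[datetime] = None,
                      snapshot_id: Optional[int] = None) -> str:
    """Build a Spark SQL SELECT that reads an Iceberg table at a past state.

    Hypothetical helper for illustration. Pass only trusted table names:
    the identifier is interpolated into the SQL text as-is.
    """
    # Exactly one of the two selectors must be given.
    if (as_of is None) == (snapshot_id is None):
        raise ValueError("pass exactly one of as_of or snapshot_id")
    if as_of is not None:
        # Read the table as it was at a point in time.
        ts = as_of.strftime("%Y-%m-%d %H:%M:%S")
        return f"SELECT * FROM {table} TIMESTAMP AS OF '{ts}'"
    # Read the table at a specific Iceberg snapshot ID.
    return f"SELECT * FROM {table} VERSION AS OF {snapshot_id}"


if __name__ == "__main__":
    print(time_travel_query("analytics.orders", as_of=datetime(2024, 1, 15, 9, 30)))
    print(time_travel_query("analytics.orders", snapshot_id=123))
```

Selecting by timestamp suits questions like "what did this table look like yesterday?", while selecting by snapshot ID pins a query to one exact committed version of the data.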
BI Tool Integration: Works seamlessly with major BI tools like Tableau, Power BI, and Looker for deep BI experiences.
dbt Plugin: Facilitates SQL-based data pipelines, enhancing data integration workflows.
Data Ingestion & ETL: Integrates with data ingestion tools like Airbyte and Singer for efficient data import.
Orchestration: Integrates with Apache Airflow for advanced orchestration capabilities.
Data Catalog & Lineage: Offers robust data cataloging and integrates with platforms like Amundsen, Atlas, Talend, and Informatica for metadata and lineage collection.
Exceptional Performance: Enhanced with optimizations and modern technologies for high performance, with benchmark results showing significant advantages over competitors.
Security Features: Includes robust authentication, authorization, encryption, and advanced monitoring/logging for data protection.
Cloud-Native Architecture: Offers cloud-like elasticity, scalability, and ease of deployment, both in the cloud and on-premise.
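The selective masking mentioned under Central Data Access Control can be pictured with a short conceptual sketch. This is not Apache Ranger's API: Ranger expresses masking as policies rather than application code, and the function names, column names, and the "keep the last four characters" rule below are assumptions chosen purely for illustration.

```python
from typing import Any, Dict, Set


def mask_keep_last(value: str, visible: int = 4, mask_char: str = "x") -> str:
    """Hide all but the last `visible` characters of a value.

    Values no longer than `visible` are masked entirely, so short
    values are never revealed in full.
    """
    if visible <= 0 or len(value) <= visible:
        return mask_char * len(value)
    return mask_char * (len(value) - visible) + value[-visible:]


def apply_masking(row: Dict[str, Any],
                  masked_columns: Set[str],
                  user_is_privileged: bool) -> Dict[str, Any]:
    """Return a copy of `row` with sensitive columns masked for regular users."""
    if user_is_privileged:
        # Privileged users see the data unchanged.
        return dict(row)
    return {
        col: mask_keep_last(str(val)) if col in masked_columns else val
        for col, val in row.items()
    }


if __name__ == "__main__":
    row = {"name": "Ada", "card_number": "4111111111111111"}
    print(apply_masking(row, {"card_number"}, user_is_privileged=False))
    print(apply_masking(row, {"card_number"}, user_is_privileged=True))
```

The point of centralizing this logic in a policy layer is that every query path, whether the SQL editor, a notebook, or a BI tool, sees the same masked view without each client reimplementing the rule.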