Apache Iceberg
This page introduces Apache Iceberg, its main benefits, and how IOMETE uses this modern open-source table format.
Note: A more extensive version of the information below is available on the Apache Iceberg project website.
What is Apache Iceberg?
Iceberg is a high-performance open table format for huge analytic datasets. Iceberg brings the reliability and simplicity of SQL tables to big data, while making it possible for engines like Spark, Trino, Flink, Presto, Hive and Impala to safely work with the same tables.
Open Standard
Iceberg has been designed and developed to be an open community standard to ensure compatibility across languages and implementations. Apache Iceberg is open source, and is developed at the Apache Software Foundation.
IOMETE and Apache Iceberg
IOMETE is a fully managed (ready-to-use, batteries-included) data platform. At its core is a serverless lakehouse that uses Apache Iceberg as its table format. IOMETE optimizes clustering, compaction, and access control for Iceberg tables.
Apache Iceberg Benefits
Iceberg avoids unpleasant surprises. Schema evolution works and won’t inadvertently un-delete data. Users don’t need to know about partitioning to get fast queries.
- Schema evolution supports adding, dropping, updating, and renaming columns, with no side effects.
- Hidden partitioning prevents user mistakes that cause silently incorrect results or extremely slow queries.
- Partition layout evolution can update the layout of a table as data volume or query patterns change.
- Time travel enables reproducible queries that use exactly the same table snapshot, or lets users easily examine changes.
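To illustrate why schema changes have no side effects, here is a minimal Python sketch of the underlying idea: Iceberg tracks each column by a unique ID rather than by name. This is a toy model and not Iceberg's API; the `Schema` class and `read_row` helper are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Schema:
    """Toy schema (hypothetical) mapping column names to stable field IDs."""
    columns: dict = field(default_factory=dict)  # column name -> field ID
    last_id: int = 0

    def add(self, name: str) -> None:
        self.last_id += 1  # IDs only grow, so an ID is never reused
        self.columns[name] = self.last_id

    def drop(self, name: str) -> None:
        del self.columns[name]

    def rename(self, old: str, new: str) -> None:
        self.columns[new] = self.columns.pop(old)  # same ID, new name


def read_row(schema: Schema, stored: dict) -> dict:
    """Project a stored row (keyed by field ID) through the current schema."""
    return {name: stored.get(fid) for name, fid in schema.columns.items()}


schema = Schema()
schema.add("id")     # field ID 1
schema.add("email")  # field ID 2
old_row = {1: 42, 2: "a@example.com"}  # data written under the original schema

# A rename keeps the field ID, so existing data is still readable.
schema.rename("id", "user_id")
renamed_view = read_row(schema, old_row)

# Dropping and re-adding "email" assigns field ID 3, so the old values
# stored under ID 2 are not resurrected ("un-deleted").
schema.drop("email")
schema.add("email")
readded_view = read_row(schema, old_row)

print(renamed_view)
print(readded_view)
```

In this model, a column that is dropped and later re-added under the same name reads as empty for old rows, because the new column has a different ID from the one the old data was written with.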
Reliability and Performance
Iceberg was built for huge tables. It is used in production where a single table can contain tens of petabytes of data, and even tables of that size can be read without a distributed SQL engine.
- Scan planning is fast - A distributed SQL engine isn’t needed to read a table or find files.
- Advanced filtering - Data files are pruned with partition and column-level stats, using table metadata.
- Iceberg was designed to solve correctness problems in eventually-consistent cloud object stores.
- Works with any cloud store and reduces NameNode (NN) congestion on HDFS by avoiding file listing and renames.
- Multiple concurrent writers use optimistic concurrency and will retry to ensure that compatible updates succeed, even when writes conflict.
- Serializable isolation - table changes are atomic and readers never see partial or uncommitted changes.
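The optimistic concurrency described above can be sketched in Python as a compare-and-swap on the table's current version, with each writer retrying when another commit lands first. This is a simplified model under stated assumptions, not Iceberg's implementation: the `Catalog` class, the `append` function, and the file names are hypothetical, and a real commit also re-validates the change against the new table state before retrying.

```python
import threading


class Catalog:
    """Toy catalog (hypothetical) holding one table's current snapshot."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.version = 0
        self.files = ()  # immutable tuple of data file names

    def load(self):
        """Return the current (version, files) pair."""
        with self._lock:
            return self.version, self.files

    def compare_and_swap(self, expected_version, new_files) -> bool:
        """Atomically commit new_files only if nobody committed in between."""
        with self._lock:
            if self.version != expected_version:
                return False  # another writer won; caller must retry
            self.version += 1
            self.files = new_files
            return True


def append(catalog: Catalog, new_file: str, max_retries: int = 10) -> int:
    """Append one file, retrying on conflict. Returns the attempts used."""
    for attempt in range(1, max_retries + 1):
        version, files = catalog.load()  # re-read the latest state
        if catalog.compare_and_swap(version, files + (new_file,)):
            return attempt
    raise RuntimeError("commit failed after retries")


catalog = Catalog()
writers = [
    threading.Thread(target=append, args=(catalog, f"data-{i}.parquet"))
    for i in range(8)
]
for w in writers:
    w.start()
for w in writers:
    w.join()

final_version, final_files = catalog.load()
print(final_version, sorted(final_files))
```

Because each commit replaces the whole snapshot in one atomic step, a reader calling `load()` sees either the state before a commit or the state after it, never a partial write.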
Related Docs
- IOMETE Docs | Getting started with managed Apache Iceberg on IOMETE.
- IOMETE Docs | DDL for Iceberg Tables: Create, Alter, Manage Operations on IOMETE.
- IOMETE Docs | Iceberg Table Queries: Time Travel, Snapshots, and Metadata on IOMETE.
- IOMETE Docs | Iceberg tables INSERT, MERGE, and DELETE Operations on IOMETE.
- IOMETE Docs | Iceberg Spark Procedures for Snapshot and Metadata Management on IOMETE.
- IOMETE Docs | Time travel in Apache Iceberg with SQL examples on IOMETE.
- IOMETE Docs | Apache Iceberg Maintenance on IOMETE.
- IOMETE Docs | Download the IOMETE Apache Iceberg Cheat Sheet (PDF).
- IOMETE Blog | IOMETE is now listed as an Apache Iceberg vendor.
- IOMETE Blog | Apache Iceberg vs Delta Lake case study.