Z-ORDER sorting during compaction

Aytan Jalilova

Apache Iceberg supports Z-ORDER sorting during compaction (the rewrite_data_files procedure), but not during normal inserts or as a CREATE TABLE configuration.

To force the whole dataset into Z-ORDER, follow these steps:

  1. Set a default WRITE ORDERED BY for the table.

    ALTER TABLE db.table_name WRITE ORDERED BY (col1, col2);

  2. Run rewrite_data_files with the sort strategy, a zorder sort order, and the rewrite-all option set to true.

    CALL spark_catalog.system.rewrite_data_files(
    table => 'db.table_name',
    strategy => 'sort',
    sort_order => 'zorder(col1, col2)',
    options => map('rewrite-all', 'true')
    );
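
To confirm that the compaction was committed, one option is to look at the table's snapshots metadata table, where a rewrite normally appears as a snapshot with the replace operation. This is only a sketch: it assumes the same db.table_name as above and a catalog in which metadata tables can be queried by appending .snapshots to the table name.

    -- Show the five most recent snapshots; a compaction should appear with operation = 'replace'.
    SELECT committed_at, snapshot_id, operation
    FROM db.table_name.snapshots
    ORDER BY committed_at DESC
    LIMIT 5;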

Additional notes

  • Rewriting the whole dataset can be a very expensive operation, so do it only when necessary.
  • There is an open issue on GitHub to add support for Z-ORDER sorting during normal inserts.
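
One way to keep the cost down, sketched here under stated assumptions, is to pass the procedure's where argument so that only files matching a filter are rewritten. The event_date column and the date literal are hypothetical and not part of the original example; substitute a filter that fits your table, ideally on a partition column.

    -- Z-ORDER only the files that may contain rows matching the filter (event_date is a hypothetical column).
    CALL spark_catalog.system.rewrite_data_files(
    table => 'db.table_name',
    strategy => 'sort',
    sort_order => 'zorder(col1, col2)',
    where => 'event_date >= "2024-01-01"',
    options => map('rewrite-all', 'true')
    );

With rewrite-all set to true, every file selected by the filter is rewritten, so the filter alone controls how much data the job touches.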