Advanced Configuration

Each maintenance operation has a set of properties that control how it runs, like target file sizes for compaction, snapshot retention windows, and sort strategies. The defaults are tuned to work well in most cases, so you usually don’t need to change them. But if a table has specific needs, you can override these at the catalog level (for all tables) or just for an individual table (overrides the catalog default for that table only).

Open the Advanced Settings panel on any operation card to see available properties. Use the Add Property dropdown to override a value, or click the button next to a property to revert it to its inherited default.

[Screenshot: Operation card with Advanced Settings panel expanded, showing property input fields]

Validation errors display inline below each field. If an error is inside a collapsed Advanced Settings panel, the panel expands automatically and the page scrolls to the first invalid field.

Operations

Rewrite Data Files

As data arrives in small batches (through streaming, frequent appends, or small updates), tables accumulate many tiny data files. Each file adds query-planning overhead, and reading dozens of small files is far less efficient than reading a few larger files.

This operation combines small files into larger ones, targeting an optimal file size. The result is shorter query-planning time and better I/O throughput.

| Property | Type | Platform Default | Description |
| --- | --- | --- | --- |
| Strategy | String | binpack | Must be `binpack` or `sort`. Use `sort` to cluster data by sort order; requires a sort order defined on the table or set in Sort Order. |
| Sort Order | String | | Sort order for the `sort` strategy, e.g. `col1 DESC, col2 ASC`. Falls back to the table's default sort order if not set. |
| Where Clause | String | | Optional filter to restrict which files are compacted. |
| Target File Size Bytes | Long | 512 MB | Desired output file size after compaction. |
| Min File Size Bytes | Long | 128 MB | Files smaller than this are candidates for compaction. |
| Max File Size Bytes | Long | 1 GB | Files larger than this are excluded. |
| Min Input Files | Integer | 5 | Minimum number of files required to trigger compaction in a group. |
| Max Concurrent File Group Rewrites | Integer | 5 | Higher values increase parallelism but risk commit conflicts. |
| Delete File Threshold | Integer | 2,147,483,647 | Number of delete files that triggers compaction of a file group. |
| Delete Ratio Threshold | Double | 0.3 | Ratio of delete entries to data rows that triggers compaction. |
| Partial Progress Enabled | Boolean | false | Commits progress incrementally instead of all at once. Useful for very large tables. |
| Partial Progress Max Commits | Integer | 10 | Maximum number of incremental commits per run. |
| Partial Progress Max Failed Commits | Integer | 10 | Maximum number of failed commits tolerated before the run fails, when partial progress is enabled. |
| Max File Group Size Bytes | Long | 100 GB | Largest amount of data rewritten in a single file group. |
| Remove Dangling Deletes | Boolean | false | Remove delete files that no longer reference any data rows. |
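The same compaction can also be triggered ad hoc from the SQL Editor through Iceberg's Spark procedure. A minimal sketch, assuming a hypothetical catalog `my_catalog` and table `db.events`; the option values are illustrative, not recommendations:

```sql
-- Compact small files in db.events using the sort strategy.
-- Option keys mirror the maintenance properties above.
CALL my_catalog.system.rewrite_data_files(
  table      => 'db.events',
  strategy   => 'sort',
  sort_order => 'col1 DESC, col2 ASC',
  where      => 'id > 1000',            -- optional filter, as in Where Clause
  options    => map(
    'target-file-size-bytes', '536870912',  -- 512 MB
    'min-input-files', '5',
    'max-concurrent-file-group-rewrites', '5',
    'partial-progress.enabled', 'true'
  )
);
```

The procedure reports how many data files were added and rewritten, which is a quick way to confirm the scheduled settings behave as expected before committing to them.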

Rewrite Manifest Files

Each snapshot references data files through manifest files. Over time, the manifest count grows, and the query engine must read every one of them during planning. This operation consolidates manifests, reducing the metadata the query planner needs to scan.

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| Use Caching | Boolean | false | Enable Spark caching during the operation. This can increase executor memory usage. |
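For reference, the equivalent Iceberg Spark procedure can be run manually from the SQL Editor. Catalog and table names below are placeholders:

```sql
-- Consolidate manifest files for db.events.
CALL my_catalog.system.rewrite_manifests(
  table       => 'db.events',
  use_caching => false   -- corresponds to the Use Caching property above
);
```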

Expire Snapshots

Every write, update, or delete creates a new Iceberg snapshot. Over time, hundreds of snapshots pile up, each retaining references to old data files. This bloats metadata and prevents old data files from being garbage collected.

Expiring snapshots removes those beyond a retention window, freeing the referenced data files for cleanup.

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| Older Than | Duration | 5 days | Snapshots older than this value are eligible for removal. |
| Retain Last | Integer | 1 | Minimum number of snapshots to keep, regardless of age. |
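A one-off expiration can also be run through Iceberg's Spark procedure. A sketch with a hypothetical catalog and table; the cutoff timestamp is illustrative:

```sql
-- Expire snapshots older than the given instant, keeping at least one.
CALL my_catalog.system.expire_snapshots(
  table       => 'db.events',
  older_than  => TIMESTAMP '2024-01-01 00:00:00',  -- illustrative cutoff
  retain_last => 1
);
```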

Cleanup Orphan Files

Failed writes, aborted jobs, and certain table operations can leave files on storage that aren't referenced by any snapshot. These "orphan" files consume storage without serving any purpose. This operation scans the entire table storage location and removes unreferenced files.

Because this operation performs a full scan, it runs on its own cron schedule instead of triggering on every table change.

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| Older Than | Duration | 3 days | Remove orphaned files older than this duration. |
| Cron Schedule | Cron | `0 0 * * 7` (weekly, Sunday midnight) | Schedule for orphan cleanup; weekly or monthly is recommended. Requires 5-field UNIX cron syntax. |

Orphan cleanup has several built-in safety mechanisms:

  • Minimum retention period: the backend enforces a minimum retention period of 3 days. If the configured Older Than value is below this minimum, the run fails with a non-retryable error.
  • Orphan percentage threshold: the operation aborts if orphan files exceed 30% of total files, to guard against accidental mass deletion. When this happens, check for misconfiguration or data corruption first, then run remove_orphan_files manually via the SQL Editor.
  • Batched deletion: files are deleted in batches with a cooldown between each batch to avoid overwhelming storage.
  • Flink file exclusion: files matching an active Flink job's checkpoint pattern (flink.job-id.*) are automatically skipped, even if they appear unreferenced.
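When the percentage threshold aborts a run and a manual cleanup is warranted, Iceberg's procedure supports a dry run, which lists candidate files without deleting anything. A sketch with placeholder catalog and table names:

```sql
-- Inspect orphan candidates before deleting anything.
CALL my_catalog.system.remove_orphan_files(
  table      => 'db.events',
  older_than => TIMESTAMP '2024-01-01 00:00:00',  -- illustrative cutoff
  dry_run    => true
);
-- Re-run with dry_run => false (or omit it) once the candidate list looks sane.
```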

Execution Model

  • Rewrite Data Files and Rewrite Manifest Files run as Spark SQL jobs on your configured compute cluster.
  • Expire Snapshots and Cleanup Orphan Files run directly on the iom-maintenance service. No compute cluster is needed, but they consume service CPU and memory. If either operation becomes slow or causes service degradation, tune the service resources under Resource Defaults.

How Property Values Are Resolved

Every property value is resolved through a five-level precedence chain. The system works down the list and uses the first value it finds:

| Priority | Source | What it is |
| --- | --- | --- |
| 1 | Table maintenance config | Values you've explicitly set for this table in the IOMETE UI. Highest priority; always wins. |
| 2 | Iceberg table properties | Raw Iceberg properties set directly on the table (e.g., via `ALTER TABLE ... SET TBLPROPERTIES`). Only a few IOMETE properties are read from here (details below). |
| 3 | Catalog maintenance config | Catalog-level defaults you've configured in the IOMETE UI. Apply to all tables in the catalog unless overridden. |
| 4 | Iceberg catalog properties | Raw Iceberg properties set at the catalog level. Same as table properties, but catalog-scoped. |
| 5 | Platform defaults | Built-in IOMETE defaults. Used as the final fallback when nothing else is set. |

Why do Iceberg properties affect maintenance settings?

These are native Iceberg properties, not IOMETE maintenance settings. They're defined on the table or catalog and influence how maintenance operations run.

If a table already defines one of these properties, IOMETE uses it as the default instead of requiring the same value in the maintenance configuration. This means your existing table settings are respected automatically.

Only a few IOMETE settings map to Iceberg properties. If these exist on the table or catalog and are not overridden in the maintenance configuration, their values are used.

The maintenance settings that read from Iceberg properties:

| IOMETE Property | Operation | Iceberg Property |
| --- | --- | --- |
| Target File Size Bytes | Rewrite Data Files | `write.target-file-size-bytes` |
| Older Than | Expire Snapshots | `history.expire.max-snapshot-age-ms` |
| Retain Last | Expire Snapshots | `history.expire.min-snapshots-to-keep` |
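Because these Iceberg properties sit at level 2 of the precedence chain, setting them directly on a table changes the maintenance defaults without touching the IOMETE configuration. A sketch, with a hypothetical table name and illustrative values:

```sql
ALTER TABLE my_catalog.db.events SET TBLPROPERTIES (
  'write.target-file-size-bytes'        = '536870912',  -- 512 MB compaction target
  'history.expire.max-snapshot-age-ms'  = '432000000',  -- 5 days
  'history.expire.min-snapshots-to-keep' = '1'
);
```

Any value set explicitly in the table's maintenance configuration (level 1) still overrides these properties.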