
Data Compaction Job


Over time, Iceberg tables can slow down and require data compaction to clean them up. IOMETE provides a built-in job that runs data compaction for each table. This job triggers the following Iceberg processes:

  1. Expire Snapshots - see Maintenance - Expire Snapshots
  2. Delete Orphan Files - see Maintenance - Delete Orphan Files
  3. Rewrite Data Files - see Maintenance - Rewrite Data Files
  4. Rewrite Manifests - see Maintenance - Rewrite Manifests
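Under the hood, these four maintenance steps correspond to Iceberg's built-in Spark SQL procedures (`expire_snapshots`, `remove_orphan_files`, `rewrite_data_files`, `rewrite_manifests`). A minimal sketch of how such a job might invoke them per table; the catalog name, table names, and the `retain_last` value are illustrative assumptions, not IOMETE defaults:

```python
def compaction_statements(catalog: str, table: str) -> list[str]:
    """Build the Spark SQL CALL statements for one Iceberg table.

    The retention setting below (retain_last => 5) is an example value,
    not an IOMETE default.
    """
    return [
        # 1. Expire snapshots outside the retention window
        f"CALL {catalog}.system.expire_snapshots(table => '{table}', retain_last => 5)",
        # 2. Delete files no longer referenced by any snapshot
        f"CALL {catalog}.system.remove_orphan_files(table => '{table}')",
        # 3. Compact small data files into larger ones
        f"CALL {catalog}.system.rewrite_data_files(table => '{table}')",
        # 4. Consolidate manifest files for faster query planning
        f"CALL {catalog}.system.rewrite_manifests(table => '{table}')",
    ]


def run_compaction(spark_sql, catalog: str, tables: list[str]) -> None:
    """Run all four maintenance procedures for each table.

    `spark_sql` is any SparkSession.sql-like callable, so the logic can
    be exercised without a live Spark cluster.
    """
    for table in tables:
        for stmt in compaction_statements(catalog, table):
            spark_sql(stmt)
```

In a real Spark job you would pass `spark.sql` (the active `SparkSession`) as the executor, e.g. `run_compaction(spark.sql, "spark_catalog", ["db.events"])`.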

To enable the data compaction Spark job, follow these steps:

  1. Navigate to the Job Templates, then click the Deploy button on the Data Compaction Job card.
  2. The job creation page opens with all inputs pre-filled.

Job Configurations

(Screenshot: Data compaction job configurations)

Instance

(Screenshot: Data compaction job instance)

GitHub

We've created an initial job for data compaction that should be sufficient in most cases. Feel free to fork it and build a custom data compaction image based on your company's requirements. View in GitHub
