
Data Compaction Job


Over time, Iceberg tables can slow down and need data compaction to clean them up. IOMETE provides a built-in job that runs data compaction for each table. This job triggers the following Iceberg processes:

  1. Expire Snapshots - See Maintenance - Expire Snapshots
  2. Delete Orphan Files - See Maintenance - Delete Orphan Files
  3. Rewrite Data Files - See Maintenance - Rewrite Data Files
  4. Rewrite Manifests - See Maintenance
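For orientation, the four processes above correspond to Apache Iceberg's Spark stored procedures. The following is a minimal sketch of how they could be called by hand for one table; it is not necessarily how the IOMETE job is implemented. The catalog name `spark_catalog`, the table name `analytics.events`, and both helper functions are assumptions made for illustration.

```python
# Iceberg Spark procedures, in the same order as the list above.
MAINTENANCE_PROCEDURES = [
    "expire_snapshots",
    "remove_orphan_files",
    "rewrite_data_files",
    "rewrite_manifests",
]


def build_maintenance_statements(table, catalog="spark_catalog"):
    """Return one CALL statement per maintenance procedure for `table`.

    `catalog` is an assumed catalog name; replace it with your own.
    """
    return [
        f"CALL {catalog}.system.{procedure}(table => '{table}')"
        for procedure in MAINTENANCE_PROCEDURES
    ]


def run_maintenance(spark, table, catalog="spark_catalog"):
    """Execute the statements on an existing SparkSession passed in as `spark`."""
    for statement in build_maintenance_statements(table, catalog):
        spark.sql(statement)


if __name__ == "__main__":
    # Hypothetical table name, printed only to show the generated SQL.
    for statement in build_maintenance_statements("analytics.events"):
        print(statement)
```

Building the statements separately from running them makes it easy to review the SQL before executing it against a real table.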

To enable the data compaction Spark job, follow these steps:

  1. In the left sidebar menu, choose Spark Jobs
  2. Create a new job
  3. Fill in the form with the values below:
| Field Name | Value |
| --- | --- |
| Schedule (this example runs the job every Sunday at 12:00; feel free to change the value) | 0 12 * * SUN |
| Docker Image | iomete/iomete_data_compaction:0.2.0 |
| Main application file | local:///app/driver.py |
| Main class | Leave empty |
| Instance: Size (ICU) (feel free to increase) | 2 |

See example screenshot below

[Screenshot: data compaction job]
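The Schedule value is a cron expression. As a rough aid for adjusting it, the sketch below splits the example value into its fields, assuming the standard five-field cron layout (minute, hour, day of month, month, day of week). The helper `describe_cron` is hypothetical and not part of IOMETE.

```python
# Field order of a standard five-field cron expression (assumed here).
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(expression):
    """Map each part of a five-field cron expression to its field name."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(
            f"expected {len(CRON_FIELDS)} fields, got {len(parts)}"
        )
    return dict(zip(CRON_FIELDS, parts))


schedule = describe_cron("0 12 * * SUN")
print(schedule)
# {'minute': '0', 'hour': '12', 'day_of_month': '*', 'month': '*', 'day_of_week': 'SUN'}
```

Read this way, `0 12 * * SUN` means minute 0 of hour 12 on every Sunday, which matches the "every Sunday at 12:00" description in the form.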


GitHub

We've created an initial data compaction job that should be enough in most cases. Feel free to fork it and build a new data compaction image based on your company's requirements. View on GitHub