Data Compaction Job
Over time, Iceberg tables can slow down and require data compaction to clean them up. IOMETE provides a built-in job that runs data compaction for each table. This job triggers the following Iceberg processes:
- Expire Snapshots - See Maintenance - Expire Snapshots
- Delete Orphan Files - See Maintenance - Delete Orphan Files
- Rewrite Data Files - See Maintenance - Rewrite Data Files
- Rewrite Manifests - See Maintenance - Rewrite Manifests
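These four processes map to Iceberg's Spark SQL stored procedures. A minimal sketch of the statements such a job would pass to `spark.sql(...)` for each table — the catalog name `spark_catalog` and table `db.events` are hypothetical, and the actual IOMETE job may pass additional tuning parameters:

```python
def compaction_statements(catalog: str, table: str) -> list:
    """Build the Iceberg maintenance CALL statements for one table."""
    return [
        f"CALL {catalog}.system.expire_snapshots(table => '{table}')",
        f"CALL {catalog}.system.remove_orphan_files(table => '{table}')",
        f"CALL {catalog}.system.rewrite_data_files(table => '{table}')",
        f"CALL {catalog}.system.rewrite_manifests(table => '{table}')",
    ]

# Example: the statements the job would execute for one table,
# e.g. spark.sql(stmt) for each one.
for stmt in compaction_statements("spark_catalog", "db.events"):
    print(stmt)
```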
To enable the data compaction Spark job, follow these steps:
- In the left sidebar menu, choose Spark Jobs
- Click Create to add a new job
![IOMETE Spark Jobs | IOMETE](/resources/assets/images/job-lis-create-26d32822af362d81cf131993c9973985.png)
- Fill in the form with the following values:

Field name | Value |
---|---|
Schedule | `0 12 * * SUN` (runs every Sunday at 12:00; adjust as needed) |
Docker image | `iomete/iomete_data_compaction:0.2.0` |
Main application file | `local:///app/driver.py` |
Main class | Leave empty |
Instance: Size (ICU) | 2 (increase if needed) |
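The Schedule field uses the standard five-field cron format (minute, hour, day of month, month, day of week). A quick sketch of how the example value breaks down:

```python
# Standard five-field cron expression: minute hour day-of-month month day-of-week
schedule = "0 12 * * SUN"
minute, hour, day_of_month, month, day_of_week = schedule.split()

assert (minute, hour) == ("0", "12")        # at 12:00
assert (day_of_month, month) == ("*", "*")  # on any date, in any month
assert day_of_week == "SUN"                 # but only on Sundays
```

For example, `0 3 * * *` would instead run the job every day at 03:00.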
See the example screenshot below:
![Create data compaction job | IOMETE](/resources/assets/images/spark-job-data-compaction-deployment-43c8545ee46701a0fbd055aa2da70159.png)
Job instance
![Data compaction job instance | IOMETE](/resources/assets/images/spark-job-instance-c2efe1cfa9819764d843353ab4212908.png)
GitHub
We've created an initial data compaction job that should be sufficient in most cases. Feel free to fork it and build your own data compaction image based on your company's requirements. View in GitHub