# File Streaming

Transfer files to Iceberg continuously.
## File formats

Tested file formats:

- CSV
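As an illustration, a CSV file like the one produced below could be dropped into the directory the job watches (the `path` setting in the config, e.g. `files/`). The column layout and values here are hypothetical; use whatever schema your data has:

```python
import csv

# Hypothetical address rows; the actual column layout depends on your data.
rows = [
    {"id": 1, "name": "Alice", "city": "Baku"},
    {"id": 2, "name": "Bob", "city": "Berlin"},
]

# Write a sample CSV file. In practice this file would be uploaded to the
# watched directory (the "path" setting in the job config).
with open("sample_addresses.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "name", "city"])
    writer.writeheader()
    writer.writerows(rows)
```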
## Job creation

- Go to **Spark Jobs**.
- Click on **Create New**.
Specify the following parameters (these are examples; change them based on your preference):

- Name: `file-streaming-job`
- Docker Image: `iomete/iomete_file_streaming_job:0.2.0`
- Main application file: `local:///app/driver.py`
- Environment Variables: `LOG_LEVEL`: `INFO` or `ERROR`
- Config file:
```hocon
{
  file: {
    format: csv,
    path: "files/",
    max_files_per_trigger: 1,
    latest_first: false,
    max_file_age: "7d"
  }
  database: {
    schema: default,
    table: awesome_csv_addresses
  }
  processing_time: {
    interval: 5
    unit: seconds # or: minutes
  }
}
```
## Configuration properties

| Property | Description |
|---|---|
| `file` | Required properties to connect to and configure the file source. |
| `database` | Destination database properties. |
| `processing_time` | Processing interval at which incoming data is persisted to Iceberg. |
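As a sketch of how these settings plausibly map onto Spark Structured Streaming's file-source options and processing-time trigger (`maxFilesPerTrigger`, `latestFirst`, and `maxFileAge` are standard Spark options; the mapping functions themselves are an assumption for illustration, not the job's actual driver code):

```python
# Assumption: a translation layer like this sits between the job config and
# Spark Structured Streaming. The real code in iomete_file_streaming_job
# may differ.

def reader_options(file_cfg: dict) -> dict:
    """Map the `file` block to Spark file-source option strings."""
    return {
        "maxFilesPerTrigger": str(file_cfg["max_files_per_trigger"]),
        "latestFirst": str(file_cfg["latest_first"]).lower(),
        "maxFileAge": file_cfg["max_file_age"],
    }

def trigger_interval(pt_cfg: dict) -> str:
    """Map the `processing_time` block to a trigger string like '5 seconds'."""
    unit = pt_cfg["unit"]
    if unit not in ("seconds", "minutes"):
        raise ValueError(f"unsupported unit: {unit}")
    return f"{pt_cfg['interval']} {unit}"

# Example usage with the sample config above:
opts = reader_options({
    "format": "csv",
    "path": "files/",
    "max_files_per_trigger": 1,
    "latest_first": False,
    "max_file_age": "7d",
})
trigger = trigger_interval({"interval": 5, "unit": "seconds"})
```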
*(Screenshot: Create Spark Job - Deployment)*

*(Screenshot: Create Spark Job - Instance)*
> **Note:** You can use Environment Variables to store sensitive data such as passwords and secrets, then reference those variables in your config file using the `${ENV_NAME}` syntax.
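For instance, a database password could be injected from an environment variable. The `password` key and the `DB_PASSWORD` variable name below are hypothetical, shown only to illustrate the substitution syntax:

```hocon
database: {
  schema: default,
  table: awesome_csv_addresses,
  password: ${DB_PASSWORD} # resolved from the environment variable
}
```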
*(Screenshot: Create Spark Job - Application Config)*
## Tests

Prepare the dev environment:

```shell
virtualenv .env          # or: python3 -m venv .env
source .env/bin/activate
pip install -e ".[dev]"
```

Run the tests:

```shell
python3 -m pytest        # or simply: pytest
```