# File Streaming

Continuously transfer files into Apache Iceberg tables.
## File formats

Tested file formats:
- CSV
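
For example, a file landing in the source path might look like the sample below. The column layout is purely illustrative; the job does not mandate a particular schema:

```csv
name,address
Jane Doe,42 Example Street
John Roe,7 Sample Avenue
```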
## Job creation

- In the left sidebar menu, choose **Spark Jobs**.
- Click **Create**.
Specify the following parameters (these are examples; change them to match your preferences):

- **Name:** `file-streaming-job`
- **Docker image:** `iomete/iomete_file_streaming_job:0.2.0`
- **Main application file:** `local:///app/driver.py`
- **Environment variables:** `LOG_LEVEL: INFO` (or `ERROR`)
You can use environment variables to store sensitive values such as passwords and secrets. You can then reference these variables in your config file using the `${ENV_NAME}` syntax, e.g. `${DB_PASSWORD}`.
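
For illustration, a config entry could pull a credential from the environment like this; the `password` key is hypothetical and not part of the configuration schema documented below:

```hocon
database: {
  schema: default,
  table: awesome_csv_addresses,
  password: ${DB_PASSWORD}  # hypothetical key; resolved from the DB_PASSWORD environment variable
}
```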
## Config file

Scroll down to the **Application configurations** section, expand it, click **Add config file**, and paste the following configuration:
```hocon
{
  file: {
    format: csv,
    path: "files/",
    max_files_per_trigger: 1,
    latest_first: false,
    max_file_age: "7d"
  },
  database: {
    schema: default,
    table: awesome_csv_addresses
  },
  processing_time: {
    interval: 5,
    unit: seconds # or: minutes
  }
}
```
## Configuration properties

| Property | Description |
|---|---|
| `file` | Required properties for connecting to the file source and configuring how files are read. |
| `database` | Destination database properties. |
| `processing_time` | Trigger interval controlling how often incoming data is persisted to Iceberg. |
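
To make these properties concrete, here is a minimal PySpark sketch of the kind of streaming pipeline such a job would run. This is an assumption-based illustration, not the actual driver code; in particular, the CSV schema and checkpoint location are placeholders:

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType

spark = SparkSession.builder.appName("file-streaming-sketch").getOrCreate()

# Streaming file sources need an explicit schema; these two columns
# are placeholders for the real CSV layout.
schema = StructType([
    StructField("name", StringType()),
    StructField("address", StringType()),
])

stream = (
    spark.readStream
    .format("csv")                    # file.format
    .schema(schema)
    .option("maxFilesPerTrigger", 1)  # file.max_files_per_trigger
    .option("latestFirst", "false")   # file.latest_first
    .option("maxFileAge", "7d")       # file.max_file_age
    .load("files/")                   # file.path
)

query = (
    stream.writeStream
    .trigger(processingTime="5 seconds")  # processing_time: interval + unit
    .option("checkpointLocation", "checkpoints/awesome_csv_addresses")  # placeholder path
    .toTable("default.awesome_csv_addresses")  # database.schema + database.table
)
query.awaitTermination()
```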
*Screenshot: Create Spark Job - Deployment*

*Screenshot: Create Spark Job - Instance*
*Screenshot: Create Spark Job - Application Config*
## Tests

### Prepare the dev environment

```shell
virtualenv .env  # or: python3 -m venv .env
source .env/bin/activate
pip install -e ".[dev]"
```
### Run tests

```shell
python3 -m pytest  # or simply: pytest
```
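
If you add tests of your own, a self-contained pytest module has the following shape. The `parse_interval` helper is defined inline purely for illustration; it is not an import from the actual package:

```python
import pytest


def parse_interval(interval: int, unit: str) -> str:
    """Render processing_time settings as a Spark trigger string."""
    if unit not in ("seconds", "minutes"):
        raise ValueError(f"unsupported unit: {unit}")
    return f"{interval} {unit}"


def test_interval_in_seconds():
    assert parse_interval(5, "seconds") == "5 seconds"


def test_unsupported_unit_is_rejected():
    with pytest.raises(ValueError):
        parse_interval(5, "hours")
```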