Deployment Architecture

This reference describes how IOMETE maps onto Kubernetes: the Helm chart structure, the full service inventory, feature flags, and infrastructure options. For a conceptual overview of each service, see the Architecture Overview. For installation steps, see the On-Premises Deployment Guide.

Helm Chart Structure

Everything ships as a single Helm chart (iomete-data-plane-enterprise), which deploys all platform services into one Kubernetes namespace. Services talk to each other over internal Kubernetes DNS.

Initialization Job

Before any service starts, a pre-install/pre-upgrade Helm hook job (iomete-data-plane-init) bootstraps the environment:

  1. Creates PostgreSQL databases for each microservice
  2. Creates the Hive Metastore secret
  3. Creates the Hadoop configuration secret
  4. Creates the Spark ConfigMap
  5. Creates the Spark History folder in object storage

Init Job Failures

This job runs with backoffLimit: 2. If it fails, the entire Helm install or upgrade fails. When troubleshooting a failed deployment, check the init job logs first.

Configuration Distribution

All Helm values (including feature flags) are serialized to YAML and stored in a Kubernetes Secret named data-plane. Every backend microservice reads this secret at startup, so a Helm upgrade propagates config changes to all services on their next restart.

Service Inventory

Knowing which services run (and which are optional) helps you plan resource allocation and troubleshoot startup issues. The table below lists every service the Helm chart deploys. Services tied to a feature flag only appear when that flag is enabled.

| Service | Type | Feature Flag | Notes |
| --- | --- | --- | --- |
| iom-gateway | Deployment | Always | Nginx reverse proxy, entry point |
| iom-app | Deployment | Always | Frontend SPA (Nginx) |
| iom-core | Deployment | Always | Platform settings, auth, Spark History proxy |
| iom-cluster | Deployment | Always | Spark resource management |
| iom-identity | Deployment | Always | IAM, Ranger, SSO, domains |
| iom-sql | Deployment | Always | SQL editor backend |
| iom-catalog | Deployment | Always | Data catalog, governance |
| iom-rest-catalog | Deployment | Always | Iceberg REST Catalog |
| iom-health-check | Deployment | Always | Service health monitoring |
| iom-socket | Deployment | Always | WebSocket relay |
| spark-operator-controller | Deployment | Always | Spark CRD controller |
| spark-operator-webhook | Deployment | Always | Admission webhook |
| spark-submit-service | Deployment | Always | Spark app submission |
| spark-connect-driver | SparkApplication | Always | Internal metadata extraction |
| spark-connect-rest | Deployment | Always | REST client for Spark Connect |
| spark-history | Deployment | Always | Spark job history UI |
| metastore | Deployment | Always | Hive Metastore |
| typesense | Deployment | Always | Search engine (20Gi PVC) |
| iom-collab | Deployment | enableCollaborativeSqlEditor | Collaborative SQL editing |
| nats | StatefulSet (3 replicas) | services.nats.enabled | JetStream messaging |
| iom-event-stream | StatefulSet (2 replicas) | eventStream | Event ingestion + Iceberg writer |
| iom-event-stream-proxy | Deployment | eventStream | Ingestion request routing |
| prefect-server | Deployment | jobOrchestrator | Workflow orchestration |
| prefect-worker | Deployment (per-namespace) | jobOrchestrator | Scheduled job execution |
| job-orchestrator-metrics-exporter | Deployment | jobOrchestrator | Prometheus metrics |
| iom-maintenance | Deployment | enableAutomatedMaintenance | Table compaction |
| iom-ratelimiter | Deployment | ratelimiter | Redis-based rate limiting |
| spark-proxy-server | Deployment (per-namespace) | sparkProxyForArrowFlight | Arrow Flight proxy |

All Quarkus-based services expose /health for liveness and readiness probes. They use a RollingUpdate strategy (maxUnavailable: 0, maxSurge: 1), so at least one replica stays available during deploys.

Gateway Routing

Every request into the platform passes through the Gateway (an Nginx reverse proxy), which routes traffic to the correct backend based on URI prefix. The tables below show the complete routing map.

API Routes

| URI Pattern | Backend Service |
| --- | --- |
| /api/v*/domains/*/sql, /api/v*/domains/*/git | iom-sql |
| /api/v*/health-check | iom-health-check |
| /api/v*/admin/compaction, /api/v*/domains/*/compaction | iom-maintenance |
| /api/v*/domains/*/data-catalog, /api/v*/domains/*/data-product, /api/v*/domains/*/governance, /api/v*/admin/governance | iom-catalog |
| /api/v*/authz, /api/v*/auth, /api/v*/data-planes, /api/v*/modules, /api/v*/system-config, /api/v*/admin/node-types, /api/v*/admin/volumes | iom-core |
| /api/v*/domains/*/compute, /api/v*/domains/*/spark/*, /api/v*/domains/*/jupyter-containers, /api/v*/domains/*/namespaces, /api/v*/domains/*/secrets, /api/v*/domains/*/schedules, /api/v*/domains/*/event-streams | iom-cluster |
| /api/v*/domains/*/roles, /api/v*/domains/*/members, /api/v*/admin/identity, /api/v*/admin/domains, /api/v*/users, /api/v*/groups, /api/v*/bundles, /api/v*/identity | iom-identity |

Non-API Routes

| URI Pattern | Backend Service | Protocol |
| --- | --- | --- |
| / (default), /domains/*, /admin/* | iom-app | HTTP |
| /catalogs/ | iom-rest-catalog | HTTP |
| /socket.io | iom-socket | WebSocket |
| /collaboration | iom-collab | WebSocket (feature-flagged) |
| /spark-history, /spark-ui, /monitoring | iom-core | HTTP |
| /sso-proxy | iom-core | HTTP |
| /service/(plugins\|roles\|xusers\|tags) | iom-identity | HTTP (Ranger admin API) |

Dynamic Workload Routes

| URI Pattern | Target | Protocol |
| --- | --- | --- |
| /data-plane/*/lakehouse/*/... | `cc-<name>.<ns>:10000` | HTTP (HiveServer2 JDBC) |
| /spark.connect.* | `cc-<name>.<ns>:15002` | gRPC (Spark Connect) |
| /arrow.flight.protocol.FlightService | `cc-<name>.<ns>:33333` or Spark Proxy | gRPC (Arrow Flight) |
| /data-plane/*/jupyter/*/... | `jc-<name>.<ns>:8888` | HTTP + WebSocket |
| /data-plane/*/event-stream/*/... | iom-event-stream-proxy | HTTP |

The Gateway supports gRPC for Spark Connect and Arrow Flight, with keepalive enabled and a 24-hour read timeout (86400s).

Node Placement

To keep platform services and Spark workloads on separate hardware, the chart exposes two pairs of node selectors and tolerations:

| Setting | Applied To | Purpose |
| --- | --- | --- |
| controlPlaneNodeSelector / controlPlaneTolerations | All IOMETE platform services | Isolate control plane on dedicated nodes |
| dataPlaneNodeSelector / dataPlaneTolerations | Spark workloads (drivers, executors) | Isolate compute on data nodes |
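As a sketch, the two selector/toleration pairs might be set in values.yaml as below. The label and taint keys here are illustrative examples, not names the chart prescribes:

```yaml
# Illustrative values.yaml fragment -- label/taint keys are examples.
controlPlaneNodeSelector:
  node-role/iomete: control-plane
controlPlaneTolerations:
  - key: node-role/iomete
    operator: Equal
    value: control-plane
    effect: NoSchedule

dataPlaneNodeSelector:
  node-role/iomete: data-plane
dataPlaneTolerations:
  - key: node-role/iomete
    operator: Equal
    value: data-plane
    effect: NoSchedule
```

With this in place, platform pods schedule only onto nodes carrying the control-plane label and taint, while Spark drivers and executors land on the data nodes.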

Multi-Namespace Support

If different teams need separate CPU and memory quotas, you can deploy Spark workloads (drivers and executors) into additional namespaces via the namespaces list in values.yaml. The data plane's own namespace is always included automatically.

Each extra namespace gets its own copies of:

  • Prefect Worker
  • Spark Proxy Server (when Arrow Flight proxy is enabled)
  • Event Stream pods (when event streams are enabled)
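A minimal sketch of the namespaces list (the namespace names are examples):

```yaml
# Illustrative: extra namespaces for Spark workloads. The data plane's own
# namespace does not need to be listed -- it is always included.
namespaces:
  - team-analytics
  - team-ml
```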

For details, see Connect Namespace. For full multi-cluster topology, see Multi-Cluster Setup.

Priority Classes

PriorityClasses let Kubernetes preempt lower-priority pods when resources are scarce. When priorityClasses is enabled, Spark workloads are assigned these classes:

| Priority Class | Workload Type |
| --- | --- |
| iomete-compute | Compute clusters |
| iomete-spark-job | Spark jobs |
| iomete-notebook | Jupyter containers |
| iomete-operational-support | Job Orchestrator workers |

Autoscaling

For services that handle variable load, you can turn on Horizontal Pod Autoscaling (HPA). It's disabled by default and must be enabled per-service in values.yaml.

| Service | HPA Support | Notes |
| --- | --- | --- |
| iom-gateway | Yes | Disabled by default |
| iom-core | Yes | Disabled by default |
| iom-cluster | Yes | Disabled by default |
| iom-identity | Yes | Disabled by default |
| iom-catalog | Yes | Disabled by default |
| iom-rest-catalog | Yes | Disabled by default |
| spark-submit-service | Yes | Disabled by default |
| iom-collab | Yes | Memory-based (85%) primary, CPU (80%) secondary |
| iom-event-stream-proxy | Yes | Enabled by default (1-4 replicas, 80% CPU) |
| iom-sql | No | Fixed at 1 replica |
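As a hypothetical sketch only: per-service HPA settings in Helm charts commonly follow the shape below. The key names (`iomCore`, `hpa`, `minReplicas`, and so on) are assumptions, not the chart's documented structure, so check the chart's values.yaml for the real layout:

```yaml
# Hypothetical sketch -- key names are assumptions; consult the chart.
iomCore:
  hpa:
    enabled: true
    minReplicas: 1
    maxReplicas: 4
    targetCPUUtilizationPercentage: 80
```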

Feature Flags

Feature flags control what gets deployed and how the platform behaves. Most live in values.yaml under features: and reach every service through the data-plane secret. A few flags use different paths (noted in the table).

Deployment Flags

These flags control whether entire services or subsystems are deployed at all:

| Flag | Services Deployed | Default |
| --- | --- | --- |
| enableCollaborativeSqlEditor | iom-collab | false |
| services.nats.enabled | NATS cluster | false |
| eventStream | iom-event-stream, iom-event-stream-proxy | false |
| jobOrchestrator | prefect-server, prefect-worker, metrics-exporter | false |
| sparkProxyForArrowFlight | spark-proxy-server (per-namespace) | false |
| enableAutomatedMaintenance | iom-maintenance | false |
| ratelimiter | iom-ratelimiter | false |
| jupyterContainers | jupyter-containers ConfigMap | false |
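Assuming these flags sit directly under features: (with services.nats.enabled on its own path, as noted), enabling the event stream and job orchestrator might look like:

```yaml
# Illustrative values.yaml fragment -- exact nesting under features: is
# assumed from the flag names; verify against your chart's values.yaml.
features:
  eventStream: true
  jobOrchestrator: true
  enableCollaborativeSqlEditor: false
  enableAutomatedMaintenance: false

# services.nats.enabled lives outside features:
services:
  nats:
    enabled: true
```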

Runtime Behavior Flags

These flags toggle features without adding or removing services:

| Flag | Effect When Enabled | Default |
| --- | --- | --- |
| arrowFlightConnection | Arrow Flight connections available for compute clusters | true |
| arrowFlightForDbExplorer | DB Explorer uses Arrow Flight instead of Spark Connect REST | false |
| activityMonitoring | Query monitoring and resource tracking | false |
| downloadQueryResults | SQL query result download in UI | true |
| caseInsensitiveIcebergIdentifiers | Iceberg table/database names are case-insensitive | false |
| iometeSparkLivenessProbe | Additional liveness probe on Spark driver pods | true |
| icebergRestCatalogStrictMode | Database must exist before creating tables | false |
| priorityClasses | Spark workloads use PriorityClasses | false |
| emailNotifications | Email notifications available | true |
| sparkJobArchival | Archival of Spark job history | false |
| onboardComputeRas | Resource Access Service for compute clusters | false |
| onboardSparkJobRas | Resource Access Service for Spark jobs | false |
| onboardWorkspaceRas | Resource Access Service for SQL workspaces | false |
| onboardNamespaceMappingRas | Namespace-level access control | false |
| domainLevelBundleAuthorization | Domain-level bundle-based authorization | false |
| secretsV2 | New secrets management system | false |
| dataAccessAudit | Data access audit logging | false |
| icebergMetrics | Iceberg table metrics sent to event stream | true |
| scheduling | SQL scheduling (requires jobOrchestrator) | false |
| ldapGroupInheritance | LDAP group hierarchy inheritance | true |
| identitySoftDelete | Soft delete for users/groups | false |
| showExecutorLogs | Executor logs visible in UI | true |

Prerequisite Chain

Don't enable domainLevelBundleAuthorization until all four RAS flags are active (onboardComputeRas, onboardSparkJobRas, onboardWorkspaceRas, onboardNamespaceMappingRas) and you've run the migration script. Enabling it out of order breaks authorization.

Storage Configuration

Object Storage

All table data, Spark event logs, and SQL results live in object storage, making it the most important infrastructure dependency. Configure the provider and credentials in values.yaml.

| Storage Type | Config Key | URI Scheme | Required Settings |
| --- | --- | --- | --- |
| MinIO | minio | s3a:// | endpoint, accessKey, secretKey |
| Dell ECS | dell_ecs | s3a:// | endpoint, accessKey, secretKey |
| AWS S3 | aws_s3 | s3a:// | IAM role, cloud.region |
| Google Cloud Storage | gcs | gs:// | GCP service account |
| Azure Blob (Gen1) | azure_gen1 | wasbs:// | storageAccountName, storageAccountKey |
| Azure Data Lake (Gen2) | azure_gen2 | abfs:// | storageAccountName, storageAccountKey |
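As a sketch for a MinIO-backed deployment: the config key (minio) and the required settings (endpoint, accessKey, secretKey) are documented above, but the surrounding nesting and the endpoint value here are assumptions:

```yaml
# Illustrative sketch -- nesting is assumed; endpoint/credentials are examples.
storage:
  type: minio
  minio:
    endpoint: http://minio.minio-system.svc:9000
    accessKey: iomete
    secretKey: change-me
```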

Object storage paths:

| Path | Contents |
| --- | --- |
| data/ | Lakehouse table data |
| iomete-assets/spark-history/ | Spark event logs |
| iomete-assets/sql-results/ | SQL query results |
| iomete-assets/sql-editor/worksheets/ | SQL editor worksheet content |
| ranger/audit | Ranger audit logs (when HDFS audit is enabled) |

Database

All services share a single PostgreSQL server. The init job creates databases with a configurable prefix (default: iomete_):

  • <prefix>metastore_db (Hive Metastore)
  • <prefix>ranger_db (Ranger policies)
  • <prefix>iceberg_db (Iceberg catalog metadata)
  • Per-microservice databases (Core, Cluster, Identity, SQL, Catalog, etc.)
  • <prefix>prefect_db (Prefect job orchestrator, when enabled)

Multi-cluster database support: You can point clusterDatabase at a different server than the main database. This is handy in multi-region setups where high-volume operations (Kubernetes/Spark data) hit a local database while metadata stays on a global one.

SSL: PostgreSQL SSL is optional. When ssl.enabled is true, JDBC connections use sslmode=verify-full.
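Putting the database options together as a sketch: the prefix default, ssl.enabled, and the clusterDatabase split come from the text above, while the field names (host, port) and nesting are assumptions:

```yaml
# Illustrative sketch -- field names and nesting are assumptions.
database:
  host: postgres.db.svc
  port: 5432
  prefix: iomete_
  ssl:
    enabled: true   # JDBC connections then use sslmode=verify-full

# Optional: point high-volume cluster data at a local server,
# keeping metadata on the global one (multi-region setups).
clusterDatabase:
  host: postgres-local.db.svc
  port: 5432
```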

Secret Store

You can choose where IOMETE keeps sensitive values like database passwords and API keys:

| Type | Description |
| --- | --- |
| kubernetes (default) | Kubernetes Secrets |
| database | Encrypted in the IOMETE database |
| vault | HashiCorp Vault (requires endpoint, path, token) |
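A sketch of the Vault option: the type and its required endpoint/path/token settings are documented above; the nesting and example values are assumptions:

```yaml
# Illustrative sketch -- nesting assumed; values are placeholders.
secretStore:
  type: vault
  vault:
    endpoint: https://vault.internal:8200
    path: secret/iomete
    token: change-me
```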

Logging

Depending on your observability stack, you can route logs to one of several backends:

| Source | Description |
| --- | --- |
| kubernetes (default) | Reads logs directly from the Kubernetes API |
| loki | Grafana Loki (requires host, port) |
| elasticsearch | Elasticsearch (requires endpoint, apiKey, indexPattern) |
| splunk | Splunk Enterprise (requires endpoint, token, indexName) |

Hot storage: When hotStorage.enabled is true, the platform tries the Kubernetes API first (for recent logs), then falls back to the external backend for older entries.
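A sketch of a Loki-backed setup with hot storage: host and port are the required settings named above; the key nesting and the placement of hotStorage are assumptions:

```yaml
# Illustrative sketch -- nesting assumed; host/port are examples.
logging:
  source: loki
  loki:
    host: loki.monitoring.svc
    port: 3100
  hotStorage:
    enabled: true   # recent logs via Kubernetes API, older entries from Loki
```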

Monitoring

If you run Prometheus, set monitoring.enabled to true. This adds scrape annotations to every service pod. Metrics endpoints differ by framework:

  • /q/metrics for Quarkus services (Core, Cluster, Identity, SQL, Catalog, REST Catalog)
  • /metrics for everything else (Spark Operator, Collab)
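Assuming the chart uses the conventional prometheus.io annotation scheme for scraping, a Quarkus service pod would carry annotations roughly like the following (the annotation keys follow the common convention and the port is an example; verify against the rendered manifests):

```yaml
# Illustrative pod annotations -- keys follow the common prometheus.io
# convention; not confirmed as the chart's exact output.
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/path: /q/metrics   # /metrics for non-Quarkus services
    prometheus.io/port: "8080"       # example port
```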

TLS and TrustStore

If your environment uses self-signed certificates, mount a Java TrustStore into all services via a Kubernetes Secret. The TrustStore must include both the default public certificates and your custom ones. See TrustStore Configuration for setup details.

Ingress

The Gateway expects an ingress controller in front of it. When ingress.httpsEnabled is true (the default), the Gateway sets the public-facing protocol to HTTPS in forwarded headers. See Configure Ingress for setup instructions.

OpenAPI Documentation

Every backend service publishes an OpenAPI spec, and the Gateway proxies them all under /openapi/:

| Path | Service |
| --- | --- |
| /openapi/ui | Swagger UI (served by Core Service) |
| /openapi/core | Core Service |
| /openapi/cluster | Cluster Service |
| /openapi/iam | Identity Service |
| /openapi/sql | SQL Service |
| /openapi/catalog | Catalog Service |
| /openapi/rest-catalog | Iceberg REST Catalog |

Cloud Provider Support

IOMETE runs on AWS, Azure, GCP, and on-premises Kubernetes. The main differences between providers are scratch directory paths and zone-aware scheduling:

| Provider | Scratch Directory | Zone-Aware Scheduling |
| --- | --- | --- |
| On-Premises | /local1 | No |
| AWS | /local1 | Yes (region + availability zone) |
| Azure | /mnt | No |
| GCP | /mnt/stateful_partition | No |

Regardless of provider, Spark workload pods include tolerations for dedicated nodes (k8s.iomete.com/dedicated) and ARM64 architecture (kubernetes.io/arch=arm64).

Technology Stack

| Component | Technology |
| --- | --- |
| Backend language | Kotlin |
| JVM | Java |
| Backend framework | Quarkus |
| Build system | Gradle (Kotlin DSL) |
| Frontend | React, TypeScript, Vite |
| UI library | Ant Design |
| Spark engine | Apache Spark |
| Table format | Apache Iceberg |
| Hive Metastore | Apache Hive |
| Spark Operator | Kubernetes Operator (Go) |
| Event ingestion | Rust |
| Job orchestration | Prefect (Python) |
| Messaging | NATS JetStream |
| Search | Typesense |
| Rate limiting | Redis |