The IOMETE Data Plane is hosted within the customer's own AWS (Amazon Web Services) account, providing the customer with complete control over data, processing, and associated costs.
The Data Plane consists of two main components and operates entirely within the perimeter of the customer's AWS infrastructure:
- Data Storage: A cloud object store (AWS S3), which primarily stores data in the Iceberg (Parquet) open format.
- Compute Cluster: This is an EKS (Elastic Kubernetes Service) cluster that manages all the computational resources required to run Lakehouse clusters and Spark jobs.
As shown in the diagram above, there are two main cost components:
- AWS S3 Storage Cost
- AWS EC2 Compute Cost
AWS S3 Storage Cost
Lakehouse data is stored in AWS S3, a highly durable and scalable object storage service.
AWS S3 is a pay-as-you-go service: you pay only for the storage you use. S3 storage costs around $0.023 per GB per month. Check the AWS S3 pricing page for the latest pricing.
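As a rough illustration of the pay-as-you-go model, the sketch below estimates the monthly storage charge from the approximate $0.023 per GB per month figure quoted above. The function name is hypothetical, and the calculation deliberately ignores request charges, storage classes, and tiered pricing, so treat the result as an order-of-magnitude estimate only.

```python
# Approximate S3 price taken from the text above (USD per GB per month).
# Real pricing varies by region, storage class, and volume tier.
S3_PRICE_PER_GB_MONTH = 0.023


def monthly_s3_storage_cost(stored_gb: float,
                            price_per_gb_month: float = S3_PRICE_PER_GB_MONTH) -> float:
    """Return the estimated monthly storage charge in USD."""
    if stored_gb < 0:
        raise ValueError("stored_gb must be non-negative")
    return stored_gb * price_per_gb_month


if __name__ == "__main__":
    # Print estimates for a few example data sizes.
    for size_gb in (100, 1_000, 10_000):
        cost = monthly_s3_storage_cost(size_gb)
        print(f"{size_gb:>6} GB -> ${cost:,.2f}/month")
```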
The IOMETE Data Plane infrastructure establishes an S3 Gateway endpoint to access S3 data from the Data Plane VPC. The S3 Gateway endpoint is a highly available and scalable service that delivers a secure connection between the VPC and S3 over AWS's backbone network. Consequently, communication between the Data Plane VPC and S3 doesn't go through the public internet. This yields benefits, including no data transfer cost between S3 and compute machines. Additionally, it stays in compliance with customer security policies and regulations. Read more about AWS S3 Gateway endpoints.
AWS Compute Cost
Our data plane compute infrastructure operates on top of EKS (Elastic Kubernetes Service), which utilizes AWS EC2 instances as the compute nodes.
- EKS: AWS charges approximately $72/month for an EKS cluster. Refer to the AWS EKS pricing page for more details.
- IOMETE core services: These are the base services that control the Data Plane infrastructure, and they run at all times. They require a modest amount of resources (2 vCPU, 16 GB RAM), which corresponds to the r6g.large EC2 instance type. The cost is typically around $74/month.
In summary, the fixed costs for the Data Plane infrastructure are roughly $146/month.
AWS pricing differs by region. The prices mentioned above are for the us-east-1 region. Please check the AWS Pricing page for the latest pricing for your region.
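The fixed-cost arithmetic can be written down as a small sketch. The monthly figures are the approximate us-east-1 numbers quoted above; the dictionary keys and the function name are illustrative only, and the values should be replaced with current prices for your region.

```python
# Approximate fixed monthly costs in USD, as quoted in the text (us-east-1).
FIXED_MONTHLY_COSTS_USD = {
    "eks_cluster": 72.0,                     # EKS cluster fee
    "iomete_core_services_r6g_large": 74.0,  # always-on r6g.large node
}


def total_fixed_monthly_cost(costs: dict) -> float:
    """Sum the fixed monthly cost components."""
    return sum(costs.values())


if __name__ == "__main__":
    total = total_fixed_monthly_cost(FIXED_MONTHLY_COSTS_USD)
    print(f"Estimated fixed cost: ${total:,.0f}/month")
```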
The remaining resources are utilized by user workloads, such as Lakehouse/Spark Job/Notebook clusters. These costs fluctuate based on the workload type, size, and usage pattern.
All these clusters are based on Apache Spark and consist of a driver and one or more executors. The driver is responsible for managing the executors. The executors are responsible for running the data processing tasks.
The Lakehouse cluster driver should be operational at all times as it acts as the endpoint for BI tools. The executors are elastic and can be scaled up or down according to the load. Learn more about the Lakehouse cluster.
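To make the split between the always-on driver and the elastic executors concrete, here is a minimal cost model. It assumes the driver node is billed for every hour of the month while executor nodes are billed only for the hours they are scaled up. The hourly rates, the 730-hour month, and the function name are assumptions for illustration, not IOMETE or AWS figures.

```python
HOURS_PER_MONTH = 730  # average number of hours in a month (assumption)


def lakehouse_monthly_cost(driver_hourly_usd: float,
                           executor_hourly_usd: float,
                           executor_count: int,
                           busy_hours_per_month: float) -> float:
    """Estimate the monthly compute cost of one Lakehouse cluster.

    The driver runs for the whole month; executors run only while
    the cluster is under load (busy_hours_per_month).
    """
    driver_cost = driver_hourly_usd * HOURS_PER_MONTH
    executor_cost = executor_hourly_usd * executor_count * busy_hours_per_month
    return driver_cost + executor_cost


if __name__ == "__main__":
    # Hypothetical rates: $0.10/h driver node, $0.20/h executor nodes,
    # 4 executors active for 100 hours in the month.
    cost = lakehouse_monthly_cost(0.10, 0.20, 4, 100)
    print(f"Estimated Lakehouse cost: ${cost:,.2f}/month")
```

In this model the driver is a fixed floor, so reducing executor busy hours is the main lever for lowering the variable part of the bill.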
AWS Cost Monitoring
The IOMETE Data Plane Terraform script adds tags to all the AWS resources it creates. These tags can be used to monitor the AWS cost.
The available tags are:
- iomete.com/cluster_name: The cluster name
- iomete.com/managed (set to true): Indicates that the cluster is managed by IOMETE
Using these tags, you can monitor the AWS cost using AWS Cost Explorer. Follow the steps below to monitor the AWS cost.
1. Activate tag
First, you need to activate the tag as a Cost Allocation Tag. You can do that by following the steps below:
- Go to the AWS Billing Console
- Click on Cost Allocation Tags from the left menu
- Find and select the tag (for example, iomete.com/cluster_name)
- Click on the Activate button
It will take up to 24 hours for AWS to activate the tag. Once the tag is activated, you can use it to monitor the AWS cost.
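The same activation can also be scripted. The sketch below assumes boto3 is installed, that credentials for the account that manages billing are configured, and that the tag keys have already appeared in billing data. It uses the Cost Explorer `update_cost_allocation_tags_status` operation; the helper names are hypothetical.

```python
# Tag keys added by the IOMETE Terraform script, as listed in the text.
IOMETE_TAG_KEYS = ["iomete.com/cluster_name", "iomete.com/managed"]


def build_activation_request(tag_keys):
    """Build the CostAllocationTagsStatus payload that marks each tag Active."""
    return [{"TagKey": key, "Status": "Active"} for key in tag_keys]


def activate_cost_allocation_tags(tag_keys=IOMETE_TAG_KEYS):
    """Activate the given tags as cost allocation tags (calls AWS)."""
    import boto3  # imported lazily so the payload builder works without boto3

    client = boto3.client("ce")  # Cost Explorer client
    response = client.update_cost_allocation_tags_status(
        CostAllocationTagsStatus=build_activation_request(tag_keys)
    )
    # Errors lists tags that could not be updated, for example tags
    # that have not yet been seen in billing data.
    return response.get("Errors", [])
```

Calling `activate_cost_allocation_tags()` returns a list of per-tag errors, which is empty when every tag was updated. The activation delay described above still applies.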
Then go to AWS Cost Explorer and filter the cost by the activated tag.
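For teams that prefer to pull these numbers programmatically, the following sketch queries the Cost Explorer API for costs on resources tagged iomete.com/managed = true, grouped by AWS service. It assumes boto3 is installed, credentials are configured, and the tag has already been activated. The helper names are hypothetical; `get_cost_and_usage` is the standard Cost Explorer operation.

```python
from datetime import date


def build_cost_request(start: date, end: date,
                       tag_key: str = "iomete.com/managed",
                       tag_value: str = "true") -> dict:
    """Build get_cost_and_usage arguments filtered by one tag value."""
    return {
        # End date is exclusive in the Cost Explorer API.
        "TimePeriod": {"Start": start.isoformat(), "End": end.isoformat()},
        "Granularity": "MONTHLY",
        "Metrics": ["UnblendedCost"],
        "Filter": {"Tags": {"Key": tag_key, "Values": [tag_value]}},
        "GroupBy": [{"Type": "DIMENSION", "Key": "SERVICE"}],
    }


def summarize_by_service(response: dict) -> dict:
    """Sum the cost per AWS service across all returned time periods."""
    totals = {}
    for period in response.get("ResultsByTime", []):
        for group in period.get("Groups", []):
            service = group["Keys"][0]
            amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
            totals[service] = totals.get(service, 0.0) + amount
    return totals


def fetch_iomete_costs(start: date, end: date) -> dict:
    """Query AWS Cost Explorer and return cost per service (calls AWS)."""
    import boto3  # imported lazily so the helpers above work without boto3

    client = boto3.client("ce")
    response = client.get_cost_and_usage(**build_cost_request(start, end))
    return summarize_by_service(response)
```

For example, `fetch_iomete_costs(date(2024, 1, 1), date(2024, 2, 1))` would return a mapping from service name to January cost. Passing `tag_key="iomete.com/cluster_name"` to `build_cost_request` narrows the filter to a single cluster instead.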
AWS Cloud Cost Optimization
Since the IOMETE Data Plane runs in the customer's AWS account, the customer has full control over the AWS cost and can optimize it based on their needs. Customers can use Reserved Instances, Spot Instances, and AWS Savings Plans to optimize costs.
AWS Spot Instances
AWS Spot Instances allow you to take advantage of unused EC2 capacity at a lower cost than On-Demand instances (often around one third of the On-Demand price, although Spot prices vary).
They are suitable for workloads that can tolerate instances being terminated at short notice, such as Spark Job (ETL) executors and Lakehouse clusters that are used for DBT workloads.
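A quick way to reason about Spot savings is to compute a blended hourly rate when only part of the executor fleet runs on Spot. The sketch below assumes Spot costs roughly one third of the On-Demand price, as stated above. The On-Demand rate in the example and the function name are hypothetical.

```python
def blended_hourly_cost(on_demand_hourly_usd: float,
                        spot_fraction: float,
                        spot_price_ratio: float = 1 / 3) -> float:
    """Return the average hourly cost per node for a mixed fleet.

    spot_fraction    -- share of node hours running on Spot (0.0 to 1.0)
    spot_price_ratio -- Spot price as a fraction of On-Demand (assumed ~1/3)
    """
    if not 0.0 <= spot_fraction <= 1.0:
        raise ValueError("spot_fraction must be between 0 and 1")
    on_demand_part = on_demand_hourly_usd * (1.0 - spot_fraction)
    spot_part = on_demand_hourly_usd * spot_price_ratio * spot_fraction
    return on_demand_part + spot_part


if __name__ == "__main__":
    # Hypothetical $0.30/h On-Demand rate with varying Spot shares.
    for share in (0.0, 0.5, 1.0):
        rate = blended_hourly_cost(0.30, share)
        print(f"{share:.0%} Spot -> ${rate:.3f}/hour per node")
```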
AWS Savings Plan
AWS Savings Plans are a flexible pricing model that discounts your Amazon EC2, Fargate, and Lambda usage in exchange for a commitment to a specific amount of compute usage (measured in $/hour) for a one-year or three-year term. They provide lower prices than On-Demand, and Compute Savings Plans apply automatically regardless of instance family, size, or region, so the discount follows your usage as your needs change.
AWS Reserved Instances
AWS Reserved Instances are a cost-effective way to reduce the overall cost of running EC2 instances. They provide a significant discount (up to 72%) compared to On-Demand instances in exchange for committing to a specific instance type for a fixed period (1 or 3 years).
To optimize cost, it is recommended to reserve instances for workloads that have a stable and predictable usage pattern.
It is also recommended to choose the right instance type and size, monitor usage, and adjust the number of reserved instances as necessary to ensure that you are using them effectively.
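The advice to reserve only stable workloads follows from simple arithmetic: a reservation is paid for every hour of the term, whether or not the instance is running. Under that simplified model, a reservation breaks even when utilization equals one minus the discount. The sketch below illustrates this; the function names and the 40% discount in the example are assumptions, not AWS figures.

```python
def break_even_utilization(discount: float) -> float:
    """Minimum share of hours an instance must run for a reservation to pay off.

    discount -- reservation discount versus On-Demand, e.g. 0.4 for 40%.
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return 1.0 - discount


def reservation_saves_money(discount: float, utilization: float) -> bool:
    """True if reserving is cheaper than On-Demand at the given utilization."""
    # Reserved cost per hour of term: (1 - discount) * on_demand_rate
    # On-Demand cost per hour of term: utilization * on_demand_rate
    return utilization > break_even_utilization(discount)


if __name__ == "__main__":
    discount = 0.40  # hypothetical discount
    threshold = break_even_utilization(discount)
    print(f"A {discount:.0%} discount breaks even at {threshold:.0%} utilization")
```

An always-on node such as the IOMETE core services instance sits at 100% utilization, which is why steady components are the natural candidates for reservations.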
If you need help with AWS cost optimization, please contact the IOMETE support team at [email@example.com]