Skip to main content

Querying Files in AWS S3

This is an end-to-end guide about how to move files from your AWS S3 to IOMETE.

Your files in AWS S3

Let's assume you have an external bucket in AWS S3 with the CSV files you want to query in IOMETE. In this example, we will use the area-for-iomete bucket.

Provide permissions to IOMETE

In order to access files in this external S3 bucket, IOMETE requires permissions to read/write the files. Follow this guide to provide the necessary permissions.

Querying files in AWS S3

It's extremely easy to query files in AWS S3 with IOMETE. You just need to provide the bucket name and the path to the file you want to query.

SELECT  * FROM json.`s3a://area-for-iomete/countries.json`;

Learn more about querying files like CSV, JSON, Parquet, ORC and more detailed configuration options in the following guides:

Moving data into the lakehouse

If you want to move files to IOMETE Lakehouse, which provides a managed data lake experience, you can use the following commands to create tables and insert data into them.

Non-partitioned Table

  • Option 1. Create a table from select
-- Create table directly from the query
CREATE TABLE countries
AS SELECT * FROM json.`s3a://area-for-iomete/countries.json`;

-- To inspect the table use the following query
DESC TABLE EXTENDED countries;
  • Option 2. Insert into to existing table
-- just append data
INSERT INTO countries
SELECT * FROM json.`s3a://area-for-iomete/countries.json`;

-- first clean an existing data and then insert new data
INSERT OVERWRITE TABLE countries
SELECT * FROM json.`s3a://area-for-iomete/countries.json`;
  • Option 3. Merge with existing data
MERGE INTO countries
USING (SELECT * FROM json.`s3a://area-for-iomete/countries.json`) updates
ON countries.id = updates.id
WHEN MATCHED THEN
UPDATE SET *
WHEN NOT MATCHED
THEN INSERT *;

Partitioned Table

Partitioning data to speed up queries or DML that have predicates involving the partition columns. Here we use country_code as a partition column

  • Option 1. Create a partitioned table from select
-- Create a partitioned table directly from the query
CREATE TABLE countries_partitioned
PARTITIONED BY (country_code)
AS SELECT * FROM json.`s3a://area-for-iomete/countries.json`
ORDER BY country_code;

-- To inspect the table use the following query
DESC TABLE EXTENDED countries_partitioned;
  • Option 2. Insert into to existing table
-- just append data
INSERT INTO countries_partitioned
SELECT * FROM json.`s3a://area-for-iomete/countries.json`
ORDER BY country_code;

-- or you can use the following command to overwerite data
INSERT OVERWRITE TABLE countries_partitioned
SELECT * FROM json.`s3a://area-for-iomete/countries.json`
ORDER BY country_code;
  • Option 3. Merge with existing data
MERGE INTO countries_partitioned
USING (SELECT * FROM json.`s3a://area-for-iomete/countries.json`) updates
ON countries_partitioned.id = updates.id
WHEN MATCHED THEN
UPDATE SET *
WHEN NOT MATCHED
THEN INSERT *;

Conclusion

It's extremly easy to query files in AWS S3 with IOMETE. You just need to provide the bucket name and the path to the file you want to query.