Skip to main content

Sync data from S3

· 3 min read
Namig Aliyev

Intro

This is an end-to-end guide about how to move files from your AWS S3 to iomete and show it in the BI dashboard. In this example, we use a JSON file, but for other file types (such as CSV, Parquet, and ORC) visit docs.

Your files in AWS S3

Let's say you have a dedicated bucket where you have files you want to move to iomete:​

Your files in AWS S3

note

This bucket will be different in your case. This is just an example bucket for demonstration purpose. We want to query/migrate countries.json file in the iomete platform:

The file (countries.json) we want to move to iomete

You could download the countries.json file for yourself with this command:

wget https://iomete-public.s3.eu-central-1.amazonaws.com/datasets/countries.json​

Create a storage integration in iomete

  1. Choose AWS External Storage: Create a storage integration in iomete

  2. Specify a name and enter your AWS S3 Location to create integration between to: Specify a name

  3. Once it is created copy policies created to be added to your S3 Bucket permissions: Generated bucket policies in iomete User bucket policies

  4. Go to your AWS S3 Bucket and add generated JSON policy to your S3 Bucket's Permission: S3 permissions Edit bucket policy

Create warehouse

Create a new warehouse instance and specify the storage integration you created in the previous step.

Create warehouse

Moving Data

In the SQL Editor, you should be able to query the file and migrate to iomete using the following methods. Querying JSON file data without moving to iomete:

Moving Data

Once you decided that you want to move data to iomete you could use the following commands: ‍

Option 1. Create a table from select

CREATE TABLE countries USING delta
AS SELECT * FROM json.`s3a://my-staging-area-for-iomete/countries.json`

Option 2. Insert into to existing table

-- just append data
INSERT INTO countries
SELECT * FROM json.`s3a://my-staging-area-for-iomete/countries.json`

-- or you can use the follwing command to overwerite data
INSERT OVERWRITE TABLE countries
SELECT * FROM json.`s3a://my-staging-area-for-iomete/countries.json`Insert to the existing table.

‍Option 3. Merge with existing data

MERGE INTO countries
USING (SELECT * FROM json.`s3a://my-staging-area-for-iomete/countries.json`) updates
ON countries.id = updates.id
WHEN MATCHED THEN
UPDATE SET
*WHEN NOT MATCHED
THEN INSERT *

Visualize Data

First, let's create a view with clean column names to be used in BI dashboarding:

CREATE OR REPLACE VIEW countries_view
AS SELECT
country_code,
region,
`SP.POP.TOTL` AS sp_pop_totl
FROM countries;

Open BI Application

Open BI Application

Add new database connection.

Select Data -> Databases from the menu: Add new database connection.

Choose Database Type. Here you need to choose Apache Hive from the dropdown: Choose Database Type

Replace iomete_username and warehouse_name with your values.

hive://:XXXXXXXXXX@-warehouse-thriftserver:10000/?auth=CUSTOM&transport_mode=http

Connect database

Add new dataset

From the menu choose Data -> Dataset and click + Dataset button on the right top corner

Add new dataset

Create a new chart

Click on the newly created dataset countries_view which opens chart view. Choose the visualization type and corresponding settings:

Create new pie chart

Save this chart to the dashboard too and navigate to the dashboard. And, here is the dashboard of the Countries that we just created:

Created dashboard

That was easy...