
Sync data from JDBC sources

Namig Aliyev · 4 min read

Intro

This is an end-to-end guide to migrating tables from JDBC sources (MySQL, PostgreSQL, etc.) to IOMETE and displaying them in a BI dashboard.

info

First, you need to establish an SSH tunnel between IOMETE and the database in your private network. See Database Connection Options for details.

Database to migrate

Let's assume that we want to replicate a MySQL database (or any other supported JDBC database) to the IOMETE warehouse.

info

In this tutorial, we will be using a publicly accessible iomete-tutorial database instance that contains the Employees Sample Database.

info

If you are connecting to your own database instance, see Database Connection Options for details.


Here are the details of the iomete-tutorial public database:

Host: iomete-tutorial.cetmtjnompsh.eu-central-1.rds.amazonaws.com
Port: 3306
Username: tutorial_user
Password: 9tVDVEKp

The database contains the following tables:

Table name      Row count
employees       300024
departments     9
dept_manager    24
dept_emp        331603
titles          443308
salaries        2844047

Create lakehouse

Create a new lakehouse instance:

[Screenshot: SQL Editor]

Querying Source Table

Once the lakehouse is created, we create a proxy table over the JDBC source using the CREATE TABLE command. In the OPTIONS clause we specify the credentials of the database we want to connect to, as follows (see JDBC Sources):

CREATE TABLE IF NOT EXISTS employees_proxy
USING org.apache.spark.sql.jdbc
OPTIONS (
  url "jdbc:mysql://iomete-tutorial.cetmtjnompsh.eu-central-1.rds.amazonaws.com:3306/employees",
  dbtable "employees.employees",
  user "tutorial_user",
  password "9tVDVEKp"
);

SELECT * FROM employees_proxy LIMIT 100;

info

This table doesn't hold the actual data. Data is retrieved from the source each time the table is queried.
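
As an optional sanity check, you can count the rows through the proxy table; the result should match the employees row count listed above (300024). Because the data is fetched from the MySQL source, this may take a moment on large tables.

-- count rows via the proxy table; expected result: 300024
SELECT COUNT(*) FROM employees_proxy;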

[Screenshot: SQL Editor]

Migrating Data

To move the data from the source to the warehouse, you can use one of the following options:

Option 1. Create a table from a SELECT

-- Create the table directly from the query
CREATE TABLE employees USING delta
AS SELECT * FROM employees_proxy;

To inspect the table, use the following query:
DESC TABLE EXTENDED employees;
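
Optionally, you can verify that the migration copied everything by comparing the row counts of the new Delta table and the proxy table. This is just a quick sketch using scalar subqueries to keep it to a single statement; both counts should be 300024 for the employees table.

-- compare the migrated table against the source proxy
SELECT
  (SELECT COUNT(*) FROM employees)       AS lakehouse_rows,
  (SELECT COUNT(*) FROM employees_proxy) AS source_rows;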

Option 2. Insert into an existing table

-- just append data
INSERT INTO employees
SELECT * FROM employees_proxy;

-- or use the following command to overwrite data:
-- it first cleans the existing data and then inserts the new data
INSERT OVERWRITE TABLE employees
SELECT * FROM employees_proxy;

Option 3. Merge with existing data

MERGE INTO employees
USING (SELECT * FROM employees_proxy) updates
ON employees.emp_no = updates.emp_no
WHEN MATCHED THEN
  UPDATE SET *
WHEN NOT MATCHED THEN
  INSERT *;
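
The MERGE option is handy for incremental syncs: re-running it updates changed rows and inserts new ones, keyed on emp_no. If you want to limit how much data is pulled from the source on each run, you can filter the proxy query. The sketch below filters on the source's hire_date column with an arbitrary cutoff; pick whatever column and value fit your own data.

-- sketch of an incremental merge: only scan recently added employees
-- (the hire_date cutoff is an arbitrary example value)
MERGE INTO employees
USING (
  SELECT * FROM employees_proxy
  WHERE hire_date >= DATE '1999-01-01'
) updates
ON employees.emp_no = updates.emp_no
WHEN MATCHED THEN
  UPDATE SET *
WHEN NOT MATCHED THEN
  INSERT *;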

Visualize Data

Before moving on to BI visualization, let's also migrate the employees.salaries table:

CREATE TABLE IF NOT EXISTS salaries_proxy
USING org.apache.spark.sql.jdbc
OPTIONS (
  url "jdbc:mysql://iomete-tutorial.cetmtjnompsh.eu-central-1.rds.amazonaws.com:3306/employees",
  dbtable "employees.salaries",
  user "tutorial_user",
  password "9tVDVEKp"
);

CREATE TABLE salaries USING delta
AS SELECT * FROM salaries_proxy;
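
As before, you can optionally confirm the copy by checking the row count against the number listed at the top of this guide (2844047 for salaries):

-- expected result: 2844047
SELECT COUNT(*) FROM salaries;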

Create a view joining the employees and salaries tables:

CREATE OR REPLACE VIEW employee_salaries AS
SELECT e.emp_no, e.first_name, e.last_name, e.gender, s.salary
FROM employees e
JOIN salaries s ON e.emp_no = s.emp_no;
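
To make sure the view returns sensible data before wiring up a dashboard, you can run the kind of aggregate a BI chart would use. This is just an example query over the columns defined in the view above:

-- average salary and record count per gender
SELECT gender,
       ROUND(AVG(salary), 2) AS avg_salary,
       COUNT(*)              AS salary_records
FROM employee_salaries
GROUP BY gender;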

BI Integrations

Visit this page to learn more about BI Integrations.

Congratulations! You did it!

Bonus: there is a dedicated Python library that helps automate this table replication with just a configuration. Please check out Syncing JDBC Sources.