Skip to main content

Starting with Notebook

Jupyter Notebook is a powerful tool for interactive data exploration and prototyping. By connecting to IOMETE's Jupyter Gateway you will be able to explore and analyze data stored in IOMETE's data lake directly from your local environment.

Installing Jupyter Clients

There are a lot of Jupyter clients that you can use to connect to Jupyter Gateway. To quickstart we will provide installation instruction for official web-based client Jupyter Lab. For more detailed information please refer to official website.

To install Jupyter Notebook on your local machine, you can use pip (Python Package Installer). Open your terminal or command prompt and type the following command:

pip install jupyterlab

Once the installation is complete you will need to connect to Jupyter Gateway provided by IOMETE. Please follow the instructions below.

P.S: We recommend to use Visual Stuido Code as your Jupyter Client. For more information read this.

Connecting to Jupyter Gateway

Jupyter Gateway is used to manage Jupyter kernels and provide access to remote kernels. To connect to a Jupyter Gateway, you will need to specify the gateway URL and token.
Go to IOMETE Console app → Notebook menu → Copy connection string as in screenshot below

Connection String | IOMETEConnection String | IOMETE

The --gateway-url is different for each workspace.
Below you can find a template for connection string depending on your client.

Jupyter Lab

Replace values in the template. You can find them in your IOMETE Console app.

jupyter-lab --gateway-url=GATEWAY_URL \
--GatewayClient.request_timeout=180 \
--GatewayClient.auth_token=TOKEN

Visual Studio Code

Replace values in the template. You can find them in your IOMETE Console app.

https://GATEWAY_URL/?token=TOKEN

Kernels Overview

Currently IOMETE supports 2 python-based kernels. Creating custom kernels will be available soon.

  1. Pyspark Kernel
  2. Python Kernel with pre-intsalled Mongo libraries
Available Kernels | IOMETEAvailable Kernels | IOMETE

When starting/selecting a kernel the Spark driver will be deployed in the background. This process may take up to 2 minutes. Please be patient. You can track the status of kernel from client UI or from IOMETE Console app under the section "Active Kernels".

Active Kernels | IOMETEActive Kernels | IOMETE

By default kernels are terminated after closing the client. You can do it manually from IOMETE Console app.

caution

For cost-saving purposes it is strongly recommended to make sure that you closed the client, after finishing your job, and no kernels are running.

Accessing SparkSession object

To access the SparkSession in Python code, you can use the SparkSession.builder method to create a new session. Once you have a session, you can use it to create DataFrames, read data from various sources, and execute SQL queries.

For example, you can use the following code:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("JupiterClient").getOrCreate()

df = spark.sql("show databases")
df.show()

Customizing Spark Environment

Kernels are basically Spark Applications, you can provide and override various spark parameters (like environment variables, executors count, dynamic allocation, etc).

To customize the Spark environment in Jupyter Notebook go to
IOMETE Console app → Notebook menu → Built-in Kernels → Select kernel → Edit

Edit Kernels | IOMETEEdit Kernels | IOMETE

Another method to modify the Spark configuration properties is directly from python code.
You can modify the Spark configuration properties by creating a SparkConf object and setting the properties you want to modify. For example, to set the log level to WARN, you can use the following code:

from pyspark.conf import SparkConf
from pyspark.sql import SparkSession

conf = SparkConf().set("spark.log.level", "WARN")
spark = SparkSession.builder.config(conf=conf).getOrCreate()

But this configuration will only be applied to the current Notebook session.

Installing additional libraries

To install and use additional Python libraries in Jupyter Notebook, you can use pip. For example, to install the NumPy library, you can use the following commands (install and load):

%pip install numpy
%load_ext numpy

This will install the NumPy library and its dependencies

What's next?