Using ChatGPT for data engineering

· 5 min read
Aytan Jalilova

When we need information, many of us turn to search engines like Google. However, with the growth of user-generated content on the web, finding the right answer can be a challenge. That's where chatbots like ChatGPT come in. Unlike traditional search engines, they answer directly and can take the context of a conversation into account. This makes them a powerful tool for data engineers who need quick answers to complex problems. In this article, we'll explore how data engineers are using AI on the job, and how it's changing the way they work.

Dependency management

One of the most challenging aspects of data engineering is managing dependencies. With complex data pipelines, it can be difficult to keep track of all the dependencies and ensure that everything is working correctly. AI tools like ChatGPT can help with this by providing quick solutions to common problems. For example, you can ask ChatGPT to write a Python script that inverts a dependency tree, and have it working in just a few minutes.
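To give a sense of what such a script might look like, here is a minimal sketch of inverting a dependency tree. The function name `invert_dependencies` and the pipeline task names are made up for illustration; the input is assumed to be a plain dictionary mapping each task to the tasks it depends on.

```python
def invert_dependencies(depends_on):
    """Turn {task: [upstream tasks]} into {task: [downstream tasks]}."""
    dependents = {}
    for task, upstreams in depends_on.items():
        # Keep every task in the result, even if nothing depends on it.
        dependents.setdefault(task, [])
        for upstream in upstreams:
            dependents.setdefault(upstream, []).append(task)
    # Sort for a stable, readable result.
    return {name: sorted(tasks) for name, tasks in dependents.items()}


# Hypothetical pipeline: each key lists the tasks it depends on.
pipeline = {
    "load_orders": [],
    "load_customers": [],
    "join_sales": ["load_orders", "load_customers"],
    "daily_report": ["join_sales"],
}

inverted = invert_dependencies(pipeline)
print(inverted["load_orders"])  # tasks affected if load_orders fails
```

The inverted view answers the question "if this task breaks, what else is affected?", which is often the one you actually need when debugging a pipeline.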

Regex patterns

ChatGPT can also be used to generate regular expressions. Data engineers often use regular expressions to extract specific patterns from text, but writing one can be time-consuming and challenging. With ChatGPT, a data engineer can describe the pattern in a prompt and get back a regular expression for the task at hand.
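As an illustration, suppose the prompt was "extract the table name, date, and extension from file names like daily_orders_2023-04-01.csv". The sketch below shows the kind of pattern that could come back, together with a quick check on one matching and one non-matching input. The file naming scheme and the names `FILE_PATTERN` and `parse_file_name` are assumptions made for this example.

```python
import re

# Assumed naming scheme: <table>_<YYYY-MM-DD>.<csv|parquet>
FILE_PATTERN = re.compile(
    r"^(?P<table>[a-z_]+)_(?P<date>\d{4}-\d{2}-\d{2})\.(?P<ext>csv|parquet)$"
)


def parse_file_name(name):
    """Return the named parts of a file name, or None if it does not match."""
    match = FILE_PATTERN.match(name)
    return match.groupdict() if match else None


parsed = parse_file_name("daily_orders_2023-04-01.csv")
rejected = parse_file_name("daily_orders_20230401.csv")  # date has no dashes
print(parsed)
print(rejected)
```

Testing a generated pattern against inputs that should fail is just as important as testing the ones that should pass.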

SQL dialects

Different SQL dialects can be challenging to work with. ChatGPT is an excellent tool for finding equivalent functions between them. It is much easier to ask ChatGPT than to wade through unclear results and seemingly unrelated functions from a traditional search engine. For example, if a data engineer knows a function in one dialect but not its counterpart in another, they can describe it in a prompt, and ChatGPT will suggest the equivalent syntax.
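To make "equivalent functions" concrete, here is a small sketch of the kind of mapping involved. The table is a hand-picked sample of two well-known cases, not a complete reference, and the names `EQUIVALENTS` and `equivalent_function` are hypothetical. Anything a chatbot suggests along these lines should still be confirmed against the target engine's documentation.

```python
# A tiny, hand-picked sample of equivalent functions across dialects.
EQUIVALENTS = {
    "null_fallback": {
        "postgresql": "COALESCE",
        "mysql": "IFNULL",
        "sqlserver": "ISNULL",
        "oracle": "NVL",
    },
    "string_length": {
        "postgresql": "LENGTH",
        "mysql": "CHAR_LENGTH",
        "sqlserver": "LEN",
        "oracle": "LENGTH",
    },
}


def equivalent_function(name, source, target):
    """Look up the target-dialect counterpart of a source-dialect function."""
    for by_dialect in EQUIVALENTS.values():
        if by_dialect.get(source) == name.upper():
            return by_dialect.get(target)
    return None  # not in this sample; check the documentation instead


print(equivalent_function("nvl", "oracle", "sqlserver"))
print(equivalent_function("len", "sqlserver", "mysql"))
```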

Error messages

Error messages can be a nightmare for data engineers, especially when working on complex projects. Chatbots like ChatGPT can help simplify this process by providing quick explanations of error messages and suggesting fixes. This can save data engineers hours of time and frustration.

Troubleshooting code

Data engineering involves managing large volumes of data, which means complex data structures to manage, processing pipelines to design, and a lot of troubleshooting. ChatGPT can help data engineers with a range of these tasks, from troubleshooting code to generating API calls.

Generating Mermaid diagrams

One of the most popular use cases for ChatGPT among data engineers is generating Mermaid diagrams. Data engineers often use Mermaid diagrams to sketch out architecture and processes, and ChatGPT can be used to generate these diagrams based on descriptions. For instance, if a data engineer wants to create a flow chart to describe how an ETL process works, they can provide a prompt to ChatGPT, and it will generate a Mermaid diagram that can be pasted into the Mermaid live editor.
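For readers who have not seen Mermaid before, the sketch below builds the kind of flowchart text involved for a simple three-step ETL process. The helper `to_mermaid_flowchart` and the step labels are made up for this example; the output uses Mermaid's standard `flowchart LR` syntax.

```python
def to_mermaid_flowchart(steps):
    """Render an ordered list of step labels as a left-to-right flowchart."""
    lines = ["flowchart LR"]
    # Declare one node per step, with a generated id and a quoted label.
    for index, label in enumerate(steps):
        lines.append(f'    s{index}["{label}"]')
    # Connect each step to the next one.
    for index in range(len(steps) - 1):
        lines.append(f"    s{index} --> s{index + 1}")
    return "\n".join(lines)


diagram = to_mermaid_flowchart(
    ["Extract from API", "Transform with SQL", "Load to warehouse"]
)
print(diagram)
```

The printed text can be pasted into the Mermaid live editor to render the diagram.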

SQL queries

Data engineers also use ChatGPT to help them write SQL queries. ChatGPT is not great at generating complex SQL queries, but it can be helpful for simple queries that use syntax a data engineer is not familiar with. For example, if a data engineer knows what they want to do but doesn't know the correct syntax, they can describe the task in a prompt, and ChatGPT will generate a query that they can then test against their own data.
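One low-effort way to check a generated query is to run it against a tiny in-memory table where you already know the expected answer. The sketch below does this with Python's built-in `sqlite3` module. The `orders` table, its rows, and the query itself are made up for illustration; the query stands in for what a chatbot might return for "total spend per customer, only for customers above 100".

```python
import sqlite3

# Stand-in for a generated query; treat it as untrusted until tested.
QUERY = """
SELECT customer, SUM(amount) AS total
FROM orders
GROUP BY customer
HAVING SUM(amount) > 100
ORDER BY total DESC
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ana", 80.0), ("ana", 40.0), ("bob", 60.0), ("cy", 150.0)],
)

# ana totals 120, bob 60, cy 150, so only cy and ana should come back.
rows = conn.execute(QUERY).fetchall()
conn.close()
print(rows)
```

SQLite is only a convenient test bed here. If the query targets another engine, dialect differences can still bite, so a final check on the real system is worthwhile.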

Writing code

ChatGPT has also proven useful for writing code. It saves time otherwise spent browsing Stack Overflow posts and is helpful for getting preliminary or outline code for a large task. The AI can also help explain error messages and suggest fixes: if you paste in the failing code or a snippet, ChatGPT can often point to the problem. This has saved data engineers hours, if not more.

Data modeling

Data engineers use ChatGPT to help them with data modeling. It can answer general questions about the topic, and data engineers can ask follow-up questions and request examples. This is helpful for those who are new to data modeling or who need help with a specific aspect of it.

Generating sample data

ChatGPT can also be used to generate sample data. Data engineers often need sample data to test their code and to create mock data for various purposes. ChatGPT can generate somewhat realistic data based on the SQL DDL for a table. Data engineers can paste the table's DDL into a prompt, and ChatGPT will generate sample data that can be used for testing or mock data generation.
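Whatever the source of the sample rows, it helps to load them into a table created from the same DDL, so that type or constraint mismatches surface early. The sketch below shows the idea with Python's built-in `sqlite3` and `random` modules. The `users` table, its columns, and the `make_rows` helper are assumptions for this example, and the rows are produced locally with a fixed seed rather than by a chatbot.

```python
import random
import sqlite3

# Hypothetical DDL of the kind you might paste into a prompt.
DDL = """
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    signup_day TEXT NOT NULL,
    plan TEXT NOT NULL
)
"""


def make_rows(count, seed=0):
    """Build `count` mock rows matching the users table, reproducibly."""
    rng = random.Random(seed)  # fixed seed gives the same rows every run
    names = ["Ana", "Bob", "Chen", "Dina"]
    plans = ["free", "pro"]
    return [
        (i, rng.choice(names), f"2023-01-{rng.randint(1, 28):02d}", rng.choice(plans))
        for i in range(1, count + 1)
    ]


conn = sqlite3.connect(":memory:")
conn.execute(DDL)
sample = make_rows(5)
# The insert fails loudly if a row violates the table definition.
conn.executemany("INSERT INTO users VALUES (?, ?, ?, ?)", sample)
row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
conn.close()
print(row_count)
```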

Writing documentation

ChatGPT is also useful in writing documents such as OKRs, where it can save a full day's worth of back and forth about small details. It can summarize and produce a cutover plan as well as the next steps for the user.

Will ChatGPT replace data engineers?

While chatbots like ChatGPT can be powerful tools for data engineers, they are not always reliable. In some cases, they may provide misleading or incorrect answers. This is known as "hallucination" in the AI literature. Data engineers need to be vigilant and double-check any answers provided by chatbots like ChatGPT to ensure that they are correct.

In conclusion, AI tools like ChatGPT are changing the way data engineers work. By providing quick solutions to common problems, these tools can save data engineers hours of time and frustration. However, it's essential to be vigilant and double-check any answers provided by these tools to ensure that they are correct. Overall, AI is a powerful tool that can help data engineers to be more efficient and effective in their work.