How to get the Job ID, Run ID & Start Time for a Databricks Job with working code

Canadian Data Guy | Moved To Substack

Published in

Towards Dev

2 min read · Feb 23, 2023

Table Of Contents:

∘ Step 1: Pass the parameters
∘ Step 2: Get/Fetch and print the values
· Advanced & quicker method to implement
· Footnote

When running ELT jobs, it is crucial to keep track of task parameter variables such as job_id, run_id, and start_time. These values are generated by the system, and you can save or print them for future reference. The Databricks documentation lists all supported task parameter variables.

This is a simple two-step process:

1. Pass the parameters when defining the job/task.
2. Get/fetch and print the values.

Step 1: Pass the parameters
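
As a rough sketch of what this step can look like when the task is defined as JSON rather than through the UI, the mapping below pairs each parameter name read in Step 2 with a task parameter variable in double curly braces. The notebook path is a hypothetical placeholder, and the `notebook_task` / `base_parameters` field names are an assumption about how your job definition is structured; check them against your own workspace. Other variables, such as start_time, would presumably be passed the same way.

```python
import json

# Keys are the parameter names the notebook reads in Step 2.
# Values are task parameter variables in double curly braces; they are
# plain strings here and are only substituted when the job run starts.
base_parameters = {
    "job_id": "{{job_id}}",
    "run_id": "{{run_id}}",
    "parent_run_id": "{{parent_run_id}}",
    "task_key": "{{task_key}}",
}

# Hypothetical notebook task settings; the notebook path is a placeholder.
notebook_task = {
    "notebook_path": "/Workspace/path/to/your_notebook",
    "base_parameters": base_parameters,
}

# Print the settings so they can be pasted into a job definition.
print(json.dumps(notebook_task, indent=2))
```

The same key/value pairs can be entered by hand in the task's parameters section of the job UI.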

Step 2: Get/Fetch and print the values

# Each widget name must match a parameter key defined in Step 1.
print(f"""
  job_id: {dbutils.widgets.get('job_id')}
  run_id: {dbutils.widgets.get('run_id')}
  parent_run_id: {dbutils.widgets.get('parent_run_id')}
  task_key: {dbutils.widgets.get('task_key')}
  """)

When you run the job, the cell output should list the job_id, run_id, parent_run_id, and task_key values for that run.
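
If the same notebook is also run interactively, the parameters will not have been passed and the widget lookups are likely to fail. One possible way to handle that, sketched below, is a small helper that falls back to None for any missing value. `read_job_params` and `fake_widget_get` are hypothetical names introduced for this illustration; on Databricks you would pass `dbutils.widgets.get` as the getter.

```python
from typing import Callable, Dict, Iterable, Optional

def read_job_params(
    get_widget: Callable[[str], str],
    names: Iterable[str] = ("job_id", "run_id", "parent_run_id", "task_key"),
) -> Dict[str, Optional[str]]:
    """Read each named parameter, using None when it is not available."""
    values: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            values[name] = get_widget(name)
        except Exception:
            # Assumption: the getter raises when the parameter was not passed.
            values[name] = None
    return values

# Stand-in getter so the sketch runs outside Databricks; the values are made up.
def fake_widget_get(name: str) -> str:
    passed = {"job_id": "111", "run_id": "222"}
    return passed[name]  # raises KeyError for anything not passed

params = read_job_params(fake_widget_get)
print(params)
```

Inside a job notebook the call would be `read_job_params(dbutils.widgets.get)`.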

Advanced & quicker method to implement

Add the following boilerplate code at the top of the notebook. It captures the entire notebook context instead of individual parameters, and you can parse out whatever information is useful to you.

Note: this approach depends on how the context object is implemented, so its attributes are subject to change without notice.

import json, pprint

dict_job_run_metadata = json.loads(dbutils.notebook.entry_point.getDbutils().notebook().getContext().toJson())

print(f'''      
      currentRunId: {dict_job_run_metadata['currentRunId']}
      jobGroup: {dict_job_run_metadata['jobGroup']}
      ''')

# Pretty print the dictionary
pprint.pprint(dict_job_run_metadata)
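
Because the context attributes may change, it can be safer to look keys up with a default rather than index them directly. The sketch below shows one way to do that; `extract_run_info` is a hypothetical helper, and the sample JSON is made up purely so the example runs outside Databricks.

```python
import json
from typing import Any, Dict, Iterable

def extract_run_info(
    context_json: str,
    keys: Iterable[str] = ("currentRunId", "jobGroup"),
) -> Dict[str, Any]:
    """Parse the context JSON and return the requested keys, or None if absent."""
    context = json.loads(context_json)
    return {key: context.get(key) for key in keys}

# Made-up sample standing in for the real context JSON string.
sample_context_json = json.dumps(
    {"currentRunId": {"id": 123}, "jobGroup": "example-group"}
)

print(extract_run_info(sample_context_json))
```

In a notebook, the string passed in would be the result of the `toJson()` call shown above.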

Footnote

Thank you for taking the time to read this article. If you found it helpful or enjoyable, please clap to show your appreciation and help others discover it. Don’t forget to follow me for more insightful content, and visit my website CanadianDataGuy.com for additional resources and information. Your support and feedback are essential to me, and I appreciate your engagement with my work.


Responses (1)


Sudip Pandit

Feb 20

This is great post. I really appreciate it. When I use these I see the parent_run_id value as job_run_id in workflow. Can we treat this as batch_id? As I tried to get the workflow url link for exception handling, this parent run id comes at the last part of the URL.
