The event organized by dbt Labs was back this year. You could attend in person in New Orleans or watch the talks online.
With dbt adoption on the rise, we expected a lot from this conference. The sessions were not limited to the use of dbt: there were, for instance, sessions about career tracks for data teams.
Without further delay, here are, in my opinion, the key lessons from this edition:
Let’s dive into the details.
Python models, finally!
It was certainly the most anticipated feature. You can now execute Python models, and they behave much like SQL models.
This feature is a game changer. I think many of us have run into the same issue: a workflow that can't run end to end in dbt because one or two operations are very tricky to write in SQL. This is painful because it requires an extra layer, and nobody wants to manage the back and forth between dbt and another component.
This was the case in particular for advanced statistics, text manipulation, and everything ML-related (feature engineering, data enrichment, etc.). Those edge cases are the target use cases of Python models. The product managers were very clear during the keynote: Python models are meant for basic use cases that involve data transformations. Calling external APIs is not recommended.
So, how does it work?
First, similar to SQL models, the code will be executed on your cloud data platform.
Second, in the same way as SQL models, you must adapt your syntax to the underlying cloud platform. In SQL, you need to use the appropriate SQL dialect. In Python, the set of available libraries differs from one platform to another.
The feature is available on three data platforms as of today:
For example, if you use Snowflake, you can leverage Snowpark for your transformations. Note that the feature is still in its early days, as mentioned by Eda Johnson and Venkatesh Sekar in their talk “Empowering pythonistas with dbt and snowpark”. Snowpark is still in public preview.
As stated during the keynote, there is room for improvement to get closer to the experience of a Python software engineer: facilitating code reuse across models, providing test capabilities, using docstrings for documentation, and so on.
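To make this more concrete, here is a minimal sketch of what a Python model can look like. It is an illustration, not an official example: the file name and the upstream model `stg_orders` are hypothetical, and I assume the DataFrame returned by `dbt.ref()` exposes a `drop_duplicates()` method, which is the case for pandas, PySpark, and Snowpark DataFrames as far as I know.

```python
# models/orders_deduplicated.py  (hypothetical file name)

def model(dbt, session):
    # Same idea as the config() block of a SQL model.
    dbt.config(materialized="table")

    # dbt.ref() returns a DataFrame for the upstream model. Its concrete
    # type depends on the platform (for example a Snowpark DataFrame on
    # Snowflake). "stg_orders" is a hypothetical upstream model.
    orders = dbt.ref("stg_orders")

    # Any DataFrame transformation supported by the platform goes here.
    # drop_duplicates() is assumed to exist on the returned DataFrame.
    deduplicated = orders.drop_duplicates()

    # The DataFrame you return is what dbt materializes.
    return deduplicated
```

As with a SQL model, you never call this function yourself: dbt calls it at run time and passes in the `dbt` object and the platform `session`.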
A lot of improvements for dbt Cloud
A few months ago, a blog post entitled “We need to talk about dbt”, written by Pedram Navid, made waves. Tristan Handy, the CEO of dbt Labs, replied to Pedram’s concerns, especially the ones about dbt Cloud. In the original blog post, the long-time dbt practitioner pointed out the poor experience he had with dbt Cloud. Tristan agreed that they should work hard to improve the developer experience.
And they did! This week, dbt Labs announced a complete revamp of the cloud IDE, UI improvements, and a reduction of the latency for common operations such as saving a file.
This is good news for dbt Cloud adopters!
The semantic layer is a structural shift in the way you manage your data
This is a hot topic!
During the keynote, the speakers defined the semantic layer as the “platform for compiling and accessing dbt assets in downstream tools”.
The semantic layer aims to solve common data governance challenges:
The goal here is to extend the scope of dbt. For now, that scope is limited to the transformation layer. The semantic layer would sit on top of the transformation layer.
This makes sense. Metrics were introduced in version 1.0, and that was the first step toward the vision of a semantic layer.
dbt at the heart of the modern data stack ecosystem
What struck me during this conference is the number of partnerships announced. A majority of the talks were also given by partners.
Software vendors like Atlan, Collibra, or Monte Carlo need to integrate with dbt because their customers ask them to. dbt is slowly becoming the standard for data transformation. You want to see your transformations in your global data lineage, which might be managed with an external tool like Collibra. You also want to monitor the results of your dbt tests with your preferred tool, and so on. You need integration between your tools.
Unlike Dataform, the only competitor to dbt as of today, I have the feeling that dbt Labs wants to remain cloud-neutral. They offer a lot of integrations with niche solutions, for example to better manage your data quality or your metadata.
That’s a wrap! This edition was very rich, and we end the week with plenty of discussions about the announcements. That’s what makes this job exciting!
Speaking of which, we’re hiring at Artefact! I’m sure you didn’t see it coming 😉