Daniel Avancini, Chief Data Officer, Indicium, on dbt as the most ubiquitous tool of the modern data platform.
I still remember the first time I came across dbt. It was sometime in mid-2019 and, like most data consulting companies at the time, our work consisted mostly of code-based ETL pipelines built on lower-level cloud infrastructure, the kind you could find in most startup engineering blogs.
As a relatively small consultancy in a very hot data engineering market, we found it difficult to scale our team without a different approach. In addition, most of our team did not have an engineering background at the time. Our CTO was adamant about applying proper software engineering best practices to data, drawing from his experience working with big-data pipelines in his previous role. After reading some of dbt Labs’ CEO Tristan Handy’s blog posts about analytics engineering and dbt, it became clear to us that dbt could be the missing piece that would enable our analysts to work like engineers. Or, to put it simply, to become analytics engineers.
In hindsight, the genius of the early versions of dbt (it was just called dbt back then, but now it would be dbt Core) was not the complexity of its code or features, but rather its simplicity. Most legacy ETL tools that cater to non-engineering professionals, such as Informatica or Pentaho, were clunky, full of distracting features and, worst of all, supported almost none of the software engineering best practices that are a must for modern data work.
On the other hand, working with modern data platforms such as Snowflake and Databricks requires much deeper technical knowledge than a typical data analyst would have, making it a data engineer-only realm. That meant that for most companies, despite being able to build data pipelines orders of magnitude faster than the previous technology allowed, there was a real constraint on scaling the data org, since so few professionals could work on those pipelines. Worse, many data engineers dislike talking to business users or even writing SQL queries at all, so the data organization inevitably stayed far from the lines of business, where the business value of data lives.
Despite dbt being in its early stages, we built an entirely new analytics engineering practice on top of it, made up of professionals without a software engineering background but with very good analytical skills. To accelerate that movement, we developed our analytics engineering course, open to the public, which has since trained more than 1,000 analytics engineers who work for Indicium, for our customers, or at other companies. Today, we are among the top certified dbt partners worldwide. There is no doubt that dbt is a big deal for any modern data team.
But what about dbt Cloud? Well, for many early adopters like us, dbt Core was already good enough for our work. Also, many features launched with the first versions of dbt Cloud had already been developed by our platform teams or by the open-source community. Until recently, there was little value for us in moving to dbt Cloud. And don’t get me wrong, dbt Cloud needs a lot of those features to be a good tool in itself. The problem for dbt Labs was that many companies adopting dbt, as they left Plato’s cave of modern data stack ignorance, found so many possibilities to improve their data platform best practices with dbt that their platform teams became advanced dbt users, which was not the main user persona of dbt Cloud. But then, who is?
I believe that there are three main personas for dbt Cloud: a) companies that are born into the modern data stack and don’t have, or don’t want to keep, a large data team; b) enterprise companies that want to scale their dbt Core implementation into the lines of business and want a tool that lets them implement data management and data governance best practices while keeping the complexity low for less technical LOB analytics teams; and c) companies that are relatively late in adopting a cloud data warehouse and are just now migrating away from legacy data tech, such as Talend and Informatica. Until now, it wasn’t always compelling enough for some of these personas to adopt and implement dbt Cloud. So, why do I think that will change?
In my opinion, the new announcements from dbt Labs at this year’s Coalesce conference all point in the right direction. First, dbt Labs is acknowledging that it has to do more than just data transformation if dbt is to be the single data tool for smaller organizations and other companies without a dedicated data platform team. Features like orchestration, data cataloging or even data ingestion are all necessary. Today, each of them requires a separate tool, and the resulting set can be hard to combine and expensive. The vision of dbt becoming a data control panel is good and goes hand in hand with the consolidation trend we at Indicium have seen in the modern data stack space in the past few years.
Arguably, the biggest announcement at Coalesce was the One dbt strategy. First, there is real value in a hybrid approach of dbt Core and dbt Cloud, with the former used by platform or CoE-style teams and the latter focused on less technical LOB teams. A first-class experience for this hybrid approach in dbt Cloud is a must-have for many of our enterprise customers. Second, while most advanced features of dbt Cloud had already been developed internally by dbt power users, this is not the case for hybrid cloud and data mesh architectures. There is no single tool or platform that can deal with this ever more common practice in the enterprise, even when using the same cloud provider (e.g. Databricks + Snowflake platforms).
With Iceberg becoming the de facto standard for modern data storage, there is a real opportunity for dbt to become the missing piece between those data platforms, allowing teams to develop their tools without losing governance and DataOps best practices.
Finally, while the debate between code-based and no-code/low-code development for data transformation is long-standing, a low-code option is a must-have for less technically minded engineers and a very common requirement for enterprises. Having this feature inside dbt Cloud and integrated with the dbt development lifecycle is a good move by dbt Labs.
I’m confident that dbt is the most ubiquitous tool of the modern data platform. More than just a tool, dbt has allowed companies to close the gap between business and data with the rise of the analytics engineering role. With the new strategy and release announcements, dbt Cloud is solving real technical problems and serving business needs that dbt Core cannot serve alone, and I can see more and more use cases where dbt Cloud provides a compelling advantage over running dbt Core.