What is Data Mesh? How is it different from a data lake?
The term was coined by Zhamak Dehghani, a Thoughtworks consultant and evangelist for data decentralization. In simple terms, Data Mesh is a distributed architecture approach for managing analytical data. It allows end users to access and query data where it resides, without first transporting it to a data lake or warehouse. A decentralized Data Mesh strategy treats data as a product and gives domain-specific teams ownership of their data through a self-service platform that has embedded data governance.
Data Lakes are minimally governed storage areas for raw domain data. They were meant to provide unlimited access to data in an attempt to avoid the bottleneck of centralized, tightly governed data warehouses, but they tended to suffer from poor data quality and discoverability issues. Certain governed data lake projects have addressed these issues with a modicum of success, but they tend to reduce the relative accessibility of the data as a result. Data Mesh aims to solve these challenges through decentralization, thereby avoiding the so-called “data swamps” that poorly governed lakes turn into.
What is meant by “data as a product”?
I think about it a bit like the app store: when you want to do something new, you just download an app. Why shouldn’t it be that way with data? Think about it structurally: what are the components of a data product?
It has to be discoverable: people need to be able to find the data product;
It needs to be addressable: people need to know how to interact with it;
It needs to be self-describing;
It needs to be secure and trustworthy;
It needs to offer interoperability.
All of this suggests that a data product sits on a fabric that allows it to interact. It’s not in isolation. You can’t just throw some data together and stick it in an S3 bucket and call it a data product. You have to wrap ownership and governance around it.
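As a rough illustration of those components, the sketch below models a data product as a small descriptor plus a catalog that refuses to register anything missing one of the traits. It is a minimal sketch under stated assumptions: the DataProduct fields, the missing_traits checks, the Catalog class and the mesh:// address scheme are all hypothetical names chosen for this example, not part of any Data Mesh standard or product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical descriptor; the field names are illustrative only."""
    name: str
    owner: str                       # the accountable domain team
    address: str                     # addressable: stable URI for consumers
    schema: dict[str, str]           # self-describing: column name -> type
    description: str = ""            # discoverable: what the product contains
    tags: frozenset[str] = frozenset()            # discoverable: search terms
    allowed_roles: frozenset[str] = frozenset()   # secure: who may read it
    formats: frozenset[str] = frozenset({"parquet"})  # interoperable outputs


def missing_traits(product: DataProduct) -> list[str]:
    """Return the traits the product fails to demonstrate (assumed rules)."""
    problems = []
    if not product.description or not product.tags:
        problems.append("discoverable")
    if "://" not in product.address:
        problems.append("addressable")
    if not product.schema:
        problems.append("self-describing")
    if not product.owner or not product.allowed_roles:
        problems.append("secure and owned")
    if not product.formats:
        problems.append("interoperable")
    return problems


class Catalog:
    """Toy in-memory registry standing in for the self-service platform."""

    def __init__(self) -> None:
        self._products: dict[str, DataProduct] = {}

    def register(self, product: DataProduct) -> None:
        problems = missing_traits(product)
        if problems:
            raise ValueError(f"{product.name} is not a data product yet: {problems}")
        self._products[product.address] = product

    def search(self, tag: str) -> list[str]:
        """Find registered product names carrying the given tag."""
        return sorted(p.name for p in self._products.values() if tag in p.tags)


if __name__ == "__main__":
    orders = DataProduct(
        name="orders",
        owner="sales-domain",
        address="mesh://sales/orders",
        schema={"order_id": "string", "amount": "decimal"},
        description="Confirmed customer orders",
        tags=frozenset({"sales"}),
        allowed_roles=frozenset({"analyst"}),
    )
    catalog = Catalog()
    catalog.register(orders)
    print(catalog.search("sales"))
```

A bare file dump with no owner, schema or description would be rejected by register, which is the point of the S3 bucket remark: the data alone is not the product, the ownership and governance wrapped around it are.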
What are the benefits for businesses?
There are many benefits Data Mesh can offer to organizations and cross-functional domain teams:
By decentralizing data, it improves speed and accessibility, so data is much more discoverable and consumable for every user in the company.
Because teams onboard their own data and manage their own data products, they can visualize it and operationalize it as they see fit, which drives innovation.
Decision-making and time to market are accelerated, which drives increased revenue, better customer engagement and retention, and ultimately lower costs.
Business agility in general improves, because product capabilities are set up only where they are needed rather than on an enterprise-wide basis.
What are the challenges to Data Mesh adoption?
It’s important to remember that Data Mesh doesn’t just require a technological shift; it also requires a mindset shift. Organizations have to learn to think about data as a product, and about data governance and ownership. Shifting businesses from centralized to decentralized ownership, and moving organizations from pipelines to products where data domains are the first-class concern, is going to take some doing.
Other challenges, as cited by Deloitte, include:
Duplication of data between different domains: when data is repurposed to meet the needs of a new domain that differs from the source domain, redundancies arise and can have a potential impact on resource utilization and the cost of data management.
Implementing federated data governance and quality compliance: with independent data products and pipelines coexisting, quality principles can easily be overlooked, resulting in extensive technical debt. These responsibilities and principles must be identified and federated appropriately.
Significant change management is required: to adopt decentralized Data Mesh operations, substantial change management efforts will be required.
Technology choices determine overall data platform capabilities: the technologies chosen must therefore be both standardized across the organization and future-proofed for all necessary data capabilities. Improper technology decisions can easily result in data products that increase technical debt over time.
Data Mesh isn’t designed to consolidate all enterprise-wide data into a single report: while the overarching goal is data accessibility, there should be freedom within a framework. In Data Mesh, data ownership and data skills are distributed among cross-functional domain teams, so key elements such as a consistent metadata framework and common platforms remain part of a successful Data Mesh implementation.
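One way to picture “freedom within a framework” is a small set of organization-wide rules that every domain’s data product metadata is checked against, while the domains keep ownership of the products themselves. The following is only a sketch under assumptions: the metadata keys (owner, quality_checks, contains_pii, classification), the three policies and the audit function are hypothetical and are not taken from Deloitte or from any governance tool.

```python
from typing import Callable, Optional

Metadata = dict[str, object]
# A policy inspects one product's metadata and returns a violation message,
# or None when the product complies.
Policy = Callable[[Metadata], Optional[str]]


def has_owner(meta: Metadata) -> Optional[str]:
    return None if meta.get("owner") else "missing owner"


def has_quality_checks(meta: Metadata) -> Optional[str]:
    return None if meta.get("quality_checks") else "no quality checks declared"


def pii_is_restricted(meta: Metadata) -> Optional[str]:
    if meta.get("contains_pii") and meta.get("classification") != "restricted":
        return "PII must be classified as restricted"
    return None


# Assumed global rule set, agreed centrally and applied to every domain.
GLOBAL_POLICIES: list[Policy] = [has_owner, has_quality_checks, pii_is_restricted]


def audit(
    products: list[Metadata],
    policies: list[Policy] = GLOBAL_POLICIES,
) -> dict[str, list[str]]:
    """Map each non-compliant product name to its list of violations."""
    report: dict[str, list[str]] = {}
    for meta in products:
        violations = [
            message
            for policy in policies
            if (message := policy(meta)) is not None
        ]
        if violations:
            report[str(meta.get("name", "<unnamed>"))] = violations
    return report


if __name__ == "__main__":
    products = [
        {"name": "orders", "owner": "sales", "quality_checks": ["not_null"]},
        {"name": "customers", "owner": "crm", "quality_checks": [],
         "contains_pii": True, "classification": "internal"},
    ]
    for name, violations in audit(products).items():
        print(name, violations)
```

Keeping the policies as plain functions means a domain can add stricter local rules by passing its own list to audit, while the shared rules stay the same for everyone.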
When is a company ready to adopt a Data Mesh strategy?
It depends on how prepared the company is. But it also depends on who you’re talking to. A Chief Data Officer who’s built a massive central organization may not be ready for Data Mesh because they will need to first establish how to federate those functions. But most business leaders understand the need to democratize the data asset towards the edges and the business because they’re often frustrated with the centralized approach.
You also need to know what has to happen at an engineering level to be able to control and govern the mesh, because if you don’t set it up correctly, it can turn into the Wild West. So there’s a series of steps to follow.
The first step is to conduct an architecture review to define any core components of a potential Data Mesh architecture that the company already owns and how they can be leveraged to start empowering people to build product teams.
Is there a centralized team that can create the platform on which the Data Mesh will be built? That platform has to be there from the beginning. Infrastructure is what enables distributed capability.
Ensure that the project has the support and engagement of the business and the stakeholders in order to succeed at all levels.
Does the project have the necessary investment to build the Data Mesh as well as the capabilities to manage it? Both are essential.
Once these steps have been completed, it’s time to start building the product teams.
Transitioning to a Data Mesh is an incremental journey, because the elements you already have, such as data lakes and data warehouses, need to connect to the Data Mesh; they can’t be discarded. People will want that information, along with the value and governance that’s already wrapped around it.
What kinds of companies are successfully deploying Data Mesh?
Right now, Data Mesh is being successfully adopted in the financial services sector. ING is a good example. It makes sense for banks to use Data Mesh – it supports stronger data governance, so it offers increased security. With Data Mesh, fraud detection systems don’t need to connect to other systems and pull the same data every day. Instead, organizations can create domain-focused data products that their anomaly detection experts can use to create better models and outcomes.
Zalando, which is Europe’s leading online platform for fashion, decentralized their data in 2020 and turned their massive data lake into a Data Mesh. As for other sectors, we’ll have to see how it goes on a case-by-case basis, because any business case you create for Data Mesh will need to be tailored to the specific challenges of the organization and its sector, and those are in constant flux.
Data management strategies are always evolving and organizations need to be prepared to adapt to changes in order to stay competitive. Data Mesh is a way to break down the silos of unwieldy monolithic architecture systems and decentralize data for end-to-end accountability and scalability. Whether Data Mesh is right for your business – or not, or not yet – is the question.