Unpacking the future of data management (Part 1 of 3)

Published on: May 5, 2023

The amount of data that we create, capture, copy and consume is increasing exponentially: in the last decade, it has doubled every three years or so. This makes the future management of big data one of the defining challenges for modern organisations, particularly those small to medium scale operations that may have previously leveraged commercial off-the-shelf products to meet their data needs.

In good news, the last decade has also seen major advances in data storage and compute. And one such advance – still nascent, but showing real promise – is ‘data mesh’.

Data mesh is a new approach to managing the challenges of traditional data warehouses and data lakes. The concept is a decentralised data architecture designed to solve issues caused by the proliferation of data sources, the diversity of data use cases and users, and the need for accelerated response to change.

In this series we’ll be answering a few key questions over the course of three articles: What is a data mesh? How do you create a data mesh? And should your organisation be implementing a data mesh?

But in our first article, let’s begin by gaining an understanding of the concept itself.

What is a data mesh? And why is it important?

A data mesh is a type of data architecture that sees each domain within your business – asset operations, property, HR, finance, customer service, etc. – controlling its own data. This decentralised approach grants each team autonomy over the data it knows best, while the mesh itself enables cross-domain data analysis.

In 2019, Zhamak Dehghani, then a principal consultant at Thoughtworks, posted a thinkpiece about the need to move beyond monolithic data lakes. Her suggestion was a ‘data mesh’ – a decentralised data architecture that leverages a domain-oriented and self-service design. Rather than relying on a central data team, a data mesh distributes the responsibility for data across domain teams.

Four years later the concept of data mesh remains largely that: a concept. This is because data mesh isn’t a technology, but an idea that draws on a number of technologies – some existing, some still developing – that can be unique to the data mesh being created. That said, data mesh implementation is now a possibility for many SMBs. 

It’s better to think of data mesh as a set of principles that look to address the challenges of managing big data, particularly for the areas in which centralised data lakes tend to fall short, including:

  • Data sources spread across the organisation with varying levels of quality.
  • A variety of ETL jobs running in different systems to pull this information back to a central data warehouse.
  • Data warehouse teams spending huge amounts of time fixing and cleaning data.

As business units of all shapes and sizes digitally transform themselves with low-cost SaaS solutions, or by outsourcing the hosting and management of enterprise systems to a third-party vendor, a decentralised approach will become ever more critical: much of an organisation’s data will sit out of reach of any centralised system.

The best way to understand data mesh is through the four principles that govern the concept.

What are the 4 principles of data mesh?

The concept of data mesh is built upon four guiding principles:

  1. Data ownership by domain
  2. Data as a product
  3. Data availability and self-service
  4. Data is governed where it is

Let’s take a closer look at each.

Principle I: Data ownership by domain

Data mesh is decentralised: each business unit or ‘domain’ owns its data and shares it by exchanging information through the mesh. This is perhaps the defining argument for data mesh: that business units deserve control over their own data, because they know it best.

This is in contrast to a data lake, where a centralised data warehouse team manages all data. Sure, this team will be experts in data warehousing, but they won’t be experts in the data itself.

Data mesh success is predicated on each business unit building a culture of data collaboration: each stakeholder in every domain must be a good data citizen. The good news is it’s in the citizen’s best interests to be just that.

Principle II: Data as a product

The concept of data as a product is the default for SaaS products and enterprise systems hosted by vendors as a managed service. Data ‘products’ can be thought of as nodes on the data mesh, and each node includes everything it needs for its function. By thinking of their data as a product, teams are encouraged to consider other business units as their customers or data consumers, which can help to ensure a high-quality data pipeline and prevent data chauvinism.

From a functional perspective, the process of getting data in and out can be managed through event streams, webhooks, APIs or, if necessary, traditional ETL.
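To make the event-stream option concrete, here is a minimal, in-process publish/subscribe sketch. It is purely illustrative: the `EventBroker` class, topic names and the HR/Finance example are invented for this article, and a real mesh would use a streaming platform rather than Python callbacks.

```python
from collections import defaultdict
from typing import Callable

class EventBroker:
    """Toy publish/subscribe broker illustrating how domain-owned
    data products can exchange data as events. A production mesh
    would use a streaming platform instead of in-process callbacks."""

    def __init__(self) -> None:
        # Map each topic to the list of handlers subscribed to it.
        self._subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        # Deliver the event to every subscriber of the topic.
        for handler in self._subscribers[topic]:
            handler(event)

# Hypothetical domains: Finance consumes an event published by HR.
broker = EventBroker()
received: list[dict] = []
broker.subscribe("hr.employee-updated", received.append)
broker.publish("hr.employee-updated", {"id": 42, "department": "Finance"})
print(received)  # [{'id': 42, 'department': 'Finance'}]
```

The key design point is that HR publishes without knowing who is listening, so new consumers can join the mesh without any change to the producing domain.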

Principle III: Data availability and self-service

The one centralised element of the otherwise decentralised data mesh is the need for a self-service data platform that allows users to autonomously and instantaneously retrieve data from, and input data into, their specific products. While the data itself remains decentralised, there does need to be a centralised service, sometimes referred to as a message exchange platform, to broker these transactions.

Data mesh requires a searchable catalogue of all the data products that allows users to quickly see all they are allowed to see (and potentially what they aren’t). This self-service functionality should encompass all historic data as well as new incoming messages.
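A catalogue like this can be sketched in a few lines. The product names, roles and visibility rules below are invented for illustration; a real platform would back this with proper metadata storage and access control.

```python
# Toy data-product catalogue: each entry records an owning domain
# and which user roles may see it. All names here are hypothetical.
CATALOGUE = {
    "finance.invoices":  {"owner": "Finance",    "visible_to": {"finance", "exec"}},
    "hr.headcount":      {"owner": "HR",         "visible_to": {"hr", "exec"}},
    "ops.asset-health":  {"owner": "Operations", "visible_to": {"ops", "exec"}},
}

def search(user_roles: set[str], keyword: str = "") -> list[str]:
    """Return the data products this user may see, filtered by keyword."""
    return sorted(
        name for name, meta in CATALOGUE.items()
        if meta["visible_to"] & user_roles and keyword in name
    )

print(search({"exec"}))            # exec role sees all three products
print(search({"finance"}, "inv"))  # ['finance.invoices']
```

The same lookup serves both discovery (what exists?) and governance (what am I allowed to see?), which is why the catalogue is usually the first centralised component a mesh team builds.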

Principle IV: Data is governed where it is

While a decentralised system is built on a foundation of independent and autonomous teams, centralised global data management standards should still be implemented across the mesh, as this gives the C-suite assurance that risk is being minimised. Teams can collaborate through data while applying governance and best practice basics, such as:

  • Maintaining canonical records and data schemas.
  • Detecting and recovering errors.
  • Tracking data lineage.
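Two of the basics above – canonical schemas and lineage tracking – can be illustrated with a short sketch. The schema, product names and `transfer` helper are assumptions made for this example, not part of any particular mesh platform.

```python
# Assumed canonical schema for an asset record: field name -> type.
CANONICAL_SCHEMA = {"id": int, "name": str, "region": str}

def validate(record: dict) -> list[str]:
    """Return a list of schema violations (empty list means valid)."""
    errors = []
    for field, expected in CANONICAL_SCHEMA.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            errors.append(f"{field}: expected {expected.__name__}")
    return errors

# Lineage log: (source product, destination product) pairs.
lineage: list[tuple[str, str]] = []

def transfer(record: dict, source: str, dest: str) -> bool:
    """Move a record between products only if it passes validation,
    recording the hop so its lineage can be traced later."""
    if validate(record):
        return False  # reject invalid data at the boundary
    lineage.append((source, dest))
    return True

ok = transfer({"id": 1, "name": "Depot A", "region": "West"},
              "ops.assets", "finance.billing")
print(ok, lineage)  # True [('ops.assets', 'finance.billing')]
```

Checking records at the product boundary, rather than in a central cleansing team, is what lets governance stay global in standard but local in enforcement.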

The application of best practices will be a bespoke and continually evolving process. As Tim Berglund of Confluent stresses, organisations need to “be pragmatic, no governance system is perfect”.

To further explore the topic, we recommend reading the next article, “What is the role of FME in a data mesh?” This article delves into the role of FME in enabling data mesh implementation and aligning with data mesh principles. Additionally, for a comprehensive understanding of data mesh, read the third article, “Frequently Asked Questions about Data Mesh & FME,” which provides insights into common questions and considerations related to data mesh implementation.
