Five data engineering and management challenges that tools alone won’t solve

Daniel Zagales, VP of Data Engineering at 66degrees.com, talks through his top five data engineering and management challenges that tools can’t solve on their own.

Businesses have no shortage of tools to choose from when planning a data engineering and management strategy.

Yet tools alone don’t solve the complexities of modern data management. To maximize the value of data engineering solutions, you need to pair your tools with the right processes and people.

To illustrate that point, I’d like to walk through five key data engineering and management challenges that tools can’t solve on their own.

1. Bringing data under one roof

Siloed data is not valuable data – and unfortunately, many businesses find that their data is highly siloed between different platforms and applications.

Finding a way to integrate data from multiple sources so that you can process and analyze it from a central location is paramount for transforming data into value.

Integration can be difficult, however, because each pair of data sources that you want to connect often requires its own bespoke solution.

That means your engineers get stuck having to build and maintain myriad custom integrations – an approach that distracts them from other tasks and that is difficult to scale.

Managed data integration and replication platforms, which automate the process of moving data between disparate sources, can help solve this challenge. However, those platforms come with a cost, and they may not support every type of integration you need to build.

To integrate data cost-effectively, you must use integration platforms strategically.

Determine which types of integrations will deliver the most value and which ones are most costly to build yourself, then choose a data integration platform accordingly. At the same time, ensure that you have the engineering talent in place to build any critical integrations that your platform can’t handle.

The bottom line here is that while data integration tools can solve the data silo challenge to a significant extent, many organizations will find that data integration platforms can’t bring all of their data seamlessly under one roof – or, if they can, the cost of the tools may outweigh their benefits. Rather than blindly tossing tools at the challenge, businesses must be strategic about how they balance tools with costs, while also reserving the personnel required to handle integrations that tools can’t manage.

2. DevOpsifying data management

By now, the typical organization has embraced agile DevOps processes as the basis for software delivery strategies. But in many cases, data management strategies haven’t kept pace. Organizations struggle to update data schemas and pipelines as quickly as they can update application code, which means that slow data management can become a bottleneck for deploying applications that require changes to data resources.

To solve this challenge, organizations need to be able to automate, version-control and continuously manage their data just as they do application code.

The tools that they already use for DevOps – like the software you’ll find in a CI/CD pipeline – can help with this task, but only if you extend them to enable automated management of data resources in a way that is in sync with software delivery.
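As a rough illustration of what version-controlled schema changes can look like, here is a minimal sketch using Python's built-in sqlite3 module. The MIGRATIONS list, the schema_version table and the customers table are hypothetical names chosen for this example. In practice the migrations would live in the same repository as the application code and a CI/CD job would call something like migrate() on each deploy.

```python
import sqlite3

# Hypothetical, ordered schema changes. Each entry is (version, SQL statement).
MIGRATIONS = [
    (1, "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"),
    (2, "ALTER TABLE customers ADD COLUMN email TEXT"),
]


def current_version(conn):
    """Return the highest migration version recorded, or 0 if none."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)"
    )
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    return row[0] or 0


def migrate(conn, migrations=MIGRATIONS):
    """Apply every migration newer than the recorded version, in order."""
    applied = []
    version = current_version(conn)
    for number, sql in sorted(migrations):
        if number > version:
            conn.execute(sql)
            conn.execute(
                "INSERT INTO schema_version (version) VALUES (?)", (number,)
            )
            applied.append(number)
    conn.commit()
    return applied


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(migrate(conn))  # first run applies the pending migrations
    print(migrate(conn))  # second run finds nothing left to apply
```

Because the database records which versions it has already applied, running the same pipeline step twice is safe, which is the property that lets schema changes ship alongside application releases.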

In addition, it’s important to establish metrics (like the speed at which data schemas are persisted and updated) for monitoring the health of data pipelines, just as DevOps teams use metrics (like app release velocity) to manage software delivery pipelines. Without data pipeline metrics, you’ll be shooting in the dark when it comes to ensuring your data management processes effectively complement and enhance your software delivery processes.
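One way to make such a metric concrete is to measure how long each schema change takes to go from being merged to being live in production. The sketch below assumes you can export a record per change with merged_at and deployed_at timestamps. Those field names and the sample values are invented for illustration, and lead time is only one possible reading of schema update speed.

```python
from datetime import datetime
from statistics import median


def schema_change_lead_times(changes):
    """Hours between merge and production deploy for each schema change."""
    return [
        (c["deployed_at"] - c["merged_at"]).total_seconds() / 3600
        for c in changes
    ]


# Hypothetical sample records; real ones would come from your CI/CD system.
changes = [
    {"merged_at": datetime(2024, 1, 1, 9, 0), "deployed_at": datetime(2024, 1, 1, 11, 0)},
    {"merged_at": datetime(2024, 1, 2, 9, 0), "deployed_at": datetime(2024, 1, 2, 15, 0)},
    {"merged_at": datetime(2024, 1, 3, 9, 0), "deployed_at": datetime(2024, 1, 4, 9, 0)},
]

median_hours = median(schema_change_lead_times(changes))
print(f"Median schema change lead time: {median_hours} hours")
```

The median is used here rather than the mean so that a single slow change does not dominate the number, but either could be tracked over time next to app release velocity.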

3. Make data management repeatable

Like DevOps, data management should be grounded in common frameworks that enable consistency and reduce friction.

When you standardize your approach to data management, you can place your focus on analysis and the creation of value, because you’re no longer distracted by the work of having to manage data in inconsistent or ad hoc ways – just as a healthy DevOps strategy lets you manage application delivery in a repeatable, efficient manner.

4. Data cost management

Ensuring that data resources and processes deliver maximum financial value requires deep visibility into the cost of each data product, and how that cost varies depending on factors such as how many people use the solution or what volume of data it ingests or outputs. With that information, you can determine whether the solutions and configurations you have in place are optimal from a cost perspective.

Data product providers offer basic cost monitoring tools that can help you gain some level of visibility into what you’re spending in situations like this. But for deep and granular visibility, you must pair data cost monitoring solutions with effective processes, such as tagging of data assets, that help you make more sense of the information produced by monitoring solutions.
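To show why tagging matters, here is a small sketch that rolls up billing line items by a tag. The line item structure, the team tag and the dollar amounts are assumptions made for this example and do not reflect any particular provider's billing export format.

```python
from collections import defaultdict


def cost_by_tag(line_items, tag_key):
    """Sum cost per value of the given tag; untagged spend is grouped together."""
    totals = defaultdict(float)
    for item in line_items:
        tag_value = item.get("tags", {}).get(tag_key, "untagged")
        totals[tag_value] += item["cost_usd"]
    return dict(totals)


# Hypothetical billing export rows.
line_items = [
    {"resource": "warehouse-a", "cost_usd": 120.0, "tags": {"team": "marketing"}},
    {"resource": "pipeline-b", "cost_usd": 30.0, "tags": {"team": "marketing"}},
    {"resource": "warehouse-c", "cost_usd": 45.5, "tags": {"team": "finance"}},
    {"resource": "bucket-d", "cost_usd": 10.0},  # nobody tagged this asset
]

if __name__ == "__main__":
    for team, cost in sorted(cost_by_tag(line_items, "team").items()):
        print(f"{team}: ${cost:.2f}")
```

The untagged bucket is worth watching in its own right: the larger it grows, the less your cost monitoring tools can tell you about who is driving spend.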

5. Making data accessible to users

Data isn’t worth much to your business if the people who need it struggle to access it.

Unfortunately, the more complex and sophisticated data products are, the more challenging it often becomes for non-technical users to consume data.

By building semantic layers on top of the platform that you use to manage data, you let users consume data with less friction, through the tools they prefer to work with.
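At its simplest, a semantic layer is a mapping from business terms to the physical columns and calculations behind them. The sketch below is a toy version of that idea using Python's built-in sqlite3 module. The orders table, its columns, and the metric and dimension names are all hypothetical, and a real semantic layer would typically also handle joins, access control and caching.

```python
import sqlite3

# Hypothetical business vocabulary mapped to SQL over an assumed "orders" table.
METRICS = {
    "total_revenue": "SUM(amount_usd)",
    "order_count": "COUNT(*)",
}
DIMENSIONS = {
    "region": "region_code",
}


def build_query(metric, dimension):
    """Translate business terms into SQL. Unknown terms raise KeyError."""
    metric_sql = METRICS[metric]
    dimension_sql = DIMENSIONS[dimension]
    return (
        f"SELECT {dimension_sql} AS {dimension}, {metric_sql} AS {metric} "
        f"FROM orders GROUP BY {dimension_sql} ORDER BY {dimension_sql}"
    )


def run_query(conn, metric, dimension):
    """Run the translated query and return the rows as a list of tuples."""
    return conn.execute(build_query(metric, dimension)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region_code TEXT, amount_usd REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("EU", 10.0), ("EU", 20.0), ("US", 5.0)],
    )
    print(run_query(conn, "total_revenue", "region"))
```

Users ask for total_revenue by region without needing to know that the underlying columns are amount_usd and region_code. Only terms defined in the two dictionaries are ever placed into the SQL text, so arbitrary user input never reaches the query.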

Because each user’s data processing and sourcing needs are different, you can’t simply deploy automated solutions and call it a day. You need to assess the requirements of your data processes, use cases and users, then implement solutions tailored to them.

You must also establish governance rules to ensure that data is managed securely as it flows across your business to serve the needs of your various users.

Conclusion

Tools are only part of the equation for modern data management – you can never fully remove people and processes from the picture.

In many cases you’ll find that the more sophisticated your data management tools are, the greater your need for complex processes and skilled people to help maximize the impact of those tools will be.
