Big data is making organisations more mindful of aligning their data management practices with newer paradigms such as data lakes, elastic and cluster computing, and real-time data. Stefan Kruger and Peter Gaertner, senior data architects at Decision Inc, say that the cloud provides an environment capable of managing the volume and scale required to do so.
There has been a continued drive to improve the mechanisms for processing and analysing big data volumes, such as machine learning (ML).
However, despite these innovations, challenges such as bad or biased ML models are not easily solved and require continuous attention and improvement.
This is where an effective data management plan (DMP) is essential. It empowers the business stakeholders to consume and model the necessary data and generate business insights from it. A data management plan does not compromise on security and serves to establish a flexible, agile, and forward-thinking environment needed to cater for big data in a digital world.
Regulatory drive
An integral part of the DMP is to ensure the company remains compliant with current regulations. This requires an understanding of how the collected data is stored, used, and analysed within a complex regulatory environment.
Companies must ensure the affected data is anonymised and encrypted as necessary while being transparent about how it is collected and the use cases it serves. All of this should be applied within a legal framework that meets the relevant legislative guidelines set out by governments (think South Africa’s POPI, the GDPR of the EU, and Australia’s CDR).
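As a rough illustration of the anonymisation step, the sketch below replaces sensitive fields with a keyed hash before the data is shared for analysis. The field names, the key, and the function names are hypothetical, and strictly speaking this is pseudonymisation rather than full anonymisation: anyone holding the key can link records again, so whether it satisfies a given regulation is a legal question and not something the code settles.

```python
import hashlib
import hmac


def pseudonymise(value: str, secret_key: bytes) -> str:
    """Return a keyed SHA-256 digest (hex) that stands in for the raw value."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymise_record(record: dict, sensitive_fields: set, secret_key: bytes) -> dict:
    """Copy a record, replacing only the fields listed as sensitive."""
    return {
        field: pseudonymise(str(value), secret_key) if field in sensitive_fields else value
        for field, value in record.items()
    }


if __name__ == "__main__":
    # Hypothetical key and record, for illustration only. In practice the key
    # would come from a secrets manager, never from source code.
    key = b"replace-with-a-managed-secret"
    customer = {"id_number": "8001015009087", "name": "Thandi", "country": "ZA"}
    print(pseudonymise_record(customer, {"id_number", "name"}, key))
```

Because the same input and key always give the same digest, analysts can still join and count records without ever seeing the underlying identifiers.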
And while this regulatory context is key, companies that are adapting their DMPs must also be aware of the most impactful trends in data management. These include automation and DataOps, the continued hybrid and full cloud migration of organisations, data streaming from Internet of Things devices, and of course the compliance focus.
Subtle differences
Part of this ‘new’ data environment is DataOps. Contrary to popular belief, this is not just DevOps for data. It requires data management solutions to take into consideration key aspects that have traditionally been less prevalent in the data management space.
The first is ubiquitous version control and collaboration. This is necessary to keep the full solution in centralised source control and facilitate collaboration between stakeholders. It often has the net effect of reducing the reliance on traditionally UI-driven tasks and replacing these with automated scripts.
Secondly, infrastructure as code also plays a role in DataOps. This means defining the infrastructure needed by a solution and packaging it as a defined, dynamic, and versionable part of the data management process.
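To make the idea concrete, here is a minimal sketch of infrastructure described as code. It is not tied to any real provisioning tool: the class names, fields, and values are assumptions chosen for illustration. The point is that the environment becomes a plain, deterministic text artefact that can be committed to source control, reviewed, and compared between versions.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class StorageBucket:
    name: str
    region: str
    encrypted: bool = True


@dataclass(frozen=True)
class ComputeCluster:
    name: str
    min_nodes: int
    max_nodes: int  # upper bound for elastic scaling


@dataclass(frozen=True)
class Environment:
    version: str
    buckets: tuple = ()
    clusters: tuple = ()


def render(env: Environment) -> str:
    """Serialise the definition with sorted keys so that diffs stay stable."""
    return json.dumps(asdict(env), indent=2, sort_keys=True)


if __name__ == "__main__":
    # Hypothetical environment, for illustration only.
    env = Environment(
        version="1.4.0",
        buckets=(StorageBucket("raw-data-lake", "af-south-1"),),
        clusters=(ComputeCluster("analytics", min_nodes=2, max_nodes=8),),
    )
    print(render(env))
```

A real setup would hand a definition like this to a provisioning tool. The sketch stops at producing the versionable description.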
Underpinning these elements is the need for automated testing routines that run whenever a new version is released as part of the DMP process. Automated feedback and flagging of testing outcomes to the required stakeholders form a core part of this.
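A simple way to picture such a routine is a set of data checks that run against each new version, with any failures turned into messages for the people responsible. The sketch below assumes hypothetical checks, column names, and owner names. A real pipeline would send the messages through its own alerting channel instead of printing them.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str


def check_no_missing_ids(rows: List[dict]) -> CheckResult:
    """Fail if any row has an empty or absent customer_id."""
    missing = [i for i, row in enumerate(rows) if not row.get("customer_id")]
    return CheckResult("no_missing_ids", not missing,
                       f"{len(missing)} row(s) without customer_id")


def check_amounts_non_negative(rows: List[dict]) -> CheckResult:
    """Fail if any row carries a negative amount."""
    negative = [i for i, row in enumerate(rows) if row.get("amount", 0) < 0]
    return CheckResult("amounts_non_negative", not negative,
                       f"{len(negative)} row(s) with a negative amount")


def run_checks(rows: List[dict],
               checks: List[Callable[[List[dict]], CheckResult]]) -> List[CheckResult]:
    """Run every check against the same batch of rows."""
    return [check(rows) for check in checks]


def flag_failures(results: List[CheckResult], owners: Dict[str, str]) -> List[str]:
    """Build one message per failed check, addressed to that check's owner."""
    return [
        f"To {owners.get(r.name, 'data-team')}: check '{r.name}' failed ({r.detail})"
        for r in results
        if not r.passed
    ]


if __name__ == "__main__":
    # Hypothetical batch and owners, for illustration only.
    batch = [{"customer_id": "c1", "amount": 120.0},
             {"customer_id": "", "amount": -5.0}]
    results = run_checks(batch, [check_no_missing_ids, check_amounts_non_negative])
    for message in flag_failures(results, {"no_missing_ids": "crm-owner"}):
        print(message)
```

Keeping each check as a small function means new rules can be added, reviewed, and versioned alongside the rest of the solution.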
Good data
Part of putting an effective DMP in place comes down to process engineering. Companies must embrace good processes and change existing ones where necessary. Decision-makers must avoid the risk of simply digitising or regurgitating bad processes.
Also, it must be remembered that on-premise infrastructure does not always translate directly into cloud architecture concepts. Companies must ensure that any cloud-bound migration (whether hybrid or full) is designed to leverage the targeted cloud benefits where it makes the most sense and delivers the most impact from both a functionality and a cost-savings perspective.
Data management will continue to grow in impact as more businesses embark on exercises in digital transformation. The cloud-oriented focus will continue, but the legacy of on-premise solutions will still be around for some time, especially in enterprise environments.
The convergence of solutions catering for data management (such as master data, data quality, integration, storage and intelligence) will remain an interesting area to watch as technologies evolve. However, a DMP requires a considered approach that factors in both existing and future data requirements. Companies must be willing to embrace change in this new digital world.