As organisations look to get the most out of Data Virtualisation, understanding how to connect, combine and consume data is a critical first step. Vincent Gaorekwe, CTO at BITanium, tells Intelligent CIO Africa how BITanium’s practical approach differs from that of other vendors and why it is gaining so much traction now.
What is Data Virtualisation?
Data Virtualisation is a modern data integration capability that inserts a ‘virtual’ or ‘logical’ abstraction layer between the consumers of data and the data sources. This means we can connect to, combine and consume any data regardless of its type, format or location.
How does Data Virtualisation work?
Unlike ETL solutions and the more traditional approach of replicating and storing data in a central repository, Data Virtualisation leaves the data in the source systems and instead uses the underlying metadata to create a ‘virtual view’ of the data. These virtual views can be created in minutes and users can then use them to build virtual data sets, data marts, data warehouses or data products.
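As a rough illustration of the idea (not any vendor’s actual implementation), the Python sketch below uses an in-memory SQLite table as a stand-in for a remote source system: the ‘view’ holds only metadata about where the data lives and how to query it, and rows are fetched from the source only at consumption time.

import sqlite3

# Stand-in "source system": in a real deployment this would be a remote
# database reached over the network, not an in-memory table.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Thandi"), (2, "Sipho")])

class VirtualView:
    """A view is just metadata: where the data lives and how to ask for it."""

    def __init__(self, source, query):
        self.source = source  # connection to the source system
        self.query = query    # the view definition

    def consume(self):
        # Rows are read from the source only at consumption time;
        # nothing is replicated or stored in the virtual layer.
        return self.source.execute(self.query).fetchall()

customers_view = VirtualView(crm, "SELECT id, name FROM customers")
print(customers_view.consume())  # [(1, 'Thandi'), (2, 'Sipho')]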
Many organisations are using Data Virtualisation as the ‘data delivery engine’ for Data Fabric and Data Mesh architectures, or even as a ‘universal data access layer’. Others are simply using Data Virtualisation to address tactical challenges like delivering urgent data sets quickly and easily, or combining disparate data to build a Single View of Party. Whatever the use-case, Data Virtualisation is the quickest, easiest and safest way to deliver enterprise data and is being widely adopted by data teams big and small.
What are the main benefits of Data Virtualisation?
First of all, Data Virtualisation is by far the most cost-effective way to deliver data to the enterprise. Eliminating the need to transform, replicate and store data offers a significant saving over the traditional approach. Many users report reductions of well over 50% in infrastructure and integration costs as a result of Data Virtualisation.
Next is how quickly data can be made available to the teams that need it. Connecting a new data source is simply a matter of entering access credentials and settings to create a virtual view. This can be completed in minutes and the virtual view is then ready to be consumed or combined with other virtual views. The demand for data is increasing daily, along with its volume and complexity. The traditional approach simply cannot keep up and a new way is needed. In most legacy environments we’re seeing a hybrid approach, where the traditional, embedded platforms run alongside the more agile virtual layer.
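Purely as an illustration of what ‘entering access credentials and settings’ might look like, here is a hypothetical source registration in Python; the field names are invented for the example and do not reflect any particular product’s configuration format.

# Hypothetical settings for registering a new source with the virtual layer.
# Field names are illustrative only.
new_source = {
    "name": "billing_db",
    "type": "postgresql",
    "host": "billing.internal.example.com",
    "port": 5432,
    "database": "billing",
    "user": "dv_reader",                     # read-only service account
    "password": "<fetched from a secrets vault>",
}

# Once registered, the source's tables can be exposed as virtual views and
# combined with views over other sources -- no data is copied or moved.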
Another key benefit is the ability to integrate any data with any other data regardless of type, format or location. What would have taken months can now be achieved in hours or days. We’ve seen data from multiple sources combined into a reusable data set/product in minutes with zero ETL and zero replication.
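To make the ‘zero ETL, zero replication’ point concrete, here is a minimal sketch, again using in-memory SQLite tables as stand-ins for two separate source systems; the reusable data set is defined as a function that queries each source in place whenever it is consumed.

import sqlite3

# Two stand-in source systems; in practice these could be a CRM database
# and a billing system in different locations and formats.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Thandi"), (2, "Sipho")])

billing = sqlite3.connect(":memory:")
billing.execute("CREATE TABLE invoices (customer_id INTEGER, amount REAL)")
billing.executemany("INSERT INTO invoices VALUES (?, ?)",
                    [(1, 350.0), (1, 120.0), (2, 90.0)])

def customer_spend_view():
    # A reusable 'data product': customers joined to their total spend.
    # Each source is queried in place at consumption time; nothing is
    # extracted up front or loaded into a central store.
    names = dict(crm.execute("SELECT id, name FROM customers"))
    totals = {}
    for cid, amount in billing.execute("SELECT customer_id, amount FROM invoices"):
        totals[cid] = totals.get(cid, 0.0) + amount
    return [{"id": cid, "name": name, "total_spend": totals.get(cid, 0.0)}
            for cid, name in names.items()]

print(customer_spend_view())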
While there are numerous other benefits, the final one that I’ll mention is security and governance. This is a major challenge for all data teams, not only from a technical perspective but, more importantly, from a user adoption standpoint. How do we get our users to adhere to our security and governance policies? Simple! Make the easiest and quickest way to access the data they need also the safest way.
The main reason users create their own rogue data sets is that they can’t afford to wait for the data team to deliver. Business often has to wait months for data, and in some cases even longer. The result? They bypass the data team and do it themselves, going directly to the data owners and making copies that end up in the various reporting and analytics platforms or even in spreadsheets on analysts’ laptops. With Data Virtualisation, all data accessed through the ‘virtual layer’ is auditable and secured based on user roles, with encryption and masking applied according to the sensitivity of the data.
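As a simplified sketch of role-based masking and auditing (the policy, roles and field names below are invented for the example), each consumption through the virtual layer is logged and a sensitive column is masked unless the caller’s role permits clear text.

import logging

logging.basicConfig(level=logging.INFO)

# Hypothetical policy: only these roles may see the identifier unmasked.
UNMASKED_ROLES = {"compliance_officer"}

def mask(value):
    # Show only the last three characters of a sensitive identifier.
    return "*" * (len(value) - 3) + value[-3:]

def consume_customer_view(user, role, rows):
    # Every access through the virtual layer is logged (auditable) and the
    # sensitive column is masked according to the caller's role.
    logging.info("user=%s role=%s accessed customer view", user, role)
    if role in UNMASKED_ROLES:
        return rows
    return [{**r, "national_id": mask(r["national_id"])} for r in rows]

rows = [{"name": "Thandi", "national_id": "8001015009087"}]
print(consume_customer_view("analyst01", "data_analyst", rows))
# [{'name': 'Thandi', 'national_id': '**********087'}]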
What are the main use-cases for Data Virtualisation?
Fundamentally, Data Virtualisation is used to deliver data to the organisation in the quickest, easiest and safest way possible, which opens up countless opportunities. One of Data Virtualisation’s key strengths is combining data from multiple sources into a reusable data set/product, which also supports many potential use-cases. Here are some of the most common ones that we typically come across:
Logical Data Warehouse
Combine data from multiple sources to create a virtual data warehouse, lake, mart or data product.
Single View of Party
Combine data from multiple sources to gain an accurate record of customer, employee, supplier or entity.
Application or Cloud Migration
Use the virtual abstraction layer to shield the user from the underlying complexity of the data environment. Data teams can migrate from one application to another with zero impact on the user. The same applies to cloud migration, as the user experience remains consistent through the virtual layer while the underlying data platforms evolve (a small sketch of this idea follows this list of use-cases).
Cross Border Data Sharing
With strict rules regulating how data is shared across regions and borders, Data Virtualisation provides the ability to connect, combine and consume data without moving it, with security controls applied dynamically as required.
Data Factory
Build and deliver reusable data products quickly, where the ‘logic’ remains in the virtual abstraction layer rather than in the source systems or the analytics and reporting platforms.
Data Fabric/ Data Mesh
Both of these architectures seek to decouple the complexity of the underlying data sources from the consumers and deliver reusable data products to the enterprise. Data Virtualisation plays a key role as the data delivery engine.
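Here is the migration sketch mentioned under Application or Cloud Migration: a hypothetical registry maps a stable view name to whichever source currently backs it, so consumers keep using the same view name while the platform underneath is swapped out. As before, in-memory SQLite tables stand in for the real systems.

import sqlite3

def make_source(rows):
    # Stand-in for a source system (e.g. the old on-premise application
    # and its cloud replacement).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, status TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    return conn

legacy_app = make_source([(1, "shipped")])
cloud_app = make_source([(1, "shipped"), (2, "pending")])

# The virtual layer maps a stable view name to whichever source currently
# backs it; consumers only ever refer to the view name.
registry = {"orders_view": legacy_app}

def consume(view_name):
    return registry[view_name].execute("SELECT id, status FROM orders").fetchall()

print(consume("orders_view"))        # served from the legacy application

# Cut over to the cloud platform: only the registry changes; the
# consumer's call above stays exactly the same.
registry["orders_view"] = cloud_app
print(consume("orders_view"))        # same view name, new underlying source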
Why is Data Virtualisation gaining such traction now?
The main reason is that it’s a great way to connect, combine and consume data. Data teams are under increasing pressure to modernise their platforms to meet the growing demand for data for analytics and insights. As more and more high-profile organisations adopt Data Virtualisation, it gives others the confidence to adopt the new approach. In the past Data Virtualisation was seen as niche and new, but now it’s mainstream and most data teams are becoming aware of it and adopting it because it works and they can’t continue with the old paradigm.
Where is the best place to start with Data Virtualisation?
Typically we identify a specific use-case or problem to solve and use that as the starting point. Once the rest of the organisation sees first-hand how it works and the impact it has, there’s no shortage of projects to take on. So the answer is to start small, prove the capability and then expand from there. The goal should be a universal data fabric layer, but the practical approach is one use-case at a time.