A Q&A with Komprise Co-founder, President and COO, Krishna Subramanian on hybrid cloud and data management.
Hybrid cloud environments are common for enterprises today, but they bring unique requirements for IT. What are the risks to be aware of when storing and moving data between clouds and on-premises?
Hybrid cloud is the most popular option for organizations today because they can leverage the best of both worlds: the elasticity of the cloud along with the security of their own data center. However, doing this correctly requires intelligent automation; without it, enterprises find they are overspending in the cloud, either by not turning off idle workloads or by keeping cold data in more expensive cloud storage tiers. They may also be compromising data security by not managing access controls properly as data moves across the hybrid cloud. Financial cost modeling, policy-based data lifecycle management and data governance across the hybrid cloud are key requirements for success.
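To make policy-based lifecycle management concrete, here is a minimal sketch using boto3, the AWS SDK for Python, to apply a lifecycle rule that tiers aging objects to cheaper storage classes. The bucket name, prefix and day thresholds are hypothetical placeholders; a real hybrid cloud policy would also cover on-premises storage and access controls.

```python
# Minimal sketch: policy-based lifecycle management on an S3 bucket.
# The bucket name, prefix, and day thresholds are hypothetical examples.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-hybrid-archive",  # hypothetical bucket
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-cold-data",
                "Filter": {"Prefix": "projects/"},  # scope the rule to one prefix
                "Status": "Enabled",
                "Transitions": [
                    # Move objects older than 90 days to infrequent-access storage...
                    {"Days": 90, "StorageClass": "STANDARD_IA"},
                    # ...and to deep archive after a year.
                    {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    },
)
```

Note that native S3 lifecycle rules transition objects by age since creation; tiering by last access across file and object silos is where dedicated data management tooling comes in.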
And what are the benefits of hybrid cloud for data management – not just for storage but to support AI, compliance needs and more?
The biggest benefit of hybrid cloud for data management is the ability to spin up additional capacity in the cloud as needed for burst workloads such as AI processing, model training, and inferencing. This can be done with strong data security by keeping critical data on-premises and moving curated data sets to the cloud as needed. The cloud is also a good place to keep a cost-efficient third copy of data to protect against ransomware attacks, at a fraction of the cost of keeping it in a data center.
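As an illustration of that cost-efficient third copy, the sketch below uses S3 Object Lock to write an immutable backup object that ransomware, or a compromised administrator, cannot delete or shorten before its retention date. The bucket, file path and retention period are hypothetical, and the bucket must have been created with Object Lock enabled.

```python
# Minimal sketch: an immutable "third copy" in cloud object storage using
# S3 Object Lock. Bucket, key, and retention period are hypothetical.
from datetime import datetime, timedelta, timezone
import boto3

s3 = boto3.client("s3")

with open("backups/finance-2024.tar", "rb") as body:  # hypothetical backup file
    s3.put_object(
        Bucket="example-immutable-copies",  # bucket created with Object Lock on
        Key="finance/finance-2024.tar",
        Body=body,
        # COMPLIANCE mode: no user, including the account root, can shorten
        # the retention window or delete the object before it expires.
        ObjectLockMode="COMPLIANCE",
        ObjectLockRetainUntilDate=datetime.now(timezone.utc) + timedelta(days=180),
    )
```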
What tools or frameworks are emerging to automate routine processes in hybrid cloud management, and how do they impact operational efficiency?
Many organizations are using hybrid cloud strategies to get elastic burst capacity in the cloud and to optimize infrastructure costs. In both cases, organizations need an efficient way to mobilize data across their data centers and the cloud. Since there are many tiers of storage performance and cost in the cloud, especially for file and object data, it is important to have a systematic, efficient way to migrate and tier petabytes of unstructured data to maximize performance while minimizing costs. Data management solutions now exist that leverage analytics to intelligently handle the thorny issues of migrating and tiering data: moving billions of small files, dealing with spotty network availability, meeting chain-of-custody reporting requirements, and parallelizing petabyte-scale migrations for speed. In addition, storage-independent data management solutions let you transparently tier data across file and object architectures, so your users continue to access data from its original location even when the data resides on cloud object storage such as Azure Blob Archive or Amazon S3 Glacier, which are over 20 times less expensive.
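The following toy sketch illustrates the transparent-tiering idea in Python: files untouched past a threshold are copied to object storage and replaced with a small stub that records their new location. This is a conceptual illustration only, not any vendor's implementation; production data managers also handle stub-based recall, locking, chain-of-custody reporting, retries and scale to billions of files. The bucket name and threshold are hypothetical.

```python
# Toy sketch of transparent tiering: move files that are cold by last-access
# time to object storage and leave a stub behind so the original path survives.
import time
from pathlib import Path

import boto3

COLD_AFTER_SECONDS = 180 * 24 * 3600  # hypothetical threshold: 180 days
s3 = boto3.client("s3")

def tier_cold_files(root: str, bucket: str) -> None:
    now = time.time()
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.name.endswith(".stub"):
            continue
        if now - path.stat().st_atime < COLD_AFTER_SECONDS:
            continue  # still warm; leave it on primary storage
        key = str(path.relative_to(root))
        s3.upload_file(str(path), bucket, key)        # copy to the cloud tier
        stub = Path(str(path) + ".stub")
        stub.write_text(f"s3://{bucket}/{key}\n")     # record where the data went
        path.unlink()                                  # reclaim primary capacity

tier_cold_files("/mnt/projects", "example-cold-tier")  # hypothetical names
```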
How can IT teams avoid vendor lock-in while maintaining flexibility and optimizing their cloud infrastructure?
For a large enterprise, vendor lock-in can be a tough problem to solve if the organization has been entrenched with a few key vendors for many years. Big vendors can offer great deals to keep doing a lot of business with them, but in the long term this is not a smart strategy because of the pace of technological change. Take data storage, for instance. There is continual innovation in faster, more efficient, AI-ready storage platforms on-premises and in the cloud, and you don't want to be committed and unable to easily change your data infrastructure to meet new needs. In addition, many storage solutions offer optimizations such as tiering that are proprietary to their file systems; if you want to switch vendors, you must first rehydrate all the tiered data, buying new high-performance storage from the incumbent vendor just to move away from them. These downsides of single-vendor solutions are why many organizations choose hybrid cloud and try to stay storage-independent. You need a data management solution that can right-place data across its lifecycle: onto higher-performance storage such as GPU-ready or flash storage while it is hot, and onto cold storage such as the cloud when it is no longer accessed. IT teams want to leverage the best options across vendors.
What role does data fabric play in achieving unified data management across hybrid and multi-cloud ecosystems?
You can create a data fabric with a data management solution that uses standard protocols to interface with a variety of storage and cloud options. This provides a unified way to analyze, mobilize and manage data workflows without imposing a file system or a proprietary namespace on the environment, and it delivers several advantages. You can use a variety of storage vendors while still getting visibility across those silos and unified management, without creating vendor lock-in. The storage-independent approach to a data fabric also has the advantage of bringing structure to unstructured data, which is crucial for AI. Since unstructured data often runs to petabytes, moving all of it to each AI solution is untenable; it could take months. Instead, a storage-independent data fabric provides a global index of all the data so you can search, curate and cull exactly the data you need for a particular AI task, such as retrieval-augmented generation (RAG), and move only that data. It can also tag and retain the results of AI processing so you do not have to repeatedly run the same AI workflow on the same data, which optimizes costs.
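As a rough illustration of the global-index idea, the sketch below keeps a SQLite catalog of file metadata spanning storage silos and queries it to curate a subset for a RAG corpus. The schema, silo names, file paths and tags are all hypothetical.

```python
# Minimal sketch of a global index: a metadata catalog spanning storage silos
# that can be queried to curate a data set for an AI task, so only the
# matching files are moved instead of petabytes. All names are hypothetical.
import sqlite3

catalog = sqlite3.connect("unstructured_index.db")  # hypothetical index file
catalog.execute(
    """CREATE TABLE IF NOT EXISTS files (
           path TEXT PRIMARY KEY,   -- original location on any silo
           silo TEXT,               -- e.g. 'netapp-prod', 's3-archive'
           size_bytes INTEGER,
           modified_at TEXT,
           tags TEXT                -- labels from prior AI enrichment runs
       )"""
)

# Sample catalog entries, as an indexing crawler might record them.
catalog.executemany(
    "INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?, ?)",
    [
        ("/proj/imaging/rpt-001.txt", "netapp-prod", 18234, "2023-06-01", "radiology,report"),
        ("/proj/finance/q4.xlsx", "isilon-arch", 50211, "2022-11-12", "finance"),
    ],
)
catalog.commit()

# Curate only radiology reports already tagged by an earlier workflow, so the
# RAG pipeline stages just this subset rather than rescanning every silo.
cursor = catalog.execute(
    "SELECT path, silo FROM files WHERE tags LIKE ? AND modified_at >= ?",
    ("%radiology%", "2023-01-01"),
)
for path, silo in cursor:
    print(f"stage {path} from {silo} for the RAG corpus")
```

Because the index stores tags from earlier AI runs, the same query-then-move pattern lets you reuse prior results instead of reprocessing the same data.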