Komprise Co-founder Krishna Subramanian on what data observability means in unstructured data management and how you can start to generate this intelligence.
In DevOps and IT circles, the word “observability” has been bandied about for the past few years. It is one of those hot, trendy terms that means different things to different people, yet the goal is generally the same: how can we observe our environment and then proactively, even automatically, fix things that aren’t working, are anomalous or suspicious, or could lead to a disastrous outcome? Such outcomes could include a network failure, a security breach, a server reaching capacity or, in the unstructured data management world, something else entirely.
People managing unstructured data don’t often think about observability; they’re simply trying to maintain high-performance file storage systems, provide reliable, secure access to their data and control costs. They prefer not to hear from users (much less executives and department heads) that access times are sluggish or that their data seems to have gone “missing.” A data observability practice can aid those goals and protect data for long-term needs.
In this Q&A, Komprise COO Krishna Subramanian talks about what data observability means in unstructured data management and how you can start to generate this intelligence.
What do we mean by unstructured data observability?
Data observability is more than monitoring and alerts when it comes to unstructured data. It can provide a complete view of the files in an organization: where they are being stored, how they are being used and by whom, how fast data is growing, and any out-of-the-ordinary storage and access patterns. This visibility gives organizations the means to reactively solve problems and, ideally, proactively prevent future ones. Unstructured data observability with analytics and reporting can also increase collaboration across teams, improve planning, and help troubleshoot issues faster and more efficiently. It gives insights into issues such as sensitive data being stored where it should not be. It may also be useful to integrate data observability tools with IT service management software such as ServiceNow and Splunk.
What are some of those data observability metrics and findings that IT should be tracking?
This list is bound to grow with AI requirements, but here are a few points you can track on your unstructured data:
· Data growth rates
· Top file types and if they change or grow suddenly
· Top data owners and if they change or grow suddenly
· Zombie data amounts and changes
· Orphaned data amounts and changes
· Storage capacity metrics
· Data access speeds
· Percent hot data
· Percent cold data
· Percent data on source
· Percent of data modified, moved or new, and free and full capacity on storage
· Sensitive data
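As a rough illustration of how some of the metrics above could be gathered, here is a minimal Python sketch that walks a file share and tallies total capacity and the percentage of “cold” data by last-access time. This is an assumed, simplified approach for illustration only, not how any particular unstructured data management product works; the two-year threshold mirrors the cold-data example discussed later in this piece.

```python
import os
import time

# Illustrative assumption: files untouched for more than two years are "cold".
COLD_THRESHOLD_DAYS = 730

def scan_share(root, now=None):
    """Walk a directory tree and tally file counts, total bytes,
    and bytes of cold data (by last-access time)."""
    now = now or time.time()
    cutoff = now - COLD_THRESHOLD_DAYS * 86400
    totals = {"files": 0, "bytes": 0, "cold_bytes": 0}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # skip unreadable or vanished files
            totals["files"] += 1
            totals["bytes"] += st.st_size
            if st.st_atime < cutoff:
                totals["cold_bytes"] += st.st_size
    pct_cold = 100 * totals["cold_bytes"] / totals["bytes"] if totals["bytes"] else 0.0
    return totals, pct_cold
```

Run periodically, snapshots from a scan like this also yield growth rates and top data owners by diffing results over time; a production tool would of course use storage APIs rather than a filesystem walk.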
What emerging technology trends do you foresee shaping the future of unstructured data observability, and how can organizations leverage these trends to enhance their data monitoring and troubleshooting capabilities?
AI and data governance are two major areas that can aid unstructured data observability. AI can help by spotting anomalies and trends faster and by enriching the contextual and other information about your file and object data across hybrid cloud storage. Richer metadata can improve unstructured data observability by providing more dimensions for analytics, making it easier to spot trends, anomalies and issues. For instance, metadata tags for PII or IP can show IT if and where sensitive data is stored out of compliance.
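To make the PII-tagging idea concrete, here is a minimal sketch of metadata enrichment: scanning file contents for PII-like patterns and returning tags that could be attached as custom metadata. The patterns are deliberately simplified assumptions for illustration; real detectors are far more sophisticated, and this does not represent any vendor’s implementation.

```python
import re

# Simplified example patterns (assumptions, not production-grade PII detectors).
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),      # US Social Security number shape
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def tag_pii(text):
    """Return the set of PII tags whose patterns match the given text."""
    return {tag for tag, pat in PII_PATTERNS.items() if pat.search(text)}
```

Tags produced this way become extra metadata dimensions for analytics, so IT can query, for example, where files tagged "ssn" live and whether that storage location is compliant.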
Unstructured data observability goes hand in hand with data governance programs, because security vulnerabilities and unusual activity, such as a large volume of deletes or a user’s personal folders spiking quickly in size, can indicate a security or compliance incident. Unstructured data management solutions let IT users drill down into directories and file shares to investigate potential problems flagged by IT monitoring and cybersecurity systems.
Does data observability help with ransomware defense?
Absolutely. With more insight into data in storage, IT teams can make better decisions about its management, including shrinking the on-premises attack surface. For instance, if you can see that 65% of the organization’s data is “cold” and hasn’t been touched in more than two years, it’s an easy decision to move that data to object storage in the cloud. That way it’s out of the data center attack surface, and if it’s stored in immutable object storage it’s further protected, since ransomware actors cannot modify or delete it.
What are some best practices for implementing a data observability strategy within an organization, particularly in complex environments with hybrid data pipelines and diverse data sources?
Many organizations treat data observability as a modern moniker for data monitoring. If instead they viewed unstructured data observability more broadly, encompassing data analytics and reporting that feed into actionable unstructured data management, then observability becomes a much richer function: one that not only helps resolve issues faster but also proactively grows the value of the data that is growing the fastest and costing the most. Since data is retained for long periods, it makes sense for data observability to go beyond alerting and near-term reporting to focus on longer-term trends that help IT proactively manage and assess the data estate.