The challenge of managing digital clutter looms large. Terry Ray, SVP Data Security GTM and Field CTO at Imperva, explores the hidden dangers of dark data, urging businesses to embark on a comprehensive data lifecycle management strategy to unearth, analyse and eliminate redundant information, ultimately mitigating cyber-risks and freeing up valuable resources in the digital landscape.
We all face problems with clutter. Whether it’s souvenirs from trips with the family, equipment for a now-abandoned hobby, or just stuff we haven’t gotten around to getting rid of, everyone has unnecessary bits lying around the home. The same is true for organisations, with the key difference being that our clutter at home is unlikely to result in major regulatory fines or severe reputational damage if we don’t bother to get on with spring cleaning next year.
Digital clutter – otherwise known as dark data – is a significant issue, and one that’s only getting worse. Globally, around 328.77 million terabytes of data are created each day, and it’s estimated that around 120 zettabytes of data will be stored by the end of 2023, a rise of 24% compared to 2022.
This may seem like an irrelevance in the cloud era, where every business has access to virtually infinite storage. But storage isn’t free, and a significant proportion of those 120 zettabytes is either sensitive information about customers, partners or intellectual property, or is subject to regulatory oversight (often both). By hoarding all this information, businesses are incurring major data security risks, often for data they have no intention of ever using.
Are you ever going to use this?
Organisations find themselves with such a glut of dark data because, of all the stages in the data management lifecycle, collecting information is the easy part. Leveraging, analysing and destroying data are often far trickier steps. This is why, for the average enterprise, around two-thirds of all stored data could be deleted without impacting business operations. Yet the fact that the business isn’t getting any benefit from holding this data doesn’t mean the risks go away. Dark data still costs money to store, it is still susceptible to being breached (by internal or external actors), and it can still result in substantial regulatory fines if found to be improperly stored, protected or used.
There are hundreds of examples of dark data, some of which can be highly sensitive, such as past employee records, outdated financial information and transaction logs, emails, internal presentations, downloaded attachments, or even surveillance video footage. Sometimes dark data is known about but unused, although all too often it is simply forgotten. Yet it still exists, spread across an organisation in a myriad of data repositories, from data lakes to applications.
For modern businesses, then, dark data accounts for around two-thirds of all stored data on average, and a large percentage of it can be deleted virtually without consequence because organisations aren’t using it and never plan to. So, what tangible steps can be taken to address the issue?
Taking action
For any organisation struggling to get on top of its dark data, the first thing to do is develop a comprehensive data lifecycle management strategy, led by an experienced data protection officer. This means determining the lifecycle of every information asset, regardless of where it comes from and what format it’s in, before it is collected, by answering a few basic questions:
Who needs this data?
What do they need it for?
How will they be using it?
What protections do we need to put around this asset?
How long does it need to be kept once collected?
When will it be destroyed?
Unless organisations can answer these basic questions upfront, they will never be able to get to grips with dark data.
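One way to make those answers enforceable is to record them as structured metadata at the point of collection. What follows is a minimal sketch in Python, not a reference to any particular product: the `LifecyclePolicy` class, its field names and the `unanswered` helper are all hypothetical, chosen only to mirror the six questions above.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class LifecyclePolicy:
    """Answers to the lifecycle questions for one data asset (hypothetical schema)."""

    asset: str
    owner: str                    # Who needs this data?
    purpose: str                  # What do they need it for?
    usage: str                    # How will they be using it?
    protections: tuple[str, ...]  # What protections do we need around it?
    retention: timedelta          # How long is it kept once collected?

    def destroy_on(self, collected: date) -> date:
        """When will it be destroyed? Collection date plus retention period."""
        return collected + self.retention

    def is_due_for_destruction(self, collected: date, today: date) -> bool:
        """True once the retention period has fully elapsed."""
        return today >= self.destroy_on(collected)


def unanswered(policy: LifecyclePolicy) -> list[str]:
    """Return the names of any questions that were left blank."""
    gaps = [name for name in ("owner", "purpose", "usage")
            if not getattr(policy, name).strip()]
    if not policy.protections:
        gaps.append("protections")
    if policy.retention <= timedelta(0):
        gaps.append("retention")
    return gaps


# Example: surveillance footage kept for an assumed 30 days.
footage = LifecyclePolicy(
    asset="cctv_footage",
    owner="Facilities",
    purpose="Site security",
    usage="Incident review",
    protections=("encryption at rest",),
    retention=timedelta(days=30),
)
print(footage.destroy_on(date(2024, 1, 1)))  # 2024-01-31
print(unanswered(footage))                   # []
```

An asset whose policy returns a non-empty list from `unanswered` is, in effect, a candidate for becoming dark data before it has even been collected.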
Then, once this strategy has been agreed upon, businesses can start to carry out discovery and classification efforts for all data assets in their environment. This can be done by independent specialists, who review a data environment and conduct in-depth analyses of unused and uncatalogued data, or internally, with automated tools that capture data wherever it resides. The internal approach is often preferable as it also enables businesses to identify other issues, such as regulatory violations or potentially malicious or negligent behaviour that could place confidential and private data in jeopardy.
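To give a feel for what automated discovery and classification involves, here is a deliberately small sketch. It assumes the data sits on an ordinary file system, treats "not modified for a year" as a stand-in for "unused", and flags a file as sensitive if a sample of it contains something that looks like an email address. The function names, the threshold and the single pattern are all illustrative assumptions; real tooling would also cover databases, data lakes and SaaS applications, and would look for far more data types.

```python
from __future__ import annotations

import re
import time
from pathlib import Path

# Illustrative pattern only: real classifiers look for many more data types.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SECONDS_PER_DAY = 86_400


def classify(sample: str) -> str:
    """Label a text sample 'sensitive' if it appears to contain an email address."""
    return "sensitive" if EMAIL.search(sample) else "unclassified"


def discover(root: Path, stale_after_days: int = 365,
             now: float | None = None) -> list[dict]:
    """List files under root not modified within stale_after_days, with a label."""
    now = time.time() if now is None else now
    cutoff = now - stale_after_days * SECONDS_PER_DAY
    findings = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        info = path.stat()
        if info.st_mtime >= cutoff:
            continue  # modified recently, so not treated as dark here
        with path.open("rb") as handle:
            # Sample only the first 64 KiB so large files stay cheap to scan.
            sample = handle.read(65_536).decode("utf-8", errors="ignore")
        findings.append(
            {"path": path, "bytes": info.st_size, "label": classify(sample)}
        )
    return findings
```

Calling `discover(Path("some/shared/drive"))` on a directory of your choosing returns one entry per stale file, which can then be reviewed against the lifecycle strategy before anything is deleted.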
Having developed a strategy and invested in the capabilities to find dark data, businesses are then able to hunt down and destroy reams of useless but dangerous data assets. Destroying these assets reduces an organisation’s cyber-risk while saving substantial sums by cutting the amount of on-prem or cloud storage required for day-to-day activities.
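The storage side of that saving is simple arithmetic, sketched below. The two-thirds share is the average cited earlier in this article; the 300 TB estate and the price of 20 per terabyte per month are made-up placeholders, not real vendor pricing, and the function names are hypothetical.

```python
from fractions import Fraction

# "Around two-thirds" of stored data, the average figure cited in this article.
DELETABLE_SHARE = Fraction(2, 3)


def reclaimable_tb(stored_tb: int, share: Fraction = DELETABLE_SHARE) -> Fraction:
    """Terabytes that could be freed if the deletable share were destroyed."""
    return stored_tb * share


def monthly_saving(stored_tb: int, price_per_tb_month: Fraction,
                   share: Fraction = DELETABLE_SHARE) -> Fraction:
    """Storage spend avoided each month, in whatever currency the price uses."""
    return reclaimable_tb(stored_tb, share) * price_per_tb_month


# Placeholder inputs: a 300 TB estate at an assumed 20 per TB per month.
print(reclaimable_tb(300))                # 200
print(monthly_saving(300, Fraction(20)))  # 4000
```

This covers storage spend only; the reduction in breach exposure and regulatory risk is harder to put a number on and is not modelled here.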
Coming into the light
Dark data may be unused, unimportant and unmonitored, but out of sight cannot mean out of mind because, when it comes to data security, what you don’t know can hurt you. Indeed, simply by taking up so much storage space it is already doing harm, eating up budget that doesn’t need to be spent, not to mention the dangers if it is compromised.
Investment in good data lifecycle management strategies, alongside proper data discovery and classification capabilities, can pay for itself very quickly by allowing businesses to find and destroy reams of information that serve no purpose other than to consume resources and inflate the risk level. Let 2024 be the year that we all consider going on a data diet.