See the light: The challenges of dealing with dark data

When it comes to the problem of dealing with dark data within your organization, the main difficulty is often where to begin. And the best way to start a quest is to make sure you know what you’re looking for. To do this, you start by asking questions. What exactly is dark data? Is it the same thing as the dark web? How serious a problem can the existence of dark data pose for a business?

“Every organization, whether they realize it or not, is basically cultivating its own equivalent of a ‘corporate dark web,’ not to be confused with the dark web,” explains Peter Baumann, CEO of ActiveNav, a team of data experts who have helped some of the world’s largest private and public organizations discover, classify, and manage their data. “This corporate dark web is a dirty, messy environment full of toxic, risky digital information buried deep within a business. If it’s ignored, dark data can lead to major privacy violations, security breaches, competitive disadvantages, reputational damage, and significant storage costs. Most companies are storing all this dark data unnecessarily, and it’s only causing them risk and lost opportunity. It’s a serious problem and one that’s only getting worse over time.”

Unstructured data: Largest type of dark data

The largest category of data for most organizations is unstructured data, and unfortunately shining a light on that kind of data when it’s dark can be difficult. “Unstructured is typically the largest or most common form of dark data,” Peter says, “but it’s certainly not the only type. Market analysts such as Gartner estimate that upwards of 80 percent of an organization’s digital information is unstructured. This is certainly what we’ve seen from our work and experience during hundreds of engagements over the last decade. The problem with unstructured versus structured is largely in the name. They can both be opaque. However, what makes unstructured data particularly challenging is that it sits outside of a rigid environment or container, like a database, making it very hard to know how much data you’ve got or what it’s comprised of. How can you protect and secure something you don’t know you have?”

So finding your dark data is the real key here, and the main challenge companies face. How can you ferret out dark/unstructured data lurking within the depths of your company’s infrastructure? Peter explains: “Well, the good news is that you can simply ferret it out, to use your terminology — assuming it becomes a known priority. Also, given it was people and technology that got us into this mess in the first place, it also means that people and technology, sprinkled with good processes, can get us back out of it. As with any significant challenge organizations face, having the right tools and expertise in place is much of the battle.”

What about dark data in the cloud?

But for many companies today, much of their sensitive business data is stored in the cloud. Which raises another important question you need to consider: Is it easier or more difficult to identify dark data when it’s stored on-premises or in the cloud? The answer, of course, is just what you might expect: “That depends on several factors, but to a large extent, the amount of data you have, and the type of data is more important than the location. But also, are they large files or small files, rich media or text, what cloud hosting environment are you using? And of course, there’s accessibility and even how large the pipe is between your cloud to land environment. If your cloud infrastructure has been correctly thought out, and data management is supported with strong and pragmatic policies throughout your organization, then cloud should be both easier and faster — not least because it’s still relatively new. There’s an awful lot of data stored on-premises that could be 20 years old or more. And so, generally, the cloud is easier to manage because it’s newer and will typically have less data to begin with.”

This is probably a relief for you if your business is cloud-driven instead of more traditionally structured. But it still leaves the basic question of what kinds of steps your organization can take to find and eradicate or make better use of your dark data. Fortunately, well-defined procedures for undertaking this already exist. “Several external frameworks can help,” Peter says, “including governmental or international frameworks provided by different bodies that are specialists in this space. For example, you could use the National Institute of Standards and Technology (NIST) framework. Or you may want to look at some of the other expert advice available from records management groups including the Association for Intelligent Information Management (AIIM) or the Association of Records Managers & Administrators (ARMA). Ultimately, however, you need to do something to get your unstructured data into compliance, and you need to take it very, very seriously. In a typical organization, that means that you’re going to need executive support, which makes executive education a priority.”

Know how big the problem is

This all sounds complicated, and like a very big task for most organizations to undertake. The key, of course, is to start at the top and then build from the bottom. “Few executives understand the magnitude of their dark data problem,” Peter says. “And most of those that do think someone’s dealing with it already, or they don’t want to have to worry about it, but that’s no longer an option given the continued attacks against customer data. And so, you must get executive support that filters down typically to something like a steering committee where you have champions and sponsors across the business. Then, it’s all about small wins. Get success in one area, demonstrate the success, and turn it into a process. Then roll that out across your organization.”

Building from the bottom works better if you have the right tools to work with. So a reasonable next question is which tools make it easier to identify your dark data and to measure how successful the inventory process is. “To my earlier points, it’s not a question of if there are tools, because there are plenty!” Peter says. “The challenge is using the right tools. There’s a lot of over-promising, as is often the case with most enterprise software. This is the hidden risk with this challenge. Software vendors and consultancies over-promise: they think it’s going to be quicker, easier, and they think that it can be done by using the latest techniques such as AI or ML. Largely it can’t (without significant supervision/training). The most important solution attributes for tackling dark data are accessibility, speed, and scale. It’s also crucial to have a pragmatic, technical, and supporting workflow to achieve remediation of the data. From our experience at ActiveNav, what’s incredibly important is experience. Real-life, practical experience. Gartner has established a category called File Analysis, which generally describes the solution set we provide. There are lots of complementary technologies out there such as data loss prevention (DLP) and identity and access management (IAM), but importantly, they aren’t purpose-built to tackle this dark data problem. Organizations would be wise to not try to over-leverage these discrete capabilities.”
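To make the idea of a measurable inventory a little more concrete, here is a minimal Python sketch. It is not ActiveNav’s product or any File Analysis tool, and it assumes the unstructured data sits on a file system you can walk. The function name `inventory` and the five-year staleness threshold are illustrative choices, not something from the interview. It tallies file counts, bytes, and stale files per extension, which is only a rough first answer to how much data you have and how old it is.

```python
import os
import time
from collections import defaultdict


def inventory(root, stale_after_days=365 * 5, now=None):
    """Walk `root` and summarize files per extension.

    Returns a dict mapping extension -> {"files", "bytes", "stale"}.
    A file counts as stale if it was last modified more than
    `stale_after_days` ago (an assumed, illustrative threshold).
    """
    now = time.time() if now is None else now
    cutoff = now - stale_after_days * 86400  # seconds per day
    summary = defaultdict(lambda: {"files": 0, "bytes": 0, "stale": 0})

    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # unreadable or vanished file: skip it

            # Group by lowercased extension; files without one go to "(none)".
            ext = os.path.splitext(name)[1].lower() or "(none)"
            row = summary[ext]
            row["files"] += 1
            row["bytes"] += info.st_size
            if info.st_mtime < cutoff:
                row["stale"] += 1

    return dict(summary)
```

Calling `inventory("/path/to/share")` on a hypothetical file share gives a per-extension summary that could feed a first report for a steering committee. It says nothing about what the files contain, which is where the purpose-built tooling and experience Peter describes come in.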

And how do you avoid failure in your quest to locate, identify, and neutralize dark unstructured data within your organization? “If you don’t take this dark data problem seriously with the right tools, you’re almost certainly going to fail. Unfortunately, it might take you 12 to 18 months before you realize it, and then you have to go back to the beginning, wasting time, money, and energy that most simply don’t have. Don’t delay dealing with your unstructured data. We see that a lot of organizations go after the easy stuff, such as structured data, because they don’t know what to do with unstructured data or it’s a bigger headache than they’re ready to deal with.”

Problem not going away

The plan then is this: “The best way forward is to plan a comprehensive pilot to prove the process before you roll it out across your organization. Don’t forget, this is for life, not for the holidays. It needs to be built into the fabric of your organization, into your IT infrastructure, into your decision-making, and your costs. The growth of data, not least unstructured data, is not slowing down anytime soon. Quite the opposite, in fact. This needs to become a new line item in your business, and you need to have people that live and breathe this every day, or quite simply, you will fail. And you’ll find yourself in the same dark data mess again and again.”
