Shedding light on dark data: Is it an opportunity or risk?

Big Data continues to gain visibility with businesses and organizations of all sizes, with a number of conferences on the subject planned for 2019. We’ve talked about it here on TechGenix by describing how Big Data analytics is driving big changes in the oil and gas industry and how combining Big Data analytics with sensor technologies creates the potential to change the landscape of just about everything under the sun. We’ve also touched on some of the security issues associated with harvesting, managing, and maintaining Big Data. What we haven’t yet talked much about, however, is the other side of Big Data, or what the industry calls “dark data.” In this article, I’m going to describe in some detail what dark data is, the opportunities and risks associated with the dark data a business has, and how a business can properly manage its dark data.

What is dark data?

Dark data is basically just Big Data that is hidden in plain sight and not being used or explored by the company or organization that collects, stores, and owns it. That definition pushes us back a step, to defining what Big Data is.

Big Data, as I’m using the term here, is mostly the unstructured data that your business or organization generates through its activities. By unstructured we mean types of data like documents, spreadsheets, presentations, maps, photographs, audio recordings, video recordings, social media activities, online reviews of products, comments on websites, emails, instant messages, text messages including emojis, telephone recordings, logs from network devices, logs from servers, recordings from security cameras, data from sensors, data from Internet of Things (IoT) devices, and so on. Virtually anything that can’t be, or normally isn’t, stored in a relational database can be called Big Data.

And guess what? Most of that data is dark.

What I mean is that most of the different kinds of Big Data that a company accumulates in vast quantities and stores in various locations are unappreciated, unexplored, and unused, often because those who might make use of the data aren’t even aware that it’s there.

Dark data opportunities

It’s that last point where the opportunity arises. Some examples:

  • If only our sales department had known we collected such data, we could have created better customer sales profiles that would have told us what our customers might need next year so we could start developing it.
  • If only we had known such details about the buying behaviors of our customers, we could have reached out to them before our competitors did, with products and services tailored especially to their needs.
  • If only we had known about that data, we would have been able to create a kind of digital thread stretching along our value chain so we could better forecast the future supply needs of our business and optimize delivery of products and services to our customers.
  • If only we had been able to access and analyze all the customer interaction data we had collected, such as when customers contacted us, what medium they used to reach us, who on our team they dealt with to answer their question or solve their problem, and how frequently they interacted with us. Had we been able to use our Big Data analytics tools on that information, we could have provided our customers with better support and improved customer retention. But we didn’t even know such data existed, and it was within our reach!

Dark data risks

There’s also a dark side to dark data, however. What I mean is that using (or misusing) dark data can expose your business or organization to risk. Let me give you a few examples to illustrate.

Let’s say that your company sells products for new mothers in addition to other kinds of products for young women, and you routinely collect mountains of structured and unstructured data from customers who shop on your website. You analyze that data using Big Data analytics technologies like Hadoop, DeepDive, Snorkel, or Datumize, or using one of the many Big Data solution providers that pop up and disappear as the whole Big Data analytics field continues to evolve and mature. Your analytics tool algorithmically determines, to a certain level of statistical likelihood, that one of your customers is probably pregnant, and it automatically sends out a gift certificate to the customer, both by email and in hard copy by regular mail.

Then the mother of the customer sees this in the mail and it’s the first inkling she has that her daughter is pregnant, so she gets upset with your company and files a privacy complaint against it. Oops! You’ve just stepped into the risk pond that’s associated with the dark data lake your company manages, or should be managing properly.

If you run a business, then I’m sure you could think up many other risk scenarios like this one that might happen to you if you plumb and utilize your pool of dark data unthinkingly. But there are other risks that companies may face because of dark data they collect and store. For example, in the increasingly interconnected world we inhabit, the risk of having your network hacked is steadily rising. Dark data, of course, is data you don’t even know you’ve collected or have stored away somewhere, so if a breach occurs and the attacker steals some of your dark data, you may never know about it. And yet the theft can have serious consequences for your business, including legal ramifications and compliance issues, and possibly for your customers as well through violation of their privacy.

In other words, not only is identifying your company’s dark data and bringing it out into the light (analyzing it) a terrific opportunity for growing your business, it’s also a necessity if you want to minimize the risks to your company’s future profitability and even its existence.

Managing dark data

It’s important for businesses to manage their dark data the way they manage any other aspect of their business that can provide benefit or incur risk. Finding and identifying any dark data you may have squirreled away is obviously the first step, and I’ve mentioned a few solutions you can investigate for doing this (or you can hire an analytics provider to do it for you).
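As a rough illustration of that first step, here is a minimal Python sketch that walks a folder tree and lists files nobody has modified for a long time, which is one crude signal that data may have gone dark. The function names, the age threshold, and the use of modification time as the signal are all assumptions made for this example. They do not come from any of the products mentioned above, and a real discovery effort would also have to cover mailboxes, databases, cloud storage, and so on.

```python
import os
import time
from collections import defaultdict


def find_stale_files(root, min_age_days, now=None):
    """Return (path, size_bytes, age_days) for files not modified in min_age_days.

    Hypothetical helper: "stale" is only a rough stand-in for "dark".
    """
    now = time.time() if now is None else now
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # unreadable or vanished file; skip it
            age_days = (now - info.st_mtime) / 86400
            if age_days >= min_age_days:
                stale.append((path, info.st_size, age_days))
    return stale


def summarize_by_extension(stale):
    """Total the bytes of stale files per (lower-cased) file extension."""
    totals = defaultdict(int)
    for path, size, _age in stale:
        ext = os.path.splitext(path)[1].lower() or "(none)"
        totals[ext] += size
    return dict(totals)
```

Calling find_stale_files on a shared drive with a threshold of, say, 365 days and then passing the result to summarize_by_extension gives a first, very approximate picture of what kinds of forgotten data are taking up space.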

Reviewing what’s been dredged up from the data pond is the next step, and this involves performing a cost-benefit analysis to determine whether the data you’ve identified may be of use or poses any risks you need to mitigate. This cost-benefit analysis should also help you rank the value of the different kinds of dark data you’ve managed to uncover within your organization.
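To make the ranking idea concrete, here is a small Python sketch. It assumes you can put a rough yearly figure on the benefit, cost, and risk of each kind of data, which in practice is the hard part. The DataSet class, its field names, and the simple "benefit minus cost minus risk" formula are illustrative assumptions, not an established method.

```python
from dataclasses import dataclass


@dataclass
class DataSet:
    name: str
    benefit: float  # estimated yearly value if the data is put to use
    cost: float     # estimated yearly cost of storing and analyzing it
    risk: float     # estimated yearly exposure (breach, privacy complaints)

    @property
    def net_value(self) -> float:
        # Assumed scoring rule for this sketch: benefit minus cost and risk.
        return self.benefit - self.cost - self.risk


def rank_datasets(datasets):
    """Return the data sets ordered from highest to lowest net value."""
    return sorted(datasets, key=lambda d: d.net_value, reverse=True)
```

Data sets that end up with a negative net value are natural candidates for disposal, while those at the top of the list are where analysis effort is most likely to pay off.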

Then, based on how you have characterized this enlightened data, your next step should be to choose which data you’ll keep and which you’ll do away with. But you shouldn’t just delete any dark data you think you won’t be needing. Instead, you need to justify why you’re deleting the data and then route it into your retention schedule for future disposition at an appropriate time. And you should verify later that the data has in fact been properly disposed of.
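The justify, schedule, and verify sequence can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the data lives in files on disk, the retention schedule is a plain list, and names such as RetentionEntry and schedule_disposal are made up for the example. A real records-management system would add approvals, audit logs, and legal holds.

```python
import os
from dataclasses import dataclass
from datetime import date


@dataclass
class RetentionEntry:
    path: str
    justification: str
    dispose_on: date


def schedule_disposal(schedule, path, justification, dispose_on):
    """Add data to the retention schedule, refusing unjustified disposals."""
    if not justification.strip():
        raise ValueError("a disposal must be justified before it is scheduled")
    entry = RetentionEntry(path, justification, dispose_on)
    schedule.append(entry)
    return entry


def due_for_disposal(schedule, today):
    """Return entries whose disposal date has arrived."""
    return [e for e in schedule if e.dispose_on <= today]


def verify_disposed(entries):
    """Return entries whose data still exists on disk and so was NOT disposed of."""
    return [e for e in entries if os.path.exists(e.path)]
```

Running verify_disposed over the entries returned by due_for_disposal some time after the disposal date gives a simple follow-up check: an empty list means everything that was due has actually gone.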

Finally, it’s a good idea to periodically review whether the dark data you’ve dug up, retained, and utilized has in fact benefited your business enough to offset the costs you’ve incurred in finding and analyzing it.

Featured image: Shutterstock

Mitch Tulloch

Mitch Tulloch is a widely recognized expert on Windows Server and cloud technologies who has written more than a thousand articles and has authored or been series editor for over 50 books for Microsoft Press. He is a twelve-time recipient of the Microsoft Most Valuable Professional (MVP) award in the technical category of Cloud and Datacenter Management.
