Big Data continues to gain visibility with businesses and organizations of all sizes, and a number of conferences on the subject are planned for 2019. We’ve talked about it here on TechGenix by describing how Big Data analytics is driving big changes in the oil and gas industry and how combining Big Data analytics with sensor technologies creates the potential to change the landscape of just about everything under the sun. And we’ve also touched previously upon some of the security issues associated with harvesting, managing, and maintaining Big Data. What we haven’t yet talked much about, however, is the other side of Big Data, which the industry calls “dark data.” In this article, I’m going to describe in some detail what dark data is, the opportunities and risks associated with the dark data a business has, and how a business can properly manage its dark data.
What is dark data?
Dark data is basically just Big Data that is hidden in plain sight and not being used or explored by the company or organization that collects, stores, and owns it. That pushes us back a step to defining what Big Data itself is.
Big Data, for our purposes here, is largely the unstructured data that your business or organization generates through its activities. By unstructured we mean types of data like documents, spreadsheets, presentations, maps, photographs, audio recordings, video recordings, social media activities, online reviews of products, comments on websites, emails, instant messages, text messages including emojis, telephone recordings, logs from network devices, logs from servers, recordings from security cameras, data from sensors, data from Internet of Things (IoT) devices, and so on. Virtually anything that can’t be, or normally isn’t, stored in a relational database can be called Big Data in this sense.
And guess what? Most of that data is dark.
What I mean is that most of the Big Data a company accumulates in vast quantities and stores in various locations goes unappreciated, unexplored, and unused, often because those who might make use of it aren’t even aware that it’s there.
Dark data opportunities
It’s that last point where the opportunity arises. Some examples:
- If only our sales department had known we collected such data, we could have created better customer sales profiles that would have told us what our customers might need next year so we could start developing it.
- If only we had known such details concerning the buying behaviors of our customers, we could have reached out to them before our competitors did with products and services tailored especially for their needs.
- If only we had known about that data, we would have been able to create a kind of digital thread stretching along our value chain so we could better forecast the future supply needs of our business and optimize delivery of products and services to our customers.
- If only we had been able to access and analyze all the customer interactivity data we had collected, such as when they contacted us, what medium they used to reach us, who on our team they interacted with to answer their question or solve their problem, how frequently they have interacted with us, and so on. If only we had been able to use our Big Data analytics tools on that information, we would have been able to provide our customers with better support to enhance customer retention. But we didn’t even know such data existed, and it was within arm’s reach!
Dark data risks
Dark data also has a dark side, however: using (or misusing) it can expose your business or organization to risk. Let me give you a few examples to illustrate.
Let’s say that your company sells products for new mothers in addition to other kinds of products for young women, and you routinely collect mountains of structured and unstructured data from customers who shop on your website. You analyze it with Big Data analytics technologies like Hadoop, DeepDive, Snorkel, Datumize, or one of the many Big Data solution providers popping up and disappearing as the whole Big Data analytics field continues to evolve and mature. Your analytics tool algorithmically determines, to a certain level of statistical likelihood, that one of your customers is probably pregnant, and it automatically sends out a gift certificate to the customer, both by email and in hard copy by regular mail.
Then the mother of the customer sees this in the mail and it’s the first inkling she has that her daughter is pregnant, so she gets upset with your company and files a privacy complaint against it. Oops! You’ve just stepped into the risk pond that’s associated with the dark data lake your company manages, or should be managing properly.
If you run a business, then I’m sure you could think up many other risk scenarios like this one that might happen to you if you plumb and utilize your pool of dark data unthinkingly. But there are other risks that companies may face because of dark data they collect and store. For example, in the increasingly interconnected world we inhabit, the risk of having your network hacked is steadily growing. Dark data, of course, is data you don’t even know you’ve collected or have stored away somewhere, so if a breach occurs and the attacker steals some of your dark data, you may never know about it. And yet the theft can have serious consequences for your business, including legal ramifications and compliance issues, and possibly for your customers as well through violation of their privacy.
In other words, not only is identifying and bringing out into the light (analyzing) your company’s dark data a terrific opportunity for growing your business, it’s also a necessity if you want to minimize the possible risk your company may face to its future profitability and even its existence.
Managing dark data
It’s important for businesses to manage their dark data the way they manage any other aspect of their business that can provide benefit or incur risk. Finding and identifying any dark data you may have squirreled away is obviously the first step, and I’ve mentioned a few solutions you can investigate for doing this (or you can hire an analytics provider to do it for you).
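As a very rough illustration of that first step, the Python sketch below lists files under a folder that nobody has modified in a long time. Treating file age as a proxy for “dark” is an assumption made purely for illustration, as are the function name `find_stale_files` and the one-year default. It uses only the Python standard library and is a starting point, not a substitute for a real discovery tool.

```python
import os
import time
from pathlib import Path


def find_stale_files(root, min_age_days=365, now=None):
    """Return (path, size_bytes, age_days) for files untouched for min_age_days.

    Hypothetical helper: "stale" here just means the modification time is
    older than the cutoff, which is only a crude hint that data may be dark.
    """
    now = time.time() if now is None else now
    cutoff = now - min_age_days * 86400  # 86400 seconds per day
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                info = path.stat()
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if info.st_mtime < cutoff:
                age_days = int((now - info.st_mtime) // 86400)
                stale.append((str(path), info.st_size, age_days))
    # Put the largest files first, since they cost the most to keep.
    stale.sort(key=lambda item: item[1], reverse=True)
    return stale
```

Pointing `find_stale_files` at a shared drive with, say, `min_age_days=730` would give you a list of candidates to hand to whoever owns that data for a closer look.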
Reviewing what’s been dredged up from the data pond is the next step, and this involves performing a cost-benefit analysis to determine whether the data you’ve identified may be of use or pose any risks you need to mitigate. This cost-benefit analysis should also help you rank the value of the different kinds of dark data you’ve managed to uncover within your organization.
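One simple way to make that ranking concrete is to give each data set an estimated benefit, an estimated risk, and a handling cost, then sort by what is left over. The Python sketch below does exactly that. The class `DarkDataSet`, its field names, and the formula (benefit minus risk minus cost) are assumptions for illustration; a real analysis would weigh these factors however your business sees fit.

```python
from dataclasses import dataclass


@dataclass
class DarkDataSet:
    """Hypothetical record for one kind of dark data you have uncovered."""
    name: str
    estimated_benefit: float  # expected yearly value if the data is analyzed
    estimated_risk: float     # expected yearly loss if it is breached or misused
    handling_cost: float      # yearly cost to store, secure, and analyze it

    @property
    def net_value(self) -> float:
        # Assumed scoring rule: what remains after risk and cost are subtracted.
        return self.estimated_benefit - self.estimated_risk - self.handling_cost


def rank_by_net_value(datasets):
    """Return the data sets ordered from most to least valuable."""
    return sorted(datasets, key=lambda d: d.net_value, reverse=True)
```

Data sets that end up with a negative `net_value` are the natural candidates for disposal rather than further analysis.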
Then based on how you have characterized this enlightened data, your next step should be to choose which data you’ll keep and which you’ll do away with. But you shouldn’t just delete any dark data you think you won’t be needing. Instead you need to justify why you’re deleting the data and then route it into your retention schedule for future disposition at an appropriate time. And you should verify later that the data has in fact been properly disposed of.
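The disposal steps just described (justify, schedule, verify) can be sketched as a small record-keeping routine in Python. The names `RetentionEntry`, `schedule_disposal`, and `overdue_entries` are hypothetical, and a real retention schedule would live in your records management system rather than in a list in memory.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class RetentionEntry:
    """Hypothetical retention-schedule record for one data set."""
    dataset: str
    justification: str                  # why the data is being disposed of
    dispose_on: date                    # when disposal is due
    disposed_on: Optional[date] = None  # filled in once deletion is verified


def schedule_disposal(dataset, justification, retain_days, today=None):
    """Create a schedule entry; refuse to do so without a justification."""
    if not justification.strip():
        raise ValueError("a disposal decision needs a written justification")
    today = today or date.today()
    return RetentionEntry(dataset, justification, today + timedelta(days=retain_days))


def overdue_entries(schedule, today=None):
    """Return entries that are due for disposal but not yet verified as disposed."""
    today = today or date.today()
    return [e for e in schedule if e.disposed_on is None and e.dispose_on <= today]
```

Running `overdue_entries` on a regular basis gives you the later verification check: anything it returns was supposed to be gone by now and has not been confirmed as deleted.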
Finally, it’s a good idea to periodically review whether the dark data you’ve dug up, retained, and utilized has in fact been of enough benefit to your business to offset the costs you’ve incurred in finding and analyzing it.