Shedding Light on Dark Data


No, it’s not one side of the Force or the other, but Dark Data is a thing – and it’s likely affecting your databases, processes, decisions, and business! Dark data represents a trove of untapped insights but can also be a liability if unmanaged. But why does it exist? Why should you care about it? And what can you do about it?

Gartner coined the term “dark data” and defined it as “the information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes (for example, analytics, business relationships and direct monetizing).” (Gartner)


By one estimate, the average American company leverages only 15% of its available data. Another 33% is redundant, obsolete, or trivial (ROT). The majority, 52%, is unclassified or “dark”. Unlike ROT data, which you can at least clean up, dark data has no structure or process in place for analyzing it, so dealing with it is more involved.

Dark data exists and has become so prevalent because data is generated faster than it is consumed. Given capacity constraints, it is understandable that companies focus their resources on tapping “business-critical” data, such as key customer information or transaction data. However, it might be time to shine a light on your dark data to see whether you can uncover additional value that gives you a competitive edge. Left in the dark, that data can be detrimental to your business.

Problems of Dark Data

Dark data does not just sit innocuously in your internal systems. It can become the root cause of undesirable costs and risks.

  • Security Risk – Not dealing with dark data can expose your company to vulnerabilities. Unattended log files, for example, can become breadcrumbs for hackers to get into your system.

  • Reduced Responsiveness – This can also be considered a security risk. “86% of IT decision makers think the amount of data they store increases the time it takes to respond to a data breach.”

  • Legal Risk – Regulations such as GDPR and HIPAA require certain types of data to be wiped or secured safely with robust internal controls. If you don’t know your dark data sets, you risk having sensitive data exposed and risk incurring hefty penalties.

  • Reputational Risk – The market will lose trust and confidence in a company that does not manage all its customers’ personal/sensitive data properly.

  • Storage Cost – Hanging on to unused data incurs storage cost, which can add up as your business grows.

  • Unrealized Value – If dark data incurs risks and storage cost, a simple solution is to identify and remove it. However, it might also contain insights that can help you be more competitive. By one estimate, leveraging all relevant data could give organizations a potential $430 billion in productivity gains.

Managing and Monetizing Dark Data

To manage dark data, you must first know how and where it exists in your data systems. Then you can develop the capability to transform the unstructured data into something more accessible before tapping the data sets for value.

Discovering and securing your dark data 

  • The first step is knowing that you have dark data and understanding how your company generates it and what kind of data is being generated. This could be email correspondence, log files, extra customer information, etc. A periodic data audit is the only way to stay on top of all the data your company generates and ingests.

  • Design a workflow to capture and classify the data, for example, by adding metadata labels so it can be accessed easily. There are tools to deal with dark data, and Devopedia provides a few examples: DeepDive, Snorkel, and Dark Vision.

  • If you decide to keep the data, make sure you encrypt it and store it securely to minimize downside risks.

  • Leverage automation technology so that your staff does not become buried in processing dark data. For example, once you understand how to capture different sets of dark data your company generates, you can train a “bot” to do the same at scale.
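The audit and labeling steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of any particular tool: the `DataAsset` record, the extension-to-category mapping, and the one-year staleness threshold are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath

# Hypothetical mapping from file extension to a data category label.
CATEGORY_BY_EXTENSION = {
    ".log": "log_file",
    ".eml": "email",
    ".csv": "tabular",
}


@dataclass
class DataAsset:
    """One entry in a data inventory (hypothetical structure)."""
    path: str
    days_since_last_access: int
    labels: dict = field(default_factory=dict)


def classify(asset, stale_after_days=365):
    """Attach metadata labels: a category and a dark-data candidate flag."""
    suffix = PurePosixPath(asset.path).suffix.lower()
    asset.labels["category"] = CATEGORY_BY_EXTENSION.get(suffix, "unclassified")
    # Assumption: data untouched for longer than the threshold may be dark.
    asset.labels["dark_candidate"] = asset.days_since_last_access > stale_after_days
    return asset


def audit(assets, stale_after_days=365):
    """Label every asset and count dark-data candidates per category."""
    summary = {}
    for asset in assets:
        classify(asset, stale_after_days)
        if asset.labels["dark_candidate"]:
            category = asset.labels["category"]
            summary[category] = summary.get(category, 0) + 1
    return summary


# Made-up inventory used only to exercise the functions.
inventory = [
    DataAsset("/var/log/app/2019-03.log", 900),
    DataAsset("/exports/customers.csv", 12),
    DataAsset("/mail/archive/thread-17.eml", 400),
    DataAsset("/shared/notes.bin", 700),
]
report = audit(inventory)
```

Running a routine like this on a schedule is one way to turn the periodic audit into an automated job; a real inventory would come from your storage systems rather than a hand-written list.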

Finding ways to extract value from your dark data 

  • Hyper-personalization – You can use additional data not deemed immediately relevant to customize your offering and deliver a better customer experience. For example, your server log will likely hold a trove of information on how visitors to your website behave. Analyzing their behavior can lead to a better understanding of communication approaches and the best ways to keep your existing customers engaged.

  • Process improvement – Internal data that is non-critical to your business can still be a relevant source of competitive edge. Here’s an example. A company’s internal email communication and calendar bookings for conference calls increase every time there is a data hand-off between two departments. The phrases “inaccurate data” and “missing data” consistently come up. This information supports the need for a streamlined data validation solution.
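As a rough illustration of the server log example, the sketch below counts successful page views and records which pages each visitor reached. It assumes the log uses the Common Log Format and treats the client IP address as a stand-in for a visitor; the sample lines, the `summarize` function, and the choice to keep only status 200 responses are assumptions made for the example.

```python
import re
from collections import Counter, defaultdict

# Matches the leading fields of a Common Log Format line:
# client IP, timestamp, request line, and status code.
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3})'
)


def summarize(lines):
    """Return (views per page, set of pages seen per visitor IP)."""
    page_views = Counter()
    pages_by_visitor = defaultdict(set)
    for line in lines:
        match = LOG_PATTERN.match(line)
        # Skip unparseable lines and anything that was not a successful hit.
        if match is None or match["status"] != "200":
            continue
        page_views[match["path"]] += 1
        pages_by_visitor[match["ip"]].add(match["path"])
    return page_views, pages_by_visitor


# Made-up sample lines using documentation-reserved IP addresses.
sample_log = [
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /pricing HTTP/1.1" 200 2326',
    '203.0.113.5 - - [10/Oct/2023:13:56:02 +0000] "GET /contact HTTP/1.1" 200 512',
    '198.51.100.7 - - [10/Oct/2023:14:01:11 +0000] "GET /pricing HTTP/1.1" 200 2326',
    '198.51.100.7 - - [10/Oct/2023:14:01:40 +0000] "GET /missing HTTP/1.1" 404 0',
    'not a log line',
]
page_views, pages_by_visitor = summarize(sample_log)
```

Summaries like these are a starting point for personalization; IP addresses can themselves be personal data, so the same security and legal considerations listed earlier apply to whatever you derive from the logs.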

From discovery to value extraction, dealing with dark data will require investments in technology, training, and potentially additional staff. There is no guarantee that your dark data initiative will immediately yield benefits that outweigh the cost of managing it. In the long run, however, not having a strategy to deal with dark data will almost certainly expose you to the risks listed above.

About BaseCap Analytics

BaseCap Analytics helps organizations clean the data they rely on so they can be confident in using their data to drive business decisions, demonstrate regulatory compliance, provide a seamless customer experience, etc.

Contact us and let our team of data experts help you extract value from your dark data sets, ensuring they are fit for use by validating the data on the Data Quality Manager platform.