Stop data pollution from turning your company’s data lake into a swamp

Kevin Campbell is the CEO of Syniti.

Every organization is a data organization. Whether you work for a tech company in Silicon Valley, an established manufacturer, a legacy financial services firm, or even a government agency, your organization is collecting, storing, and aiming to use more data than ever before.

We are in the middle of a data explosion: the total global volume of enterprise data is projected to double between 2020 and 2022. Many organizations are playing catch-up, lacking the knowledge and tools to manage the data they collect so that it is actually useful.

Many enterprises use data lakes to handle the data deluge, because data lakes give businesses the upper hand in terms of flexibility and integration. Yet many organizations end up with something more like a stagnant data swamp, full of murky data pollution, than a pristine data lake. What can you do to prevent the swamp and take full advantage of your data?

1. Pick the most important company data and get everyone to agree.

As a dad, I love all my kids the same. The same is not true for data. Not all of your company's data should be treated the same. Trust me on this.

Along with some key stakeholders, you need to decide which data is most important to your organization. Dumping all of your data into the data lake is the fastest way to create a swamp. So identify the data that drives the company and delivers wider business value, and designate it as the basis for your key performance indicators and success metrics.

Make sure you socialize these choices with key stakeholders so you have their buy-in. Here are some questions to ask:

What are our key metrics?
What metrics will we measure?
What are the formulas for calculating them?
What data needs to be pulled in to calculate these metrics?
What systems do we use to store our data?

A data charter that clearly states the above will help ground your overall data strategy.
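To make the idea concrete, here is a minimal sketch of how a data charter could be kept as structured data, so that unanswered questions are easy to spot. The class names, field names, and sample metrics are all hypothetical; this is one possible shape under those assumptions, not a standard format.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MetricDefinition:
    """One key metric and the charter answers recorded for it (hypothetical shape)."""
    name: str
    formula: str               # how the metric is calculated
    source_fields: List[str]   # data that must be pulled in to calculate it
    systems: List[str]         # systems where that data is stored

    def missing_answers(self) -> List[str]:
        """Return the charter questions that are still unanswered for this metric."""
        gaps = []
        if not self.formula.strip():
            gaps.append("formula")
        if not self.source_fields:
            gaps.append("source_fields")
        if not self.systems:
            gaps.append("systems")
        return gaps


@dataclass
class DataCharter:
    metrics: List[MetricDefinition] = field(default_factory=list)

    def gaps(self) -> Dict[str, List[str]]:
        """Map each incomplete metric to the answers it still lacks."""
        return {m.name: m.missing_answers() for m in self.metrics if m.missing_answers()}


# Illustrative charter with made-up metrics
charter = DataCharter([
    MetricDefinition(
        name="customer_retention_rate",
        formula="retained_customers / customers_at_start",
        source_fields=["customer_id", "contract_end_date"],
        systems=["crm"],
    ),
    MetricDefinition(name="on_time_delivery", formula="", source_fields=[], systems=["erp"]),
])
print(charter.gaps())
```

A check like this will not write the charter for you, but it shows stakeholders exactly which metrics still lack an agreed formula or data source.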

2. Know the data.

You picked the most important data, and you got an agreement from key people in your organization. What are the next steps? You need to know how data is created. Where is it located? How is it being maintained?

Take a look at where your company's important data comes from and how it is entered into your systems. The first priority is making sure the data you store is accurate. Include processes for merging records and getting rid of duplicates. Deduplication is one of the most important data practices, and it can save you a lot of money and resources.
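As a rough illustration of merging and deduplication, the sketch below treats two records as the same customer when their name and email match after normalizing case and whitespace. That matching rule, the field names, and the sample records are assumptions made for the example; real matching usually needs fuzzier logic and human review.

```python
from typing import Dict, List, Tuple


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(value.lower().split())


def match_key(record: Dict[str, str]) -> Tuple[str, str]:
    """Assumed matching rule: same normalized name and email means same customer."""
    return (normalize(record.get("name", "")), normalize(record.get("email", "")))


def merge_duplicates(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first record per key and fill its empty fields from later duplicates."""
    merged: Dict[Tuple[str, str], Dict[str, str]] = {}
    for record in records:
        key = match_key(record)
        if key not in merged:
            merged[key] = dict(record)
            continue
        survivor = merged[key]
        for field_name, value in record.items():
            if not survivor.get(field_name) and value:
                survivor[field_name] = value
    return list(merged.values())


# Made-up sample data
records = [
    {"name": "Acme Corp", "email": "ap@acme.example", "phone": ""},
    {"name": "  acme  corp ", "email": "AP@ACME.EXAMPLE", "phone": "555-0100"},
    {"name": "Globex", "email": "billing@globex.example", "phone": "555-0199"},
]
clean = merge_duplicates(records)
print(clean)
```

Here the two Acme records collapse into one, and the surviving record picks up the phone number that only the duplicate carried.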

It is going to take time and work, but don't overlook this step. It is essential for breaking down internal silos and creating valuable data. Proper maintenance and point-of-entry controls that keep duplicate records and bad addresses out are non-negotiable. Without them, your lake will become a swamp. Organizations make this mistake far too often.
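A point-of-entry control can be as simple as a check that runs before a record is saved. The sketch below is one hedged example: the required fields, the US-style ZIP code pattern, and the duplicate key are assumptions chosen for illustration, and a production system would rely on a proper address verification service.

```python
import re
from typing import Dict, List, Set

# Assumption for this sketch: addresses use US-style ZIP codes
ZIP_PATTERN = re.compile(r"^\d{5}(-\d{4})?$")
REQUIRED_FIELDS = ("name", "street", "city", "postal_code")


def record_key(record: Dict[str, str]) -> str:
    """Build a normalized key used to recognize a record that was already entered."""
    parts = (record.get(f, "") for f in ("name", "street", "postal_code"))
    return "|".join(" ".join(p.lower().split()) for p in parts)


def entry_problems(record: Dict[str, str], existing_keys: Set[str]) -> List[str]:
    """List the reasons a record should be rejected at the point of entry."""
    problems = []
    for field_name in REQUIRED_FIELDS:
        if not record.get(field_name, "").strip():
            problems.append(f"missing {field_name}")
    postal = record.get("postal_code", "").strip()
    if postal and not ZIP_PATTERN.match(postal):
        problems.append("malformed postal_code")
    if record_key(record) in existing_keys:
        problems.append("duplicate record")
    return problems


def accept(record: Dict[str, str], existing_keys: Set[str]) -> bool:
    """Store the record's key only if it passes every check."""
    if entry_problems(record, existing_keys):
        return False
    existing_keys.add(record_key(record))
    return True


# Made-up sample records
seen: Set[str] = set()
good = {"name": "Acme Corp", "street": "1 Main St", "city": "Springfield", "postal_code": "12345"}
bad = {"name": "Globex", "street": "", "city": "Shelbyville", "postal_code": "12A45"}
first_try = accept(good, seen)
second_try = accept(good, seen)
bad_problems = entry_problems(bad, seen)
print(first_try, second_try, bad_problems)
```

Rejecting a record at entry is far cheaper than finding and cleaning it up after it has spread through downstream systems.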

3. Establish governance for company data.

I know. Governance is often seen as slow and limiting. But it helps assign authority and control over data assets so that they can be used across the organization.

Customer success is one of the most important metrics. It goes all the way back to the first contact with the customer. Who created that customer record?

Without proper governance, we could end up with multiple numbers for the same customer, which would make it harder to make smart data-driven decisions and to deliver a great customer experience.
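A simple audit can surface this problem. The sketch below flags customers that appear under more than one customer number, assuming (purely for illustration) that a normalized email address identifies a customer; the field names and sample rows are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def conflicting_customer_ids(records: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Return each email that is linked to more than one customer number."""
    ids_by_email: Dict[str, Set[str]] = defaultdict(set)
    for record in records:
        email = record["email"].strip().lower()
        ids_by_email[email].add(record["customer_id"])
    return {email: sorted(ids) for email, ids in ids_by_email.items() if len(ids) > 1}


# Made-up sample rows from two systems
rows = [
    {"customer_id": "C-001", "email": "ap@acme.example"},
    {"customer_id": "C-417", "email": "AP@acme.example "},
    {"customer_id": "C-002", "email": "billing@globex.example"},
]
conflicts = conflicting_customer_ids(rows)
print(conflicts)
```

A report like this gives the owner of the customer record a concrete list to resolve, which is where assigned authority over the data pays off.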

Good governance should support compliance with any regulation that affects your organization.

The data charter referenced earlier is the cornerstone of your governance strategy. It is easy to lose sight of your initial goals as a data program continues. Make sure you refer back to it frequently, so that they remain top-of-mind. If your organization's requirements change, then adjust your data charter accordingly.

Transparency is crucial. This means clear communication between all stakeholders, allowing different departments to share their knowledge while driving transparency and accountability for maintaining data quality.

It is also important to be completely transparent about what your company is collecting. The most obvious reason is to avoid falling foul of regulators, as CaixaBank did when it received multi-million-euro fines for violating the transparency clause. It isn't worth it.

4. The more data, the better? Not necessarily.

More data isn't always better. Companies should be careful about collecting and storing data they have little use for. Storing and managing such data is an unnecessary expense, and it brings security, privacy, and compliance risks. You probably have more than enough data already, so focus on data that has value and utility.

Valuable data has the potential to foster new business growth, streamline operations, enhance customer relationships and boost agility. Who wouldn't want that?

Kevin Campbell has been driving innovation and growth for more than three decades. He is the CEO of Syniti.
