Data is crucial to running a business; you need good, reliable data to make well-informed decisions. There are a number of ways you can collect this data, for example cross-channel tracking, and you can even do it without gating off your content! But collecting data is only half the story: how you store that data has a significant impact on how you can then use it.
When it comes to data storage, there are two predominant concepts: data lakes and data warehouses. Both bring data together in one place, but they serve different purposes, and knowing which one suits your business will make the data much easier to work with when it's time to analyse it.
What is a data lake?
The easiest way to understand a data lake is to think of it like that one drawer in your kitchen where you throw everything. There is no structure, and whatever you might tell yourself, it doesn’t have any organisation – everything is thrown in there together.
This approach to data storage allows you more flexibility, and lets you store all data types: structured, semi-structured and unstructured. This in turn means you can pull everything in, from social media data, text, images, server logs, and more. However, much like the kitchen drawer, this approach to storage can make things harder to find.
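To make the kitchen-drawer analogy concrete, here is a minimal sketch of a data lake in Python, assuming a temporary local folder stands in for the cloud object storage many real lakes use. The `store_raw` and `find_files` helpers, the folder layout and the sample records are all hypothetical. The point is that files of any type go in unchanged, and structure is only applied when someone reads them.

```python
import json
import tempfile
from pathlib import Path

def store_raw(lake_root: Path, source: str, name: str, payload: bytes) -> Path:
    """Write the payload exactly as received; no schema or cleaning is applied."""
    target = lake_root / source / name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)
    return target

def find_files(lake_root: Path, pattern: str) -> list[Path]:
    """With no catalogue, finding data means scanning everything in the lake."""
    return sorted(p for p in lake_root.rglob(pattern) if p.is_file())

# A temporary local folder stands in for the lake's storage in this sketch.
lake = Path(tempfile.mkdtemp())
store_raw(lake, "social", "post_001.json",
          json.dumps({"user": "a", "text": "Loving the new range!"}).encode())
store_raw(lake, "web", "access.log", b'203.0.113.7 - - "GET /pricing HTTP/1.1" 200\n')
store_raw(lake, "images", "banner.png", b"\x89PNG\r\n\x1a\n")  # PNG signature bytes only, as a stand-in

print([p.relative_to(lake).as_posix() for p in find_files(lake, "*")])

# Structure is applied only when the data is read ("schema on read"):
# the reader has to know that this particular file happens to be JSON.
post = json.loads((lake / "social" / "post_001.json").read_text())
print(post["text"])
```

Social posts, server logs and images all sit side by side with no shared format, which is exactly what makes a lake flexible and also what makes it harder to find things later.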
What is a data warehouse?
If the data lake is like your miscellaneous kitchen drawer, think of a data warehouse as a bookshelf, where the items are curated and organised logically. This means the data in a data warehouse is inherently structured, and therefore easy to understand and use.
Data is only loaded into the data warehouse once it has been cleaned and transformed, to ensure that all the data gathered is in a consistent format. Examples of data might include transactional data, customer details, sales figures or other operational data. While this data is easier to use and understand, the warehouse can require a significant upfront investment to create, typically only gives you historic insights, and requires ongoing maintenance.
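The clean-then-load step described above can be sketched with Python's built-in `sqlite3` module standing in for a warehouse. This is a minimal illustration rather than a production pipeline: the `sales` table, the `transform` function and the sample records are hypothetical. Raw records are cleaned and converted to proper types first, and only then loaded into a table whose structure was defined in advance.

```python
import sqlite3

# Hypothetical raw sales records, as they might arrive in a till system export.
raw_sales = [
    {"order_id": "1001", "customer": " alice smith ", "amount": "19.99", "date": "2024-03-01"},
    {"order_id": "1002", "customer": "BOB JONES", "amount": "5.00", "date": "2024-03-02"},
    {"order_id": "1003", "customer": "Carol White", "amount": "12.50", "date": "2024-03-02"},
]

def transform(row: dict) -> tuple:
    """Clean one raw record: trim and title-case names, convert text fields to proper types."""
    return (
        int(row["order_id"]),
        row["customer"].strip().title(),
        float(row["amount"]),
        row["date"],
    )

conn = sqlite3.connect(":memory:")
# The warehouse table's structure is fixed before any data is loaded ("schema on write").
conn.execute(
    "CREATE TABLE sales (order_id INTEGER PRIMARY KEY, customer TEXT NOT NULL,"
    " amount REAL NOT NULL, order_date TEXT NOT NULL)"
)
# Only transformed rows are loaded.
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [transform(r) for r in raw_sales])

# With clean, structured data, a question about historic sales is a single query.
daily_totals = conn.execute(
    "SELECT order_date, SUM(amount) FROM sales GROUP BY order_date ORDER BY order_date"
).fetchall()
print(daily_totals)
```

Real warehouses run this extract, transform, load (ETL) pattern at far larger scale, but the order of operations is the same: the effort goes in before loading, which is why the data is ready to query afterwards.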
| | Data lake | Data warehouse |
|---|---|---|
| Data stored | All types of data, stored in raw format, including structured, semi-structured and unstructured data | Structured and semi-structured data that has been cleaned and organised before storage |
| Purpose | Typically more useful for big data and machine learning projects, where AI can parse large amounts of data | Supports business decision making and analysis through easy access to historical trends |
| Most typical users | Data scientists and analysts | Marketing managers, business executives and other key decision makers |
| Advantages | More scalable; quicker to implement; more flexible; can give you more contemporaneous data | Data has been processed before being loaded, so can be used straight away; easy to understand |
| Disadvantages | Requires transformation before the data is usable; lack of structure makes extracting useful information time-consuming | Time-consuming and expensive to set up and maintain; only gives you historic data trends; inflexible |
How to collect meaningful data
Data, when used right, can unlock significant value for a business. But collecting and storing data can be time-consuming and expensive, and it is subject to laws and regulations such as GDPR. All of this means the data you collect needs to be relevant and to add value to your business. The adage 'work smarter, not harder' applies here: focusing your efforts on collecting the right data is better than a firehose approach that gathers everything but yields little meaningful insight.
Everybody’s needs are different, which means there is no one-size-fits-all approach to data collection. Instead, there are some general principles that you should consider, and then apply to your own circumstances.
What do you need?
First, you want to establish what data you actually need. What are you hoping this data is going to tell you, how will it be used, and what are going to be the right channels to acquire that data? All of these questions have to be fully answered before you can begin the collection process.
How will you store it?
Once you know what data you need and how you intend to use it, you need to choose how you'll store it. If you intend to use the data to make informed business decisions, you will most likely want a data warehouse, which makes it easy to discover insights and trends.
Maintain data quality
There is no point collecting data if it isn't accurate, consistent, relevant and complete. You need processes in place to validate the integrity of your data and make sure it is free of errors. This can be done with automated tools or with a manual sense-check; either approach should help you spot missing values and duplicates.
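An automated check of the kind mentioned above might look like the following sketch. The `REQUIRED_FIELDS` tuple, the `quality_report` function and the sample customer records are hypothetical. The idea is to count problems before the data is loaded so someone can decide how to fix them, rather than silently discarding rows.

```python
from collections import Counter

REQUIRED_FIELDS = ("customer_id", "email", "signup_date")  # hypothetical schema

def quality_report(records: list[dict]) -> dict:
    """Count records with missing required fields and records that duplicate an earlier one."""
    missing = Counter()
    for rec in records:
        for field in REQUIRED_FIELDS:
            # Treat an absent key and an empty string the same way.
            if rec.get(field) in (None, ""):
                missing[field] += 1
    # Two records count as duplicates when all of their required fields match.
    keys = Counter(tuple(rec.get(f) for f in REQUIRED_FIELDS) for rec in records)
    duplicates = sum(count - 1 for count in keys.values() if count > 1)
    return {"rows": len(records), "missing": dict(missing), "duplicates": duplicates}

customers = [
    {"customer_id": 1, "email": "a@example.com", "signup_date": "2024-01-05"},
    {"customer_id": 2, "email": "", "signup_date": "2024-01-06"},
    {"customer_id": 1, "email": "a@example.com", "signup_date": "2024-01-05"},
    {"customer_id": 3, "email": "c@example.com"},
]
print(quality_report(customers))
```

A report like this can run on every new batch, with a manual sense-check reserved for batches where the counts look unusual.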
If you are using a data warehouse, you will also need to make sure data is formatted correctly, which means it must be consistent regardless of where it came from. Transforming some datasets into a standardised format may take time, but it makes the data far more useful once it is stored.
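Dates are a common example of data that arrives differently from each source. This sketch assumes you know each source's date format; the source names and formats in `SOURCE_DATE_FORMATS` and the `standardise_date` function are hypothetical. Every date is converted to ISO 8601 (YYYY-MM-DD) before loading.

```python
from datetime import datetime

# Hypothetical per-source date formats; in practice, document these for each data feed.
SOURCE_DATE_FORMATS = {
    "web_shop": "%Y-%m-%d",      # e.g. 2024-03-01
    "crm_export": "%d/%m/%Y",    # e.g. 01/03/2024 (UK day-first order)
    "ad_platform": "%b %d, %Y",  # e.g. Mar 01, 2024
}

def standardise_date(value: str, source: str) -> str:
    """Parse a date using its source's known format and return it as ISO 8601."""
    fmt = SOURCE_DATE_FORMATS[source]
    return datetime.strptime(value.strip(), fmt).date().isoformat()

incoming = [
    ("web_shop", "2024-03-01"),
    ("crm_export", "01/03/2024"),
    ("ad_platform", "Mar 01, 2024"),
]
print([standardise_date(value, source) for source, value in incoming])
```

Recording the format for each source, rather than guessing, matters because a value such as 01/03/2024 means 1 March in the UK but 3 January in the US.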
Data privacy and security
Security is vitally important, and failure to adequately protect data can quickly become a legal issue. Any identifying or personal data needs to be stored in compliance with all data protection regulations, and unless you have a legitimate interest in being able to identify customers, efforts should be made to anonymise the data.
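One common way to reduce how identifiable data is, is to replace direct identifiers with keyed hashes. Strictly, this is pseudonymisation rather than anonymisation: anyone holding the key can re-link a token to a known email address, so under GDPR the output is still treated as personal data. The sketch below uses Python's standard `hmac` module; the field names, the `pseudonymise` function and the choice to keep only the postcode's outward code are assumptions for illustration.

```python
import hashlib
import hmac
import secrets

# Generated here only for the demo. In practice, load a persistent key from a
# secrets store kept separate from the data, so tokens stay stable between runs.
PSEUDONYM_KEY = secrets.token_bytes(32)

def pseudonymise(record: dict) -> dict:
    """Replace the email with a keyed hash and drop fields that directly identify a person."""
    normalised_email = record["email"].strip().lower().encode()
    token = hmac.new(PSEUDONYM_KEY, normalised_email, hashlib.sha256).hexdigest()
    return {
        # The same email always yields the same token, so repeat purchases can still be linked.
        "customer_token": token,
        # Keep only the outward code of a UK postcode, not the full address.
        "postcode_area": record["postcode"].split()[0],
        "total_spend": record["total_spend"],
    }

row = {"name": "Alice Smith", "email": "Alice@Example.com", "postcode": "SW1A 1AA", "total_spend": 120.0}
safe = pseudonymise(row)
print(sorted(safe))  # the name and email fields are no longer present
```

Because the token is stable per customer, you can still analyse behaviour over time without storing names or email addresses alongside the analytical data.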
Create a data culture
Make data a core tenet of your team culture. Spend time training your employees to understand data, and emphasise the value of good data collection. Support them in identifying, transforming and using data to inform decisions. This will help elevate your data practices and move the responsibility from one person or team to everyone in the organisation.