Lakes, Warehouses, Marts, and Kits

August 23, 2022
Zack Warren

We just completed a project for a large operator to help them understand how their peers use data, and ended up spending a lot of time thinking about Data Lakes, Data Warehouses, and Data Marts.

Whether those terms are old hat or completely new to you, stick with me – knowing which to use in which situation is really useful. Plus, we’ll introduce what I think E&P companies should really be shooting for – Data Kits.

What the heck do all these terms mean?!

In (very) general terms, these are listed in order of increasing levels of sophistication and curation. Call it Digital Maturity if you want to be fancy.

A Data Lake is the raw set of source datasets in a company, left in their original, difficult-to-use state. Think of it as a hole in the ground, with piles of data dumped in without organization (thus the Lake term). Data Lakes can be cheap to build but difficult to use, which is why the messy ones are derogatorily known as Data Swamps!

A Data Warehouse is a well-organized central store of data, often with table relationships defined and integrated across systems. Think of this one like an Amazon distribution center, with products stacked to the ceiling and forklift robots whizzing around to gather up your late-night impulse buys. Data Warehouses are often expensive to build and maintain, but incredibly powerful for enterprise use cases.

A Data Mart is a curated group of data products to solve general business problems. Think of it like your local King Soopers (or HEB, for you Texans), with good signage, freshly washed produce, and easy-to-reach shelves. Data Marts require someone to do that curation and maintenance though, so they’re often very expensive (in terms of soft costs) to maintain. But once they’re running, they make it easy for people across a company to access useful data.

We see E&P companies using blends of these three concepts in how they deliver data to consumers. Larger operators are more likely to be building robust Data Warehouses and Data Marts, maintained by central Data Analytics or IT teams. Smaller operators with little-to-no IT resources are often stuck in Data Swamp land, with spreadsheets on shared network drives or emailed from person to person.

Data analytics nerds (like me, apparently) can spill gallons of ink on the pros and cons of these different approaches, and we at VI have our own opinions on which generates the best ROI. We think the thing that most business users want is even a bit more curated than a Data Mart, so let’s add to the word salad!

A Data Kit is a fully engineered set of data tables to be used for a specific set of business workflows. I think I may have invented the term, although I see a lot of people who are already building these to solve problems today. Think of this as a meal kit from Blue Apron or Hello Fresh, where the ingredients are already portioned, chopped, and delivered to your front porch. These are similar in cost to a Data Mart, but even more curated and thus easier for a report author to use.

One common example we see at E&P companies is a Finance/Accounting Data Kit. This would typically consist of a well list, chart of accounts, general ledger, and related tables. It has probably already gone through a lot of cleaning and data modeling, so that primary/foreign key consistency, refresh frequency, and fault tolerance have already been handled. It might be delivered using technology like a Power BI Dataflow or Spotfire Information Link, or simply be a set of “source of truth” tables on a SQL Server.
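To make that a little more concrete, here is a minimal sketch of what a tiny Finance/Accounting Data Kit could look like, using Python's built-in sqlite3 module as a stand-in for whatever database you actually run. Every table name, column name, and number below is made up for illustration; a real kit would be shaped by your own accounting system.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions,
# not taken from any particular accounting system.
SCHEMA = """
CREATE TABLE wells (
    well_id   TEXT PRIMARY KEY,
    well_name TEXT NOT NULL
);
CREATE TABLE chart_of_accounts (
    account_id   TEXT PRIMARY KEY,
    account_name TEXT NOT NULL
);
CREATE TABLE general_ledger (
    entry_id   INTEGER PRIMARY KEY,
    well_id    TEXT NOT NULL REFERENCES wells(well_id),
    account_id TEXT NOT NULL REFERENCES chart_of_accounts(account_id),
    amount     REAL NOT NULL
);
"""


def build_kit():
    """Create an in-memory kit with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces keys when asked
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO wells VALUES (?, ?)",
        [("W1", "Well A"), ("W2", "Well B")],
    )
    conn.executemany(
        "INSERT INTO chart_of_accounts VALUES (?, ?)",
        [("4000", "Oil Revenue"), ("6000", "Lease Operating Expense")],
    )
    conn.executemany(
        "INSERT INTO general_ledger (well_id, account_id, amount) VALUES (?, ?, ?)",
        [("W1", "4000", 1000.0), ("W1", "6000", -250.0), ("W2", "4000", 500.0)],
    )
    conn.commit()
    return conn


def orphaned_ledger_rows(conn):
    """Count ledger rows that point at a missing well or account."""
    query = """
        SELECT COUNT(*)
        FROM general_ledger gl
        LEFT JOIN wells w ON w.well_id = gl.well_id
        LEFT JOIN chart_of_accounts a ON a.account_id = gl.account_id
        WHERE w.well_id IS NULL OR a.account_id IS NULL
    """
    return conn.execute(query).fetchone()[0]


def rejects_unknown_well(conn):
    """Return True if the database refuses a ledger row for a well not in the well list."""
    try:
        conn.execute(
            "INSERT INTO general_ledger (well_id, account_id, amount) "
            "VALUES ('W999', '4000', 1.0)"
        )
    except sqlite3.IntegrityError:
        conn.rollback()
        return True
    conn.rollback()  # undo the insert if it was (wrongly) accepted
    return False


def net_by_well(conn):
    """One pull a report author might make: net amount per well."""
    query = """
        SELECT w.well_name, SUM(gl.amount)
        FROM general_ledger gl
        JOIN wells w ON w.well_id = gl.well_id
        GROUP BY w.well_name
        ORDER BY w.well_name
    """
    return conn.execute(query).fetchall()
```

With the made-up rows above, `net_by_well(build_kit())` returns `[('Well A', 750.0), ('Well B', 500.0)]`. The point of the sketch is that the key checks live with the kit itself, so a report author never has to think about them.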

As a data consumer, a Data Kit should be a One Stop Shop – no need to go sift through hundreds of tables in the Warehouse or Mart, just grab the whole batch of relevant data in one pull. A Reservoir Engineer, Finance Analyst, Landman, or JIB Accountant can all leverage the exact same Data Kit to do their respective analyses. By centralizing the data management and governance work, the company gets better data consistency at lower effort. Sounds nice, right?

Do these terms make sense to you? How does your company deliver data today, and how do you want to be delivering data 5 years from now? We hope this framework is useful. Give us a call if you ever want to chat!
