
Sunshine is the Best Disinfectant

July 22, 2024
Zack Warren

Bad data usually doesn’t lead directly to bad decisions; more often, it leads to a loss of speed.

It makes us double-check our work, doubt our processes, and question each other instead of just getting the job done.  Slow, slow, slow.

To me, that’s a common thread across E&Ps: the need for speed without sacrificing quality.

a still image from the movie Top Gun: two pilots walk toward the camera, looking at one another and talking

RIP, Goose.


If you want to get serious about benefiting from your data, start with some sunshine.  When people can easily see your data, there’s a MUCH better chance it gets cleaned.  Whether you’re trying to use AI, LLMs, or even just old-fashioned reports, without clean data, you’re probably wasting your time.  But how do you start moving in that direction efficiently?  I’d argue the best start is simply showing people what you have: in other words, shine some light on it.

Why Data Quality Matters

A good meal requires good ingredients.  A good photograph requires good lighting.  And good data analytics requires good data.

Most of the time, when I see a data analytics project fail, bad data quality is a major factor.  Seeing bad data on a report shakes people’s confidence – it makes us doubt the validity of a conclusion.  That leads us to double-check our work, rethink our approach, and even point fingers at our co-workers.  If you’re worried that an algorithm’s results may just be a consequence of noise, you freeze up instead of making decisions and moving forward.

I’m not sure exactly where we are on the Hype Cycle, but LLMs and AI tools are certainly a big topic of conversation for our clients and friends.  Alongside the excitement, it’s easy to see how poor training data quality could undermine confidence in those tools and cause a lot of wasted effort.  Birds are in fact real, but if you generate enough content claiming otherwise and feed it into a model, an LLM will at least consider the possibility that birds are drones sent by our evil overlords.

Shine Some Light On It

The good news is that most professionals know bad data when they see it.  Inconsistent formats, missing data, wildly high or low values – after a few years on the job, we spot these quickly and intuitively.  IF we can see them.

However, with even modestly sized databases, it’s unreasonable to lay eyes on every data point.  We need tools to help us surface the problems, and data profiling tools are great for this (one of the many reasons I love Power Query).  Over time, we can usually define a handful of data quality rules that are worth enforcing.  Don’t clean everything – just focus on the things that actually impact your business process!  For more on that, see a previous post on why you should Clean Your Kitchens, Not Your Basements.
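For readers who work in Python rather than Power Query, the basic idea of column profiling can be sketched in a few lines of plain Python. This is a minimal illustration, not a replacement for a real profiling tool: the column names and sample values are made up, and the "format pattern" is just an assumed convention (digits become 9, letters become A) that makes inconsistent formats stand out.

```python
from collections import Counter


def format_pattern(value):
    """Reduce a value to its shape: digits become '9', letters become 'A'."""
    return "".join("9" if c.isdigit() else "A" if c.isalpha() else c for c in str(value))


def profile_column(values):
    """Summarize one column: row count, missing values, format patterns, numeric range."""
    present = [v for v in values if v is not None and str(v).strip() != ""]
    numbers = []
    for v in present:
        try:
            numbers.append(float(v))
        except (TypeError, ValueError):
            pass  # not numeric; still counted in the format patterns
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "patterns": Counter(format_pattern(v) for v in present),
        "min": min(numbers) if numbers else None,
        "max": max(numbers) if numbers else None,
    }


def profile(rows):
    """Profile every column that appears in a list of dict records."""
    columns = sorted({key for row in rows for key in row})
    return {col: profile_column([row.get(col) for row in rows]) for col in columns}


# Made-up production records with the usual problems baked in.
rows = [
    {"well_id": "05-123-45678", "oil_bbl": "120"},
    {"well_id": "0512345679", "oil_bbl": "95"},
    {"well_id": "05-123-45680", "oil_bbl": ""},
    {"well_id": None, "oil_bbl": "98000"},
]

report = profile(rows)
for column, stats in report.items():
    print(column, stats)
```

A Well ID written two different ways, a blank volume, or one value hundreds of times larger than its neighbors is hard to spot scrolling through rows, but it jumps out of a summary like this.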

A Cookbook for Cleanliness

My favorite path to success in data quality is a simple one.  It’s not hard or fancy, but it works.

Show examples of bad data to people who can fix them.  Reward improvement.  Repeat.

In more detail, we usually do something like this:

  1. Build a report that would be useful for a decision maker – Actual vs. Budget, Profit and Loss (aka LOS), Downtime, whatever.
  2. Look for data quality problems that prevent people from trusting the report – inconsistent Well ID numbers, improperly coded expenses, etc.
  3. Build a “Data Quality” report that makes data quality problems visible and get it into people’s hands – here’s an example for CAPEX data.
  4. Applaud efforts to fix those problems.
  5. Do it again!
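Step 3 is where rules meet people. As a rough sketch of what sits behind such a report, the Python below checks each record against a couple of rules and groups the failures by the person responsible, so everyone sees their own rows to fix. Everything here is hypothetical: the Well ID format, the account codes, the `owner` field, and the sample line items are assumptions for illustration, not a standard.

```python
import re
from collections import defaultdict

# Hypothetical rules for CAPEX line items; replace with your own standards.
WELL_ID_PATTERN = re.compile(r"\d{2}-\d{3}-\d{5}")  # assumed format, e.g. 05-123-45678
VALID_ACCOUNTS = {"8100", "8200", "8300"}           # hypothetical capital account codes

RULES = {
    "well_id_format": lambda r: WELL_ID_PATTERN.fullmatch(r.get("well_id") or "") is not None,
    "valid_account": lambda r: r.get("account") in VALID_ACCOUNTS,
}


def data_quality_report(records, owner_field="owner"):
    """Group rule failures by the person responsible so each owner sees their own rows."""
    report = {}
    for record in records:
        owner = record.get(owner_field) or "unassigned"
        entry = report.setdefault(owner, {"records": 0, "clean": 0, "failures": defaultdict(list)})
        entry["records"] += 1
        failed = [name for name, passes in RULES.items() if not passes(record)]
        if not failed:
            entry["clean"] += 1
        for name in failed:
            entry["failures"][name].append(record)  # keep the row itself as an example to fix
    return report


# Made-up line items: one bad Well ID, one miscoded expense.
line_items = [
    {"owner": "Ana", "well_id": "05-123-45678", "account": "8100", "amount": 250000},
    {"owner": "Ana", "well_id": "0512345679", "account": "8200", "amount": 410000},
    {"owner": "Ben", "well_id": "05-123-45680", "account": "9999", "amount": 75000},
    {"owner": "Ben", "well_id": "05-123-45681", "account": "8300", "amount": 120000},
]

for owner, entry in data_quality_report(line_items).items():
    summary = {rule: len(rows) for rule, rows in entry["failures"].items()}
    print(owner, f"{entry['clean']}/{entry['records']} clean", summary)
```

Keeping the offending rows, not just the counts, matters: a count tells someone there is a problem, while the row tells them exactly what to fix.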

Cleaning data doesn’t happen overnight, and it’s never really finished – that’s why this process tends to work.  People see progress and feel good that their decisions are easier and faster, so they’re willing to invest more time in it.  This is a place where a bit of gamification can work: make a scorecard like you might for Lost Time Incidents or Near Misses, and at your next team meeting, give a shout-out to someone who’s made a difference.  It’s worth it, because the business value of getting this stuff right is very, very real.
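A scorecard can be as simple as comparing each person’s share of clean records between two data quality runs. The sketch below assumes you already have, per owner, a count of clean records and total records from whatever checks you run; the owner names, monthly snapshots, and the choice to treat an empty set as 100% clean are all hypothetical.

```python
def pass_rate(clean, total):
    """Share of records that passed every rule; no records counts as fully clean."""
    return clean / total if total else 1.0


def scorecard(before, after):
    """Rank owners by change in pass rate between two runs.

    `before` and `after` map owner -> (clean_records, total_records).
    Owners who appear only in `after` get a change of 0.0.
    """
    rows = []
    for owner, (clean, total) in after.items():
        current = pass_rate(clean, total)
        previous = pass_rate(*before[owner]) if owner in before else None
        change = current - previous if previous is not None else 0.0
        rows.append((owner, previous, current, change))
    # Biggest improvement first: candidates for a shout-out at the next team meeting.
    return sorted(rows, key=lambda row: row[3], reverse=True)


# Hypothetical monthly snapshots: owner -> (clean records, total records).
last_month = {"Ana": (40, 50), "Ben": (45, 50)}
this_month = {"Ana": (48, 50), "Ben": (46, 50), "Cal": (20, 25)}

for owner, previous, current, change in scorecard(last_month, this_month):
    start = f"{previous:.0%}" if previous is not None else "new"
    print(f"{owner}: {start} -> {current:.0%} ({change:+.0%})")
```

Ranking by improvement rather than by absolute pass rate rewards the people doing the work, not just whoever inherited the cleanest data.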

Work in Parallel

We often hear business users tell us “let’s wait until we clean up our data before we start that project”, but the truth is, the data is never fully clean.  That’s okay!  It doesn’t have to be perfect to be useful.

In fact, we usually see that the cleaning work doesn’t really get going until there’s clear evidence that fixing it will make the business better.  No one cares about unseen, unsmelled garbage sitting in a closed dumpster.  But scatter it all over the road, and people care.  Just ask New York City.

a pile of garbage on a city street

Unseen, your data quality trash will never get picked up.  It’s painful, but you have to look at it to justify fixing the problem.

The title of this post paraphrases Supreme Court Justice Louis Brandeis, who wrote more than 100 years ago that “sunlight is said to be the best of disinfectants,” and the wisdom is still worth our time.  We aren’t fighting corruption and crime like he was, but his instincts hold true for us.

As a guy who’s lived in Colorado for a while, I’ve gotten addicted to sunny days.  As a reservoir engineer and data nerd, I’m equally addicted to clean, well-formed datasets.  If you’re struggling with data quality problems, give us a call.  It might be easier to clean them up than you think!
