Volume, Velocity, and Variety.
These three components are often used to characterize the scale of data problems in a business or workflow, but I’ve never seen anyone explicitly put them into an Oil & Gas context.
The earliest reference I can find to this framework is a post by Doug Laney from 2001 – read it for a bit of a time warp through long-lost names of Internet Bubble startups. But while hardware keeps getting faster, we still struggle with the scale of our data. Even with two decades of progress, the Three V’s can clarify the root cause of the problem and help us understand where to find solutions.
Let’s get the punch line up front – I believe that Variety is the primary data-size problem that O&G companies are struggling with today, not Volume or Velocity. Now let’s dive into why!
I often hear clients and friends complain about the size of their datasets. “Man, we just have so much data, I don’t know what we’re going to do!” But then when we (at Velocity Insight) start working with their data, we almost always find that the biggest table we’re dealing with is maybe a few gigabytes (GB) for, say, a General Ledger in Accounting. Daily production allocation tables are even smaller – rarely do we see an allocation table that’s larger than 1 GB. While these file sizes will crush an Excel spreadsheet, moderately modern technology like SQL Server and BI tools like Spotfire and Power BI can deal with these pretty easily. A regular old laptop can usually bring these datasets into RAM, no sweat.
Outside of O&G, there are a lot of businesses whose key data tables run into the terabytes if not larger. Click tracking for websites, data logs for computer gaming, and video processing businesses have orders of magnitude more data that needs to be stored, scanned, indexed, and eventually analyzed to bring value. These datasets aren’t afterthoughts – what is Vimeo without rapid video storage and recall? Or Call of Duty without being able to manage a zillion simultaneous players?
There are certainly exceptions in O&G – namely, 3D seismic data gathering. Raw trace data from geophysical shoots gets very large indeed. But in almost all cases, the high-frequency, broad-spectrum time series data from the geophones gets reduced before anyone uses it. Once processed into a final 3D structure volume, the actual datasets used by non-geophysicists end up being quite reasonable in size.
Turning to Velocity: the vast majority of key datasets in O&G are collected over relatively long time periods. Production histories and mineral ownership in particular are datasets that run in decades – even in centuries in many cases! The incremental learning from going from annual to monthly to daily data frequency is often modest at best.
Furthermore, many key datasets in O&G are almost never gathered more than daily. Consider daily tank level gauging, or the well chronologies of drilling and completion work. Even in a very active drilling program, you’re lucky to get a new well log more than once a week.
The big exception in this area for O&G is the world of SCADA and IoT. Devices are increasingly capable of gathering sub-second data frequencies, but when is this ever done? Most SCADA systems I’ve seen gather one-minute data at best, and usually set deadbands so that the true data frequency is much slower than that. The value proposition of higher-frequency data is marginal in most cases.
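To make the deadband point concrete, here’s a minimal sketch (not any vendor’s actual SCADA logic – the threshold and readings are made up) of how a deadband suppresses polls that don’t move enough, so the recorded data frequency ends up far below the polling rate:

```python
def deadband_filter(readings, deadband):
    """Keep only readings that move more than `deadband` away from the
    last recorded value; everything else is suppressed, so the effective
    data frequency is far lower than the polling rate."""
    recorded = []
    last = None
    for timestamp, value in readings:
        if last is None or abs(value - last) > deadband:
            recorded.append((timestamp, value))
            last = value
    return recorded

# One-minute tubing-pressure polls; only meaningful moves survive.
polls = [(0, 500.0), (1, 500.2), (2, 500.1), (3, 503.5), (4, 503.6)]
print(deadband_filter(polls, deadband=2.0))  # -> [(0, 500.0), (3, 503.5)]
```

Five polls come in, two points get stored – which is exactly why historian databases stay much smaller than the raw polling rate would suggest.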
Now we’re talking. Variety is a serious issue for every E&P we know of. As we discussed in a previous piece, most E&P companies have a dozen or more key functions spread across accounting, finance, midstream, land/legal, petroleum engineering, geoscience, etc., etc., etc. These functions adopt discipline-specific software tools with data models and workflows that are all foreign to each other.
Consider the source of much pain and suffering at E&P companies – the working interest! In onshore US/Canadian assets with private land ownership, every well can have its own working interest (WI) and net revenue interest (NRI). And even worse, they change over time! Unit participation agreements, force pooling, and the continual buying/selling of assets result in interest factor decks that change often. Precision is crucial – most companies calculate these interests to six decimal places, if not more.
Imagine an operator with 1,000 wells in a U.S. basin with a lot of deal activity and heavily subdivided interests – perhaps the DJ Basin in Colorado, the Williston Basin in North Dakota, or the Permian. Each well can have dozens, if not hundreds, of separate owners, and is probably getting revised interest factors once a year if not more often. When new wells are proposed, the Land department often has to make assumptions to calculate an “anticipated” working interest based on what they believe non-operated partners will choose to do: elect to participate, agree to an acreage trade, go non-consent, or be force pooled. Building a capital budget is an exercise in reading tea leaves.
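As a toy illustration of the “anticipated” WI calculation (a deliberately simplified sketch with made-up partners – real JOAs layer in penalties, acreage trades, and reversions that this ignores), assume partners who go non-consent have their interest absorbed pro-rata by the consenting parties:

```python
# Hypothetical pre-drill ownership and assumed partner elections.
partners = {"OperatorCo": 0.40, "PartnerA": 0.35, "PartnerB": 0.25}
elections = {"OperatorCo": "participate", "PartnerA": "participate",
             "PartnerB": "non-consent"}

def anticipated_wi(partners, elections):
    """Redistribute non-consenting interests pro-rata among the
    consenting parties, rounded to six decimal places."""
    consenting = {p: wi for p, wi in partners.items()
                  if elections[p] == "participate"}
    consenting_total = sum(consenting.values())
    return {p: round(wi / consenting_total, 6)
            for p, wi in consenting.items()}

print(anticipated_wi(partners, elections))
# -> {'OperatorCo': 0.533333, 'PartnerA': 0.466667}
```

Change one assumed election and the whole deck shifts – which is exactly why budgeting off anticipated interests feels like reading tea leaves.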
But those “anticipated” interests aren’t useful for disciplines like Accounting or Reserves. They need interest factors that they can bank on, so they require “contractual” interest factors, typically based on a title opinion written by an attorney. Both contractual and anticipated interests are dynamic, as owners die, estates are split, assets are bought and sold and traded. If owners went non-consent or were force pooled, they’ll often have reversionary interests to factor in as well.
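Because interests change over time, systems typically store them as effective-dated decks rather than single numbers. Here’s a minimal sketch (the well, dates, and factors are invented for illustration) of looking up the WI/NRI in effect on a given date:

```python
from bisect import bisect_right
from datetime import date

# Hypothetical effective-dated interest deck for one well: each row
# applies from its effective date until superseded by the next row.
deck = [
    (date(2015, 1, 1), 0.500000, 0.400000),  # initial title opinion
    (date(2019, 7, 1), 0.450000, 0.360000),  # partial divestiture
    (date(2022, 3, 1), 0.550000, 0.440000),  # non-consent reversion
]

def interests_as_of(deck, as_of):
    """Return the (WI, NRI) row in effect on `as_of`, or None if the
    date precedes the first effective date."""
    dates = [row[0] for row in deck]
    i = bisect_right(dates, as_of)
    return deck[i - 1][1:] if i else None

print(interests_as_of(deck, date(2020, 6, 15)))  # -> (0.45, 0.36)
```

Multiply this structure across a thousand wells, hundreds of owners, and several systems, and the bookkeeping burden becomes obvious.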
The final complication for interest factors – many, many software systems record WI and NRI for calculating net production, net expenses, future net capital and valuation. Synchronizing those numbers is well-nigh impossible, leaving individual contributors constantly checking and rechecking their assumptions, often making innocent errors. I’ve seen more than one E&P management team get knocked sideways by WI/NRI data quality problems.
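A basic cross-system reconciliation check – the kind of thing a Data Quality Dashboard automates – can at least surface the mismatches before they bite. A minimal sketch, with hypothetical well IDs and decks:

```python
# Hypothetical WI values for the same wells pulled from two systems
# (say, the accounting system and the reserves database).
accounting = {"WELL-001": 0.450000, "WELL-002": 0.812500, "WELL-003": 0.250000}
reserves   = {"WELL-001": 0.450000, "WELL-002": 0.812600, "WELL-003": 0.250000}

def wi_mismatches(a, b, tolerance=1e-6):
    """Flag wells whose working interests differ beyond six decimal
    places between the two systems."""
    return sorted(
        well for well in a.keys() & b.keys()
        if abs(a[well] - b[well]) > tolerance
    )

print(wi_mismatches(accounting, reserves))  # -> ['WELL-002']
```

Catching a fourth-decimal disagreement on one well is trivial; catching it across every well, every system, and every revision cycle is the real Variety problem.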
So how to deal with Variety? Many companies adopt Master Data Management (MDM) solutions, although anecdotally, we’ve seen these fall out of favor at most operators due to high costs and time requirements. There are other, more fit-for-purpose solutions that we see a lot, like Data Quality Dashboards and what we call “MDM Lite”. Perhaps a topic for a future post….
That was a long and winding tale, but hopefully we’ve convinced you that if you’re interested in the “Big Data” problems that most E&Ps are struggling with, you should focus on addressing Variety first and foremost. There’s gold in them thar hills – go find it. Volume and Velocity can present problems too, but they aren’t nearly as common.