I was talking to a client the other day about how modern data tools “smash and blend” data together, and ended up using the analogy of a trash compactor.
It’s a useful analogy when thinking about data management and data analytics, so forgive me as I stretch this metaphor to its (absurd) limit in the next few minutes.
Trash compactors get a bad rap, because they’re the ultimate in Garbage In, Garbage Out. Take loose trash that fills a lot of space, smash it into a smaller space, and it’s still trash. Just in a slightly better format. Whether it’s WALL-E building towers of trash or Luke, Leia, and friends in the garbage chute, trash compactors aren’t real appealing.
We talked about this in The Data Pyramid a while back, but if you use modern data analytics tools on Garbage Data, you’re going to end up with Garbage Conclusions. Cleaning, organizing, and classifying data is a necessary step if you’re going to do anything useful like predict well performance, optimize production, or streamline business processes.
To really make a difference with data, we don’t need trash compactors. We need forges.
Forges make amazing things – swords, drill bits, scalpels, and pistons! But to get these amazing products, forges require purity. The feedstock to a forge is constantly measured for quality and consistency, using expert judgement and expensive technologies like x-ray fluorescence. Inaccuracies lead to failure – unwanted brittleness or ductility can lead to waste and even disaster. Precision enables power – strength, durability, toughness.
We all know in our heads that data analysis requires that same attention to detail, and it’s critical to find ways to justify that cleaning and monitoring. The investment in clean API numbers, well-documented data models, and a properly maintained Data Mart pays off over and over again. It ain’t cool or appealing, so everyone in an organization, from the bottom to the top, has to be committed.
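To make "clean API numbers" a little more concrete, here is a minimal Python sketch of the kind of check that pays off repeatedly. It assumes API well numbers arrive as strings of 10, 12, or 14 digits with optional dashes or spaces, and that padding shorter forms with trailing zeros to 14 digits is acceptable for your data. That padding rule and the function name `normalize_api_number` are illustrative assumptions, not a standard your systems necessarily follow.

```python
import re
from typing import Optional


def normalize_api_number(raw: str) -> Optional[str]:
    """Return a 14-digit API number, or None if the input can't be trusted.

    Assumption: 10- and 12-digit forms are padded with trailing zeros
    (sidetrack and event codes of "00"). Adjust to your own convention.
    """
    # Strip the dashes and whitespace people commonly type into API numbers.
    digits = re.sub(r"[\s\-]", "", raw)
    # Anything that isn't exactly 10, 12, or 14 ASCII digits is rejected,
    # not guessed at.
    if not re.fullmatch(r"[0-9]{10}|[0-9]{12}|[0-9]{14}", digits):
        return None
    return digits.ljust(14, "0")


# Hypothetical inputs, for illustration only.
for candidate in ["42-501-20130", "42-501-20130-01-00", "42-501-2013"]:
    print(candidate, "->", normalize_api_number(candidate))
```

Returning `None` instead of guessing keeps the garbage out of the forge: a record that fails the check gets routed to a person, not silently "fixed".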
Now let’s stretch the metaphor a bit more. Forges come in two basic forms: hand-operated and machine-operated.
A hand-operated forge is as much art as science. Metallurgists make products like jewelry and timepieces with expert attention to detail. They learn to see, smell, and feel quality based on decades of experience.
Statisticians and data scientists often work the same way. QC’ing data by hand, writing custom Python code, and carefully testing AI/ML algorithms is a similarly artistic process.
It’s iterative: Try, fail. Try, fail. Try, succeed!
Industrial, automated, machine-operated forges sit at the other end of the spectrum. Quality control and blending of the metals are done by computer, tweaking the mix on the fly based on real-time data streams. Pouring, quenching, and hammer-forging are done in a powerful, fast, computer-aided dance. Properly implemented, automated forges achieve both high quality and low cost at unbelievable scales. They enable gigawatt-scale power plants, 100-story skyscrapers, and drill bits that make 1000′ of hole per hour.
This is where operationalized data science ends up going. Call it DevOps, call it AutoML, call it what you want – enormous value can be created when data management and analysis are conducted at scale. For a computer to drive a car or read a PDF, geosteer a horizontal or predict well downtime, quality is crucial. AI and ML at scale are only possible if cleansing is built into every step. Drop some banana peels (or a trash monster) into the mix, and your expensive forge is back to just being a trash compactor.
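One way to picture "cleansing built into every step" is a quality gate that sits in front of each stage of a pipeline. The Python sketch below is only an illustration under stated assumptions: the record layout, the field name `oil_rate_bbl_d`, the 5% rejection threshold, and the function names `quality_gate` and `run_step` are all hypothetical, not taken from any particular tool.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Check = Callable[[Record], bool]


@dataclass
class GateResult:
    passed: List[Record]
    rejected: List[Record]


def quality_gate(records: List[Record], checks: List[Check]) -> GateResult:
    """Split records into those that pass every check and those that don't."""
    passed, rejected = [], []
    for record in records:
        if all(check(record) for check in checks):
            passed.append(record)
        else:
            rejected.append(record)
    return GateResult(passed, rejected)


def run_step(records: List[Record], checks: List[Check],
             step: Callable[[List[Record]], Any],
             max_reject_rate: float = 0.05) -> Any:
    """Run one pipeline step only on clean records.

    If too much of the feedstock is bad, stop the line instead of forging.
    """
    result = quality_gate(records, checks)
    total = len(records)
    if total and len(result.rejected) / total > max_reject_rate:
        raise ValueError(
            f"{len(result.rejected)} of {total} records failed quality checks"
        )
    return step(result.passed)


def has_valid_rate(record: Record) -> bool:
    """Hypothetical check: a daily oil rate must be present and non-negative."""
    rate = record.get("oil_rate_bbl_d")
    return rate is not None and rate >= 0


def average_rate(records: List[Record]) -> float:
    """Hypothetical downstream step standing in for a model or report."""
    return sum(r["oil_rate_bbl_d"] for r in records) / len(records)
```

Calling `run_step(records, [has_valid_rate], average_rate)` computes the average only when at most 5% of the records fail the check; otherwise it raises, which is the automated equivalent of a metallurgist refusing a bad batch.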
I should clarify – I don’t mean to imply that expert, hand-made data science doesn’t have value. On the contrary, I think it’s a crucial step and often the final one for infrequent workflows. Just recognize that it isn’t always the end of the road.
If you want to move in the direction of advanced data analysis, you need to be aware of these nuances.
- Are you building forges or trash compactors, and why?
- Are you building hand forges or industrial forges, and why?
Make these choices intentionally and thoughtfully – otherwise, you’re probably throwing your money down that garbage chute.