Data quality problems are rarely loud. They tend to quietly spread through an organization, influencing decisions long before anyone questions the data itself.
When bad data is discovered
What matters most is not how wrong the data is, but when the problem is discovered. In practice, one of three things typically happens:
| When the issue is discovered | Observed behavior | Outcome |
| --- | --- | --- |
| Early | The issue is detected before the data is used in decisions. | Some friction and delays, trust unaffected. |
| Late | The data has already been used in decisions. The issue is discovered through unexpected results, broken reports, or downstream failures. | Damage control, rework, and reduced trust. |
| Never | The data is never questioned and continues to influence decisions unnoticed. | ?? |
The third case is the most common, the most frightening, and the one we focus on here.
Many data quality issues go undetected
If you are not actively checking data quality, your data is both correct and incorrect at the same time. That may sound strange, but without active observation you only learn the true state when something breaks or a result is questioned.
Most organizations have some data quality issues. Some are aware of them and spend their time reacting to problems and fixing them as they arise. Others are not aware, which often places them in the third outcome from the table above.
We can't say whether that is inherently good or bad (you can sometimes make excellent decisions on bad data), but assuming your data is fine simply because nothing has failed is not a strategy.
Automation is a low-effort starting point
Automated data quality checks will not solve all problems. They are, however, relatively low-effort and very effective at establishing a simple baseline.
They help validate basic expectations about the data: that values are reasonable, records are not duplicated, structures remain stable, and data arrives on time and is fresh. They detect anomalies early, before the data is widely consumed. They also act as regression checks for issues that have already been discovered, preventing known problems from quietly reappearing.
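As a rough illustration, baseline checks like these can be sketched in a few lines of plain Python. This is a minimal sketch, not any particular tool's API: it assumes records arrive as a list of dictionaries with timezone-aware timestamps, and the column names (`order_id`, `amount`, `created_at`), the value range, and the 24-hour freshness window are hypothetical placeholders you would replace with your own expectations.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical expectations for an "orders" dataset; replace with your own.
EXPECTED_COLUMNS = {"order_id", "amount", "created_at"}
AMOUNT_RANGE = (0.0, 100_000.0)
MAX_AGE = timedelta(hours=24)


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str = ""


def check_schema(rows):
    """Structure is stable: every row has exactly the expected columns."""
    bad = [i for i, r in enumerate(rows) if set(r) != EXPECTED_COLUMNS]
    return CheckResult("schema", not bad, f"row indexes with unexpected columns: {bad}")


def check_values(rows):
    """Values are reasonable: amount is a number within the expected range."""
    lo, hi = AMOUNT_RANGE
    bad = [
        r.get("order_id")
        for r in rows
        if not isinstance(r.get("amount"), (int, float)) or not lo <= r["amount"] <= hi
    ]
    return CheckResult("value_range", not bad, f"out-of-range order_ids: {bad}")


def check_duplicates(rows):
    """Records are not duplicated: order_id is unique."""
    counts = Counter(r.get("order_id") for r in rows)
    dupes = [key for key, n in counts.items() if n > 1]
    return CheckResult("duplicates", not dupes, f"duplicated order_ids: {dupes}")


def check_freshness(rows, now):
    """Data is fresh: the newest record is younger than MAX_AGE (aware datetimes assumed)."""
    stamps = [r["created_at"] for r in rows if isinstance(r.get("created_at"), datetime)]
    if not stamps:
        return CheckResult("freshness", False, "no timestamps found")
    age = now - max(stamps)
    return CheckResult("freshness", age <= MAX_AGE, f"newest record is {age} old")


def run_checks(rows, now=None):
    """Run every baseline check and return one result per check."""
    now = now or datetime.now(timezone.utc)
    return [
        check_schema(rows),
        check_values(rows),
        check_duplicates(rows),
        check_freshness(rows, now),
    ]


if __name__ == "__main__":
    now = datetime(2024, 1, 2, 12, tzinfo=timezone.utc)
    rows = [
        {"order_id": 1, "amount": 25.0, "created_at": now - timedelta(hours=2)},
        {"order_id": 2, "amount": -5.0, "created_at": now - timedelta(hours=3)},
        {"order_id": 2, "amount": 40.0, "created_at": now - timedelta(hours=3)},
    ]
    for result in run_checks(rows, now):
        status = "PASS" if result.passed else "FAIL"
        print(f"{status} {result.name}: {result.detail}")
```

In practice, checks like these would run on a schedule or as a pipeline step, failing the run or raising an alert when a result does not pass. Once a known issue has been fixed, a check for it can be added as another function in `run_checks`, so it acts as a regression check.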
Do they eliminate Schrödinger-like data quality issues? No. You are not observing everything. But each check you add reduces the remaining, unobserved problem space.
If you want to make data quality observable in your own systems, we are happy to discuss how to get started.