If you have ever shipped a dashboard you were proud of, only to watch it get questioned in a meeting because a key number “looks off,” you already understand the real problem. It is rarely the visualization. It is the data underneath it.
In healthcare, data quality is not just a reporting headache. It can distort operational decisions, quality measurement, care management outreach, and even patient safety work when signals get buried in missingness, duplicates, or timing issues. And because healthcare data moves through many systems (EHR, lab, billing, scheduling, registries), data quality failures tend to repeat until someone builds a way to prevent them.
This article gives you a step-by-step framework that informatics and analytics teams can apply immediately. You will learn how to define critical data elements, set data contracts, validate at ingestion, monitor drift, and run a weekly triage workflow with clear escalation paths. Along the way, I will ground the framework in healthcare-specific pain points like missing problem lists, inconsistent encounter types, duplicate patients, and late-arriving lab results.
Why Healthcare Data Quality Failures Keep Happening
Healthcare data quality tends to break for predictable reasons.
Healthcare Data Is Reused Far Beyond Its Original Purpose
A lot of health data is captured for clinical care, billing, scheduling, and regulatory needs. Later, the same data is reused for analytics, quality reporting, population health, and research. When you reuse data for a new purpose, you inherit every limitation of how it was collected, coded, and timed in the original workflow. Frameworks for secondary use explicitly emphasize assessing whether data is fit for use, not just whether it exists. (PubMed)
Data Quality Is Multidimensional
In practice, teams often talk about “bad data” as if it is one thing. The informatics literature breaks data quality into dimensions such as completeness, correctness, concordance, plausibility, and currency. (PMC)
Those words matter because they point to different root causes and different fixes.
- Completeness problems look like missing problem lists or missing encounter types.
- Concordance problems look like the same patient attribute disagreeing across sources.
- Currency problems look like late arriving results or backfilled claims.
Your Downstream Dashboards Become the Quality Control System by Accident
If the first time anyone notices a data problem is when an executive sees a chart, you are doing quality control too late. You need earlier guardrails, and you need clear definitions for what “good enough” means.
A harmonized terminology widely used for EHR data reuse groups data quality assessment into three categories: conformance, completeness, and plausibility. It also separates verification (does the dataset match expectations) from validation (does it represent the real world accurately for the intended purpose). (PubMed)
That distinction is useful for practical work:
- Verification catches schema changes, invalid values, and missing fields quickly.
- Validation is where you ask whether clinical reality is being captured correctly enough to act on.
The Practical Framework
Here is the full loop:
- Define critical data elements (CDEs) tied to decisions
- Set data contracts between producers and consumers
- Validate at ingestion, before data lands in analytics layers
- Monitor drift and timeliness over time
- Triage issues weekly with clear escalation and ownership
Each step is simple on purpose. The power comes from doing all of them consistently.
Step 1: Define Critical Data Elements that Actually Matter
You do not need perfect data quality everywhere. You need dependable data quality for the elements that drive decisions.
What Counts as a Critical Data Element
A CDE (Critical Data Element) is a field where a defect creates real harm, such as:
- wrong outreach list
- wrong denominator in a quality measure
- delayed escalation in an operational workflow
- misleading trend line that triggers a bad decision
A practical way to pick CDEs is to start from outcomes and work backward. Ask, “What decisions will this dataset power?” Then list the fields you cannot afford to get wrong.
Use a Standard List as a Starting Point
If your work involves interoperability or exchanging core clinical data, it helps to reference established sets of data elements. In the US, the United States Core Data for Interoperability (USCDI) is a standardized set of health data classes and data elements intended to support nationwide interoperable exchange. (isp.healthit.gov)
You do not need to adopt USCDI wholesale to benefit. You can use it as a checklist for the kinds of elements that often become critical in real workflows, such as problems, medications, allergies, lab results, and vital signs. (National Library of Medicine)
Healthcare Examples of CDEs That Frequently Break Dashboards
Below are common patterns that show up across organizations.
Missing or Under-Documented Problem Lists
Problem lists often drive registries, risk stratification cohorts, and care gap workflows. When problems are missing or inconsistently maintained, downstream cohorts become unstable. This is a completeness issue. (PMC)
Inconsistent Encounter Types
Operational dashboards frequently group visits by encounter type. If encounter type mappings drift, your trends will look like volume shifts even when the clinic did not change. This is usually a conformance plus plausibility issue. (PubMed)
Duplicate Patients and Identity Mismatches
Duplicate records fragment history, inflate counts, and cause measures to misfire. The data quality dimension here is concordance: the same real-world entity is represented inconsistently across sources. (PMC)
Late Arriving Lab Results
Lab results are often represented as observations in interoperability standards, with timestamps that capture when the observation was recorded. (FHIR Build)
When results arrive late, dashboards can show false drops, and alerting systems can miss windows. This is a currency and timeliness issue. (PMC)
Template: CDE Definition Card
Copy this and use it in a doc or ticketing system.
CDE Definition Card
- CDE name:
- Business purpose (what decision it supports):
- Data sources of record:
- Field location (table and column, or resource path):
- Definition in plain language:
- Allowed values or value set:
- Requiredness (required, conditionally required, optional):
- Timeliness expectation (how fresh it must be):
- Primary consumer (team or product):
- Owner (data producer accountable for meaning):
- Steward (data team accountable for monitoring):
- Quality checks required (conformance, completeness, plausibility):
- Known failure modes:
- Escalation path:
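The card works fine as a doc or ticket. Some teams also keep a machine-readable copy so ingestion checks and monitoring can be generated from the same definition. Here is a minimal sketch in Python; every name and value below is an illustrative placeholder, not a standard schema.

```python
# A minimal sketch of a CDE definition card kept as machine-readable metadata.
# All names and values are illustrative placeholders, not a standard schema.
lab_result_time_cde = {
    "cde_name": "lab_result_time",
    "business_purpose": "Timeliness metric for completed lab results",
    "source_of_record": "Lab interface feed",
    "field_location": "lab_results.result_datetime",
    "allowed_values": "ISO 8601 timestamp",
    "requiredness": "required when result_status is 'final'",
    "timeliness_expectation_hours": 6,
    "owner": "Lab informatics lead",
    "steward": "Analytics data steward",
    "required_checks": ["conformance", "completeness", "plausibility"],
    "known_failure_modes": ["interface backlog", "timezone mismatches"],
    "escalation_path": "Severity 2 to the integration on-call",
}
```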
Step 2: Set Data Contracts So Changes Do Not Surprise You
Most healthcare data problems feel sudden. In reality, many start as a change upstream that nobody communicated. Data contracts are a way to force clarity.
Thoughtworks describes data contracts as similar to APIs for data, intended to make data transfers stable and reliable. (Thoughtworks)
In practice, a data contract is an agreement between a producer and a consumer that answers:
- What fields exist and what they mean
- What formats and value constraints apply
- What timeliness and backfill behavior to expect
- How changes will be communicated and versioned
- What happens when quality drops below a threshold
What to Include in a Healthcare Data Contract
For healthcare, contracts should explicitly include clinical workflow realities:
- Nullability rules that match documentation behavior (what can be blank, when)
- Source of truth rules (which system wins on conflicts)
- Backfill policy (do you expect late updates, and for how long)
- Identity rules (patient matching keys and merge behavior)
- Event timing semantics (order time, result time, posted time)
Template: Data Contract Starter (copyable)
Data Contract
- Data product name:
- Producer system and owner:
- Consumers and use cases:
- Contract version:
- Change notification channel:
Schema and meaning
- Field list with definitions:
- Data types and formats:
- Required fields:
Value constraints
- Allowed value sets:
- Range rules:
- Referential integrity rules (must match reference tables):
Timeliness and updates
- Expected latency:
- Backfill window policy:
- Late arriving data handling:
Quality SLOs (service level objectives)
- Completeness thresholds for key fields:
- Conformance thresholds:
- Plausibility rules:
Operational expectations
- Monitoring owner:
- Incident severity levels:
- Escalation contacts and response expectations:
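If you want the contract to be enforceable rather than aspirational, the same content can live as configuration next to the pipeline. Below is a minimal sketch in Python with pandas; the dataset, field names, allowed values, and thresholds are illustrative placeholders for whatever your contract actually specifies.

```python
import pandas as pd

# A minimal sketch of a data contract expressed as configuration, plus a check
# that reports violations for one batch. All names and thresholds are placeholders.
encounters_contract = {
    "required_fields": ["patient_id", "encounter_id", "encounter_type", "admit_time"],
    "allowed_values": {"encounter_type": {"inpatient", "outpatient", "emergency", "telehealth"}},
    "completeness_thresholds": {"encounter_type": 0.98},  # non-null rate SLO
}

def check_against_contract(df: pd.DataFrame, contract: dict) -> list:
    """Return a list of human-readable contract violations for one batch."""
    violations = []
    for field in contract["required_fields"]:
        if field not in df.columns:
            violations.append(f"missing required column: {field}")
    for field, allowed in contract["allowed_values"].items():
        if field in df.columns:
            unexpected = set(df[field].dropna().unique()) - allowed
            if unexpected:
                violations.append(f"{field}: unexpected values {sorted(unexpected)}")
    for field, threshold in contract["completeness_thresholds"].items():
        if field in df.columns and df[field].notna().mean() < threshold:
            violations.append(f"{field}: non-null rate below {threshold:.0%}")
    return violations
```

Violations surfaced this way can feed directly into the weekly triage workflow described later.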
Step 3: Validate at Ingestion
Ingestion is your best leverage point. If you catch issues before data is used, you prevent broken dashboards and reduce firefighting.
The harmonized framework categories are helpful here because they map cleanly to automated checks:
- Conformance: does the data follow expected structure and format
- Completeness: are required values present
- Plausibility: do values look believable and consistent with expected relationships (PubMed)
Build a Minimum Viable Ingestion Validation Suite
You do not need 100 checks on day 1. Start with 10 that cover your CDEs.
Conformance Checks (structure and format)
- Column exists, correct data type
- Date fields parse correctly
- Code fields match expected coding system patterns
- Encounter type values match expected reference list
Completeness Checks (presence)
- Required fields non-null rate
- Key join fields populated (patient identifier, encounter identifier)
- No sudden drop in record counts compared to baseline
Plausibility Checks (believability and relationships)
- Vital sign ranges within plausible bounds (use clinical input)
- Encounter discharge time not earlier than admit time
- Lab result timestamps not in the future relative to ingestion time (define your rule)
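To make this concrete, here is one way a first pass could look in Python with pandas. This is a sketch under assumed column names (patient_id, admit_time, discharge_time, heart_rate) and placeholder thresholds, not a finished suite; set clinical ranges with clinical input.

```python
import pandas as pd

# A minimal sketch of a first-pass ingestion validation.
# Column names, bounds, and thresholds are illustrative assumptions.
def validate_batch(batch: pd.DataFrame, baseline_count: int) -> dict:
    results = {}

    # Conformance: date fields parse correctly
    admit = pd.to_datetime(batch["admit_time"], errors="coerce")
    discharge = pd.to_datetime(batch["discharge_time"], errors="coerce")
    results["admit_time_parses"] = bool(admit.notna().mean() > 0.99)

    # Completeness: key join fields populated, volume not collapsing versus baseline
    results["patient_id_present"] = bool(batch["patient_id"].notna().all())
    results["volume_within_range"] = len(batch) > 0.5 * baseline_count

    # Plausibility: discharge not earlier than admit, vitals within believable bounds
    both_known = admit.notna() & discharge.notna()
    results["discharge_after_admit"] = bool((discharge[both_known] >= admit[both_known]).all())
    results["heart_rate_plausible"] = bool(batch["heart_rate"].between(20, 300).mean() > 0.995)
    return results
```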
Healthcare-Specific Ingestion Example: Encounter Types
Problem: A scheduling or registration system adds a new encounter type value, or reuses an old one with different meaning. Dashboards that group by encounter type break silently.
Ingestion checks to add:
- Conformance: encounter_type must be in the approved encounter type mapping table
- Completeness: encounter_type must be present for 100 percent of billable encounters (set your threshold)
- Plausibility: sudden shift in distribution triggers a warning for review
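A sketch of those three checks, assuming pandas, an approved value set, a stored baseline distribution, and an illustrative 0.05 shift threshold:

```python
import pandas as pd

# A minimal sketch of the three encounter_type checks above. The approved set,
# baseline distribution, and 0.05 shift threshold are illustrative assumptions.
def check_encounter_types(values: pd.Series, approved: set, baseline: pd.Series) -> dict:
    unmapped = set(values.dropna().unique()) - approved   # conformance
    missing_rate = float(values.isna().mean())            # completeness
    current = values.value_counts(normalize=True)
    # Plausibility: largest change in any category's share versus the baseline period
    shift = (current.reindex(baseline.index, fill_value=0) - baseline).abs().max()
    return {
        "unmapped_values": sorted(unmapped),
        "missing_rate": missing_rate,
        "distribution_shift_flag": bool(shift > 0.05),
    }
```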
Healthcare-Specific Ingestion Example: Duplicate Patients
You can rarely “solve” patient identity completely inside analytics, but you can detect risk.
Ingestion checks to add:
- Identify potential duplicates using a defined rule set (for example, same name and date of birth with different identifiers)
- Track duplicate rate trend over time
- Flag when merge events spike (if your source provides merge indicators)
Be careful with language here. Duplicate detection rules require local tuning and governance, but monitoring the trend gives you an early signal.
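As one illustration, the trend metric can be as simple as the share of demographic keys that map to more than one identifier. This is a sketch with assumed column names, not a patient matching solution.

```python
import pandas as pd

# A minimal sketch of the duplicate-risk trend metric: the share of name-plus-birthdate
# combinations that map to more than one patient identifier. Column names are
# illustrative, and a real rule set needs local tuning and governance.
def duplicate_candidate_rate(patients: pd.DataFrame) -> float:
    key_cols = ["last_name", "first_name", "date_of_birth"]
    ids_per_person = patients.groupby(key_cols)["patient_id"].nunique()
    return float((ids_per_person > 1).mean())
```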
Step 4: Monitor Drift, Timeliness, and Stability Over Time
Even strong ingestion checks do not protect you from gradual drift. Drift happens when distributions, missingness, or timing patterns change slowly until the dashboard is no longer comparable month to month.
The EHR data quality literature explicitly includes currency as a core dimension, meaning whether data is up to date enough for its intended use. (PMC)
What To Monitor Weekly
For each dataset that powers decisions, track:
- Volume: record counts by day and by source
- Completeness: non null rates for CDEs
- Conformance: schema changes, invalid codes, failed constraints
- Plausibility: outlier rates, impossible relationships
- Timeliness: latency from event time to availability in analytics
- Stability: distribution shifts in key categoricals (encounter type, location, department)
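Here is a minimal sketch of how the timeliness and completeness parts of that list could be computed with pandas. The event_time and loaded_time columns and the CDE column list are assumed names you would swap for your own.

```python
import pandas as pd

# A minimal sketch of weekly timeliness and completeness monitoring, assuming an
# events table with event_time (when the result occurred) and loaded_time (when it
# became available in analytics). All column names are illustrative assumptions.
def weekly_monitoring(events: pd.DataFrame, cde_columns: list) -> pd.DataFrame:
    events = events.copy()
    events["event_time"] = pd.to_datetime(events["event_time"])
    events["loaded_time"] = pd.to_datetime(events["loaded_time"])
    events["latency_hours"] = (
        events["loaded_time"] - events["event_time"]
    ).dt.total_seconds() / 3600

    by_day = events.groupby(events["event_time"].dt.date)
    daily = by_day.agg(
        volume=("event_time", "size"),
        median_latency_h=("latency_hours", "median"),
        p95_latency_h=("latency_hours", lambda s: s.quantile(0.95)),
    )
    # Completeness: daily non-null rate for each critical data element
    for col in cde_columns:
        daily[f"{col}_nonnull_rate"] = by_day[col].agg(lambda s: s.notna().mean())
    return daily
```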
Key Numbers: Data Completeness Is a Real Compliance Concept in Healthcare
Even if your organization is not doing the specific programs below, they are useful examples of how healthcare defines “complete enough” data in practice.
- CMS requires Home Health Agencies to achieve a quality reporting compliance rate of 90 percent or more (as calculated using the QAO metric described by CMS). (CMS)
- In the CMS Quality Payment Program, the MIPS quality requirements call for reporting performance data on at least 75 percent of denominator-eligible cases for each measure (the data completeness requirement). (Quality Payment Program)
The point is not to copy these thresholds. The point is that healthcare already treats completeness as a measurable requirement, and your internal data products should too.
Step 5: Create Escalation Paths So Issues Get Resolved, Not Admired
A lot of informatics teams can detect problems. Fewer can get them fixed consistently. That is usually an ownership and escalation issue, not a technical one.
AHIMA emphasizes that healthcare quality and safety require the right information at the right time, and that continuous quality management of data standards and content is key to usable information. (Journal of AHIMA)
Set Severity Levels Tied to Impact
Use a simple severity model.
- Severity 1: patient safety risk or compliance reporting risk, immediate response
- Severity 2: executive dashboard or operational workflow materially wrong
- Severity 3: localized reporting issue with workaround
- Severity 4: cosmetic or low impact issue
Then define for each severity:
- who is paged or notified
- expected response time
- expected mitigation plan
Put a Name on the Accountable Role
Teams often confuse “the data team” with accountability. Data teams monitor and coordinate, but the producer system owner must own meaning, workflow, and fixes.
A simple model:
- Data owner: accountable for definitions and upstream workflow
- Data steward: accountable for monitoring, triage, coordination
- Subject matter expert: validates plausibility and clinical semantics
- Engineering owner: implements pipeline fixes
Real World Application: Fixing a Broken Lab Results Dashboard
Let’s walk through a realistic scenario.
The Situation:
A hospital quality team uses a dashboard to monitor a lab based metric. For several days, the dashboard shows a drop in completed results. Leaders start asking whether the lab is understaffed.
What is Actually Happening:
Results are arriving late into the analytics warehouse due to an upstream interface backlog. The clinical system still has the results, but your analytics layer is behind. This is a currency issue. (PMC)
How the Framework Resolves It:
- CDEs: mark lab result timestamp, result status, and posting time as critical
- Data contract: specify expected latency and backfill behavior for lab results
- Ingestion validation: add a timeliness check that compares event time to availability time
- Drift monitoring: trend latency daily, alert when it exceeds threshold
- Escalation: severity 2 incident routed to the interface or integration owner with clear expectations
Notice what changed. You did not just “fix the query.” You made lateness observable and made it explicit who owns the fix.
Templates You Can Use Today
Template 1: Data Quality Scorecard
Use this as a weekly or monthly scorecard for each data product.
Data Quality Scorecard (per dataset)
Dataset metadata
- Dataset name:
- Owner:
- Primary use cases:
- Reporting tier (executive, operational, analyst, exploratory):
Quality dimensions and checks
- Conformance
  - Schema change incidents (count):
  - Invalid value rate (percent):
  - Reference table match rate (percent):
- Completeness
  - Overall record completeness (percent):
  - CDE non-null rates (list top 10 CDEs):
  - Missing key joins (count):
- Plausibility
  - Outlier rate for key measures (percent):
  - Relationship violations (count):
- Currency and timeliness
  - Median latency:
  - 95th percentile latency:
  - Late-arriving records (count):
- Incidents and resolution
  - Severity 1 incidents (count):
  - Severity 2 incidents (count):
  - Median time to mitigation:
  - Open issues older than 14 days (count):
Narrative
- What changed this period:
- Top risks:
- Actions this week:
If you want this to be truly useful, keep it boring. Use the same measures every week so you can see trend, not noise.
Template 2: Weekly Triage Workflow
This is a lightweight operating rhythm for data quality.
Weekly Data Quality Triage (60 minutes)
Before the meeting (async)
- Data steward publishes:
  - Scorecard deltas
  - New alerts triggered
  - Top 5 failed checks by impact
  - Open incidents list with owners
Meeting agenda
- Review severity 1 and severity 2 issues first
- Confirm impact and whether dashboards need banners or pauses
- Assign owner for root cause and remediation
- Confirm next update time and communication channel
- Close or downgrade issues with evidence
Roles
- Facilitator: data steward
- Decision maker: analytics lead or informatics lead
- Producer owners: application or integration owners
- Clinical reviewer: validates plausibility, especially for clinical semantics
Outputs (written every week)
- Incident log updates
- Owner and due date for each issue
- Any changes needed to data contracts
- Any new ingestion checks to add
Common Mistakes and How to Avoid Them
Mistake 1: Trying to measure everything equally
If you treat every field as equally important, your monitoring becomes noisy and no one trusts alerts. Start with CDEs tied to decisions, then expand.
Mistake 2: Only doing validation in the BI layer
Dashboards are the last place to discover issues. Put checks at ingestion, and monitor drift in the pipeline layer.
Mistake 3: Confusing a data issue with a measurement issue
Sometimes the data is fine, but the metric definition is unclear or inconsistent. This is why data contracts and CDE definition cards must include plain language meaning and use case.
Mistake 4: Ignoring clinical workflow context
A “missing value” might be expected if the workflow does not capture it at that point in care. Use clinical review to separate true defects from documentation reality. This aligns with the idea that validation depends on intended use and real-world representation. (PubMed)
Mistake 5: Building alerts without escalation
Alerts without ownership become background noise. Every alert needs an owner, a severity, and an escalation path.
Mistake 6: Treating coded data as inherently reliable
Administrative and coded data can vary in accuracy across systems and contexts. AHRQ notes concerns about variability and inaccuracy of diagnosis codes across and within systems, and the risk of false positives and false negatives in some approaches based on administrative data. (PSNet)
That does not mean coded data is useless. It means you should monitor plausibility and concordance, and validate against clinical workflows when stakes are high.
Closing Remarks
Broken dashboards are usually a symptom, not the disease. The disease is unmanaged data quality across systems that were never designed to serve analytics by default.
The fix is not magical. It is operational. Define the few critical data elements that drive decisions. Put a data contract in writing so upstream changes are visible. Validate at ingestion so defects do not spread. Monitor drift and timeliness so you catch slow failures early. Then run a weekly triage process with real ownership and escalation so issues actually get resolved.
If you want a next step that is immediately actionable, pick one dashboard that leaders rely on and build the full loop around it. Start with 10 ingestion checks tied to its CDEs, publish a simple scorecard weekly, and require producers to sign off on a basic contract. You will be surprised how fast trust returns when quality becomes measurable and owned.
References
- Kahn, M. G., Callahan, T. J., Barnard, J., et al. A Harmonized Data Quality Assessment Terminology and Framework for the Secondary Use of Electronic Health Record Data. 2016. Journal of the American Medical Informatics Association (via PubMed Central). (PMC)
- Weiskopf, N. G., Weng, C. Methods and Dimensions of Electronic Health Record Data Quality Assessment: Enabling Reuse for Clinical Research. 2013. Journal of the American Medical Informatics Association (via PubMed Central). (PMC)
- Assistant Secretary for Technology Policy and Office of the National Coordinator for Health Information Technology. United States Core Data for Interoperability (USCDI). September 30, 2025. Interoperability Standards Advisory website. (isp.healthit.gov)
- Office of Disease Prevention and Health Promotion. United States Core Data for Interoperability resource description. (Publication date not listed on the cited page). U.S. Department of Health and Human Services website. (Health.gov)
- CMS. Home Health Quality Reporting Requirements (quality reporting compliance rate requirement). January 16, 2025. Centers for Medicare and Medicaid Services website. (CMS)
- CMS. MIPS Quality Requirements (data completeness requirement). (Publication date not listed on the cited page). Quality Payment Program website. (Quality Payment Program)
- AHIMA. Quality Healthcare Data and Information. November 21, 2024. AHIMA Journal PDF. (Journal of AHIMA)
- AHRQ. Measurement of Patient Safety (limitations of administrative data and coding variability). (Publication date not listed on the cited page). PSNet website. (PSNet)
- Thoughtworks. Data contracts: What are they and why do they matter? November 14, 2024. Thoughtworks Insights. (Thoughtworks)
- HL7. US Core Laboratory Result Observation Profile definitions. (Continuous build page, publication date varies). HL7 FHIR implementation guide site. (FHIR Build)

