Data Curation in Healthcare: Challenges, Definitions, and Examples

What is data curation in healthcare?

Electronic Health Record (EHR) systems help healthcare providers treat and care for patients by storing crucial clinical data. However, these systems share no uniform standard or common language. The result is inconsistent collection, documentation, and reporting of clinical information that healthcare organizations rely on for quality of care, performance metrics, and more, and that ultimately matters to patients.

Data curation is a process that improves data that doesn’t meet a quality standard due to missing or incorrect values, thereby reducing the amount of unusable data. This process includes activities like data selection, classification, validation, and remediation of disparate data that comes from multiple sources.
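The validation and remediation activities described above can be sketched in a few lines of Python. This is a minimal illustration only, not any vendor's implementation: the field names (`patient_id`, `systolic_bp`, `unit`), the plausibility range, and the unit aliases are all assumptions made for the example.

```python
# Hypothetical required fields and plausibility range (mmHg); assumptions for illustration.
REQUIRED_FIELDS = ("patient_id", "systolic_bp", "unit")
SYSTOLIC_RANGE = (50, 300)
# Hypothetical spelling variants of the same unit seen across sources.
UNIT_ALIASES = {"mm hg": "mmHg", "mmhg": "mmHg"}


def remediate(record: dict) -> dict:
    """Apply simple, low-risk fixes and return a new record."""
    fixed = dict(record)
    unit = fixed.get("unit")
    if isinstance(unit, str):
        cleaned = unit.strip()
        fixed["unit"] = UNIT_ALIASES.get(cleaned.lower(), cleaned)
    value = fixed.get("systolic_bp")
    if isinstance(value, str):
        try:
            fixed["systolic_bp"] = float(value.strip())
        except ValueError:
            pass  # leave the value as-is so validation can flag it
    return fixed


def validate(record: dict) -> list:
    """Return a list of data quality problems found in one record."""
    problems = []
    for name in REQUIRED_FIELDS:
        if record.get(name) in (None, ""):
            problems.append(f"missing {name}")
    value = record.get("systolic_bp")
    if isinstance(value, (int, float)):
        low, high = SYSTOLIC_RANGE
        if not low <= value <= high:
            problems.append("systolic_bp out of range")
    elif value not in (None, ""):
        problems.append("systolic_bp is not numeric")
    return problems


def curate(records: list) -> tuple:
    """Remediate each record, then split into usable and rejected."""
    usable, rejected = [], []
    for raw in records:
        record = remediate(raw)
        problems = validate(record)
        if problems:
            rejected.append((record, problems))
        else:
            usable.append(record)
    return usable, rejected
```

Calling `curate` on a list of record dictionaries returns the records that passed validation after remediation, plus the rejected records paired with the reasons they were rejected, so that unusable data is reduced and what remains unusable is at least explained.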

EHRs and EMRs alone can present data quality problems.

For example, there are upwards of 1,000 different EHRs in existence, and the average health system uses 18 different EMR vendors across affiliated providers.

Data curation makes patient data more usable and more powerful while:

  • Reducing information management burden with frictionless deployment
  • Helping integrate data across HIT vendors
  • Saving manual hours spent obtaining clean clinical data
  • Decreasing the administrative burden of translating data into a usable format for data analysis, population health, care management, etc.

Data curation improves the quality, completeness, and usability of healthcare data from extraction to data cleansing to enrichment.

5 reasons healthcare data must be curated

  1. There is no single system. Healthcare data originates from multiple sources: different EHRs and EMRs, and different departments or organizations that send and receive it.
  2. Healthcare data exists in myriad formats: paper, digital, images, videos, text, numeric, and more, with little or no standardization.
  3. Data structure (or lack thereof) varies. Some of the data in a health record is entered and captured into fields that can be validated and aggregated, but other information like free text and notes cannot be easily categorized.
  4. The data is variable and complex. Claims data is more standardized but incomplete, because it does not tell the full patient story. Clinical data is more variable and subject to provider interpretation.
  5. Regulatory requirements are constantly changing. Reporting requirements for agencies like CMS continue to evolve and increase, making some data or transmission modes obsolete or less valuable.
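The first three reasons (multiple sources, differing formats, differing structure) can be made concrete with a small Python sketch that maps records from two source systems onto one common shape. Everything here is hypothetical: the source names `ehr_a` and `ehr_b`, their field names, and their date formats are invented for illustration and do not describe any real EHR export.

```python
from datetime import datetime

# Hypothetical field names used by two made-up source systems.
FIELD_MAPS = {
    "ehr_a": {"pt_id": "patient_id", "dob": "birth_date", "dx": "diagnosis_code"},
    "ehr_b": {
        "PatientID": "patient_id",
        "BirthDate": "birth_date",
        "DiagnosisCode": "diagnosis_code",
    },
}
# Hypothetical date formats used by each source.
DATE_FORMATS = {"ehr_a": "%m/%d/%Y", "ehr_b": "%Y-%m-%d"}


def to_common(source: str, record: dict) -> dict:
    """Map one source record onto a shared schema with ISO dates."""
    mapping = FIELD_MAPS[source]
    common = {target: record.get(src) for src, target in mapping.items()}
    raw_date = common.get("birth_date")
    if raw_date:
        parsed = datetime.strptime(raw_date, DATE_FORMATS[source])
        common["birth_date"] = parsed.date().isoformat()
    code = common.get("diagnosis_code")
    if code:
        common["diagnosis_code"] = code.strip().upper()
    common["source"] = source  # keep provenance for later auditing
    return common
```

Once both sources are expressed in the same field names and formats, the records can be compared, aggregated, and validated together. Free-text notes are deliberately out of scope for this sketch, since they cannot be mapped field by field.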

What are the most important types of healthcare data?

Healthcare data collection and assessment use several different categories of data.

Administrative Data

Administrative data helps health organizations better understand the specific needs of the populations they serve and properly distribute resources.

Electronic Health Records

EHRs contain an individual’s entire clinical history, including past diagnoses, treatments, and outcomes.

Clinical Data

Many health organizations have standardized clinical data. This type of data may be used by Medicare or other regulatory agencies. It can also enable continuous performance improvement (CPI).

Insurance Claims Data

Patient care organizations and health plans analyze information and trends in insurance claims data to support disease management, reimbursement, and risk management, and to investigate fraud, waste, and abuse.

Patient Surveys

Healthcare organizations have more access than ever before to patient-generated data from wearable technology and mobile health apps. Having access to this data is crucial for monitoring chronic conditions. Social Determinants of Health (SDoH) and genomic data can also provide a fuller picture of the patient’s status.

Who uses data curation in healthcare?

Capturing data from multiple systems that is clean, complete, accurate, and correctly formatted for use is an ongoing challenge for healthcare organizations as they strive for interoperability. Sharing data with external partners is essential, especially as the industry moves toward population health management and value-based care.

Curating data in healthcare is an important service that helps teams in both health systems and health plan organizations. Curation and enrichment for a specific use case is one example where both healthcare providers and payers benefit from having clean and organized patient data.

Data curation is used by health systems and health plan organizations (providers and payers), including:

  • Physicians, nurses, and other patient care providers
  • Hospital information management teams
  • Health plans
  • Third-party administrators
  • Health information technology vendors

Types of shared data include, but are not limited to:

  • Diagnoses
  • Procedures
  • Allergies
  • Medications
  • Visits
  • Labs and orders
  • Vitals
  • Immunizations

Benefits of data curation in healthcare

Data curation allows healthcare organizations to integrate and analyze clinical data to make patient care more efficient, and extract insights that can improve clinical outcomes and support business objectives and requirements.

Verinovum has a proprietary Data Curation Platform that can help healthcare organizations make data more usable, more powerful, and more valuable.