Record Detail

Text

Generating high-fidelity privacy-conscious synthetic patient data for causal effect estimation with multiple treatments

Dong Wang - Personal Name
Jingpu Shi - Personal Name
Gino Tesei - Personal Name
Beau Norgeot - Personal Name
Alejandro F. Frangi - Personal Name

In the past decade, there has been exponentially growing interest in the use of observational data collected as a part of routine healthcare practice to determine the effect of a treatment with causal inference models. Validation of these models, however, has been a challenge because the ground truth is unknown: only one treatment-outcome pair for each person can be observed. There have been multiple efforts to fill this void using synthetic data where the ground truth can be generated. However, to date, these datasets have been severely limited in their utility either by being modeled after small non-representative patient populations, being dissimilar to real target populations, or only providing known effects for two cohorts (treated vs. control). In this work, we produced a large-scale and realistic synthetic dataset that provides ground truth effects for over 10 hypertension treatments on blood pressure outcomes. The synthetic dataset was created by modeling data from a nationwide cohort of more than 580,000 hypertension patients, including each person’s multi-year history of diagnoses, medications, and laboratory values. We designed a data generation process by combining an adapted ADS-GAN model for fictitious patient information generation and a neural network for treatment outcome generation. A Wasserstein distance of 0.35 demonstrates that our synthetic data follows a nearly identical joint distribution to the patient cohort used to generate the data. Patient privacy was a primary concern for this study; the ε-identifiability metric, which estimates the probability of actual patients being identified, is 0.008%, ensuring that our synthetic data cannot be used to identify any actual patients. To demonstrate its usage, we tested the bias in causal effect estimation of four well-established models using this dataset. The approach we used can be readily extended to other types of diseases in the clinical domain, and to datasets in other domains as well.
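
The abstract reports an identifiability score without giving its formula. As a rough illustration only, the sketch below assumes the definition commonly associated with ADS-GAN: a real record counts as identifiable when its nearest synthetic record is closer than its nearest other real record, and the score is the fraction of such records. The function name `identifiability`, the toy data, and the use of plain (unweighted) Euclidean distance are assumptions made for this example; the published metric weights features, and this is not the authors' implementation.

```python
import math


def identifiability(real, synthetic):
    """Fraction of real records whose nearest synthetic record is closer
    than their nearest *other* real record.

    Assumption: plain Euclidean distance; feature weighting is omitted.
    """
    flagged = 0
    for i, x in enumerate(real):
        # Distance from this real record to the closest other real record.
        nearest_real = min(
            math.dist(x, y) for j, y in enumerate(real) if j != i
        )
        # Distance from this real record to the closest synthetic record.
        nearest_synth = min(math.dist(x, z) for z in synthetic)
        if nearest_synth < nearest_real:
            flagged += 1
    return flagged / len(real)


# Hypothetical toy data: three real records, two synthetic records.
real = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
synthetic = [(-0.1, 0.0), (20.0, 20.0)]

score = identifiability(real, synthetic)
print(f"identifiability: {score:.3f}")
```

In this toy case only the first real record has a synthetic neighbor closer than any other real record, so one record in three is flagged. A lower score means fewer real records sit unusually close to a synthetic one.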

Availability

No copy data

Detail Information

Series Title	-
Call Number	-
Publisher	Frontiers in Artificial Intelligence : Switzerland, 2022
Collation	006
Language	English
ISBN/ISSN	2624-8212
Classification	NONE
Content Type	-

Media Type	-
Carrier Type	-
Edition	-
Subject(s)	Artificial Intelligence; Model Validation; hypertension; electronic health records; causal inference; observational data; treatment effects; potential outcomes
Specific Detail Info	-
Statement of Responsibility	-

Other Information

Accreditation	Scopus Q3

Other version/related

No other version available

File Attachment

Generating high-fidelity privacy-conscious synthetic patient data for causal effect estimation with multiple treatments
