Method Article
When randomized controlled trials are not feasible, a comprehensive health care data source like the Military Health System Data Repository provides an attractive alternative for retrospective analyses. Incorporating mortality data from the National Death Index and balancing differences between groups using propensity weighting help reduce biases inherent in retrospective designs.
When randomized controlled trials are not feasible, retrospective studies using big data provide an efficient and cost-effective alternative, though they are at risk for treatment selection bias. Treatment selection bias occurs in a non-randomized study when treatment selection is based on pre-treatment characteristics that are also associated with the outcome. These pre-treatment characteristics, or confounders, can influence evaluation of a treatment's effect on the outcome. Propensity scores minimize this bias by balancing the known confounders between treatment groups. There are a few approaches to performing propensity score analyses, including stratifying by the propensity score, propensity matching, and inverse probability of treatment weighting (IPTW). Described here is the use of IPTW to balance baseline comorbidities in a cohort of patients within the US Military Health System Data Repository (MDR). The MDR is a well-suited data source, as it provides a contained cohort in which nearly complete information on inpatient and outpatient services is available for eligible beneficiaries. Outlined below is the use of the MDR supplemented with information from the National Death Index to provide robust mortality data. Also provided are suggestions for using administrative data. Finally, the protocol shares SAS code for using IPTW to balance known confounders and plot the cumulative incidence function for the outcome of interest.
Randomized, placebo-controlled trials are the strongest study design to quantify efficacy of treatment, but they are not always feasible due to cost and time requirements or a lack of equipoise between treatment groups1. In these instances, a retrospective cohort design using large-scale administrative data ("big data") often provides an efficient and cost-effective alternative, though the lack of randomization introduces treatment selection bias2. Treatment selection bias occurs in non-randomized studies when the treatment decision is dependent on pre-treatment characteristics that are associated with the outcome of interest. These characteristics are known as confounding factors.
Because propensity scores minimize this bias by balancing the known confounders between treatment groups, they have become increasingly popular3. Propensity scores have been used to compare surgical approaches4 and medical regimens5. Recently, we have used a propensity analysis of data from the United States Military Health System Data Repository (MDR) to assess the effect of statins in primary prevention of cardiovascular outcomes based on the presence and severity of coronary artery calcium6.
The MDR, utilized less frequently than the Medicare and VA data sets for research purposes, contains comprehensive administrative and medical claims information from inpatient and outpatient services provided for active duty military, retirees, and other Department of Defense (DoD) healthcare beneficiaries and their dependents. The database includes services provided worldwide at US military treatment facilities or at civilian facilities billed to the DoD. The database includes complete pharmacy data since October 1, 2001. Laboratory data are available from 2009 but are limited to military treatment facilities. Within the MDR, cohorts have been defined with methods including use of diagnosis codes (e.g., diabetes mellitus7) or procedure codes (e.g., arthroscopic surgery8). Alternatively, an externally defined cohort of eligible beneficiaries, such as a registry, can be matched to the MDR to obtain baseline and follow-up data9. Unlike Medicare, the MDR includes patients of all ages. It is also less biased towards males than the VA database since it includes dependents. Access to the MDR is limited, however. Generally, only investigators who are members of the Military Health System can request access, analogous to requirements for use of the VA database. Non-government researchers seeking access to Military Health System data must do so through a data sharing agreement under the supervision of a government sponsor.
When using any administrative data set, it is important to bear in mind the limitations as well as strengths of administrative coding. The sensitivity and specificity of a code can vary based on the related diagnosis, whether it is a primary or secondary diagnosis, or whether it is an inpatient or outpatient file. Inpatient codes for acute myocardial infarction are generally accurately reported with positive predictive values over 90%10, but tobacco use is often undercoded11. Such undercoding may or may not have a meaningful effect on a study's results12. Additionally, several codes for a given condition may exist with varying levels of correlation to the disease in question13. An investigative team should perform a comprehensive literature search and review of the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) and/or ICD-10-CM coding manuals to ensure that the appropriate codes are included in the study.
Several methods can be employed to improve the sensitivity and accuracy of the diagnostic codes used to define comorbid conditions. An appropriate "look-back" period should be included to establish baseline comorbidities. The look-back period includes the inpatient and outpatient services provided prior to study entry. A period of one year may be optimal14. Additionally, requiring two separate claims instead of a single claim can increase specificity, while supplementing coding data with pharmaceutical data can improve sensitivity15. Selective manual chart audits of a portion of the data can be used to verify accuracy of the coding strategy.
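The look-back and two-claim rules described above can be illustrated with a minimal Python sketch. This is not the authors' SAS code. The record layouts, patient identifiers, the `has_comorbidity` helper, and the choice to combine the claim rule and the pharmacy rule with a logical OR are all assumptions made for illustration; ICD-9-CM code 250.00 is used only as an example diabetes code.

```python
from datetime import date, timedelta

# Hypothetical claim records: (patient_id, service_date, ICD-9-CM code, setting).
claims = [
    ("A", date(2015, 3, 1), "250.00", "outpatient"),
    ("A", date(2015, 9, 15), "250.00", "outpatient"),
    ("B", date(2015, 6, 1), "250.00", "outpatient"),   # only one claim in window
    ("C", date(2013, 1, 10), "250.00", "outpatient"),  # before the look-back window
    ("C", date(2013, 5, 10), "250.00", "outpatient"),
]
# Hypothetical pharmacy fills: (patient_id, fill_date, drug_class).
fills = [("B", date(2015, 7, 1), "antidiabetic")]


def has_comorbidity(pid, index_date, codes, drug_classes,
                    lookback_days=365, min_claims=2):
    """Flag a baseline condition within the look-back window before study entry.

    Assumed rule: at least `min_claims` distinct service dates carrying a
    qualifying code, OR at least one qualifying pharmacy fill.
    """
    start = index_date - timedelta(days=lookback_days)
    # Distinct service dates, so repeated lines from one visit count once.
    claim_dates = {d for p, d, code, _ in claims
                   if p == pid and code in codes and start <= d < index_date}
    has_fill = any(p == pid and drug in drug_classes and start <= d < index_date
                   for p, d, drug in fills)
    return len(claim_dates) >= min_claims or has_fill


index_date = date(2016, 1, 1)  # hypothetical study entry date
diabetes = {pid: has_comorbidity(pid, index_date, {"250.00"}, {"antidiabetic"})
            for pid in ("A", "B", "C")}
print(diabetes)
```

In this toy data, patient A qualifies on two claims, patient B qualifies only through the pharmacy fill, and patient C does not qualify because both claims predate the one-year window. Whether pharmacy data should be able to qualify a patient on its own is a study-specific decision.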
Once comorbidities have been defined and assessed for the cohort in question, a propensity score may be used to balance differences in covariates between treatment groups. The propensity score is derived from the probability that a patient is assigned to a treatment based on known covariates. Accounting for this propensity for treatment reduces the effect that the covariates have on treatment assignment and helps generate a truer estimate of the treatment effect on the outcome. While propensity scores do not necessarily provide superior results to multivariate models, they do allow for assessment of whether the treated and untreated groups are comparable after applying the propensity score3. Study investigators can analyze the absolute standardized differences in covariates before and after propensity matching or inverse probability of treatment weighting (IPTW) to ensure known confounders have been balanced between groups. Importantly, unknown confounders may not be balanced, and one should be aware of the potential for residual confounding.
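As a rough illustration of how a propensity score is obtained, the Python sketch below fits a logistic regression of treatment on a single covariate and uses the fitted probability as the score. It is not the authors' SAS code. The toy data, the single standardized covariate, and the plain gradient-ascent fitting routine (`fit_logistic`) are assumptions chosen to keep the example self-contained; a real analysis would use an established statistical procedure and many covariates.

```python
import math

# Hypothetical toy data: one standardized covariate x and treatment flag t.
x = [-2, -2, -2, -2, -1, -1, -1, -1, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
t = [0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x, t, lr=0.1, n_iter=5000):
    """Fit P(t = 1 | x) = sigmoid(b0 + b1 * x) by gradient ascent
    on the mean log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(n_iter):
        # Residual = observed treatment minus current predicted probability.
        resid = [ti - sigmoid(b0 + b1 * xi) for xi, ti in zip(x, t)]
        b0 += lr * sum(resid) / n
        b1 += lr * sum(r * xi for r, xi in zip(resid, x)) / n
    return b0, b1


b0, b1 = fit_logistic(x, t)
# The propensity score is each patient's fitted probability of treatment.
propensity = [sigmoid(b0 + b1 * xi) for xi in x]
print(round(b1, 3), [round(p, 3) for p in propensity[::4]])
```

Patients with larger covariate values receive higher scores because they were treated more often in the toy data. With an intercept in the model, the fitted probabilities sum to the observed number of treated patients, which is a quick sanity check on the fit.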
When executed properly, though, propensity scores are a powerful tool that can predict and replicate results of randomized controlled trials16. Of the available propensity-score techniques, matching and IPTW are generally preferred17. Within IPTW, patients are weighted by the inverse of the probability of receiving the treatment they actually received. Stabilized weights are generally recommended over raw weights, while trimming of the weights can also be considered18,19,20,21.
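A minimal Python sketch of stabilized weights follows, given treatment flags and already estimated propensity scores. It is not the authors' SAS code. The function names, the toy values, and the reading of "trimming" as truncation of extreme weights at percentile bounds (rather than exclusion of patients) are assumptions. The 1st and 99th percentile defaults are likewise only a commonly seen choice, not a recommendation from the source.

```python
import math


def stabilized_weights(treated, propensity):
    """Stabilized IPTW: marginal probability of the treatment actually
    received, divided by its probability given the covariates."""
    p_treat = sum(treated) / len(treated)
    return [p_treat / ps if t == 1 else (1 - p_treat) / (1 - ps)
            for t, ps in zip(treated, propensity)]


def percentile(values, pct):
    """Nearest-rank percentile (pct between 0 and 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def truncate_weights(weights, lower_pct=1, upper_pct=99):
    """Pull weights outside the percentile bounds back to the bounds."""
    lo, hi = percentile(weights, lower_pct), percentile(weights, upper_pct)
    return [min(max(w, lo), hi) for w in weights]


# Hypothetical toy data: the third patient was treated despite a
# propensity of only 0.05, which produces an extreme weight.
treated = [1, 1, 1, 0, 0, 0, 0, 0]
propensity = [0.8, 0.5, 0.05, 0.6, 0.3, 0.2, 0.1, 0.4]

weights = stabilized_weights(treated, propensity)
# Wide bounds are used here only because the toy sample is tiny.
truncated = truncate_weights(weights, lower_pct=25, upper_pct=75)
print(round(max(weights), 3), round(max(truncated), 3))
```

The raw inverse-probability weight for the third patient would be 1/0.05 = 20. Stabilization reduces it to 7.5, and truncation caps it further, trading a small amount of bias for lower variance.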
Once study groups are balanced, they may be followed until the outcome of interest. Studies utilizing administrative data may be interested in outcomes such as readmission rates and time-to-event analyses. In studies interested in mortality, the Military Health System Data Repository includes a field for vital status that can be further augmented using the National Death Index (NDI)22,23. The NDI is a centralized database of death record information from state offices that is managed by the Centers for Disease Control and Prevention. Investigators can request basic vital status and/or specific cause of death based on the death certificate.
The following protocol details the process of conducting an administrative database study using the MDR augmented with mortality information from the NDI. It details the use of IPTW to balance baseline differences between two treatment groups and includes SAS code and example output.
The following protocol follows the guidelines of our institutional human ethics committees.
1. Defining the cohort
2. Defining covariates and outcomes
3. Submitting a request for the MDR
4. Accessing the MDR and extracting relevant data
5. Merging data and constructing summative files
6. Matching to the National Death Index (NDI)
7. De-identifying data
8. Computing the propensity score18,19,26
9. Creating the outcome model and generating a plot of cumulative incidence function
Upon completion of IPTW, tables or plots of the absolute standardized differences can be generated using the stddiff macro code or the asdplot macro code, respectively. Figure 1 shows an example of appropriate balancing in a large cohort of 10,000 participants using the asdplot macro. After application of the propensity score, the absolute standardized differences were reduced substantially. The cutoff used for the absolute standardized difference is somewhat arbitrary, though 0.1 is often used, with values below it taken to denote a negligible difference between the two groups. In a small cohort, proper balancing is more difficult to achieve. Figure 2 shows the unsuccessful results of attempting to balance covariates in a cohort of 100 participants.
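The balance check can be sketched in Python as follows. This is not the stddiff or asdplot macro, and the formula is a simplified assumption: the difference in (weighted) group means divided by the square root of the average of the two (weighted) group variances, using population-style variances. The toy cohort is hypothetical and constructed so that treatment depends strongly on one binary covariate.

```python
import math


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_var(values, weights):
    m = weighted_mean(values, weights)
    return sum(w * (v - m) ** 2 for v, w in zip(values, weights)) / sum(weights)


def abs_std_diff(x, treated, weights=None):
    """Absolute standardized difference of covariate x between the
    treated and untreated groups, optionally weighted."""
    if weights is None:
        weights = [1.0] * len(x)
    x1, w1 = zip(*[(xi, wi) for xi, ti, wi in zip(x, treated, weights) if ti == 1])
    x0, w0 = zip(*[(xi, wi) for xi, ti, wi in zip(x, treated, weights) if ti == 0])
    pooled_sd = math.sqrt((weighted_var(x1, w1) + weighted_var(x0, w0)) / 2)
    return abs(weighted_mean(x1, w1) - weighted_mean(x0, w0)) / pooled_sd


# Hypothetical cohort: 80% of patients with the covariate are treated,
# versus 20% of patients without it.
x = [1] * 10 + [0] * 10
treated = [1] * 8 + [0] * 2 + [1] * 2 + [0] * 8
propensity = [0.8] * 10 + [0.2] * 10
iptw = [1 / p if t == 1 else 1 / (1 - p) for t, p in zip(treated, propensity)]

asd_before = abs_std_diff(x, treated)
asd_after = abs_std_diff(x, treated, iptw)
print(round(asd_before, 3), round(asd_after, 3))
```

Before weighting the covariate is badly imbalanced (ASD of 1.5). After weighting by the inverse of the known treatment probabilities, both groups have a weighted covariate prevalence of 0.5 and the ASD falls to essentially zero, well under the 0.1 threshold.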
Once the stabilized propensity score weights are generated, the study team can proceed with outcome analysis. Survival analysis is often employed due to the need to censor participants with uneven follow-up information, and Figure 3 depicts an example of the use of proc phreg with stabilized propensity score weights to generate a cumulative incidence function (CIF) plot. The CIF plot depicts the cumulative proportion of participants who have experienced the event over time. In this case, the untreated, or control, group (No Rx) has a higher cumulative incidence of events and fares comparatively worse than the treated group (Rx).
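To make the idea of a weighted cumulative incidence curve concrete, here is a minimal Python sketch. It does not reproduce proc phreg. Instead it swaps in a simpler technique, a weighted Kaplan-Meier estimator, and reports the cumulative incidence as one minus the survival estimate, which is appropriate only when there are no competing risks (for example, all-cause mortality). The function name and toy follow-up data are hypothetical.

```python
def weighted_cumulative_incidence(times, events, weights):
    """Return [(time, cumulative incidence)] at each event time, where
    cumulative incidence = 1 - weighted Kaplan-Meier survival.

    events: 1 = event observed, 0 = censored at that time.
    """
    data = sorted(zip(times, events, weights))
    at_risk = sum(weights)  # weighted size of the current risk set
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = 0.0        # weighted number of events at time t
        leaving = 0.0  # weighted number leaving the risk set at time t
        while i < len(data) and data[i][0] == t:
            _, e, w = data[i]
            d += w * e
            leaving += w
            i += 1
        if d > 0:
            surv *= 1 - d / at_risk
            curve.append((t, 1 - surv))
        at_risk -= leaving
    return curve


# Hypothetical follow-up (years) for five patients in one treatment group.
times = [1, 2, 2, 3, 4]
events = [1, 0, 1, 1, 0]

unweighted = weighted_cumulative_incidence(times, events, [1.0] * 5)
# Up-weighting the first patient increases that early event's contribution.
reweighted = weighted_cumulative_incidence(times, events, [2.0, 1.0, 1.0, 1.0, 1.0])
print([(t, round(c, 3)) for t, c in unweighted])
```

Running the function separately for the treated and untreated groups, each with its own propensity weights, gives two step curves that can be plotted against time for comparison. Variance estimation for weighted data needs additional care (for example, robust standard errors) and is not covered by this sketch.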
Figure 1: Example of successful balancing. In a large cohort (n = 10,000), IPTW achieved balancing of the covariates with all absolute standardized differences reducing to less than 0.1.
Figure 2: Example of unsuccessful balancing. In a small cohort (n = 100), IPTW was unable to achieve balancing of the covariates with many absolute standardized differences remaining greater than 0.1.
Figure 3: Example of cumulative incidence function plot comparing treatment groups. Over time, the cumulative incidence of mortality increases in both groups, though it is higher in the untreated group (No Rx). Thus, in this example, the treated group has improved survival.
Supplementary Materials. Please click here to view this file (Right click to download).
Retrospective analyses using large administrative datasets provide an efficient and cost-effective alternative when randomized controlled trials are not feasible. The appropriate data set will depend on the population and variables of interest, but the MDR is an attractive option that does not have the age restrictions seen with Medicare data. With any data set, it is important to be intimately familiar with its layout and data dictionary. Care should be taken along the way to ensure that complete data are captured, and data are accurately matched and merged.
Codes for diagnoses should be defined using existing literature and a thorough understanding of the ICD-9-CM and ICD-10-CM coding systems to maximize the value of the assigned diagnoses. Existing sets of comorbidity codes, including the Elixhauser27 or refined Charlson comorbidity index28,29, can be used to define comorbid conditions that may influence the outcome of interest. Likewise, validated coding algorithms for administrative data should be leveraged. Validation should remain an area of active research, as there is continued learning on the optimal use of ICD-9-CM and ICD-10-CM coding algorithms to maximize accurate classification of a wide range of diseases.
Propensity scores can be used to address the bias inherent in any retrospective analysis. Effective propensity score weighting or matching should reduce the absolute standardized difference (ASD) below the desired threshold, generally set at 0.1. Appropriate balancing helps ensure comparability of the treatment groups with regard to known confounders, and appropriately employed propensity score techniques have been used to successfully replicate randomized trial results. Once properly balanced, the treatment groups can be compared with univariate time-to-event or other analysis.
Even with appropriate balancing, there is potential for residual confounding3, so the investigative team should limit the effect of unmeasured confounders. Additionally, if the effects of the covariates on treatment selection are strong, bias may still remain30. In small cohorts, the propensity scores are unlikely to fully reduce the ASD below 0.1 for all variables, and regression adjustment can be employed to help remove residual imbalance31. Regression adjustment can also be used in subgroup analysis when appropriate balance is no longer assured.
When done correctly, research with administrative data provides timely answers to important clinical questions in the absence of randomized clinical trials. While it is impossible to remove all bias from observational studies, bias can be limited by using propensity scores and conducting meticulous analyses.
The authors have nothing to disclose.
Research reported in this publication was supported by the National Center for Advancing Translational Sciences of the National Institutes of Health under Award Number UL1 TR002345. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Disclaimer: Additionally, the views expressed in this article are those of the author only and should not be construed to represent in any way those of the United States Government, the United States Department of Defense (DoD), or the United States Department of the Army. The identification of specific products or scientific instrumentation is considered an integral part of the scientific endeavor and does not constitute endorsement or implied endorsement on the part of the author, DoD, or any component agency.
Name | Company | Catalog Number | Comments
CD Burner (for NDI Request) | | |
Computer | | |
Putty.exe | Putty.org | |
SAS 9.4 | SAS Institute, Cary, NC | |
WinSCP or other FTP software | https://winscp.net/eng/index.php | |
Copyright © 2025 MyJoVE Corporation. All rights reserved