Closed claims are the workhorse for serious HEOR work, and for good reason. Enrollment continuity is the precondition for longitudinal analysis. If you can't see when a patient enters and leaves a payer, you can't follow them. Closed-only is the defensible default.
It is also incomplete. The gaps don't show up in your dashboards, because closed-only data, by definition, doesn't tell you what it isn't capturing.
To quantify those losses, we took 9.3 million patients with continuous closed enrollment for all of 2024 and pulled their open-claims records over the same period. Same patients, same window, two views. The procedures, diagnoses, and drug-level signals visible in open but absent from closed are exactly what closed-only misses, even for fully enrolled patients. They are also what hybrid claims data is designed to recover by layering open data onto closed data for the same patients over the same time period.
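Below is a minimal sketch of that cohort step in Python. It assumes enrollment arrives as per-patient coverage spans with inclusive start and end dates, and that each claim carries a patient ID and a service date. The `EnrollmentSpan` and `Claim` types and both function names are hypothetical and not part of any specific data product. The continuity rule here is strict: a single uncovered day breaks it.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass(frozen=True)
class EnrollmentSpan:  # hypothetical shape of a closed enrollment record
    patient_id: str
    start: date  # first covered day, inclusive
    end: date    # last covered day, inclusive

@dataclass(frozen=True)
class Claim:  # hypothetical shape of an open-claims line
    patient_id: str
    service_date: date
    code: str

def continuously_enrolled(spans, window_start, window_end):
    """Patient IDs whose enrollment covers every day of the window, with no gaps."""
    by_patient = {}
    for s in spans:
        by_patient.setdefault(s.patient_id, []).append(s)
    cohort = set()
    for pid, pspans in by_patient.items():
        covered_through = window_start - timedelta(days=1)  # last covered day so far
        for s in sorted(pspans, key=lambda span: span.start):
            if s.end <= covered_through:
                continue  # adds no new coverage inside the window
            if s.start > covered_through + timedelta(days=1):
                break  # at least one uncovered day: continuity is broken
            covered_through = s.end
            if covered_through >= window_end:
                break
        if covered_through >= window_end:
            cohort.add(pid)
    return cohort

def restrict_to_cohort(claims, cohort, window_start, window_end):
    """Same patients, same window: keep cohort claims dated inside the window."""
    return [c for c in claims
            if c.patient_id in cohort and window_start <= c.service_date <= window_end]

# Synthetic example.
start, end = date(2024, 1, 1), date(2024, 12, 31)
spans = [
    EnrollmentSpan("p1", date(2023, 7, 1), date(2024, 6, 30)),
    EnrollmentSpan("p1", date(2024, 7, 1), date(2025, 3, 31)),
    EnrollmentSpan("p2", date(2024, 1, 1), date(2024, 6, 30)),
    EnrollmentSpan("p2", date(2024, 8, 1), date(2024, 12, 31)),  # July uncovered
]
open_claims = [
    Claim("p1", date(2024, 3, 5), "J9999"),
    Claim("p1", date(2025, 1, 2), "J9999"),  # outside the window
    Claim("p2", date(2024, 3, 5), "J3490"),  # p2 is not continuously enrolled
]
cohort = continuously_enrolled(spans, start, end)
restricted = restrict_to_cohort(open_claims, cohort, start, end)
```

Only `p1` qualifies, and only its in-window open claim survives. A team that tolerates short administrative gaps would relax the uncovered-day check.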
This blog is part of a series on hybrid claims data and real-world evidence. Read the first post here.
We compared procedures present in the open data but absent from the closed data, patient by patient, in the overlap cohort. Four categories accounted for the majority of the gap. Each category corresponds to a clinical signal HEOR teams routinely build cohorts around.
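The patient-by-patient comparison reduces to a set difference for each patient, which is then tallied by category. The sketch below makes two assumptions. First, it matches on whether a procedure code appears anywhere in the window, not on event-by-event date matching. Second, the code-to-category mapping is a hypothetical stand-in and does not correspond to the four categories in our results.

```python
from collections import Counter

def open_only_codes(open_claims, closed_claims):
    """Per patient: procedure codes seen in open claims but never in closed claims."""
    closed_by_patient = {}
    for pid, code in closed_claims:
        closed_by_patient.setdefault(pid, set()).add(code)
    open_by_patient = {}
    for pid, code in open_claims:
        open_by_patient.setdefault(pid, set()).add(code)
    return {
        pid: codes - closed_by_patient.get(pid, set())
        for pid, codes in open_by_patient.items()
    }

def gap_by_category(gaps, category_of):
    """Number of patients with at least one open-only code in each category."""
    counts = Counter()
    for codes in gaps.values():
        # Update from a set, so each patient counts at most once per category.
        counts.update({category_of.get(code, "uncategorized") for code in codes})
    return counts

# Synthetic (patient_id, procedure_code) pairs for the same patients and window.
closed = [("p1", "80053"), ("p1", "71046"), ("p2", "80053")]
open_ = [("p1", "80053"), ("p1", "J3490"),
         ("p2", "80053"), ("p2", "71046"), ("p2", "J3490")]
# Hypothetical category labels, for illustration only.
category_of = {"80053": "lab", "71046": "imaging", "J3490": "drug"}

gaps = open_only_codes(open_, closed)
counts = gap_by_category(gaps, category_of)
```

Counting patients rather than lines keeps the tally on the same denominator as the cohort.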
Hybrid Recovery From a Closed-Only Cohort: top procedures present in open claims but absent from closed claims for the same patients, by category.
The pattern is consistent: closed claims record what got paid and adjudicated. Open claims record what happened. For closed-enrolled patients, hybrid data shows both, and the gap between them is doing more work than most HEOR analyses acknowledge.
Drug-level analysis on a J-code line requires the NDC (National Drug Code), the eleven-digit identifier that distinguishes one drug from another. Closed claims do not carry the NDC on J-code lines. Drug-level identification is therefore an open-claims exercise, full stop. For closed-enrolled patients, the only way to get there is to add the open layer.
This matters most on the unspecified J-codes, where the procedure code itself doesn't identify the drug. Our comparison covers six of them, including J3490, J3590, J7999, and J9999.
These codes are used for new drug launches before a permanent code is assigned, for compounded products, and when the biller cannot map a drug to a specific code. The procedure code conveys almost no clinical information on its own, which is precisely why the NDC is essential.
In our open-claims data, NDCs are present on these codes at fill rates of 80% or higher, with one exception (J7999, compounded drugs, at 33%). In closed claims, the fill rate is zero across all six.
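In this sketch, fill rate means the share of claim lines for a code that carry a usable NDC. It assumes lines arrive as (code, NDC) pairs. It also assumes an NDC counts as present only if it reduces to eleven digits once hyphens are removed. Both the input shape and that validity rule are assumptions, not a description of how our figures were computed.

```python
from collections import defaultdict

def has_usable_ndc(ndc):
    """Assumption: an NDC is usable if it is 11 digits once hyphens are stripped."""
    if not ndc:
        return False
    digits = ndc.replace("-", "").strip()
    return len(digits) == 11 and digits.isdigit()

def ndc_fill_rate(lines):
    """Share of claim lines per procedure code that carry a usable NDC."""
    total = defaultdict(int)
    filled = defaultdict(int)
    for code, ndc in lines:
        total[code] += 1
        if has_usable_ndc(ndc):
            filled[code] += 1
    return {code: filled[code] / total[code] for code in total}

# Synthetic (procedure_code, ndc) lines; NDC values are placeholders.
open_lines = [("J3490", "12345678901"), ("J3490", "12345-6789-01"),
              ("J3490", None), ("J7999", "")]
closed_lines = [("J3490", None), ("J7999", None)]

open_rates = ndc_fill_rate(open_lines)
closed_rates = ndc_fill_rate(closed_lines)
```

Running the same function over both layers for the same patients gives the open-versus-closed comparison directly.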
Drug-Level Identification on Unspecified J-codes: NDC fill rate by code, open vs. closed (left), and per-line charge range from median to maximum on a log scale (right). Closed claims carry no NDCs on these lines, so any drug-level analysis on a closed-only cohort cannot proceed. The same procedure code spans four to seven orders of magnitude in charge, leaving any aggregate analysis without an NDC vulnerable to a single mis-mapped outlier.
The right panel of the chart shows why the missing NDC matters for cost work, not only for drug attribution. On J3490, per-line charges range from $0.01 to $3.83 million, with a median of $32.40 and a coefficient of variation of 56. J3590 reaches $824,500. J7999 reaches $161,856. J9999 reaches $130,575. The same code is being used for trivial generic injections, compounded specialty preparations, and high-cost biologics or gene therapies, all of which collapse together in any cut that aggregates by procedure code. Without an NDC, a closed-only analysis cannot separate the $32 generic from the $3.8M outlier. For the same patients in a hybrid cohort, the open layer makes that separation possible.
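To see how one outlier can take over a code-level aggregate, the sketch below computes the median, mean, maximum, and coefficient of variation (population standard deviation divided by the mean) for each group. It groups first by procedure code alone, then by code plus NDC. The lines are synthetic and only loosely shaped after J3490's median and maximum. The NDC values are placeholders.

```python
from statistics import mean, median, pstdev

def charge_profile(charges):
    """Median, mean, max, and coefficient of variation (population SD / mean)."""
    m = mean(charges)  # assumes positive charges, so m > 0
    return {"median": median(charges), "mean": m, "max": max(charges),
            "cv": pstdev(charges) / m}

def profile_by(lines, key):
    """Group per-line charges by key(line) and profile each group."""
    groups = {}
    for line in lines:
        groups.setdefault(key(line), []).append(line["charge"])
    return {k: charge_profile(v) for k, v in groups.items()}

# Synthetic lines: 99 low-cost lines under one placeholder NDC, one extreme line under another.
lines = [{"code": "J3490", "ndc": "11111111111", "charge": 32.40} for _ in range(99)]
lines.append({"code": "J3490", "ndc": "22222222222", "charge": 3_830_000.00})

by_code = profile_by(lines, key=lambda line: line["code"])
by_code_ndc = profile_by(lines, key=lambda line: (line["code"], line["ndc"]))
```

When the lines are grouped by code alone, the mean lands near $38,000 against a $32.40 median, and the CV is roughly 10. When they are grouped by code and NDC, each group is internally uniform. The NDC supplies exactly that second grouping key.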
For comparative effectiveness, biosimilar uptake, real-world regimen mapping, and market-share work on specialty drugs, an unspecified J-code is not a usable proxy for a drug. Without the open layer, the analysis cannot proceed. With it, the same closed-enrolled patients become available for drug-level inference.
Closed-only analysis is defensible, and the enrollment continuity it provides is not negotiable for longitudinal work. The argument here is not to give that up. It is to keep the closed-enrolled cohort and add the open-claims layer for the same patients over the same time window. Same denominator, more signal. That is what hybrid claims data is: a more complete view of the same patients.
Every HEOR team that takes claims data seriously should run this comparison on the codes their own work depends on, and ask what closed-only is missing. The answer will likely change how the question is scoped.