Epic's AI flunks the real world 📉, Qualified Health lands $125M 💰, EFF sues Medicare's AI 🔍
A Northwell Health meta-analysis shows Epic's predictive models don't hold up outside the lab. Plus: what that means for every clinician-builder trusting vendor AI.
🔬 The Big Thing
Epic’s Predictive AI Models Underperform in Real-World Clinical Settings
A meta-analysis by Northwell Health researchers, published in Springer Nature's Journal of General Internal Medicine, reviewed five of Epic's out-of-the-box AI tools: the Deterioration Index, Sepsis Model, Unplanned Readmission Model, End of Life Care Index, and Risk of No Show Model. None exceeded an AUROC of 0.79, and for three of the models, Epic's own published performance figures sat above the observed confidence intervals. The clinical consequence: false positives driving unnecessary diagnostics, antimicrobial overuse, and alert fatigue across the 42% of U.S. acute care systems running Epic.
Epic responded that the study examined first-generation models and pointed to second-generation versions with local fitting capabilities. That’s a fair clarification — but it sidesteps the real problem. Most health systems deployed first-gen models because Epic shipped them as ready-to-use. Local validation requires clinical informatics teams, data infrastructure, and protected time that most hospitals don’t have. The gap between “available” and “validated” is where patients sit.
😤 Haters
“This is just academic researchers dunking on industry. Every new model underperforms at first.” The researchers weren’t testing bleeding-edge tools — they reviewed models that have been deployed in live clinical workflows for years. Underperformance at launch is expected. Underperformance after widespread deployment is a patient safety issue.
“Epic already fixed this with second-gen models.” Maybe. But there’s no independent validation of the second-gen models yet, and no consensus on how health systems should even run those validations. The ONC transparency rules that would have required this kind of disclosure may be sunsetted under the current administration. So the fix is: trust Epic.
“AUROC of 0.79 isn’t that bad.” In a controlled setting, no. But at real-world sepsis prevalence, discrimination in that range can mean several false alarms for every true one, and each false alert wakes a resident, triggers an antibiotic cascade, and numbs the team to the next alert. The clinical cost of false positives compounds fast.
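Quick math on why that's worse than it sounds. A minimal sketch in Python; the sensitivity/specificity operating point and the 10% prevalence are illustrative assumptions roughly consistent with an AUROC around 0.79, not figures from the Northwell study:

```python
# Back-of-the-envelope: why a "decent" AUROC still floods the ward with
# false alarms at bedside prevalence. The operating point below is an
# ILLUSTRATIVE assumption, not a number from the study.
sensitivity = 0.80   # P(alert | sepsis)
specificity = 0.60   # P(no alert | no sepsis)
prevalence = 0.10    # assumed sepsis rate among screened inpatients

tp = sensitivity * prevalence               # true-positive mass
fp = (1 - specificity) * (1 - prevalence)   # false-positive mass
ppv = tp / (tp + fp)                        # chance a given alert is real

print(f"PPV: {ppv:.0%}")                               # ~18%
print(f"False alerts per true alert: {fp / tp:.1f}")   # ~4.5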
💡 80/20: If you’re building on top of EHR-native AI predictions, don’t inherit their confidence scores uncritically. Build your own validation layer — even a simple confusion matrix on your patient population tells you more than the vendor spec sheet. Try: pull 100 recent alerts from your institution’s Deterioration Index and chart how many led to a meaningful clinical intervention.
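If you run that 100-alert audit, a minimal tally is all the validation layer you need to start. A sketch assuming you've exported the alerts to a CSV with a hand-charted intervention column; the file and column names are hypothetical, so adapt them to your institution's export:

```python
import csv
from collections import Counter

# Tally for the 100-alert chart review described above. Assumes one row
# per Deterioration Index alert plus a hand-charted column marking whether
# the alert led to a meaningful intervention. Names are hypothetical.
counts = Counter()
with open("deterioration_alerts.csv", newline="") as f:
    for row in csv.DictReader(f):
        key = "actionable" if row["led_to_intervention"].lower() == "yes" else "noise"
        counts[key] += 1

total = sum(counts.values())
print(f"Alerts reviewed: {total}")
print(f"Actionable (your local PPV): {counts['actionable'] / total:.0%}")
```

That single local PPV number tells you more about your patient population than the vendor's AUROC ever will.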
→ Full write-up
EFF Sues CMS for Transparency on Medicare’s AI Prior Authorization Pilot
The Electronic Frontier Foundation filed a FOIA lawsuit against CMS on March 25, and it's only now getting coverage outside the legal press. The suit demands records about WISeR (Wasteful and Inappropriate Service Reduction), an AI system evaluating prior authorization requests across six states and affecting roughly 6.4 million Medicare beneficiaries. Nobody outside CMS knows which vendors built it, which models it uses, or what safeguards exist. The most troubling detail: vendors are compensated partly through denial rates, with up to 20% of savings from rejected authorizations flowing back to them. That's a financial incentive baked directly into the algorithm's decision function.
😤 Haters
“Prior authorization has always been opaque. This isn’t new.” It’s not new that prior auth is a black box. It is new that the black box is now an AI making automated decisions at scale, with no disclosed audit mechanism and a financial incentive to deny. The EFF’s involvement signals this is crossing from health IT complaint to civil liberties issue.
“CMS has the right to pilot new programs without full disclosure.” They do. But FOIA exists precisely for programs that affect millions of people and lack transparency. If the model is sound, transparency should be easy.
💡 80/20: If you’re building anything in the prior authorization space, the WISeR pilot is your regulatory weather vane. Whatever transparency and audit requirements emerge from this lawsuit will likely shape the standard for all AI-driven auth tools. Reframe: build your audit trail now, not after regulators require it.
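What "build your audit trail now" can look like in practice: a minimal sketch of an append-only decision log. The field names, codes, and thresholds below are my own illustration, not anything WISeR or CMS has disclosed:

```python
import hashlib
import json
import time
from dataclasses import dataclass, asdict

# Append-only audit trail for an AI-assisted prior-auth decision.
# Field names are illustrative -- the point is to capture the model
# version, an input fingerprint, and the decision at the moment it happens.
@dataclass
class AuthDecisionRecord:
    request_id: str
    model_version: str
    input_hash: str   # fingerprint of the feature payload, not raw PHI
    decision: str     # "approve" | "deny" | "route_to_human"
    rationale: str
    timestamp: float

def log_decision(record: AuthDecisionRecord, path: str = "auth_audit.jsonl") -> None:
    """Append one decision record as a JSON line."""
    with open(path, "a") as f:
        f.write(json.dumps(asdict(record)) + "\n")

# Hypothetical request: an imaging prior auth scored by a model.
features = {"cpt": "70553", "dx": "G43.909", "prior_imaging": False}
log_decision(AuthDecisionRecord(
    request_id="req-0001",
    model_version="authmodel-2.3.1",
    input_hash=hashlib.sha256(json.dumps(features, sort_keys=True).encode()).hexdigest(),
    decision="route_to_human",
    rationale="model confidence below auto-approve threshold",
))
```

Logging the input hash instead of the raw payload keeps PHI out of the trail while still letting you prove, later, exactly what the model saw.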
→ Full write-up
Qualified Health Raises $125M Series B — Anthropic Invested
Qualified Health announced a $125M Series B last week, led by NEA, with participation from Menlo Ventures’ Anthology Fund (created with Anthropic), Transformation Capital, and others. The public benefit corporation builds an enterprise AI platform for health systems — workflow automation, agent development, clinical safeguards, and governance infrastructure. Customers include Emory, Jefferson Health, and the entire UT System (8 institutions). UTMB reported $15M+ in measurable impact within six months. The platform now supports 500,000+ users across systems representing roughly 7% of U.S. hospital revenue.
😤 Haters
“Another AI platform raise. What makes this different?” Anthropic’s direct investment through Menlo’s Anthology Fund. That’s not just a check — it’s a signal that the model provider sees governance and clinical safeguards as a necessary layer, not a nice-to-have. Most AI startups sell the model. Qualified is selling the operational wrapper.
“$125M for governance software? Health systems won’t pay for guardrails.” UTMB generated $15M in run-rate impact in six months. The governance isn’t the product — it’s what makes the product deployable in a regulated environment.
💡 80/20: The “AI platform for health systems” category is consolidating fast. If you’re a clinician-builder shipping tools into health systems, study Qualified’s approach to governance and auditability — that’s the bar your tool will be measured against. Try: map your tool’s audit trail today and identify the gaps.
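A throwaway way to run that mapping exercise: list the audit capabilities a governance review will ask about and let a script surface the gaps. The checklist below is my own illustrative starting point, not Qualified's spec:

```python
# Quick gap map for the audit-trail exercise. Swap in whatever your
# health system's governance committee actually asks for.
audit_capabilities = {
    "logs model version per output":        True,
    "logs input snapshot or hash":          True,
    "logs clinician override / acceptance": False,
    "supports per-decision replay":         False,
    "documented data retention policy":     True,
}

gaps = [name for name, done in audit_capabilities.items() if not done]
print("Gaps to close before a governance review:")
for gap in gaps:
    print(f"  - {gap}")
```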
→ Full write-up
Healthcare’s Data Quality Crisis — “We Don’t Know What We Don’t Record”
Emergency physician and Epic consultant John Lee published the sharpest essay I’ve read on healthcare’s data quality problem. A stroke patient arrives with an empty medication list — no data from other systems. Clinicians guess whether to give thrombolytics. Medication lists track what was ordered, not what patients actually take. Safety reporting takes 10+ minutes per incident, so nobody files reports. The only data healthcare collects with genuine fidelity is billing and coding. Lee’s line that sticks: “An algorithmically confident recommendation generated from a dirty medication list is still a recommendation you should not trust.”
😤 Haters
“This is a known problem. Everyone in health IT knows the data is messy.” Knowing it and quantifying the clinical risk are different things. The stroke scenario isn’t hypothetical — it’s what happens every night in every ED. The question isn’t whether you know the data is bad. It’s whether you’re building systems that assume it’s good.
“AI can clean the data. That’s the whole point.” Lee actually agrees — he points to Epic’s Agent Factory as a potential tool for automated medication reconciliation. The catch is that the AI cleaning the data needs to be validated against the same data it’s trying to fix. It’s a bootstrapping problem.
💡 80/20: Before you build any clinical AI tool, spend one shift manually auditing the data it will consume. Pull 20 medication lists and compare them to what patients actually report taking. The gap will reshape your architecture. Reframe: your AI’s ceiling is your data’s floor.
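Here's a minimal sketch of that 20-list comparison, assuming you've hand-collected both the EHR list and the patient-reported list. The name normalization is deliberately naive; real reconciliation needs RxNorm or similar:

```python
# Sketch of the 20-chart med-list audit: EHR list vs. what the patient
# reports taking. Lowercase matching is deliberately naive -- real
# reconciliation needs RxNorm or another terminology service.
def discrepancy_rate(ehr_meds: list[str], reported_meds: list[str]) -> float:
    ehr = {m.strip().lower() for m in ehr_meds}
    reported = {m.strip().lower() for m in reported_meds}
    mismatched = ehr ^ reported   # on one list but not the other
    return len(mismatched) / max(len(ehr | reported), 1)

# One illustrative chart: patient stopped lisinopril, takes OTC ibuprofen.
rate = discrepancy_rate(
    ehr_meds=["Lisinopril 10mg", "Metformin 500mg", "Atorvastatin 40mg"],
    reported_meds=["metformin 500mg", "atorvastatin 40mg", "ibuprofen 200mg"],
)
print(f"Discrepancy rate: {rate:.0%}")   # 50% on this chart
```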
→ Full write-up
🎯 Clinician-Builder Tip of the Day
When you’re prototyping a clinical tool, don’t start with the model. Start with the data audit. Pick the single most important data element your tool depends on — a medication list, a problem list, a lab trend — and manually review 20 patient records for accuracy. Time yourself. If it takes you more than 30 seconds per record to spot a discrepancy, your users will never catch the AI’s mistakes either. That 30-minute exercise will save you weeks of building on a foundation that doesn’t hold.
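If you want to keep yourself honest on the 30-second budget, a bare-bones stopwatch does it; nothing leaves your terminal, and the charts stay in the EHR:

```python
import time

# Stopwatch for the 20-record audit. Flags records that blow the
# 30-second budget discussed above.
BUDGET_SECONDS = 30
durations = []
for i in range(1, 21):
    input(f"Record {i}/20: press Enter to start the clock...")
    start = time.monotonic()
    input("Press Enter once you've spotted (or ruled out) a discrepancy...")
    durations.append(time.monotonic() - start)

durations.sort()
over_budget = sum(1 for d in durations if d > BUDGET_SECONDS)
print(f"Median review time: {durations[len(durations) // 2]:.0f}s")
print(f"Records over the {BUDGET_SECONDS}s budget: {over_budget}/20")
```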
What are you building this week? Reply and tell me — I read every one.
— Kevin