⏪ Blast from the Past
(Jan 2026 was an eternity ago)
“Evaluation of electronic health record–integrated artificial intelligence chart review” (npj Health Systems, 2026 — ~1 month old)
Nicolas Kahl, MD, a clinical informaticist and attending emergency physician at UC San Diego Health, published a prospective evaluation of physician feedback on an EHR-integrated LLM chart review tool used on real patients. The finding that stopped me: across 147 AI-generated summaries reviewed by 10 physicians, feedback identified 46 omissions, 20 confusing items, and 27 issues traced to token limits, but just 5 hallucinations.
This is the real-world inversion of what most people assume the AI safety risk in clinical documentation to be. We've been focused on hallucinations, the model making things up. This paper says that in deployed EHR tools, omission is the more common failure: the AI leaving out information the physician needs. Not lying, just not including the potassium of 7.2 that changed management. That's not a hallucination problem. It's a completeness problem. And completeness is harder to eval for, because absence doesn't announce itself.
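If you wanted to eval for that, a minimal sketch looks something like the check below. Everything in it is my own illustration, not from the paper: the `CriticalFinding` type, the alias list, the substring matching. A production eval would need unit normalization, synonyms, negation handling, and probably an LLM judge, but the shape is the point: enumerate what must appear, then test for absence.

```python
# Illustrative only: a naive omission check for AI chart summaries.
# Assumes critical findings can be pulled from structured EHR data;
# the type, aliases, and values here are all made up for the sketch.
from dataclasses import dataclass


@dataclass
class CriticalFinding:
    label: str                 # e.g., "potassium 7.2 mmol/L"
    aliases: tuple[str, ...]   # any of these counts as "mentioned"


def find_omissions(summary: str, findings: list[CriticalFinding]) -> list[str]:
    """Return the critical findings that never appear in the summary."""
    text = summary.lower()
    return [
        f.label
        for f in findings
        if not any(alias.lower() in text for alias in f.aliases)
    ]


# The hyperkalemia case from the paragraph above.
findings = [
    CriticalFinding(
        label="potassium 7.2 mmol/L",
        aliases=("potassium", "K+ 7.2", "hyperkalemia"),
    ),
]
summary = "Admitted with chest pain; troponin negative, ECG without acute changes."
print(find_omissions(summary, findings))  # -> ['potassium 7.2 mmol/L']
```

The substring matching is deliberately dumb; the hard part of a completeness eval is deciding what belongs on the must-mention list in the first place.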
The paper is from earlier this year, but with the Trilliant coding-intensity study (ambient scribes pushing billing codes up) and the Mount Sinai research on ChatGPT Health undertriaging emergencies both circulating this week, it adds a third dimension: accuracy at the individual note level. Useful data if you're building anything in the documentation space. The physicians found the tool acceptable overall, but they also flagged that token limits in particular were driving systematic gaps. That's an architectural constraint worth building around rather than hoping the model eventually handles it.
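A sketch of what "building around it" could mean, under loud assumptions (`llm_summarize` is a placeholder for your model call, `count_tokens` is a crude word count, and the chunk budget is invented): chunk the chart chronologically and map-reduce, so nothing silently falls off the end of the prompt.

```python
# Illustrative only: chunked map-reduce summarization so the chart never
# gets silently truncated. llm_summarize is a placeholder for your model
# call; count_tokens is a crude stand-in for the model's real tokenizer.
from typing import Callable

MAX_CHUNK_TOKENS = 6000  # invented budget; leave headroom for the output


def count_tokens(text: str) -> int:
    return len(text.split())  # swap in the model's tokenizer in practice


def chunk_notes(notes: list[str]) -> list[list[str]]:
    """Group chronologically ordered notes into token-bounded chunks."""
    chunks: list[list[str]] = []
    current: list[str] = []
    used = 0
    for note in notes:
        n = count_tokens(note)
        if current and used + n > MAX_CHUNK_TOKENS:
            chunks.append(current)
            current, used = [], 0
        current.append(note)  # an oversized single note still gets its own chunk
        used += n
    if current:
        chunks.append(current)
    return chunks


def summarize_chart(notes: list[str], llm_summarize: Callable[[str], str]) -> str:
    """Map: summarize each chunk. Reduce: summarize the partial summaries."""
    partials = [llm_summarize("\n\n".join(chunk)) for chunk in chunk_notes(notes)]
    if len(partials) == 1:
        return partials[0]
    return llm_summarize("\n\n".join(partials))
```

The specific chunking scheme doesn't matter much; what matters is that truncation becomes an explicit decision in your code rather than a silent one inside the context window.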
npj Health Systems — Kahl et al. · Nicolas Kahl, MD on LinkedIn

