Frontier LLMs beat the clinical tools 🥊, Medicare Advantage's denial machine 🚫, When AI memory becomes the chart 🧠

Jun 13, 2026

[Clinical shifts have me running light this week, so this is the compact cut.]

Frontier LLMs just beat the specialized clinical tools at their own game. A new Nature Medicine evaluation from Eric Oermann’s group at NYU Langone pitted OpenEvidence and UpToDate Expert AI against GPT-5.2, Gemini 3.1 Pro, and Claude Opus 4.6 — and the general models won across medical knowledge, clinician alignment, and real clinical queries. On the real-query stage, the specialized tools barely matched Google’s auto-enabled AI Overview. If you’re building a “medical-grade” wrapper, the moat isn’t fine-tuning — it’s independent, real-world eval.

The federal fraud watchdog put hard numbers on the Medicare Advantage denial machine. Two new HHS OIG reports found MA plans overturned 95% of the skilled-nursing denials that were appealed (UnitedHealthcare 99.7%, Aetna 98.2%). naviHealth, UnitedHealth’s own subsidiary, denied 14% of requests and saw 97% of its appealed denials reversed. Only 18% of patients ever appealed. If your tool touches prior auth, this is the incentive water it’s swimming in.
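To make the incentive concrete, here is a back-of-the-envelope sketch using the 18% appeal rate and the 95% overturn rate quoted above. It assumes both figures describe the same pool of denials, which the reports may not guarantee, and the function name is made up for illustration.

```python
def standing_denial_share(appeal_rate: float, overturn_rate: float) -> float:
    """Fraction of denials left in place: never appealed, or appealed and upheld."""
    return 1.0 - appeal_rate * overturn_rate


# Figures quoted in the item above. Treating them as applying to one
# shared pool of denials is an assumption, not something the reports state.
APPEAL_RATE = 0.18    # share of denials that were appealed
OVERTURN_RATE = 0.95  # share of appealed denials that were overturned

standing = standing_denial_share(APPEAL_RATE, OVERTURN_RATE)
print(f"{standing:.1%} of denials stand")  # prints "82.9% of denials stand"
```

Under those assumptions, roughly five in six denials stay in place even though almost every appealed one is reversed. The number says nothing about how many unappealed denials were wrong, only that the appeal step rarely gets a chance to test them.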

CMS quietly stood up a new Office of Health Technology and Products. Effective June 9, OHTP (Federal Register) folds open source, data and interoperability platforms, interop policy, and the National Provider Directory under one CIO-governed roof. It’s plumbing, not a press release — but it’s the room where the next round of interop rules and directory fixes will get written.

Sam Ashoo asked the question nobody’s answered: when does AI memory become the medical record? In The Hidden Medical Record, the EM physician notes that as scribes move from documenting an encounter to remembering the patient across visits, an AI-generated tag like “progressive cognitive decline and medication nonadherence” can color every future decision — with no patient portal, no correction mechanism, and unclear HIPAA designated-record-set status. “Documentation captures what happened. Memory influences what happens next.”

Open-weight coding models keep shipping on a weekly cadence. Moonshot open-sourced Kimi-K2.7-Code (1T total / 32B active, native INT4, day-0 on vLLM and SGLang) and MiniMax released M3 (1M-token context, GGUF on release). Neither beats the closed frontier yet — but for a clinician-builder who wants a model that’s ownable and never phones home, the gap is closing fast. Track these the way you track Gemma.

🎙️ From the Pods

🎙️ The 229 Podcast — “Houston Methodist’s Approach to Innovation and Physician Champions” (with Michelle Stansbury)

The virtual nursing rollout at Houston Methodist didn’t scale because of the tech — it scaled because they turned skeptical bedside nurses into advocates, and they treat token cost as a real budget line, not a rounding error.

💡 Builder take: When you co-develop with Epic, the move is “tell them what they need to hear, not what they want to hear” — and put the token bill in the business case on day one, not after the pilot blows past budget.

🔇 Speaker Blindspot: Survivorship bias — the episode profiles the bets that worked (the nurses who came around, the deployments that scaled) without the denominator of pilots that quietly died in committee or never left the lab at Cypress.

🎙️ HIMSSCast — “What are hospitals’ obligations for sharing cybersecurity info with the FBI?” (with Amy Worley, BRG)

When the FBI asks hospitals to “go on offense” and share threat intel, you can share indicators of compromise — IP addresses, attack TTPs, log files — without tripping HIPAA, but only if you’ve written the what-we-share playbook in advance and stripped PHI first (watch for URLs with patient identifiers baked in).
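For the URL problem specifically, a minimal sketch of a pre-share scrub might look like the following. It is an illustration, not a de-identification method: the log format, the example hostname, and the "six or more digits looks like an identifier" heuristic are all assumptions, and a real playbook would need privacy and legal review.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Hypothetical heuristics: match http(s) URLs in free text, and treat any
# run of 6+ digits in a URL path as a possible patient identifier.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")
LONG_DIGITS = re.compile(r"\d{6,}")


def scrub_url(url: str) -> str:
    """Drop the query string and fragment, and mask long digit runs in the path."""
    try:
        parts = urlsplit(url)
    except ValueError:
        # Malformed URL: withhold it rather than risk leaking an identifier.
        return "[UNPARSEABLE-URL]"
    path = LONG_DIGITS.sub("[REDACTED]", parts.path)
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def scrub_log_line(line: str) -> str:
    """Scrub every URL found in a log line; leave the rest of the line untouched."""
    return URL_PATTERN.sub(lambda m: scrub_url(m.group(0)), line)


# Made-up log line using documentation-reserved IP and hostname values.
sample = ("203.0.113.7 GET "
          "https://portal.example.org/chart/00123456/labs?mrn=00123456&name=doe 200")
print(scrub_log_line(sample))
# prints: 203.0.113.7 GET https://portal.example.org/chart/[REDACTED]/labs 200
```

The attacker's IP address and the request path shape survive, which is the part a threat-intel recipient needs, while the query string is dropped wholesale instead of filtered by parameter name. Dropping everything is the conservative choice, since an allowlist of "safe" parameters is easy to get wrong.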

🔇 Speaker Blindspot: Appeal to authority / absence of evidence — the guest reassures that the FBI “treats the hospital as a victim, not a regulator,” then admits that’s an unofficial, unwritten, unenforceable observation. Comforting until the day it isn’t.

What are you building this week? Email and tell me (kevin@clinicians.build) — I read every one.

— Kevin

clinicians.build
