12M tokens fit a whole patient chart 📋, CMS drags prior auth into 2027 ⚡, Epic's boring ROI wins 🏆
The Context Window Fits a Whole Patient Chart. The Model Still Loses the Needle.
Subquadratic emerged from stealth with $29M in seed funding and a model called SubQ that holds 12 million tokens natively — no RAG, no chunking, no retrieval workarounds. The architecture uses sub-quadratic sparse attention (SSA) that scales linearly instead of quadratically, running 52x faster and roughly 1/5 the cost of frontier models at 1M-token contexts. Founded by CEO Justin Dangel and CTO Alex Whedon (former Head of Generative AI at Meta), the company is targeting 100M tokens by Q4 2026.
Twelve million tokens is a complete electronic health record. Multi-year clinical documentation, lab results, imaging reports, medication histories, encounter notes — all in a single inference call. No more splitting patient charts across chunks and hoping the retrieval layer finds the right one.
But the same week, clinical AI researcher David Tang published a needle-in-haystack benchmark on medical tasks across DeepSeek-V4 and Gemini models. The result: every model tested held near-100% accuracy through ~400K tokens, then fell off a cliff to ~20% around 600K-800K tokens. The marketed context window and the usable context window are not the same thing on clinical retrieval tasks.
Both findings are true simultaneously. The window got big enough to hold the chart. The retrieval didn’t get good enough to find the potassium from page 47 when it matters at 2 AM. SubQ’s SSA architecture may or may not solve this — independent benchmarks on clinical retrieval tasks don’t exist yet. Until they do, the 12M number is a capability ceiling, not a reliability floor.
😤 Haters
“RAG isn’t going anywhere — this is just a larger haystack.” Half right. RAG solves a real problem: precision retrieval from structured knowledge. But RAG on an EHR is painful because clinical data is messy, semi-structured, and the relationships between a med change on page 12 and a lab trend on page 38 are exactly the kind of long-range dependencies that chunking destroys. A native 12M window doesn’t make RAG obsolete. It makes hybrid architectures — long context for the full picture, RAG for precision retrieval on specific sub-queries — the obvious next pattern.
“Nobody has validated this on clinical data.” Correct, and that’s the point. SubQ’s benchmarks are RULER 128K (97%) and MRCR v2 (83) — general-purpose evals. The David Tang benchmark shows what happens when you test long-context on medical tasks specifically. Until SubQ runs a clinical needle-in-haystack eval, the 12M number is an architecture claim, not a clinical utility claim. Clinician-builders should be the ones designing those evals.
💡 80/20: The context window is now big enough. The eval is not. The clinician who builds the needle-in-haystack test on real clinical retrieval tasks — find the drug interaction across a 200-page chart, surface the abnormal trend buried in 3 years of labs — owns the eval that every long-context model will have to pass before it touches a patient chart.
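If you want to start poking at this yourself, here is a minimal sketch of a clinical needle-in-haystack harness: synthetic filler notes, one buried critical lab, a sweep over context size and needle depth. It assumes an OpenAI-compatible client; the model name, filler text, and needle are all placeholders, and this is a toy, not a validated benchmark.

```python
# Toy clinical needle-in-haystack harness (illustrative, not a validated benchmark).
# Assumes an OpenAI-compatible client; model name and filler text are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

FILLER = ("Encounter note: patient seen in clinic, vitals stable, "
          "medications reviewed, follow-up scheduled. ")
NEEDLE = "Lab result: serum potassium 6.1 mEq/L (critical high)."
QUESTION = "What was this patient's serum potassium value? Answer with the number only."

def build_chart(total_chars: int, needle_depth: float) -> str:
    """Synthetic 'chart': filler notes with one needle buried at needle_depth (0.0-1.0)."""
    notes = [FILLER] * (total_chars // len(FILLER))
    notes.insert(int(needle_depth * len(notes)), NEEDLE + " ")
    return "".join(notes)

def run_trial(total_chars: int, needle_depth: float) -> bool:
    resp = client.chat.completions.create(
        model="gpt-4.1",  # placeholder: swap in the long-context model under test
        messages=[
            {"role": "system", "content": "You are reviewing a patient chart."},
            {"role": "user",
             "content": build_chart(total_chars, needle_depth) + "\n\n" + QUESTION},
        ],
    )
    return "6.1" in (resp.choices[0].message.content or "")

if __name__ == "__main__":
    # Accuracy vs. context size and needle position is the whole story.
    for total_chars in (100_000, 400_000, 1_600_000):  # chars, not tokens; ~4 chars/token
        hits = sum(run_trial(total_chars, d) for d in (0.1, 0.5, 0.9))
        print(f"{total_chars:>9} chars: {hits}/3 needles found")
```

Swap the synthetic filler for de-identified real notes and the single needle for actual retrieval tasks (drug interactions, trend detection) and you are most of the way to the eval this section is asking for.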
CMS Pushes Prior Auth into the 21st Century — Electronic PA Interfaces Go Live January 2027
CMS Administrator Mehmet Oz authored a piece on moving prior authorization into the 21st century, and the numbers are already landing. Leading health plans eliminated 6.5 million prior authorizations — 11% of the total — through the HHS/CMS industry pledge. Electronic PA interfaces from impacted payers go live January 1, 2027, with use eventually tied to Promoting Interoperability and MIPS scoring. CMS projects $15 billion in savings over 10 years.
😤 Haters
“CMS has been promising PA reform for a decade.” They have. The difference this time: the Jan 2027 e-PA deadline is regulation, not aspiration. Payers who don’t comply face Promoting Interoperability and MIPS consequences. The enforcement mechanism finally has teeth.
“11% reduction is a rounding error in PA misery.” Fair. But 6.5 million fewer PAs is 6.5 million fewer phone calls, faxes, and portal logins. The remaining 89% is the build opportunity — and FHIR-based PA APIs (DaVinci PAS implementation guides) are the infrastructure that makes automated PA workflows possible.
💡 80/20: The e-PA mandate creates a new API surface that didn’t exist before. If you’re building anything that touches ordering workflows — prior auth automation, denial management, care pathway tools — start learning the DaVinci Prior Authorization Support (PAS) Implementation Guide now. The payers have until Jan 2027. Your prototype should be ready before they are. Try: spin up the Inferno test suite and run a PAS workflow against a sandbox payer endpoint.
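To get a feel for the API shape before you touch Inferno, here is a minimal sketch of a PAS-style prior-auth submission. The sandbox URL is hypothetical, the Claim is stripped to near the minimum, and real deployments add SMART/UDAP auth, the full PAS profiles, and the X12 278 mapping this skips.

```python
# Minimal sketch of a Da Vinci PAS prior-auth submission (hypothetical sandbox URL;
# real deployments add auth, full PAS profiles, and X12 278 mapping).
import requests

BASE = "https://sandbox.example-payer.com/fhir"  # placeholder endpoint

# PAS wraps the prior-auth request in a Bundle whose first entry is a Claim.
pas_request = {
    "resourceType": "Bundle",
    "type": "collection",
    "entry": [{
        "resource": {
            "resourceType": "Claim",
            "status": "active",
            "use": "preauthorization",  # this makes it a prior auth, not a claim
            "type": {"coding": [{"system": "http://terminology.hl7.org/CodeSystem/claim-type",
                                 "code": "professional"}]},
            "patient": {"reference": "Patient/example"},
            "created": "2026-02-01",
            "provider": {"reference": "Organization/example-clinic"},
            "priority": {"coding": [{"code": "normal"}]},
            "insurance": [{"sequence": 1, "focal": True,
                           "coverage": {"reference": "Coverage/example"}}],
            "item": [{"sequence": 1,
                      "productOrService": {"coding": [{"system": "http://www.ama-assn.org/go/cpt",
                                                       "code": "93306"}]}}],  # echo, as an example
        }
    }],
}

# PAS defines a $submit operation on Claim; the response Bundle carries a ClaimResponse.
resp = requests.post(f"{BASE}/Claim/$submit",
                     json=pas_request,
                     headers={"Content-Type": "application/fhir+json"},
                     timeout=30)
resp.raise_for_status()
for entry in resp.json().get("entry", []):
    if entry["resource"]["resourceType"] == "ClaimResponse":
        print("Disposition:", entry["resource"].get("disposition", "pending"))
```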
→ Full write-up
Epic XGM 2026: The Boring Workflows That Actually Pay
From Dr. John Lee's fantastic Substack write-up: The real story from Epic XGM wasn't in the AI keynotes — it was in the implementation sessions. Agent Factory now lets Care Path criteria be expressed as markdown documents that AI interprets at runtime, cutting build timelines from months to days. Cosmos predicted median length of stay using just 4 variables — matching models built on 400+. Reid Health's econsult workflow for cardiac risk stratification produced 2,400+ econsults, a 20% reduction in surgery cancellations, and $276K in annual revenue. OLVG Amsterdam ran 17 live Care Paths across 26K patients, achieving 50% fewer physical visits for type 1 diabetes and 66% fewer for HIV management.
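Epic hasn't published how Agent Factory works under the hood, but the pattern itself (clinical criteria as a markdown document, interpreted by a model at runtime instead of compiled into build) is easy to sketch. A toy illustration, not Epic's implementation; the criteria and patient summary are invented, and the client is a generic OpenAI-compatible one.

```python
# Toy illustration of the "criteria as markdown, interpreted at runtime" pattern.
# Not Epic's Agent Factory; just the shape of the idea.
from openai import OpenAI

client = OpenAI()

# The clinical lead edits this document, not build-analyst configuration screens.
CARE_PATH_CRITERIA = """\
# Cardiology econsult triage criteria
A patient qualifies for econsult (instead of in-person referral) if ALL of these hold:
- Referral question is risk stratification, not active chest pain
- No ED visit for a cardiac complaint in the last 30 days
- Most recent troponin, if drawn, was normal
"""

def evaluate(patient_summary: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4.1",  # placeholder model
        messages=[
            {"role": "system",
             "content": "Decide whether the patient meets ALL criteria. "
                        "Reply QUALIFIES or DOES NOT QUALIFY, then one sentence of reasoning."},
            {"role": "user",
             "content": CARE_PATH_CRITERIA + "\nPatient summary:\n" + patient_summary},
        ],
    )
    return resp.choices[0].message.content or ""

print(evaluate("58M referred for preoperative cardiac risk stratification; "
               "no recent ED visits; troponin not drawn."))
```

The point of the pattern: the artifact a clinical lead edits is the artifact that runs.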
😤 Haters
“Agent Factory is just a wrapper on existing Epic build tools.” It’s a wrapper that turns months into days. The abstraction matters because it shifts who can iterate on clinical logic — markdown is readable by clinical leads, not just build analysts. That’s a workflow change, not just a speed change.
“These are cherry-picked ROI numbers from conference presentations.” Conference presentations are always cherry-picked. The useful signal isn’t the magnitude — it’s the pattern. Econsult triage, pre-op documentation, chronic disease Care Paths. The ROI comes from boring workflows with high volume, not from impressive demos.
💡 80/20: The takeaway from XGM is clear: Epic's AI advantage isn't the flashiest model — it's the ability to turn clinical logic into running workflows fast.
💯 Subscribe to John Lee's Substack: he's a doc and an Epic guru who knows his stuff.
OpenMed: 1 Million Clinical Notes Redacted — AWS $25,000 vs. MacBook $0
Maziyar Panahi benchmarked OpenMed — his open-source clinical PII redaction toolkit — against AWS Comprehend Medical on 1 million clinical notes across 50+ PII categories. The claim: same accuracy, zero marginal compute cost when run locally via MLX on Apple Silicon. All 18 HIPAA Safe Harbor identifiers detected. Supports English plus 8 additional languages. Apache-2.0 licensed.
⚠️ Verify: “Same accuracy” is a vendor claim against a specific benchmark. Class-imbalance failures (addresses caught well, dates poorly) may hide in aggregate F1 scores. Production throughput at health-system scale (100M+ notes/year) is different from a MacBook benchmark. And a managed service like AWS silently updates models — a local deployment requires someone who owns model versioning and re-evaluation. The $25K → $0 comparison is real on day one; review the maintenance cost trajectory before committing.
😤 Haters
“Open-source healthcare AI with no BAA is a liability.” Correct — OpenMed is a toolkit, not a service. There’s no BAA because there’s no vendor handling your data. That’s the feature. Your data never leaves your machine.
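Here is what local-only looks like in practice, as a sketch using the standard Hugging Face transformers pipeline (the MLX path is analogous). The model id below is a placeholder, not a verified checkpoint; pick a real one from the OpenMed collection on Hugging Face, and validate per-category recall before trusting it on real notes.

```python
# Local clinical PII redaction sketch; nothing leaves the machine.
# Model id is illustrative -- choose an actual OpenMed checkpoint and
# validate per-category recall before using this on real notes.
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="OpenMed/OpenMed-NER-DeID",  # placeholder name, not a verified model id
    aggregation_strategy="simple",     # merge word pieces into whole entities
)

def redact(note: str) -> str:
    """Replace each detected PII span with its category label."""
    spans = sorted(ner(note), key=lambda e: e["start"], reverse=True)
    for ent in spans:  # replace right-to-left so character offsets stay valid
        note = note[:ent["start"]] + f"[{ent['entity_group']}]" + note[ent["end"]:]
    return note

print(redact("John Smith, DOB 03/14/1961, seen at Mercy General on 2026-01-09."))
```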
🧰 Builder’s Tip
The edge moves. Chase the layer it’s on.
Every few months, the layer where clinician expertise matters shifts up. Last year the edge was knowing how to code a FHIR query. Then Cursor and Claude Code made that a weekend skill. The edge moved to knowing which FHIR resource to query and what the result means clinically. Now SubQ ships a 12M context window, and the edge moves again — from “can you chunk and retrieve from a patient chart?” to “do you know which needle in the chart actually matters for this patient at this moment?”
The pattern: every time a technical barrier falls, the advantage moves up one layer toward clinical judgment. The builder who chases the commoditized layer is always six months behind. The builder who identifies the new scarce layer — and builds for it — stays ahead.
Concrete exercise: look at your current project and ask “which part of this would a non-clinician get wrong?” Not the code. Not the UI. The clinical decision logic. The edge case that only shows up at 2 AM. The ordering sequence that matters for the sick patient but not the stable one. That’s the layer your edge is on right now. Build there.
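To make it concrete, here is the query that was last year's edge, now a few lines against the public HAPI FHIR test server (demo data only; the patient id is a placeholder). The code is trivial. Knowing that LOINC 2823-3 is serum potassium, and what a high value means in a hemolyzed sample versus a dialysis patient, is not.

```python
# Last year's edge, now a weekend skill: pull the last five potassium results.
# Public HAPI R4 test server; demo data only, patient id is a placeholder.
import requests

BASE = "https://hapi.fhir.org/baseR4"
params = {
    "patient": "example",               # placeholder patient id
    "code": "http://loinc.org|2823-3",  # LOINC: potassium, serum/plasma
    "_sort": "-date",
    "_count": "5",
}
bundle = requests.get(f"{BASE}/Observation", params=params, timeout=30).json()
for entry in bundle.get("entry", []):
    obs = entry["resource"]
    value = obs.get("valueQuantity", {})
    print(obs.get("effectiveDateTime"), value.get("value"), value.get("unit"))
```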
What are you building this week? Reply and tell me — I read every one.
— Kevin