Ontario audits 20 AI scribes — all failed 🔬, Commure hits $7B 💰, Google AMIE beats PCPs 🤖

May 20, 2026

Ontario Tested 20 AI Scribes. All 20 Failed.

Ontario’s auditor general released the first government-scale audit of AI scribe vendors, and the results should make every clinician-builder pay attention.

All 20 approved vendors showed errors during procurement testing — hallucinations, incorrect information, or missing details.

Sixty percent recorded a different drug than what was prescribed. Nine of 20 fabricated clinical actions — referring patients for therapy or ordering blood tests that were never discussed in the simulated encounter.

And eleven of 20 approved vendors never submitted third-party security audits or certifications during procurement. They got approved anyway.

Meanwhile, approximately 5,000 Ontario physicians are using these tools in production. No patient harm has been reported — but the government’s own audit admits the system wasn’t designed to catch it.

😤 “This is Ontario. US procurement is different.” The failure pattern isn’t jurisdictional — it’s structural. US health systems running ambient scribe pilots are evaluating the same vendor pool with procurement processes that weight integration and pricing over clinical accuracy. If your system’s scribe evaluation didn’t include adversarial clinical scenarios with known-correct answers, you have the same blind spot Ontario just documented.

😤 “No patients were harmed.” No patients were reported harmed. The audit explicitly notes there’s no system in place to detect harm from AI-generated note errors. A fabricated therapy referral that a physician catches and deletes is invisible to the safety system. One that a physician signs without reading becomes part of the patient’s record.

😤 “Physicians review every note before signing.” Physicians review ambient scribe notes the way they review auto-populated medication lists — selectively, under time pressure, with decreasing attention as trust builds. The 4-out-of-5 sign-off rate that gets cited as a feature is also the mechanism by which errors propagate.

💡 80/20: If you’re building clinical AI, build the eval before you build the product. Ontario’s audit is a template: two simulated patient encounters, structured scoring for hallucination, omission, and fabrication.
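A minimal sketch of what that structured scoring could look like, assuming you have already extracted discrete facts from each generated note. The schema, the category mapping, and the example encounter are all hypothetical illustrations, not the auditor general's actual rubric.

```python
from dataclasses import dataclass, field


@dataclass
class GroundTruth:
    """Known-correct facts for one simulated encounter (hypothetical schema)."""
    medications: set[str] = field(default_factory=set)
    actions: set[str] = field(default_factory=set)      # referrals, orders
    key_details: set[str] = field(default_factory=set)


@dataclass
class ScribeNote:
    """Facts extracted from a vendor's generated note (hypothetical schema)."""
    medications: set[str] = field(default_factory=set)
    actions: set[str] = field(default_factory=set)
    key_details: set[str] = field(default_factory=set)


def score_note(truth: GroundTruth, note: ScribeNote) -> dict[str, set[str]]:
    """Bucket every mismatch between the note and the scripted encounter."""
    return {
        # Stated in the note but not supported by the encounter.
        "hallucination": (note.medications - truth.medications)
        | (note.key_details - truth.key_details),
        # Clinical actions that were never discussed.
        "fabrication": note.actions - truth.actions,
        # Scripted facts the note left out.
        "omission": (truth.medications - note.medications)
        | (truth.actions - note.actions)
        | (truth.key_details - note.key_details),
    }


def passed(result: dict[str, set[str]]) -> bool:
    """A note passes only if every error bucket is empty."""
    return not any(result.values())


# Made-up example encounter and note, for illustration only.
truth = GroundTruth(
    medications={"amoxicillin"},
    actions={"follow-up in 2 weeks"},
    key_details={"symptom onset 3 days ago"},
)
note = ScribeNote(
    medications={"azithromycin"},
    actions={"follow-up in 2 weeks", "therapy referral"},
)
result = score_note(truth, note)
for category, items in result.items():
    print(category, sorted(items))
print("passed:", passed(result))
```

The point of the design is that the ground truth is written before any vendor output exists, so every error lands in a named bucket instead of a reviewer's general impression.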

Commure Raises $70M at $7B — 85% of RCM Now Runs Without Humans

Commure announced $70M in new financing at a $7 billion post-money valuation, led by General Catalyst with Sequoia and Morgan Stanley.

The number that matters: their revenue cycle platform processes tens of billions in annual payments and completes 85% of work without human intervention. Across 500+ healthcare organizations and 3,000+ sites of care.

At $7B, Commure is now the most valuable private health tech AI company, and their thesis is that the trillion-dollar administrative layer is mostly automatable.

😤 “$7B for an RCM company seems steep.” Commure isn’t just RCM (revenue cycle management). It is positioning as the full administrative operating system (clinical workflow + ambient scribe + revenue cycle). The 85% autonomous figure is the moat. Competitors doing 40-50% automation are selling a tool. Commure is selling a replacement.

😤 “Health systems won’t hand over revenue cycle to one vendor.” They already did with the EHR. The question is whether Commure’s 130+ health system client base proves the model at scale or whether the complexity of payer-provider negotiations creates a ceiling.

💡 80/20: If you’re building anything that touches the administrative layer — prior auth, scheduling, referral management, coding — Commure at $7B resets your competitive landscape. Your product either integrates with their stack or competes with a company that has $890M in total funding and relationships with the nation’s largest health systems. Know which one you are.

Google AMIE Outperformed PCPs on 29 of 32 Evaluation Axes

A new Nature Medicine study evaluated Google’s multimodal AMIE (built on Gemini 2.0 Flash) against board-certified primary care physicians in simulated diagnostic encounters.

105 OSCE-style visits. 210 consultations. 18 specialist evaluators. AMIE won on top-1 differential diagnosis accuracy, 29 of 32 clinical quality axes, and 7 of 9 multimodal reasoning metrics. Patient actors rated AMIE similar or higher on listening, empathy, and trust.

The caveats matter: simulated patients, no physical exam, limited modalities. But patients are already uploading images and text into chatbots for real consults.

😤 “Simulated patients aren’t real patients.” True. But as Stanford’s Ethan Goh noted, patients are already uploading both images and text into chatbots. The simulated setting is closer to reality than most text-only benchmarks. The gap between this study and the real world is narrowing faster than policy.

😤 “Outperforming PCPs on a test doesn’t mean it can replace them.” Nobody said replace. But when an AI system matches physician-level empathy ratings and beats them on diagnostic accuracy in multimodal encounters, the “augmentation only” framing starts to feel like the talking point it always was.

💡 80/20: If you’re building clinical decision support, AMIE’s architecture matters more than its scores. State-aware dialogue framework + multimodal input + structured reasoning is the design pattern that works. Build your eval to include images, documents, and multi-turn conversation — text-only benchmarks are now table stakes.
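One way to keep yourself honest on that last point is a coverage check over your eval cases. The sketch below is an assumption-heavy illustration: the `Turn` and `EvalCase` classes, the modality labels, and the rule for what counts as multi-turn are all hypothetical and have nothing to do with how AMIE itself was evaluated.

```python
from dataclasses import dataclass, field

# Modalities the eval suite should cover (labels are hypothetical).
REQUIRED_MODALITIES = {"text", "image", "document"}


@dataclass
class Turn:
    speaker: str                      # "patient" or "assistant"
    text: str
    attachments: list[str] = field(default_factory=list)  # e.g. ["image"]


@dataclass
class EvalCase:
    case_id: str
    turns: list[Turn]

    def modalities(self) -> set[str]:
        """Collect every modality that appears anywhere in the case."""
        found = {"text"} if any(t.text for t in self.turns) else set()
        for turn in self.turns:
            found.update(turn.attachments)
        return found

    def is_multi_turn(self) -> bool:
        """Treat a case as multi-turn if the patient speaks more than once."""
        return sum(1 for t in self.turns if t.speaker == "patient") > 1


def coverage_gaps(cases: list[EvalCase]) -> list[str]:
    """Return what the suite is missing, in a stable order."""
    seen: set[str] = set()
    for case in cases:
        seen |= case.modalities()
    gaps = sorted(REQUIRED_MODALITIES - seen)
    if not any(case.is_multi_turn() for case in cases):
        gaps.append("multi-turn")
    return gaps


# A text-only, single-turn suite fails on every other dimension.
suite = [EvalCase("rash-01", [Turn("patient", "I have a rash on my arm.")])]
print(coverage_gaps(suite))  # ['document', 'image', 'multi-turn']
```

Running a check like this in CI makes it hard for a suite to drift back to text-only cases without someone noticing.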

Aledade Picks Doximity for Clinical AI in Value-Based Care

Doximity will integrate its ambient scribe and clinical AI assistant into Aledade Assist, the EHR overlay used by Aledade’s network of independent primary care practices. The integration includes Scribe for ambient documentation and Ask (formerly DoxGPT) with PeerCheck — 10,000+ physician authors validating AI answers for accuracy.

The signal: the largest independent practice network in VBC just chose a physician-network AI vendor over the ambient scribe startups. Distribution through a trusted peer network may matter more than model quality.

Nourish Raises $100M for AI-Native Metabolic Clinic

Nourish closed a $100M Series C (Menlo Ventures) to scale its dietitian-led metabolic health platform. 10,000 RDs in network. Tripled appointments YoY. Covering 200M+ Americans through health plan partnerships. Clinical outcomes: 8% weight loss, 1.3-point A1C reduction, $2,000/patient annual cost savings.

Nourish proves the non-physician clinician + AI model at scale. 10,000 RDs doing metabolic care with AI support is the template for every allied health vertical: PT, pharmacy, behavioral health.

AI Layoffs Tank Stocks — Gartner Says Amplify, Don’t Cut

Companies announcing AI-related layoffs saw stock declines averaging 25%, and 56% of the S&P 500 companies that announced such layoffs saw a negative market reaction. A Gartner study found the highest-ROI companies used AI for amplification, not headcount reduction. 49,135 workers have lost jobs to AI in 2026 alone. The builder takeaway: health systems that pitch AI as a workforce multiplier will get further than those pitching it as a cost cutter.

🎙️ From the Pods

🎙️ Lifers with Christina Farr — “Why Price Transparency Took a Decade to Crack” (Heather Fernandez, Solv)

Solv CEO Heather Fernandez described building individual AI agents for each step of the price transparency problem — insurance matching, plan verification, deductible tracking, and real-time patient cost estimation. ClearPay AI focuses on the consumer’s out-of-pocket cost, not what the insurer pays the provider — the side everyone else ignores.

💡 Builder take: The bottom-up approach (one agent per data silo, clinic-level model training) is the architecture pattern for any healthcare AI that touches billing data. Top-down price estimation fails because the source of truth lives at the individual clinic level.
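To make the consumer-side arithmetic concrete, here is a deliberately simplified sketch of an out-of-pocket estimate built from a clinic-level price and the patient's plan status. Everything in it is an assumption for illustration: the price table, the `PlanStatus` fields, and the formula, which ignores copays and out-of-pocket maximums. It is not a description of how Solv or ClearPay AI works.

```python
from dataclasses import dataclass

# Hypothetical clinic-level price table. In practice each clinic owns this
# data, which is why the lookup is keyed by clinic rather than by region.
CLINIC_PRICES = {("clinic_a", "urgent_care_visit"): 200.00}


@dataclass
class PlanStatus:
    deductible_remaining: float   # dollars left before the plan starts paying
    coinsurance_rate: float       # patient's share after deductible, 0.0 to 1.0


def lookup_price(clinic_id: str, service: str) -> float:
    """Stage 1: fetch the negotiated price from the clinic's own data."""
    return CLINIC_PRICES[(clinic_id, service)]


def estimate_out_of_pocket(price: float, plan: PlanStatus) -> float:
    """Stage 2: patient pays the deductible portion in full, then
    coinsurance on whatever remains."""
    toward_deductible = min(price, plan.deductible_remaining)
    after_deductible = price - toward_deductible
    return round(toward_deductible + after_deductible * plan.coinsurance_rate, 2)


plan = PlanStatus(deductible_remaining=150.00, coinsurance_rate=0.20)
price = lookup_price("clinic_a", "urgent_care_visit")
print(estimate_out_of_pocket(price, plan))  # 160.0
```

Each stage depends on a different data source (clinic prices, plan terms, deductible progress), which is the practical argument for giving each one its own agent or service.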

🎙️ Radio Advisory — “Is the Nursing Workforce Stabilizing?”

Turnover is back to pre-pandemic levels (18%, down from 27% peak). Time to fill dropped below 2019 levels. But half of bedside nurses surveyed plan to leave within three years. The CNIO role is emerging as the bridge between nursing leadership and tech deployment — nurse leaders need to be at the table when AI workflow decisions get made.

💡 Builder take: If you’re building clinical workflow tools, your deployment champion is increasingly the CNIO, not the CIO. Design your pilot pitch for someone who thinks in shifts and ratios, not APIs and SLAs.

🧰 Builder’s Tip

Tool Spotlight: Headroom — Agent Context Compression

Headroom is an open-source library that compresses what your coding agent reads into a fraction of the original tokens — keeping the semantic content while slashing the context window cost. If you’re running Claude Code, Codex, or any agentic coding tool against a healthcare codebase, your token spend on file reads is probably 60-70% of total cost.

Headroom sits between the agent and the files. It reads, compresses, and passes through the meaningful content. Works with any LLM-based agent that reads files. MIT-licensed.

Try it: Clone the repo, point it at a medium-sized project (your FHIR viewer, your Synthea pipeline), and compare the token count before and after. If it cuts your reads by 40%+, you have cut the file-read share of your token spend by the same fraction. It works entirely on your local machine, with no cloud dependency.
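A quick back-of-envelope for what that is worth overall: if file reads are roughly 60-70% of total spend, a 40% cut to reads works out to about a quarter off the total bill. The sketch below does that arithmetic and adds a crude before/after comparison. The function names are made up, the four-characters-per-token rule is only a rough heuristic, and nothing here calls Headroom's actual API.

```python
def total_cost_reduction(read_share: float, read_reduction: float) -> float:
    """Fraction of total token spend saved when only file reads shrink."""
    return read_share * read_reduction


def rough_token_count(text: str) -> int:
    """Crude proxy of about 4 characters per token.
    Use your model's real tokenizer for numbers you plan to act on."""
    return max(1, len(text) // 4)


def read_reduction(before: str, after: str) -> float:
    """Fraction of tokens removed between the original and compressed text."""
    return 1 - rough_token_count(after) / rough_token_count(before)


# Reads at 65% of spend (midpoint of 60-70%), compressed by 40%.
print(f"{total_cost_reduction(0.65, 0.40):.0%}")  # 26%
```

To use it, feed `read_reduction` the raw file contents and whatever the compressor hands to the agent, then plug the result into `total_cost_reduction` along with your own measured read share.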

What are you building this week? Reply and tell me — I read every one.

— Kevin

clinicians.build
