What Counts as “Treatment” When the Thing Asking for Records Is a Bot⁉️, UpToDate + OpenAI 📚, 80% of Code Written by Bots 🤖.

Jun 06, 2026

What Counts as “Treatment” When the Thing Asking for Records Is a Bot?

Two stories, one fault line. Amazon’s new health AI bot, launched through One Medical, is raising the question of whether an AI can request patient records “as someone involved in treatment” — and whether that precedent lets any bot pose as a telehealth provider to pull data. Meanwhile Brendan Keeler frames the Epic v. Health Gorilla data-access fight as a prisoner’s dilemma: every co-defendant shares a collective defense and an individual incentive to betray it.

Same unresolved word underneath both: treatment. Our entire data-access model assumes a licensed human on the other end. Autonomous agents break that assumption quietly.

“Who or what counts as a treating provider for data access” is about to be one of the most consequential undefined terms in health IT — and builders touching patient data should watch where it lands.

💡 80/20: If your tool requests records on a patient’s behalf, write down today exactly whose treatment relationship authorizes it. If the honest answer is “the bot’s,” you’re building on sand that’s about to shift.

UpToDate & OpenAI — Grounded, Not Open-Web

Wolters Kluwer is using OpenAI’s API to power UpToDate Expert AI: interactive clinical reasoning grounded in UpToDate’s curated content rather than the open internet. More than half of its ~2,000 U.S. enterprise hospitals had signed on for UpToDate Expert AI by April; the company expects ~70% by midyear.

To be clear: this is an API customer relationship, not a joint product. Wolters Kluwer runs its own model-agnostic platform (FAB) and retains full ownership of product design and data governance. OpenAI is the infrastructure, not the co-builder.

The strategic tell is grounding. The pitch isn’t “a smarter chatbot,” it’s “a chatbot that can only answer from a source your risk office already trusts.” The model is a commodity; the curated, liability-bounded corpus is the product.

Retrieval-grounding-as-trust is becoming the default architecture for clinical AI — and the moat is the corpus, not the model.

😤 “Isn’t this just RAG with a famous brand?” Pretty much — and that’s the point. The brand is the moat, because in clinical AI the unglamorous question “where did this answer come from and who’s liable for it” beats raw model quality every time.

💡 80/20: If you’re building clinical AI, stop competing on model and start competing on corpus. The defensible question is “what trusted source is this grounded in, and can the buyer live with it?”

[Anyone want to test UpToDate Expert AI vs. OpenEvidence vs. others with evals? Email me at kevin@clinicians.build or reply to this email.]

The Clinicians Are Just… Shipping Now

Two posts that are the thesis in the wild. An internist built a browser-based PHI de-identifier — redacting protected info locally before any text touches an AI tool — an idea he’d carried for a decade with no way to build it. His line: the constraint was never the supply of ideas, it was the distance between an idea and its existence, and AI just collapsed it.
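
For a feel of what "redact locally before any text touches an AI tool" can look like, here is a minimal Python sketch. It is not the internist's tool (his runs in the browser, and the post doesn't show its internals). The pattern names and the `redact` function are hypothetical, and the regexes catch only a handful of obvious identifiers. Real de-identification has to cover far more, including names, addresses, and the rest of the 18 HIPAA Safe Harbor identifier types.

```python
import re

# Hypothetical, deliberately narrow patterns for illustration only.
# They will miss names, addresses, and many other identifiers.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "MRN": re.compile(r"\bMRN[:#]?\s*\d{6,10}\b", re.IGNORECASE),
}


def redact(text: str) -> str:
    """Replace each matched identifier with a bracketed label, e.g. [SSN]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


# Synthetic note: every identifier here is made up.
note = (
    "Pt seen 03/14/2025, MRN: 00123456, call 555-123-4567 "
    "or jane.doe@example.com. SSN 123-45-6789."
)
print(redact(note))
```

Everything happens in-process, so nothing leaves the machine until the redacted string is handed to whatever comes next. Treat a pattern list like this as a first filter to test on synthetic notes, not as proof that the text is de-identified.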

An orthopaedic surgeon argues the health-system AI retreat is the private-practice opening: big systems are cautious for rational reasons — scale, governance, legacy EHRs — and that caution is exactly the gap where an aligned, fast-iterating practice can build the admin tool nobody at the mothership will prioritize. He shipped his own DME/prior-auth tracker.

The barrier to building fell for clinicians specifically — and the ones who know where the friction lives are turning a decade of “someday” into a weekend.

💡 80/20: Start in the recoverable administrative layer, on synthetic or non-PHI data, where a mistake is an annoyance and not a harm. That’s where a clinician-builder’s first shipped thing should live.

Ultra-short:

An AI is “validated and reimbursed” — and someone’s saying it out loud. At a federal health-AI event this week, Michael Abramoff framed autonomous AI as an already-validated, reimbursed clinical service. True — autonomous diabetic-retinopathy screening has a national Medicare code. Worth knowing the catch: that code’s payment eroded as the tech got cheaper. “Get your own code” is not the gold rush it sounds like.
The Transformer is eating drug discovery. Alnylam signed a deal worth up to $2B with Inceptive, the startup from a co-inventor of the Transformer, to design RNAi therapeutics with “foundation models of life.” The architecture behind your chatbot is now designing medicines.
RFK Jr. is seeking federal access to most Americans’ medical records for autism/vaccine research, per CNN — a reminder that the same interoperability rails builders cheer for are dual-use, and “who can pull the data and why” is a live political question, not a settled one.
“NP + AI = MD?” An EM physician reframes the AI-in-medicine question from “will it replace doctors” to “how much physician involvement is actually necessary for a good outcome” — AI as a way to extend scarce expertise across an expanding NP workforce, not replicate it.

🎙️ From the Pods

🎙️ HIMSSCast — “Women in health IT discuss ways to drive the industry forward”

Four EHR Association leaders — including a former emergency physician now a medical director — keep circling the same skill: translating between the clinicians who say one thing and the engineers who hear another. In the early days, what came out the other side “was not necessarily mutually satisfactory to either.”

💡 Builder take: The clinician-builder is that translation layer, in one head. That’s not a soft skill — after this week’s Nature Medicine result, it’s the skill that decides whether the model’s answer ever reaches the patient.

🎙️ The 229 Podcast — “Rewriting and Overcoming the Burnout Narrative” (Bree Bacon)

A leader who “crashed and burned” makes a point that lands hard for solo builders: the leadership shadow. If you never shut off, your team learns they can’t either — even when you tell them otherwise.

💡 Builder take: When you’re the founder, the engineer, and the clinician, the leadership shadow falls on you. The build that survives is the one whose pace you could actually sustain past month three.

The Interface Is the Intervention

Start with the number everyone’s quoting: Anthropic now says more than 80% of the code it merged into its own codebase in May was authored by its model, up from low single digits a year ago. The engineering barrier didn’t lower. It fell over.

Most people read this as a self-recursion story. But I wonder if the real story is this: if anyone can build the model, the model isn’t the moat.

Hold that thought against the best evidence we have.

A randomized study in Nature Medicine put 1,298 ordinary people through ten medical scenarios — some with a frontier LLM, some with whatever they’d normally use. The models, tested alone, were excellent: they identified the right condition 94.9% of the time.

The humans using those same models identified it less than 35% of the time — no better than the people left with Google and a hunch.

The knowledge was sitting right there and the interface dropped it on the floor. The model knew; the human didn’t get it out.

The scarce input was never the model — it’s the judgment about how a real, scared, distracted human meets the model in the two minutes that matter. People fed the LLM half the story, anthropomorphized its confidence, and walked past the correct answer when it appeared.

Note what this does to benchmark worship. MedQA scores north of 80% still produced human-plus-LLM accuracy under 20% in places. A brilliant model behind a bad interface is a bad product — and “bad product” here means a missed diagnosis. The exam score is not the outcome. The interaction is the outcome.

😤 “This just proves AI isn’t ready for medicine.” Wrong lesson. It proves the model is ready and the interface isn’t — which is the most builder-friendly finding imaginable, because the interface is the part you can actually build. The gap isn’t a wall, it’s a job opening.

😤 “That study used last-gen models — the new ones are better.” The models were already at 95% alone. Making them 97% doesn’t touch the failure, because the failure was downstream of the model. You can’t patch a human-factors problem with a better benchmark.

😤 “So UX consultants win? Great.” No — clinical UX wins. Knowing which wrong turn a frightened parent takes at 11 p.m., which symptom gets pattern-matched into a benign bucket, where the handoff fails. A generic designer can’t see those. You can.

💡 80/20: Your product’s UX is your clinical outcome. Take one workflow you know cold, generate a handful of synthetic cases, and watch a non-expert use your tool end-to-end — measure how often the right answer the model produced actually changed what the human did. That delta, not the model’s accuracy, is your product.
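
If you want to put a number on that delta, a rough sketch of the bookkeeping might look like the Python below. The `Trial` record and the `summarize` function are hypothetical names, and the scoring rule (one boolean for whether the model produced the right answer, one for whether the tester acted on it) is an assumption about how you would grade each synthetic case, not the method the Nature Medicine study used.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Trial:
    case_id: str
    model_correct: bool  # the model's output contained the right answer
    human_correct: bool  # the tester ended up acting on the right answer


def summarize(trials: List[Trial]) -> Dict[str, float]:
    """Compare model-alone accuracy with what actually reached the human."""
    if not trials:
        raise ValueError("need at least one trial")
    n = len(trials)
    model_acc = sum(t.model_correct for t in trials) / n
    end_to_end = sum(t.human_correct for t in trials) / n
    # Of the cases the model got right, how many did the human carry through?
    model_right = [t for t in trials if t.model_correct]
    carry = (
        sum(t.human_correct for t in model_right) / len(model_right)
        if model_right
        else 0.0
    )
    return {
        "model_accuracy": model_acc,
        "end_to_end_accuracy": end_to_end,
        "carry_through_rate": carry,
        "gap": model_acc - end_to_end,
    }


# Ten synthetic sessions: the model is right in 9, the tester acts on 3 of those.
trials = [Trial(f"case-{i}", i < 9, i < 3) for i in range(10)]
for name, value in summarize(trials).items():
    print(f"{name}: {value:.2f}")
```

The `gap` is the accuracy your interface dropped on the floor, and `carry_through_rate` is the number to watch as you iterate on the interaction rather than the model.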

AI Is Now Writing Most of Its Own Code — and the Moat Moved to Verification

Anthropic’s recursive-self-improvement report isn’t just the 80% stat. The doubling time of METR’s task horizon shortened from ~7 months to ~4; open-ended coding success jumped 50 points in six months; and the company frames three futures, from “stalls and diffuses” to full self-improvement.

For a clinician-builder the takeaway is uncomfortable and freeing: writing the code is no longer the constraint. Knowing whether the code does the right thing is.

An internist auditing his own app this week found 8 of 9 “finished” features were hollow stubs — behind 106 passing tests. Green dashboard, empty room. He called verification discipline the real moat, and he’s right.

When the model writes everything, the scarce skill is clinical-grade skepticism: testing against what the thing is supposed to do, not whether it compiled.

😤 “If AI writes the code, what’s left for me?” The part that was always yours — deciding what “correct” means for a patient, and proving the thing meets it. The model writes the feature. You’re the only one who can tell whether it’s safe.

[One way to kinda do this nowadays is to use /goal.]

💡 80/20: Before you trust any AI-built clinical feature, write the acceptance test first: the specific wrong answer it must never give on a synthetic case. A passing test suite proves only that the tests pass; your acceptance criteria prove the feature does what it is supposed to do.
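
Here is a small Python sketch of that "never give this wrong answer" idea. Everything in it is hypothetical: the `NEVER_EVENTS` cases, the disposition labels, and both triage functions are made up for illustration and are not clinical guidance. The point is only the shape. You write the forbidden outcome down first, then run whatever the model built against it.

```python
from typing import Callable, Dict, List

# Synthetic cases, each paired with the one disposition the feature
# must never return. Illustrative placeholders, not clinical guidance.
NEVER_EVENTS: List[Dict] = [
    {
        "name": "chest pain with shortness of breath",
        "case": {"age": 58, "symptoms": ["chest pain", "shortness of breath"]},
        "forbidden": "self_care",
    },
    {
        "name": "sudden worst headache",
        "case": {"age": 41, "symptoms": ["sudden worst headache"]},
        "forbidden": "self_care",
    },
]


def hollow_stub(case: Dict) -> str:
    """What a 'finished' but empty feature can look like."""
    return "self_care"


def rule_based(case: Dict) -> str:
    """A toy implementation that escalates on two hard-coded red flags."""
    red_flags = {"chest pain", "sudden worst headache"}
    if red_flags & set(case["symptoms"]):
        return "emergency"
    return "self_care"


def never_event_failures(triage_fn: Callable[[Dict], str]) -> List[str]:
    """Return one message per synthetic case that got the forbidden answer."""
    failures = []
    for item in NEVER_EVENTS:
        result = triage_fn(item["case"])
        if result == item["forbidden"]:
            failures.append(f"{item['name']}: returned forbidden {result!r}")
    return failures


print("stub failures:", never_event_failures(hollow_stub))
print("rule-based failures:", never_event_failures(rule_based))
```

The stub runs without error, and a shallow "does it return a string" test would pass, yet it fails both never-event checks. That is the gap between a green dashboard and a feature that does its job.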

💡 BTW: Doug Fullington — the internist who shipped a browser-based PHI de-identifier this week — also built a musical-theater app. The same collapse of the idea-to-existence distance that turns a decade-old clinical tool into a weekend project doesn’t care whether you point it at de-identification or show tunes. (dfullington.substack.com)

What are you building this week? Reply and tell me. I read every one (hat tip to Raj for letting me know this was kinda broken; it is now fixed).

— Kevin

clinicians.build
