Proven AI gathers dust while unproven AI runs the floor 🔥, JAMA wants to license AI like physicians 📋, Starbucks spent $600M learning what clinicians already know ☕
The Paradox of Medical AI: 44 RCTs gather dust while 72% of docs use unproven LLMs.
Eric Topol laid out the implementation paradox on Saturday. Deep learning for medical imaging (mammography, colonoscopy, retinal screening, CT) has more rigorous evidence than almost any technology in medicine. A new retinal foundation model called Reti-Pioneer adds detection for thyroid disease, gout, and osteoporosis at roughly a dollar per scan. Mayo's AI system detects pancreatic cancer 475 days before radiologists. None of it is standard practice. Meanwhile, the AMA's March 2026 survey shows 72% of physicians use generative AI, 35% for direct patient care decisions. The tools with the evidence aren't deployed. The tools without the evidence are everywhere.
🤔 Haters
"This is just an implementation lag; the imaging tools will catch up." It's been a decade for some of these. 44 RCTs for colonoscopy AI. The lag isn't implementation; it's economics. Every imaging AI deployment requires capital equipment integration, radiology workflow redesign, IT procurement, and a reimbursement model that rewards the extra detection. A ChatGPT tab requires nothing. Deployment friction is a structural property, not a temporary delay.
"Physicians using ChatGPT isn't the same as clinical AI deployment." It is functionally the same. When 35% of physicians are using generative AI for treatment and diagnostic decisions (not documentation, not admin, but patient care), that's deployment. It's just informal deployment without institutional oversight, validation, or liability frameworks. Which is worse.
"The evidence will sort it out eventually." Eventually is doing a lot of work. The Nature Medicine editorial is the first institutional attempt to force the issue, but prospective LLM trials are years away. In the meantime, 40 million Americans are using chatbots daily for medical support. The evidence isn't going to sort out what's already happening.
💡 80/20: Your tool's clinical evidence matters, but your tool's deployment friction determines whether the evidence gets a chance to matter.
💡 Builder's Radar
JAMA proposes licensing AI like licensing physicians, and the infrastructure doesn't exist yet.
Bergman, Wachter, and Emanuel published a JAMA Viewpoint mapping autonomous clinical AI regulation onto physician credentialing: standardized exams, supervised deployment, scope of practice, time-limited certification, layered accountability, federal preemption. The framework is intellectually clean. The institutional substrate (exam boards, oversight agencies, verification infrastructure) does not exist. Two failure modes: regulatory capture by incumbents (USMLE/ABMS extending into AI governance) or paper credentialing that looks good but doesn't constrain. The builder move is infrastructure: whoever builds the testing and validation layer builds the picks-and-shovels business underneath every autonomous clinical AI.
🤔 Haters
"We don't need another regulatory framework; we need the existing ones to work." The existing frameworks weren't designed for this. FDA device clearance assumes a fixed product; LLMs update continuously. State medical licensing assumes a human practitioner. The licensure analogy isn't perfect, but the existing regulatory tools are worse.
"This will take a decade to implement." Probably. Which is exactly why building the institutional substrate now (testing infrastructure, validation frameworks, deployment monitoring) is the opportunity. The regulation will eventually need these tools. Whoever has them built and proven when the regulation hardens owns the category.
Starbucks spent $600M learning what clinicians already know: the human is the scarce asset.
Brian Niccol invested $600M to put workers back in stores, calling AI "co-pilot, not replacement." First positive US same-store sales in over a year. UChicago economist Alex Imas published the underlying economics: when AI drives commodity production to zero marginal cost, spending shifts to relationships and exclusivity. In experiments, people paid ~2x for identical items when others would be excluded. AI-generated art got half the exclusivity premium of human-made art. A "relational sector" emerges where the human IS the product: teachers, nurses, therapists.
🤔 Haters
"Starbucks isn't healthcare." The economic mechanism is identical. Automate the visible work, and the invisible relational work becomes the scarce asset. The barista's warmth. The physician's presence. The 30 seconds of eye contact during a terrifying diagnosis. The question is whether health systems deploy AI to give clinicians more time for that, or to fill those 13 saved minutes with two more RVU-generating visits.
"This is just an argument against efficiency." It's an argument for knowing which kind of efficiency you're optimizing. Throughput efficiency (more patients per hour) and relational efficiency (more trust per visit) are different metrics. The Starbucks lesson: optimizing for the first at the expense of the second costs you the customer.
💡 80/20: Build tools that give clinicians more time for the relational work, not tools that replace the relational work. Try: for every feature on your roadmap, ask: does this give 5 minutes back to the patient, or does this give 5 minutes back to the schedule? The answer determines whether you're building for the relational sector or automating it away.
🛠️ Builder's Tip
Invert your build's assumptions before you ship the next feature.
Richard Hamming's 1986 Bell Labs lecture identified a pattern that separates productive careers from busy ones: the willingness to invert constraints. Instead of asking "how do I solve this problem?" ask "what if the opposite were true?"
Here's a copy-paste prompt you can run against any feature on your roadmap. Paste your feature spec (or a paragraph describing it) as [FEATURE_DESCRIPTION]:
You are a clinical informaticist reviewing a product feature for a clinician-built health tech tool. The feature is described below.
For each of these inversions, give me ONE concrete alternative that would be worth testing:
1. INVERT THE USER: If the primary user were the patient instead of the clinician (or vice versa), what would this feature look like?
2. INVERT THE WORKFLOW: If this feature ran BEFORE the visit instead of during/after it, what would change?
3. INVERT THE DATA DIRECTION: If this feature pushed information TO the clinician instead of pulling FROM them (or vice versa), what would be different?
4. INVERT THE AUTOMATION: If the part you're automating were kept manual, and the part that's currently manual were automated, would the tool be more valuable?
5. INVERT THE EVIDENCE: What would you build if you assumed the opposite clinical assumption were true?
Feature description:
[FEATURE_DESCRIPTION]
For each inversion, provide:
- The inverted version in one sentence
- Whether it's worth a 2-hour prototype (yes/no)
- Why or why not (2 sentences max)
Output as a markdown table.
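If you'd rather run this against your whole roadmap at once instead of pasting features one at a time, here's a minimal sketch (stdlib only) that fills the placeholder for each feature. The template below is abbreviated; swap in the full prompt above. Sending the result to a model is left to whatever client you already use, so no API calls are assumed here.

```python
# Minimal sketch: fill the inversion prompt for each feature on a roadmap.
# PROMPT_TEMPLATE is an abbreviated stand-in; paste the full prompt from above.

PROMPT_TEMPLATE = """You are a clinical informaticist reviewing a product feature \
for a clinician-built health tech tool. The feature is described below.

Feature description:
[FEATURE_DESCRIPTION]

For each inversion (user, workflow, data direction, automation, evidence), provide:
- The inverted version in one sentence
- Whether it's worth a 2-hour prototype (yes/no)
- Why or why not (2 sentences max)
Output as a markdown table."""

def build_prompt(feature_description: str) -> str:
    """Substitute the feature spec into the template's placeholder."""
    return PROMPT_TEMPLATE.replace("[FEATURE_DESCRIPTION]", feature_description.strip())

if __name__ == "__main__":
    # Hypothetical roadmap entries, for illustration only.
    roadmap = [
        "Auto-draft the after-visit summary from the visit transcript.",
        "Pre-visit symptom intake form pushed to the patient's phone.",
    ]
    for feature in roadmap:
        print(build_prompt(feature))
        print("-" * 40)
```

One prompt per feature keeps the model's answers comparable across your roadmap; batching several features into one prompt tends to flatten the inversions into generic advice.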
What are you building this week? Reply and tell me; I read every one.
- Kevin


