Your agent is 80% plumbing 🔧, Anthropic wants physician-devs 🩺, and another Gemma to pull into LM Studio/Ollama 😉
The Claude Code leak reveals what production agent systems actually look like — and it's not the LLM call.
🔬 The Big Thing
Your Agent Is 80% Plumbing — The Claude Code Leak Reveals What Production Clinical AI Actually Requires
A detailed analysis of the accidentally published Claude Code source (1,902 files, 512K+ lines) mapped 12 infrastructure primitives that make up the 80% of the system that isn’t the LLM call. Session persistence. Permission pipelines. Context budget management. An 18-module security stack for a single shell command. Error recovery. Developers have already ported the harness to Python and Rust — the patterns are structural, not Anthropic-specific.
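Context budget management is the most concrete of these primitives, and it reduces to a small loop: reserve room for the system prompt, keep the newest turns, drop the oldest once the budget is exceeded. A minimal sketch in Python — the `Turn` shape and the rough 4-characters-per-token estimate are illustrative assumptions, not Claude Code's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class Turn:
    role: str      # "user" or "assistant"
    content: str

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return max(1, len(text) // 4)

def fit_to_budget(system_prompt: str, turns: list[Turn], budget: int) -> list[Turn]:
    """Keep the newest turns that fit under the token budget
    after reserving room for the system prompt."""
    remaining = budget - estimate_tokens(system_prompt)
    kept: list[Turn] = []
    for turn in reversed(turns):           # walk newest-first
        cost = estimate_tokens(turn.content)
        if cost > remaining:
            break                          # oldest turns fall off the end
        kept.append(turn)
        remaining -= cost
    return list(reversed(kept))            # restore chronological order
```

Production systems layer summarization on top of this (compress dropped turns instead of discarding them), but the budget loop is the part most demo agents skip entirely.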
😤 Haters
“This is just good engineering 101 — anyone shipping production software knows you need session management and error handling.” Fair. But the gap between “knows you need it” and “has actually built it for an agent system” is where every clinical AI demo dies. Most tutorials stop at the prompt. These patterns are documented now. Use them.
“Clinician-builders don’t need to worry about 18-module security stacks — that’s for enterprise teams.” If your agent touches a medication list, you need permission scoping. If it runs across a multi-patient session, you need context budgets. The scale is different. The primitives aren’t optional.
“The leak was irresponsible and Anthropic should be embarrassed.” Two leaks in one week from a company shipping at Anthropic’s velocity. Development speed outrunning operational discipline is a pattern every fast-moving builder should learn from, not just observe.
💡 80/20: The LLM is the easy part. Your next build session: audit your agent for crash recovery, permission scoping, and context overflow handling. If any of those are missing, that’s why it works in the demo but not in the clinic. Try: run the architecture audit prompt against your own agent stack.
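Permission scoping, the first item on that audit, can be as small as a gate in front of every tool call: read-only tools run freely, mutating tools require explicit approval, unknown tools are denied. A sketch under assumed names (the tool names and policy sets below are hypothetical, not from the leaked source):

```python
from enum import Enum, auto

class Decision(Enum):
    ALLOW = auto()
    DENY = auto()
    ASK = auto()    # escalate to a human before running

# Illustrative policy sets — replace with your agent's actual tools.
READ_ONLY = {"search_notes", "read_chart"}
NEEDS_APPROVAL = {"update_medication_list", "send_message"}

def check_permission(tool_name: str, approved: set[str]) -> Decision:
    """Gate a tool call: default-deny anything not explicitly policied."""
    if tool_name in READ_ONLY:
        return Decision.ALLOW
    if tool_name in NEEDS_APPROVAL:
        return Decision.ALLOW if tool_name in approved else Decision.ASK
    return Decision.DENY
```

The default-deny branch is the point: an agent that touches a medication list should have to earn every mutating call, not be trusted with a blanket shell.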
→ Full write-up
📡 Builder’s Radar
Anthropic Launches “Claude Code for Healthcare” Webinar — First Major AI Lab Targeting Physician-Developers
Anthropic announced a webinar for April 23 focused on physicians building with Claude Code — live demos, safety verification, compliance traceability, and Q&A with the Claude Code team. Graham Walker (MDCalc founder) amplified it. This is the first time a major AI lab has created a developer event specifically for clinicians who build. Not executives learning about AI. Physicians writing code with agents.
😤 Haters
“It’s a marketing webinar, not a product launch.” Probably. But the framing matters — Anthropic is acknowledging physician-developers as a category worth targeting. That’s a market signal even if the content is surface-level.
“One webinar doesn’t mean they’ll build healthcare-specific features.” True. Watch for follow-through: FHIR-aware tools, HIPAA-scoped permissions, clinical audit logging. The webinar is the signal. The product roadmap is what matters.
💡 80/20: If you’re building with Claude Code for clinical use cases, register and come with specific technical questions about auditability and compliance traceability. The Q&A is where the real value lives. Try: prepare one question about how Claude Code handles PHI in agent workflows.
→ Full write-up
Gemma 4 Ships Under Apache 2.0 — Four Sizes, Edge to 31B
Google released Gemma 4 in four sizes, from an edge model that runs on a Raspberry Pi up to a 31B dense model ranked #3 among open models. It's the first Gemma under Apache 2.0, optimized for reasoning and agent workflows, and NVIDIA announced RTX acceleration for local deployment.
😤 Haters
“Another open model release — the benchmarks all look the same.” The license matters more than the benchmarks here. Apache 2.0 means health systems can fine-tune and deploy without legal review of restrictive license terms. That’s a real barrier removed for clinical use.
“MedGemma still isn’t clinical grade — Google said so themselves.” Right. But Gemma 4’s improved base capabilities should translate to better clinical fine-tunes. And the edge model running on a Raspberry Pi opens up point-of-care possibilities that need zero cloud connectivity and zero data exfiltration.
💡 80/20: If you’re running Ollama or LM Studio, watch for GGUF quantizations of the 31B model. The edge model (E2B) is worth testing for simple clinical NLP tasks — triage classification, symptom extraction — running entirely on-device. Try: pull the 31B when available and benchmark it against your current local model on a clinical task you care about.
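"Benchmark it against your current local model" doesn't need a framework — a labeled list of prompts and a scoring loop is enough to compare two models on the same task. A minimal harness, assuming you wrap each model's endpoint (Ollama, LM Studio, anything) in a `generate(prompt) -> str` callable; the triage examples are hypothetical placeholders for your own task:

```python
from typing import Callable

def accuracy(generate: Callable[[str], str],
             examples: list[tuple[str, str]]) -> float:
    """Fraction of labeled prompts whose model output
    contains the expected label (case-insensitive)."""
    hits = 0
    for prompt, label in examples:
        if label.lower() in generate(prompt).lower():
            hits += 1
    return hits / len(examples)

# Hypothetical triage-classification examples — swap in a task you care about.
EXAMPLES = [
    ("Crushing chest pain radiating to the left arm. Triage level?", "emergent"),
    ("Routine medication refill request. Triage level?", "routine"),
]
```

Run the same `EXAMPLES` through a callable for your current model and one for the new Gemma; the delta on *your* task matters more than any leaderboard rank.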
→ Full write-up
FDA Raises the Bar: AI “Breakthroughs” Must Solve Problems Physicians Can’t
A STAT analysis shows the FDA is shifting what qualifies as a breakthrough AI device. Early designations went to tools improving physician performance. Now the agency favors algorithms that solve problems physicians cannot address independently — detecting multiple cancers from single images, predicting mortality from signals no human could integrate.
😤 Haters
“This just makes the breakthrough pathway harder for startups.” It clarifies the pathway. Your clinical decision support tool probably doesn’t belong in breakthrough designation anyway — it belongs in the 510(k) lane or outside device regulation entirely.
“The FDA is moving goalposts.” The FDA is doing what regulators should do — raising the bar as the technology matures. “Better than a doctor at this one thing” was appropriate when AI radiology was novel. “Does something no doctor can do” is the right bar now.
💡 80/20: If you’re building a clinician-facing tool, this clarifies your regulatory strategy. Most clinician-built tools (documentation, decision support, workflow automation) don’t need breakthrough designation. The FDA is telling you where the high bar is so you can plan accordingly. Reframe: regulatory clarity is a gift, not a barrier.
→ Full write-up
Happy spring.
— Kevin