Automated Doubt 👀, the people layer raises $55M 🧑‍⚕️, Biometrics
Automated Doubt
“When code is free, saying no is our last defense.” — Wes McKinney
Wes McKinney — the person who wrote pandas — drew a hard line this week: vibe coding (one prompt, don’t read the code, ship it) is “dangerous and irresponsible.” Agentic engineering — heavy spec work, continuous review, a human who stays accountable — isn’t.
The disciplined version has a name now. Alex Self calls his version “automated doubt”: a swarm of specialized critique agents that audit an artifact from different vantage points, “the way two eyes give you depth.”
When code is free to write, the scarce thing is the doubt — the structured, automated, never-skipped scrutiny that catches what the model got confidently wrong.
Here’s why this is the most clinical idea in software right now. You can vibe-code a prototype on synthetic data this weekend. You cannot vibe-code something that touches a patient.
The eval and the review aren’t overhead bolted onto the product. They are the product — which is the same conclusion the whole field keeps arriving at from every direction: the durable thing isn’t the model, it’s the test that proves the code is safe to trust.
😤 “This is just code review with a buzzword.” Partly. The novelty isn’t the idea of review — it’s that the review is automated, role-specialized, and run before you trust the output instead of after it breaks. A security reviewer, a test architect, and an “assumption excavator” catch different failures; one general reviewer catches fewer. That’s not a buzzword, that’s parallax.
😤 “You’re fear-mongering to slow down clinicians who finally got unblocked.” No — build fast. Just don’t confuse a working demo with a trustworthy tool. The whole point is that agentic engineering lets you move faster than careful hand-coding while keeping the part that matters. Speed isn’t the risk. Skipping the doubt is.
😤 “Forty critique agents? Nobody has that token budget.” McKinney runs ~$20k/month in tokens, sure. The solo version is one agent — Self says the single most universally useful one is the Assumption Excavator. You can run that on a synthetic case tonight for the price of a coffee.
[Every clinician already knows this one. The hard part of medicine was never doing the test — it was not doing the test that won’t change anything. Software just caught up to the stewardship problem we’ve had for a century.]
💡 80/20: Before you add the next feature to your AI-built tool, add one critique agent — the Assumption Excavator. Point it at a single synthetic case (Synthea, never PHI) and ask it to list every assumption your code is making that nobody wrote down. The findings are your spec for what to test next.
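If you want to see the shape of that before wiring up a real model, here is a minimal sketch. Everything in it is an assumption for illustration: the role wording, the names `ROLES`, `build_prompt`, and `run_critics`, and the `ask` callable that stands in for whatever model client you use. It is not Self's or McKinney's implementation.

```python
from typing import Callable

# Hypothetical role instructions; the wording is illustrative only.
ROLES = {
    "assumption_excavator": "List every assumption this code makes that nobody wrote down.",
    "security_reviewer": "List ways this code could leak or mishandle data.",
    "test_architect": "List behaviors that have no test covering them.",
}


def build_prompt(role: str, code: str, case: str) -> str:
    """Combine one role's instruction with the code and a synthetic case."""
    return (
        f"{ROLES[role]}\n\n"
        f"CODE:\n{code}\n\n"
        f"SYNTHETIC CASE (never PHI):\n{case}\n\n"
        "Answer with one finding per line."
    )


def run_critics(
    code: str,
    case: str,
    ask: Callable[[str], str],
    roles: tuple[str, ...] = ("assumption_excavator",),
) -> dict[str, list[str]]:
    """Send the same artifact to each role and collect findings per role.

    `ask` is a placeholder: any function that takes a prompt and returns text.
    """
    findings: dict[str, list[str]] = {}
    for role in roles:
        reply = ask(build_prompt(role, code, case))
        # One finding per non-empty line, with leading list dashes removed.
        findings[role] = [
            line.strip("- ").strip() for line in reply.splitlines() if line.strip()
        ]
    return findings


def fake_ask(prompt: str) -> str:
    """Stand-in for a model call so the sketch runs offline."""
    return "- Weight is in kilograms\n- Age is never missing"


if __name__ == "__main__":
    result = run_critics("def dose(w): return w * 2", "age=40, weight=70", fake_ask)
    for role, items in result.items():
        print(role, items)
```

The default runs only the Assumption Excavator, which matches the solo version: one agent, one synthetic case. Adding a role is one more key in `roles`, and each finding becomes a candidate test.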
The People Layer Just Raised $55M
Stepful raised a $55M Series C led by Oak HC/FT to scale AI-powered training for allied-health roles — medical assistants, pharmacy techs, the workforce that actually runs a clinic.
They’ve graduated 32,000 practice-ready workers and work with 35+ systems including Mount Sinai, Ochsner, and Providence. The AI here doesn’t replace the worker — it makes the training cheaper, faster, and debt-free.
The bottleneck in care delivery was never the model. It’s the trained human standing next to the patient — and that’s where the capital is finally going.
😤 “Training isn’t a tech story.” It is when the constraint on every AI-in-the-clinic rollout is a staffed front desk and a rooming MA. You can automate the note and still not see the patient if nobody’s there to take vitals. Build for the people layer and you’re building for the actual rate-limiter.
A “Shopify for Independent Healthcare” Raises $24M
Klinic raised $24M for a behavioral-health and specialty-provider enablement platform — billing, intake, and patient acquisition as a stack the solo provider rents instead of builds.
The enablement layer is the quiet land grab: own the rails the independent clinician runs on and you own their practice without employing them.
😤 “Every health tech company calls itself the Shopify of something.” Fair, and most aren’t. The test is whether a solo therapist can actually launch on it in a week. If the answer is yes, the analogy earns itself; if it’s a six-month implementation, it’s an EHR wearing a hoodie.
💡 80/20: If your buyer is an independent provider, your competition isn’t the hospital — it’s their current duct-tape of Calendly, a fax line, and a billing service. Beat the duct tape, not the enterprise.
Ultra-short:
GoHealth filed for Chapter 11. The health-insurance broker is restructuring, handing ownership to lenders with 100% of them already voting yes. The Medicare-Advantage-broker model is getting squeezed — worth watching if your tool sells into that channel.
GitHub Copilot is moving to usage-based billing. Pay per token, not per seat. The era of flat-rate AI coding is ending — which makes the “route to the cheapest model that clears your eval” discipline a budget line, not a nicety.
Snowflake and Anthropic expanded their partnership. Claude is now wired across Snowflake’s Cortex AI. If your health system’s analytics already live in Snowflake, the model is moving to the data instead of the other way around — a quieter path to “AI on the warehouse” than standing up your own stack.
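On the Copilot item: “route to the cheapest model that clears your eval” is simple enough to write down. A minimal sketch follows, assuming you already have a pass rate per model from your own eval. The model names, prices, and scores are made up, and `Candidate` and `cheapest_passing` are hypothetical names, not any vendor's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    name: str
    cost_per_1k_tokens: float  # illustrative price, not a real quote
    eval_pass_rate: float      # fraction of your own eval cases passed


def cheapest_passing(
    candidates: list[Candidate], threshold: float
) -> Optional[Candidate]:
    """Return the lowest-cost candidate whose pass rate meets the threshold.

    Returns None when nothing clears the bar, so the caller has to decide
    what to do instead of silently falling back to a failing model.
    """
    passing = [c for c in candidates if c.eval_pass_rate >= threshold]
    if not passing:
        return None
    return min(passing, key=lambda c: c.cost_per_1k_tokens)


if __name__ == "__main__":
    models = [
        Candidate("small", 0.2, 0.81),
        Candidate("medium", 1.0, 0.93),
        Candidate("large", 5.0, 0.97),
    ]
    choice = cheapest_passing(models, threshold=0.90)
    print(choice.name if choice else "no model clears the eval")
```

The eval sets the floor and price only breaks ties above it, which is why the eval has to exist before the routing does.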
🛠️ From the Workbench
Roborev + Open Code Review — two takes on continuous, automated code review for AI-written code.
Roborev (McKinney’s) hooks your git repo so every commit gets auto-reviewed and graded low/medium/high, with role-specialized reviewers. Alibaba’s Open Code Review is an open-source line-level AI diff reviewer you can self-host.
This is the first story made concrete: the doubt as a tool, running on every commit, not a vibe you summon when you remember.
⚠️ Verify: Both are general-purpose dev tools, not clinical-grade and not validated on PHI workflows. Use them on your code, on synthetic data, in a personal repo — not as a substitute for clinical validation, and never pointed at a repo containing real patient data without security review.
💡 80/20: Add one auto-reviewer to a throwaway repo this week and watch it flag your own AI’s mistakes before you do. The point isn’t the tool — it’s feeling how much the model gets confidently wrong when nobody’s checking.
🎙️ From the Pods
🎙️ This Week Health — Newsday — “Major Biometric Breach, HIPAA Deadline Falls Flat, and the Microsoft AI Budget Blowout”
A breach at one of the country’s largest public health systems ran from November to February — and among the usual stolen records, the attackers took biometric data: fingerprints, palm prints, geotagged photos.
You can rotate a password. You cannot reissue a fingerprint.
💡 BTW: Wes McKinney built pandas in 2008 while researching credit and macro strategies at a hedge fund — AQR Capital Management — to wrangle financial data Python couldn’t handle yet. The name isn’t about the animal: it’s from “panel data,” the econometrics term, with a wink at “Python data analysis.” The most-used tool in data science started as one quant’s workaround for a spreadsheet problem.
What are you building this week? Email and tell me (kevin@clinicians.build) — I read every one.
— Kevin


