Harvard physicist, 2 weeks, 1 paper 🔭, XO Health goes national 🏥, GraftMD enters public beta 🛠️
🔬 The Big Thing
Harvard physicist Matthew Schwartz published a frontier physics paper in two weeks using only text prompts to Claude. The bottleneck was his taste.
Anthropic launched a Science Blog this week, and the inaugural post is the clearest external validation I’ve seen yet of the clinicians.build thesis. Matthew Schwartz, a professor of physics at Harvard specializing in quantum field theory and a principal investigator at the NSF Institute for Artificial Intelligence and Fundamental Interactions, ran a controlled experiment: could he supervise Claude through a complete, publishable physics paper without ever editing a file himself? Text prompts only. No cutting and pasting his own calculations. The problem was what he called a “G2-level” project — the kind a second-year physics grad student would tackle, where the methods are established and the endpoint is clear, even if reaching it is technically brutal.
The results are worth sitting with. Claude Opus 4.5 completed the work in two weeks: 110 draft versions, 270 separate sessions, roughly 51,000 messages exchanged. What Schwartz estimates would have taken one to two years with a grad student, or three to five months working alone, came out the other side as a published factorization theorem in quantum chromodynamics. He calls it “the most important paper I’ve ever written — not for the physics, but for the method.”
But the part that matters for this newsletter is his honest accounting of what Claude got wrong. It said “verified” when it hadn’t actually checked. It adjusted parameters to make plots look right instead of fixing the underlying error. It invented plausible-sounding justifications for results it hadn’t derived. It stopped looking for errors after finding one. Left unsupervised, it would have produced something that looked correct and wasn’t. Schwartz caught all of it — and he could only catch it because he has spent decades in the field and knows the difference between a real derivation and a confident-sounding approximation.
His word for what made the difference is taste. “When solving problems is hard, the solution gets the glory,” he writes, “but when knowledge and technical strength are ubiquitous, it’s the taste to come up with good ideas that distinguishes great work.” He means the intangible sense of which research directions might lead somewhere — the judgment that separates a physicist who knows the field from an AI that has read everything about it. For clinicians, this maps precisely: the part of practice that’s hard to delegate isn’t the execution of a care plan. It’s knowing which patient is sicker than they look. Which chief complaint is masking something else. Which refill needs a second glance and which is genuinely routine. Claude can run the calculation. It can’t tell you which calculation matters.
Schwartz’s honest benchmark puts current AI at the G2 level — capable second-year-grad-student work. He thinks G3 (original, creative research) is roughly a year away. That’s not forever. But it does mean that the transition period — the one we’re in now — belongs to domain experts who understand how to direct these tools toward the right problems. The taste is yours. For now, that’s enough.
Anthropic Science Blog — Vibe physics: The AI grad student
XO Health is taking episode-based pricing national
Endpoints reported this week that XO Health is expanding its alternative health plan to self-insured employers across the country. The model is built around prospective episode-driven “care packages” — fixed-price bundles covering medical and pharmacy costs for defined episodes of care. CEO Swati Mathai told Endpoints the company has seen strong provider interest and expects to hit thousands of members by year-end.
For builders, episode-based payment creates integration surfaces that fee-for-service doesn’t. When care is priced as a bundle, you need tools that can track episode attribution, coordinate across providers within the episode, and surface real-time cost and quality data against the contract. That’s not what most EHRs were designed to do. There’s a tooling layer that needs to sit between the payment model and the clinical workflow, and right now most of it lives in spreadsheets. If you’re building at the intersection of RCM and clinical operations, XO’s national expansion is worth tracking — not because XO is a market-defining company yet, but because the payment model they’re scaling creates real demand for software that doesn’t exist in mature form.
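To make the tooling gap concrete, here is a minimal sketch of the kind of episode-tracking layer described above. All names (`Episode`, `Claim`, the knee-replacement figures) are hypothetical illustrations, not XO Health’s actual data model or contract terms:

```python
from dataclasses import dataclass, field

@dataclass
class Claim:
    provider_npi: str   # rendering provider, for cross-provider coordination
    amount: float
    service_date: str   # ISO date

@dataclass
class Episode:
    episode_id: str
    bundle_price: float              # fixed price negotiated for the episode
    claims: list[Claim] = field(default_factory=list)

    def attribute(self, claim: Claim) -> None:
        """Attach a claim to this episode (real attribution logic is far messier)."""
        self.claims.append(claim)

    def spend(self) -> float:
        return sum(c.amount for c in self.claims)

    def margin(self) -> float:
        """Positive means the episode is running under the contracted bundle price."""
        return self.bundle_price - self.spend()

# Hypothetical example: a knee-replacement episode priced at $30,000
ep = Episode("ep-001", bundle_price=30_000.0)
ep.attribute(Claim("1234567890", 18_500.0, "2025-03-01"))  # surgery
ep.attribute(Claim("0987654321", 4_200.0, "2025-03-15"))   # rehab
print(ep.spend(), ep.margin())  # 22700.0 7300.0
```

Trivial on its own, but notice how none of this maps onto a fee-for-service EHR schema: the unit of account is the episode, not the encounter, which is why so much of it lives in spreadsheets today.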
Endpoints News — XO Health to expand its alternative health plan nationwide
🛠️ From the Workbench
GraftMD: HIPAA-compliant healthcare app builder, now in public beta
GraftMD is in public beta — a Lovable-style build tool built specifically for healthcare. Describe what you need in plain English; GraftMD generates production-ready React code with automatic HIPAA compliance scanning. One-click deployment, custom domains, version history, and built-in integrations with Epic, Athena, eClinicalWorks, Canvas, FHIR servers, Stedi for eligibility, and lab platforms. Built by the Cara team. Three thousand free credits to start, no credit card required.
This is the category of tool I’ve been expecting to arrive. The generic vibe-coding platforms — Lovable, Bolt, v0 — can get you to a prototype fast, but they don’t know what a business associate agreement is, and their default deployment targets don’t have healthcare-grade security built in. A healthcare-specific builder that handles compliance by default changes the calculus for what a clinician can build without becoming an expert in security architecture. The integration list is the tell: if you can build against a FHIR server and Stedi in the same tool without leaving the browser, the surface area of what a non-engineer can actually ship meaningfully expands. Worth an account. Obviously, it remains to be seen how well it actually works security-wise.
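For readers wondering what “building against a FHIR server” actually involves, here’s a minimal sketch of parsing a standard FHIR R4 Patient resource — the kind of payload a tool like this would presumably generate handling code for. The sample resource and `display_name` helper are illustrative assumptions, not GraftMD output:

```python
import json

# A pared-down FHIR R4 Patient resource, as returned by GET /Patient/{id}
patient_json = """
{
  "resourceType": "Patient",
  "id": "example",
  "name": [{"use": "official", "family": "Chalmers", "given": ["Peter", "James"]}]
}
"""

def display_name(patient: dict) -> str:
    """Prefer the 'official' HumanName entry; fall back to the first one."""
    names = patient.get("name", [])
    official = next((n for n in names if n.get("use") == "official"), None)
    n = official or (names[0] if names else {})
    return " ".join(n.get("given", []) + [n.get("family", "")]).strip()

patient = json.loads(patient_json)
print(display_name(patient))  # Peter James Chalmers
```

The parsing is easy; the hard parts a healthcare-specific builder has to own are auth against the EHR, audit logging, and keeping PHI off non-compliant infrastructure — which is exactly where the generic platforms fall down.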
GraftMD — Build HIPAA-Compliant Healthcare Apps with AI
What are you building this week? Reply and tell me — I read every one.
— Kevin

