Another AI writes billing codes 🤔
OpenEvidence launches AI medical coding that writes your billing for you.
OpenEvidence launched Coding Intelligence this week: an AI tool that automatically generates ICD-10 diagnosis codes, E/M level recommendations with the MDM rationale written into the note, and CPT code suggestions with RVU values for proper sequencing. It activates at the end of every visit in OpenEvidence Visits. The company closed a $250 million Series D in January at a $12 billion valuation, and Axios called the launch the latest move in the race to become the dominant medical AI platform.
😤 Haters
“AI coding will just lead to more upcoding.” The sepsis-billing tripling story that lit up HTN Slack two weeks ago shows that coding incentives are already misaligned, with or without AI. The real question is whether AI coding tools make the problem worse or make it more transparent. OpenEvidence ties its suggestions to clinical evidence rather than to documentation patterns, which is a different approach from tools that pattern-match off the note.
“Physicians should understand their own billing.” They should, and most don’t — because the system has 70,000+ ICD-10 codes and reimbursement rules that change quarterly. The physician who built that emergency airway kit in her ED isn’t going to memorize CPT sequencing rules. The tool should handle it.
💡 80/20: If you’re building clinical tools, billing integration is where the money literally is. OpenEvidence’s approach of deriving codes from clinical evidence rather than note patterns is worth studying. Try this: look at your product’s output and ask whether a billing code is attached to that workflow. If yes, surfacing it automatically creates immediate ROI for your users.
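A toy sketch of what “surfacing it automatically” can mean. The workflow names and mappings here are hypothetical, and a real tool would derive suggestions from clinical evidence and a maintained code set, not a hard-coded table (verify any codes against current CPT):

```python
from dataclasses import dataclass

@dataclass
class CodeSuggestion:
    code: str         # CPT or ICD-10 code
    description: str
    rationale: str    # why this workflow's output supports the code

# Hypothetical mapping for illustration only.
WORKFLOW_CODES = {
    "remote_bp_review": CodeSuggestion(
        code="99457",
        description="Remote physiologic monitoring management, first 20 minutes",
        rationale="Clinician reviewed 16 days of transmitted BP readings",
    ),
    "smoking_cessation_counseling": CodeSuggestion(
        code="99406",
        description="Smoking cessation counseling, 3-10 minutes",
        rationale="Documented counseling lasting more than 3 minutes",
    ),
}

def suggest_code(workflow: str) -> CodeSuggestion | None:
    """Surface a billing code suggestion when a workflow completes, if one applies."""
    return WORKFLOW_CODES.get(workflow)

if __name__ == "__main__":
    suggestion = suggest_code("remote_bp_review")
    if suggestion:
        print(f"Suggested {suggestion.code}: {suggestion.description}")
        print(f"Rationale: {suggestion.rationale}")
```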
🛠️ From the Workbench
LangChain Deep Agents — open-source Claude Code, model-agnostic
LangChain released Deep Agents, an MIT-licensed agent harness that replicates the core architecture of Claude Code: planning tools, filesystem access, subagent spawning, and context management. The key difference: it works with any LLM that supports tool calling, including local models via Ollama. Install with pip install deepagents. The GitHub repo hit 9.9k stars within hours of the March update.
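Here’s a minimal sketch following the pattern in the repo’s README. The create_deep_agent entry point and the tools/instructions parameters match the early releases, but the API is young and moving fast, so check the current README before copying:

```python
from deepagents import create_deep_agent

# Any plain Python function can become a tool the agent may call.
def lookup_drug_interactions(drug_a: str, drug_b: str) -> str:
    """Stub tool: a real version would query an interactions database."""
    return f"No interaction data loaded for {drug_a} + {drug_b} (stub)."

# create_deep_agent wires in the planning tool, virtual filesystem,
# and subagent support; by default it runs on Anthropic's Claude.
agent = create_deep_agent(
    tools=[lookup_drug_interactions],
    instructions="You are a careful clinical assistant. Plan before acting.",
)

result = agent.invoke(
    {"messages": [{"role": "user", "content": "Check lisinopril + ibuprofen."}]}
)
print(result["messages"][-1].content)
```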
⚠️ Verify: Deep Agents is a development framework, not a healthcare product. It has no HIPAA compliance, no BAA, no audit logging out of the box. If you’re using it with patient data — even synthetic data for prototyping — you need to add those layers yourself. Don’t let “MIT licensed” make you forget about data governance.
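“Add those layers yourself” can start small. Here’s a sketch of one such layer: an append-only audit trail around agent calls. The helper names are mine, and this is a starting point, not a compliance program. Hashing keeps raw PHI out of the log, but a real deployment also needs access controls and tamper-evident storage:

```python
import hashlib
import json
import time

AUDIT_LOG = "agent_audit.jsonl"  # append-only record of every agent interaction

def audit(event: str, payload: str) -> None:
    """Append a timestamped record; store a hash so PHI never lands in the log."""
    record = {
        "ts": time.time(),
        "event": event,
        "payload_sha256": hashlib.sha256(payload.encode()).hexdigest(),
    }
    with open(AUDIT_LOG, "a") as f:
        f.write(json.dumps(record) + "\n")

def audited_invoke(agent, user_message: str):
    """Wrap an agent call so every input and output leaves an audit trail."""
    audit("agent_input", user_message)
    result = agent.invoke({"messages": [{"role": "user", "content": user_message}]})
    output = result["messages"][-1].content
    audit("agent_output", output)
    return result
```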
😤 Haters
“Why not just use Claude Code directly?” If your workflow is entirely within the Anthropic ecosystem, Claude Code is probably better. Deep Agents matters for clinician-builders who want to use local models (MedGemma via Ollama, for instance) or need to run agents on infrastructure they fully control — which is everyone who touches PHI.
“Open-source agent frameworks are a dime a dozen.” Most of them are wrappers around API calls. Deep Agents ships with the architectural patterns that actually matter for long-running work: planning, context isolation, subagent delegation. It’s the plumbing, not the prompt.
💡 80/20: If you’ve been building CLI-based clinical tools with Claude Code, Deep Agents lets you swap in a local model for the same workflow, which matters the moment patient data enters the picture. Try: install it (pip install deepagents), point it at an Ollama instance, and give it a structured clinical task (“generate a discharge summary from these notes”). Compare the output to Claude Code.
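Roughly, the swap looks like this. I’m assuming the model parameter from the deepagents README and a MedGemma build you’ve already pulled into Ollama; substitute your actual model tag, and stick to synthetic notes while prototyping:

```python
from deepagents import create_deep_agent
from langchain_ollama import ChatOllama  # pip install langchain-ollama

# Point the harness at a local model instead of the Anthropic default.
# Substitute the tag of whatever model you've pulled into Ollama.
local_model = ChatOllama(model="medgemma", temperature=0.2)

agent = create_deep_agent(
    tools=[],  # add your own tools here
    instructions="Draft a discharge summary from the notes the user provides.",
    model=local_model,
)

notes = open("visit_notes.txt").read()  # synthetic notes only while prototyping
result = agent.invoke(
    {"messages": [{"role": "user", "content": f"Generate a discharge summary:\n{notes}"}]}
)
print(result["messages"][-1].content)
```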
🎯 Clinician-Builder Tip of the Day
Anthropic published a blog post this week on harness design for long-running apps: how they structure multi-agent systems for autonomous coding sessions. The core insight that translates directly to clinical AI: separate the generator from the evaluator. Their system uses a GAN-inspired architecture where one agent builds and a different agent critiques, because AI models are terrible at evaluating their own output. They tend to praise their own work even when it’s mediocre.

Sound familiar? It’s the same failure mode as an AI scribe that generates a note and then “validates” it against itself. If you’re building any clinical tool where AI produces output that needs checking, use two models, or at minimum two separate prompts with different instructions and different temperature settings. The critic should never share context with the creator.
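Here’s a minimal sketch of that separation using the Anthropic Python SDK. The prompts, temperatures, and model ID are illustrative (swap in a current model, or better, two different models); the point is that the critic starts from a clean context and sees only the finished note, never the generator’s conversation:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
MODEL = "claude-sonnet-4-20250514"  # illustrative; use a current model ID

def generate_note(transcript: str) -> str:
    """Generator: drafts the clinical note from a visit transcript."""
    resp = client.messages.create(
        model=MODEL,
        max_tokens=1024,
        temperature=0.7,  # some freedom while drafting
        system="You are a clinical scribe. Draft a SOAP note from the transcript.",
        messages=[{"role": "user", "content": transcript}],
    )
    return resp.content[0].text

def critique_note(note: str) -> str:
    """Critic: fresh context, different instructions, lower temperature.
    Stronger version: run this on a different model than the generator."""
    resp = client.messages.create(
        model=MODEL,
        max_tokens=1024,
        temperature=0.0,  # deterministic, skeptical review
        system=(
            "You are a documentation auditor. Find unsupported claims, missing "
            "elements, and internal contradictions in this note. Do not praise. "
            "List problems only."
        ),
        messages=[{"role": "user", "content": note}],
    )
    return resp.content[0].text

draft = generate_note(open("transcript.txt").read())
print(critique_note(draft))
```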
What are you building this week? Reply and tell me — I read every one.
— Kevin

