NVIDIA opens surgery data 🔬
🔥 Builder's Radar
NVIDIA opened the first large-scale surgical robotics dataset: 700+ hours of OR video, public on HuggingFace.
At GTC this week, NVIDIA released Open-H-Embodiment, the first large-scale healthcare robotics dataset for training physical AI in clinical environments. The dataset includes 700+ hours of surgical video and procedural data, released publicly on HuggingFace. The announced goal is to accelerate surgical robotics and OR workflow AI at the foundation model layer: to give researchers building in this space the training data that previously existed only inside robot vendors' proprietary systems.
The immediate clinician-builder relevance is narrow (most of us aren't training surgical foundation models). But the pattern is important. What NVIDIA is doing for surgical robotics is what ImageNet did for computer vision and what The Pile did for language: creating a shared substrate that removes one of the major barriers to entry for an entire research category. If you're doing research adjacent to surgical workflow, procedural documentation, or OR efficiency tools, this dataset is worth knowing about. And for everyone else: the fact that this kind of dataset is now public and on HuggingFace, not locked inside a vendor, signals something about the direction of healthcare AI infrastructure generally.
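If you want a first look without downloading 700+ hours of video, here's a minimal sketch using the HuggingFace datasets library in streaming mode. Treat the hub path, split name, and schema as my assumptions, not confirmed details; check the dataset card before building anything on top of it.

```python
from datasets import load_dataset

# Streaming mode iterates lazily over the hub files instead of
# downloading the full dataset up front.
# The repo id "nvidia/Open-H-Embodiment" and split name are assumed.
ds = load_dataset("nvidia/Open-H-Embodiment", split="train", streaming=True)

# Peek at the first few records to learn the actual schema
# before committing to any preprocessing pipeline.
for example in ds.take(3):
    print(sorted(example.keys()))
```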
Open-H-Embodiment on HuggingFace
Stack Overflow data: domain expertise still decides whether developers trust AI-generated code.
A survey piece from the Stack Overflow blog surfaced yesterday, and it's worth reading alongside the Big Thing above. The finding: developers don't uniformly trust AI-generated code. Trust correlates strongly with the developer's own domain expertise: they trust AI output when they can evaluate it, and they distrust it when they can't. In specialized domains where the reviewer lacks deep background, AI-generated code is essentially unverifiable to the person using it.
Run that finding through the clinician-builder lens: when a clinician-builder uses an AI coding agent to build a clinical tool, they are simultaneously the most qualified person to evaluate whether the clinical logic is correct and one of the least qualified to evaluate whether the software architecture is sound. The inverse is true for an engineer building the same tool. This is the trust gap that produces the failures Nate documented earlier this week: not because the tools are bad, but because domain expertise doesn't transfer across the builder/evaluator split. It's a design constraint, not a problem that goes away as the tools get better.
Stack Overflow: Domain Expertise Still Wanted
🛠️ From the Workbench
Mistral Small 4: 119B MoE, GGUF available, runs locally on high-end consumer hardware.
Mistral released Mistral Small 4 (formally Mistral-Small-4-119B-2603) yesterday: a 119-billion-parameter mixture-of-experts model with 128 experts and 4 active at inference time, which means the effective compute cost per token is much smaller than the parameter count implies. As a rough back-of-envelope: 4 of 128 experts means roughly 1/32 of the expert weights fire per token, so something like 3-4B active expert parameters per token (plus whatever layers are shared) rather than 119B. It's open-weight, Apache 2.0 licensed, and GGUF quantized versions are already on HuggingFace via the lmstudio-community repository, meaning it runs through LM Studio or llama.cpp on local hardware.
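If you want to try it locally, here's a minimal sketch using llama-cpp-python (the Python bindings for llama.cpp). The repo id and quant filename glob are assumptions based on lmstudio-community naming conventions; check the actual repo listing. And note that a 119B model at 4-bit quantization still wants on the order of 70 GB of combined RAM/VRAM, so "high-end consumer hardware" is doing real work in that sentence.

```python
from llama_cpp import Llama

# from_pretrained pulls the GGUF from the hub (requires huggingface-hub).
# Both the repo id and the filename glob below are assumed, not confirmed.
llm = Llama.from_pretrained(
    repo_id="lmstudio-community/Mistral-Small-4-119B-2603-GGUF",  # assumed path
    filename="*Q4_K_M*.gguf",  # pick the quant that fits your memory
    n_gpu_layers=-1,           # offload as many layers as the GPU can hold
    n_ctx=8192,                # context window; raise if you have headroom
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Draft a blank SOAP note template."}]
)
print(out["choices"][0]["message"]["content"])
```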
Mistral Small 4 announcement · GGUF on HuggingFace
What are you building this week? Reply and tell me. I read every one.
– Kevin

