Week 23 — Jun 2 – Jun 8, 2026

Summary

Major ML investigation system sprint: three new subsystems (contracts, perception, reasoning) were built from scratch, forming a three-track architecture for property inspection understanding. Data contracts (v0) define the interfaces between tracks in JSON Schema: capture-bundle (iOS → perception), scene-facts (perception → reasoning), and data-pile (progressive-markdown KB). The perception pipeline integrates SAM2 video segmentation (object masks + track IDs) with the Cosmos 3 video-language model (room/surface captions + Q&A) and merges their outputs into a validated scene_facts document, the first structured perception output. A multimodal LLM-judge hallucination metric cross-checks Cosmos captions against input frames. The reasoning module scaffolds a grounded Q&A runner over progressive-markdown knowledge bases, with an A/B eval comparing grounded and video-only answers. A /health endpoint was added to the ML service.

17 code commits | 50+ new files | ~+3,200 lines


Highlights

ML Data Contracts (v0)

Three JSON Schema contracts that keep the investigation tracks aligned:

| Contract | Flow | Schema |
|---|---|---|
| capture-bundle | Track 1 (iOS capture) → Track 2 (perception) | capture-bundle.schema.json |
| scene-facts | Track 2 (perception) → Track 3 (reasoning) | scene-facts.schema.json |
| data-pile / KB | Tracks 1+2 → Track 3 (reasoning) | kb-frontmatter.schema.json |

All schemas use JSON Schema (draft 2020-12), and spatial quantities are expressed in the ARKit world frame (right-handed, y-up, meters). A shared loader module handles schema loading, validation, and markdown front-matter parsing. The scene-facts contract was updated to v0.2.0 to make obb optional, so 2D-only perception output validates without an oriented bounding box.
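To illustrate what "obb optional" means in practice, here is a minimal sketch of an object entry whose schema lists obb under properties but not under required. The field names (track_id, label, bbox_2d) and the shape of obb are assumptions made for illustration; they are not copied from scene-facts.schema.json. The helper checks only the `required` keyword, and a full JSON Schema validator would presumably do the real work in the shared loader.

```python
from __future__ import annotations

# Hypothetical fragment of an object entry in scene-facts v0.2.0.
# Field names are illustrative, not taken from the real schema file.
OBJECT_SCHEMA = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "required": ["track_id", "label", "bbox_2d"],  # "obb" is deliberately absent
    "properties": {
        "track_id": {"type": "integer"},
        "label": {"type": "string"},
        "bbox_2d": {"type": "array"},
        "obb": {"type": "object"},  # allowed, but not required
    },
}


def missing_required(doc: dict, schema: dict) -> list[str]:
    """Return the required keys absent from doc.

    This covers only the `required` keyword, not types or nested rules.
    """
    return [key for key in schema.get("required", []) if key not in doc]


# A 2D-only detection (no obb) and a 3D detection (with obb).
obj_2d = {"track_id": 7, "label": "sink", "bbox_2d": [120, 80, 310, 260]}
obj_3d = {**obj_2d, "obb": {"center": [0.4, 0.9, -1.2], "size": [0.6, 0.2, 0.5]}}

print(missing_required(obj_2d, OBJECT_SCHEMA))  # no missing keys
print(missing_required(obj_3d, OBJECT_SCHEMA))  # no missing keys
```

Both entries pass the required-key check, which is the behavior the v0.2.0 change is meant to allow.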

Perception Pipeline (SAM2 + Cosmos 3)

Two complementary perception models deployed as Modal apps:

  • SAM2 (grizzlebear-sam2-jh): video segmentation producing per-frame masks with persistent track IDs. Supports real-clip ingestion (keyframe extraction → volume upload → inference).
  • Cosmos 3 (grizzlebear-cosmos-jh): NVIDIA video-language model (Cosmos Reasoner NIM) deployed as a Modal app. Produces room labels, surface descriptions, and scene Q&A answers.

The merge step (perception/merge.py) combines SAM2 bounding boxes with Cosmos captions into a fully populated scene_facts document. A hallucination judge cross-checks Cosmos outputs against input frames via a separate LLM call.
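The merge can be pictured roughly as follows. This is a sketch under stated assumptions, not the contents of perception/merge.py: the function name merge_scene_facts, the input shapes, and the keys room_label, object_captions, bbox_2d, and caption are all hypothetical, and it assumes Cosmos captions have already been associated with SAM2 track IDs.

```python
from __future__ import annotations


def merge_scene_facts(sam2_tracks: list[dict], cosmos: dict) -> dict:
    """Attach Cosmos captions to SAM2 tracks, matched by track ID.

    Assumes (hypothetically) that cosmos["object_captions"] maps
    track ID -> caption string.
    """
    captions = cosmos.get("object_captions", {})
    objects = []
    for track in sam2_tracks:
        track_id = track["track_id"]
        objects.append({
            "track_id": track_id,
            "bbox_2d": track["bbox_2d"],
            # None when Cosmos produced no caption for this track.
            "caption": captions.get(track_id),
        })
    return {"room": cosmos.get("room_label"), "objects": objects}


sam2_tracks = [
    {"track_id": 1, "bbox_2d": [0, 0, 10, 10]},
    {"track_id": 2, "bbox_2d": [5, 5, 20, 20]},
]
cosmos_output = {"room_label": "kitchen", "object_captions": {1: "stainless sink"}}

scene_facts = merge_scene_facts(sam2_tracks, cosmos_output)
print(scene_facts)
```

Keeping uncaptioned tracks with an explicit None, rather than dropping them, is one possible design choice: it preserves every SAM2 track for downstream reasoning.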

Reasoning Module

Stage 0 reasoning runner: loads a data pile (progressive-markdown KB), validates each doc's front-matter against the data-pile contract, and answers questions via a stub LLM. It runs purely locally, with no Modal and no network access. Stage 1 adds an A/B eval comparing grounded (KB-augmented) vs video-only reasoning.
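The two stages described above might look something like this in miniature. Everything here is illustrative: REQUIRED_KEYS, the doc shape, and the function names are assumptions rather than the actual runner.py, stub_llm.py, or ab_eval.py, and the stub returns a canned string instead of calling any model.

```python
from __future__ import annotations

# Hypothetical required front-matter keys; the real data-pile contract
# lives in kb-frontmatter.schema.json.
REQUIRED_KEYS = ("id", "title")


def load_valid_docs(docs: list[dict]) -> list[dict]:
    """Keep only docs whose front-matter has every required key."""
    return [d for d in docs if all(k in d["front_matter"] for k in REQUIRED_KEYS)]


def stub_llm(question: str, context: list[str]) -> str:
    """Deterministic stand-in for an LLM call: no network, no model."""
    return f"stub answer to {question!r} using {len(context)} KB doc(s)"


def ab_eval(questions: list[str], docs: list[dict]) -> list[dict]:
    """Answer each question twice: with KB context and without it."""
    kb = [d["body"] for d in load_valid_docs(docs)]
    return [
        {
            "question": q,
            "grounded": stub_llm(q, kb),    # KB-augmented arm
            "video_only": stub_llm(q, []),  # no KB context
        }
        for q in questions
    ]


docs = [
    {"front_matter": {"id": "kb-001", "title": "Water damage"}, "body": "..."},
    {"front_matter": {"title": "Missing id"}, "body": "..."},  # fails validation
]
rows = ab_eval(["Is there water damage?"], docs)
print(rows[0]["grounded"])
print(rows[0]["video_only"])
```

With a real model behind stub_llm, the two arms would be scored against each other; here they differ only in how many KB docs reach the stub.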

ML Health Endpoint

GET /health on the ML service returns liveness status. Includes a smoke test script at dev/ml/scripts/smoke_health.py.
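A smoke test of this kind can be sketched with the standard library alone. The code below is not smoke_health.py: it starts a local stand-in server so the example runs anywhere, and the {"status": "ok"} response body is an assumption about what the real endpoint returns. The check_health helper and HealthHandler class are hypothetical names.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Local stand-in for the ML service; the response body is assumed."""

    def do_GET(self):
        if self.path == "/health":
            body = json.dumps({"status": "ok"}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def check_health(base_url, opener, timeout=5.0):
    """Return True if GET {base_url}/health answers 200 with status 'ok'."""
    with opener.open(f"{base_url}/health", timeout=timeout) as resp:
        return resp.status == 200 and json.load(resp).get("status") == "ok"


# Port 0 lets the OS pick a free port for the stand-in server.
server = HTTPServer(("127.0.0.1", 0), HealthHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Bypass any proxy configured in the environment for this local check.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
healthy = check_health(f"http://127.0.0.1:{server.server_port}", opener)
server.shutdown()
server.server_close()
print("healthy:", healthy)
```

Against the deployed service, the same check_health call would take the service's base URL instead of the local one.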


Daily Breakdown

Jun 2 (17 code commits)

  • f4d4700 ml(contracts): add v0 data contracts + validator (+988)
  • 7aaa2f4 ml(contracts): add shared loader for v0 contracts (+86)
  • da50e4f ml(contracts): scene-facts 0.2.0 — obb optional for 2D-only perception (+58/-3)
  • b76a6ce ml(perception): scaffold perception harness skeleton (+319/-3)
  • 6a6a733 ml(perception): integrate SAM2 masks + track IDs (+399)
  • 768a74d ml(perception): real-clip ingestion for SAM2 eval (+200/-28)
  • 1d962ea ml(perception): Cosmos 3 reasoner captions + scene Q&A eval (+457)
  • 4f74c41 ml(perception): fix Cosmos NIM secret shape for nvcr pull (+19/-15)
  • f47bd16 ml(perception): deploy real Cosmos 3 NIM + record live eval (+75/-14)
  • f9d7510 ml(perception): add show_report viewer for Cosmos eval artifacts (+23)
  • 472c13c ml(perception): multimodal LLM-judge hallucination metric (+112/-1)
  • f8c35e8 ml(perception): merge SAM2 + Cosmos into populated scene_facts (+224)
  • 53da940 ml(reasoning): scaffold reasoning runner + stub LLM (+308)
  • fa49511 ml(reasoning): grounded-vs-video-only A/B eval (+185)
  • 593cab3 ml(SH-0b): add /health endpoint + jh smoke test (+78)
  • 98e753f docs(ml): add development plan (+112)

Modified Files (key changes)

ML Contracts

  • dev/ml/contracts/ (new): 3 JSON Schema contracts (capture-bundle, scene-facts, data-pile) + shared loader + validator

ML Perception

  • dev/ml/perception/harness.py (new): Stage 0 perception harness skeleton
  • dev/ml/perception/models/sam2_model.py (new): SAM2 video segmentation wrapper
  • dev/ml/perception/sam2_app.py (new): Modal app for SAM2 inference
  • dev/ml/perception/cosmos/client.py (new): Cosmos 3 NIM client
  • dev/ml/perception/cosmos/judge.py (new): LLM-judge hallucination metric
  • dev/ml/perception/cosmos_nim_app.py (new): Cosmos 3 NIM Modal app
  • dev/ml/perception/eval_cosmos.py (new): Cosmos evaluation harness
  • dev/ml/perception/merge.py (new): SAM2 + Cosmos → scene_facts merge
  • dev/ml/perception/keyframes.py (new): keyframe extraction from video
  • dev/ml/perception/storage.py (new): Modal volume storage helpers

ML Reasoning

  • dev/ml/reasoning/runner.py (new): data-pile loader + question answering
  • dev/ml/reasoning/stub_llm.py (new): stub LLM for Stage 0
  • dev/ml/reasoning/ab_eval.py (new): grounded vs video-only A/B eval

ML Service

  • dev/ml/ml_endpoint.py: /health endpoint
  • dev/ml/scripts/smoke_health.py (new): health endpoint smoke test