Week 23 — Jun 2 – Jun 8, 2026
Summary
Major ML investigation system sprint: three new subsystems (contracts, perception, reasoning) were built from scratch to form a three-track architecture for property inspection understanding. Data contracts (v0) define the interfaces between tracks using JSON Schema: capture-bundle (iOS → perception), scene-facts (perception → reasoning), and data-pile (progressive-markdown KB). The perception pipeline integrates SAM2 video segmentation (object masks + track IDs) and Cosmos 3 video-language model (room/surface captions + Q&A), merging their outputs into a validated scene_facts document — the first structured perception output. A multimodal LLM-judge hallucination metric cross-checks Cosmos captions against input frames. The reasoning module scaffolds a grounded Q&A runner over progressive-markdown knowledge bases with A/B eval comparing grounded vs video-only answers. A /health endpoint was added to the ML service.
17 code commits | 50+ new files | ~+3,200 lines
Highlights
ML Data Contracts (v0)
Three JSON Schema contracts that keep the investigation tracks aligned:
| Contract | Flow | Schema |
|---|---|---|
| capture-bundle | Track 1 (iOS capture) → Track 2 (perception) | capture-bundle.schema.json |
| scene-facts | Track 2 (perception) → Track 3 (reasoning) | scene-facts.schema.json |
| data-pile / KB | Tracks 1+2 → Track 3 (reasoning) | kb-frontmatter.schema.json |
All schemas use JSON Schema (draft 2020-12), and spatial fields use the ARKit world frame (right-handed, y-up, meters). A shared loader module handles schema loading, validation, and markdown front-matter parsing. The scene-facts contract was bumped to v0.2.0 to make obb optional, so 2D-only perception output can validate without it.
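To illustrate the front-matter parsing the shared loader is described as handling, here is a minimal stdlib-only sketch. It is not the loader's actual code: the `---` fence convention, the flat `key: value` format, the function names, and the `REQUIRED_KEYS` tuple are all assumptions made for illustration. The real required fields are whatever kb-frontmatter.schema.json defines.

```python
"""Sketch only: split front-matter from a KB markdown doc and check required keys."""
from typing import Dict, List, Sequence, Tuple

# Hypothetical required keys; the real list lives in kb-frontmatter.schema.json.
REQUIRED_KEYS = ("id", "title")


def split_front_matter(text: str) -> Tuple[Dict[str, str], str]:
    """Return (front_matter, body) for a doc that opens with a '---' fence.

    Assumes flat 'key: value' lines (no nesting); docs without a fence
    are returned unchanged with empty front-matter.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    end = None
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            end = i
            break
    if end is None:
        raise ValueError("front-matter block is not closed")
    meta: Dict[str, str] = {}
    for line in lines[1:end]:
        if not line.strip():
            continue  # skip blank lines inside the block
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a 'key: value' line: {line!r}")
        meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:])
    return meta, body


def missing_keys(meta: Dict[str, str],
                 required: Sequence[str] = REQUIRED_KEYS) -> List[str]:
    """List required keys that are absent from the parsed front-matter."""
    return [key for key in required if key not in meta]
```

A caller might reject any doc for which `missing_keys(meta)` is non-empty before handing it to the reasoning track. Full JSON Schema validation would normally be delegated to a schema validator library rather than hand-rolled like this.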
Perception Pipeline (SAM2 + Cosmos 3)
Two complementary perception models deployed as Modal apps:
- SAM2 (grizzlebear-sam2-jh): video segmentation producing per-frame masks with persistent track IDs. Supports real-clip ingestion (keyframe extraction → volume upload → inference).
- Cosmos 3 (grizzlebear-cosmos-jh): NVIDIA video-language model (Cosmos Reasoner NIM) deployed as a Modal app. Produces room labels, surface descriptions, and scene Q&A answers.
The merge step (perception/merge.py) combines SAM2 bounding boxes with Cosmos captions into a fully populated scene_facts document. A hallucination judge cross-checks Cosmos outputs against input frames via a separate LLM call.
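The merge described above can be pictured with a small sketch. This is not the contents of perception/merge.py: the `TrackBox` record, the `merge_scene_facts` function, and every output field name (`room`, `surfaces`, `objects`, `observations`, `bbox_xyxy`) are hypothetical stand-ins, and the real shape is defined by scene-facts.schema.json. The sketch groups per-frame boxes by track ID and attaches Cosmos-style room and surface text, leaving out obb as a 2D-only producer would.

```python
"""Sketch only: fold per-frame track boxes and scene captions into one document."""
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class TrackBox:
    """Hypothetical SAM2-side record: one box for one track in one frame."""
    track_id: int
    frame_index: int
    bbox_xyxy: Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), pixels


def merge_scene_facts(boxes: List[TrackBox],
                      room_label: str,
                      surfaces: List[str]) -> Dict[str, Any]:
    """Group boxes by track ID and attach room/surface captions.

    Field names are illustrative assumptions, not the v0.2.0 contract.
    No 'obb' key is emitted, mirroring a 2D-only perception run.
    """
    objects: Dict[int, Dict[str, Any]] = {}
    # Sort so tracks and their observations come out in a stable order.
    for box in sorted(boxes, key=lambda b: (b.track_id, b.frame_index)):
        obj = objects.setdefault(
            box.track_id, {"track_id": box.track_id, "observations": []}
        )
        obj["observations"].append(
            {"frame_index": box.frame_index, "bbox_xyxy": list(box.bbox_xyxy)}
        )
    return {
        "room": room_label,
        "surfaces": list(surfaces),
        "objects": list(objects.values()),
    }
```

In a real pipeline the returned document would then be validated against the scene-facts schema before being written out.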
Reasoning Module
Stage 0 reasoning runner: loads a data pile (progressive-markdown KB), validates each doc's front-matter against the data-pile contract, and answers questions via a stub LLM. It runs purely locally, with no Modal and no network. Stage 1 adds an A/B eval comparing grounded (KB-augmented) vs video-only reasoning.
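A rough sketch of how a stub answerer and a grounded-vs-video-only comparison could fit together is shown here. It does not reproduce runner.py, stub_llm.py, or ab_eval.py: the keyword-matching stub, the substring scoring rule, and the names `stub_answer` and `ab_eval` are assumptions chosen to keep the example deterministic and offline.

```python
"""Sketch only: deterministic stub answerer plus a two-arm accuracy comparison."""
from typing import Callable, Dict, List, Sequence, Tuple


def _words(text: str) -> set:
    """Lowercased words longer than 3 characters, trailing punctuation stripped."""
    return {w.strip("?.,").lower() for w in text.split() if len(w) > 3}


def stub_answer(question: str, context: Sequence[str]) -> str:
    """Hypothetical stub LLM: return the first context line that shares a
    keyword with the question, or 'unknown' when nothing matches."""
    q_words = _words(question)
    for line in context:
        if q_words & _words(line):
            return line
    return "unknown"


def ab_eval(cases: List[Tuple[str, str]],
            grounded: Callable[[str], str],
            video_only: Callable[[str], str]) -> Dict[str, float]:
    """Score both arms on the same questions.

    A case counts as correct when the expected string appears in the answer
    (case-insensitive); this scoring rule is an assumption of the sketch.
    """
    if not cases:
        raise ValueError("need at least one eval case")
    hits = {"grounded": 0, "video_only": 0}
    for question, expected in cases:
        target = expected.lower()
        hits["grounded"] += target in grounded(question).lower()
        hits["video_only"] += target in video_only(question).lower()
    return {arm: count / len(cases) for arm, count in hits.items()}
```

Here the grounded arm would be something like `lambda q: stub_answer(q, kb_lines)`, and the video-only arm would pass an empty context. In the real eval, the video-only arm presumably answers from perception output rather than from nothing.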
ML Health Endpoint
GET /health on the ML service returns liveness status. Includes a smoke test script at dev/ml/scripts/smoke_health.py.
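For readers unfamiliar with liveness checks, a smoke test of this kind can be as small as the sketch below. It is not the contents of smoke_health.py: the function names are hypothetical, and because the response body format is not stated here, the check relies only on the HTTP status code.

```python
"""Sketch only: probe a /health endpoint and report whether it answered 200."""
import http.client
import urllib.request


def health_url(base_url: str) -> str:
    """Build the /health URL from a service base URL, tolerating a trailing slash."""
    return base_url.rstrip("/") + "/health"


def check_health(base_url: str, timeout: float = 5.0) -> bool:
    """Return True only if GET /health responds with HTTP 200.

    Connection failures, timeouts, and non-2xx responses (which urllib
    raises as HTTPError, an OSError subclass) all count as unhealthy.
    """
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        return False
```

A script could call `check_health` with the deployed service's base URL and exit non-zero on `False`, which makes it usable as a post-deploy gate.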
Daily Breakdown
Jun 2 (17 code commits)
- f4d4700 ml(contracts): add v0 data contracts + validator (+988)
- 7aaa2f4 ml(contracts): add shared loader for v0 contracts (+86)
- da50e4f ml(contracts): scene-facts 0.2.0 — obb optional for 2D-only perception (+58/-3)
- b76a6ce ml(perception): scaffold perception harness skeleton (+319/-3)
- 6a6a733 ml(perception): integrate SAM2 masks + track IDs (+399)
- 768a74d ml(perception): real-clip ingestion for SAM2 eval (+200/-28)
- 1d962ea ml(perception): Cosmos 3 reasoner captions + scene Q&A eval (+457)
- 4f74c41 ml(perception): fix Cosmos NIM secret shape for nvcr pull (+19/-15)
- f47bd16 ml(perception): deploy real Cosmos 3 NIM + record live eval (+75/-14)
- f9d7510 ml(perception): add show_report viewer for Cosmos eval artifacts (+23)
- 472c13c ml(perception): multimodal LLM-judge hallucination metric (+112/-1)
- f8c35e8 ml(perception): merge SAM2 + Cosmos into populated scene_facts (+224)
- 53da940 ml(reasoning): scaffold reasoning runner + stub LLM (+308)
- fa49511 ml(reasoning): grounded-vs-video-only A/B eval (+185)
- 593cab3 ml(SH-0b): add /health endpoint + jh smoke test (+78)
- 98e753f docs(ml): add development plan (+112)
Modified Files (key changes)
ML Contracts
- dev/ml/contracts/ — new: 3 JSON Schema contracts (capture-bundle, scene-facts, data-pile) + shared loader + validator
ML Perception
- dev/ml/perception/harness.py — new: Stage 0 perception harness skeleton
- dev/ml/perception/models/sam2_model.py — new: SAM2 video segmentation wrapper
- dev/ml/perception/sam2_app.py — new: Modal app for SAM2 inference
- dev/ml/perception/cosmos/client.py — new: Cosmos 3 NIM client
- dev/ml/perception/cosmos/judge.py — new: LLM-judge hallucination metric
- dev/ml/perception/cosmos_nim_app.py — new: Cosmos 3 NIM Modal app
- dev/ml/perception/eval_cosmos.py — new: Cosmos evaluation harness
- dev/ml/perception/merge.py — new: SAM2 + Cosmos → scene_facts merge
- dev/ml/perception/keyframes.py — new: keyframe extraction from video
- dev/ml/perception/storage.py — new: Modal volume storage helpers
ML Reasoning
- dev/ml/reasoning/runner.py — new: data-pile loader + question answering
- dev/ml/reasoning/stub_llm.py — new: stub LLM for Stage 0
- dev/ml/reasoning/ab_eval.py — new: grounded vs video-only A/B eval
ML Service
- dev/ml/ml_endpoint.py — /health endpoint
- dev/ml/scripts/smoke_health.py — new: health endpoint smoke test