
Grizzlebear Architecture

Last updated: 2026-06-03

Grizzlebear is a multi-service Python application deployed on Modal (serverless). It powers the TradeSpark field-inspection platform with real-time communication, ML-assisted planning, data synchronization, and billing.

Service Map

| Service | Domain Pattern | Purpose |
| --- | --- | --- |
| API Gateway | api-{env}.grizzlebear.io | FastAPI router; fans out to all sub-apps |
| user_data_app | (sub-app) | Consolidated users + data service (registration, login, profiles, asset CRUD, sync) |
| low_priority_app | (sub-app) | Consolidated voices + geocoding + capture + static_site |
| LiveKit | (sub-app) | WebRTC rooms + agent worker for real-time video sessions |
| ML Gateway | (sub-app) | LLM routing, data capture, training pipeline, dashboard |
| Model Proxy | model-{env}.grizzlebear.io | Reverse proxy to on-prem vLLM/Ollama (Mac Mini via UDM firewall) |
| Static Site | static-{env}.grizzlebear.io | Centralized demo pages, deploy dashboard, shared TradeSpark theme, Markdown blog engine |
| tsweb | tsweb-{env}.grizzlebear.io | Quarantined tsweb-app Supabase integration: projects API, nightly scraper |
| CI/CD | cicd env | Modal-driven promotion DAG, webhook dispatcher, remote deploy/test runners |

Production (main) drops the -{env} suffix: api.grizzlebear.io, data.grizzlebear.io, etc.
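
The naming rule above can be sketched as a small helper. This is only an illustration of the pattern described in the table; the function name service_host and the constants are hypothetical and are not taken from the codebase.

```python
DOMAIN = "grizzlebear.io"
PROD_ENV = "main"  # production drops the -{env} suffix


def service_host(service: str, env: str) -> str:
    """Return the hostname for a service in a given environment (hypothetical helper)."""
    if env == PROD_ENV:
        return f"{service}.{DOMAIN}"
    return f"{service}-{env}.{DOMAIN}"


print(service_host("api", "main"))    # api.grizzlebear.io
print(service_host("model", "beta"))  # model-beta.grizzlebear.io
```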

Infrastructure

                     +-----------+
  iOS / Web App ---->| Modal     |----> Supabase (auth + project data)
                     | (FastAPI) |----> S3 (blobs, configs, telemetry, ML data)
                     |           |----> Stripe (billing)
                     |           |----> ElevenLabs / OpenAI (TTS)
                     |           |----> Mapbox / Google Maps (geocoding)
                     |           |----> Gemini API (LLM)
                     |           |
                     |  Model    |----> Mac Mini on-prem (vLLM / Ollama)
                     |  Proxy    |     via static egress IP + UDM allowlist
                     +-----------+
                          |
                     Modal Volume (/root/models/) — shared cross-env, HuggingFace weights

Key Infrastructure Choices

  • Serverless compute: Modal — each service is an independent modal.App
  • Database: SQLite per-location (S3-backed; writes use ETag CAS + per-key locking via with_location_db for cross-container and same-container concurrency safety)
  • Blob storage: S3 via CloudBucketMount, partitioned by account/geo_prefix/location
  • Auth: Supabase (magic links, OTP) — replaced the original badauth module
  • Secrets: Modal Secrets dashboard + .env per environment. Supabase uses env-split secrets (SupabaseProd for main, SupabaseDev for all others). Temporary override (Apr 28): main is hardcoded to SupabaseDev while the new prod Supabase project is being restored (see IMPROVE.md H19)
  • CI/CD: Modal-driven promotion DAG (ci/webhook.py dispatcher) — dev -> beta (auto-test) -> main (manual gate). Deploy dashboard at static.grizzlebear.io/deploy-dashboard/
  • Docker bases: Heavy base images pre-baked to AWS ECR (core, livekit-server, livekit-agent, ml-training, session_to_splat). Modal images FROM the ECR bases to avoid repeated pip installs
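
To make the database bullet more concrete, here is a minimal sketch of a read-modify-write that combines a per-key lock (same-container safety) with an ETag compare-and-swap (cross-container safety). It is not the with_location_db implementation: FakeObjectStore, put_if_match, and cas_update are hypothetical names, and the in-memory store only stands in for S3 conditional writes.

```python
import hashlib
import threading
from collections import defaultdict


class FakeObjectStore:
    """In-memory stand-in for S3 with a conditional write keyed on ETag (assumption)."""

    def __init__(self):
        self._objects = {}
        self._guard = threading.Lock()

    @staticmethod
    def _etag(body):
        return hashlib.sha256(body).hexdigest()

    def get(self, key):
        with self._guard:
            body = self._objects.get(key, b"")
        return body, self._etag(body)

    def put_if_match(self, key, body, expected_etag):
        """Write only if the stored object still carries expected_etag."""
        with self._guard:
            current = self._objects.get(key, b"")
            if self._etag(current) != expected_etag:
                return False  # someone else wrote first
            self._objects[key] = body
            return True


_key_locks = defaultdict(threading.Lock)
_key_locks_guard = threading.Lock()


def _lock_for(key):
    # Guard the defaultdict so two threads never create two locks for one key.
    with _key_locks_guard:
        return _key_locks[key]


def cas_update(store, key, mutate, max_attempts=5):
    """Apply mutate(body) to the object at key; retry when the ETag has moved."""
    with _lock_for(key):
        for _ in range(max_attempts):
            body, etag = store.get(key)
            if store.put_if_match(key, mutate(body), etag):
                return True
    return False


store = FakeObjectStore()
cas_update(store, "acct/us-west/loc-1.db", lambda body: body + b"x")
```

The lock serializes writers inside one process, while the ETag check is what would reject a stale write from another container that read the same object earlier.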

Environments

| Name | Branch | URL pattern | Notes |
| --- | --- | --- | --- |
| main | main | *.grizzlebear.io | Production, manual deploy gate |
| beta | beta | *-beta.grizzlebear.io | Staging, auto-tested by CI |
| dev | dev | *-dev.grizzlebear.io | Shared development |
| jh | jh | *-jh.grizzlebear.io | Personal (Jeremiah) |
| rk | rk | *-rk.grizzlebear.io | Personal (RK) |
| cc | | *-cc.grizzlebear.io | Personal |
| fl | | *-fl.grizzlebear.io | Personal |

Brand & Design

The canonical TradeSpark Design System lives at /design/TradeSpark Design System/ (README, colors_and_type.css, preview components, JSX UI kit). It is wired as a Claude Code skill at .claude/skills/tradespark-design. All dev/static_site/ templates follow the system's tokens, voice rules, and eyebrow → .spark-text headline pattern.

Module Layout (dev/)

dev/
  app.py                  # Modal app entry point — includes all sub-apps
  core/                   # Shared: env config, URLs, secrets, DB, auth helpers
    core.py               #   Modal images (layered on ECR bases), volumes, secrets
    admin.py              #   TradesparkEmailAdmin dependency (email-allowlist gate via TS_ADMIN_EMAILS)
    model_versions.py     #   Per-env model version pins + resolve_version()
    logging_config.py     #   Shared get_logger() factory (env-driven LOG_LEVEL)
    markdown_gen.py       #   Project-to-markdown renderer
    notifier.py           #   Email notifications via Resend (key from Modal Secret)
  user_data_app/          # Consolidated Modal function: users + data
  users/                  # Registration, login, profiles
  data/                   # Asset CRUD, chunked upload, project sync
    data.py               #   SQLite operations
    data_endpoint.py      #   FastAPI routes
    sync.py               #   Supabase -> markdown sync (in-process dispatch)
  low_priority_app/       # Consolidated Modal function: voices + geocoding + capture + static_site
  voices/                 # TTS providers
  geocoding/              # Reverse geocoding + map images
  capture/                # AR data collection
  images/                 # Computer vision (Gemini segments)
  livekit_ts/             # WebRTC rooms + agent
  livekit-recorder/       # Session recording (Go) — AES-CBC encrypted writes to S3
  ml/                     # ML pipeline
    ml_endpoint.py        #   FastAPI routes (gateway, training, serving, data, /health)
    comparison.html       #   Side-by-side model comparison UI
    gateway/              #   LLM routing + streaming chat (Gemini, Claude, OpenAI, Ollama)
    contracts/            #   v0 data contracts (capture-bundle, scene-facts, data-pile) + shared loader + validator
    perception/           #   Track 2 — video → scene_facts (SAM2 segmentation + Cosmos 3 captions + merge)
      cosmos/             #     Cosmos 3 NIM client + hallucination judge
      models/             #     Model wrappers (noop, sam2_model)
      scripts/            #     Eval runners, clip upload, merge
    reasoning/            #   Track 3 — grounded Q&A over progressive-markdown KB
      fixtures/           #     Question sets + hand-authored fixture pile
      scripts/            #     Acceptance + A/B eval runners
    data_pipeline/        #   Supabase scraper, synthetic generator, converters
    training/             #   Model registry, training configs, trainer stubs
    serving/              #   vLLM inference via Modal @modal.cls() + .remote.aio() RPC
    mobile/               #   LiteRT on-device model distribution (HF → S3 → iOS)
    eval/                 #   Evaluation framework
  model_proxy/            # Reverse proxy to on-prem model servers
  websocket/              # Real-time session communication
  static_site/            # Centralized demo pages + blog + docs + deploy dashboard
    endpoint.py           #   FastAPI routes for all demo pages, blog, and docs
    deploy_dashboard.py   #   Deploy dashboard backend (/api/envs, promote, test, approve-prod)
    docs.py               #   Docs corpus loader (renders docs/ + IMPROVE.md at /docs/*)
    content.py            #   Markdown blog engine with YAML frontmatter
    assets/               #   Shared CSS (TradeSpark tokens) + JS (Site.apiFetch, serviceUrl)
    templates/demos/      #   8 demo pages: ml-comparison, ml-generation, ml-eval, ml-training, traction, mobile-session, model-proxy, websocket-test
    templates/deploy-dashboard/  # Deploy pipeline visualization
  queues/                 # Background job pipelines (Modal GPU functions)
    session_to_splat.py   #   Session recording → Gaussian splatting
    session_to_splat.video_3d_reconstruction.py  # Full video-to-3D: decrypt → ffmpeg → COLMAP → fastgs
    colmap_undistorted_sfm_export.sh  # COLMAP SfM with checkpoint-based preemption tolerance
    video_to_gsplat.sh    #   End-to-end ffmpeg → COLMAP → splatting
  tsweb/                  # Quarantined tsweb-app Supabase integration (tsweb.grizzlebear.io)
    endpoint.py           #   GET /projects (user-scoped, joins properties)
    queries.py            #   Supabase query functions (moved from core/)
    location.py           #   Location resolution (moved from core/)
    scraper.py            #   Nightly Supabase data scraper (moved from ml/data_pipeline/)
    scheduled.py          #   Modal cron for nightly scraper
    client.py             #   tsweb_supabase() client factory
  automation/             # Headless Claude runner (separate Modal app: grizzlebear-claude-runner)
    claude_runner.py      #   Modal app: runs /improve and /document skills via Claude CLI on weekly cron
    hardening.py          #   Safety guards (no Modal imports): path allowlists, pre-push hook, gitleaks, env scrub
  telemetry/              # Error tracking
  migrations/             # DB schema versioning
  _archived/              # Deprecated modules (billing, devices, etc.)
ci/                       # Modal-driven CI/CD pipeline (lives at repo root)
  webhook.py              #   HTTP webhook dispatcher — promote, test, approve-prod
  deploy_in_modal.py      #   Remote Modal deploy function
  bruno_in_modal.py       #   Remote Bruno test runner (parallel tier-aware)
  _git_in_modal.py        #   Git operations inside Modal (merge, branch tips, env states)
  scheduled_cleanup.py    #   Cron: non-prod app cleanup + canary tests

ML Pipeline

The ML subsystem captures real user + synthetic data for fine-tuning Gemma 4 models, and runs a three-track investigation system for property inspection understanding.

Three model slots:

| Slot | Base Model | Target Hardware | Training Method |
| --- | --- | --- | --- |
| TS_Modal | Gemma 4 31B (dense) | H100 80GB (cloud) | Full LoRA |
| TS_mobile4B | Gemma 4 E4B | A10G (cloud train + serve) -> mobile | QLoRA 4-bit |
| TS_mobile2B | Gemma 4 E2B | A10G (cloud serve) -> mobile | QLoRA 4-bit |

Data flow:

  1. Nightly Supabase scraper captures project/task/inspection data
  2. ML Gateway logs all LLM request/response pairs to S3 JSONL
  3. Synthetic generator creates distillation datasets from Gemini outputs
  4. Format converters produce model-specific chat templates
  5. (Future) Unsloth trainer fine-tunes with LoRA/QLoRA

Model storage: A single Modal Volume (grizzlebear-model-weights) lives in the main environment and is mounted cross-env by every app. Weights are stored at /root/models/{slot}/{version}/. Active version per env is declared in dev/core/model_versions.py — either a pinned "vN" or "latest", where "latest" resolves via {slot}/_latest.json on the Volume (written by the trainer on successful runs).
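
The pin resolution described above could look roughly like the sketch below. It is not the actual resolve_version() from dev/core/model_versions.py: the function name resolve_model_dir is hypothetical, and the "version" field inside _latest.json is an assumption, since the pointer file's schema is not documented here.

```python
import json
from pathlib import Path


def resolve_model_dir(models_root, slot, pin):
    """Resolve a pin ("vN" or "latest") to {models_root}/{slot}/{version}."""
    root = Path(models_root)
    if pin == "latest":
        pointer = json.loads((root / slot / "_latest.json").read_text())
        version = pointer["version"]  # assumed field name
    else:
        version = pin
    return root / slot / version
```

With the Volume mounted at /root/models/, a call such as resolve_model_dir("/root/models", "TS_Modal", "v2") would return /root/models/TS_Modal/v2 without touching the pointer file.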

Model comparison: /comparison page streams all 6 models (Gemini, Claude, OpenAI + 3 Gemma 4 slots) side-by-side via SSE multiplexing for evaluation and distillation quality checks.
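
SSE multiplexing can be illustrated with a small asyncio sketch: several per-model token streams are pumped into one queue and emitted as a single stream of SSE frames tagged with the model name. The names fake_stream, sse_frame, and multiplex are hypothetical, and the frame payload shape is an assumption rather than the gateway's actual wire format.

```python
import asyncio
import json


async def fake_stream(tokens):
    """Stand-in for one provider's streaming response."""
    for token in tokens:
        await asyncio.sleep(0)  # yield control, as a network read would
        yield token


def sse_frame(payload, event=None):
    """Format one Server-Sent Events frame."""
    head = f"event: {event}\n" if event else ""
    return f"{head}data: {json.dumps(payload)}\n\n"


async def multiplex(streams):
    """Merge {model_name: async token stream} into one SSE frame stream."""
    queue = asyncio.Queue()
    done = object()  # sentinel marking the end of one model's stream

    async def pump(name, stream):
        try:
            async for token in stream:
                await queue.put((name, token))
        finally:
            await queue.put((name, done))

    tasks = [asyncio.create_task(pump(n, s)) for n, s in streams.items()]
    remaining = len(tasks)
    while remaining:
        name, item = await queue.get()
        if item is done:
            remaining -= 1
            yield sse_frame({"model": name}, event="done")
        else:
            yield sse_frame({"model": name, "token": item})


async def collect(streams):
    return [frame async for frame in multiplex(streams)]


frames = asyncio.run(collect({
    "gemini": fake_stream(["Hel", "lo"]),
    "TS_Modal": fake_stream(["Hi"]),
}))
```

A client can then route each frame to the right column by reading the model field, which is what lets one HTTP response carry all side-by-side outputs.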

Eval pipeline: Non-blocking multi-model evaluation via POST /ml/eval/run. The endpoint spawns one eval worker per model, tracking state in Modal Dicts (eval_jobs/eval_runs) with the same spawn-and-poll pattern as training. Progress events flow from eval_runner.py through to the /demos/ml-eval dashboard (golden-set picker, comparison table, per-record diff drawer). A zombie self-heal step probes running FunctionCalls on each status poll.

Three-Track Investigation System

The ML subsystem organizes property inspection understanding into three parallel tracks connected by data contracts:

Track 1 (iOS Capture) --[capture-bundle]--> Track 2 (Perception) --[scene-facts]--> Track 3 (Reasoning)
                                                                                         ^
                                                 [data-pile / KB] -----------------------+

Data Contracts (ml/contracts/): Three JSON Schema contracts (draft 2020-12, ARKit world frame) define the interfaces: capture-bundle (iOS capture manifest), scene-facts (structured perception output), and data-pile (progressive-markdown KB front-matter). A shared loader handles schema validation and markdown front-matter parsing.
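
As a rough illustration of what the shared loader does for data-pile documents, the sketch below splits markdown front matter from the body and reports missing required keys. It is deliberately simplified: it handles only flat "key: value" lines rather than full YAML, it does not perform JSON Schema validation, and the field names (kind, room, version) are hypothetical, not taken from the actual contracts.

```python
def split_front_matter(text):
    """Split a markdown document into (front_matter_dict, body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no front matter present
    try:
        end = lines.index("---", 1)
    except ValueError:
        raise ValueError("front matter is not closed with '---'") from None
    meta = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a 'key: value' line: {line!r}")
        meta[key.strip()] = value.strip()
    return meta, "\n".join(lines[end + 1:])


def missing_keys(meta, required):
    """Return the required keys that the front matter does not define."""
    return [key for key in required if key not in meta]


doc = "---\nkind: data-pile\nroom: kitchen\n---\n# Notes\nWater stain near the sink."
meta, body = split_front_matter(doc)
print(missing_keys(meta, ["kind", "version"]))  # ['version']
```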

Track 2 — Perception (ml/perception/): Turns video/capture bundles into structured scene_facts documents. Two complementary models deployed as separate Modal apps:

  • SAM2 (grizzlebear-sam2-{env}): Video segmentation producing per-frame masks with persistent track IDs
  • Cosmos 3 (grizzlebear-cosmos-{env}): NVIDIA video-language model (Cosmos Reasoner NIM) for room labels, surface descriptions, and scene Q&A
  • Merge: Combines SAM2 bounding boxes with Cosmos captions into a validated scene_facts document
  • Hallucination judge: LLM cross-check of Cosmos captions against input frames

Track 3 — Reasoning (ml/reasoning/): Grounded question-answering over a progressive-markdown knowledge base (the "data pile"). Loads and validates KB docs, answers questions via LLM. A/B eval compares grounded (KB-augmented) vs video-only reasoning quality.

Data Sync Architecture

Mobile app data flows through a session-start sync pattern:

Supabase (project data) --[sync-project]--> Location SQLite DB + S3 blob
                                                     |
iOS DataSyncService <---[list_assets + download]-----+

  • POST /v1/sync-project fetches project tree, generates markdown with YAML frontmatter
  • GET /v1/assets?projectId=all returns deduplicated asset list (window function prevents OOM)
  • Chunked upload/download supports large files (init -> upload chunks -> finalize)
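
The window-function deduplication mentioned for GET /v1/assets can be sketched with SQLite directly. The table layout and column names below are hypothetical; the point is that ROW_NUMBER() picks the newest row per asset inside the database, so Python streams only the surviving rows instead of loading every version into memory. Window functions require SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE assets (asset_id TEXT, project_id TEXT, version INTEGER, path TEXT)"
)
conn.executemany(
    "INSERT INTO assets VALUES (?, ?, ?, ?)",
    [
        ("a1", "p1", 1, "a1_v1.md"),
        ("a1", "p1", 2, "a1_v2.md"),
        ("a2", "p2", 1, "a2_v1.md"),
    ],
)

DEDUP_SQL = """
SELECT asset_id, project_id, version, path FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY asset_id ORDER BY version DESC) AS rn
    FROM assets
)
WHERE rn = 1
ORDER BY asset_id
"""


def latest_assets(connection):
    """Yield one row per asset (its newest version), streamed from the cursor."""
    for row in connection.execute(DEDUP_SQL):
        yield row


for row in latest_assets(conn):
    print(row)
```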

CI/CD Pipeline

The promotion DAG runs inside Modal (ci/webhook.py dispatcher) and is exposed via the deploy dashboard at static.grizzlebear.io/deploy-dashboard/.

jh/rk/cc/fl ──[promote]──> dev ──[deploy+test]──> beta ──[deploy+test]──> main
                                                                              |
                                                              [manual approval gate]
                                                                              |
                                                              [deploy to main + canary tests]

  • Promote to dev: just promote jh dev or dashboard button → dispatch_promote_to_dev (inline: merge → deploy → test)
  • Promote to beta: dashboard button or just promote dev beta → dispatch_dev_to_beta (merge → deploy → test)
  • Approve prod: dashboard button (gated: only enabled when beta tests pass on current tip) or just approve-prod
  • Canary tests: daily at 14:00 UTC via ci/scheduled_cleanup.py::scheduled_main_test_cron
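
The promotion order and the prod approval gate can be expressed as a small sketch. This is not the logic in ci/webhook.py: the PROMOTIONS map, next_env, can_approve_prod, and the shape of the test record ("sha", "passed") are all assumptions made for illustration.

```python
# Each environment promotes to exactly one downstream environment.
PROMOTIONS = {
    "jh": "dev",
    "rk": "dev",
    "cc": "dev",
    "fl": "dev",
    "dev": "beta",
    "beta": "main",
}


def next_env(env):
    """Return the promotion target for env, or None for main."""
    return PROMOTIONS.get(env)


def can_approve_prod(beta_tip, last_beta_test):
    """Enable approval only when beta tests passed on the current beta tip."""
    return (
        last_beta_test is not None
        and last_beta_test["passed"]
        and last_beta_test["sha"] == beta_tip
    )


print(next_env("jh"))                                                 # dev
print(can_approve_prod("abc123", {"sha": "abc123", "passed": True}))  # True
print(can_approve_prod("def456", {"sha": "abc123", "passed": True}))  # False
```

Comparing the tested commit against the current tip is what keeps a green run on an older commit from unlocking a deploy of newer, untested code.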

Test infrastructure uses parallel tier-aware Bruno runners (dev/test_app.sh locally, ci/bruno_in_modal.py in CI) with credentials sourced from dev/.env / Modal Secrets. Bruno collections are split into sub-folders (e.g. ML API → chat/data-files/eval/gateway/generation/health) for finer parallel fan-out, with per-request progress streaming.

Scheduled Automation

dev/automation/ deploys a separate Modal app (grizzlebear-claude-runner) that runs the /improve and /document Claude Code skills headless — isolated from the grizzlebear-api blast radius (own image with Node + Claude CLI + gitleaks, own secrets sourced from env=main). Output commits land on origin/dev with reserved prefixes: IMPROVE: (refreshes IMPROVE.md only) and DOCUMENT: (updates /docs/ only).

Safety is enforced in automation/hardening.py (pure Python with no Modal imports, for easy audit): per-skill allowed-path checks; a git pre-push hook that blocks force pushes, branch deletions, and any push to a branch other than dev; env-var scrubbing; and a gitleaks scan. The runner is manual-only during shake-out (uv run modal run -e jh automation/claude_runner.py::run_improve); the cron schedule= kwargs are commented out in place.
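
Two of these guards can be sketched in a few lines. The code below is not the contents of automation/hardening.py: the names ALLOWED_PATHS, disallowed_paths, and check_pre_push are hypothetical, and the path rules are inferred from the commit-prefix description (IMPROVE.md only, docs/ only). The pre-push input format is the one git passes to the hook on stdin; force-push detection is omitted because it needs an ancestry check against the repository.

```python
import posixpath

ZERO_SHA = "0" * 40  # git sends an all-zero local sha when a ref is being deleted
ALLOWED_REF = "refs/heads/dev"

# Per-skill path rules (assumed from the IMPROVE: / DOCUMENT: commit prefixes).
ALLOWED_PATHS = {
    "improve": lambda path: path == "IMPROVE.md",
    "document": lambda path: path.startswith("docs/"),
}


def disallowed_paths(skill, changed):
    """Return the changed paths that the given skill may not touch."""
    allowed = ALLOWED_PATHS[skill]
    # Normalize first so "docs/../ci/webhook.py" cannot slip through.
    return [p for p in changed if not allowed(posixpath.normpath(p))]


def check_pre_push(stdin_lines):
    """Validate pre-push lines: '<local ref> <local sha> <remote ref> <remote sha>'."""
    errors = []
    for line in stdin_lines:
        local_ref, local_sha, remote_ref, remote_sha = line.split()
        if local_sha == ZERO_SHA:
            errors.append(f"delete of {remote_ref} blocked")
        elif remote_ref != ALLOWED_REF:
            errors.append(f"push to {remote_ref} blocked")
    return errors


print(disallowed_paths("document", ["docs/architecture.md", "dev/app.py"]))
```

A real hook would exit non-zero whenever check_pre_push returns any errors, which makes git abort the push.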

Related Documentation

  • specs/DESIGN_DOC.md — Original system design (covers early services; partially outdated)
  • specs/ML_PIPELINE_SPEC.md — Detailed ML pipeline specification (Gemma 4, training stack)
  • specs/BILLING_SPEC.md — Stripe integration patterns
  • specs/TASKS_SPEC.md — Task management domain model
  • dev/billing/README.md — Billing module setup
  • website/readme.md — Website (password reset UI)
  • localhost/readme.md — Local Docker development