Grizzlebear Architecture
Last updated: 2026-06-03
Grizzlebear is a multi-service Python application deployed on Modal (serverless). It powers the TradeSpark field-inspection platform with real-time communication, ML-assisted planning, data synchronization, and billing.
Service Map
| Service | Domain Pattern | Purpose |
|---|---|---|
| API Gateway | `api-{env}.grizzlebear.io` | FastAPI router that fans out to all sub-apps |
| user_data_app | (sub-app) | Consolidated users + data service (registration, login, profiles, asset CRUD, sync) |
| low_priority_app | (sub-app) | Consolidated voices + geocoding + capture + static_site |
| LiveKit | (sub-app) | WebRTC rooms + agent worker for real-time video sessions |
| ML Gateway | (sub-app) | LLM routing, data capture, training pipeline, dashboard |
| Model Proxy | `model-{env}.grizzlebear.io` | Reverse proxy to on-prem vLLM/Ollama (Mac Mini via UDM firewall) |
| Static Site | `static-{env}.grizzlebear.io` | Centralized demo pages, deploy dashboard, shared TradeSpark theme, Markdown blog engine |
| tsweb | `tsweb-{env}.grizzlebear.io` | Quarantined tsweb-app Supabase integration: projects API, nightly scraper |
| CI/CD | cicd env | Modal-driven promotion DAG, webhook dispatcher, remote deploy/test runners |

Production (main) drops the -{env} suffix: api.grizzlebear.io, data.grizzlebear.io, etc.
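
The naming rule above can be sketched as a small helper. This is an illustration only: `service_host`, `PROD_ENV`, and `BASE_DOMAIN` are hypothetical names, not identifiers from the codebase.

```python
# Hypothetical helper illustrating the "-{env}" suffix rule described above.
PROD_ENV = "main"
BASE_DOMAIN = "grizzlebear.io"


def service_host(service: str, env: str) -> str:
    """Return the hostname for a service prefix in the given environment."""
    if env == PROD_ENV:
        # Production drops the "-{env}" suffix entirely.
        return f"{service}.{BASE_DOMAIN}"
    return f"{service}-{env}.{BASE_DOMAIN}"


print(service_host("api", "dev"))   # non-prod: suffix is kept
print(service_host("api", "main"))  # prod: suffix is dropped
```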
Infrastructure
                       +-----------+
    iOS / Web App ---->|   Modal   |----> Supabase (auth + project data)
                       | (FastAPI) |----> S3 (blobs, configs, telemetry, ML data)
                       |           |----> Stripe (billing)
                       |           |----> ElevenLabs / OpenAI (TTS)
                       |           |----> Mapbox / Google Maps (geocoding)
                       |           |----> Gemini API (LLM)
                       |           |
                       |   Model   |----> Mac Mini on-prem (vLLM / Ollama)
                       |   Proxy   |      via static egress IP + UDM allowlist
                       +-----------+
                             |
                       Modal Volume (/root/models/): shared cross-env, HuggingFace weights

Key Infrastructure Choices
- Serverless compute: Modal. Each service is an independent `modal.App`
- Database: SQLite per-location (S3-backed; writes use ETag CAS + per-key locking via `with_location_db` for cross-container and same-container concurrency safety)
- Blob storage: S3 via `CloudBucketMount`, partitioned by `account/geo_prefix/location`
- Auth: Supabase (magic links, OTP), which replaced the original `badauth` module
- Secrets: Modal Secrets dashboard + `.env` per environment. Supabase uses env-split secrets (`SupabaseProd` for main, `SupabaseDev` for all others). Temporary override (Apr 28): main is hardcoded to `SupabaseDev` while the new prod Supabase project is being restored (see `IMPROVE.md` H19)
- CI/CD: Modal-driven promotion DAG (`ci/webhook.py` dispatcher): `dev` -> `beta` (auto-test) -> `main` (manual gate). Deploy dashboard at `static.grizzlebear.io/deploy-dashboard/`
- Docker bases: Heavy base images pre-baked to AWS ECR (core, livekit-server, livekit-agent, ml-training, session_to_splat). Modal images `FROM` the ECR bases to avoid repeated pip installs

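
The database bullet combines two safeguards: a per-key lock for writers in the same container, and an ETag compare-and-swap for writers in different containers. The sketch below illustrates that combination against an in-memory stand-in for the object store. It is not the `with_location_db` implementation; `FakeBlobStore`, `put_if_match`, and `update_with_cas` are hypothetical names, and a SHA-256 digest stands in for the real ETag.

```python
import hashlib
import threading
from collections import defaultdict
from typing import Callable, Optional, Tuple


class PreconditionFailed(Exception):
    """The stored ETag no longer matches the one the writer read."""


def _etag(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class FakeBlobStore:
    """In-memory stand-in for an object store with conditional writes."""

    def __init__(self) -> None:
        self._blobs = {}
        self._guard = threading.Lock()

    def get(self, key: str) -> Tuple[bytes, Optional[str]]:
        with self._guard:
            data = self._blobs.get(key)
            return (b"", None) if data is None else (data, _etag(data))

    def put_if_match(self, key: str, data: bytes, expected: Optional[str]) -> str:
        with self._guard:
            current = self._blobs.get(key)
            current_tag = None if current is None else _etag(current)
            if current_tag != expected:
                raise PreconditionFailed(key)
            self._blobs[key] = data
            return _etag(data)


_key_locks = defaultdict(threading.Lock)
_key_locks_guard = threading.Lock()


def _lock_for(key: str) -> threading.Lock:
    with _key_locks_guard:
        return _key_locks[key]


def update_with_cas(store: FakeBlobStore, key: str,
                    mutate: Callable[[bytes], bytes], max_retries: int = 5) -> str:
    """Read-modify-write one key; retry if another writer won the race."""
    with _lock_for(key):  # same-container safety
        for _ in range(max_retries):
            data, tag = store.get(key)
            try:
                # cross-container safety: the write only lands if nothing changed
                return store.put_if_match(key, mutate(data), tag)
            except PreconditionFailed:
                continue
        raise RuntimeError(f"gave up on {key} after {max_retries} attempts")


store = FakeBlobStore()
workers = [
    threading.Thread(target=update_with_cas,
                     args=(store, "loc-1.db", lambda d: d + b"x"))
    for _ in range(8)
]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(len(store.get("loc-1.db")[0]))  # every append survived
```

The lock alone would not protect against a second container, and the CAS alone would cause needless retries between threads in the same process, which is presumably why the two are paired.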
Environments
| Name | Branch | URL pattern | Notes |
|---|---|---|---|
| `main` | `main` | `*.grizzlebear.io` | Production, manual deploy gate |
| `beta` | `beta` | `*-beta.grizzlebear.io` | Staging, auto-tested by CI |
| `dev` | `dev` | `*-dev.grizzlebear.io` | Shared development |
| `jh` | `jh` | `*-jh.grizzlebear.io` | Personal (Jeremiah) |
| `rk` | `rk` | `*-rk.grizzlebear.io` | Personal (RK) |
| `cc` | - | `*-cc.grizzlebear.io` | Personal |
| `fl` | - | `*-fl.grizzlebear.io` | Personal |

Brand & Design
The canonical TradeSpark Design System lives at /design/TradeSpark Design System/ (README, colors_and_type.css, preview components, JSX UI kit). It is wired as a Claude Code skill at .claude/skills/tradespark-design. All dev/static_site/ templates follow the system's tokens, voice rules, and eyebrow → .spark-text headline pattern.
Module Layout (dev/)
    dev/
      app.py  # Modal app entry point: includes all sub-apps
      core/  # Shared: env config, URLs, secrets, DB, auth helpers
        core.py  # Modal images (layered on ECR bases), volumes, secrets
        admin.py  # TradesparkEmailAdmin dependency (email-allowlist gate via TS_ADMIN_EMAILS)
        model_versions.py  # Per-env model version pins + resolve_version()
        logging_config.py  # Shared get_logger() factory (env-driven LOG_LEVEL)
        markdown_gen.py  # Project-to-markdown renderer
        notifier.py  # Email notifications via Resend (key from Modal Secret)
      user_data_app/  # Consolidated Modal function: users + data
      users/  # Registration, login, profiles
      data/  # Asset CRUD, chunked upload, project sync
        data.py  # SQLite operations
        data_endpoint.py  # FastAPI routes
        sync.py  # Supabase -> markdown sync (in-process dispatch)
      low_priority_app/  # Consolidated Modal function: voices + geocoding + capture + static_site
      voices/  # TTS providers
      geocoding/  # Reverse geocoding + map images
      capture/  # AR data collection
      images/  # Computer vision (Gemini segments)
      livekit_ts/  # WebRTC rooms + agent
      livekit-recorder/  # Session recording (Go): AES-CBC encrypted writes to S3
      ml/  # ML pipeline
        ml_endpoint.py  # FastAPI routes (gateway, training, serving, data, /health)
        comparison.html  # Side-by-side model comparison UI
        gateway/  # LLM routing + streaming chat (Gemini, Claude, OpenAI, Ollama)
        contracts/  # v0 data contracts (capture-bundle, scene-facts, data-pile) + shared loader + validator
        perception/  # Track 2: video → scene_facts (SAM2 segmentation + Cosmos 3 captions + merge)
          cosmos/  # Cosmos 3 NIM client + hallucination judge
          models/  # Model wrappers (noop, sam2_model)
          scripts/  # Eval runners, clip upload, merge
        reasoning/  # Track 3: grounded Q&A over progressive-markdown KB
          fixtures/  # Question sets + hand-authored fixture pile
          scripts/  # Acceptance + A/B eval runners
        data_pipeline/  # Supabase scraper, synthetic generator, converters
        training/  # Model registry, training configs, trainer stubs
        serving/  # vLLM inference via Modal @modal.cls() + .remote.aio() RPC
        mobile/  # LiteRT on-device model distribution (HF → S3 → iOS)
        eval/  # Evaluation framework
      model_proxy/  # Reverse proxy to on-prem model servers
      websocket/  # Real-time session communication
      static_site/  # Centralized demo pages + blog + docs + deploy dashboard
        endpoint.py  # FastAPI routes for all demo pages, blog, and docs
        deploy_dashboard.py  # Deploy dashboard backend (/api/envs, promote, test, approve-prod)
        docs.py  # Docs corpus loader (renders docs/ + IMPROVE.md at /docs/*)
        content.py  # Markdown blog engine with YAML frontmatter
        assets/  # Shared CSS (TradeSpark tokens) + JS (Site.apiFetch, serviceUrl)
        templates/demos/  # 8 demo pages: ml-comparison, ml-generation, ml-eval, ml-training, traction, mobile-session, model-proxy, websocket-test
        templates/deploy-dashboard/  # Deploy pipeline visualization
      queues/  # Background job pipelines (Modal GPU functions)
        session_to_splat.py  # Session recording → Gaussian splatting
        session_to_splat.video_3d_reconstruction.py  # Full video-to-3D: decrypt → ffmpeg → COLMAP → fastgs
        colmap_undistorted_sfm_export.sh  # COLMAP SfM with checkpoint-based preemption tolerance
        video_to_gsplat.sh  # End-to-end ffmpeg → COLMAP → splatting
      tsweb/  # Quarantined tsweb-app Supabase integration (tsweb.grizzlebear.io)
        endpoint.py  # GET /projects (user-scoped, joins properties)
        queries.py  # Supabase query functions (moved from core/)
        location.py  # Location resolution (moved from core/)
        scraper.py  # Nightly Supabase data scraper (moved from ml/data_pipeline/)
        scheduled.py  # Modal cron for nightly scraper
        client.py  # tsweb_supabase() client factory
      automation/  # Headless Claude runner (separate Modal app: grizzlebear-claude-runner)
        claude_runner.py  # Modal app: runs /improve and /document skills via Claude CLI on weekly cron
        hardening.py  # Safety guards: path allowlists, pre-push hook, gitleaks, env scrub (no Modal imports)
      telemetry/  # Error tracking
      migrations/  # DB schema versioning
      _archived/  # Deprecated modules (billing, devices, etc.)
    ci/  # Modal-driven CI/CD pipeline (lives at repo root)
      webhook.py  # HTTP webhook dispatcher: promote, test, approve-prod
      deploy_in_modal.py  # Remote Modal deploy function
      bruno_in_modal.py  # Remote Bruno test runner (parallel tier-aware)
      _git_in_modal.py  # Git operations inside Modal (merge, branch tips, env states)
      scheduled_cleanup.py  # Cron: non-prod app cleanup + canary tests

ML Pipeline
The ML subsystem captures real user + synthetic data for fine-tuning Gemma 4 models, and runs a three-track investigation system for property inspection understanding.
Three model slots:
| Slot | Base Model | Target Hardware | Training Method |
|---|---|---|---|
| `TS_Modal` | Gemma 4 31B (dense) | H100 80GB (cloud) | Full LoRA |
| `TS_mobile4B` | Gemma 4 E4B | A10G (cloud train + serve) -> mobile | QLoRA 4-bit |
| `TS_mobile2B` | Gemma 4 E2B | A10G (cloud serve) -> mobile | QLoRA 4-bit |

Data flow:
- Nightly Supabase scraper captures project/task/inspection data
- ML Gateway logs all LLM request/response pairs to S3 JSONL
- Synthetic generator creates distillation datasets from Gemini outputs
- Format converters produce model-specific chat templates
- (Future) Unsloth trainer fine-tunes with LoRA/QLoRA
Model storage: A single Modal Volume (grizzlebear-model-weights) lives in the main environment and is mounted cross-env by every app. Weights are stored at /root/models/{slot}/{version}/. Active version per env is declared in dev/core/model_versions.py — either a pinned "vN" or "latest", where "latest" resolves via {slot}/_latest.json on the Volume (written by the trainer on successful runs).
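
A minimal sketch of the pin-or-latest lookup described above. The real signature of resolve_version() is not shown in this document, so the arguments here are assumptions, as are the `MODEL_PINS` table and the `"version"` key inside `_latest.json`. A temporary directory stands in for the Volume mounted at /root/models/.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical pins; the real table lives in dev/core/model_versions.py.
MODEL_PINS = {
    "main": {"TS_Modal": "v3"},
    "dev": {"TS_Modal": "latest"},
}


def resolve_version(slot: str, env: str, volume_root: Path) -> str:
    """Return the concrete version string for a slot in an environment."""
    pin = MODEL_PINS[env][slot]
    if pin != "latest":
        return pin  # pinned "vN": no Volume read needed
    pointer = volume_root / slot / "_latest.json"
    # Assumption: the trainer stores the version under a "version" key.
    return json.loads(pointer.read_text())["version"]


def weights_dir(slot: str, env: str, volume_root: Path) -> Path:
    """Build the {slot}/{version}/ path under the Volume root."""
    return volume_root / slot / resolve_version(slot, env, volume_root)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "TS_Modal").mkdir()
    (root / "TS_Modal" / "_latest.json").write_text(json.dumps({"version": "v7"}))
    print(weights_dir("TS_Modal", "main", root).name)  # pinned
    print(weights_dir("TS_Modal", "dev", root).name)   # resolved via pointer file
```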
Model comparison: /comparison page streams all 6 models (Gemini, Claude, OpenAI + 3 Gemma 4 slots) side-by-side via SSE multiplexing for evaluation and distillation quality checks.
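
One way to multiplex several model streams into a single SSE response is a shared queue fed by one task per model. The sketch below assumes that approach; the gateway's actual implementation is not shown here, and `fake_model_stream`, `multiplex_sse`, and the event names `token` and `done` are hypothetical.

```python
import asyncio
import json
from typing import AsyncIterator, Dict, List


async def fake_model_stream(tokens: List[str], delay: float) -> AsyncIterator[str]:
    """Stand-in for one provider's token stream."""
    for token in tokens:
        await asyncio.sleep(delay)
        yield token


async def multiplex_sse(streams: Dict[str, AsyncIterator[str]]) -> AsyncIterator[str]:
    """Interleave several token streams into one SSE-formatted stream."""
    queue: asyncio.Queue = asyncio.Queue()
    done_marker = object()

    async def pump(name: str, stream: AsyncIterator[str]) -> None:
        try:
            async for token in stream:
                await queue.put((name, token))
        finally:
            # Always signal completion so the consumer loop can terminate.
            await queue.put((name, done_marker))

    tasks = [asyncio.create_task(pump(n, s)) for n, s in streams.items()]
    remaining = len(tasks)
    while remaining:
        name, item = await queue.get()
        if item is done_marker:
            remaining -= 1
            payload = json.dumps({"model": name})
            yield f"event: done\ndata: {payload}\n\n"
        else:
            payload = json.dumps({"model": name, "text": item})
            yield f"event: token\ndata: {payload}\n\n"


async def demo() -> List[str]:
    streams = {
        "gemini": fake_model_stream(["Hel", "lo"], 0.01),
        "TS_Modal": fake_model_stream(["Hi"], 0.015),
    }
    return [event async for event in multiplex_sse(streams)]


for event in asyncio.run(demo()):
    print(event, end="")
```

Tagging each event with the model name lets a single browser connection drive all comparison columns, at the cost of the client having to demultiplex.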
Eval pipeline: Non-blocking multi-model evaluation via POST /ml/eval/run. It spawns one eval worker per model, tracks state in Modal Dicts (eval_jobs/eval_runs), and uses the same spawn-and-poll pattern as training. Progress events flow from eval_runner.py through to the /demos/ml-eval dashboard (golden-set picker, comparison table, per-record diff drawer). As a zombie self-heal, each status poll probes the FunctionCalls still marked as running.
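
The spawn-and-poll pattern with a self-heal check can be sketched in plain Python. This is an analogy, not Modal code: a dict stands in for the Modal Dict, a `concurrent.futures.Future` stands in for a FunctionCall handle, and `spawn_eval`, `poll`, and the status strings are hypothetical.

```python
import time
import uuid
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

eval_jobs: Dict[str, dict] = {}  # job_id -> status record (stand-in for a Modal Dict)
_calls = {}                      # job_id -> Future (stand-in for a FunctionCall handle)
_pool = ThreadPoolExecutor(max_workers=4)


def _eval_worker(job_id: str, model: str) -> None:
    if model == "crashy":
        # Simulates a worker that dies without updating its record.
        raise RuntimeError("worker died")
    eval_jobs[job_id]["status"] = "done"


def spawn_eval(models: List[str]) -> List[str]:
    """Start one worker per model and return immediately with job ids."""
    job_ids = []
    for model in models:
        job_id = uuid.uuid4().hex
        eval_jobs[job_id] = {"model": model, "status": "running"}
        _calls[job_id] = _pool.submit(_eval_worker, job_id, model)
        job_ids.append(job_id)
    return job_ids


def poll(job_id: str) -> dict:
    """Return the job record, repairing it if the worker died silently."""
    record = eval_jobs[job_id]
    call = _calls[job_id]
    if record["status"] == "running" and call.done() and call.exception() is not None:
        # Zombie: the record says "running" but the underlying call has failed.
        record["status"] = "failed"
        record["error"] = repr(call.exception())
    return record


ids = spawn_eval(["gemini", "crashy"])
deadline = time.time() + 5
while time.time() < deadline and any(poll(j)["status"] == "running" for j in ids):
    time.sleep(0.01)
for j in ids:
    print(poll(j)["model"], poll(j)["status"])
```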
Three-Track Investigation System
The ML subsystem organizes property inspection understanding into three parallel tracks connected by data contracts:
    Track 1 (iOS Capture) --[capture-bundle]--> Track 2 (Perception) --[scene-facts]--> Track 3 (Reasoning)
                                                                                                ^
                                                        [data-pile / KB] -----------------------+

Data Contracts (ml/contracts/): Three JSON Schema contracts (draft 2020-12, ARKit world frame) define the interfaces: capture-bundle (iOS capture manifest), scene-facts (structured perception output), and data-pile (progressive-markdown KB front-matter). A shared loader handles schema validation and markdown front-matter parsing.
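
To illustrate what a loader for the data-pile contract has to do, here is a stdlib-only sketch that splits front-matter from the markdown body and checks required keys. It is a stand-in, not the shared loader: it handles only flat `key: value` pairs rather than full YAML, it checks key presence rather than validating against JSON Schema, and the `REQUIRED_KEYS` names are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical required keys; the real contract is defined in ml/contracts/.
REQUIRED_KEYS = ("contract", "version")


def split_front_matter(text: str) -> Tuple[Dict[str, str], str]:
    """Split a markdown document into (front-matter dict, body)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no front-matter block at all
    end = None
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            end = i
            break
    if end is None:
        raise ValueError("unterminated front-matter block")
    meta = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta, "\n".join(lines[end + 1:])


def check_required(meta: Dict[str, str]) -> None:
    """Raise if any required front-matter key is missing."""
    missing = [k for k in REQUIRED_KEYS if k not in meta]
    if missing:
        raise ValueError(f"missing front-matter keys: {missing}")


doc = "---\ncontract: data-pile\nversion: v0\n---\n# Kitchen\nNorth wall has water staining."
meta, body = split_front_matter(doc)
check_required(meta)
print(meta)
print(body)
```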
Track 2 — Perception (ml/perception/): Turns video/capture bundles into structured scene_facts documents. Two complementary models deployed as separate Modal apps:
- SAM2 (`grizzlebear-sam2-{env}`): Video segmentation producing per-frame masks with persistent track IDs
- Cosmos 3 (`grizzlebear-cosmos-{env}`): NVIDIA video-language model (Cosmos Reasoner NIM) for room labels, surface descriptions, and scene Q&A
- Merge: Combines SAM2 bounding boxes with Cosmos captions into a validated `scene_facts` document
- Hallucination judge: LLM cross-check of Cosmos captions against input frames

Track 3 — Reasoning (ml/reasoning/): Grounded question-answering over a progressive-markdown knowledge base (the "data pile"). Loads and validates KB docs, answers questions via LLM. A/B eval compares grounded (KB-augmented) vs video-only reasoning quality.
Data Sync Architecture
Mobile app data flows through a session-start sync pattern:
    Supabase (project data) --[sync-project]--> Location SQLite DB + S3 blob
                                                         |
    iOS DataSyncService <---[list_assets + download]-----+

- `POST /v1/sync-project` fetches project tree, generates markdown with YAML frontmatter
- `GET /v1/assets?projectId=all` returns deduplicated asset list (window function prevents OOM)
- Chunked upload/download supports large files (init -> upload chunks -> finalize)

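
The deduplicated asset list mentioned above can be illustrated with a SQLite window function, which picks the newest row per asset inside the database instead of loading every row into Python. The table layout, column names, and `iter_latest_assets` are assumptions made for this sketch; it needs SQLite 3.25 or newer for `ROW_NUMBER()`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE assets (asset_id TEXT, project_id TEXT, version INTEGER, path TEXT);
    INSERT INTO assets VALUES
        ('a1', 'p1', 1, 'a1_v1.md'),
        ('a1', 'p1', 2, 'a1_v2.md'),
        ('a2', 'p1', 1, 'a2_v1.md');
    """
)

# Rank rows within each asset by version, newest first, and keep rank 1 only.
DEDUP_SQL = """
SELECT asset_id, project_id, version, path
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY asset_id ORDER BY version DESC) AS rn
    FROM assets
)
WHERE rn = 1
ORDER BY asset_id
"""


def iter_latest_assets(connection: sqlite3.Connection):
    """Yield one row per asset; the cursor is iterated rather than fetched whole."""
    for row in connection.execute(DEDUP_SQL):
        yield row


for row in iter_latest_assets(conn):
    print(row)
```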
CI/CD Pipeline
The promotion DAG runs inside Modal (ci/webhook.py dispatcher) and is exposed via the deploy dashboard at static.grizzlebear.io/deploy-dashboard/.
    jh/rk/cc/fl ──[promote]──> dev ──[deploy+test]──> beta ──[deploy+test]──> main
                                                                        |
                                                             [manual approval gate]
                                                                        |
                                                         [deploy to main + canary tests]

- Promote to dev: `just promote jh dev` or dashboard button → `dispatch_promote_to_dev` (inline: merge → deploy → test)
- Promote to beta: dashboard button or `just promote dev beta` → `dispatch_dev_to_beta` (merge → deploy → test)
- Approve prod: dashboard button (gated: only enabled when beta tests pass on current tip) or `just approve-prod`
- Canary tests: daily at 14:00 UTC via `ci/scheduled_cleanup.py::scheduled_main_test_cron`

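
The approve-prod gate above depends on two facts: the last beta test run passed, and it ran against the commit that is currently the beta tip. A minimal sketch of that check follows; `BetaTestRun` and `can_approve_prod` are hypothetical names, not functions from ci/webhook.py.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BetaTestRun:
    commit_sha: str  # commit the tests ran against
    passed: bool


def can_approve_prod(beta_tip_sha: str, last_run: Optional[BetaTestRun]) -> bool:
    """Enable approval only for a passing run on the current beta tip."""
    if last_run is None:
        return False  # never tested
    # A green run on an older commit must not unlock a newer, untested tip.
    return last_run.passed and last_run.commit_sha == beta_tip_sha


print(can_approve_prod("abc123", BetaTestRun("abc123", True)))   # current tip, green
print(can_approve_prod("def456", BetaTestRun("abc123", True)))   # stale result
```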
Test infrastructure uses parallel tier-aware Bruno runners (dev/test_app.sh locally, ci/bruno_in_modal.py in CI) with credentials sourced from dev/.env / Modal Secrets. Bruno collections are split into sub-folders (e.g. ML API → chat/data-files/eval/gateway/generation/health) for finer parallel fan-out, with per-request progress streaming.
Scheduled Automation
dev/automation/ deploys a separate Modal app (grizzlebear-claude-runner) that runs the /improve and /document Claude Code skills headless — isolated from the grizzlebear-api blast radius (own image with Node + Claude CLI + gitleaks, own secrets sourced from env=main). Output commits land on origin/dev with reserved prefixes: IMPROVE: (refreshes IMPROVE.md only) and DOCUMENT: (updates /docs/ only).
Safety is enforced in automation/hardening.py (pure, no Modal imports for easy audit): per-skill allowed-path checks, a git pre-push hook that blocks force/delete pushes and any push to a branch other than dev, env-var scrubbing, and a gitleaks scan. The runner is manual-only during shake-out (uv run modal run -e jh automation/claude_runner.py::run_improve); the cron schedule= kwargs are commented out in place.
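
Two of these guards are simple enough to sketch as pure functions. The code below is an illustration under assumptions, not the contents of hardening.py: `ALLOWED_PATHS`, `disallowed_paths`, and `push_allowed` are hypothetical names, and IMPROVE.md is assumed to sit at the repo root.

```python
from pathlib import PurePosixPath
from typing import Dict, List, Tuple

# Assumed mapping from skill to writable paths, based on the commit-prefix rules:
# IMPROVE touches IMPROVE.md only, DOCUMENT touches docs/ only.
ALLOWED_PATHS: Dict[str, Tuple[str, ...]] = {
    "IMPROVE": ("IMPROVE.md",),
    "DOCUMENT": ("docs/",),
}


def disallowed_paths(skill: str, changed: List[str]) -> List[str]:
    """Return the changed paths that the skill is not allowed to touch."""
    rules = ALLOWED_PATHS[skill]
    rejected = []
    for raw in changed:
        path = PurePosixPath(raw)
        if path.is_absolute() or ".." in path.parts:
            rejected.append(raw)  # refuse anything that could escape the repo
            continue
        normalized = path.as_posix()
        allowed = any(
            normalized.startswith(rule) if rule.endswith("/") else normalized == rule
            for rule in rules
        )
        if not allowed:
            rejected.append(raw)
    return rejected


def push_allowed(ref: str, is_force: bool, is_delete: bool) -> bool:
    """Mirror the pre-push rule: only plain pushes to dev are accepted."""
    return ref == "refs/heads/dev" and not is_force and not is_delete


print(disallowed_paths("IMPROVE", ["IMPROVE.md", "dev/app.py"]))
print(push_allowed("refs/heads/main", False, False))
```

Keeping such checks free of Modal imports means they can be unit-tested and audited without deploying anything.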
Related Documentation
- `specs/DESIGN_DOC.md`: Original system design (covers early services; partially outdated)
- `specs/ML_PIPELINE_SPEC.md`: Detailed ML pipeline specification (Gemma 4, training stack)
- `specs/BILLING_SPEC.md`: Stripe integration patterns
- `specs/TASKS_SPEC.md`: Task management domain model
- `dev/billing/README.md`: Billing module setup
- `website/readme.md`: Website (password reset UI)
- `localhost/readme.md`: Local Docker development