Week 17 — Apr 21-27, 2026
Summary
Two major initiatives this week: (1) config-driven model versioning overhaul (Apr 22) and (2) full-codebase logging standardization (Apr 23). The comparison page got vLLM container warmup controls and live markdown rendering. A codebase-wide cleanup pass fixed duplicate deps/imports, pinned unpinned packages, and addressed minor doc/script issues. The logging sweep introduced a centralized get_logger() factory and converted ~700 print() calls to structured logging across all services. Supabase secrets were split by environment on Apr 24 so dev/beta connect to the dev project and production uses prod. On Apr 25, the data service gained ETag CAS + per-key locking to eliminate silent data loss from concurrent location DB writes. On Apr 27, CLAUDE.md and worktree auto-bootstrap were added so new Claude sessions and worktrees are immediately functional.
A new static_site service (static.grizzlebear.io), which centralizes 5 demo pages under a shared TradeSpark-themed shell, was developed on a worktree branch (3 commits, +2,673/-1,134 lines) and merged to dev on Apr 30 (W18).
26 commits (dev) | 70+ files changed | +3,449 / -1,402 lines
Highlights
Config-Driven Model Versioning (Apr 22)
Replaced three overlapping version-resolution mechanisms with one: a single Modal Volume in main, mounted cross-env, with version pins declared in dev/core/model_versions.py. Each env specifies "latest" (resolved via _latest.json on Volume) or a pinned "vN" per slot. Removes CI preflight manifest jobs and the _current_{env}.json system.
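The pin resolution described above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of dev/core/model_versions.py: the dict shape, the "chat" slot name, and the resolve_version() signature are all assumptions.

```python
# Hypothetical shape: environment -> slot -> "latest" or a pinned "vN".
MODEL_VERSIONS = {
    "dev": {"chat": "latest"},
    "main": {"chat": "v3"},
}


def resolve_version(env: str, slot: str, latest: dict[str, str]) -> str:
    """Return the concrete version for a slot in the given environment.

    `latest` stands in for the parsed contents of _latest.json on the Volume,
    assumed here to map slot name -> newest version string.
    """
    pin = MODEL_VERSIONS[env][slot]
    if pin == "latest":
        return latest[slot]
    return pin
```

With this shape, promoting a version to production would be a one-line change to the pin for that environment.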
vLLM Empty-Output Guard (Apr 22)
All three vLLM generate paths now check for empty output lists before indexing. Previously, edge cases (oversized prompt, OOM, tokenizer failure) caused IndexError → opaque 500. Now returns "" with a logged warning.
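A guard of this kind might look like the sketch below. The helper name is hypothetical, and the input is assumed to be a list of result objects that each expose an `.outputs` list whose items carry a `.text` attribute; the real generate paths may be structured differently.

```python
import logging

logger = logging.getLogger(__name__)


def first_completion_text(request_outputs) -> str:
    """Return the first completion's text, or "" if generation produced nothing.

    Checks both levels before indexing, so an empty result list or an empty
    `.outputs` list yields "" instead of raising IndexError.
    """
    if not request_outputs or not request_outputs[0].outputs:
        logger.warning("generate returned no outputs; returning empty string")
        return ""
    return request_outputs[0].outputs[0].text
```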
LiteRT Convert Endpoint Short-Circuit (Apr 22)
POST /ml/mobile/convert-litert now returns 501 immediately instead of spinning up an A10G container to hit NotImplementedError. Scaffold preserved for when litert-torch gains E-series support.
Trainer Download Guard (Apr 22)
Fixed a bug where a partial HuggingFace download (directory exists but contains no weight files) was treated as "already complete". The trainer now checks for actual .safetensors/.bin files.
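The check could be sketched as below. This is an illustration under assumptions, not the trainer's actual helper: the function name, the recursive search, and the signature are all hypothetical.

```python
from pathlib import Path

WEIGHT_SUFFIXES = (".safetensors", ".bin")


def has_weight_files(model_dir: Path) -> bool:
    """True only if the directory holds at least one weight file, at any depth."""
    if not model_dir.is_dir():
        return False
    return any(
        p.is_file() and p.suffix in WEIGHT_SUFFIXES for p in model_dir.rglob("*")
    )
```

A directory that holds only config or tokenizer files then counts as incomplete, so the download is retried rather than skipped.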
Comparison Page: Warmup + Markdown Rendering (Apr 23)
Per-slot Start buttons spawn non-blocking modal.FunctionCall warmups, with live status polling (stopped/queued/loading/ready/error). Response streams now render as live-parsed markdown via marked + DOMPurify, with time-to-ready displayed after cold start completes.
Centralized Logging (Apr 23)
New dev/core/logging_config.py provides a shared get_logger(name) factory (reads LOG_LEVEL from env, defaults to DEBUG). Replaced duplicated 5-line boilerplate across 24 files with a single import. Then swept ~700 print() calls to structured logger.*() across all services (billing, capture, core, geocoding, livekit, ml, users, voices, websocket). Auth header logging now redacted; exceptions use logger.exception(). CLI entrypoints kept as print() for user-facing terminal output.
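A factory along these lines might look like the following sketch. Only the get_logger(name) signature, the LOG_LEVEL variable, and the DEBUG default come from the notes above; the handler setup and format string are assumptions.

```python
import logging
import os


def get_logger(name: str) -> logging.Logger:
    """Return a named logger whose level comes from LOG_LEVEL (default DEBUG)."""
    level_name = os.environ.get("LOG_LEVEL", "DEBUG").upper()
    level = getattr(logging, level_name, logging.DEBUG)
    if not isinstance(level, int):
        # LOG_LEVEL matched some non-level attribute; fall back to DEBUG.
        level = logging.DEBUG
    logger = logging.getLogger(name)
    logger.setLevel(level)
    if not logger.handlers:
        # Attach a handler only once so repeated calls do not duplicate output.
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

Each module would then need a single line such as `logger = get_logger(__name__)` in place of the per-file boilerplate.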
Codebase Cleanup (Apr 23)
Duplicate dependencies removed and unpinned packages pinned in requirements.txt. Duplicate and unused imports removed across 7 files. Integration test script ---output typo fixed. Duplicate README header disambiguated. build.log and .coverage* added to .gitignore.
Supabase Secrets Split by Environment (Apr 24)
supabase_secret_name is now selected at deploy time based on MODAL_ENVIRONMENT: main → SupabaseProd, everything else → SupabaseDev. Ensures dev/beta/personal environments use the dev Supabase project. Also removed a stale migration comment containing plaintext credentials from core.py (credentials still in git history — rotation tracked in IMPROVE.md).
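The selection rule amounts to a one-line mapping, sketched here. The helper name and the "dev" fallback for an unset MODAL_ENVIRONMENT are assumptions; the actual logic in core.py may differ.

```python
import os


def select_supabase_secret_name(environment: str) -> str:
    """Map a Modal environment name to the Supabase secret to attach."""
    return "SupabaseProd" if environment == "main" else "SupabaseDev"


# Evaluated at deploy time. Falling back to "dev" when the variable is unset
# is an assumption made for this sketch.
supabase_secret_name = select_supabase_secret_name(
    os.environ.get("MODAL_ENVIRONMENT", "dev")
)
```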
Race-Safe Location DB Writes via ETag CAS (Apr 25)
Concurrent writes to the same location's SQLite DB silently lost rows — the download/modify/put cycle had no ETag check, so the last writer won. Two new layers protect writes: (1) S3 conditional writes (IfMatch ETag CAS) for cross-container correctness with retry on 412 PreconditionFailed, and (2) threading.Lock keyed by S3 key for same-container serialization. All write call sites migrated to the new with_location_db(location, fn) entry point. Includes 5 unit tests and a gated real-S3 chaos suite.
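The two layers can be illustrated with a self-contained sketch. To stay runnable without S3, it uses an in-memory FakeStore that mimics If-Match semantics. The names FakeStore, PreconditionFailed, and with_object are hypothetical stand-ins, not the real with_location_db implementation.

```python
import threading
from collections import defaultdict


class PreconditionFailed(Exception):
    """Stand-in for the 412 returned when an If-Match condition fails."""


class FakeStore:
    """In-memory stand-in for an object store: each key holds (body, etag)."""

    def __init__(self):
        self._objects = {}
        self._mutex = threading.Lock()

    def get(self, key):
        with self._mutex:
            return self._objects.get(key, (b"", None))

    def put(self, key, body, if_match):
        with self._mutex:
            _, current_etag = self._objects.get(key, (b"", None))
            if current_etag != if_match:
                raise PreconditionFailed(key)
            new_etag = 0 if current_etag is None else current_etag + 1
            self._objects[key] = (body, new_etag)


# Layer 2: one lock per key serializes writers that share this process.
_key_locks = defaultdict(threading.Lock)
_key_locks_guard = threading.Lock()


def with_object(store, key, fn, max_retries=5):
    """Apply fn(old_body) -> new_body under a per-key lock with CAS retry."""
    with _key_locks_guard:
        lock = _key_locks[key]
    with lock:
        for _ in range(max_retries):
            body, etag = store.get(key)
            try:
                # Layer 1: the write succeeds only if the ETag is unchanged.
                store.put(key, fn(body), if_match=etag)
                return
            except PreconditionFailed:
                continue  # another writer won the race; re-read and retry
        raise RuntimeError(f"CAS retries exhausted for {key}")
```

The lock alone cannot protect against writers in other containers, and CAS alone would make same-container writers burn retries against each other, which is presumably why both layers are used.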
CLAUDE.md + Worktree Auto-Bootstrap (Apr 27)
Added project-level CLAUDE.md files documenting repo layout, scripts, venv, test scope, and deploy workflow. New worktrees auto-bootstrap on first Claude session via a SessionStart hook that symlinks dev/.venv, dev/.env, and .claude/settings.local.json from the main checkout.
Static Site Service (Apr 27 — worktree branch, not yet merged)
New dev/static_site/ module hosts demo HTML, markdown blog posts, and shared CSS/JS at static.grizzlebear.io. Uses a server-side wrapper (header + body + footer) with window.SITE_CONFIG injection for per-env service URLs. Client-side auth gate with localStorage tokens. All 5 existing demo pages migrated from their original services (model-proxy, ml-dashboard, ml-comparison, mobile-session, websocket-test) with 301 redirects from old routes. Rethemed to TradeSpark AI design tokens (light cool surfaces, navy/orange/gold palette).
Daily Breakdown
Apr 27 (3 commits — 1 code, 2 docs-only)
- e2c5473 Add CLAUDE.md and worktree auto-bootstrap
- 3e285c2 Refresh IMPROVE.md with 2026-04-26 review findings (docs-only)
- 4aec68b Add Apr 24-25 changelogs, refresh weekly summary + architecture docs (docs-only)
Apr 25 (1 commit)
- 161dd64 Add ETag CAS + per-key lock for race-safe location DB writes
Apr 24 (1 commit)
- ce6c528 Split Supabase secrets by environment: SupabaseDev for non-main, SupabaseProd for main
Apr 23 (17 commits: 15 code, 2 docs-only)
- fc2b8d8 Add vLLM container warmup + live status polling to comparison page
- 24ba25c Render response streams as live-parsed markdown in comparison page
- c39b525 Show time-to-ready in comparison warmup label
- e1dbb8c Fix triple-dash typo in integration test script
- b7a1b4d Remove duplicate deps and pin unpinned packages in requirements.txt
- 1756f48 Remove duplicate and unused imports across 7 files
- 325bb13 Fix duplicate Setup header in README.md
- a750367 Add build.log and .coverage* to .gitignore, untrack build.log
- 132d9b2 Log Apr 16 changelog and refresh IMPROVE / W16 notes (docs-only)
- 6bfae42 Document W17 + refresh architecture/services (docs-only)
- d9d2104 Centralize logger setup via get_logger() factory
- 21159dd Convert print() to logger in core/ and users/ (sweep 1/5)
- 053c6fc Convert print() to logger in ml/ diagnostic function (sweep 2/5)
- 55fc058 Convert print() to logger in billing/ and capture/ (sweep 3/5)
- 0504247 Convert print() to logger in geocoding/ and voices/ (sweep 4/5)
- 5d09743 Convert print() to logger in livekit_ts/ and websocket/ (sweep 5/5)
- bffa045 Convert print() to logger in billing/stripe_client.py and config.py (final)
Apr 22 (3 commits)
- 86034e2 Simplify model versioning — shared Volume, config-driven env pins
- d202c02 Short-circuit convert-litert endpoint to skip GPU cold start
- d14eecb Guard vLLM output indexing against empty results
New Files
| File | Purpose |
|---|---|
| CLAUDE.md | Project memory: repo layout, scripts, venv, worktree convention, deploy workflow |
| dev/CLAUDE.md | Pointer to root CLAUDE.md with dev/-specific notes |
| .claude/settings.json | SessionStart hook, permission allowlist for Claude sessions |
| dev/.justfiles/setup-worktree.sh | Worktree bootstrap: symlinks .venv, .env, settings.local.json |
| dev/core/model_versions.py | Per-env model version pins + resolve_version() helper |
| dev/core/logging_config.py | Shared get_logger(name) factory (env-driven log level) |
| dev/tests/test_data_service/test_concurrency.py | Gated real-S3 chaos tests (parallel store + chunk upload) |
Modified Files
Apr 25
- dev/core/data.py: with_location_db CAS write wrapper, _download_location_db / _upload_location_db / _seed_location_db with ETag tracking, per-key threading.Lock, all write call sites migrated (+397/-105)
- dev/data/sync.py: sync_project_from_supabase migrated to with_location_db
- dev/tests/test_data_service/test_data_service.py: CAS retry, exhaustion, seed race, lock serialization unit tests (+279)
Apr 24
- dev/core/core.py: env-conditional supabase_secret_name (SupabaseProd for main, SupabaseDev otherwise), removed plaintext credential comment
Apr 22
- dev/core/core.py: ml_model_volume pinned to environment_name="main" for cross-env mount
- dev/ml/ml_endpoint.py: convert-litert 501 short-circuit, vLLM output bounds checks, diagnose surfaces _latest.json
- dev/ml/mobile/litert.py: manifest uses MODEL_VERSIONS instead of "newest in registry"
- dev/ml/training/model_registry.py: MODEL_VERSIONS + _latest.json replaces _current_{env}.json
- dev/ml/training/trainer.py: _has_weight_files() guard for partial downloads
- .gitlab-ci.yml: removed preflight_manifest_{beta,main} jobs
Apr 23 — Comparison page
- dev/ml/comparison.html: warmup UI, markdown rendering, time-to-ready
- dev/ml/ml_endpoint.py: serve_warmup, serve_warmup_status, serve_container_status endpoints
Apr 23 — Cleanup
- requirements.txt: deduped + pinned
- dev/integration-tests.sh: ---output → --output
- README.md: duplicate header fix
- .gitignore: build.log, .coverage*
- 7 files: duplicate/unused import removal
Apr 23 — Logging standardization
- 24 files: get_logger() adoption (replaced boilerplate)
- dev/core/core.py, dev/core/mls.py: print→logger, auth header redaction
- dev/users/users_endpoint.py: print→logger
- dev/ml/ml_endpoint.py: print→logger in diagnostics
- dev/billing/billing.py, billing_endpoint.py, config.py, stripe_client.py: print→logger
- dev/capture/capture.py, capture_endpoint.py: print→logger
- dev/geocoding/geocoding.py, geocoding_endpoint.py, map_3d_tiles.py: print→logger
- dev/voices/voices.py, voices_endpoint.py: print→logger
- 16 files in dev/livekit_ts/, dev/websocket/: print→logger (~500+ calls)