
Week 17 — Apr 21-27, 2026

Summary

Two major initiatives this week: (1) config-driven model versioning overhaul (Apr 22) and (2) full-codebase logging standardization (Apr 23). The comparison page got vLLM container warmup controls and live markdown rendering. A codebase-wide cleanup pass fixed duplicate deps/imports, pinned unpinned packages, and addressed minor doc/script issues. The logging sweep introduced a centralized get_logger() factory and converted ~700 print() calls to structured logging across all services. Supabase secrets were split by environment on Apr 24 so dev/beta connect to the dev project and production uses prod. On Apr 25, the data service gained ETag CAS + per-key locking to eliminate silent data loss from concurrent location DB writes. On Apr 27, CLAUDE.md and worktree auto-bootstrap were added so new Claude sessions and worktrees are immediately functional.

A new static_site service (static.grizzlebear.io) centralizing 5 demo pages under a shared TradeSpark-themed shell was developed on a worktree branch (3 commits, +2,673/-1,134 lines) — merged to dev on Apr 30 (W18).

26 commits (dev) | 70+ files changed | +3,449 / -1,402 lines


Highlights

Config-Driven Model Versioning (Apr 22)

Replaced three overlapping version-resolution mechanisms with one: a single Modal Volume in the main environment, mounted cross-env, with per-env version pins declared in dev/core/model_versions.py. Each env pins every slot to either "latest" (resolved via _latest.json on the Volume) or a fixed "vN". This removes the CI preflight manifest jobs and the _current_{env}.json system.
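A minimal sketch of how a resolve_version() helper could combine the per-env pins with the Volume's _latest.json. The slot name "chat", the pin values, the volume_root parameter, and the {"version": "vN"} file shape are assumptions for illustration; the real dev/core/model_versions.py may be structured differently.

```python
import json
from pathlib import Path

# Hypothetical pin table (assumption): env -> slot -> "latest" or "vN".
MODEL_VERSIONS: dict[str, dict[str, str]] = {
    "main": {"chat": "v3"},
    "dev": {"chat": "latest"},
}


def resolve_version(env: str, slot: str, volume_root: Path) -> str:
    """Return the concrete "vN" for a slot in an environment.

    A "latest" pin is resolved by reading <volume_root>/<slot>/_latest.json,
    assumed here to contain {"version": "vN"}.
    """
    pin = MODEL_VERSIONS[env][slot]
    if pin != "latest":
        return pin  # explicit pin: no Volume lookup needed
    latest_file = volume_root / slot / "_latest.json"
    return json.loads(latest_file.read_text())["version"]
```

One consequence of keeping pins in code is that promoting a version in main becomes a reviewed one-line diff rather than state written by a CI job.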

vLLM Empty-Output Guard (Apr 22)

All three vLLM generate paths now check for empty output lists before indexing. Previously, edge cases (oversized prompt, OOM, tokenizer failure) caused IndexError → opaque 500. Now returns "" with a logged warning.
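A sketch of the guard, assuming results has the shape returned by vLLM's LLM.generate(): a list of RequestOutput objects whose .outputs list holds CompletionOutput items carrying .text. The helper name first_text is hypothetical.

```python
import logging
from typing import Any, Sequence

logger = logging.getLogger(__name__)


def first_text(results: Sequence[Any]) -> str:
    """Return the first completion's text, or "" if vLLM produced nothing."""
    if not results:
        logger.warning("vLLM returned no RequestOutput; returning empty string")
        return ""
    completions = results[0].outputs
    if not completions:
        logger.warning("vLLM RequestOutput has no completions; returning empty string")
        return ""
    return completions[0].text
```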

LiteRT Convert Endpoint Short-Circuit (Apr 22)

POST /ml/mobile/convert-litert now returns 501 immediately instead of spinning up an A10G container only to hit NotImplementedError. The scaffold is preserved for when litert-torch gains E-series support.

Trainer Download Guard (Apr 22)

Fixed a bug where a partial HuggingFace download (directory exists but contains no weight files) was treated as "already complete". The trainer now checks for actual .safetensors/.bin files.
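The check can be sketched as below. The name has_weight_files mirrors the _has_weight_files() guard listed under Modified Files, but this body, including the recursive search into subfolders, is an assumption rather than the trainer's actual code.

```python
from pathlib import Path

WEIGHT_SUFFIXES = (".safetensors", ".bin")


def has_weight_files(model_dir: Path) -> bool:
    """True only if model_dir exists and contains at least one weight file.

    A directory left behind by an interrupted download (config and tokenizer
    files only) returns False, so the caller knows to download again.
    """
    if not model_dir.is_dir():
        return False
    return any(
        p.is_file() and p.suffix in WEIGHT_SUFFIXES
        for p in model_dir.rglob("*")  # assumption: weights may sit in subfolders
    )
```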

Comparison Page: Warmup + Markdown Rendering (Apr 23)

Per-slot Start buttons spawn non-blocking modal.FunctionCall warmups, with live status polling (stopped/queued/loading/ready/error). Response streams now render as live-parsed markdown via marked + DOMPurify, with time-to-ready displayed after cold start completes.
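On the server side, the polling half of this flow might look like the sketch below. It follows the documented Modal pattern of recovering a call with modal.FunctionCall.from_id(call_id) and calling .get(timeout=0), which raises TimeoutError while the call is still running. The warmup_status name and the status mapping are assumptions; to stay testable without Modal, it accepts any object with a compatible get().

```python
from typing import Any, Optional, Protocol


class PollableCall(Protocol):
    """Anything shaped like modal.FunctionCall for polling purposes."""

    def get(self, timeout: Optional[float] = None) -> Any: ...


def warmup_status(call: Optional[PollableCall]) -> str:
    """Map a spawned warmup call to a coarse UI status (hypothetical mapping).

    The real endpoint may also inspect container state to tell "queued"
    apart from "loading"; this sketch reports both as "loading".
    """
    if call is None:
        return "stopped"  # nothing spawned for this slot yet
    try:
        call.get(timeout=0)  # non-blocking; raises TimeoutError while still running
    except TimeoutError:
        return "loading"
    except Exception:
        return "error"  # the warmup function itself raised
    return "ready"
```

A status endpoint could then return warmup_status(modal.FunctionCall.from_id(call_id)) for the call ID stored when the slot's Start button was pressed.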

Centralized Logging (Apr 23)

New dev/core/logging_config.py provides a shared get_logger(name) factory (reads LOG_LEVEL from env, defaults to DEBUG). Replaced duplicated 5-line boilerplate across 24 files with a single import. Then swept ~700 print() calls to structured logger.*() across all services (billing, capture, core, geocoding, livekit, ml, users, voices, websocket). Auth header logging now redacted; exceptions use logger.exception(). CLI entrypoints kept as print() for user-facing terminal output.
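A sketch of what such a factory might look like using only the standard logging module. The format string, the stdout handler, and the redact_auth helper are illustrative assumptions, not the contents of dev/core/logging_config.py.

```python
import logging
import os
import sys

_FORMAT = "%(asctime)s %(levelname)s %(name)s: %(message)s"


def get_logger(name: str) -> logging.Logger:
    """Return a configured logger; level comes from LOG_LEVEL (default DEBUG).

    Safe to call repeatedly: a handler is attached only once per logger.
    """
    logger = logging.getLogger(name)
    level = logging.getLevelName(os.environ.get("LOG_LEVEL", "DEBUG").upper())
    if not isinstance(level, int):  # unknown name such as "VERBOSE"
        level = logging.DEBUG
    logger.setLevel(level)
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(logging.Formatter(_FORMAT))
        logger.addHandler(handler)
        logger.propagate = False  # avoid duplicate lines via the root logger
    return logger


def redact_auth(headers: dict[str, str]) -> dict[str, str]:
    """Copy of headers with the Authorization value masked before logging."""
    return {
        k: ("[REDACTED]" if k.lower() == "authorization" else v)
        for k, v in headers.items()
    }
```

In a service module this reduces to logger = get_logger(__name__), with logger.exception("...") inside except blocks so the traceback is recorded alongside the message.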

Codebase Cleanup (Apr 23)

Duplicate dependencies removed and unpinned packages pinned in requirements.txt. Duplicate and unused imports removed across 7 files. Integration test script ---output typo fixed. Duplicate README header disambiguated. build.log and .coverage* added to .gitignore.

Supabase Secrets Split by Environment (Apr 24)

supabase_secret_name is now selected at deploy time based on MODAL_ENVIRONMENT: main → SupabaseProd, everything else → SupabaseDev. This ensures dev/beta/personal environments use the dev Supabase project. Also removed a stale migration comment containing plaintext credentials from core.py (the credentials remain in git history; rotation is tracked in IMPROVE.md).
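The selection rule fits in a few lines. This is a sketch, not the code in core.py; the function form and the explicit modal_env parameter are assumptions added to make it testable. Defaulting to the dev secret means an unset or mistyped environment name can never reach the prod project.

```python
import os
from typing import Optional


def supabase_secret_name(modal_env: Optional[str] = None) -> str:
    """main -> SupabaseProd; every other env (dev, beta, personal) -> SupabaseDev."""
    env = modal_env if modal_env is not None else os.environ.get("MODAL_ENVIRONMENT", "")
    return "SupabaseProd" if env == "main" else "SupabaseDev"
```

The result would then be passed to modal.Secret.from_name(...) when the app's functions are declared.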

Race-Safe Location DB Writes via ETag CAS (Apr 25)

Concurrent writes to the same location's SQLite DB silently lost rows — the download/modify/put cycle had no ETag check, so the last writer won. Two new layers protect writes: (1) S3 conditional writes (IfMatch ETag CAS) for cross-container correctness with retry on 412 PreconditionFailed, and (2) threading.Lock keyed by S3 key for same-container serialization. All write call sites migrated to the new with_location_db(location, fn) entry point. Includes 5 unit tests and a gated real-S3 chaos suite.
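The two layers can be sketched against an in-memory stand-in for S3. FakeStore, PreconditionFailed, the md5-based ETag, and the extra store parameter are all assumptions made so the sketch runs without AWS; the real entry point is with_location_db(location, fn). On real S3 the conditional PUT would correspond to put_object with IfMatch=etag for updates and IfNoneMatch="*" for the first seed.

```python
from __future__ import annotations

import hashlib
import threading
from collections import defaultdict
from typing import Callable


class PreconditionFailed(Exception):
    """Stand-in for S3's 412 response to a failed conditional write."""


class FakeStore:
    """In-memory bucket with conditional-PUT semantics (test double)."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}
        self._mutex = threading.Lock()

    @staticmethod
    def _etag(data: bytes) -> str:
        return hashlib.md5(data, usedforsecurity=False).hexdigest()

    def get(self, key: str) -> tuple[bytes, str] | None:
        with self._mutex:
            data = self._objects.get(key)
            return None if data is None else (data, self._etag(data))

    def put(self, key: str, data: bytes, if_match: str | None) -> None:
        """if_match=None means "create only if absent" (like If-None-Match: *)."""
        with self._mutex:
            current = self._objects.get(key)
            if if_match is None and current is not None:
                raise PreconditionFailed(key)
            if if_match is not None and (current is None or self._etag(current) != if_match):
                raise PreconditionFailed(key)
            self._objects[key] = data


_key_locks: defaultdict[str, threading.Lock] = defaultdict(threading.Lock)
_key_locks_guard = threading.Lock()


def with_location_db(store: FakeStore, key: str, fn: Callable[[bytes], bytes],
                     max_attempts: int = 5) -> bytes:
    """Read-modify-write `key` with ETag CAS, retrying when the ETag is stale.

    The per-key lock serializes writers inside one process; the CAS check
    catches writers in other containers that the lock cannot see.
    """
    with _key_locks_guard:
        lock = _key_locks[key]
    with lock:
        for _ in range(max_attempts):
            current = store.get(key)
            data, etag = current if current is not None else (b"", None)
            new_data = fn(data)
            try:
                store.put(key, new_data, if_match=etag)
                return new_data
            except PreconditionFailed:
                continue  # another writer won: re-read and reapply fn
        raise RuntimeError(f"CAS retries exhausted for {key}")
```

Note that fn must be safe to call more than once, since a conflict re-runs it against the newer bytes.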

CLAUDE.md + Worktree Auto-Bootstrap (Apr 27)

Added project-level CLAUDE.md files documenting repo layout, scripts, venv, test scope, and deploy workflow. New worktrees auto-bootstrap on first Claude session via a SessionStart hook that symlinks dev/.venv, dev/.env, and .claude/settings.local.json from the main checkout.
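The bootstrap step itself is a shell script (dev/.justfiles/setup-worktree.sh); below is a Python rendering of the same idempotent symlink logic, written as an assumption about its behavior rather than a transcription. The bootstrap_worktree name and its parameters are hypothetical.

```python
from pathlib import Path

# Paths shared from the main checkout, relative to the repo root.
LINKED_PATHS = ("dev/.venv", "dev/.env", ".claude/settings.local.json")


def bootstrap_worktree(main_checkout: Path, worktree: Path) -> list[Path]:
    """Symlink shared, untracked files from the main checkout into a worktree.

    Skips anything already present in the worktree or missing from the main
    checkout, so running it on every session start is harmless.
    Returns the links it created.
    """
    created: list[Path] = []
    for rel in LINKED_PATHS:
        src = main_checkout / rel
        dst = worktree / rel
        if dst.exists() or dst.is_symlink() or not src.exists():
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        dst.symlink_to(src.resolve(), target_is_directory=src.is_dir())
        created.append(dst)
    return created
```

In a real hook, the main checkout's path could be taken from the first entry of git worktree list --porcelain, which always lists the main worktree first.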

Static Site Service (Apr 27, worktree branch; merged to dev Apr 30 in W18)

New dev/static_site/ module hosts demo HTML, markdown blog posts, and shared CSS/JS at static.grizzlebear.io. Uses a server-side wrapper (header + body + footer) with window.SITE_CONFIG injection for per-env service URLs. Client-side auth gate with localStorage tokens. All 5 existing demo pages migrated from their original services (model-proxy, ml-dashboard, ml-comparison, mobile-session, websocket-test) with 301 redirects from old routes. Rethemed to TradeSpark AI design tokens (light cool surfaces, navy/orange/gold palette).
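A sketch of the server-side wrapper with window.SITE_CONFIG injection. The render_page name, its parameters, and the config keys used in practice are assumptions. The one detail worth copying is escaping "<" in the serialized JSON so a config value containing "</script>" cannot terminate the inline script early.

```python
import html
import json


def render_page(title: str, header: str, body: str, footer: str,
                site_config: dict) -> str:
    """Wrap a page body in the shared shell and inject per-env config."""
    # \u003c is a valid JSON and JS escape for "<", so the parsed value is unchanged.
    config_js = json.dumps(site_config).replace("<", "\\u003c")
    return (
        "<!doctype html>\n<html>\n<head>\n"
        f"<title>{html.escape(title)}</title>\n"
        f"<script>window.SITE_CONFIG = {config_js};</script>\n"
        "</head>\n<body>\n"
        f"{header}\n{body}\n{footer}\n"
        "</body>\n</html>\n"
    )
```

Demo pages then read service URLs from window.SITE_CONFIG instead of hard-coding per-env hosts.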


Daily Breakdown

Apr 27 (3 commits — 1 code, 2 docs-only)

  • e2c5473 Add CLAUDE.md and worktree auto-bootstrap
  • 3e285c2 Refresh IMPROVE.md with 2026-04-26 review findings (docs-only)
  • 4aec68b Add Apr 24-25 changelogs, refresh weekly summary + architecture docs (docs-only)

Apr 25 (1 commit)

  • 161dd64 Add ETag CAS + per-key lock for race-safe location DB writes

Apr 24 (1 commit)

  • ce6c528 Split Supabase secrets by environment: SupabaseDev for non-main, SupabaseProd for main

Apr 23 (17 commits)

  • fc2b8d8 Add vLLM container warmup + live status polling to comparison page
  • 24ba25c Render response streams as live-parsed markdown in comparison page
  • c39b525 Show time-to-ready in comparison warmup label
  • e1dbb8c Fix triple-dash typo in integration test script
  • b7a1b4d Remove duplicate deps and pin unpinned packages in requirements.txt
  • 1756f48 Remove duplicate and unused imports across 7 files
  • 325bb13 Fix duplicate Setup header in README.md
  • a750367 Add build.log and .coverage* to .gitignore, untrack build.log
  • 132d9b2 Log Apr 16 changelog and refresh IMPROVE / W16 notes (docs-only)
  • 6bfae42 Document W17 + refresh architecture/services (docs-only)
  • d9d2104 Centralize logger setup via get_logger() factory
  • 21159dd Convert print() to logger in core/ and users/ (sweep 1/5)
  • 053c6fc Convert print() to logger in ml/ diagnostic function (sweep 2/5)
  • 55fc058 Convert print() to logger in billing/ and capture/ (sweep 3/5)
  • 0504247 Convert print() to logger in geocoding/ and voices/ (sweep 4/5)
  • 5d09743 Convert print() to logger in livekit_ts/ and websocket/ (sweep 5/5)
  • bffa045 Convert print() to logger in billing/stripe_client.py and config.py (final)

Apr 22 (3 commits)

  • 86034e2 Simplify model versioning — shared Volume, config-driven env pins
  • d202c02 Short-circuit convert-litert endpoint to skip GPU cold start
  • d14eecb Guard vLLM output indexing against empty results

New Files

  • CLAUDE.md — Project memory: repo layout, scripts, venv, worktree convention, deploy workflow
  • dev/CLAUDE.md — Pointer to root CLAUDE.md with dev/-specific notes
  • .claude/settings.json — SessionStart hook, permission allowlist for Claude sessions
  • dev/.justfiles/setup-worktree.sh — Worktree bootstrap: symlinks .venv, .env, settings.local.json
  • dev/core/model_versions.py — Per-env model version pins + resolve_version() helper
  • dev/core/logging_config.py — Shared get_logger(name) factory (env-driven log level)
  • dev/tests/test_data_service/test_concurrency.py — Gated real-S3 chaos tests (parallel store + chunk upload)

Modified Files

Apr 25

  • dev/core/data.py — with_location_db CAS write wrapper, _download_location_db/_upload_location_db/_seed_location_db with ETag tracking, per-key threading.Lock, all write call sites migrated (+397/-105)
  • dev/data/sync.py — sync_project_from_supabase migrated to with_location_db
  • dev/tests/test_data_service/test_data_service.py — CAS retry, exhaustion, seed race, lock serialization unit tests (+279)

Apr 24

  • dev/core/core.py — env-conditional supabase_secret_name (SupabaseProd for main, SupabaseDev otherwise), removed plaintext credential comment

Apr 22

  • dev/core/core.py — ml_model_volume pinned to environment_name="main" for cross-env mount
  • dev/ml/ml_endpoint.py — convert-litert 501 short-circuit, vLLM output bounds checks, diagnose surfaces _latest.json
  • dev/ml/mobile/litert.py — manifest uses MODEL_VERSIONS instead of "newest in registry"
  • dev/ml/training/model_registry.py — MODEL_VERSIONS + _latest.json replaces _current_{env}.json
  • dev/ml/training/trainer.py — _has_weight_files() guard for partial downloads
  • .gitlab-ci.yml — removed preflight_manifest_{beta,main} jobs

Apr 23 — Comparison page

  • dev/ml/comparison.html — warmup UI, markdown rendering, time-to-ready
  • dev/ml/ml_endpoint.py — serve_warmup, serve_warmup_status, serve_container_status endpoints

Apr 23 — Cleanup

  • requirements.txt — deduped + pinned
  • dev/integration-tests.sh — ---output → --output
  • README.md — duplicate header fix
  • .gitignore — build.log, .coverage*
  • 7 files — duplicate/unused import removal

Apr 23 — Logging standardization

  • 24 files — get_logger() adoption (replaced boilerplate)
  • dev/core/core.py, dev/core/mls.py — print→logger, auth header redaction
  • dev/users/users_endpoint.py — print→logger
  • dev/ml/ml_endpoint.py — print→logger in diagnostics
  • dev/billing/billing.py, billing_endpoint.py, config.py, stripe_client.py — print→logger
  • dev/capture/capture.py, capture_endpoint.py — print→logger
  • dev/geocoding/geocoding.py, geocoding_endpoint.py, map_3d_tiles.py — print→logger
  • dev/voices/voices.py, voices_endpoint.py — print→logger
  • 16 files in dev/livekit_ts/, dev/websocket/ — print→logger (~500+ calls)