Week 16 — Apr 14-20, 2026

Summary

Major vLLM serving milestone: all three Gemma 4 model slots now serve real inference via Modal .remote.aio() RPC. New iOS LiteRT download pipeline delivers on-device model bundles via presigned S3 URLs. Production hostname bug fixed for auth flow on comparison/dashboard pages.

8 commits (dev + jh) | 21 files changed | +2,737 / -865 lines


Highlights

Real vLLM Serving — All 3 Gemma 4 Slots (Apr 16)

Moved all three Gemma 4 classes from stubs to live inference. Serving uses Modal's @modal.cls() + .remote.aio() pattern rather than @web_server, which has 303 redirect issues. E2B runs on A10G because T4 lacks enough shared memory for Triton attention. For cost control, min_containers=0 with a 600s scaledown.

iOS LiteRT Download Pipeline (Apr 16)

New /ml/mobile/ endpoints let iOS apps fetch .litertlm model bundles. Base v0 bundles come from HuggingFace pre-converted repos. Manifest endpoint returns version, file size, sha256, and presigned S3 URL. Verified on dev: E2B (2.58 GB), E4B (3.65 GB).
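
A minimal sketch of how a client could check a downloaded bundle against the manifest fields listed above. The field names (`version`, `size_bytes`, `sha256`, `url`) and both helper functions are assumptions for illustration, not the actual /ml/mobile/ schema.

```python
import hashlib
import os


def build_manifest(path: str, version: str, url: str) -> dict:
    """Describe a bundle file: version, size in bytes, SHA-256 digest, download URL."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in 1 MiB chunks so multi-gigabyte bundles never sit fully in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return {
        "version": version,
        "size_bytes": os.path.getsize(path),
        "sha256": digest.hexdigest(),
        "url": url,  # hypothetical placeholder; the real endpoint returns a presigned S3 URL
    }


def verify_bundle(path: str, manifest: dict) -> bool:
    """Return True only if the file's size and SHA-256 both match the manifest."""
    if os.path.getsize(path) != manifest["size_bytes"]:
        return False  # cheap check first: a truncated download fails here
    actual = build_manifest(path, manifest["version"], manifest["url"])
    return actual["sha256"] == manifest["sha256"]
```

Checking the size before hashing lets a truncated multi-gigabyte download fail fast, before any hashing work is done.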

Model Comparison UI (Apr 14)

New /comparison page for side-by-side streaming evaluation across all 6 models: Gemini, Claude, OpenAI (teachers) and TS_Modal, TS_mobile4B, TS_mobile2B (Gemma 4 slots). Each model has a version selector. Uses SSE multiplexing from chat.py.
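
The SSE multiplexing idea can be sketched as follows: several per-model token streams are merged into a single event stream, with each event tagged by model so the page can route tokens to the right column. This is an illustration under assumptions, not the code in chat.py; `multiplex_sse` and the event payload shape are hypothetical.

```python
import asyncio
import json
from typing import AsyncIterator, Dict


async def multiplex_sse(streams: Dict[str, AsyncIterator[str]]) -> AsyncIterator[str]:
    """Merge several per-model token streams into one SSE-formatted stream."""
    queue: asyncio.Queue = asyncio.Queue()
    done = object()  # sentinel marking the end of one model's stream

    async def pump(model: str, stream: AsyncIterator[str]) -> None:
        # Forward every token to the shared queue, then signal completion,
        # even if the upstream model raised part-way through.
        try:
            async for token in stream:
                await queue.put((model, token))
        finally:
            await queue.put((model, done))

    tasks = [asyncio.create_task(pump(m, s)) for m, s in streams.items()]
    remaining = len(tasks)
    try:
        while remaining:
            model, item = await queue.get()
            if item is done:
                remaining -= 1
                payload = {"model": model, "done": True}
            else:
                payload = {"model": model, "token": item}
            # One SSE event per token: a "data:" line followed by a blank line.
            yield f"data: {json.dumps(payload)}\n\n"
    finally:
        # If the client disconnects early, stop the upstream readers.
        for task in tasks:
            task.cancel()
```

Because all models share one queue, a slow model never blocks tokens from a fast one; events arrive in whatever order the models produce them.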

Volume Environment Isolation Fix (Apr 14)

The _current.json version pointer is now namespaced as _current_{env}.json, so dev, jh, and main each track their active model version independently on the shared Volume.
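
A minimal sketch of the per-environment pointer, assuming the pointer file holds a small JSON object with a `version` key. That payload shape and the helper names (`pointer_path`, `set_current`, `get_current`) are hypothetical; only the `_current_{env}.json` naming comes from the change itself.

```python
import json
from pathlib import Path
from typing import Optional

KNOWN_ENVS = ("dev", "jh", "main")


def pointer_path(volume_root: Path, env: str) -> Path:
    """Return the env-specific pointer file, e.g. <root>/_current_dev.json."""
    if env not in KNOWN_ENVS:
        raise ValueError(f"unknown env: {env!r}")
    return volume_root / f"_current_{env}.json"


def set_current(volume_root: Path, env: str, version: str) -> None:
    """Record the active model version for one environment only."""
    pointer_path(volume_root, env).write_text(json.dumps({"version": version}))


def get_current(volume_root: Path, env: str) -> Optional[str]:
    """Read the active version for an environment, or None if never set."""
    path = pointer_path(volume_root, env)
    if not path.exists():
        return None
    return json.loads(path.read_text())["version"]
```

With one file per environment, promoting a version on dev leaves the jh and main pointers untouched.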

Production Hostname Fix (Apr 16)

The USERS_BASE derivation in comparison.html and dashboard.html used .replace('ml-', 'users-'), which silently failed on prod domains (no -env suffix) and caused a 404 on login. The fix handles both patterns.
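
The failure mode can be illustrated with a short sketch, written in Python here although the pages do the rewrite in JavaScript. The hostnames below are invented, and `users_host` is a hypothetical helper; the assumption is only that dev hosts contain an `ml-<env>` label while prod hosts contain a bare `ml` label.

```python
import re

# Match the "ml" label whether or not an "-env" suffix follows it.
# (?<![a-z0-9]) keeps words that merely end in "ml" (such as "html") from matching.
_ML_LABEL = re.compile(r"(?<![a-z0-9])ml(?=(?:-[a-z0-9]+)?\.)")


def users_host(ml_host: str) -> str:
    """Rewrite an ml service hostname to the matching users service hostname."""
    return _ML_LABEL.sub("users", ml_host, count=1)


# Hypothetical hostnames for illustration only.
dev_host = "acme--ml-dev.example.run"
prod_host = "acme--ml.example.run"

# The naive rewrite works on dev but leaves the prod host unchanged,
# because prod has no "ml-" substring to replace.
naive_prod = prod_host.replace("ml-", "users-")
```

Here `users_host(dev_host)` gives `acme--users-dev.example.run` and `users_host(prod_host)` gives `acme--users.example.run`, while `naive_prod` is still the original ml hostname.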

SSE_ONLY Encryption Removal (Apr 14, jh branch)

Removed the SSE_ONLY encryption type. Video recordings are stored as PLAINTEXT until a proper per-account video encryption strategy is designed.


Daily Breakdown

Apr 14 (3 commits)

  • 2a7024a Add model comparison demo + fix Volume env isolation
  • b9b7b53 Add project documentation — architecture, service catalog, and changelogs
  • 9bc71f9 Remove SSE_ONLY encryption type — video encryption strategy TBD (jh branch)

Apr 16 (5 commits)

  • 3c68e07 Wire up real vLLM serving for TS_mobile2B (Gemma 4 E2B)
  • 2a7f598 Fix USERS_BASE hostname rewrite to work on prod domains
  • 40ccb75 Route E4B and 31B slots through vLLM providers
  • b0fd993 Add iOS LiteRT download pipeline
  • 4ac3982 docs and iterations/improvements logs

New Files

File                      Purpose
dev/ml/comparison.html    Side-by-side streaming comparison UI for all 6 models
dev/ml/mobile/litert.py   LiteRT download, upload, and conversion pipeline for iOS

Modified Files

  • dev/core/core.py — aiohttp added to mlImage; transformers>=5.5.0 override
  • dev/core/data.py — ML artifact S3 helpers (store_ml_artifact, presigned_ml_artifact_get, etc.), SSE_ONLY removal
  • dev/ml/gateway/chat.py — Routes Claude, OpenAI, model slots; timing metadata
  • dev/ml/gateway/providers.py — Provider adapters for Claude/OpenAI/vLLM; all 3 slot dispatch
  • dev/ml/ml_endpoint.py — VllmE2B/VllmE4B/Vllm31B Modal classes; /comparison route; /ml/mobile/ endpoints; diagnostic tools
  • dev/ml/serving/vllm_server.py — Real vLLM inference with stub fallback; E4B/31B classes
  • dev/ml/training/model_registry.py — Env-aware _current_{env}.json; ModelVersion schema extensions
  • dev/ml/comparison.html — USERS_BASE hostname fix
  • dev/ml/dashboard.html — USERS_BASE hostname fix
  • specs/ML_PIPELINE_SPEC.md — Updated with env isolation docs, LiteRT pipeline