
Apr 13, 2026

1 commit | Theme: ML pipeline scaffolding

Changes

New Feature

Gemma 4 Training/Serving/Mobile Pipeline Stubs (1b42581)

Scaffolded Phases 3-5 of the ML pipeline specification for three Gemma 4 model slots:

| Slot | Base Model | Hardware | Method |
|---|---|---|---|
| TS_Modal | Gemma 4 31B (dense) | H100 80GB | Full LoRA |
| TS_mobile4B | Gemma 4 E4B | A10G | QLoRA 4-bit |
| TS_mobile2B | Gemma 4 E2B | T4 | QLoRA 4-bit |
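
For illustration, the slot table could be mirrored in code roughly as follows. This is a minimal sketch, not the contents of model_registry.py: only the `ModelSlot` name and the slot values come from this changelog, while `SlotSpec`, `SLOT_SPECS`, `spec_for`, and the enum member names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class ModelSlot(str, Enum):
    """Slot identifiers as listed in the table above (member names are assumed)."""
    TS_MODAL = "TS_Modal"
    TS_MOBILE_4B = "TS_mobile4B"
    TS_MOBILE_2B = "TS_mobile2B"


@dataclass(frozen=True)
class SlotSpec:
    """Hypothetical per-slot record; field names are illustrative only."""
    base_model: str
    gpu: str
    method: str

    @property
    def quantized(self) -> bool:
        # Derived from the method label: the QLoRA rows are marked "4-bit".
        return "4-bit" in self.method


# Values copied from the changelog table.
SLOT_SPECS = {
    ModelSlot.TS_MODAL: SlotSpec("Gemma 4 31B (dense)", "H100 80GB", "Full LoRA"),
    ModelSlot.TS_MOBILE_4B: SlotSpec("Gemma 4 E4B", "A10G", "QLoRA 4-bit"),
    ModelSlot.TS_MOBILE_2B: SlotSpec("Gemma 4 E2B", "T4", "QLoRA 4-bit"),
}


def spec_for(slot_name: str) -> SlotSpec:
    """Look up a slot by its string name; raises ValueError for unknown slots."""
    return SLOT_SPECS[ModelSlot(slot_name)]
```

Keying the mapping on an enum means an unknown slot name fails at lookup time rather than later, when a training job is launched.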

New files:

  • ml/training/model_registry.py — ModelSlot enum, version tracking, S3 registry (ml[-env]/models/registry.json)
  • ml/training/training_config.py — Per-slot LoRA/QLoRA training hyperparameters
  • ml/training/trainer.py — HuggingFace model download to Modal Volume + Unsloth training stubs
  • ml/serving/vllm_server.py — vLLM inference stub with guided_json for schema enforcement
  • ml/serving/ab_router.py — A/B version routing stub (canary + shadow traffic)
  • ml/mobile/quantize.py — GGUF export + mobile latency/memory benchmark stubs
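
As a rough sketch of what canary plus shadow routing in ab_router.py might look like once the stub is filled in: the function below picks the version that answers a request and, optionally, a second version that receives a mirrored copy. All names here (`route`, `RoutingDecision`, `canary_fraction`) are hypothetical, and hash-based bucketing is one common approach rather than the method the commit uses.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

_BUCKETS = 10_000  # routing resolution: 0.01% of traffic per bucket


@dataclass(frozen=True)
class RoutingDecision:
    serve_version: str             # version whose response is returned to the caller
    shadow_version: Optional[str]  # version that also receives a copy; its output is discarded


def _bucket(request_id: str) -> int:
    """Map a request ID to a stable bucket in [0, _BUCKETS)."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % _BUCKETS


def route(
    request_id: str,
    stable: str,
    canary: Optional[str] = None,
    canary_fraction: float = 0.0,
    shadow: Optional[str] = None,
) -> RoutingDecision:
    """Choose the serving version for one request (illustrative only)."""
    if not 0.0 <= canary_fraction <= 1.0:
        raise ValueError("canary_fraction must be between 0 and 1")
    threshold = round(canary_fraction * _BUCKETS)
    use_canary = canary is not None and _bucket(request_id) < threshold
    return RoutingDecision(canary if use_canary else stable, shadow)
```

Hashing the request ID instead of drawing a random number keeps the decision deterministic, so a retried request lands on the same version.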

Modified:

  • core/core.py — Added mlTrainingImage (CUDA + Unsloth + vLLM), Modal Volume at /root/models/, HuggingFace secret
  • core/__init__.py — Export new ML symbols
  • ml/ml_endpoint.py — 8 new endpoints for training start/status, serving models/promote, model download/versions, mobile quantize/benchmark; Volume mount; GPU-class Modal functions
  • specs/ML_PIPELINE_SPEC.md — Updated from Gemma 3 to Gemma 4, added model selection rationale and versioning strategy

Model storage: Modal Volume at /root/models/{slot}/{version}/, with the current-version pointer stored in _current.json.
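
The storage layout can be sketched with a few path helpers. This is only an illustration under stated assumptions: the changelog does not say where _current.json lives or what it contains, so the per-slot location and the `{"version": ...}` payload below are guesses, and the helper names are hypothetical.

```python
import json
from pathlib import Path
from typing import Optional

MODELS_ROOT = Path("/root/models")  # Volume mount path from the changelog


def version_dir(slot: str, version: str, root: Path = MODELS_ROOT) -> Path:
    """Directory holding one model version: {root}/{slot}/{version}/."""
    return root / slot / version


def pointer_path(slot: str, root: Path = MODELS_ROOT) -> Path:
    # ASSUMPTION: one _current.json per slot directory.
    return root / slot / "_current.json"


def set_current(slot: str, version: str, root: Path = MODELS_ROOT) -> None:
    """Point the slot at an existing version directory."""
    if not version_dir(slot, version, root).is_dir():
        raise FileNotFoundError(f"no such version: {slot}/{version}")
    pointer = pointer_path(slot, root)
    tmp = pointer.with_suffix(".tmp")
    # ASSUMPTION: the pointer file is a small JSON object with a "version" key.
    tmp.write_text(json.dumps({"version": version}))
    # Write to a temp file first, then rename it over the pointer in one step.
    tmp.replace(pointer)


def get_current(slot: str, root: Path = MODELS_ROOT) -> Optional[str]:
    """Return the current version for a slot, or None if no pointer exists."""
    pointer = pointer_path(slot, root)
    if not pointer.exists():
        return None
    return json.loads(pointer.read_text())["version"]
```

Passing `root` explicitly makes the helpers usable against a temporary directory in tests, without a mounted Volume.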