Apr 13, 2026

1 commit | Theme: ML pipeline scaffolding

Changes

Gemma 4 Training/Serving/Mobile Pipeline Stubs (1b42581)

Scaffolded Phases 3-5 of the ML pipeline specification for three Gemma 4 model slots:

Slot	Base Model	Hardware	Method
`TS_Modal`	Gemma 4 31B (dense)	H100 80GB	Full LoRA
`TS_mobile4B`	Gemma 4 E4B	A10G	QLoRA 4-bit
`TS_mobile2B`	Gemma 4 E2B	T4	QLoRA 4-bit
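
For orientation, the slot table could be mirrored in code roughly as follows. This is a minimal sketch, not the contents of model_registry.py or training_config.py: only the `ModelSlot` name and the table values come from this entry, while the member names, the `SlotSpec` dataclass, and `spec_for` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class ModelSlot(str, Enum):
    """Slot identifiers from the table; member names are assumed."""
    TS_MODAL = "TS_Modal"
    TS_MOBILE_4B = "TS_mobile4B"
    TS_MOBILE_2B = "TS_mobile2B"


@dataclass(frozen=True)
class SlotSpec:
    """Hypothetical container for one table row."""
    base_model: str  # display name from the table, not a Hugging Face repo id
    hardware: str
    method: str

    @property
    def is_4bit(self) -> bool:
        # True for the QLoRA rows, which the table marks as 4-bit.
        return "4-bit" in self.method


SLOT_SPECS = {
    ModelSlot.TS_MODAL: SlotSpec("Gemma 4 31B (dense)", "H100 80GB", "Full LoRA"),
    ModelSlot.TS_MOBILE_4B: SlotSpec("Gemma 4 E4B", "A10G", "QLoRA 4-bit"),
    ModelSlot.TS_MOBILE_2B: SlotSpec("Gemma 4 E2B", "T4", "QLoRA 4-bit"),
}


def spec_for(slot_name: str) -> SlotSpec:
    """Look up a row by slot name; raises ValueError for an unknown slot."""
    return SLOT_SPECS[ModelSlot(slot_name)]
```

Keying the mapping on the enum rather than on raw strings means a mistyped slot name fails at lookup time instead of silently selecting nothing.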

New files:

ml/training/model_registry.py — ModelSlot enum, version tracking, S3 registry (ml[-env]/models/registry.json)
ml/training/training_config.py — Per-slot LoRA/QLoRA training hyperparameters
ml/training/trainer.py — HuggingFace model download to Modal Volume + Unsloth training stubs
ml/serving/vllm_server.py — vLLM inference stub with guided_json for schema enforcement
ml/serving/ab_router.py — A/B version routing stub (canary + shadow traffic)
ml/mobile/quantize.py — GGUF export + mobile latency/memory benchmark stubs

Modified:

core/core.py — Added mlTrainingImage (CUDA + Unsloth + vLLM), Modal Volume at /root/models/, HuggingFace secret
core/__init__.py — Export new ML symbols
ml/ml_endpoint.py — 8 new endpoints for training start/status, serving models/promote, model download/versions, mobile quantize/benchmark; Volume mount; GPU-class Modal functions
specs/ML_PIPELINE_SPEC.md — Updated from Gemma 3 to Gemma 4, added model selection rationale and versioning strategy

Model storage: Modal Volume at /root/models/{slot}/{version}/, with the current version recorded in a _current.json pointer file.
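
A rough sketch of how this layout could be read and written is shown below. Two things are assumptions rather than facts from the commit: that _current.json sits inside each slot directory, and that it holds a single `"version"` key. The helper names are hypothetical as well.

```python
import json
import os
from pathlib import Path
from typing import Optional

MODELS_ROOT = Path("/root/models")  # Volume mount point from this entry
POINTER_NAME = "_current.json"


def version_dir(slot: str, version: str, root: Path = MODELS_ROOT) -> Path:
    """Directory holding one version of one slot."""
    return root / slot / version


def set_current(slot: str, version: str, root: Path = MODELS_ROOT) -> None:
    """Point the slot at an existing version directory."""
    if not version_dir(slot, version, root).is_dir():
        raise FileNotFoundError(f"no such version: {slot}/{version}")
    pointer = root / slot / POINTER_NAME  # assumed location: one pointer per slot
    tmp = pointer.with_name(pointer.name + ".tmp")
    tmp.write_text(json.dumps({"version": version}))  # assumed schema
    os.replace(tmp, pointer)  # swap in one step so readers never see a partial file


def get_current(slot: str, root: Path = MODELS_ROOT) -> Optional[str]:
    """Return the slot's current version, or None if no pointer exists yet."""
    pointer = root / slot / POINTER_NAME
    if not pointer.exists():
        return None
    return json.loads(pointer.read_text())["version"]
```

Writing to a temporary file and renaming it keeps promotion cheap: switching versions rewrites one small JSON file and leaves the model weights where they are.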