Apr 13, 2026
1 commit | Theme: ML pipeline scaffolding
Changes
New Feature
Gemma 4 Training/Serving/Mobile Pipeline Stubs (1b42581)
Scaffolded Phases 3-5 of the ML pipeline specification for three Gemma 4 model slots:
| Slot | Base Model | Hardware | Method |
|---|---|---|---|
| TS_Modal | Gemma 4 31B (dense) | H100 80GB | Full LoRA |
| TS_mobile4B | Gemma 4 E4B | A10G | QLoRA 4-bit |
| TS_mobile2B | Gemma 4 E2B | T4 | QLoRA 4-bit |
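The slot table maps naturally onto an enum plus one config record per slot. The following is only a minimal sketch: the member names of `ModelSlot`, the `SlotConfig` dataclass, its fields, and the `SLOT_CONFIGS` mapping are hypothetical and simply mirror the table's columns; the actual definitions in the commit may look different.

```python
from dataclasses import dataclass
from enum import Enum


class ModelSlot(str, Enum):
    """Slot names taken from the table; member names are an assumption."""
    TS_MODAL = "TS_Modal"
    TS_MOBILE_4B = "TS_mobile4B"
    TS_MOBILE_2B = "TS_mobile2B"


@dataclass(frozen=True)
class SlotConfig:
    """Hypothetical record mirroring the table columns."""
    base_model: str
    hardware: str
    method: str

    @property
    def quantized(self) -> bool:
        # QLoRA rows train against a 4-bit quantized base model.
        return self.method.startswith("QLoRA")


# Values copied from the table above; the mapping itself is illustrative.
SLOT_CONFIGS = {
    ModelSlot.TS_MODAL: SlotConfig("Gemma 4 31B (dense)", "H100 80GB", "Full LoRA"),
    ModelSlot.TS_MOBILE_4B: SlotConfig("Gemma 4 E4B", "A10G", "QLoRA 4-bit"),
    ModelSlot.TS_MOBILE_2B: SlotConfig("Gemma 4 E2B", "T4", "QLoRA 4-bit"),
}
```

Keying the mapping by the enum rather than by raw strings means a mistyped slot name fails at lookup time instead of silently selecting nothing.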
New files:
- `ml/training/model_registry.py`: `ModelSlot` enum, version tracking, S3 registry (`ml[-env]/models/registry.json`)
- `ml/training/training_config.py`: Per-slot LoRA/QLoRA training hyperparameters
- `ml/training/trainer.py`: HuggingFace model download to Modal Volume + Unsloth training stubs
- `ml/serving/vllm_server.py`: vLLM inference stub with `guided_json` for schema enforcement
- `ml/serving/ab_router.py`: A/B version routing stub (canary + shadow traffic)
- `ml/mobile/quantize.py`: GGUF export + mobile latency/memory benchmark stubs
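To illustrate what the canary half of the A/B routing stub might do, here is a minimal sketch of deterministic percentage-based routing. The function name `pick_version`, its parameters, and the hashing scheme are assumptions for illustration; the stub in `ml/serving/ab_router.py` may work differently, and shadow traffic (mirroring requests to a second version without returning its response) is not covered here.

```python
import hashlib


def pick_version(request_id: str, stable: str, canary: str,
                 canary_percent: float) -> str:
    """Route roughly `canary_percent` percent of requests to the canary version.

    Hashing the request ID keeps the decision deterministic, so retries of
    the same request always reach the same model version.
    """
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # buckets 0..9999
    return canary if bucket < canary_percent * 100 else stable
```

A call such as `pick_version("req-123", "v1", "v2", canary_percent=5.0)` returns either `"v1"` or `"v2"`, and the same request ID always yields the same answer.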
Modified:
- `core/core.py`: Added `mlTrainingImage` (CUDA + Unsloth + vLLM), Modal Volume at `/root/models/`, HuggingFace secret
- `core/__init__.py`: Export new ML symbols
- `ml/ml_endpoint.py`: 8 new endpoints for training start/status, serving models/promote, model download/versions, mobile quantize/benchmark; Volume mount; GPU-class Modal functions
- `specs/ML_PIPELINE_SPEC.md`: Updated from Gemma 3 to Gemma 4, added model selection rationale and versioning strategy
Model storage: Modal Volume at /root/models/{slot}/{version}/ with current-version pointer at _current.json.
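The storage layout can be sketched with a pair of path helpers. This is only an illustration under stated assumptions: the helper names are hypothetical, `_current.json` is assumed to sit inside each slot directory, and its payload is assumed to carry a `"version"` key. None of these details are confirmed by the commit summary.

```python
import json
from pathlib import Path

MODELS_ROOT = Path("/root/models")  # Modal Volume mount point from the notes


def version_dir(slot: str, version: str, root: Path = MODELS_ROOT) -> Path:
    """Build the {slot}/{version} directory path under the models root."""
    return root / slot / version


def current_version_dir(slot: str, root: Path = MODELS_ROOT) -> Path:
    """Resolve the active version directory via the slot's pointer file."""
    # Assumption: one pointer file per slot, at {root}/{slot}/_current.json.
    pointer = root / slot / "_current.json"
    # Assumption: the pointer is a JSON object with a "version" key.
    data = json.loads(pointer.read_text())
    return version_dir(slot, data["version"], root)
```

With a pointer file in place, promoting a new version only requires rewriting one small JSON file rather than moving model weights.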