Apr 22, 2026
Commits
86034e2 — Simplify model versioning — shared Volume, config-driven env pins
Collapsed three overlapping version-resolution mechanisms (_current_{env}.json pointers, per-env Volumes, S3 registry preflight) into one: a single Modal Volume living in the main environment, mounted cross-env by every app. Active version per env is now declared in dev/core/model_versions.py as either a literal "vN" or "latest", where "latest" resolves through a single {slot}/_latest.json pointer that the trainer advances on successful runs.
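The resolution order described above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of core/model_versions.py: the slot name, the env pins, the volume_root parameter, and the "version" key inside _latest.json are all assumptions made for the example.

```python
import json
from pathlib import Path

# Hypothetical config: the slot name and per-env pins are illustrative only.
MODEL_VERSIONS = {
    "example_slot": {"main": "v3", "beta": "latest"},
}


def resolve_version(slot: str, env: str, volume_root: str) -> str:
    """Return the active version for a slot in a given environment.

    A literal pin such as "v3" is returned as-is. The pin "latest" is
    resolved through {slot}/_latest.json on the shared volume.
    """
    pin = MODEL_VERSIONS[slot][env]
    if pin != "latest":
        return pin

    pointer = Path(volume_root) / slot / "_latest.json"
    with pointer.open() as f:
        # Assumption: the pointer file stores the version under a "version" key.
        return json.load(f)["version"]
```

With this shape, promoting a version in one environment is a one-line config change, while environments pinned to "latest" follow whatever the trainer last wrote to the pointer.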
Also fixes a trainer bug where a partial HuggingFace download (directory exists but no weight files) was treated as "already complete" — now checks for actual .safetensors / .bin files.
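A guard of this kind might look like the sketch below. It is an approximation of the idea rather than the trainer's actual _has_weight_files(): the function name without the underscore, the recursive search, and the exact suffix tuple are assumptions.

```python
from pathlib import Path

# Suffixes named in the commit message; treated here as the full set.
WEIGHT_SUFFIXES = (".safetensors", ".bin")


def has_weight_files(model_dir: str) -> bool:
    """Return True only if the directory contains at least one weight file.

    A directory that exists but holds only config or tokenizer files
    (for example after an interrupted download) is reported as incomplete.
    """
    root = Path(model_dir)
    if not root.is_dir():
        return False
    # Assumption: weights may sit in subdirectories, so search recursively.
    return any(
        p.is_file() and p.suffix in WEIGHT_SUFFIXES for p in root.rglob("*")
    )
```

Compared with a "listdir is non-empty" check, this treats a directory containing only config.json as not yet downloaded, so the download is retried instead of skipped.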
CI preflight manifest jobs (preflight_manifest_beta, preflight_manifest_main) removed — S3 registry no longer drives path resolution.
New files:
core/model_versions.py — MODEL_VERSIONS config + resolve_version() helper
Changed:
core/core.py — ml_model_volume pinned to environment_name="main" for cross-env mount
ml/ml_endpoint.py — diagnose now surfaces _latest.json
ml/mobile/litert.py — manifest resolves via MODEL_VERSIONS instead of "newest LiteRT in registry"
ml/training/model_registry.py — drop _current_{env}.json and legacy fallback; uses MODEL_VERSIONS then _latest.json
ml/training/trainer.py — _has_weight_files() guard replaces naive "listdir is non-empty" skip check
.gitlab-ci.yml — drop preflight_manifest_{beta,main} jobs
d202c02 — Short-circuit convert-litert endpoint to skip GPU cold start
The POST /ml/mobile/convert-litert endpoint was spawning an A10G container only to hit NotImplementedError in litert.py. Now returns HTTP 501 at the endpoint level before any GPU allocation, avoiding a ~60s cold-start penalty. The modal_convert_litert Modal function remains as scaffold for when litert-torch E-series support stabilizes upstream.
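The ordering that matters here, rejecting the request before anything GPU-backed is spawned, can be illustrated with a framework-free sketch. The handler name, the spawn_gpu_job callable, and the conversion_supported flag are hypothetical and stand in for whatever the real endpoint in ml/ml_endpoint.py uses.

```python
from http import HTTPStatus
from typing import Callable, Tuple


def handle_convert_litert(
    spawn_gpu_job: Callable[[], str],
    conversion_supported: bool = False,
) -> Tuple[int, dict]:
    """Return (status_code, body) for a convert-litert request.

    While conversion is unsupported, answer 501 immediately so that
    spawn_gpu_job (the expensive, cold-starting call) is never invoked.
    """
    if not conversion_supported:
        return (
            int(HTTPStatus.NOT_IMPLEMENTED),
            {"detail": "LiteRT conversion is not implemented yet"},
        )
    # Only reached once conversion is actually supported.
    return int(HTTPStatus.ACCEPTED), {"job_id": spawn_gpu_job()}
```

Keeping the scaffolded GPU function in place means that flipping the flag later is the only change needed at the endpoint level.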
Changed:
ml/ml_endpoint.py — early 501 return for convert-litert
d14eecb — Guard vLLM output indexing against empty results
VllmE2B/VllmE4B/Vllm31B.generate() and the _make_vllm_generate factory accessed outputs[0].outputs[0].text with no bounds check. An empty outputs list (oversized prompt, OOM, tokenizer failure) raised IndexError, surfacing as an opaque 500 through .remote.aio(). Now each path checks both list depths, logs a warning with prompt length and max_tokens, and returns "" so requests complete with a structured response.
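A minimal sketch of the two-level bounds check is shown below. The helper name first_completion_text and the log message wording are assumptions; the only thing taken from the commit is the access pattern outputs[0].outputs[0].text and the fallback to an empty string.

```python
import logging

logger = logging.getLogger(__name__)


def first_completion_text(outputs, prompt: str, max_tokens: int) -> str:
    """Safely extract outputs[0].outputs[0].text from a vLLM-style result.

    Returns "" and logs a warning if either list level is empty,
    instead of letting an IndexError propagate to the caller.
    """
    if not outputs or not outputs[0].outputs:
        logger.warning(
            "vLLM returned no completions (prompt_len=%d, max_tokens=%d)",
            len(prompt),
            max_tokens,
        )
        return ""
    return outputs[0].outputs[0].text
```

Returning "" keeps the response shape stable for callers, at the cost that an empty generation and a failed generation look the same unless the warning log is consulted.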
Changed:
ml/ml_endpoint.py — bounds checks on vLLM output indexing