Rapid-MLX v0.7.3 Release Notes
Release Date: 2026-06-12 // 6 days ago-
What changed
⏪ Revert PR #555 (in-house DiffusionGemma generation loop, shipped in v0.7.2). The
fixed_steps=8default cap caused quality regressions on long Chinese / multi-paragraph output. Performance was never the bottleneck — direct mlx-vlm passthrough already matches what the in-house loop achieved (49.5 tok/s rapid-mlx HTTP vs 53.0 tok/s mlx-vlm direct on M3 Ultra, 256-token sweep, well within HTTP wrapper tax).🛠 Fix tool calling on the mlx-vlm passthrough path so DiffusionGemma's
call:NAME{...}wire form surfaces as structuredtool_calls, not leakeddelta.content:Gemma4ToolParserregex now accepts the post-HF-decode stripped form (no outer<|tool_call>/<tool_call|>wrappers). HF'stokenizer.decode(skip_special_tokens=True)strips those marker ids regardless of theskip_special_token_idscarve-out, so the bodycall:NAME{...}is what reaches the parser in production.StreamingPostProcessor._detect_tool_callsno longer short-circuits the parser when the delta has no<or[char — it now also queries the active parser'shas_pending_tool_callso the gemma4 stripped openercall:\w+\{is recognised.- ✅ Real-tokenizer end-to-end test added under
TestStreamingPostProcessorGemma4StrippedFormso this class of bug can't ship silently again.
Alias naming —
diffusion-gemma-26bsplit intodiffusion-gemma-26b-4bitanddiffusion-gemma-26b-8bitper the project<family>-<size>-<quant>SOP. The bare alias is removed.v0.7.2 status
🚀 v0.7.2 will be yanked from PyPI and its release deleted. Users on 0.7.2 should upgrade to 0.7.3 — quality regressions from
fixed_steps=8will not be patched in-place.Verification
rapid-mlx share diffusion-gemma-26b-4bitend-to-end: SSE response ships only structuredtool_calls, no leaked content- 💻 Big-AGI UI confirms Weather + Web Search tools render as proper tool boxes (not raw
call:NAME{...}text) - ✅ 26 parser unit tests + 3 real-tokenizer postprocessor tests + 47 escape-hatch tests all pass
🚀 Release SOP gates (all green)
🚀 G1 release-smoke · G3 cli/config audit · G4 make smoke · G8 parser microbench · G10 mlx-version-bump scan (skip — no mlx-* deps changed) · G11 escape-hatch tests · G6 live server repro
🤖 Generated with Claude Code
Previous changes from v0.7.2
-
Caution
v0.7.2 is YANKED. Upgrade to v0.7.3 immediately.
0️⃣ v0.7.2 shipped a
DiffusionGemmain-house generation loop (PR #555) whosefixed_steps=8default
⏪ caused quality regressions on long Chinese / multi-paragraph output. v0.7.3 reverts PR #555 back to
🛠 the mlx-vlm passthrough path and fixes tool calling on top of it.pip install rapid-mlxresolves to v0.7.3 going forward.What's new in v0.7.2