Rapid-MLX latest version

v0.7.3

« Changelog History

Rapid-MLX v0.7.3 Release Notes

Release Date: 2026-06-12 // 6 days ago

What changed

⏪ Revert PR #555 (in-house DiffusionGemma generation loop, shipped in v0.7.2). The fixed_steps=8 default cap caused quality regressions on long Chinese / multi-paragraph output. Performance was never the bottleneck — direct mlx-vlm passthrough already matches what the in-house loop achieved (49.5 tok/s rapid-mlx HTTP vs 53.0 tok/s mlx-vlm direct on M3 Ultra, 256-token sweep, well within HTTP wrapper tax).

🛠 Fix tool calling on the mlx-vlm passthrough path so DiffusionGemma's call:NAME{...} wire form surfaces as structured tool_calls, not leaked delta.content:
- Gemma4ToolParser regex now accepts the post-HF-decode stripped form (no outer <|tool_call>/<tool_call|> wrappers). HF's tokenizer.decode(skip_special_tokens=True) strips those marker ids regardless of the skip_special_token_ids carve-out, so the body call:NAME{...} is what reaches the parser in production.
- StreamingPostProcessor._detect_tool_calls no longer short-circuits the parser when the delta has no < or [ char — it now also queries the active parser's has_pending_tool_call so the gemma4 stripped opener call:\w+\{ is recognised.
- ✅ Real-tokenizer end-to-end test added under TestStreamingPostProcessorGemma4StrippedForm so this class of bug can't ship silently again.
Alias naming — diffusion-gemma-26b split into diffusion-gemma-26b-4bit and diffusion-gemma-26b-8bit per the project <family>-<size>-<quant> SOP. The bare alias is removed.

v0.7.2 status

🚀 v0.7.2 will be yanked from PyPI and its release deleted. Users on 0.7.2 should upgrade to 0.7.3 — quality regressions from fixed_steps=8 will not be patched in-place.

Verification
- rapid-mlx share diffusion-gemma-26b-4bit end-to-end: SSE response ships only structured tool_calls, no leaked content
- 💻 Big-AGI UI confirms Weather + Web Search tools render as proper tool boxes (not raw call:NAME{...} text)
- ✅ 26 parser unit tests + 3 real-tokenizer postprocessor tests + 47 escape-hatch tests all pass
🚀 Release SOP gates (all green)

🚀 G1 release-smoke · G3 cli/config audit · G4 make smoke · G8 parser microbench · G10 mlx-version-bump scan (skip — no mlx-* deps changed) · G11 escape-hatch tests · G6 live server repro

🤖 Generated with Claude Code

Previous changes from v0.7.2

Caution

v0.7.2 is YANKED. Upgrade to v0.7.3 immediately.

0️⃣ v0.7.2 shipped a DiffusionGemma in-house generation loop (PR #555) whose fixed_steps=8 default
⏪ caused quality regressions on long Chinese / multi-paragraph output. v0.7.3 reverts PR #555 back to
🛠 the mlx-vlm passthrough path and fixes tool calling on top of it.

pip install rapid-mlx resolves to v0.7.3 going forward.

What's new in v0.7.2
- chore: bump version to 0.7.2 (fc61783)
- ⏪ DiffusionGemma in-house generation loop (PR #555, reverted in v0.7.3)