Hi VAPI team,
I'd love to see native support for Mistral's Voxtral model family as a provider option across all three pipeline layers:
• STT: Voxtral Realtime (sub-200ms latency, 13 languages, open-weights Apache 2.0)
• LLM: Voxtral Small 24B (function-calling directly from voice, multilingual)
• TTS: Voxtral TTS (4B params, very low latency, 9 languages, zero-shot voice cloning)
What makes Voxtral particularly interesting for VAPI users is that it's the first open-weights stack covering all three layers under a single model family.
Voxtral Realtime already achieves near-offline accuracy at 480ms delay, which fits right within VAPI's <700ms voice-to-voice target.
The Apache 2.0 license also makes it attractive for enterprise and regulated industry use cases.
Would love to see Mistral API keys supported at minimum, and ideally a self-hosted endpoint option for Voxtral Realtime via vLLM.
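For the self-hosted path, here's a rough sketch of what that could look like with vLLM's OpenAI-compatible server. The model name, flags, and the availability of the `/v1/audio/transcriptions` route for this model are assumptions on my part, not confirmed behavior:

```shell
# Serve a Voxtral model via vLLM (model ID and mistral-format flags are assumptions;
# check vLLM's docs for the exact supported configuration)
vllm serve mistralai/Voxtral-Mini-3B-2507 \
  --tokenizer_mode mistral \
  --config_format mistral \
  --load_format mistral

# If vLLM exposes the OpenAI-compatible transcription route for this model,
# VAPI could point a custom STT provider at it like so (hypothetical usage):
curl http://localhost:8000/v1/audio/transcriptions \
  -F model=mistralai/Voxtral-Mini-3B-2507 \
  -F file=@sample.wav
```

If VAPI's custom-provider config just needs a base URL plus an OpenAI-compatible schema, this would slot in with minimal glue.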
Thanks very much, guys!