Our VAPI implementation of a scientific conversational interview in German has a turn-taking problem: the model does not always correctly identify when to wait (e.g. when the user is pausing to think) and when to start speaking. LiveKit Smart Endpointing might help. However, the VAPI implementation of it is currently only available in English, even though LiveKit has recently been updated with multi-language support. Could you include this multi-language support?