Production-grade voice AI pipeline: stream speech to text, route it through a large language model, and synthesize human-quality audio responses in under 800ms.
<800ms
END-TO-END LATENCY
3-IN-1
STT · LLM · TTS
99.2%
TRANSCRIPT ACCURACY
nova-pipeline — zsh
❯ nova connect --model gemini-1.5-pro
✓ WebSocket connected (wss://nova/ws)
✓ STT provider: soniox (stt-rt-v3)
✓ LLM: gemini-1.5-pro streaming
✓ TTS: elevenlabs multilingual_v2
RAG context: bracbank.com/en ...
❯ nova listen --vad endpoint
◎ Awaiting speech input...
❯
STT
LIVE
LLM
IDLE
TTS
IDLE
🎙
Neural STT
Real-time Soniox transcription with endpoint detection, speaker diarization, and VAD-based utterance merging.
SONIOX · STT-RT-V3
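As a rough illustration of what VAD-based utterance merging could look like, the sketch below joins consecutive transcript segments from the same speaker when the silence between them is short. The `Segment` type, the `merge_utterances` function, and the 600 ms default gap are all hypothetical; they are not taken from Soniox or from this pipeline.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed chunk (hypothetical shape, not the Soniox schema)."""
    speaker: str
    start_ms: int
    end_ms: int
    text: str


def merge_utterances(segments, max_gap_ms=600):
    """Merge adjacent same-speaker segments separated by a short pause.

    max_gap_ms is an assumed threshold; a real system would tune it.
    """
    merged = []
    for seg in segments:
        prev = merged[-1] if merged else None
        same_turn = (
            prev is not None
            and prev.speaker == seg.speaker
            and seg.start_ms - prev.end_ms <= max_gap_ms
        )
        if same_turn:
            # Extend the previous utterance instead of starting a new one.
            merged[-1] = Segment(
                prev.speaker, prev.start_ms, seg.end_ms, f"{prev.text} {seg.text}"
            )
        else:
            merged.append(seg)
    return merged
```

A lower gap threshold makes the assistant respond sooner but risks cutting a speaker off mid-thought; a higher one does the opposite.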
🧠
LLM + RAG
Gemini 1.5 Pro with optional vector-based retrieval augmentation. Session memory persists across turns, with context truncated to fit the model's window.
GEMINI · CHROMA
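One common way to truncate session context is to keep the system prompt and as many of the most recent turns as fit a token budget. The sketch below shows that approach under stated assumptions: `truncate_history` is a hypothetical helper, the message dictionaries are an assumed shape, and the default token counter is a crude four-characters-per-token estimate rather than Gemini's actual tokenizer.

```python
def truncate_history(messages, max_tokens, count_tokens=None):
    """Keep system messages plus the newest turns that fit max_tokens.

    messages: list of {"role": str, "content": str} dicts (assumed shape).
    count_tokens: optional callable; defaults to a rough length estimate.
    """
    if count_tokens is None:
        # Assumption: about 4 characters per token, at least 1 per message.
        count_tokens = lambda m: max(1, len(m["content"]) // 4)

    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]

    budget = max_tokens - sum(count_tokens(m) for m in system)
    kept = []
    # Walk backwards so the most recent turns are kept first.
    for message in reversed(turns):
        cost = count_tokens(message)
        if cost > budget:
            break
        kept.append(message)
        budget -= cost

    return system + kept[::-1]
```

Dropping the oldest turns first is the simplest policy; summarizing them instead would preserve more context at the cost of an extra model call.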
🔊
Streaming TTS
ElevenLabs sentence-level audio streaming with barge-in interruption detection. Sub-400ms time-to-first-audio.
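Sentence-level streaming generally means buffering LLM tokens until a sentence boundary appears, then handing each complete sentence to the TTS engine while the model is still generating. The sketch below illustrates that idea with a barge-in check; `sentences_from_tokens` and the `interrupted` callback are hypothetical names, and the punctuation-based splitter is a simplification that makes no ElevenLabs API calls.

```python
import re

# Split after ., ! or ? when followed by whitespace (a simplifying assumption).
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences_from_tokens(tokens, interrupted=lambda: False):
    """Yield complete sentences from a token stream as soon as they close.

    interrupted: callable returning True once the user barges in; the
    generator then stops and discards any buffered text.
    """
    buffer = ""
    for token in tokens:
        if interrupted():
            return
        buffer += token
        parts = _SENTENCE_END.split(buffer)
        # Everything except the last part is a finished sentence.
        for sentence in parts[:-1]:
            yield sentence
        buffer = parts[-1]
    # Flush whatever remains when the stream ends normally.
    if buffer.strip() and not interrupted():
        yield buffer.strip()
```

Each yielded sentence would be sent to the synthesizer immediately, so time-to-first-audio depends on the length of the first sentence rather than the full reply.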