Real-Time Voice Intelligence

Speak to the Neural Core.

A production-grade voice AI pipeline: stream speech to text, route it through a large language model, and synthesize human-quality audio responses in under 800ms.

<800ms

END-TO-END LATENCY

3-IN-1

STT · LLM · TTS

99.2%

TRANSCRIPT ACCURACY

nova-pipeline — zsh

❯ nova connect --model gemini-1.5-pro

✓ WebSocket connected (wss://nova/ws)

✓ STT provider: soniox (stt-rt-v3)

✓ LLM: gemini-1.5-pro streaming

✓ TTS: elevenlabs multilingual_v2

RAG context: bracbank.com/en ...

❯ nova listen --vad endpoint

◎ Awaiting speech input...

❯

STT

LIVE

LLM

IDLE

TTS

IDLE

🎙

Neural STT

Soniox real-time transcription with endpoint detection, speaker diarization, and VAD-based utterance merging for accurate transcripts.

SONIOX · STT-RT-V3
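To illustrate what utterance merging can look like, here is a minimal sketch. It assumes the STT stage emits timestamped, speaker-labelled segments; the `Segment` type, the `merge_utterances` helper, and the 0.6-second gap threshold are hypothetical and not taken from the Soniox API or the nova pipeline.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """Hypothetical transcript segment; times are in seconds."""
    speaker: str
    start: float
    end: float
    text: str


def merge_utterances(segments: List[Segment], max_gap: float = 0.6) -> List[Segment]:
    """Join consecutive segments from the same speaker when the silence
    between them is at most max_gap seconds (an assumed threshold)."""
    merged: List[Segment] = []
    for seg in sorted(segments, key=lambda s: s.start):
        last = merged[-1] if merged else None
        if last and last.speaker == seg.speaker and seg.start - last.end <= max_gap:
            # Short pause, same speaker: treat as one utterance.
            merged[-1] = Segment(
                last.speaker,
                last.start,
                max(last.end, seg.end),
                f"{last.text} {seg.text}",
            )
        else:
            merged.append(seg)
    return merged
```

Merging on short pauses keeps a hesitant speaker's sentence in one piece before it reaches the LLM, at the cost of waiting slightly longer to decide that a turn has ended.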

🧠

LLM + RAG

Gemini 1.5 Pro with optional vector-based retrieval augmentation (RAG). Session memory persists across turns, with context truncated to stay within the model's limits.

GEMINI · CHROMA
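One simple way to truncate session context is to keep the system prompt and then as many of the most recent turns as fit a token budget. The sketch below assumes that strategy; the `truncate_history` helper, the message dictionary shape, and the whitespace word count used as a stand-in for a real tokenizer are all illustrative assumptions, not the pipeline's actual logic.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]  # assumed shape: {"role": ..., "content": ...}


def _word_count(message: Message) -> int:
    # Crude stand-in for a real tokenizer (assumption).
    return len(message["content"].split())


def truncate_history(
    messages: List[Message],
    max_tokens: int,
    count_tokens: Callable[[Message], int] = _word_count,
) -> List[Message]:
    """Keep all system messages plus the newest turns that fit the budget."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    budget = max_tokens - sum(count_tokens(m) for m in system)

    kept: List[Message] = []
    for message in reversed(turns):  # walk from newest to oldest
        cost = count_tokens(message)
        if cost > budget:
            break  # stop at the first turn that does not fit
        kept.append(message)
        budget -= cost
    return system + kept[::-1]  # restore chronological order
```

Dropping the oldest turns first keeps the most recent exchange intact, which usually matters most in a voice conversation.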

🔊

Streaming TTS

ElevenLabs sentence-level audio streaming with barge-in interruption detection. Sub-400ms time-to-first-audio.

ELEVENLABS · MP3
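Sentence-level streaming with barge-in can be sketched as follows: buffer LLM tokens until a sentence boundary, hand each finished sentence to the synthesizer, and stop as soon as the user starts speaking. The `sentences_from_tokens` and `speak` helpers, the `synthesize` callback, and the `barge_in` event are hypothetical names; no ElevenLabs API call is shown here.

```python
import re
import threading
from typing import Callable, Iterable, Iterator

# Split after ., ! or ? when followed by whitespace (a simple heuristic).
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences_from_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a token stream."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _SENTENCE_END.split(buffer)
        for sentence in parts[:-1]:
            yield sentence.strip()
        buffer = parts[-1]  # keep the unfinished tail
    if buffer.strip():
        yield buffer.strip()


def speak(
    tokens: Iterable[str],
    synthesize: Callable[[str], None],
    barge_in: threading.Event,
) -> int:
    """Synthesize sentence by sentence; stop once barge_in is set.
    Returns the number of sentences sent to the synthesizer."""
    spoken = 0
    for sentence in sentences_from_tokens(tokens):
        if barge_in.is_set():
            break  # the user interrupted: drop the rest of the reply
        synthesize(sentence)
        spoken += 1
    return spoken
```

Sending the first sentence to TTS before the LLM finishes is what makes a low time-to-first-audio possible, and checking for barge-in between sentences bounds how much stale audio gets generated after an interruption.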