How to read it
This simulator is a teaching model, not the real DSP. The numbers and thresholds match the validator ranges and the runtime defaults exactly (see phase0-report.md), and the branch logic mirrors base_service.py, but real audio is messier. Use it to build intuition for which knob moves which outcome, then confirm a real change with a test call and the VAD boot log line (base_service.py:1552), per Part 10.
- Endpointing mode shows one caller utterance with a mid-sentence pause. Watch how `vadStopSecs` decides whether that pause ends the turn early (the classic cut-off), how `smartTurn` sidesteps it, and how `vadConfidence` / `vadMinVolume` plus the filters decide whether the caller is even heard over the noise.
- Barge-in mode shows the bot speaking while the caller tries to cut in. Watch how `numberOfWords` sets the interruption bar, how an interruption phrase bypasses it, how an acknowledgement never interrupts, and how `botSpeechGraceSecs` swallows the bot's own echo at the start of its utterance.
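To make the two modes concrete, here is a minimal Python sketch of the decisions they visualize. It is a toy built on assumptions, not the logic in base_service.py. The following are all hypothetical: the function names, the snake_case parameters (standing in for `vadStopSecs`, `smartTurn`, `vadConfidence`, `vadMinVolume`, `numberOfWords` and `botSpeechGraceSecs`), the `sentence_complete` flag (standing in for the Smart Turn model), the order of the checks, and the phrase matching. The audio filters are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """One hypothetical VAD analysis frame."""
    confidence: float  # speech probability, 0..1
    volume: float      # normalized loudness, 0..1


def is_speech(frame: Frame, *, vad_confidence: float, vad_min_volume: float) -> bool:
    # The caller is "heard" only if the frame clears both gates.
    return frame.confidence >= vad_confidence and frame.volume >= vad_min_volume


def pause_ends_turn(pause_secs: float, *, vad_stop_secs: float,
                    smart_turn: bool, sentence_complete: bool) -> bool:
    """Endpointing: does a mid-utterance pause close the caller's turn?"""
    if pause_secs < vad_stop_secs:
        return False  # pause shorter than the stop delay: caller still speaking
    if smart_turn:
        # Toy stand-in for Smart Turn: end only if the utterance sounds finished.
        return sentence_complete
    return True  # plain VAD: any long-enough pause ends the turn (the cut-off)


def _normalize(text: str) -> str:
    # Lowercase and drop surrounding punctuation so "Okay." matches "okay".
    return " ".join(w.strip(".,!?") for w in text.lower().split())


def interrupts(transcript: str, secs_since_bot_started: float, *,
               number_of_words: int, bot_speech_grace_secs: float,
               interruption_phrases: set[str],
               acknowledgement_phrases: set[str]) -> bool:
    """Barge-in: does caller speech interrupt the bot? (check order is assumed)"""
    text = _normalize(transcript)
    if secs_since_bot_started < bot_speech_grace_secs:
        return False  # grace window: treated as the bot's own echo
    if text in acknowledgement_phrases:
        return False  # backchannel such as "okay" never interrupts
    if any(phrase in text for phrase in interruption_phrases):
        return True  # interruption phrase bypasses the word-count bar
    return len(text.split()) >= number_of_words
```

In this toy, raising `vad_stop_secs` above the pause length, or enabling `smart_turn`, is what prevents the mid-sentence cut-off. Separately, a short "okay" stays silent no matter how `number_of_words` is set.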
`voiceSeconds` and `backOffSeconds` are absent because they are dead knobs (no runtime consumer in `pipecat-agent@dev`; verified). Including sliders for them would teach a fiction. `aicEnabled` is present but, as in production, does nothing when ticked.
STT streaming vs batch: what changes
Freya's on-prem STT can run in two modes, selected by the Streaming Transcription setting (`sttConfig.additionalSettings.streaming`, which picks `FreyaSTTStreamingService` over the default batch `FreyaSTTService`; base_service.py:639). It is not a control in this simulator, because flipping it does not change what any slider above does. Here is the honest mapping so you can reason about it.
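As a rough illustration of that branch, the sketch below reads the flag from a plain `sttConfig` dict and returns the name of the class it would select. It is hypothetical. The function name is invented, the function treats any truthy value as "on", and it returns strings rather than constructing services. The actual check lives at base_service.py:639 and may read or validate the flag differently.

```python
from typing import Any, Mapping


def pick_stt_service(stt_config: Mapping[str, Any]) -> str:
    """Return the STT class name a given sttConfig would select (hypothetical)."""
    # A missing or null additionalSettings block behaves like "no flag set".
    additional = stt_config.get("additionalSettings") or {}
    if additional.get("streaming"):
        return "FreyaSTTStreamingService"
    # Absent or falsy flag: fall back to the default batch service.
    return "FreyaSTTService"
```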
- Every VAD / turn slider behaves identically in both modes. Confidence, Minimum Volume, Start Delay, Stop Delay, Wait seconds, Number of words, interruption/acknowledgement phrases, Smart Turn, and the audio filters are unchanged — Freya runs VAD upstream of STT either way, so every Endpointing and Barge-in visual here is valid for streaming and batch alike.
- Low-confidence re-ask only works with streaming. The streaming service emits per-word confidences (`word_confidences`) that the re-ask gate needs; batch does not. So the whole `confidenceReask` group is live on streaming and inert on batch. This is the single biggest behavioral difference (see Part 8).
- Batch adds latency and leans harder on Turn Stop Timeout. Batch transcribes the whole VAD-delimited segment after the turn closes, so the reply gap is larger and `userTurnStopTimeout` is more load-bearing as the backstop. The dashboard's own help text even says: "Increase for batch STT with higher latency." Streaming transcribes continuously, so it is lower-latency.
- Same knobs, slightly different role. In batch, the VAD stop boundary is also where the audio is sliced and handed to STT, so the endpointing knobs additionally decide what gets transcribed. In streaming, STT runs continuously and VAD mainly governs turn-taking.
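Two of the differences above can be sketched in Python: the re-ask gate's dependence on `word_confidences`, and batch's slicing at the VAD stop boundary. Everything here is hypothetical. That includes the `Transcript` type, the `min_confidence` parameter, and `should_reask`, which assumes the gate fires on the lowest word score (Part 8 describes the real rule). It also includes `vad_segments`, which counts the stop delay in frames rather than seconds.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Transcript:
    text: str
    # Streaming results carry per-word scores; batch results leave this None.
    word_confidences: Optional[Sequence[float]] = None


def should_reask(t: Transcript, *, enabled: bool, min_confidence: float) -> bool:
    """Hypothetical low-confidence re-ask gate."""
    if not enabled or not t.word_confidences:
        return False  # no per-word scores (batch): the gate can never fire
    return min(t.word_confidences) < min_confidence


def vad_segments(flags: Sequence[bool], *, stop_frames: int) -> list[tuple[int, int]]:
    """Slice per-frame VAD flags into [start, end) speech segments.

    A segment closes once `stop_frames` consecutive silent frames follow it,
    mimicking how batch STT receives audio cut at the VAD stop boundary.
    """
    segments: list[tuple[int, int]] = []
    start: Optional[int] = None
    silent = 0
    for i, speech in enumerate(flags):
        if speech:
            if start is None:
                start = i  # speech begins a new segment
            silent = 0
        elif start is not None:
            silent += 1
            if silent >= stop_frames:
                # End after the last speech frame, excluding the trailing silence.
                segments.append((start, i - silent + 1))
                start, silent = None, 0
    if start is not None:
        segments.append((start, len(flags) - silent))  # audio ended mid-segment
    return segments
```

Take an utterance with a one-frame pause, such as `[True, True, False, True, True, False, False, False]`. With `stop_frames=1`, it becomes two batch segments; with `stop_frames=2`, it stays whole. This is the sense in which, on batch, the endpointing knobs also decide what gets transcribed.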