STT: low-confidence re-ask + keyterm biasing
Two STT features that fix "it misheard me" from different angles. The first, confidenceReask, drops a low-confidence transcript and asks the caller to repeat. The second, keyterm biasing, steers the recognizer toward words you expect. Update 2026-06-01: confidenceReask was a dead, forward-declared group when this guide was first written; it shipped in agent ea1f00f0 and is now fully wired. Both clusters below are live.
The split: dead vs live
This part covers two clusters of settings. The first, confidenceReask, drops a low-confidence transcript before it reaches the LLM and asks the caller to repeat. The second, the keyterm biasing knobs, steers what the STT model outputs on every call. Both are live as of agent ea1f00f0 — but mind one caveat unique to re-ask: it only acts when the STT path emits per-word confidences (streaming Freya STT). On the batch/Deepgram path it is inert, so "nothing happened" there is expected, not a bug.
Group A — confidenceReask (now live)
pipecat-agent@ea1f00f0 (was dead on the original @dev audit).
A dedicated ConfidenceReaskProcessor (src/core/processors/confidence_reask.py:62) now sits between STT and the user aggregator. On a finalized turn it folds the per-turn MIN word confidence (reask_gate.py:extract_min_confidence) and calls should_reask() (reask_gate.py:49); if it fires it drops the likely-misheard transcript (keeping it out of the LLM context and the conversation_update webhook) and emits a re-ask. Built only when enabled (build_confidence_reask_processor, :202), wired in boot_steps.py:2130, inserted into the pipeline at :2918. Key caveat: it needs per-word confidences — a no-op on the batch/Deepgram path; it bites only on streaming Freya STT.Every field below now has a real consumer and carries a live marker. One extra key, confidenceReask.reduction, exists in the schema but only accepts "min" (the gate returns early unless reduction == "min", reask_gate.py:66) — so it is effectively fixed, not tunable.
Step 42 — confidenceReask.enabled live
The master switch. When false (or the whole block is absent), the processor is never built and the gate has zero pipeline footprint (build_confidence_reask_processor returns None, confidence_reask.py:216).
Runtime: gates construction in build_confidence_reask_processor (confidence_reask.py:202); also short-circuits should_reask (reask_gate.py:64).
Step 43 — confidenceReask.threshold live
The confidence floor. If the per-turn MIN word confidence is below this, re-ask. Lower means "only re-ask when very unsure"; higher means "re-ask more eagerly." Benchmark (lab exp 17): MIN at ~0.3 catches ~56% of meaning-breaking turns at ~5% clean re-asks.
Runtime: compared against the turn MIN in should_reask (reask_gate.py:68).
Step 44 — confidenceReask.disableOnDigitNodes live
Suppresses re-asking on digit-collection nodes (card numbers, TC kimlik, OTP), where confidence is misleading — the model is confidently wrong on misheard digits, and DTMF is the better lever. Default true. A per-node override beats this: a node can force the gate enabled or disabled regardless.
Runtime: checked against DigitNodeState.is_digit_node() / reask_override() in should_reask (reask_gate.py:77-90).
Step 45 — confidenceReask.reaskMode live
How the re-ask is produced: phrase speaks a canned line from reaskPhrases (or a per-language default); llm pushes a transient context with a "ask them to repeat" instruction so the LLM phrases it. The garbled turn is never committed either way.
Runtime: branched in ConfidenceReaskProcessor._emit_reask (confidence_reask.py:180-199).
Step 46 — confidenceReask.reaskPhrases live
The pool of canned re-ask lines (used when reaskMode = phrase), rotated to avoid robotic repetition. At most 20 phrases, each at most 2000 characters. Default null → falls back to a per-language default (en/tr/es/de/fr/pt/cs, reask_gate.py:22-31).
Runtime: rotated by pick_reask_phrase (reask_gate.py:94-109).
Step 47 — confidenceReask.maxConsecutiveReasks live
Cap on how many times in a row the agent re-asks before passing the transcript through anyway, so a caller in a noisy environment is never trapped in a "sorry, say that again" loop. The counter resets on any turn that passes the gate.
Runtime: loop guard in should_reask (reask_gate.py:70-76); counter tracked on the processor (confidence_reask.py:93,161,177).
ConfidenceReaskProcessor.process_frame in pipecat-agent and walk me through how a low-confidence turn gets dropped and a re-ask emitted — which frame is dropped, and where the transcript would otherwise have entered the LLM context."confidenceReask do nothing on this call?" — the usual answer is the STT path: the gate needs per-word confidences and is inert on batch/Deepgram. Have it confirm whether the agent is on streaming Freya STT (which emits word_confidences) for the turn in question.Re-ask decision tree live logic
should_reask gate (reask_gate.py:49). Re-ask fires when confidence < threshold, under the consecutive cap, and the digit-node check passes. Remember the real gate also needs per-word confidences (streaming Freya STT) — on batch/Deepgram it never measures a confidence and never fires.Adjust the inputs to see the decision. Re-ask fires only when confidence < threshold AND it is not a suppressed digit node AND you are still under the consecutive cap.
Group B — keyterm biasing (live)
This is the real, wired path. Keyterm biasing hands the STT model a short list of words you expect to hear ("Freya," a product name, an agent persona name, brand-specific jargon) so the recognizer leans toward producing them. It is the right tool for "the model keeps writing Ferya / Freyja instead of Freya." Every knob below has an actual consumer in base_service.py and carries a live marker.
Step 44 — keyterms live
What: the primary list of terms to bias the recognizer toward. These are passed to the STT backend as keyterm hints. The default already includes the brand name so the agent reliably transcribes Freya.
Runtime: base_service.py:615 (assembled and passed into the recognizer request).
keyterms.Step 45 — boostTerms live
What: the secondary list of terms to bias, applied with the configurable boostStrength below. Where keyterms is the baseline vocabulary nudge, boostTerms is the set you want to push harder.
Runtime: base_service.py:617 (read), base_service.py:618 (applied with strength).
keyterms — promote them to boostTerms and raise boostStrength into the gentle band.Step 46 — boostStrength live
What: how hard the boostTerms are pushed. This is the knob that bites. Stay in the 1–2 gentle band. At >=3 you risk hallucination: the model starts "hearing" the boosted term in audio where it was never said, which is far worse than the occasional miss it was meant to fix.
Runtime: base_service.py:618 (multiplied into the boost on the keyterm request).
boostStrength back below 3.Step 47 — keytermMinAudioSec live
What: a minimum-audio gate. Clips shorter than this duration skip keyterm biasing entirely. The rationale: very short snippets ("evet," "hı hı") carry almost no acoustic evidence, so biasing them is mostly a way to manufacture false positives. Below the gate, biasing is bypassed.
Runtime: base_service.py:622 (gate check), base_service.py:623 (bypass branch).
keytermMinAudioSec so short clips bypass biasing.Step 48 — keytermAntiParrot live
What: a guard against the recognizer "parroting" a keyterm back simply because it was hinted. When enabled, it discounts a keyterm match that is not actually supported by the acoustics, blunting the hallucination tail you would otherwise get from aggressive boosting.
Runtime: base_service.py:627-634 (anti-parrot evaluation block).
keytermAntiParrot is enabled before reaching for a lower boostStrength.base_service.py:680-684 rather than passed in the Freya-native shape. Same source list (keyterms + boostTerms), different wire format. If you are debugging "my keyterms work on one backend but not the other," that split is where to look.base_service.py around lines 615–634, walk me through how keyterms, boostTerms and boostStrength become the recognizer request, and where keytermMinAudioSec short-circuits it. Then show the Deepgram split at 680–684." This confirms the live path end to end and shows you exactly which numbers reach the model.Keyterm boost tuner live
Set boostStrength and watch the verdict. Stay in the green band. The keytermMinAudioSec gate shows when short clips bypass biasing entirely.
Checkpoint: digit collection mishears and re-ask does nothing on that step
Scenario. A digit-collection node (card last-4, TC kimlik, OTP) keeps mishearing the caller. You see confidenceReask.enabled = true in the agent config and expect the agent to catch the low-confidence transcript and ask the caller to repeat. It never does on that step. The call just proceeds with the wrong digits.
Why. The feature is live (agent ea1f00f0), but two preconditions gate it off here: (1) disableOnDigitNodes defaults true, so the gate deliberately skips digit nodes — confidence is misleading there, the model is confidently wrong on misheard digits (a per-node override can force it on); and (2) the gate needs per-word confidences, so it is inert unless you are on streaming Freya STT.
The actual fix. For digit reliability, do not reach for re-ask. Use the right levers instead:
- A transcription prompt on the STT layer that biases toward the expected digit shape (length, format), so the recognizer produces cleaner numbers in the first place.
- DTMF (touch-tone) collection for the digit node, so the caller keys the number and there is no STT ambiguity to re-ask about.
In short: re-ask is the wrong tool for a digit node (it is off there by default). Steer with a transcription prompt or collect DTMF.
Recap
Group A (confidenceReask): now a real, wired feature (agent ea1f00f0) — ConfidenceReaskProcessor drops low-confidence turns and re-asks, gated by should_reask. Two preconditions to remember: it needs streaming-Freya per-word confidences, and it defaults off on digit nodes. Group B (keyterms): wired in base_service.py (615, 617, 618, 622, 623, 627-634; Deepgram split 680-684). Keep boostStrength in 1–2, leave keytermAntiParrot on, and use keytermMinAudioSec to keep biasing off tiny clips. Meta-lesson: this group was documented as dead and then shipped — re-verify against the live repo after every pull.