Part 8

STT: low-confidence re-ask + keyterm biasing

Two STT features that fix "it misheard me" from different angles. The first, confidenceReask, drops a low-confidence transcript and asks the caller to repeat. The second, keyterm biasing, steers the recognizer toward words you expect. Update 2026-06-01: confidenceReask was a dead, forward-declared group when this guide was first written; it shipped in agent ea1f00f0 and is now fully wired. Both clusters below are live.

The split: two live clusters

This part covers two clusters of settings: confidenceReask (Group A) and the keyterm biasing knobs (Group B), which steer what the STT model outputs on every call. Both are live as of agent ea1f00f0. One caveat is unique to re-ask: it acts only when the STT path emits per-word confidences (streaming Freya STT). On the batch/Deepgram path it is inert, so "nothing happened" there is expected, not a bug.

Group A — confidenceReask (now live)

Wired as of pipecat-agent@ea1f00f0 (it was dead in the original @dev audit). A dedicated ConfidenceReaskProcessor (src/core/processors/confidence_reask.py:62) now sits between STT and the user aggregator. On a finalized turn it reduces the per-word confidences to the turn MIN (reask_gate.py:extract_min_confidence) and calls should_reask() (reask_gate.py:49). If the gate fires, the processor drops the likely-misheard transcript, keeping it out of both the LLM context and the conversation_update webhook, and emits a re-ask. The processor is built only when enabled (build_confidence_reask_processor, :202), wired in boot_steps.py:2130, and inserted into the pipeline at :2918. Key caveat: it needs per-word confidences, so it is a no-op on the batch/Deepgram path and takes effect only on streaming Freya STT.

Every field below now has a real consumer and carries a live marker. One extra key, confidenceReask.reduction, exists in the schema but only accepts "min" (the gate returns early unless reduction == "min", reask_gate.py:66) — so it is effectively fixed, not tunable.

Step 42 — confidenceReask.enabled live

The master switch. When false (or the whole block is absent), the processor is never built and the gate has zero pipeline footprint (build_confidence_reask_processor returns None, confidence_reask.py:216).

confidenceReask.enabled boolean default false

Runtime: gates construction in build_confidence_reask_processor (confidence_reask.py:202); also short-circuits should_reask (reask_gate.py:64).
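A minimal sketch of the "zero footprint when disabled" pattern described above. The config dataclass, the stub processor, and assemble_pipeline are hypothetical stand-ins, not the agent's real classes; only the field names and defaults come from this guide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConfidenceReaskConfig:
    """Hypothetical container; field names and defaults follow this guide."""
    enabled: bool = False
    threshold: float = 0.3
    disable_on_digit_nodes: bool = True
    reask_mode: str = "phrase"
    reask_phrases: Optional[list[str]] = None
    max_consecutive_reasks: int = 2


class ReaskProcessorStub:
    """Stand-in for the real processor; it only remembers its config."""

    def __init__(self, config: ConfidenceReaskConfig) -> None:
        self.config = config


def build_processor(config: Optional[ConfidenceReaskConfig]) -> Optional[ReaskProcessorStub]:
    # Absent block or enabled=False: build nothing at all.
    if config is None or not config.enabled:
        return None
    return ReaskProcessorStub(config)


def assemble_pipeline(config: Optional[ConfidenceReaskConfig]) -> list[str]:
    """Return stage names in order; the gate appears only when it was built."""
    stages = ["stt"]
    if build_processor(config) is not None:
        stages.append("confidence_reask")  # between STT and the user aggregator
    stages.append("user_aggregator")
    return stages
```

With the default config the stage list is just STT followed by the user aggregator, which is what "never built" means in practice.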

Step 43 — confidenceReask.threshold live

The confidence floor. If the per-turn MIN word confidence is below this, re-ask. Lower means "only re-ask when very unsure"; higher means "re-ask more eagerly." Benchmark (lab exp 17): gating on MIN at ~0.3 catches ~56% of meaning-breaking turns while re-asking on ~5% of clean turns.

confidenceReask.threshold range 0–1 default 0.3

Runtime: compared against the turn MIN in should_reask (reask_gate.py:68).
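To make the MIN reduction concrete, here is a small sketch of reducing a turn to its weakest word and comparing it to the threshold. The function names are illustrative and not the real reask_gate.py helpers; treating a missing confidence list as "cannot fire" follows the per-word-confidence caveat stated earlier.

```python
from typing import Optional, Sequence


def min_word_confidence(word_confidences: Optional[Sequence[float]]) -> Optional[float]:
    """Reduce a turn to its weakest word; None when no per-word scores exist."""
    if not word_confidences:
        return None
    return min(word_confidences)


def below_threshold(word_confidences: Optional[Sequence[float]], threshold: float = 0.3) -> bool:
    """True when the turn MIN is strictly below the threshold."""
    turn_min = min_word_confidence(word_confidences)
    if turn_min is None:
        return False  # nothing was measured, so the check cannot fire
    return turn_min < threshold
```

For word confidences [0.97, 0.97, 0.21, 0.95] the turn mean sits well above 0.3, but the MIN of 0.21 falls below it, so the check fires. A single badly heard word is enough, which is the point of reducing by MIN.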

Step 44 — confidenceReask.disableOnDigitNodes live

Suppresses re-asking on digit-collection nodes (card numbers, TC kimlik [the Turkish national ID number], OTP), where confidence is misleading: the model is confidently wrong on misheard digits, and DTMF is the better lever. Default true. A per-node override takes precedence: a node can force the gate on or off regardless of this setting.

confidenceReask.disableOnDigitNodes boolean default true

Runtime: checked against DigitNodeState.is_digit_node() / reask_override() in should_reask (reask_gate.py:77-90).

Step 45 — confidenceReask.reaskMode live

How the re-ask is produced. In phrase mode the agent speaks a canned line from reaskPhrases (or a per-language default). In llm mode it pushes a transient context with an "ask them to repeat" instruction so the LLM phrases the re-ask itself. The garbled turn is never committed either way.

confidenceReask.reaskMode enum phrase | llm default phrase

Runtime: branched in ConfidenceReaskProcessor._emit_reask (confidence_reask.py:180-199).
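The two modes can be pictured as two different actions produced from the same dropped turn. This is a sketch under stated assumptions: SpeakPhrase, AskLlm, and the instruction wording are invented for illustration, and the real _emit_reask works with pipeline frames rather than plain objects.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass(frozen=True)
class SpeakPhrase:
    text: str  # canned line sent straight to speech output


@dataclass(frozen=True)
class AskLlm:
    instruction: str  # transient instruction, not stored in history


def plan_reask(mode: str, phrase: str) -> Union[SpeakPhrase, AskLlm]:
    """Pick the re-ask action for the configured reaskMode."""
    if mode == "phrase":
        return SpeakPhrase(text=phrase)
    if mode == "llm":
        # Hypothetical wording; the real instruction text is not quoted in this guide.
        return AskLlm(instruction="The last caller turn was unclear. Ask them to repeat.")
    raise ValueError(f"unknown reaskMode: {mode!r}")


def handle_low_confidence_turn(
    history: Sequence[str], garbled: str, mode: str, phrase: str
) -> tuple[list[str], Union[SpeakPhrase, AskLlm]]:
    """Return the unchanged history plus the re-ask action."""
    action = plan_reask(mode, phrase)
    # The garbled transcript is deliberately not appended in either mode.
    return list(history), action
```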

Step 46 — confidenceReask.reaskPhrases live

The pool of canned re-ask lines (used when reaskMode = phrase), rotated to avoid robotic repetition. At most 20 phrases, each at most 2000 characters. Default null → falls back to a per-language default (en/tr/es/de/fr/pt/cs, reask_gate.py:22-31).

confidenceReask.reaskPhrases array, <=20 items, each <=2000 chars default null

Runtime: rotated by pick_reask_phrase (reask_gate.py:94-109).
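A sketch of phrase rotation with the documented limits. Several details here are assumptions: the round-robin order, treating an empty list like null, falling back to English for an unknown language, and the default strings themselves (the real per-language defaults are not quoted in this guide).

```python
from typing import Optional, Sequence

MAX_PHRASES = 20
MAX_PHRASE_CHARS = 2000

# Illustrative defaults only; not the strings shipped in reask_gate.py.
DEFAULT_PHRASES = {
    "en": ["Sorry, could you say that again?"],
    "tr": ["Pardon, tekrar eder misiniz?"],
}


def validate_phrases(phrases: Sequence[str]) -> None:
    """Enforce the documented limits: <=20 items, each <=2000 characters."""
    if len(phrases) > MAX_PHRASES:
        raise ValueError(f"at most {MAX_PHRASES} phrases allowed")
    for phrase in phrases:
        if len(phrase) > MAX_PHRASE_CHARS:
            raise ValueError(f"phrase exceeds {MAX_PHRASE_CHARS} characters")


def pick_phrase(phrases: Optional[Sequence[str]], language: str, reask_index: int) -> str:
    """Rotate through the pool so consecutive re-asks differ."""
    if phrases:
        validate_phrases(phrases)
        pool = list(phrases)
    else:
        # Assumed fallback chain: language default, then English.
        pool = DEFAULT_PHRASES.get(language, DEFAULT_PHRASES["en"])
    return pool[reask_index % len(pool)]
```

With a pool of two custom phrases, re-ask indices 0, 1, 2 yield the first, second, and first phrase again.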

Step 47 — confidenceReask.maxConsecutiveReasks live

Cap on how many times in a row the agent re-asks before passing the transcript through anyway, so a caller in a noisy environment is never trapped in a "sorry, say that again" loop. The counter resets on any turn that passes the gate.

confidenceReask.maxConsecutiveReasks range 0–5 default 2

Runtime: loop guard in should_reask (reask_gate.py:70-76); counter tracked on the processor (confidence_reask.py:93,161,177).
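The loop guard is easiest to see as a counter stepped across turns. This sketch assumes that a turn forced through by the cap also counts as "passing the gate" and therefore resets the counter; the guide states the reset rule but not that edge case. ReaskCounter is a hypothetical name.

```python
class ReaskCounter:
    """Tracks consecutive re-asks and lets a turn through once the cap is hit."""

    def __init__(self, max_consecutive: int = 2) -> None:
        self.max_consecutive = max_consecutive
        self.consecutive = 0

    def on_turn(self, low_confidence: bool) -> str:
        """Return "reask" or "pass" for one finalized turn."""
        if low_confidence and self.consecutive < self.max_consecutive:
            self.consecutive += 1
            return "reask"
        # Either the turn was clean or the cap was reached: pass it through
        # and reset (assumption: a cap-forced pass also resets the counter).
        self.consecutive = 0
        return "pass"
```

With the default cap of 2, three low-confidence turns in a row produce reask, reask, pass, so the third attempt reaches the LLM even though it is still uncertain.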

Ask Claude Code. "Show me ConfidenceReaskProcessor.process_frame in pipecat-agent and walk me through how a low-confidence turn gets dropped and a re-ask emitted: which frame is dropped, and where the transcript would otherwise have entered the LLM context."

Ask Claude Code. "Why does confidenceReask do nothing on this call?" The usual answer is the STT path: the gate needs per-word confidences and is inert on batch/Deepgram. Have it confirm whether the agent is on streaming Freya STT (which emits word_confidences) for the turn in question.

Re-ask decision tree live logic

This mirrors the real should_reask gate (reask_gate.py:49). Re-ask fires when confidence < threshold, under the consecutive cap, and the digit-node check passes. Remember the real gate also needs per-word confidences (streaming Freya STT) — on batch/Deepgram it never measures a confidence and never fires.

Re-ask fires only when confidence < threshold AND it is not a suppressed digit node AND you are still under the consecutive cap.
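The decision tree can be sketched as one function that returns both the verdict and the reason, which is handy when debugging "why did it not re-ask". The check order follows the line order cited for should_reask in this guide. The GateInputs type, the reason strings, and the exact point where a node override applies are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GateInputs:
    enabled: bool
    min_confidence: Optional[float]  # None when the STT path has no word scores
    threshold: float = 0.3
    consecutive_reasks: int = 0
    max_consecutive_reasks: int = 2
    is_digit_node: bool = False
    disable_on_digit_nodes: bool = True
    node_override: Optional[bool] = None  # True/False forces the gate on/off


def decide(g: GateInputs) -> tuple[bool, str]:
    """Return (should_reask, reason) for one finalized turn."""
    if not g.enabled:
        return False, "disabled"
    if g.min_confidence is None:
        return False, "no per-word confidences"
    if g.min_confidence >= g.threshold:
        return False, "confidence at or above threshold"
    if g.consecutive_reasks >= g.max_consecutive_reasks:
        return False, "consecutive cap reached"
    if g.node_override is not None:
        # Assumed semantics: the override replaces the digit-node default.
        if g.node_override:
            return True, "forced on by node override"
        return False, "forced off by node override"
    if g.is_digit_node and g.disable_on_digit_nodes:
        return False, "digit node suppressed"
    return True, "re-ask"
```

For example, a turn with MIN 0.2 on a digit node returns "digit node suppressed" under the defaults, and the same turn with no confidence data returns "no per-word confidences", which is the batch/Deepgram case.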


Group B — keyterm biasing (live)

This is the real, wired path. Keyterm biasing hands the STT model a short list of words you expect to hear ("Freya," a product name, an agent persona name, brand-specific jargon) so the recognizer leans toward producing them. It is the right tool for "the model keeps writing Ferya / Freyja instead of Freya." Every knob below has an actual consumer in base_service.py and carries a live marker.

Step 44 — keyterms live

What: the primary list of terms to bias the recognizer toward. These are passed to the STT backend as keyterm hints. The default already includes the brand name so the agent reliably transcribes Freya.

keyterms array of strings default ['Freya']

Runtime: base_service.py:615 (assembled and passed into the recognizer request).

Symptom: a brand or product name is consistently mis-transcribed into a phonetic neighbour. Add the exact spelling to keyterms.

Step 45 — boostTerms live

What: the secondary list of terms to bias, applied with the configurable boostStrength below. Where keyterms is the baseline vocabulary nudge, boostTerms is the set you want to push harder.

boostTerms array of strings default empty

Runtime: base_service.py:617 (read), base_service.py:618 (applied with strength).

Symptom: rare-but-critical words (an account type, a campaign codeword) get dropped or rewritten even after adding them to keyterms — promote them to boostTerms and raise boostStrength into the gentle band.
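One way to picture how keyterms and boostTerms combine is a per-term weight map. This is a sketch only: the real request shape in base_service.py is not shown in this guide, and the baseline weight of 1.0 and the rule that a boosted term overrides its baseline entry are assumptions.

```python
from typing import Iterable

BASELINE_WEIGHT = 1.0  # assumption: the guide does not state the baseline weight


def build_bias_weights(
    keyterms: Iterable[str], boost_terms: Iterable[str], boost_strength: float
) -> dict[str, float]:
    """Map each term to a weight: baseline for keyterms, scaled for boostTerms."""
    weights: dict[str, float] = {}
    for term in keyterms:
        term = term.strip()
        if term:
            weights[term] = BASELINE_WEIGHT
    for term in boost_terms:
        term = term.strip()
        if term:
            # boostStrength is multiplied in; a boosted term replaces its baseline entry.
            weights[term] = BASELINE_WEIGHT * boost_strength
    return weights
```

Promoting a term from keyterms to boostTerms with boostStrength 1.5 raises only that term's weight and leaves the rest of the vocabulary at the baseline.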

Step 46 — boostStrength live

What: how hard the boostTerms are pushed. This is the knob that bites. Stay in the 1–2 gentle band. At >=3 you risk hallucination: the model starts "hearing" the boosted term in audio where it was never said, which is far worse than the occasional miss it was meant to fix.

boostStrength range ~0–5 (1–2 gentle, >=3 hallucination risk) tune per agent

Runtime: base_service.py:618 (multiplied into the boost on the keyterm request).

Symptom: after raising boost, transcripts contain the boosted word in turns where the caller said something else entirely (phantom keyterms). Drop boostStrength back below 3.

Step 47 — keytermMinAudioSec live

What: a minimum-audio gate. Clips shorter than this duration skip keyterm biasing entirely. The rationale: very short snippets such as "evet" (Turkish for "yes") or "hı hı" ("uh-huh") carry almost no acoustic evidence, so biasing them is mostly a way to manufacture false positives.

keytermMinAudioSec seconds (~0–3 useful range) tune per agent

Runtime: base_service.py:622 (gate check), base_service.py:623 (bypass branch).

Symptom: keyterms appearing on tiny back-channel turns. Raise keytermMinAudioSec so short clips bypass biasing.
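The gate itself is a single comparison on clip duration. A minimal sketch, assuming raw PCM input for the duration helper; both function names are illustrative and the real check lives in base_service.py.

```python
def clip_duration_sec(
    num_bytes: int, sample_rate: int, sample_width: int = 2, channels: int = 1
) -> float:
    """Duration of a raw PCM clip (default: 16-bit mono)."""
    return num_bytes / (sample_rate * sample_width * channels)


def keyterms_for_clip(audio_sec: float, min_audio_sec: float, terms: list[str]) -> list[str]:
    """Return the terms to send, or an empty list when the clip is too short."""
    if audio_sec < min_audio_sec:
        return []  # below the gate: biasing is bypassed entirely
    return list(terms)
```

A 0.4 s back-channel clip with keytermMinAudioSec set to 0.6 gets no keyterms at all, while a 1.2 s clip gets the full list.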

Step 48 — keytermAntiParrot live

What: a guard against the recognizer "parroting" a keyterm back simply because it was hinted. When enabled, it discounts a keyterm match that is not actually supported by the acoustics, blunting the hallucination tail you would otherwise get from aggressive boosting.

keytermAntiParrot boolean / guard config on (recommended)

Runtime: base_service.py:627-634 (anti-parrot evaluation block).

Symptom: even with modest boost, the model occasionally inserts a keyterm with no acoustic support. Ensure keytermAntiParrot is enabled before reaching for a lower boostStrength.

Deepgram backend note. When the STT backend is Deepgram, the assembled keyterm set is split into Deepgram's own keyterm parameters at base_service.py:680-684 rather than passed in the Freya-native shape. Same source list (keyterms + boostTerms), different wire format. If you are debugging "my keyterms work on one backend but not the other," that split is where to look.

Ask Claude Code. "In base_service.py around lines 615–634, walk me through how keyterms, boostTerms and boostStrength become the recognizer request, and where keytermMinAudioSec short-circuits it. Then show the Deepgram split at 680–684." This confirms the live path end to end and shows you exactly which numbers reach the model.
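For the Deepgram backend note above, here is a rough sketch of what "same source list, different wire format" can look like. It assumes the merged list is sent as repeated keyterm query parameters, which is the form Deepgram documents for keyterm prompting; how the agent actually maps boostTerms and boostStrength at base_service.py:680-684 is not shown in this guide, so strength is simply not represented here.

```python
from typing import Iterable
from urllib.parse import urlencode


def deepgram_keyterm_query(keyterms: Iterable[str], boost_terms: Iterable[str]) -> str:
    """Merge both lists, drop duplicates in order, and emit repeated keyterm params."""
    merged = list(dict.fromkeys([*keyterms, *boost_terms]))
    return urlencode([("keyterm", term) for term in merged])
```

Called with keyterms ["Freya"] and boostTerms ["Platinum Plus", "Freya"], this yields keyterm=Freya&keyterm=Platinum+Plus. Comparing a string like this against what the Freya-native path sends is a quick way to spot a term that is lost on one backend.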

Keyterm boost tuner live

Keep boostStrength in the gentle 1–2 band and treat 3 or more as hallucination risk. The keytermMinAudioSec gate determines when short clips bypass biasing entirely.
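The bands can be written down as a small verdict function. The 1–2 and >=3 boundaries come from this guide; the labels for values below 1 and between 2 and 3 are assumptions, since the guide does not name those regions.

```python
def boost_verdict(boost_strength: float) -> str:
    """Classify a boostStrength value against the documented bands."""
    if boost_strength < 0:
        raise ValueError("boostStrength must not be negative")
    if boost_strength >= 3:
        return "hallucination risk"
    if 1 <= boost_strength <= 2:
        return "gentle"
    if boost_strength < 1:
        return "below gentle band"  # assumed label
    return "above gentle band"  # assumed label for values between 2 and 3
```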

Checkpoint: digit collection mishears and re-ask does nothing on that step

Scenario. A digit-collection node (card last-4, TC kimlik, OTP) keeps mishearing the caller. You see confidenceReask.enabled = true in the agent config and expect the agent to catch the low-confidence transcript and ask the caller to repeat. It never does on that step. The call just proceeds with the wrong digits.

Why. The feature is live (agent ea1f00f0), but two preconditions gate it off here: (1) disableOnDigitNodes defaults true, so the gate deliberately skips digit nodes — confidence is misleading there, the model is confidently wrong on misheard digits (a per-node override can force it on); and (2) the gate needs per-word confidences, so it is inert unless you are on streaming Freya STT.

The actual fix. For digit reliability, do not reach for re-ask. Use the right levers instead:

  • A transcription prompt on the STT layer that biases toward the expected digit shape (length, format), so the recognizer produces cleaner numbers in the first place.
  • DTMF (touch-tone) collection for the digit node, so the caller keys the number and there is no STT ambiguity to re-ask about.

In short: re-ask is the wrong tool for a digit node (it is off there by default). Steer with a transcription prompt or collect DTMF.

Recap

Group A (confidenceReask): now a real, wired feature (agent ea1f00f0). ConfidenceReaskProcessor drops low-confidence turns and re-asks, gated by should_reask. Two preconditions to remember: it needs streaming-Freya per-word confidences, and it defaults off on digit nodes.

Group B (keyterms): wired in base_service.py (615, 617, 618, 622, 623, 627-634; Deepgram split 680-684). Keep boostStrength in 1–2, leave keytermAntiParrot on, and use keytermMinAudioSec to keep biasing off tiny clips.

Meta-lesson: Group A was documented as dead and then shipped, so re-verify against the live repo after every pull.