Where these run. TTS text passes through an ordered filter chain built in base_service.py:842-860: MarkdownTextFilter() -> SpeechTextFilter(language, substitutions, 6 toggles) -> LanguageTextFilter (unless language is multi). The six normalization toggles live under ttsConfig.speechNormalization (zod speechNormalizationSchema, validators.ts:87-94). The layers run in order: structural email/URL/emoji handling, then your custom substitutions, then number normalization.
The always-on filter chain live
What it is. Every TTS utterance is rewritten before synthesis. This is not one toggle but the pipeline that hosts all the toggles below. Markdown is stripped first, then SpeechTextFilter applies normalization and substitutions, then per-language fixups.
Runtime. Chain assembled in base_service.py:842-860; the speech filter implementation lives in speech_text_filter.py. Because filtering happens before synthesis, no TTS provider sees the raw "https://" or the bare emoji.
- When to change: you do not toggle the chain itself — you toggle the individual normalizers below.
Symptom it owns: any "the bot spoke the literal characters" complaint traces back to a normalizer being off (or the text being shaped so the regex did not match).
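The ordering described above can be sketched as a list of plain functions applied one after another. This is a minimal illustration, not the code in base_service.py: the three filter functions are hypothetical stand-ins for MarkdownTextFilter, SpeechTextFilter, and LanguageTextFilter, and their bodies are deliberately trivial.

```python
import re
from typing import Callable, List

TextFilter = Callable[[str], str]


def strip_markdown(text: str) -> str:
    # Hypothetical stand-in for MarkdownTextFilter: drops a few markup characters.
    return re.sub(r"[*_`#]+", "", text)


def normalize_speech(text: str) -> str:
    # Hypothetical stand-in for SpeechTextFilter: one toy normalization rule.
    return text.replace("&", " and ")


def language_fixups(text: str) -> str:
    # Hypothetical stand-in for LanguageTextFilter: collapses whitespace.
    return re.sub(r"\s+", " ", text).strip()


def build_chain(language: str) -> List[TextFilter]:
    # Markdown first, then speech normalization, then per-language fixups.
    chain: List[TextFilter] = [strip_markdown, normalize_speech]
    if language != "multi":  # the language filter is skipped for "multi"
        chain.append(language_fixups)
    return chain


def run_chain(text: str, chain: List[TextFilter]) -> str:
    # Each filter receives the output of the previous one.
    for text_filter in chain:
        text = text_filter(text)
    return text
```

Because each stage only sees the previous stage's output, a rule that expects raw Markdown or a raw URL has to sit earlier in the chain than the stage that rewrites it.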
Emails, URLs, emojis live
What it is. Rewrites emails (user@example.com → "user at example dot com") and URLs (strips the scheme, turns "." into "dot"), and strips emojis so the model does not try to vocalize them.
Runtime. base_service.py:849-851 wires the flags; implementation at speech_text_filter.py:139-141. These three are language-agnostic and on by default.
- When to change: rarely turn off. The realistic case is disabling URL normalization if your bot reads short codes that look like URLs but are not.
Symptom it fixes: "it literally said h-t-t-p-s colon slash slash" → confirm URL normalization is on and the text is shaped so the matcher catches it.
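To make "the text is shaped so the matcher catches it" concrete, here is a rough sketch of the three structural rewrites. The regular expressions, the function name, and the keyword flags (emails, urls, emojis) are assumptions for illustration; the real patterns in speech_text_filter.py may differ.

```python
import re

# Assumed patterns, not the ones used by the runtime.
URL_RE = re.compile(r"https?://(\S+?)(?=[.,;:!?)]*(?:\s|$))", re.IGNORECASE)
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF\uFE0F\u200D]")


def normalize_structural(text: str, *, emails: bool = True,
                         urls: bool = True, emojis: bool = True) -> str:
    if urls:
        # Drop the scheme and speak each "." as "dot".
        text = URL_RE.sub(lambda m: m.group(1).replace(".", " dot "), text)
    if emails:
        # "@" becomes "at", "." becomes "dot".
        text = EMAIL_RE.sub(
            lambda m: m.group(0).replace("@", " at ").replace(".", " dot "),
            text,
        )
    if emojis:
        # Remove emoji code points instead of trying to vocalize them.
        text = EMOJI_RE.sub("", text)
    # Tidy any doubled spaces left behind by the rewrites.
    return re.sub(r"\s{2,}", " ", text).strip()
```

In this sketch a URL written without a scheme, or broken across whitespace, is not matched at all, which is the kind of shaping problem the symptom above refers to.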
Phone / TC-identity / general numbers live Turkish-only
What it is. Speak digit strings naturally instead of as one run: phone numbers grouped 3-3-2-2, 11-digit TC identity numbers grouped 3-3-3-2, and general numbers via Turkish num2words.
Runtime. base_service.py:852-854; implementation speech_text_filter.py:142-144 via _tr_digits_to_words. The runtime only implements Turkish, and the dashboard disables these three toggles unless language === "tr" (tts-config-panel.tsx:250-251).
validators.ts:85). Enable these deliberately, only for flows that read numbers back to the caller, and only with language = tr.
- When to change: enable identityNumbers / phoneNumbers for read-back flows; enable generalNumbers when long numeric runs (amounts, IBAN tails) should be spoken as words.
Symptom it fixes: "it read my TC number as one enormous number / dropped the leading zero" → set language = tr and enable identityNumbers.
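The grouping rules can be illustrated with a small sketch. It only shows the 3-3-2-2 and 3-3-3-2 splits from the description; reading each group digit by digit is an assumption made here for simplicity, and the helper names are hypothetical rather than taken from _tr_digits_to_words.

```python
from typing import List, Sequence

# Turkish words for the digits 0-9.
TR_DIGITS = ["sıfır", "bir", "iki", "üç", "dört",
             "beş", "altı", "yedi", "sekiz", "dokuz"]

PHONE_GROUPS = (3, 3, 2, 2)   # 10-digit phone number
TC_GROUPS = (3, 3, 3, 2)      # 11-digit TC identity number


def group_digits(digits: str, sizes: Sequence[int]) -> List[str]:
    # Keep the input as a string so leading zeros are never lost.
    if not (digits.isascii() and digits.isdigit()) or len(digits) != sum(sizes):
        raise ValueError("expected exactly %d ASCII digits" % sum(sizes))
    groups, pos = [], 0
    for size in sizes:
        groups.append(digits[pos:pos + size])
        pos += size
    return groups


def speak_groups(groups: Sequence[str]) -> str:
    # Assumption: read digit by digit, with a comma as a pause between groups.
    return ", ".join(" ".join(TR_DIGITS[int(d)] for d in group) for group in groups)
```

Treating the value as a string throughout is what prevents both failure modes in the symptom: nothing is parsed as one large integer, and a leading zero stays a spoken "sıfır".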
Custom text substitutions live
What it is. Your own ordered regex find/replace rules, applied before TTS — e.g. force "Fibabanka" → "Fiba banka", or expand an abbreviation the model mangles.
Runtime. Wired at base_service.py:847; compiled at speech_text_filter.py:179-221 and applied sequentially (top to bottom) at :308-323. Uses the ReDoS-safe regex engine with a 0.1s-per-rule timeout, so a runaway pattern cannot hang the call.
- When to change: pronunciation fixes, brand names, expanding domain acronyms. Order matters; case-sensitivity is per-rule.
Symptom it fixes: "it mispronounces our company name / a product term" → add a substitution. Fastest pronunciation fix, no model change.
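A minimal sketch of ordered, per-rule case-sensitive substitutions follows. The Substitution class and both helpers are hypothetical names, and skipping invalid patterns is an assumption. It uses the standard-library re module, which has no per-call timeout, so the 0.1s-per-rule guard described above is not reproduced here.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Pattern, Tuple


@dataclass(frozen=True)
class Substitution:
    pattern: str
    replacement: str
    case_sensitive: bool = True  # case handling is decided per rule


def compile_rules(rules: Iterable[Substitution]) -> List[Tuple[Pattern[str], str]]:
    compiled = []
    for rule in rules:
        flags = 0 if rule.case_sensitive else re.IGNORECASE
        try:
            compiled.append((re.compile(rule.pattern, flags), rule.replacement))
        except re.error:
            # Assumption: an invalid pattern is skipped rather than failing the call.
            continue
    return compiled


def apply_rules(text: str, compiled: List[Tuple[Pattern[str], str]]) -> str:
    # Rules run top to bottom; each sees the output of the rule before it.
    for pattern, replacement in compiled:
        text = pattern.sub(replacement, text)
    return text


rules = [
    Substitution(r"\bFibabanka\b", "Fiba banka"),
    Substitution(r"\bbanka\b", "bank", case_sensitive=False),
]
```

With the rules in this order, "Fibabanka" becomes "Fiba bank" because the second rule matches the output of the first. Reversed, the result is "Fiba banka", which is why rule order is worth checking before adding a broad pattern.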
Background sound live
What it is. A looping ambient audio bed (e.g. a subtle call-center murmur) mixed under the bot's voice so the silence between sentences does not feel sterile or obviously synthetic.
Runtime. base_service.py:1386-1392 constructs a ResamplingSoundfileMixer(volume=1.2) that loops the file under the TTS output. Note this is top-level, not under ttsConfig. The audio is uploaded via dedicated audio-asset endpoints, not pasted raw.
- When to change: when a customer feels the dead-silent background sounds "fake" or "creepy". Keep the bed subtle.
Symptom it fixes: "the total silence between sentences feels unnatural / makes it obvious it's a bot."
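Conceptually, mixing a looping bed under speech means repeating the bed to the length of the voice audio, scaling it, and summing sample by sample. The sketch below shows that idea on plain lists of 16-bit samples. It is not how ResamplingSoundfileMixer is implemented, and the function name and signature are hypothetical.

```python
from itertools import cycle, islice
from typing import List

INT16_MIN, INT16_MAX = -32768, 32767


def mix_looped_bed(voice: List[int], bed: List[int], volume: float) -> List[int]:
    # With no bed audio there is nothing to mix in.
    if not bed:
        return list(voice)
    # Repeat the bed as many times as needed to cover the voice samples.
    looped = islice(cycle(bed), len(voice))
    mixed = []
    for voice_sample, bed_sample in zip(voice, looped):
        sample = int(round(voice_sample + bed_sample * volume))
        # Clamp to the 16-bit range so loud passages do not wrap around.
        mixed.append(max(INT16_MIN, min(INT16_MAX, sample)))
    return mixed
```

The volume factor scales only the bed, so it is the knob that decides how far the ambience sits below the voice.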
Checkpoint
Customer says the agent reads back IBANs and TC numbers as a single huge number. Two things to set?
1. Set ttsConfig.language = "tr". The number normalizers are gated on Turkish, both in the UI and at runtime.
2. Enable speechNormalization.identityNumbers (and generalNumbers for the IBAN's long numeric run). Both are off/unset by default.
If language is not tr, these toggles are disabled in the UI (tts-config-panel.tsx:250-251) and ignored at runtime, so the language change is a prerequisite, not optional.
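The answer can be expressed as a config fragment plus the gating rule it relies on. The dictionary mirrors the field names used above (language, speechNormalization, identityNumbers, generalNumbers, phoneNumbers); the helper function is a hypothetical illustration of "ignored unless language is tr", not the runtime's code.

```python
from typing import Any, Dict

NUMBER_TOGGLES = ("phoneNumbers", "identityNumbers", "generalNumbers")


def effective_number_toggles(tts_config: Dict[str, Any]) -> Dict[str, bool]:
    # Number normalizers only take effect when the language is Turkish.
    requested = tts_config.get("speechNormalization", {})
    is_turkish = tts_config.get("language") == "tr"
    return {
        name: is_turkish and bool(requested.get(name, False))
        for name in NUMBER_TOGGLES
    }


# The two settings from the checkpoint answer.
tts_config = {
    "language": "tr",
    "speechNormalization": {"identityNumbers": True, "generalNumbers": True},
}
```

Changing language to anything other than "tr" in this sketch turns every number toggle off regardless of what was requested, which matches the prerequisite described in the answer.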