Voice Engine
Navigate to Universal → Voice Engine on your character’s dashboard. This section has five sub-pages: Overview, Settings, Frequency, Instructions, and Preferences.
The Voice Engine controls everything about how your character sounds — which voice model is used, how expressive or stable the voice is, when it speaks in voice calls, and how it uses audio tags to add emotion.
The active voice is shown at the top of every Voice Engine page alongside its provider badge (e.g., Eleven v3) and an enable/delete toggle.
Overview
Navigate to Voice Engine → Overview.
This page shows technical details about the currently configured voice, along with its performance metrics.
Voice Details
| Field | Description |
|---|---|
| Voice ID | The unique ID of the voice model in use — copy it for reference |
| Model | The TTS (text-to-speech) synthesis model (e.g., eleven_v3) |
| Language | The primary language the voice was trained in (e.g., En) |
| Accent | The accent variant, if set (e.g., persian) — can be removed with ✕ |
| Gender | The voice’s gender classification (e.g., Male) |
Voice Latency
| Metric | Description |
|---|---|
| Average | Typical latency in seconds for the voice model to generate audio |
| P95 | 95th-percentile latency: 95% of responses are generated within this time |
Voice latency reflects only the TTS model’s rendering time. Total response latency includes the AI engine processing time on top of this.
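The two metrics above can be reproduced from raw timings. The following is a minimal sketch, assuming you have collected per-response TTS latencies (in seconds) yourself. The dashboard's exact percentile method is not documented, so the nearest-rank method is used here as an assumption, and the function names are hypothetical.

```python
import math
from statistics import mean


def latency_summary(samples_s):
    """Return (average, p95) in seconds for a list of latency samples.

    P95 uses the nearest-rank method (an assumption, not the
    dashboard's confirmed formula).
    """
    if not samples_s:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_s)
    # 1-based rank of the smallest value that covers 95% of the samples.
    rank = math.ceil(95 * len(ordered) / 100)
    return mean(ordered), ordered[rank - 1]


def total_latency(engine_s, voice_s):
    """Total response latency: AI engine time plus TTS rendering time."""
    return engine_s + voice_s


samples = [0.4] * 19 + [2.0]
average, p95 = latency_summary(samples)
print(f"average={average:.2f}s p95={p95:.2f}s")
```

A single slow response raises the average but leaves P95 unchanged here, which is why the two figures are reported separately.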
Training Samples
A list of audio samples used to train or clone the voice, shown as MPEG files with duration and file size. These are the base recordings the voice is derived from.
Click Edit Voice to change which voice is assigned to this character.
Settings
Navigate to Voice Engine → Settings.
Fine-tune the audio characteristics of the voice with these sliders:
| Setting | Description |
|---|---|
| Stability | Controls how consistent the voice sounds between sentences. Higher = more uniform; lower = more expressive variation. |
| Similarity Boost | How closely the output matches the original voice clone. Higher = more faithful to the training samples. |
| Style | How much the voice exaggerates stylistic characteristics like emotion and emphasis. |
| Speed | The speaking rate. Lower = slower and more deliberate; higher = faster delivery. |
| Pitch | Shifts the overall pitch up or down from the voice’s natural baseline. |
For conversational characters, a Stability of 0.5–0.7 with Style around 0.3–0.5 gives natural variation without sounding erratic. For dramatic or emotional characters, lower Stability and raise Style.
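The suggested ranges can be written down as a simple check. This is an illustrative sketch only: the settings are adjusted with dashboard sliders, and the `VoiceSettings` class, its field names, and `conversational_hints` are hypothetical names introduced for this example, not part of any documented API.

```python
from dataclasses import dataclass


@dataclass
class VoiceSettings:
    # Hypothetical container mirroring two of the sliders described above.
    stability: float
    style: float


def conversational_hints(settings):
    """Return hints where values fall outside the conversational ranges."""
    hints = []
    if not 0.5 <= settings.stability <= 0.7:
        hints.append("Stability is outside the suggested 0.5-0.7 range")
    if not 0.3 <= settings.style <= 0.5:
        hints.append("Style is outside the suggested 0.3-0.5 range")
    return hints


print(conversational_hints(VoiceSettings(stability=0.6, style=0.4)))
print(conversational_hints(VoiceSettings(stability=0.2, style=0.9)))
```

Values outside these ranges are not errors. A dramatic character would be expected to trigger both hints.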
Frequency
Navigate to Voice Engine → Frequency.
Voice Frequency controls how often your character responds with voice messages (in text channels where voice is enabled).
There are two modes:
Let Character Decide (toggle)
When on, the character uses its own personality and context to decide when to send a voice message versus a text reply. The Manual Frequency slider is disabled while this is active.
Manual Frequency Control (slider)
When “Let Character Decide” is off, you set the voice response rate manually:
| Slider Position | Value | Effect |
|---|---|---|
| Far left | 0 (Never) | Character never sends voice messages |
| Middle | ~0.5 (50%) | Character sends voice roughly half the time |
| Far right | 1 (Always) | Character always responds with voice |
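One way to read the slider value is as a per-message probability. The sketch below assumes each reply is decided independently at random, which is an assumption made for illustration. The platform's actual selection logic is not documented here, and `should_send_voice` is a hypothetical name.

```python
import random


def should_send_voice(frequency, rng=random.random):
    """Decide whether a reply is sent as voice.

    frequency: slider value from 0 (never) to 1 (always).
    rng: callable returning a float in [0, 1); injectable for testing.
    """
    if not 0.0 <= frequency <= 1.0:
        raise ValueError("frequency must be between 0 and 1")
    # random.random() never returns 1.0, so 1.0 always passes
    # and 0.0 never does.
    return rng() < frequency


random.seed(0)
voiced = sum(should_send_voice(0.5) for _ in range(10_000))
print(f"{voiced} of 10000 replies sent as voice")
```

Under this reading, the middle position gives voice replies about half the time over many messages, not a strict alternation between voice and text.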
Instructions
Navigate to Voice Engine → Instructions.
Expression Instructions is a freeform text field (up to 1000 characters) where you tell the voice model how to use audio tags to express emotion and style.
What are Audio Tags?
Audio tags are descriptive instructions enclosed in square brackets, inserted into the text before synthesis. Examples:
[whisper]: makes the voice lower and hushed
[applause]: adds an applause sound effect
[yells]: makes the voice louder and more urgent
[sigh]: inserts a sigh
[British accent]: shifts the accent
Other tags include [laughs], [starts laughing], [crying], [breathes], [echoing], and [explosion].
Write instructions telling the voice when and how to use these tags. For example:
Use [laughs] before sarcastic remarks.
Insert [breathes] frequently for a dramatic, contemplative feel.
Use [yells] when responding to insults or provocations.
Add [echoing] before every sentence for an otherworldly effect.
The character follows these instructions when constructing spoken responses.
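Before pasting instructions into the field, it can help to check their length and see which tags they mention. This is a minimal sketch using Python's standard `re` module. The helper names are hypothetical, and the bracket pattern is an assumption based on the tag examples above, not a documented grammar.

```python
import re

# Matches text enclosed in square brackets, e.g. "[whisper]".
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")
MAX_INSTRUCTION_CHARS = 1000  # field limit stated above


def find_audio_tags(text):
    """Return the tag names found in text, in order of appearance."""
    return TAG_PATTERN.findall(text)


def fits_instruction_field(text):
    """Check the text against the 1000-character limit."""
    return len(text) <= MAX_INSTRUCTION_CHARS


instructions = (
    "Use [laughs] before sarcastic remarks. "
    "Use [yells] when responding to insults or provocations."
)
print(find_audio_tags(instructions))
print(fits_instruction_field(instructions))
```

The same pattern can be applied to a tagged line of dialogue, such as "[whisper] Stay close.", to see which tags the text carries.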
Preferences
Navigate to Voice Engine → Preferences.
These settings control how the character behaves in Discord voice channels:
| Setting | Description |
|---|---|
| Auto-join Voice Channels | If enabled, the character joins a voice channel automatically when members are present, without being explicitly invited |
| Auto-leave Voice Channels | If enabled, the character leaves the voice channel automatically when no human members remain |
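The two toggles can be summarized as a small decision rule. The sketch below is a model of the behavior described in the table, not the platform's implementation. The function and parameter names are hypothetical.

```python
def next_voice_action(in_channel, human_count, auto_join, auto_leave):
    """Return "join", "leave", or "stay" for a single voice channel.

    in_channel: whether the character is currently in the channel.
    human_count: number of human members currently in the channel.
    """
    if not in_channel and auto_join and human_count > 0:
        return "join"   # members are present and auto-join is enabled
    if in_channel and auto_leave and human_count == 0:
        return "leave"  # no humans remain and auto-leave is enabled
    return "stay"       # keep the current state


print(next_voice_action(False, 2, auto_join=True, auto_leave=True))
print(next_voice_action(True, 0, auto_join=True, auto_leave=True))
print(next_voice_action(True, 0, auto_join=True, auto_leave=False))
```

With auto-leave disabled, the character remains in an empty channel until it is removed some other way.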