Creating Custom TTS Voice Agents

Introduction

This guide walks through creating custom text-to-speech (TTS) voice agents in OpenClaw. By the end, you'll be able to configure voice agents that use different TTS providers, customize voice characteristics, and use advanced features such as voice switching and expressive cues.

Understanding TTS Providers

OpenClaw supports three main TTS providers:

ElevenLabs - High-quality neural voices with voice cloning capabilities
OpenAI - Professional-grade TTS with natural-sounding voices
Edge TTS - Microsoft's free neural TTS service (no API key required)

Each provider has its strengths:

ElevenLabs excels at emotional expression and voice variety
OpenAI offers the most natural conversational flow
Edge TTS is completely free and doesn't require API keys

Basic Configuration

To enable TTS in OpenClaw, you need to configure it in your openclaw.json file. Here's a minimal configuration:

{
  "messages": {
    "tts": {
      "auto": "always",
      "provider": "elevenlabs"
    }
  }
}

This configuration enables TTS for all replies and uses ElevenLabs as the primary provider.

Provider-Specific Settings

ElevenLabs Configuration

{
  "messages": {
    "tts": {
      "provider": "elevenlabs",
      "elevenlabs": {
        "voiceId": "pMsXgVXv3BLzUgSXRplE",
        "modelId": "eleven_multilingual_v2",
        "voiceSettings": {
          "stability": 0.5,
          "similarityBoost": 0.75,
          "style": 0.0,
          "useSpeakerBoost": true,
          "speed": 1.0
        }
      }
    }
  }
}

Key parameters:

voiceId: Unique identifier for the voice (find in ElevenLabs dashboard)
modelId: TTS model to use (for example eleven_multilingual_v2 or eleven_v3)
stability: 0-1, lower values add more variation
similarityBoost: 0-1, higher values keep the output closer to the original voice
speed: 0.5-2.0, playback speed multiplier
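
The ranges listed above can be checked before a configuration is saved. The following is a minimal sketch, not part of OpenClaw: the helper name check_voice_settings and the RANGES table are hypothetical, and the bounds are copied from the parameter list in this guide.

```python
# Hypothetical pre-flight check; bounds come from the parameter list above.
RANGES = {
    "stability": (0.0, 1.0),
    "similarityBoost": (0.0, 1.0),
    "speed": (0.5, 2.0),
}


def check_voice_settings(settings: dict) -> list[str]:
    """Return human-readable problems; an empty list means every checked value is in range."""
    problems = []
    for key, (low, high) in RANGES.items():
        if key not in settings:
            continue  # keys that are not set are simply skipped
        value = settings[key]
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{key}: expected a number, got {value!r}")
        elif not low <= value <= high:
            problems.append(f"{key}: {value} is outside {low}-{high}")
    return problems
```

Passing the voiceSettings object from the ElevenLabs example returns an empty list, while a speed of 3.0 is reported as out of range.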

OpenAI Configuration

{
  "messages": {
    "tts": {
      "provider": "openai",
      "openai": {
        "model": "gpt-4o-mini-tts",
        "voice": "alloy"
      }
    }
  }
}

Available voices include alloy, echo, fable, onyx, nova, and shimmer.

Edge TTS Configuration

{
  "messages": {
    "tts": {
      "provider": "edge",
      "edge": {
        "voice": "en-US-MichelleNeural",
        "lang": "en-US",
        "outputFormat": "audio-24khz-48kbitrate-mono-mp3",
        "rate": "+10%",
        "pitch": "-5%"
      }
    }
  }
}

Popular voices include:

en-US-JennyNeural
en-US-MichelleNeural
en-US-GuyNeural
en-GB-SarahNeural
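
The rate and pitch values in the Edge TTS configuration are strings, so a stray quote or a missing sign is easy to overlook. Here is a small sketch of a format check. It assumes both values are a sign, digits, and a percent sign, which is inferred only from the "+10%" and "-5%" examples in this guide; the function name is hypothetical.

```python
import re

# Assumption: values look like "+10%" or "-5%" (sign, digits, percent sign).
SIGNED_PERCENT = re.compile(r"[+-]\d+%")


def is_signed_percent(value: str) -> bool:
    """Return True when the whole string matches the signed-percent shape."""
    return SIGNED_PERCENT.fullmatch(value) is not None
```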

Advanced Features

Voice Switching

You can switch voices dynamically within a single response using directives:

Hello there!

[[tts:provider=elevenlabs voiceId=pMsXgVXv3BLzUgSXRplE model=eleven_v3]]
This text will be spoken with a different voice.
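
If directives are generated by a script rather than typed by hand, a small builder keeps the syntax consistent. This is a sketch under the assumption that a directive is a space-separated list of key=value pairs inside [[tts:...]], as in the example above; the function tts_directive is hypothetical and not an OpenClaw API.

```python
def tts_directive(**options: str) -> str:
    """Build an inline directive such as [[tts:provider=elevenlabs model=eleven_v3]]."""
    if not options:
        raise ValueError("at least one key=value option is required")
    for key, value in options.items():
        text = str(value)
        # Whitespace or a closing bracket would break the key=value layout.
        if any(ch.isspace() for ch in text) or "]" in text:
            raise ValueError(f"{key}: value must not contain whitespace or ']'")
    pairs = " ".join(f"{key}={value}" for key, value in options.items())
    return f"[[tts:{pairs}]]"
```

Calling tts_directive(provider="elevenlabs", voiceId="pMsXgVXv3BLzUgSXRplE", model="eleven_v3") reproduces the directive shown in the example.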

Expressive Cues

Add emotional context to your voice output:

[[tts:text]](laughs) That's hilarious![[/tts:text]]

[[tts:text]](sings) ♫ La la la ♫[[/tts:text]]

These cues add appropriate prosody to the spoken output.
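
The same wrapping can be produced programmatically. The helper below is a hypothetical sketch that assumes a cue is written in parentheses at the start of the tagged text, as in the two examples above.

```python
def tts_text(spoken: str, cue: str = "") -> str:
    """Wrap text in [[tts:text]] tags, optionally prefixed by a parenthesised cue."""
    prefix = f"({cue}) " if cue else ""
    return f"[[tts:text]]{prefix}{spoken}[[/tts:text]]"
```

For instance, tts_text("That's hilarious!", cue="laughs") yields the first example line.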

Best Practices

Fallback Providers: Always configure at least two providers for redundancy

{
  "messages": {
    "tts": {
      "provider": "elevenlabs"
    }
  }
}

OpenClaw automatically falls back to OpenAI if ElevenLabs fails, and to Edge TTS if both fail.
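
To make the fallback order concrete, here is an illustrative sketch of the same idea in plain Python. It is not OpenClaw's implementation: the function synthesize_with_fallback and the synthesizers mapping are hypothetical, and each provider is represented by any callable that takes text and returns audio bytes.

```python
from typing import Callable

# Order described above: ElevenLabs first, then OpenAI, then Edge TTS.
FALLBACK_ORDER = ("elevenlabs", "openai", "edge")


def synthesize_with_fallback(
    text: str,
    synthesizers: dict[str, Callable[[str], bytes]],
    order: tuple[str, ...] = FALLBACK_ORDER,
) -> tuple[str, bytes]:
    """Try each provider in order; return (provider name, audio) from the first that succeeds."""
    errors: dict[str, Exception] = {}
    for name in order:
        synth = synthesizers.get(name)
        if synth is None:
            continue  # provider not available in this setup
        try:
            return name, synth(text)
        except Exception as exc:  # any failure moves on to the next provider
            errors[name] = exc
    raise RuntimeError(f"all TTS providers failed: {errors}")
```

Returning the provider name alongside the audio makes it easy to log when a fallback was used, which helps with the cost monitoring mentioned later in this section.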

Length Management: For long responses, enable summarization

{
  "messages": {
    "tts": {
      "maxTextLength": 4000
    }
  }
}
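
A reply can be checked against this limit before it is sent to a provider. The sketch below assumes maxTextLength counts characters and says nothing about what OpenClaw does with over-length text; the function name is hypothetical.

```python
def exceeds_tts_limit(text: str, config: dict) -> bool:
    """Return True when text is longer than messages.tts.maxTextLength in the given config."""
    tts = config.get("messages", {}).get("tts", {})
    limit = tts.get("maxTextLength")
    if limit is None:
        return False  # no limit configured, so this sketch flags nothing
    # Assumption: the limit is measured in characters.
    return len(text) > limit
```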

Performance: Use Edge TTS for simple notifications when quality isn't critical
Cost Control: Monitor your API usage, especially with ElevenLabs and OpenAI

Troubleshooting

Common Issues

No Audio Output: Check that TTS is enabled and your API keys are valid
Voice Not Changing: Verify the voice ID is correct and the provider is properly configured
Poor Quality: Adjust stability and similarity settings for ElevenLabs voices

Debug Commands

Use these slash commands to debug TTS issues:

/tts status
/tts provider elevenlabs
/tts audio Test message

Conclusion

Custom TTS voice agents add a powerful dimension to your OpenClaw setup. By understanding the different providers and their configuration options, you can create agents with distinct personalities and appropriate vocal characteristics for different use cases.

Remember to always test your configurations and monitor performance and costs, especially when using commercial providers like ElevenLabs and OpenAI.

For more information, refer to the official OpenClaw TTS documentation.