Mastering the TTS Tool for Audio Generation in OpenClaw

OpenClaw provides local text-to-speech (TTS) through the sherpa-onnx-tts skill, which generates audio privately and offline without relying on cloud services. It suits users who prioritize data privacy, need reliable TTS in disconnected environments, or want to avoid API costs and network latency.

Why Use Local TTS?

Cloud-based TTS services are convenient, but they come with trade-offs: network dependency, potential privacy concerns, API rate limits, and ongoing costs. OpenClaw's integration with sherpa-onnx solves these by running the entire TTS pipeline locally on your machine. This means:

Complete Privacy: Your text never leaves your device.
Zero Cost: Free to use after initial setup.
Offline Operation: Works without an internet connection.
Low Latency: Audio is generated on your own hardware with no network round trip; actual speed depends on your CPU and the chosen voice model.

This makes it perfect for generating voice notes, automating audio announcements, or creating personalized audio content securely.

Installing the sherpa-onnx-tts Skill

The sherpa-onnx-tts skill is not installed by default. Setting it up involves installing the skill, downloading the runtime and a voice model, and pointing OpenClaw at both.

Step 1: Install the Skill

First, ensure the skill is installed in your OpenClaw environment:

openclaw skills install sherpa-onnx-tts

Step 2: Download the Runtime and Model

The skill requires two components: the sherpa-onnx runtime (a shared library) and a voice model. The installer will guide you through downloading the correct runtime for your operating system (macOS, Linux, or Windows) and a default voice model.

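The default voice is Piper's en_US-lessac-high, a clear, natural-sounding American English voice. It is packaged for sherpa-onnx as vits-piper-en_US-lessac-high, which is the folder name used in the configuration in Step 3. The installer downloads and extracts the runtime into ~/.openclaw/tools/sherpa-onnx-tts/runtime and the model into ~/.openclaw/tools/sherpa-onnx-tts/models.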

Step 3: Configure Environment Variables

OpenClaw needs to know where to find the runtime and model. Update your ~/.openclaw/openclaw.json configuration file:

{
  skills: {
    entries: {
      "sherpa-onnx-tts": {
        env: {
          SHERPA_ONNX_RUNTIME_DIR: "~/.openclaw/tools/sherpa-onnx-tts/runtime",
          SHERPA_ONNX_MODEL_DIR: "~/.openclaw/tools/sherpa-onnx-tts/models/vits-piper-en_US-lessac-high",
        },
      },
    },
  },
}

After saving the config, restart the OpenClaw Gateway for the changes to take effect.

Using the TTS Tool

Once configured, you can use the TTS tool from any agent session that has access to the skill. You can invoke it directly as a tool call, or use the command-line wrapper.

Tool Call

{
  "name": "tts",
  "arguments": {
    "text": "Hello from your local OpenClaw assistant.",
    "channel": "whatsapp"
  }
}

This will generate a .wav audio file and return a MEDIA: path that can be sent as a voice message in supported channels like WhatsApp or Telegram.
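
If you post-process tool results in a script, you may want to pull the audio path out of the returned text. The sketch below assumes the result contains a line of the form "MEDIA:<path>"; the exact output format is not specified here, and extract_media_path is a hypothetical helper name.

```shell
# extract_media_path: read tool output on stdin and print the path from the
# first line that starts with "MEDIA:".
# Assumption: the result carries a line shaped like "MEDIA:<path>", with
# optional whitespace after the colon. Prints nothing if no such line exists.
extract_media_path() {
  sed -n 's/^MEDIA:[[:space:]]*//p' | head -n 1
}
```

For example, piping saved tool output through extract_media_path yields only the file path, which can then be handed to a player or a file-copy step.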

Command-Line Usage

You can also use the wrapper script directly from the shell:

# Navigate to the skill's bin directory
cd ~/.openclaw/workspace/skills/sherpa-onnx-tts/bin

# Generate speech
./sherpa-onnx-tts -o ./my_audio.wav "This audio was generated locally."

The generated my_audio.wav file can then be used in any application.
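
To turn several lines of text into separate audio files, the same wrapper can be called in a loop. This is a minimal sketch that uses only the -o flag shown above. The batch_tts function, the TTS_BIN override variable, and the line_NNN.wav naming scheme are illustrative assumptions, not part of the skill.

```shell
# Path to the wrapper script. TTS_BIN is a hypothetical override variable;
# the default matches the bin directory used in the example above.
TTS_BIN="${TTS_BIN:-$HOME/.openclaw/workspace/skills/sherpa-onnx-tts/bin/sherpa-onnx-tts}"

# batch_tts INPUT_FILE OUT_DIR
# Calls the wrapper once per non-empty line of INPUT_FILE, writing
# OUT_DIR/line_001.wav, line_002.wav, and so on.
# Prints the number of files generated; stops at the first failure.
batch_tts() {
  input_file=$1
  out_dir=$2
  mkdir -p "$out_dir" || return 1
  n=0
  # The "|| [ -n ... ]" clause also picks up a final line with no newline.
  while IFS= read -r line || [ -n "$line" ]; do
    # Skip blank lines so they do not produce empty audio files.
    if [ -z "$line" ]; then
      continue
    fi
    n=$((n + 1))
    out_file=$(printf '%s/line_%03d.wav' "$out_dir" "$n")
    # Redirect stdin so the wrapper cannot consume the rest of the input file.
    "$TTS_BIN" -o "$out_file" "$line" < /dev/null || return 1
  done < "$input_file"
  echo "$n"
}
```

Running batch_tts announcements.txt ./audio would then produce one .wav file per line of announcements.txt in ./audio.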

Advanced Configuration

The sherpa-onnx-tts skill supports multiple voice models. You can download additional Piper voices (which are compatible with sherpa-onnx) and place them in the models directory. To use a different voice, update the SHERPA_ONNX_MODEL_DIR environment variable in your config to point to the new model's folder.

If a model directory contains multiple .onnx files, you can specify the exact model file with the SHERPA_ONNX_MODEL_FILE environment variable or the --model-file command-line flag.
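
A quick way to tell whether a model directory needs an explicit model file is to count its .onnx files. The helpers below are a sketch; the function names are hypothetical and only the directory layout described above is assumed.

```shell
# list_onnx_models MODEL_DIR
# Prints each .onnx file found directly inside MODEL_DIR, one per line.
list_onnx_models() {
  for f in "$1"/*.onnx; do
    # An unmatched glob stays literal, so confirm the file really exists.
    if [ -f "$f" ]; then
      printf '%s\n' "$f"
    fi
  done
}

# needs_model_file MODEL_DIR
# Succeeds (exit status 0) when MODEL_DIR holds more than one .onnx file,
# which is the case where an explicit model file should be selected.
needs_model_file() {
  count=$(list_onnx_models "$1" | wc -l | tr -d '[:space:]')
  [ "$count" -gt 1 ]
}
```

If needs_model_file succeeds for your model directory, run list_onnx_models on it and choose one of the listed files for SHERPA_ONNX_MODEL_FILE or --model-file.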

Troubleshooting

Command not found: Ensure the skill's bin directory is on your PATH, or call the script with its full path.
Model not loading: Check that SHERPA_ONNX_MODEL_DIR points to a directory containing an .onnx model file and a tokens.txt file.
Runtime errors: Ensure SHERPA_ONNX_RUNTIME_DIR points to the extracted runtime library that matches your OS.
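
The path checks above can be bundled into a small script. This is a sketch only: check_tts_setup is a hypothetical helper, and it verifies just what this section describes (the runtime directory exists, and the model directory holds an .onnx file and tokens.txt). It does not try to load the runtime or the model.

```shell
# check_tts_setup RUNTIME_DIR MODEL_DIR
# Prints one line per problem found and returns non-zero if there are any.
check_tts_setup() {
  runtime_dir=$1
  model_dir=$2
  problems=0

  if [ ! -d "$runtime_dir" ]; then
    echo "runtime directory not found: $runtime_dir"
    problems=$((problems + 1))
  fi

  if [ ! -d "$model_dir" ]; then
    echo "model directory not found: $model_dir"
    problems=$((problems + 1))
  else
    if [ ! -f "$model_dir/tokens.txt" ]; then
      echo "tokens.txt missing in: $model_dir"
      problems=$((problems + 1))
    fi
    # Look for at least one .onnx file; an unmatched glob stays literal,
    # so each candidate is confirmed with -f.
    found_onnx=no
    for f in "$model_dir"/*.onnx; do
      if [ -f "$f" ]; then
        found_onnx=yes
      fi
    done
    if [ "$found_onnx" = no ]; then
      echo "no .onnx file in: $model_dir"
      problems=$((problems + 1))
    fi
  fi

  [ "$problems" -eq 0 ]
}
```

Variables set in openclaw.json may not be exported in your interactive shell, so it is safest to pass the paths explicitly, for example: check_tts_setup ~/.openclaw/tools/sherpa-onnx-tts/runtime ~/.openclaw/tools/sherpa-onnx-tts/models/vits-piper-en_US-lessac-high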

By leveraging the sherpa-onnx-tts skill, you can add robust, private audio generation capabilities to your OpenClaw automation workflows, keeping your data secure and your operations offline.