Documentation

Everything you need to install, configure, and deploy OpenVoiceUI. From zero to a working voice AI assistant in under 5 minutes.

Quick Start

Choose your preferred installation method. All options get you to a working voice AI assistant.

1. npm — Fastest

One command. Sets up the project, installs dependencies, and walks you through LLM configuration interactively.

npx openvoiceui setup

2. Pinokio — One-Click

Download Pinokio, search for "OpenVoiceUI" in the app store, and click Install. Zero terminal interaction required.

Pinokio App Store → Search "OpenVoiceUI" → Install

3. Docker — Production

Clone the repo, copy the example env file, add your API keys, and start both OpenVoiceUI and OpenClaw in containers.

git clone https://github.com/MCERQUA/OpenVoiceUI
cd OpenVoiceUI
cp .env.example .env    # Add your API keys
docker compose up

4. VPS — Always-On

Deploy to any Linux VPS with Docker installed. Point a domain at it, set up SSL via Cloudflare, and your assistant is accessible from anywhere. Minimum spec: 2 CPU cores and 4 GB RAM.

See the deployment guide on GitHub →
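
For the always-on part, one option is a small Compose override that restarts the containers after crashes or reboots. A sketch, assuming the services are named openvoiceui and openclaw — check the repository's docker-compose.yml for the real names:

```yaml
# docker-compose.override.yml — illustrative only; the service names below
# are assumptions and must match those in the repository's docker-compose.yml.
services:
  openvoiceui:
    restart: unless-stopped
  openclaw:
    restart: unless-stopped
```

Start detached with docker compose up -d so the assistant keeps running after you close your SSH session.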

5. VS Code — Dev Container

Open the repository in VS Code, accept the "Reopen in Container" prompt, and the dev container spins up with all dependencies pre-configured.

Open in Dev Container → Automatic setup

Configuration

After installation, configure your LLM, TTS, and STT providers in the .env file.

LLM Provider

Set the API key for your preferred model provider. OpenClaw routes requests to any Anthropic-compatible API endpoint.

# OpenAI
OPENAI_API_KEY=sk-...

# Anthropic
ANTHROPIC_API_KEY=sk-ant-...

# Groq (fast + free tier)
GROQ_API_KEY=gsk_...

# Ollama (local, free)
OLLAMA_BASE_URL=http://localhost:11434
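
As a sketch of how these keys could drive provider selection (the helper and its priority order are illustrative assumptions, not OpenClaw's documented routing logic):

```shell
# Hypothetical helper: pick a provider based on which .env key is set.
# The preference order below is an assumption for illustration only.
pick_llm_provider() {
  if   [ -n "${ANTHROPIC_API_KEY:-}" ]; then echo anthropic
  elif [ -n "${OPENAI_API_KEY:-}" ];    then echo openai
  elif [ -n "${GROQ_API_KEY:-}" ];      then echo groq
  elif [ -n "${OLLAMA_BASE_URL:-}" ];   then echo ollama
  else echo "no LLM provider configured" >&2; return 1
  fi
}
```

Only one key is required; set the one that matches the provider you want to use.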

Text-to-Speech

Choose how your assistant speaks. Multiple TTS engines are supported.

# Supertonic (self-hosted)
TTS_PROVIDER=supertonic
TTS_URL=http://supertonic:5050

# Browser native
TTS_PROVIDER=browser

# Custom endpoint
TTS_PROVIDER=custom
TTS_URL=http://your-tts:8080
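
A quick sanity check on these values can catch a missing URL before startup. Illustrative only — check_tts_config is a hypothetical helper, not part of OpenVoiceUI:

```shell
# Hypothetical helper: validate the TTS settings from .env.
# The browser provider needs no URL; supertonic and custom do.
check_tts_config() {
  case "${TTS_PROVIDER:-}" in
    browser) echo ok ;;
    supertonic|custom)
      if [ -n "${TTS_URL:-}" ]; then echo ok; else echo "missing TTS_URL"; fi ;;
    *) echo "unknown TTS_PROVIDER" ;;
  esac
}
```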

Speech-to-Text

Configure how your voice is captured and transcribed.

# Web Speech API (Chrome, free)
STT_PROVIDER=webspeech

# Deepgram (streaming)
DEEPGRAM_API_KEY=...

# Groq Whisper (batch)
GROQ_API_KEY=gsk_...
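
Since the Web Speech API needs no key, a natural default is to fall back to it when no cloud STT key is present. A sketch — the preference order is an assumption, not documented behaviour:

```shell
# Hypothetical helper: choose an STT provider from the .env keys above,
# defaulting to the free Web Speech API when no cloud key is set.
pick_stt_provider() {
  if   [ -n "${DEEPGRAM_API_KEY:-}" ]; then echo deepgram
  elif [ -n "${GROQ_API_KEY:-}" ];    then echo groq-whisper
  else echo webspeech
  fi
}
```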

Architecture

How the pieces fit together. Three components, one seamless experience.

1. Browser UI

Voice capture (STT), audio playback (TTS), canvas rendering, desktop environment. Runs in any modern browser.

2. Flask Server

OpenVoiceUI application server. Handles file management, canvas pages, uploads, TTS routing, and API endpoints.

3. OpenClaw Gateway

LLM router and agent orchestrator. Manages sessions, tool execution, sub-agents, skills, and model switching.

Browser (Voice + Canvas) ↔ Flask Server (API + Files) ↔ OpenClaw (LLM + Tools) ↔ Any LLM Provider

Ready to Build?

OpenVoiceUI is free, open source, and MIT licensed. Start building your voice AI assistant today.