RBS Voice Cloner v1.0.0 is out today. It's a free, fully offline AI voice cloning tool for Windows, built on the open-source XTTS v2 engine. You feed it 5–30 seconds of someone's voice, and it can read any text you type back to you in that voice — in any of 28+ languages. The whole thing runs on your own PC. No account, no subscription, no cloud calls.
I'm Rai, the developer. I'm a solo dev based in Singapore, and I've been working on this on and off for about six months. The motivation was simple: ElevenLabs is excellent but expensive, the open-source models are excellent but a pain to set up, and there was a clear gap for a friendly Windows installer that wraps the open-source side.
What's actually in the box
The app is built on XTTS v2 from Coqui, which at the time of release was the most natural-sounding open-source text-to-speech model that anyone could run on consumer hardware. Here's what you get:
- Clone a voice from a 5–30 second audio sample. Longer samples (closer to 30 seconds, varied delivery) give noticeably better results than shorter ones.
- Generate speech in 28+ languages including English, Hindi, Spanish, French, German, Italian, Portuguese, Polish, Turkish, Russian, Dutch, Czech, Arabic, Mandarin Chinese, Japanese, Hungarian, Korean.
- Record directly from your microphone in-app, or upload any audio file (WAV, MP3, M4A).
- Built-in audio editor with waveform view, trim, noise reduction, fade in/out, and pitch control.
- Export as WAV (lossless) or MP3 (smaller file).
- GPU-accelerated on any NVIDIA CUDA card; falls back to CPU automatically if you don't have a discrete GPU.
- Save unlimited voice profiles — clone your voice once, reuse it forever.
What people are actually using it for
From the early-access feedback, the most common use cases:
- Audiobook drafts — authors generating a draft narration of their own book in their own voice, to hear how it reads before committing to a real recording.
- Indie game NPC voice lines — solo and small-team game devs who can't afford voice actors for hundreds of small lines.
- Language practice — language learners generating native-sounding pronunciation of unfamiliar words and phrases.
- Accessibility: turning long articles into audio to listen to on a commute, in a familiar voice.
- YouTube voiceovers — creators who don't love their own voice on camera, generating a cleaner version of themselves for narration.
- Localisation drafts — generating rough translated voice tracks for video so a translator can hear the timing before recording properly.
100% free, 100% offline
After the first launch (which downloads the ~2 GB XTTS v2 model from Hugging Face) the app runs entirely offline. No subscription, no cloud processing, no telemetry. Your voice samples and generated audio stay on your machine. You can pull the network cable after install and everything still works.
The download is about 1.4 GB and the installed size after the model downloads is around 4 GB. Compare that to V2, which is ~2 GB to download and ~5 GB installed. V1 is the lighter option if disk space matters.
System requirements
- Windows 10 or Windows 11 (64-bit). Windows 7 and 8 are not supported.
- 8 GB RAM minimum, 16 GB recommended for smoother editing.
- 4 GB+ free disk space (10 GB recommended once you're saving generated audio).
- Internet connection for the first-time model download only (~2 GB, one time).
- NVIDIA GPU with CUDA recommended for fast generation. CPU-only mode works but is roughly 5-10x slower.
- Administrator rights for installation. The app itself doesn't need admin rights to run after install.
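If you're not sure whether your PC meets the GPU recommendation, one rough check is to ask nvidia-smi, the command-line tool that normally ships with the NVIDIA driver. The Python sketch below is separate from RBS Voice Cloner. The function name visible_nvidia_gpus is made up for illustration, and it assumes nvidia-smi is on your PATH. An empty result only means the driver tools couldn't be found or queried, and a non-empty one doesn't guarantee the app will pick the GPU.

```python
import shutil
import subprocess


def visible_nvidia_gpus():
    """Return a list of 'name, driver_version' strings, or [] if none are visible."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return []  # driver tools not installed, or not on PATH
    try:
        result = subprocess.run(
            [exe, "--query-gpu=name,driver_version", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=15,
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []
    # One non-empty output line per GPU
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    gpus = visible_nvidia_gpus()
    print(gpus if gpus else "No NVIDIA GPU visible to the driver tools")
```

If the list comes back empty on a machine that does have an NVIDIA card, a driver update is the first thing to try.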
Tips for a better-sounding clone
The single biggest factor in clone quality isn't the model — it's the source sample. Five things that matter:
- Quiet room. Background noise hurts more than people expect.
- One speaker only. No music underneath. No overlapping voices.
- 20–30 seconds works better than 5 seconds, even though 5 is the minimum.
- Vary the delivery in the sample — questions, statements, an exclamation if natural — so the model captures your range.
- Speak normally. Don't read in monotone, don't perform either. The clone reproduces what's in the sample, so a deadpan sample produces deadpan output.
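A quick way to sanity-check a recording before loading it is to read its length and channel count. The sketch below uses only Python's standard library and handles uncompressed WAV files only (not MP3 or M4A). The function name check_sample and the warning wording are made up for illustration and are not part of the app. The 5, 20 and 30 second thresholds come from the tips above.

```python
import wave

MIN_SECONDS = 5.0    # minimum sample length mentioned above
GOOD_SECONDS = 20.0  # lower end of the range that tends to clone better
MAX_SECONDS = 30.0   # upper end of the supported range


def check_sample(path):
    """Return (duration_seconds, channels, warnings) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        channels = wav.getnchannels()

    duration = frames / float(rate)
    warnings = []
    if duration < MIN_SECONDS:
        warnings.append("shorter than 5 seconds")
    elif duration < GOOD_SECONDS:
        warnings.append("usable, but 20-30 seconds usually clones better")
    elif duration > MAX_SECONDS:
        warnings.append("longer than 30 seconds; consider trimming")
    return duration, channels, warnings
```

Calling check_sample("my_voice.wav") on your own file returns the duration in seconds, the channel count, and a list of warnings that is empty when the length falls in the 20–30 second range. It can't judge background noise or delivery, so it's only a first filter.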
Troubleshooting the common issues
"It's only using my CPU even though I have an NVIDIA card." Update your NVIDIA driver (any recent driver supporting CUDA 11.8+ should work). Restart the app. If it still defaults to CPU, the diagnose log in the app's Settings menu will tell you what's missing. V2 has a dedicated Diagnose page (Ctrl+D) that's much clearer if you keep hitting this.
"The cloned voice doesn't sound much like the source." Re-record with a longer, varied sample in a quieter room. Whispered or low-volume samples produce muddy output.
"First launch is downloading forever." The XTTS v2 model is ~2 GB and Hugging Face's download speed varies a lot by region. If your internet drops mid-download, delete %LOCALAPPDATA%\RBS Voice Cloner\models\ and relaunch — the app will re-download cleanly.
Use it responsibly
Voice cloning is genuinely powerful technology now, and the legal landscape changed quickly through 2025–2026. Only clone voices you have permission to clone: your own voice, public-domain recordings, or the voice of someone who has explicitly agreed in writing. Cloning a real person's voice without consent for misleading content is illegal in many places. There's a longer breakdown in the voice cloning legality guide if you want the specifics.
How to verify the download
The download page shows a SHA-256 hash. After you download the ZIP, open PowerShell and run Get-FileHash <path> -Algorithm SHA256. Compare the output to the hash on the page (letter case doesn't matter). If they match, you have an unmodified release. There's also a VirusTotal scan link on the page for an independent malware check.
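If you'd rather not compare 64 hex characters by eye, the same check can be scripted. This is a minimal Python sketch using the standard hashlib module. The function names are made up for illustration, and the published hash is whatever you copy from the download page.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash the file in chunks so a large download doesn't need to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()  # lowercase hex string


def matches_published_hash(path, published_hex):
    """Compare case-insensitively; PowerShell prints uppercase, hashlib lowercase."""
    return sha256_of_file(path) == published_hex.strip().lower()
```

Call matches_published_hash with the path to the ZIP and the hash pasted from the page. A result of False means the file is corrupted or not the published release, so download it again rather than running it.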
Download RBS Voice Cloner V1 — Free
Windows 10/11 · ~248 MB · No subscription · 100% offline after install