Cloning your own voice used to be a research project: install Python, install PyTorch with the right CUDA version, debug "DLL not found" errors, write Python code, and hope it works. In 2026 it's a 5-minute install on Windows, free, and fully offline after the first launch. This is the simplest possible walkthrough.
You'll end up with a saved profile of your own voice that can read any text in 17 languages.
What you need before you start
- Windows 10 or Windows 11 (64-bit).
- ~5 GB free disk space (for the app + the XTTS v2 model).
- A microphone (any laptop mic works) or a 15–30 second WAV, MP3, or M4A recording of your voice.
- Internet for the first launch only (the model downloads once, then everything runs offline).
- Optional but recommended: any NVIDIA RTX GPU. Generation runs 5–10× faster on CUDA. CPU-only works fine if you don't have one.
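If you happen to have Python installed (the app itself does not need it), the disk-space requirement can be checked with a few lines. This is a minimal sketch: the 5 GB figure comes from the list above, and the choice of your home folder as the location to test is an assumption, so point it at whichever drive you plan to install on.

```python
import os
import shutil

REQUIRED_GB = 5  # figure taken from the requirements list above


def free_gb(path: str) -> float:
    """Return the free space on the drive that contains `path`, in GiB."""
    return shutil.disk_usage(path).free / 1024**3


def has_enough_space(path: str, required_gb: float = REQUIRED_GB) -> bool:
    """True if the drive holding `path` has at least `required_gb` free."""
    return free_gb(path) >= required_gb


# Assumption: the app will be installed on the same drive as your home folder.
target = os.path.expanduser("~")
print(f"{free_gb(target):.1f} GiB free, enough: {has_enough_space(target)}")
```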
Step 1 — Download & install (1 minute)
Grab RBS Voice Cloner V2 from the website. It's a ~2 GB ZIP because it bundles PyTorch and the CUDA 12.8 runtime, so you don't have to install Python. Extract it, run the installer, and click through.
Verify the download is unmodified: the download page shows a SHA-256 hash. After downloading, open PowerShell and run Get-FileHash <path-to-zip> -Algorithm SHA256, then compare the hash it prints with the one on the page. They should match exactly, apart from letter case (PowerShell prints the hash in uppercase). If they differ, don't run the installer; download the file again. There's also a VirusTotal link on the page so you can see independent malware-scan results.
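The same SHA-256 check can be done without PowerShell. Here is a minimal Python sketch using only the standard library; the function names are made up for illustration, and the file path and published hash are values you would supply yourself.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash the file in 1 MiB chunks so a ~2 GB ZIP never sits in memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hashes_match(computed: str, published: str) -> bool:
    """Compare two hex digests, ignoring letter case and stray whitespace."""
    return computed.strip().lower() == published.strip().lower()


# Usage (both values are placeholders you fill in):
#   ok = hashes_match(sha256_of_file(r"<path-to-zip>"), "<hash from the download page>")
```

Ignoring case matters here because PowerShell prints uppercase hex while many download pages list the hash in lowercase.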
Step 2 — First launch & model download (2-3 minutes)
Open the app. The first time you launch it, it downloads the XTTS v2 model (~2 GB) from Hugging Face. This is a one-time download — after this, no internet is required.
While you wait, press Ctrl+D to open the Diagnose page. It shows whether the app sees your GPU and whether CUDA is working. If you have an NVIDIA RTX card and it shows "GPU Mode", you're set for fast generation. If it shows "CPU Mode", everything still works, but generation is slower: roughly 10–20 seconds per sentence instead of the 2–3 seconds an RTX 3060 manages.
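The Diagnose page is the place to check what the app itself sees. For a quick look from outside the app, the sketch below asks NVIDIA's nvidia-smi tool for the installed driver version and compares it with the 572.x minimum quoted in the Troubleshooting section. It assumes nvidia-smi is on your PATH (it normally ships with the driver), and the function names are hypothetical.

```python
import shutil
import subprocess
from typing import Optional

MIN_DRIVER_MAJOR = 572  # threshold quoted in the Troubleshooting section


def parse_driver_major(version: str) -> int:
    """'572.16' -> 572"""
    return int(version.strip().split(".")[0])


def driver_is_new_enough(version: str, minimum: int = MIN_DRIVER_MAJOR) -> bool:
    """Compare only the major number, matching the '572.x or newer' wording."""
    return parse_driver_major(version) >= minimum


def installed_driver_version() -> Optional[str]:
    """Ask nvidia-smi for the driver version; None if it is missing or fails."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    lines = result.stdout.strip().splitlines()
    if result.returncode != 0 or not lines:
        return None
    return lines[0].strip()  # first GPU is enough for this check


version = installed_driver_version()
if version is None:
    print("nvidia-smi not found or no NVIDIA GPU detected")
else:
    print(f"Driver {version}, new enough: {driver_is_new_enough(version)}")
```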
Step 3 — Record your voice sample (1 minute)
Go to the Clone Voice page in the left sidebar. You have two options:
- Record live — click "Record Mic" and speak naturally for 25–30 seconds. Read a paragraph from a book or news article. Don't whisper. Don't shout. Aim for a normal speaking voice.
- Upload a file — click "Upload File" and pick a WAV, MP3, or M4A file of your voice. 15–30 seconds is the sweet spot.
Tips for a better clone:
- Quiet room. Background noise hurts the clone quality.
- One speaker only. No music, no overlapping voices.
- Natural delivery. Don't read in monotone, but don't act either.
- Vary the sentences a little — questions, statements, a long word or two — so the model captures your range.
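If you are uploading a file rather than recording live, you can confirm its length before handing it to the app. The sketch below uses Python's built-in wave module, so it only reads uncompressed PCM WAV files (not MP3 or M4A). The 15–30 second range is the sweet spot mentioned above; treating anything outside it as a warning is an assumption of this sketch, not a rule the app enforces.

```python
import wave

MIN_SECONDS, MAX_SECONDS = 15.0, 30.0  # sweet spot quoted above


def wav_duration_seconds(path: str) -> float:
    """Length of a PCM WAV file in seconds (frames divided by sample rate)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_sample(path: str) -> list[str]:
    """Return a list of warnings; an empty list means the length looks fine."""
    warnings = []
    seconds = wav_duration_seconds(path)
    if seconds < MIN_SECONDS:
        warnings.append(f"Sample is {seconds:.1f} s; aim for at least {MIN_SECONDS:.0f} s.")
    elif seconds > MAX_SECONDS:
        warnings.append(f"Sample is {seconds:.1f} s; consider trimming to {MAX_SECONDS:.0f} s.")
    return warnings


# Usage (the path is a placeholder):
#   for message in check_sample(r"<path-to-your-sample.wav>"):
#       print(message)
```

This only measures length. Background noise, overlapping voices, and flat delivery still need a listen.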
Step 4 — Save the voice profile (10 seconds)
Give the profile a name (e.g. "My voice"), pick the primary language (e.g. English), and click Save Voice Profile. The clone is computed locally on your PC. The profile is now in your "My voices" list and can be reused as many times as you want.
Step 5 — Generate speech (30 seconds)
Switch to the Text to Speech page in the sidebar. Pick your saved voice from the dropdown. Type or paste any text. Click Generate.
On an RTX 3060 you'll get audio in 2–3 seconds per sentence. On CPU-only, expect 10–20 seconds per sentence. The result appears in the audio player below — you can play it, edit it in the built-in editor, apply EQ, or save it as WAV/MP3.
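To get a feel for how long a longer script will take, you can count sentences and multiply by the per-sentence figures above. The sketch below does that and also groups sentences into shorter chunks you can paste in one at a time. The sentence splitter is deliberately naive (it will break on abbreviations like "e.g."), the 250-character default is an arbitrary choice rather than an app limit, and all function names are hypothetical.

```python
import re

# Seconds per sentence, taken from the figures quoted above.
SECONDS_PER_SENTENCE = {"rtx3060": (2, 3), "cpu": (10, 20)}


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def split_into_chunks(text: str, max_chars: int = 250) -> list[str]:
    """Group whole sentences into chunks of at most max_chars where possible.

    A single sentence longer than max_chars is kept whole in its own chunk.
    """
    chunks, current = [], ""
    for sentence in split_sentences(text):
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def estimate_seconds(text: str, device: str) -> tuple[int, int]:
    """Return a (low, high) estimate of generation time in seconds."""
    low, high = SECONDS_PER_SENTENCE[device]
    n = len(split_sentences(text))
    return n * low, n * high


script = "Hello there. How are you today? I am reading this in my own voice!"
print(split_into_chunks(script, max_chars=40))
print(estimate_seconds(script, "cpu"))      # (30, 60)
print(estimate_seconds(script, "rtx3060"))  # (6, 9)
```

By the same arithmetic, a 40-sentence script works out to roughly 80–120 seconds on an RTX 3060 and 400–800 seconds on CPU.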
Optional — Polish the output with the built-in EQ
Generated voice sometimes sounds slightly "thin" or a bit harsh. Open the Audio Editor and switch to the Equaliser tab. Pick a preset:
- Natural — flat, default. Try first.
- Warm — boosts low-mids, softens harshness. Good for narration.
- Bright — boosts presence. Good for thin voices.
- Podcast — slight low-cut + presence boost. Good for podcasts/voiceovers.
- Phone — narrows to mid-range, sounds like an old phone call.
Full breakdown of each band is in the 7-band EQ guide.
Troubleshooting
"It says CPU mode but I have an RTX card." Update your NVIDIA driver to a current version (572.x or newer for CUDA 12.8 / 13.x). Restart the app. If it still shows CPU mode, open the Diagnose page (Ctrl+D) and check the per-package report — it'll tell you exactly what's missing.
"The cloned voice doesn't sound like me." Re-record with less background noise and a more varied 25–30 second sample. Whispered or low-volume samples produce muddy clones.
"Generation is very slow." Open the Diagnose page (Ctrl+D) and check whether the app is in GPU mode. If you're on CPU, 10–20 seconds per sentence is normal. Either switch to a machine with an NVIDIA RTX GPU or generate shorter chunks of text at a time.
"The app won't start." The first launch downloads ~2 GB. If your internet dropped mid-download, the model files will be incomplete. Delete %LOCALAPPDATA%\RBS Voice Cloner V2\models\ and relaunch — the app will re-download cleanly.
Important: only clone voices you have permission to clone
Voice cloning is powerful. Free voice cloning is more powerful still. Your own voice is fair game. Public-domain recordings are usually fine. Cloning a real person's voice without their consent for misleading content is illegal in many jurisdictions and unethical everywhere. The app doesn't enforce this for you — you're responsible for what you make.
Get RBS Voice Cloner V2 — Free
~2.0 GB · Windows 10/11 · 100% offline · Unlimited generation, no subscription.