A friend mentioned they wanted an RTX 5090 to better run PersonaPlex. I had no idea what that was, so I looked it up — "NVIDIA PersonaPlex: Natural Conversational AI With Any Role and Voice."
Sounds pretty cool, the audio samples were compelling, and I just happened to have a 5090 sitting around looking for a real use case. Up until now, the most exciting thing I'd done with it was spin up gpt-oss:20B to write a scary story about my cats. Anyway, here's how I got PersonaPlex running locally.
(Used Claude to summarize my notes below)
What is PersonaPlex?
PersonaPlex is NVIDIA's new full-duplex conversational AI that lets you customize both voice and persona while maintaining natural conversation dynamics — interruptions, backchanneling, and realistic turn-taking. It's built on Kyutai's Moshi architecture and represents a breakthrough: previous systems forced you to choose between natural conversation and customizable voice/personality. PersonaPlex delivers both.
How does PersonaPlex differ from Moshi?
Moshi (by Kyutai) is the base model — a 7B parameter speech-text foundation model that introduced full-duplex conversation. It can listen and speak simultaneously, handle interruptions, and produce natural backchanneling ("uh-huh", "yeah"). However, Moshi has a fixed voice and personality.
PersonaPlex (by NVIDIA) is a fine-tuned version of Moshi that adds:
- Voice prompting — provide an audio sample to adopt vocal characteristics
- Text prompting — define persona/role via natural language
You get natural conversation dynamics + customizable character.
Prerequisites
- Windows 10/11 with WSL2
- NVIDIA GPU
- CUDA drivers installed (check with nvidia-smi)
- A Hugging Face account
Why WSL2?
I recommend WSL2 for everyone, not just RTX 50 series users:
- PyTorch nightly builds have better Linux support
- Triton (required for model compilation) works more reliably on Linux
- Avoids Windows-specific dependency issues with Rust, sentencepiece, etc.
Step 1: Install WSL2
From PowerShell:
wsl --install -d Ubuntu-22.04
It should ask for your username and password and drop you into the shell.
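If you want to confirm the distro is actually running under WSL 2 (not WSL 1), you can check from PowerShell:
wsl -l -v
The VERSION column should show 2 for Ubuntu-22.04; if it shows 1, convert it with wsl --set-version Ubuntu-22.04 2.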
Step 2: Install Python 3.12
Ubuntu 22.04 doesn't have Python 3.12 in default repos, so add the deadsnakes PPA:
sudo apt update
sudo apt install software-properties-common -y
sudo add-apt-repository ppa:deadsnakes/ppa -y
sudo apt update
sudo apt install python3.12 python3.12-venv python3.12-dev git -y
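Quick check that the interpreter landed where expected:
python3.12 --version
You should see Python 3.12.x.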
Step 3: Install Build Tools
Triton needs a C compiler to build kernels:
sudo apt install build-essential -y
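You can confirm the compiler is available before moving on:
gcc --version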
Step 4: Clone and Set Up Environment
git clone https://github.com/NVIDIA/personaplex
cd personaplex
python3.12 -m venv venv
source venv/bin/activate
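With the venv activated, python should now resolve inside the project directory:
which python      # should point at .../personaplex/venv/bin/python
python --version  # should report 3.12.x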
Step 5: Install PyTorch Nightly
The stable PyTorch releases don't support RTX 50 series (sm_120) yet. Use the nightly with CUDA 13.0:
pip install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/cu130
For RTX 40 series or older, you can alternatively use:
pip install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/cu128
Verify your GPU architecture is supported:
python -c "import torch; print(torch.cuda.get_arch_list())"
For RTX 50 series, you should see sm_120 in the list.
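As an extra sanity check that the nightly build actually talks to the GPU (not just lists the architecture), you can run:
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"
It should print True followed by your GPU's name.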
Step 6: Install Triton and PersonaPlex
pip install triton
pip install moshi/
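A quick way to confirm both packages import cleanly from the venv before launching anything:
python -c "import triton; print(triton.__version__)"
python -c "import moshi; print('moshi OK')"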
Step 7: Hugging Face Authentication
The model weights are gated. You need to:
- Create an account at huggingface.co
- Accept the model license at nvidia/personaplex-7b-v1
- Create an access token at huggingface.co/settings/tokens
Then set your token:
export HF_TOKEN="your_token_here"
Or login via CLI:
pip install huggingface_hub
huggingface-cli login
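If you go the HF_TOKEN route, you may want to persist it so new shells pick it up, and verify the token is valid:
# persist for future shells (optional)
echo 'export HF_TOKEN="your_token_here"' >> ~/.bashrc
# confirm the token is recognized
huggingface-cli whoami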
Step 8: Launch the Server
python -m moshi.server --ssl /tmp/personaplex_ssl
First run downloads ~17GB of model weights. Once loaded, you'll see:
Access the Web UI directly at https://localhost:8998
Open that URL in your Windows browser. You'll get a certificate warning (self-signed cert) — click through it.
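If the page won't load, it's worth confirming the server is actually listening before blaming the browser (the -k skips verification of the self-signed cert):
curl -k https://localhost:8998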
Exposing to the Internet
Want to share access or use from another device? Use Cloudflare Tunnel (free, no signup for quick tunnels):
# Download cloudflared
curl -sSL -o cloudflared https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64
chmod +x cloudflared
# Create tunnel (--no-tls-verify because of self-signed cert)
./cloudflared tunnel --url https://localhost:8998 --no-tls-verify
You'll get a public URL like https://something.trycloudflare.com.
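If you want the tunnel to keep running after you close the terminal, one option is to background it and pull the URL out of the log (give it a few seconds to register first):
nohup ./cloudflared tunnel --url https://localhost:8998 --no-tls-verify > cloudflared.log 2>&1 &
grep -o 'https://[a-z-]*\.trycloudflare\.com' cloudflared.log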
Using PersonaPlex
Voice Selection
PersonaPlex includes pre-packaged voice embeddings:
- Natural voices: NATF0-3 (female), NATM0-3 (male) — more conversational
- Variety voices: VARF0-4 (female), VARM0-4 (male) — more diverse range
Persona Prompting
Assistant mode:
You are a wise and friendly teacher. Answer questions or provide advice in a clear and engaging way.
Customer service:
You work for Jerusalem Shakshuka which is a restaurant and your name is Owen Foster. Information: There are two shakshuka options: Classic (poached eggs, $9.50) and Spicy (scrambled eggs with jalapenos, $10.25).
Casual conversation:
You enjoy having a good conversation. Have a reflective conversation about career changes and feeling of home.
Creative scenarios:
You enjoy having a good conversation. Have a technical discussion about fixing a reactor core on a spaceship to Mars. You are an astronaut on a Mars mission. Your name is Alex.
Offline Testing
To test without a live microphone:
python -m moshi.offline \
--voice-prompt "NATM1.pt" \
--text-prompt "You are a wise and friendly teacher." \
--input-wav "assets/test/input_assistant.wav" \
--output-wav "output.wav"
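If you want to compare voices, a simple loop over the same input works; this just reuses the flags above and assumes the voice-prompt files follow the naming from the Voice Selection list:
# batch comparison across a few voices
for v in NATF0 NATM1 VARF2; do
  python -m moshi.offline \
    --voice-prompt "${v}.pt" \
    --text-prompt "You are a wise and friendly teacher." \
    --input-wav "assets/test/input_assistant.wav" \
    --output-wav "output_${v}.wav"
done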
Troubleshooting
"Torch not compiled with CUDA enabled"
You installed the CPU-only PyTorch. Reinstall with CUDA nightly:
pip uninstall torch -y
pip install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/cu130
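To confirm you actually got a CUDA build this time:
python -c "import torch; print(torch.__version__, torch.version.cuda)"
The version string should carry a +cu suffix (e.g. +cu130) and torch.version.cuda should not be None.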
"no kernel image is available for execution on the device" (sm_120)
Your PyTorch doesn't support your GPU architecture. For RTX 50 series, you need PyTorch nightly with cu130:
pip uninstall torch -y
pip install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/cu130
python -c "import torch; print(torch.cuda.get_arch_list())" # verify sm_120 is listed
"Failed to find C compiler" (Triton error)
Install build tools:
sudo apt install build-essential -y
"TritonMissing" error
Install Triton:
pip install triton
Cloudflare tunnel shows 502 Bad Gateway
The tunnel can't verify the self-signed SSL cert. Add --no-tls-verify:
./cloudflared tunnel --url https://localhost:8998 --no-tls-verify
Python 3.12 not found in apt
Add the deadsnakes PPA:
sudo add-apt-repository ppa:deadsnakes/ppa -y
sudo apt update
sudo apt install python3.12 python3.12-venv python3.12-dev -y
What About Mac?
The base Moshi model has MLX support and runs on Apple Silicon:
pip install moshi_mlx
python -m moshi_mlx.local_web -q 4
However, PersonaPlex's persona/voice conditioning features haven't been ported to MLX yet. There's an open GitHub issue tracking Mac support.