How to Run NVIDIA PersonaPlex with an RTX 5090

A friend mentioned they wanted an RTX 5090 to better run PersonaPlex. I had no idea what that was, so I looked it up — "NVIDIA PersonaPlex: Natural Conversational AI With Any Role and Voice."

Sounds pretty cool: the audio samples were compelling, and I just happened to have a 5090 sitting around looking for a real use case. Up until now, the most exciting thing I'd done with it was spin up gpt-oss:20B to write a scary story about my cats. Anyway, here's how I got PersonaPlex running locally.

(Used Claude to summarize my notes below)

What is PersonaPlex?

PersonaPlex is NVIDIA's new full-duplex conversational AI that lets you customize both voice and persona while maintaining natural conversation dynamics — interruptions, backchanneling, and realistic turn-taking. It's built on Kyutai's Moshi architecture and represents a breakthrough: previous systems forced you to choose between natural conversation or customizable voice/personality. PersonaPlex delivers both.

What is Moshi vs PersonaPlex?

Moshi (by Kyutai) is the base model — a 7B parameter speech-text foundation model that introduced full-duplex conversation. It can listen and speak simultaneously, handle interruptions, and produce natural backchanneling ("uh-huh", "yeah"). However, Moshi has a fixed voice and personality.

PersonaPlex (by NVIDIA) is a fine-tuned version of Moshi that adds:

  • Voice prompting — provide an audio sample to adopt vocal characteristics
  • Text prompting — define persona/role via natural language

You get natural conversation dynamics + customizable character.

Prerequisites

  • Windows 10/11 with WSL2
  • NVIDIA GPU
  • CUDA drivers installed (check with nvidia-smi)
  • A Hugging Face account

Why WSL2?

I recommend WSL2 for everyone, not just RTX 50 series users:

  • PyTorch nightly builds have better Linux support
  • Triton (required for model compilation) works more reliably on Linux
  • Avoids Windows-specific dependency issues with Rust, sentencepiece, etc.

Step 1: Install WSL2

From PowerShell:

wsl --install -d Ubuntu-22.04

It should ask you for a username and password, then drop you into the shell.

Step 2: Install Python 3.12

Ubuntu 22.04 doesn't have Python 3.12 in default repos, so add the deadsnakes PPA:

sudo apt update
sudo apt install software-properties-common -y
sudo add-apt-repository ppa:deadsnakes/ppa -y
sudo apt update
sudo apt install python3.12 python3.12-venv python3.12-dev git -y

Step 3: Install Build Tools

Triton needs a C compiler to build kernels:

sudo apt install build-essential -y

Step 4: Clone and Set Up Environment

git clone https://github.com/NVIDIA/personaplex
cd personaplex
python3.12 -m venv venv
source venv/bin/activate

Step 5: Install PyTorch Nightly

The stable PyTorch releases don't support RTX 50 series (sm_120) yet. Use the nightly with CUDA 13.0:

pip install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/cu130

For RTX 40 series or older, you can alternatively use:

pip install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/cu128

Verify your GPU architecture is supported:

python -c "import torch; print(torch.cuda.get_arch_list())"

For RTX 50 series, you should see sm_120 in the list.
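The sm_* tag is just the GPU's compute capability with the dot dropped (Blackwell/RTX 50 is 12.0, so sm_120; Ada/RTX 40 is 8.9, so sm_89). A tiny helper, assuming only that naming convention, to figure out which tag to look for:

```python
# Map a CUDA compute capability tuple to the sm_* tag PyTorch lists.
# Convention: major and minor digits concatenated, e.g. (12, 0) -> "sm_120".
def sm_tag(capability: tuple[int, int]) -> str:
    major, minor = capability
    return f"sm_{major}{minor}"

# With torch installed you could pass torch.cuda.get_device_capability();
# here we just show the two cases from this guide.
print(sm_tag((12, 0)))  # -> sm_120 (RTX 50 series, Blackwell)
print(sm_tag((8, 9)))   # -> sm_89  (RTX 40 series, Ada)
```

If the tag this prints isn't in `torch.cuda.get_arch_list()`, that PyTorch build can't run kernels on your card.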

Step 6: Install Triton and PersonaPlex

pip install triton
pip install moshi/

Step 7: Hugging Face Authentication

The model weights are gated. You need to:

  1. Create an account at huggingface.co
  2. Accept the model license at nvidia/personaplex-7b-v1
  3. Create an access token at huggingface.co/settings/tokens

Then set your token:

export HF_TOKEN="your_token_here"

Or login via CLI:

pip install huggingface_hub
huggingface-cli login
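Before launching the server, a quick sanity check that one of the two credential sources actually exists (my own snippet, stdlib only — huggingface_hub reads HF_TOKEN from the environment, while `huggingface-cli login` writes a token file instead):

```python
import os

def hf_auth_status(env: dict, token_file_exists: bool) -> str:
    """Report which Hugging Face credential source (if any) is available."""
    if env.get("HF_TOKEN"):
        return "env"        # token exported in the shell
    if token_file_exists:
        return "file"       # token saved by huggingface-cli login
    return "missing"        # gated model download will fail

status = hf_auth_status(
    dict(os.environ),
    os.path.exists(os.path.expanduser("~/.cache/huggingface/token")),
)
print(f"Hugging Face auth source: {status}")
```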

Step 8: Launch the Server

python -m moshi.server --ssl /tmp/personaplex_ssl

First run downloads ~17GB of model weights. Once loaded, you'll see:

Access the Web UI directly at https://localhost:8998

Open that URL in your Windows browser. You'll get a certificate warning (self-signed cert) — click through it.
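Since that first launch pulls roughly 17 GB into the Hugging Face cache under your home directory, it's worth checking free space inside WSL beforehand (plain stdlib; the 17 GB figure is just the download size mentioned above):

```python
import shutil

NEEDED_GB = 17  # approximate size of the model weights download

# Weights land in ~/.cache/huggingface by default, so check the root fs.
free_gb = shutil.disk_usage("/").free / 1e9
print(f"Free space: {free_gb:.0f} GB")
if free_gb < NEEDED_GB:
    print("Not enough room for the model download")
```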

Exposing to the Internet

Want to share access or use from another device? Use Cloudflare Tunnel (free, no signup for quick tunnels):

# Download cloudflared
curl -sSL -o cloudflared https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64
chmod +x cloudflared

# Create tunnel (--no-tls-verify because of self-signed cert)
./cloudflared tunnel --url https://localhost:8998 --no-tls-verify

You'll get a public URL like https://something.trycloudflare.com.

Using PersonaPlex

Voice Selection

PersonaPlex includes pre-packaged voice embeddings:

  • Natural voices: NATF0-3 (female), NATM0-3 (male) — more conversational
  • Variety voices: VARF0-4 (female), VARM0-4 (male) — more diverse range
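If you end up scripting against the server, the full set of packaged voice names is easy to enumerate (my own helper, not part of the PersonaPlex API):

```python
# Enumerate the pre-packaged voice embedding names listed above.
# NAT* voices are numbered 0-3, VAR* voices 0-4.
def voice_names() -> list[str]:
    names = []
    for prefix, count in [("NATF", 4), ("NATM", 4), ("VARF", 5), ("VARM", 5)]:
        names += [f"{prefix}{i}" for i in range(count)]
    return names

voices = voice_names()
print(len(voices))  # -> 18
```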

Persona Prompting

Assistant mode:

You are a wise and friendly teacher. Answer questions or provide advice in a clear and engaging way.

Customer service:

You work for Jerusalem Shakshuka which is a restaurant and your name is Owen Foster. Information: There are two shakshuka options: Classic (poached eggs, $9.50) and Spicy (scrambled eggs with jalapenos, $10.25).

Casual conversation:

You enjoy having a good conversation. Have a reflective conversation about career changes and feeling of home.

Creative scenarios:

You enjoy having a good conversation. Have a technical discussion about fixing a reactor core on a spaceship to Mars. You are an astronaut on a Mars mission. Your name is Alex.

Offline Testing

To test without a live microphone:

python -m moshi.offline \
  --voice-prompt "NATM1.pt" \
  --text-prompt "You are a wise and friendly teacher." \
  --input-wav "assets/test/input_assistant.wav" \
  --output-wav "output.wav"
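To compare several voices offline, you can generate one moshi.offline invocation per voice instead of typing them out. This sketch only builds paste-ready command lines (using the same paths as the example above) — it doesn't run anything:

```python
import shlex

# Build one `python -m moshi.offline` invocation per voice embedding.
def offline_commands(voices, text_prompt, input_wav):
    cmds = []
    for voice in voices:
        cmds.append([
            "python", "-m", "moshi.offline",
            "--voice-prompt", f"{voice}.pt",
            "--text-prompt", text_prompt,
            "--input-wav", input_wav,
            "--output-wav", f"output_{voice}.wav",
        ])
    return cmds

for cmd in offline_commands(["NATM1", "NATF0"],
                            "You are a wise and friendly teacher.",
                            "assets/test/input_assistant.wav"):
    print(shlex.join(cmd))  # one shell-quoted line per voice
```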

Troubleshooting

"Torch not compiled with CUDA enabled"

You installed the CPU-only PyTorch. Reinstall with CUDA nightly:

pip uninstall torch -y
pip install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/cu130

"no kernel image is available for execution on the device" (sm_120)

Your PyTorch doesn't support your GPU architecture. For RTX 50 series, you need PyTorch nightly with cu130:

pip uninstall torch -y
pip install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/cu130
python -c "import torch; print(torch.cuda.get_arch_list())"  # verify sm_120 is listed

"Failed to find C compiler" (Triton error)

Install build tools:

sudo apt install build-essential -y

"TritonMissing" error

Install Triton:

pip install triton

Cloudflare tunnel shows 502 Bad Gateway

The tunnel can't verify the self-signed SSL cert. Add --no-tls-verify:

./cloudflared tunnel --url https://localhost:8998 --no-tls-verify

Python 3.12 not found in apt

Add the deadsnakes PPA:

sudo add-apt-repository ppa:deadsnakes/ppa -y
sudo apt update
sudo apt install python3.12 python3.12-venv python3.12-dev -y

What About Mac?

The base Moshi model has MLX support and runs on Apple Silicon:

pip install moshi_mlx
python -m moshi_mlx.local_web -q 4

However, PersonaPlex's persona/voice conditioning features haven't been ported to MLX yet. There's an open GitHub issue tracking Mac support.