Local AI voice tools for Claude Code.

Three Claude Code skills that chain together: transcribe a meeting, analyze it with full project context, then generate an audio briefing. Everything runs locally — no cloud APIs, no API keys, no recurring costs.

/voice-to-text → transcript → /strategic-analysis → analysis + voice text → /text-to-voice → briefing.mp3

/voice-to-text

Transcribe any audio or video file with automatic speaker identification. Handles meetings, interviews, podcasts, voice memos — any recording with one or more speakers.

  • Speaker diarization (who said what)
  • Supports m4a, mp3, wav, mp4, mkv, and more
  • Replace generic labels with real names
  • Output as txt, srt, vtt, or json
  • A 30-minute file transcribes in roughly 3 to 5 minutes
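
To make the speaker labels and the srt output concrete, here is a minimal Python sketch of the idea. It is not the skill's actual code: the `Segment` type, the `SPEAKER_00` label style, and the `names` mapping are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One diarized chunk of speech (hypothetical shape, for illustration)."""
    start: float   # seconds from the start of the recording
    end: float     # seconds from the start of the recording
    speaker: str   # generic label, e.g. "SPEAKER_00"
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments, names=None):
    """Render segments as SRT text, swapping generic labels for real names."""
    names = names or {}
    blocks = []
    for index, seg in enumerate(segments, start=1):
        speaker = names.get(seg.speaker, seg.speaker)  # fall back to the label
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{speaker}: {seg.text}\n"
        )
    return "\n".join(blocks)
```

Calling `to_srt(segments, {"SPEAKER_00": "Alice"})` keeps any label that has no entry in the mapping, so a partly named transcript still renders.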

/strategic-analysis

The bridge between transcription and audio. Reads your transcript, gathers full project context (CLAUDE.md, docs, git history), and produces a strategic analysis with a TTS-ready voice version.

  • 4 modes: sales, consulting, competitive, debrief
  • Project-aware (reads CLAUDE.md, docs, git log)
  • Outputs analysis.md + analysis-voice.txt
  • Voice text written for natural spoken delivery
  • Works standalone or chained with the other skills
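
If you are curious what "project-aware" could look like in practice, the sketch below collects the same three sources in Python. It is an illustration under assumptions, not the skill's implementation: the `gather_context` name, the `docs/` folder layout, and the 20-commit limit are all made up for this example.

```python
import subprocess
from pathlib import Path


def gather_context(root, max_commits=20):
    """Collect CLAUDE.md, markdown docs, and recent git history under root."""
    root = Path(root)
    context = {}

    claude_md = root / "CLAUDE.md"
    if claude_md.is_file():
        context["CLAUDE.md"] = claude_md.read_text(encoding="utf-8")

    docs_dir = root / "docs"  # assumed location of project docs
    if docs_dir.is_dir():
        for doc in sorted(docs_dir.rglob("*.md")):
            key = doc.relative_to(root).as_posix()
            context[key] = doc.read_text(encoding="utf-8")

    try:
        result = subprocess.run(
            ["git", "-C", str(root), "log", "--oneline", "-n", str(max_commits)],
            capture_output=True, text=True, check=True,
        )
        context["git log"] = result.stdout
    except (OSError, subprocess.CalledProcessError):
        pass  # git is not installed, or root is not a git repository

    return context
```

Missing sources are simply skipped, which is one way to let the same routine work both inside a full project and on a standalone document.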

/text-to-voice

Convert any text or markdown file to an MP3. The skill handles markdown cleanup, acronym expansion, voice selection, and MP3 encoding automatically.

  • 50 voices (American & British English)
  • Generates audio in seconds (Kokoro engine)
  • Auto-converts markdown to speech-friendly text
  • Outputs MP3 via ffmpeg
  • Optional: Orpheus (emotion tags) or Coqui (voice cloning)
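
Markdown cleanup and acronym expansion are easier to picture with a small example. The Python sketch below shows one rough way to do it with regular expressions. The acronym table is hypothetical (the skill's real list is not shown here), and the rules cover only common markdown, not every edge case.

```python
import re

# Hypothetical acronym table, for illustration only.
ACRONYMS = {"API": "A P I", "TTS": "text to speech", "CLI": "command line interface"}


def markdown_to_speech(markdown, acronyms=ACRONYMS):
    """Strip common markdown syntax and expand acronyms for spoken delivery."""
    text = re.sub(r"```.*?```", "", markdown, flags=re.DOTALL)    # drop fenced code
    text = re.sub(r"`([^`]*)`", r"\1", text)                       # unwrap inline code
    text = re.sub(r"!\[[^\]]*\]\([^)]*\)", "", text)               # drop images
    text = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", text)           # keep link text only
    text = re.sub(r"^[ ]{0,3}#{1,6}[ \t]*", "", text, flags=re.MULTILINE)           # heading marks
    text = re.sub(r"^[ \t]*(?:[-*+•]|\d+\.)[ \t]+", "", text, flags=re.MULTILINE)   # list markers
    text = re.sub(r"(\*\*|\*)(.+?)\1", r"\2", text)                # bold and italics
    for short, spoken in acronyms.items():
        text = re.sub(rf"\b{re.escape(short)}\b", spoken, text)    # whole words only
    text = re.sub(r"\n{3,}", "\n\n", text)                         # collapse blank runs
    return text.strip()
```

Fenced code is dropped rather than read aloud, since code rarely makes sense as speech; that is a design choice of this sketch, and the skill may behave differently.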
# Each skill works standalone too
> /text-to-voice any-text-file.txt
> /voice-to-text any-recording.m4a
> /strategic-analysis any-document.md
01

Install the skills

Clone the repo and copy the skill folders into your Claude Code config directory. That's all the "installation" there is for the skills themselves.

git clone https://github.com/pengasuzie/ai-voice-kit.git
cp -r ai-voice-kit/skills/text-to-voice ~/.claude/skills/
cp -r ai-voice-kit/skills/voice-to-text ~/.claude/skills/
cp -r ai-voice-kit/skills/strategic-analysis ~/.claude/skills/
02

Install Kokoro TTS

Kokoro is the default engine. Install the CLI with pipx, then download the model files (~350 MB). This is a one-time setup.

pipx install kokoro-tts
mkdir -p ~/.local/share/kokoro
curl -L -o ~/.local/share/kokoro/kokoro-v1.0.onnx \
  https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/kokoro-v1.0.onnx
curl -L -o ~/.local/share/kokoro/voices-v1.0.bin \
  https://github.com/thewh1teagle/kokoro-onnx/releases/download/model-files-v1.0/voices-v1.0.bin
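
A quick way to confirm the download worked is to check that both files exist and are not empty. The helper below is a small Python sketch for that; the `missing_model_files` name is made up, and the check looks only at presence and size, not at checksums.

```python
from pathlib import Path

# File names taken from the download commands in this step.
MODEL_FILES = ("kokoro-v1.0.onnx", "voices-v1.0.bin")


def missing_model_files(model_dir=None):
    """Return the model files that are absent or empty in model_dir."""
    if model_dir is None:
        model_dir = Path.home() / ".local" / "share" / "kokoro"
    model_dir = Path(model_dir)
    missing = []
    for name in MODEL_FILES:
        path = model_dir / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing
```

An empty list means both files are in place. A zero-byte file counts as missing because an interrupted download can leave one behind.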
03

Install ffmpeg

Required for converting WAV output to MP3. On macOS or Ubuntu/Debian it is a single command.

# macOS
brew install ffmpeg

# Ubuntu / Debian
sudo apt install ffmpeg
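
For reference, the WAV to MP3 step comes down to one ffmpeg call. The Python sketch below builds and runs a typical one. The encoder settings (`libmp3lame` at VBR quality 2) and both function names are assumptions for illustration; the skill may use different options.

```python
import shutil
import subprocess
from pathlib import Path


def mp3_command(wav_path, mp3_path=None, quality=2):
    """Build an ffmpeg argument list that encodes a WAV file as MP3."""
    wav_path = Path(wav_path)
    mp3_path = Path(mp3_path) if mp3_path else wav_path.with_suffix(".mp3")
    return [
        "ffmpeg", "-y",             # overwrite the output if it exists
        "-i", str(wav_path),
        "-codec:a", "libmp3lame",   # LAME MP3 encoder
        "-q:a", str(quality),       # VBR quality: lower number, higher quality
        str(mp3_path),
    ]


def wav_to_mp3(wav_path, mp3_path=None):
    """Run the conversion and return the path of the MP3 file."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH; install it first")
    cmd = mp3_command(wav_path, mp3_path)
    subprocess.run(cmd, check=True)
    return Path(cmd[-1])
```

Keeping the command builder separate from the runner makes it easy to print the command and inspect it before anything is executed.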
04

Use it

Open Claude Code and type the slash command. That's it. The skill handles engine selection, voice picking, format conversion, and output reporting.

> /text-to-voice meeting-notes.md

Reading meeting-notes.md... 847 words, ~4 min
Converting markdown to speech text...
Generating with Kokoro (voice: bf_lily)...

Output:
├── meeting-notes.mp3 (4m 12s, 3.8 MB)
└── Engine: Kokoro, Voice: bf_lily
bf_lily       British female (default)
af_heart      American female
am_adam       American male
bm_george     British male
af_bella      American female
af_sarah      American female
am_michael    American male
bf_emma       British female
af_nova       American female
am_eric       American male
bm_daniel     British male
bf_isabella   British female

Run kokoro-tts --help-voices for the full list of 50 voices.
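
The voice IDs listed above follow a visible pattern: the first letter is the accent (a for American, b for British) and the second is the gender (f or m). The Python sketch below decodes an ID on that basis. It covers only the two accents shown on this page, and `describe_voice` is a made-up helper, not part of kokoro-tts.

```python
ACCENTS = {"a": "American", "b": "British"}
GENDERS = {"f": "female", "m": "male"}


def describe_voice(voice_id):
    """Turn an ID like "bf_lily" into a label like "British female"."""
    prefix, sep, name = voice_id.partition("_")
    valid = (
        sep == "_"
        and name != ""
        and len(prefix) == 2
        and prefix[0] in ACCENTS
        and prefix[1] in GENDERS
    )
    if not valid:
        raise ValueError(f"unrecognized voice id: {voice_id!r}")
    return f"{ACCENTS[prefix[0]]} {GENDERS[prefix[1]]}"
```

For example, `describe_voice("am_adam")` gives "American male", which matches the list above.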

Meeting to audio briefing

Record a client call on your phone. Transcribe it with speaker labels. Run a strategic analysis that pulls in your full project context. Then generate an audio briefing you can listen to on the drive home.

The strategic analysis skill reads your CLAUDE.md, docs, competitor files, and git history — so the output is grounded in everything you know, not just the transcript.

# Step 1: Transcribe
> /voice-to-text client-call.m4a
2 speakers detected. 4,231 words.

# Step 2: Analyze
> /strategic-analysis client-call.txt
Gathering project context...
├── analysis.md
└── analysis-voice.txt

# Step 3: Generate audio
> /text-to-voice analysis-voice.txt
├── analysis-voice.mp3 (6m 18s)
└── Engine: Kokoro, Voice: bf_lily
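
The hand-off between the three steps is just file names: each skill's output is the next skill's input. The Python sketch below spells out that chain for any recording. It assumes all outputs land next to the recording and keep the names shown in the sample session; `plan_pipeline` is a hypothetical helper, not something the kit ships.

```python
from pathlib import Path


def plan_pipeline(recording):
    """List each slash command with the files it is expected to produce."""
    recording = Path(recording)
    transcript = recording.with_suffix(".txt")
    analysis = recording.with_name("analysis.md")
    voice_text = recording.with_name("analysis-voice.txt")
    briefing = voice_text.with_suffix(".mp3")
    return [
        (f"/voice-to-text {recording}", [str(transcript)]),
        (f"/strategic-analysis {transcript}", [str(analysis), str(voice_text)]),
        (f"/text-to-voice {voice_text}", [str(briefing)]),
    ]
```

Printing the plan before you start is a cheap way to see which files to expect at each step.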

Pick the right tool

Kokoro is the default and handles 95% of use cases. But the skill also supports Orpheus (for emotional, natural-sounding speech with tags like <laugh>) and Coqui XTTS v2 (for cloning any voice from a 6-second WAV sample).

Just tell the skill which engine you want when it asks. Setup instructions for the optional engines are in the GitHub README.

Engine       Speed      Best For
Kokoro       Seconds    Daily use, bulk
Orpheus      ~3.5x RT   Emotion, prosody
Coqui XTTS   ~1x RT     Voice cloning
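
The table boils down to a simple rule of thumb, sketched below in Python. The skill asks you which engine you want, so this is only an illustration of the decision. The tag pattern is an assumption based on the `<laugh>` example, and letting cloning win over emotion tags is an arbitrary choice of this sketch.

```python
import re

# Matches simple tags such as <laugh>; the exact tag set is an assumption.
EMOTION_TAG = re.compile(r"<[a-z_]+>")


def choose_engine(text, reference_wav=None):
    """Suggest an engine: cloning, then emotion tags, then the default."""
    if reference_wav is not None:
        return "Coqui XTTS"   # a voice sample means voice cloning
    if EMOTION_TAG.search(text):
        return "Orpheus"      # emotion tags need an engine that reads them
    return "Kokoro"           # fast default for everything else
```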