Converts web pages and articles into chapter-based podcasts using OpenAI TTS. Extracts chapters from H2 headings, generates MP3 with embedded chapter markers, and caches individual chapters for incremental regeneration. Powers the audio narration on this very website.
How It Works
- Parse HTML and extract chapter structure from H2 headings
- Generate speech for each chapter using OpenAI TTS
- Embed chapter markers in the output MP3
- Output a chapters.json file for web players (Plyr.js, etc.)
- Cache chapters individually: only regenerate what changed
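The first two steps above can be sketched as follows. This is an illustration under stated assumptions, not page2pod's actual code: it uses the standard library's html.parser instead of BeautifulSoup so that it runs without extra packages, it drops any text that appears before the first H2, and the names extract_chapters and split_for_tts are hypothetical. The 4096-character limit is the documented maximum input length for OpenAI's speech endpoint; whether page2pod splits long chapters this way is not stated in the source.

```python
import re
from html.parser import HTMLParser

MAX_TTS_CHARS = 4096  # documented input limit for OpenAI's speech endpoint


class H2ChapterParser(HTMLParser):
    """Collect chapters, starting a new one at every <h2> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.chapters = []    # list of {"title": str, "text": str}
        self._in_h2 = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "h2":
            self._in_h2 = True
            self.chapters.append({"title": "", "text": ""})

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "h2":
            self._in_h2 = False

    def handle_data(self, data):
        if self._skip_depth or not self.chapters:
            return  # assumption: ignore scripts, styles, and text before the first H2
        key = "title" if self._in_h2 else "text"
        self.chapters[-1][key] += data


def extract_chapters(html):
    """Return one {"title", "text"} dict per H2 section, whitespace-normalized."""
    parser = H2ChapterParser()
    parser.feed(html)
    parser.close()
    return [
        {"title": " ".join(c["title"].split()), "text": " ".join(c["text"].split())}
        for c in parser.chapters
    ]


def split_for_tts(text, limit=MAX_TTS_CHARS):
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-wrap a single sentence that is itself longer than the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk returned by split_for_tts could then be passed to the TTS request for its chapter and the resulting audio concatenated before the chapter markers are written.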
Usage
# Convert a local HTML file
page2pod convert index.html
# Convert from URL
page2pod convert https://example.com/article
# Force regenerate all chapters
page2pod convert index.html --force
# Regenerate specific chapters
page2pod convert index.html --chapters 2,5
# List chapters without generating
page2pod chapters index.html
# Choose a voice (alloy, echo, fable, onyx, nova, shimmer)
page2pod convert index.html --voice nova
Installation
pip install openai mutagen beautifulsoup4 requests
Requires an OPENAI_API_KEY environment variable.
Cache Structure
Chapters are cached at ~/.cache/page2pod/<page-id>/ with individual MP3 files per chapter. Only changed chapters are regenerated on subsequent runs.
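One way such a per-chapter cache could decide what to regenerate is to fingerprint each chapter's text together with the chosen voice and look for a matching file. The sketch below is a guess at the idea, not the tool's real layout: the file-naming scheme and the function names chapter_key and chapters_to_regenerate are hypothetical.

```python
import hashlib
from pathlib import Path


def chapter_key(text, voice):
    """Short, stable fingerprint of the inputs that affect the audio."""
    return hashlib.sha256(f"{voice}\n{text}".encode("utf-8")).hexdigest()[:16]


def chapters_to_regenerate(chapters, voice, cache_dir, force=False):
    """Return 1-based indices of chapters with no matching cached MP3.

    `chapters` is a list of chapter texts. The file name pattern
    "<index>-<fingerprint>.mp3" is an assumption made for this example.
    """
    cache_dir = Path(cache_dir)
    stale = []
    for i, text in enumerate(chapters, start=1):
        mp3 = cache_dir / f"{i:02d}-{chapter_key(text, voice)}.mp3"
        if force or not mp3.exists():
            stale.append(i)
    return stale
```

Because the fingerprint covers both text and voice, editing one chapter or switching voices invalidates only the affected files, while passing force=True mirrors the behavior of the --force flag.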