Converts web pages and articles into chapter-based podcasts using OpenAI TTS. Extracts chapters from H2 headings, generates an MP3 file with embedded chapter markers, and caches individual chapters for incremental regeneration. Powers the audio narration on this very website.

How It Works

  • Parse HTML and extract chapter structure from H2 headings
  • Generate speech for each chapter using OpenAI TTS
  • Embed chapter markers in the output MP3
  • Output a chapters.json file for web players (Plyr.js, etc.)
  • Cache chapters individually — only regenerate what changed
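A minimal sketch of the H2-based chapter split follows. It uses the standard library's html.parser in place of BeautifulSoup so it runs without extra dependencies. The `ChapterParser` and `extract_chapters` names, the "Introduction" label for text before the first H2, and the skipping of `<script>`/`<style>` contents are assumptions for illustration, not page2pod's actual code.

```python
from html.parser import HTMLParser


class ChapterParser(HTMLParser):
    """Split a page's text into chapters at each <h2> heading (hypothetical helper)."""

    SKIP = {"script", "style"}  # assumption: never narrate code or CSS

    def __init__(self):
        super().__init__()
        # Assumption: text before the first <h2> becomes an "Introduction" chapter.
        self.chapters = [{"title": "Introduction", "parts": []}]
        self._in_h2 = False
        self._skip_depth = 0
        self._title_parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "h2":
            self._in_h2 = True
            self._title_parts = []

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "h2" and self._in_h2:
            self._in_h2 = False
            # Collapse whitespace in the heading and start a new chapter.
            title = " ".join("".join(self._title_parts).split())
            self.chapters.append({"title": title, "parts": []})

    def handle_data(self, data):
        if self._skip_depth:
            return
        if self._in_h2:
            self._title_parts.append(data)
        else:
            self.chapters[-1]["parts"].append(data)


def extract_chapters(html):
    """Return (title, text) pairs, dropping chapters that have no body text."""
    parser = ChapterParser()
    parser.feed(html)
    parser.close()  # flush any buffered text
    result = []
    for chapter in parser.chapters:
        text = " ".join(" ".join(chapter["parts"]).split())
        if text:
            result.append((chapter["title"], text))
    return result
```

Each returned text would then be sent to TTS as one chapter. Joining text fragments with spaces keeps adjacent paragraphs from running together, at the cost of splitting a word that is broken across inline tags.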
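Both the embedded chapter markers and chapters.json come down to cumulative start and end times per chapter. Here is a sketch under an assumed, hypothetical JSON schema (`title`, `start`, `end` in seconds). Per-chapter durations could come from mutagen's `MP3(path).info.length`. ID3 CHAP frames store the same boundaries in milliseconds.

```python
import json


def build_chapter_index(chapters):
    """Turn [(title, duration_seconds), ...] into cumulative chapter entries.

    Hypothetical schema: start/end are seconds from the beginning of the
    concatenated MP3, so each chapter starts where the previous one ended.
    """
    entries = []
    start = 0.0
    for title, duration in chapters:
        end = start + duration
        entries.append({"title": title, "start": round(start, 3), "end": round(end, 3)})
        start = end
    return entries


def write_chapters_json(chapters, path):
    """Write the chapter index to disk for a web player to load."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(build_chapter_index(chapters), f, indent=2)
```

A web player integration would map these entries onto whatever marker format it expects.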

Usage

# Convert a local HTML file
page2pod convert index.html

# Convert from URL
page2pod convert https://example.com/article

# Force regeneration of all chapters
page2pod convert index.html --force

# Regenerate specific chapters
page2pod convert index.html --chapters 2,5

# List chapters without generating
page2pod chapters index.html

# Choose a voice (alloy, echo, fable, onyx, nova, shimmer)
page2pod convert index.html --voice nova
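The `--force` and `--chapters` flags decide which chapters are sent to TTS. The sketch below shows one plausible way to combine them with the cache. It assumes chapter numbers are 1-based, as in "2,5". It also assumes that listed chapters are regenerated even when cached, while any chapters missing from the cache are still generated so the full MP3 can be assembled. The function names are hypothetical.

```python
def parse_chapter_list(spec):
    """Parse a 1-based list such as "2,5" into a set of 0-based indices."""
    indices = set()
    for token in spec.split(","):
        token = token.strip()
        if not token:
            continue
        number = int(token)
        if number < 1:
            raise ValueError(f"chapter numbers start at 1, got {number}")
        indices.add(number - 1)
    return indices


def chapters_to_generate(total, cached, force=False, only=None):
    """Return sorted 0-based indices of chapters that need TTS.

    total  -- number of chapters on the page
    cached -- set of indices whose cached audio is up to date
    force  -- regenerate everything (mirrors --force)
    only   -- optional "2,5"-style string (mirrors --chapters)
    """
    if force:
        return list(range(total))
    selected = parse_chapter_list(only) if only else set()
    out_of_range = sorted(i + 1 for i in selected if i >= total)
    if out_of_range:
        raise ValueError(f"no such chapter(s): {out_of_range}")
    # Explicitly requested chapters plus anything not yet cached.
    return sorted(selected | (set(range(total)) - set(cached)))
```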

Installation

pip install openai mutagen beautifulsoup4 requests

Requires an OPENAI_API_KEY environment variable.

Cache Structure

Chapters are cached at ~/.cache/page2pod/<page-id>/ with individual MP3 files per chapter. Only changed chapters are regenerated on subsequent runs.
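One way to detect "changed" chapters is to put a content hash in each cached file name, so edited text or a different voice yields a new path that is not yet on disk. This is a sketch under assumptions. The README does not say how `<page-id>` is derived, so hashing the file path or URL is a hypothetical choice, as are the function names and the `NNN-<hash>.mp3` naming.

```python
import hashlib
from pathlib import Path

CACHE_ROOT = Path.home() / ".cache" / "page2pod"


def page_id(source):
    """Hypothetical page id: short SHA-256 of the file path or URL."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()[:16]


def chapter_cache_path(source, index, text, voice, root=CACHE_ROOT):
    """Cache file for one chapter; the name changes when its text or voice changes."""
    digest = hashlib.sha256(f"{voice}\n{text}".encode("utf-8")).hexdigest()[:16]
    return root / page_id(source) / f"{index:03d}-{digest}.mp3"


def stale_chapters(source, chapters, voice, root=CACHE_ROOT):
    """Return indices of chapters with no cached MP3 for their current content."""
    return [
        i
        for i, text in enumerate(chapters)
        if not chapter_cache_path(source, i, text, voice, root).exists()
    ]
```

Only the indices returned by `stale_chapters` would need new TTS calls; unchanged chapters are reused from disk.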