A local podcast monitoring and transcription system that:
- Monitors RSS feeds for new podcast episodes
- Downloads new episodes automatically
- Transcribes audio using either OpenAI Whisper API or local mlx-whisper
- Summarizes content using either OpenAI GPT-4 or local LLMs via Ollama
- Stores metadata in SQLite
- Generates daily feed summaries
- Ensure you have Python 3.x installed
- Install ffmpeg (required for audio processing):
brew install ffmpeg
- Install and set up Ollama (required only for local summarization):
# Install Ollama
brew install ollama
# Start Ollama server
ollama serve
# In a new terminal, pull the model
ollama pull qwen2.5:3b
# Verify the model is working
ollama run qwen2.5:3b "Hello, how are you?"
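Once the model is pulled, you can also exercise the Ollama server from Python via its REST API. This is a minimal sketch using only the standard library; the function names are illustrative, not part of this project:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default Ollama endpoint

def build_payload(prompt, model="qwen2.5:3b"):
    """Build the JSON body for a one-shot (non-streaming) /api/generate call."""
    return {"model": model, "prompt": prompt, "stream": False}

def ollama_generate(prompt, model="qwen2.5:3b", base_url=OLLAMA_URL):
    """POST a prompt to the local Ollama server and return its text reply."""
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(build_payload(prompt, model)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

For example, `ollama_generate("Hello, how are you?")` mirrors the `ollama run` smoke test above, but from code.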
- Set up Python environment:
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
- Configure environment variables (if using OpenAI):
cp .env.example .env
# Edit .env and add your OpenAI API key if using OpenAI services
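For reference, loading a .env file boils down to parsing KEY=VALUE lines into the process environment. The sketch below illustrates that mechanism with the standard library only; it is not a substitute for a proper loader like python-dotenv, and `load_env`/`parse_env_line` are hypothetical names:

```python
import os

def parse_env_line(line):
    """Parse one KEY=VALUE line from a .env file; return None for blanks/comments."""
    line = line.strip()
    if not line or line.startswith("#") or "=" not in line:
        return None
    key, _, value = line.partition("=")
    return key.strip(), value.strip()

def load_env(path=".env"):
    """Minimal .env loader: existing environment variables are not overwritten."""
    if not os.path.exists(path):
        return
    with open(path) as f:
        for line in f:
            pair = parse_env_line(line)
            if pair:
                os.environ.setdefault(*pair)
```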
The application can be configured through config.py. Key settings include:
- AUDIO_STORAGE_PATH: Where to store downloaded podcasts (default: ~/Podcasts)
- TRANSCRIPT_STORAGE_PATH: Where to store transcripts (default: ~/Podcasts/Transcripts)
- PODCAST_FEEDS: List of RSS feed URLs to monitor
- MAX_EPISODES_PER_FEED: Maximum number of episodes to pull from each feed (default: 5)
- TRANSCRIPTION_MODE: Choose between "local" or "openai"
  - "local": Uses mlx-whisper locally (free, no API key needed)
  - "openai": Uses OpenAI's Whisper API (requires API key)
- WHISPER_MODEL: Model to use for local transcription (default: "mlx-community/distil-whisper-large-v3")
- SUMMARIZATION_MODE: Choose between "local" or "openai"
  - "local": Uses Ollama locally (free, no API key needed)
  - "openai": Uses OpenAI's GPT-4 (requires API key)
- OPENAI_SUMMARY_MODEL: Model to use for OpenAI summarization
- OLLAMA_MODEL: Model to use for local summarization (default: "qwen2.5:3b")
- OLLAMA_URL: URL for Ollama server (default: "http://localhost:11434")
- CHECK_INTERVAL_MINUTES: How often to check feeds (default: 60)
- RETAIN_DAYS: How many days of history to keep (default: 30)
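A config.py matching the settings above might look like the following sketch. The values are the documented defaults; the feed URL is a placeholder, and the OPENAI_SUMMARY_MODEL value is an assumed example, not a documented default:

```python
import os

# Storage locations (documented defaults)
AUDIO_STORAGE_PATH = os.path.expanduser("~/Podcasts")
TRANSCRIPT_STORAGE_PATH = os.path.expanduser("~/Podcasts/Transcripts")

# Feeds to monitor
PODCAST_FEEDS = [
    # "https://example.com/feed.xml",  # placeholder: add your RSS feed URLs
]
MAX_EPISODES_PER_FEED = 5

# Transcription: "local" (mlx-whisper) or "openai" (Whisper API)
TRANSCRIPTION_MODE = "local"
WHISPER_MODEL = "mlx-community/distil-whisper-large-v3"

# Summarization: "local" (Ollama) or "openai" (GPT-4)
SUMMARIZATION_MODE = "local"
OPENAI_SUMMARY_MODEL = "gpt-4"  # assumed example value
OLLAMA_MODEL = "qwen2.5:3b"
OLLAMA_URL = "http://localhost:11434"

# Scheduling and retention
CHECK_INTERVAL_MINUTES = 60
RETAIN_DAYS = 30
```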
- Configure your settings in config.py
- If using local summarization, ensure Ollama is running:
# Start Ollama in a separate terminal if not already running
ollama serve
- Run the services:
To run the background service that monitors feeds and processes episodes:
python main.py
The background service will:
- Check configured RSS feeds
- Download new episodes
- Generate transcripts (using chosen transcription method)
- Create summaries (using chosen summarization method)
- Save all metadata to a local SQLite database
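The "download only new episodes" step comes down to deduplicating on each episode's GUID against the SQLite database. This sketch shows one way that check could work; the table schema and function names are illustrative, not necessarily what main.py uses:

```python
import sqlite3

def init_db(path=":memory:"):
    """Open the metadata database and ensure the episodes table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS episodes ("
        "guid TEXT PRIMARY KEY, feed_url TEXT, title TEXT, "
        "downloaded_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn

def is_new_episode(conn, guid):
    """Return True if this GUID has not been seen before."""
    return conn.execute(
        "SELECT 1 FROM episodes WHERE guid = ?", (guid,)
    ).fetchone() is None

def record_episode(conn, guid, feed_url, title):
    """Mark an episode as processed so it is skipped on the next feed check."""
    conn.execute(
        "INSERT OR IGNORE INTO episodes (guid, feed_url, title) VALUES (?, ?, ?)",
        (guid, feed_url, title),
    )
    conn.commit()
```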
Audio files are stored in ~/Podcasts/ by default
Transcripts are stored in ~/Podcasts/Transcripts/
To access the content through a web interface:
python web_server.py
Then open your browser to http://localhost:8000
This project uses OpenLIT to track observability metrics. To enable, set the OTEL_EXPORTER_OTLP_ENDPOINT
in your .env file or set the environment variable in your shell.
export OTEL_EXPORTER_OTLP_ENDPOINT=http://127.0.0.1:4318
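In code, enabling this amounts to resolving the endpoint from the environment and passing it to the SDK. A sketch, assuming OpenLIT's `openlit.init(otlp_endpoint=...)` entry point; `otlp_endpoint` here is a hypothetical helper:

```python
import os

def otlp_endpoint(default="http://127.0.0.1:4318"):
    """Resolve the OTLP endpoint from the environment, falling back to a local collector."""
    return os.getenv("OTEL_EXPORTER_OTLP_ENDPOINT", default)

try:
    import openlit  # optional observability SDK
    openlit.init(otlp_endpoint=otlp_endpoint())
except ImportError:
    pass  # OpenLIT not installed; the app still runs, just without metrics
```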
Clone the OpenLIT repo:
git clone [email protected]:openlit/openlit.git
Start Docker Compose:
docker compose up -d