Building a Frictionless Capture System: Voice Memos to Obsidian
November 15, 2025
"Our brains are for having ideas, not holding them."
This principle, from Jeff Su's C.O.R.E productivity system taught to thousands of Googlers, resonated with me immediately. The C.O.R.E system starts with Capture—immediately recording information before it's forgotten. But capture only works if it's truly frictionless.
After watching his explanation of how capture should be "a reflex, not a decision," I realized my current capture workflow had too much friction. So I built something better.
The Problem I Wanted to Solve
Ideas don't wait for convenient moments. They show up while running, driving, or carrying groceries. My previous capture methods all had friction:
- Physical notebook: Forgotten at home, requires stopping to write
- Phone notes app: Unlock phone, find app, type with thumbs
- Email to myself: Clutters inbox, awkward to process later
- Task management apps: Too many taps and required fields
The Apple Watch partially solved this—I can record voice memos hands-free in seconds. But voice memos alone just create a different problem: they pile up unprocessed in the Voice Memos app instead of living in my Obsidian vault where I actually work.
I needed the memos transcribed, tagged with metadata, and automatically filed in Obsidian.
The Solution: Automated Transcription Pipeline
I built a system that bridges the gap between Apple Watch capture and Obsidian organization:
- Capture: Record voice memo on Apple Watch (2 seconds, hands-free)
- Automatic sync: Voice memo syncs to Mac via iCloud
- Scheduled processing: Cron job runs my Python script
- AI transcription: OpenAI's Whisper converts audio to text
- Smart filing: Saves to Obsidian with rich metadata
- Cleanup: Deletes original voice memo
The entire process runs automatically. I capture the thought, and later it appears in my Obsidian vault, transcribed and ready to process.
Technical Implementation
The Core Script
I wrote a Python script using OpenAI's Whisper for transcription. Whisper runs locally (no API costs) and handles various audio formats and languages automatically.
Key features:
- Batch processing: Processes all pending voice memos in one run
- Multiple model sizes: Trade speed for accuracy (tiny/base/small/medium/large)
- Flexible output formats: txt, srt, vtt, or json
- Audio preservation: Optionally copies original audio alongside transcription
- Safe cleanup: Only deletes originals after successful transcription
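The batch and safe-cleanup behaviour described above can be sketched roughly as follows. This is an illustration under assumptions, not the actual script: the function names, the .md output extension, and the injected transcribe callable are all hypothetical. The only Whisper calls used are whisper.load_model and model.transcribe, whose result is a dictionary with a "text" key.

```python
from pathlib import Path
from typing import Callable


def whisper_transcriber(model_name: str = "base") -> Callable[[Path], str]:
    """Return a function that transcribes one audio file with local Whisper."""
    import whisper  # imported lazily so the batch logic works without it installed

    model = whisper.load_model(model_name)
    return lambda audio: model.transcribe(str(audio))["text"].strip()


def process_batch(
    src_dir: Path,
    out_dir: Path,
    transcribe: Callable[[Path], str],
    delete_after: bool = False,
) -> list[Path]:
    """Transcribe every .m4a in src_dir; delete an original only after its text is saved."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for audio in sorted(src_dir.glob("*.m4a")):
        try:
            text = transcribe(audio)
        except Exception as exc:  # keep the original so the next run can retry
            print(f"skipped {audio.name}: {exc}")
            continue
        note = out_dir / f"{audio.stem}.md"  # assumed extension for the note
        note.write_text(text + "\n", encoding="utf-8")
        written.append(note)
        if delete_after:
            audio.unlink()  # reached only when transcription and write succeeded
    return written
```

Passing the transcriber in as a parameter keeps the deletion logic testable with a stub; a real run would call something like process_batch(recordings, output, whisper_transcriber("base"), delete_after=True).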
Metadata Generation
The script extracts timestamps from Voice Memo filenames (format: 20250505 072813-DADFB99E.m4a) and generates Obsidian frontmatter:
---
who:
- "[[~eladio]]"
what:
- "[[%voice]]"
- "[[%memo]]"
when:
- "[[@2025]]"
- "[[@2025-05]]"
- "[[@2025-05-05]]"
- "[[@2025-W18]]"
- "[[@2025-05-05 07:28:13]]"
where:
- "[[+stony-brook]]"
- "[[+new-york]]"
status:
- "[[!archive]]"
visibility:
- "[[!private]]"
---
This rich metadata means I'll be able to query voice memos by date, week, location, or type using the Dataview plugin for Obsidian.
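A minimal sketch of how the timestamp links could be derived from a filename in the format shown above. The function names are hypothetical. One assumption worth flagging: the example frontmatter files 2025-05-05 under W18, which matches strftime's %W (weeks start on Monday, days before the first Monday are week 00). ISO 8601 numbering (%V) would give W19 for that date, so the sketch uses %W to reproduce the example.

```python
from datetime import datetime
from pathlib import Path


def parse_memo_timestamp(filename: str) -> datetime:
    """Parse '20250505 072813-DADFB99E.m4a' into a datetime."""
    stamp = Path(filename).stem.split("-", 1)[0]  # '20250505 072813'
    return datetime.strptime(stamp, "%Y%m%d %H%M%S")


def when_links(ts: datetime) -> list[str]:
    """Build the year, month, day, week, and full-timestamp links."""
    return [
        f"[[@{ts:%Y}]]",
        f"[[@{ts:%Y-%m}]]",
        f"[[@{ts:%Y-%m-%d}]]",
        f"[[@{ts:%Y}-W{ts:%W}]]",  # %W is an assumption; see the note above
        f"[[@{ts:%Y-%m-%d %H:%M:%S}]]",
    ]


def build_frontmatter(filename: str, fixed: dict[str, list[str]]) -> str:
    """Combine fixed fields (who, what, where, ...) with the derived 'when' links."""
    fields = {**fixed, "when": when_links(parse_memo_timestamp(filename))}
    lines = ["---"]
    for key, values in fields.items():
        lines.append(f"{key}:")
        lines.extend(f'- "{value}"' for value in values)
    lines.append("---")
    return "\n".join(lines)
```

Here the when block is appended after the fixed fields; YAML mappings are unordered, so its position in the frontmatter does not affect queries.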
The Automation
A simple shell script orchestrates the workflow:
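#!/bin/bash
# Transcribe pending Voice Memos and file the results for Obsidian.
# Where the Voice Memos app stores recordings on the Mac
VOICE_MEMOS_DIR="$HOME/Library/Group Containers/group.com.apple.VoiceMemos.shared/Recordings"
OUTPUT_DIR="$HOME/src/transcribe-voice-memos/transcriptions"
# Stop if the project directory is missing rather than running from the wrong place
cd "$HOME/src/transcribe-voice-memos" || exit 1
source venv/bin/activate
# --delete-after removes originals only after successful transcription
python transcribe_voice_memos.py "$VOICE_MEMOS_DIR" \
    --batch \
    --output "$OUTPUT_DIR" \
    --copy-audio \
    --delete-after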
Running this via cron means transcriptions appear automatically without any manual work.
Why This Aligns with C.O.R.E
Jeff Su's system emphasizes that capture should be "a reflex, not a decision." This automation should achieve exactly that:
No Decision Required: I won't need to think about where to file it, how to categorize it, or whether it's "important enough." Just speak and move on.
Offload Immediately: The thought leaves my brain and enters my external system instantly. No mental carrying until I can "properly" write it down.
Separate Capture from Processing: I capture in the moment. I organize and review later during dedicated sessions at my desk. Each step happens when it's most effective.
Trust Through Automation: Because it's automated and reliable, I can trust it completely. This trust removes hesitation—I won't second-guess whether to use it.
The Technical Stack
For anyone wanting to build something similar:
- Python 3.13 with virtual environment
- OpenAI Whisper (open source, runs locally—no API costs)
- PyTorch as the machine learning backend
- Shell scripts for orchestration
- Cron for scheduling (could use macOS Launch Agents)
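For the Launch Agent alternative, a sketch of generating the job definition with Python's standard plistlib module might look like this. The label, script name, log paths, and interval are placeholders rather than values from the project. Label, ProgramArguments, StartInterval, StandardOutPath, and StandardErrorPath are standard launchd keys.

```python
import plistlib
from pathlib import Path


def build_launch_agent(label: str, script: Path, interval_seconds: int) -> bytes:
    """Return a launchd property list that runs `script` every `interval_seconds`."""
    job = {
        "Label": label,
        "ProgramArguments": ["/bin/bash", str(script)],
        "StartInterval": interval_seconds,
        # Placeholder log locations; adjust to taste
        "StandardOutPath": "/tmp/transcribe-voice-memos.log",
        "StandardErrorPath": "/tmp/transcribe-voice-memos.err",
    }
    return plistlib.dumps(job)


if __name__ == "__main__":
    # Hypothetical label and script name, shown only to illustrate the call
    plist = build_launch_agent(
        "com.example.transcribe-voice-memos",
        Path.home() / "src/transcribe-voice-memos/run_transcription.sh",
        900,
    )
    print(plist.decode("utf-8"))
```

The printed XML would be saved as a .plist file under ~/Library/LaunchAgents/ and loaded with launchctl.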
The complete project lives at ~/src/transcribe-voice-memos/ and is relatively straightforward to adapt for other note-taking systems.
Configuration Options
The script is flexible:
Model Selection (trading speed for accuracy):
- tiny: Fastest, good for simple ideas
- base: My default, solid balance
- medium: Better for technical terms
- large: Best accuracy, slower
Output Formats:
- txt: Markdown with frontmatter (my choice)
- srt: Subtitle format with timestamps
- vtt: WebVTT format
- json: Full structured data
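To make the srt option concrete, here is one way timed segments could be rendered as subtitle cues. It assumes segments shaped like those in Whisper's transcribe result (dictionaries with start, end, and text keys, times in seconds). The helper names are hypothetical, and the actual script may format its output differently.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp SRT expects."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Render Whisper-style segments (start, end, text) as numbered SRT cues."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(cues)  # a blank line separates consecutive cues
```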
Processing Modes:
- Auto-delete originals (clean workflow)
- Keep originals (redundant backup)
- Copy audio files (my approach—transcription plus original for reference)
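A command-line interface covering these options could be sketched with argparse as below. The --batch, --output, --copy-audio, and --delete-after flags appear in the shell script earlier in the post. The --model and --format flag names, and all defaults, are assumptions: the post lists the choices but not how they are passed.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the options described in this section."""
    parser = argparse.ArgumentParser(description="Transcribe voice memos with Whisper")
    parser.add_argument("source", help="audio file, or a directory when --batch is set")
    parser.add_argument("--batch", action="store_true",
                        help="process every memo in the source directory")
    parser.add_argument("--output", default="transcriptions",
                        help="directory for transcriptions")
    parser.add_argument("--copy-audio", action="store_true",
                        help="copy the original audio alongside the transcription")
    parser.add_argument("--delete-after", action="store_true",
                        help="delete originals after successful transcription")
    # Hypothetical flag names for the model and format choices
    parser.add_argument("--model", default="base",
                        choices=["tiny", "base", "small", "medium", "large"])
    parser.add_argument("--format", default="txt",
                        choices=["txt", "srt", "vtt", "json"])
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```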
Ideas for Future Enhancement
I'm already thinking about improvements:
Intelligent Categorization: Use GPT to analyze memo content and automatically categorize (work, personal, ideas, todos) and route to appropriate Obsidian folders.
Immediate Processing: Integrate with iOS Shortcuts to trigger transcription right after recording instead of waiting for the next cron run.
Voice Commands: Parse commands embedded in memos ("File this as a project idea" or "Add to shopping list") and act on them.
Task Extraction: Automatically create Obsidian tasks when memos contain phrases like "TODO" or "Remember to..."
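As a rough idea of what task extraction could look like, the sketch below scans a transcript sentence by sentence and turns anything following "TODO" or "Remember to" into an Obsidian checkbox line. The trigger phrases come from the examples above; the regular expression, the sentence splitting, and the function name are assumptions, and real transcripts would likely need more forgiving matching.

```python
import re

# Hypothetical trigger phrases, based on the two examples in the post
TASK_PATTERN = re.compile(r"\b(?:todo[:,]?|remember to)\s+(.+)", re.IGNORECASE)


def extract_tasks(transcript: str) -> list[str]:
    """Return an Obsidian checkbox line for each sentence containing a trigger phrase."""
    tasks = []
    # Split on whitespace that follows sentence-ending punctuation
    for sentence in re.split(r"(?<=[.!?])\s+", transcript.strip()):
        match = TASK_PATTERN.search(sentence)
        if match:
            tasks.append(f"- [ ] {match.group(1).rstrip('.!?').strip()}")
    return tasks
```

The resulting lines use Obsidian's standard task syntax, so they could be appended to the memo's note or to a separate task list.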
The Design Philosophy: Eliminate Friction
This project reinforced an important lesson: good productivity tools aren't about features—they're about removing friction.
The Apple Watch already existed. Obsidian already existed. Voice memos were already being created. Transcription technology was already available.
The friction was in the gaps between tools: manually transcribing audio, copying files, adding metadata, organizing notes.
By automating those gaps, the workflow becomes frictionless. And frictionless capture, as the C.O.R.E system teaches, is the foundation of effective productivity.
Getting Started
If you want to build something similar:
- Set up Python environment: python -m venv venv
- Install dependencies: pip install openai-whisper torch (Whisper also needs the ffmpeg command-line tool installed)
- Write a script to process voice memos from Apple's directory
- Generate appropriate metadata/frontmatter for your system
- Create automation (cron job or Launch Agent)
- Test thoroughly before enabling auto-deletion
The setup takes a few hours, but creates a system that should pay dividends over time.
What's Next
I just built this system, so I can't yet report on real-world effectiveness. But the theory is sound:
- Capture should be effortless
- Processing should be separate from capture
- Automation removes friction
- Trust in the system enables reflexive use
I'll be testing this workflow over the coming weeks and will share results in a future post. If it works as designed, it should capture ideas that would otherwise be lost while keeping my Obsidian vault as the single source of truth.
The real test will be whether I naturally reach for voice memos when ideas strike, or if friction still creeps in. Time will tell.
The code for this project lives at ~/src/transcribe-voice-memos/. The follow-up post will report actual results from real-world use rather than theoretical benefits.