Eladio Caritos

"Our brains are for having ideas, not holding them."

This principle, from Jeff Su's C.O.R.E productivity system taught to thousands of Googlers, resonated with me immediately. The C.O.R.E system starts with Capture—immediately recording information before it's forgotten. But capture only works if it's truly frictionless.

After watching his explanation of how capture should be "a reflex, not a decision," I realized my current capture workflow had too much friction. So I built something better.

The Problem I Wanted to Solve

Ideas don't wait for convenient moments. They show up while running, driving, or carrying groceries. My previous capture methods all had friction:

Physical notebook: Forgotten at home, requires stopping to write
Phone notes app: Unlock phone, find app, type with thumbs
Email to myself: Clutters inbox, awkward to process later
Task management apps: Too many taps and required fields

The Apple Watch partially solved this—I can record voice memos hands-free in seconds. But voice memos alone just create a different problem: they pile up unprocessed in the Voice Memos app instead of living in my Obsidian vault where I actually work.

I needed the memos transcribed, tagged with metadata, and automatically filed in Obsidian.

The Solution: Automated Transcription Pipeline

I built a system that bridges the gap between Apple Watch capture and Obsidian organization:

Capture: Record voice memo on Apple Watch (2 seconds, hands-free)
Automatic sync: Voice memo syncs to Mac via iCloud
Scheduled processing: Cron job runs my Python script
AI transcription: OpenAI's Whisper converts audio to text
Smart filing: Saves to Obsidian with rich metadata
Cleanup: Deletes original voice memo

The entire process runs automatically. I capture the thought, and later it appears in my Obsidian vault, transcribed and ready to process.

Technical Implementation

The Core Script

I wrote a Python script using OpenAI's Whisper for transcription. Whisper runs locally (no API costs) and handles various audio formats and languages automatically.

Key features:

Batch processing: Processes all pending voice memos in one run
Multiple model sizes: Trade speed for accuracy (tiny/base/small/medium/large)
Flexible output formats: txt, srt, vtt, or json
Audio preservation: Optionally copies original audio alongside transcription
Safe cleanup: Only deletes originals after successful transcription
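To make the batch and safe-cleanup behavior concrete, here is a simplified sketch rather than the script itself. `whisper.load_model` and `model.transcribe` are the real calls from the openai-whisper package. The helper names (`load_whisper_transcriber`, `process_batch`), the `*.m4a` glob, and the `.md` output extension are assumptions made for illustration.

```python
from pathlib import Path
from typing import Callable


def load_whisper_transcriber(model_name: str = "base") -> Callable[[Path], str]:
    """Return a function that transcribes one audio file with a local Whisper model."""
    import whisper  # openai-whisper; imported lazily so the rest runs without it

    model = whisper.load_model(model_name)

    def transcribe(audio_path: Path) -> str:
        result = model.transcribe(str(audio_path))
        return result["text"].strip()

    return transcribe


def process_batch(
    memo_dir: Path,
    output_dir: Path,
    transcribe: Callable[[Path], str],
    delete_after: bool = False,
) -> list[Path]:
    """Transcribe every .m4a in memo_dir and return the transcript paths written."""
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for audio_path in sorted(memo_dir.glob("*.m4a")):
        try:
            text = transcribe(audio_path)
        except Exception as exc:
            # Leave the original in place so a later run can retry it.
            print(f"skipped {audio_path.name}: {exc}")
            continue
        if not text:
            # An empty transcript is treated as a failure: keep the audio.
            continue
        transcript_path = output_dir / f"{audio_path.stem}.md"
        transcript_path.write_text(text + "\n", encoding="utf-8")
        written.append(transcript_path)
        if delete_after:
            # Only reached once the transcript is safely on disk.
            audio_path.unlink()
    return written
```

Passing the transcriber in as a function keeps the file handling testable without loading a model. In real use it would be called as `process_batch(memo_dir, output_dir, load_whisper_transcriber("base"), delete_after=True)`.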

Metadata Generation

The script extracts timestamps from Voice Memo filenames (format: 20250505 072813-DADFB99E.m4a) and generates Obsidian frontmatter:

---
who:
  - "[[~eladio]]"
what:
  - "[[%voice]]"
  - "[[%memo]]"
when:
  - "[[@2025]]"
  - "[[@2025-05]]"
  - "[[@2025-05-05]]"
  - "[[@2025-W18]]"
  - "[[@2025-05-05 07:28:13]]"
where:
  - "[[+stony-brook]]"
  - "[[+new-york]]"
status:
  - "[[!archive]]"
visibility:
  - "[[!private]]"
---

This rich metadata means I'll be able to query voice memos by date, week, location, or type using the Dataview plugin for Obsidian.
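The filename parsing and frontmatter generation described above can be sketched with the standard library alone. This is an illustration, not the actual script: the function names are hypothetical, and the week link uses `%W` (Monday-based, week 0 before the first Monday) only because that reproduces the `W18` in the example frontmatter.

```python
from datetime import datetime
from pathlib import Path


def parse_memo_timestamp(filename: str) -> datetime:
    """Parse 'YYYYMMDD HHMMSS-<id>.m4a' into a datetime."""
    stem = Path(filename).stem          # '20250505 072813-DADFB99E'
    stamp = stem.split("-", 1)[0]       # '20250505 072813'
    return datetime.strptime(stamp, "%Y%m%d %H%M%S")


def when_links(ts: datetime) -> list[str]:
    """Build the year / month / day / week / full-timestamp links."""
    return [
        ts.strftime("[[@%Y]]"),
        ts.strftime("[[@%Y-%m]]"),
        ts.strftime("[[@%Y-%m-%d]]"),
        ts.strftime("[[@%Y-W%W]]"),  # assumption: %W week numbering
        ts.strftime("[[@%Y-%m-%d %H:%M:%S]]"),
    ]


def build_frontmatter(fields: dict[str, list[str]]) -> str:
    """Render a dict of link lists as a YAML frontmatter block."""
    lines = ["---"]
    for key, links in fields.items():
        lines.append(f"{key}:")
        lines.extend(f'  - "{link}"' for link in links)
    lines.append("---")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    ts = parse_memo_timestamp("20250505 072813-DADFB99E.m4a")
    print(build_frontmatter({
        "who": ["[[~eladio]]"],
        "what": ["[[%voice]]", "[[%memo]]"],
        "when": when_links(ts),
        "status": ["[[!archive]]"],
    }))
```

The week convention matters. ISO 8601 numbering (`ts.isocalendar()`) would label the same date week 19, so whichever scheme is used has to match the weekly notes already in the vault.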

The Automation

A simple shell script orchestrates the workflow:

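#!/bin/bash
# Stop at the first failing command so a broken setup never reaches --delete-after
set -e

VOICE_MEMOS_DIR="$HOME/Library/Group Containers/group.com.apple.VoiceMemos.shared/Recordings"
OUTPUT_DIR="$HOME/src/transcribe-voice-memos/transcriptions"

# Run from the project directory using its virtual environment
cd "$HOME/src/transcribe-voice-memos"
source venv/bin/activate

# Transcribe every pending memo, keep a copy of the audio, then remove the original
python transcribe_voice_memos.py "$VOICE_MEMOS_DIR" \
    --batch \
    --output "$OUTPUT_DIR" \
    --copy-audio \
    --delete-after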

Running this via cron means transcriptions appear automatically without any manual work.
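Cron is the scheduler used here. For readers who would rather use a macOS Launch Agent, the job definition can be generated with Python's `plistlib`. This is a sketch under assumptions: the label, the script name `run_transcription.sh`, the log path, and the 15-minute interval are all placeholders, not values from this project. The keys themselves (`Label`, `ProgramArguments`, `StartInterval`, `StandardOutPath`, `StandardErrorPath`) are standard launchd keys.

```python
import plistlib
from pathlib import Path


def build_launch_agent(
    label: str,
    script_path: Path,
    interval_seconds: int,
    log_path: Path,
) -> bytes:
    """Return an XML plist that runs script_path every interval_seconds."""
    job = {
        "Label": label,
        "ProgramArguments": ["/bin/bash", str(script_path)],
        "StartInterval": interval_seconds,
        # Send stdout and stderr to one log so failed runs are visible.
        "StandardOutPath": str(log_path),
        "StandardErrorPath": str(log_path),
    }
    return plistlib.dumps(job)


if __name__ == "__main__":
    home = Path.home()
    plist = build_launch_agent(
        label="com.example.transcribe-voice-memos",  # placeholder label
        script_path=home / "src/transcribe-voice-memos/run_transcription.sh",  # hypothetical name
        interval_seconds=900,  # placeholder: every 15 minutes
        log_path=home / "Library/Logs/transcribe-voice-memos.log",
    )
    print(plist.decode("utf-8"))
```

The sketch only prints the plist. Saving it under `~/Library/LaunchAgents/` and loading it with `launchctl` is left as a deliberate manual step.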

Why This Aligns with C.O.R.E

Jeff Su's system emphasizes that capture should be "a reflex, not a decision." This automation should achieve exactly that:

No Decision Required: I won't need to think about where to file a thought, how to categorize it, or whether it's "important enough." Just speak and move on.

Offload Immediately: The thought leaves my brain and enters my external system instantly. No mental carrying until I can "properly" write it down.

Separate Capture from Processing: I capture in the moment. I organize and review later during dedicated sessions at my desk. Each step happens when it's most effective.

Trust Through Automation: If the pipeline proves reliable, I can trust it completely. That trust removes hesitation: I won't second-guess whether to use it.

The Technical Stack

For anyone wanting to build something similar:

Python 3.13 with virtual environment
OpenAI Whisper (open source, runs locally—no API costs)
PyTorch for machine learning backend
Shell scripts for orchestration
Cron for scheduling (could use macOS Launch Agents)

The complete project lives at ~/src/transcribe-voice-memos/ and is relatively straightforward to adapt for other note-taking systems.

Configuration Options

The script is flexible:

Model Selection (trading speed for accuracy):

tiny: Fastest, good for simple ideas
base: My default, solid balance
medium: Better for technical terms
large: Best accuracy, slower

Output Formats:

txt: Markdown with frontmatter (my choice)
srt: Subtitle format with timestamps
vtt: WebVTT format
json: Full structured data

Processing Modes:

Auto-delete originals (clean workflow)
Keep originals (redundant backup)
Copy audio files (my approach—transcription plus original for reference)
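These options map naturally onto a command-line interface. The sketch below shows one way to wire them up with `argparse`. The flags `--batch`, `--output`, `--copy-audio`, and `--delete-after` appear in the shell script earlier in this post. `--model` and `--format` are assumed names for the model and output-format options, and the defaults are guesses based on the choices marked above.

```python
import argparse
from pathlib import Path

MODEL_SIZES = ["tiny", "base", "small", "medium", "large"]
OUTPUT_FORMATS = ["txt", "srt", "vtt", "json"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Transcribe Apple Voice Memos with a local Whisper model."
    )
    parser.add_argument("input", type=Path,
                        help="audio file, or a directory when --batch is set")
    parser.add_argument("--batch", action="store_true",
                        help="process every voice memo in the input directory")
    parser.add_argument("--output", type=Path, default=Path("transcriptions"),
                        help="directory for the transcripts")
    # Assumed flag names; the real script may spell these differently.
    parser.add_argument("--model", choices=MODEL_SIZES, default="base",
                        help="Whisper model size (larger is slower, more accurate)")
    parser.add_argument("--format", choices=OUTPUT_FORMATS, default="txt",
                        help="output format")
    parser.add_argument("--copy-audio", action="store_true",
                        help="copy the original audio next to the transcript")
    parser.add_argument("--delete-after", action="store_true",
                        help="delete the original after a successful transcription")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Making both `--copy-audio` and `--delete-after` opt-in means the non-destructive behavior is what you get when no flags are passed.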

Ideas for Future Enhancement

I'm already thinking about improvements:

Intelligent Categorization: Use GPT to analyze memo content and automatically categorize (work, personal, ideas, todos) and route to appropriate Obsidian folders.

Immediate Processing: Integrate with iOS Shortcuts to trigger transcription right after recording instead of waiting for the next cron run.

Voice Commands: Parse commands embedded in memos ("File this as a project idea" or "Add to shopping list") and act on them.

Task Extraction: Automatically create Obsidian tasks when memos contain phrases like "TODO" or "Remember to..."
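As a first pass at the task-extraction idea, a regular expression over the transcript may be enough. This sketch is not part of the current script. The trigger phrases come from the idea above, while the sentence splitting and the `- [ ]` Markdown checkbox output are assumptions about how the tasks would be written into a note.

```python
import re

# Trigger phrases from the idea above; everything after the trigger is the task.
TASK_PATTERN = re.compile(r"\b(?:todo|remember to)\b[:,]?\s*(.+)", re.IGNORECASE)


def extract_tasks(transcript: str) -> list[str]:
    """Return Markdown checkbox lines for sentences that contain a trigger phrase."""
    tasks = []
    # Transcripts are usually one long paragraph, so split on sentence endings.
    for sentence in re.split(r"(?<=[.!?])\s+", transcript.strip()):
        match = TASK_PATTERN.search(sentence)
        if not match:
            continue
        task = match.group(1).strip().rstrip(".!?")
        if task:
            tasks.append(f"- [ ] {task[0].upper()}{task[1:]}")
    return tasks


if __name__ == "__main__":
    memo = (
        "I had an idea about the garden. "
        "Remember to call the plumber tomorrow. "
        "TODO: renew passport."
    )
    for line in extract_tasks(memo):
        print(line)
```

Keyword matching like this will miss tasks that are phrased differently and may catch phrases that are not tasks, which is where the GPT-based categorization idea could take over.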

The Design Philosophy: Eliminate Friction

This project reinforced an important lesson: good productivity tools aren't about features—they're about removing friction.

The Apple Watch already existed. Obsidian already existed. Voice memos were already being created. Transcription technology was already available.

The friction was in the gaps between tools: manually transcribing audio, copying files, adding metadata, organizing notes.

By automating those gaps, the workflow becomes frictionless. And frictionless capture, as the C.O.R.E system teaches, is the foundation of effective productivity.

Getting Started

If you want to build something similar:

Set up Python environment: python -m venv venv
Install dependencies: pip install openai-whisper torch (Whisper also needs the ffmpeg command-line tool, for example via brew install ffmpeg)
Write a script to process voice memos from Apple's directory
Generate appropriate metadata/frontmatter for your system
Create automation (cron job or Launch Agent)
Test thoroughly before enabling auto-deletion

The setup takes a few hours, but creates a system that should pay dividends over time.

What's Next

I just built this system, so I can't yet report on real-world effectiveness. But the theory is sound:

Capture should be effortless
Processing should be separate from capture
Automation removes friction
Trust in the system enables reflexive use

I'll be testing this workflow over the coming weeks and will share results in a future post. If it works as designed, it should capture ideas that would otherwise be lost while keeping my Obsidian vault as the single source of truth.

The real test will be whether I naturally reach for voice memos when ideas strike, or if friction still creeps in. Time will tell.

The code for this project is available at ~/src/transcribe-voice-memos/. I'll write a follow-up post once I've used this system in real-world conditions and can share actual results rather than theoretical benefits.