Runbooks

Step-by-step procedures for things that break or need to happen on a schedule

When something breaks or needs to happen, the answer is usually in one of these. Each runbook has: symptom → diagnosis steps → fix → verification → follow-up notes. Copy-pasteable commands where useful.

If you can’t find what you need here, search the hub (top-right search box, or ⌘K); relevant info is probably scattered across the homelab/, automation/, or claude/ sections.

When X breaks, look at…
| Symptom | Runbook |
| --- | --- |
| Doc-sync report is 300 bytes of “401 / socket closed” | Fix doc-sync auth failure |
| 4 AM briefing email didn’t arrive / morning-briefing.md is stale | Daily briefing didn’t arrive |
| ha-mcp / farm services unreachable | Farm network is down |
| Recordings from JPR not turning into diary entries or tasks | Dictation not processing |
| Sonarr/Radarr/Lidarr key got exposed | Rotate *arr API keys |
| MCP changes don’t show up in Claude | Reload Claude Desktop / MCPs |
| Expected a cron-driven thing to run, didn’t see it | Cron job didn’t fire |
| deploy-vps.sh aborted; site is on a stale tree | Recover from a broken deploy |
| Pasted a secret somewhere I shouldn’t have | Credential leaked in chat |
| ~/Sync/ED/ not converging across machines | Syncthing not converging |
Adding a runbook

Create content/runbooks/<short-slug>/_index.md with frontmatter like:

---
title: "Short imperative title"
page_links:
  - { label: "Related service", url: "/homelab/whatever/" }
---

Then write four sections (Symptom, Diagnose, Fix, Verify), each wrapped in the theme’s section shortcode (open with section id="..." title="...", close with /section). Add the new runbook to the sidebar_sections list above and the symptom table.

The runbook section is also picked up by Pagefind search, so even if you forget where you filed it, the search will find it.

Credential leaked in chat (or anywhere it shouldn't be)
Treat the credential as compromised
Anything pasted into a Claude conversation, a chat thread, a screenshot, a public/semi-public doc, or a git repo, even briefly, should be treated as leaked. Editing the doc to remove the value does NOT unleak it. The only fix is rotation.
1. Triage (within 5 min)

Decide blast radius:

| Risk | Examples | Rotate within |
| --- | --- | --- |
| Critical: public-facing, full account access | Anthropic API key, Cloudflare API token, root SSH keys, Gmail OAuth | Immediately |
| High: service that can reach money / customer data / others’ systems | Stripe key, Gotify token (if used for alerts that gate decisions), database master pwd | Today |
| Medium: service that’s mostly homelab-internal | Sonarr/Radarr/Lidarr API keys, internal app passwords, indexer credentials | This week (logged in TASKS.md) |
| Low: read-only or already-public-equivalent | Public RSS feeds, public bookmarks | Note but don’t rush |
2. Record it

Add an entry to the “Active — Security/Credentials” section of ~/Sync/ED/TASKS.md with the leaked value (or an identifying prefix of it, as in the example below) so future-you knows what to rotate. Include the consumer map (where the credential is used).

Example (this happened on 2026-05-25 with the *arr keys):

- [ ] **Rotate Sonarr/Radarr/Lidarr API keys**: they were hardcoded in
  ~/Sync/ED/skills/arr-media-management/SKILL.md (Syncthing-replicated).
  Old values: d792444549..., b117993eb50..., 3dc17d20ca664...
  Consumers: Prowlarr (settings → apps), Recyclarr (yaml), homelab-config (none; extracts at runtime via arr-briefing-data.py)
3. Rotate

Service-specific procedures live in dedicated runbooks where they’re complicated (for the *arr keys, see Rotate Sonarr / Radarr / Lidarr API keys).

For simple cases: log in to the service, generate a new credential, save the new value to ~/Sync/ED/SECRETS.md, update consumers, test.

4. Restart anything holding the old credential

For containerized services: docker restart <name> after env var update. For Mac launchd: kickstart the job. For Claude Desktop: full quit + relaunch.

If the leak was into a git repo:

# Find every commit containing the value
git -C ~/Sync/ED log -p --all -S 'leaked-value-here' | head -30

# Remove from history (heavy; use only when necessary, force-pushes break clones)
# Prefer rotating + accepting that the historical value is exposed but inert.

For the homelab-config repo (private but synced), rotating the underlying credential is usually enough: historical exposure of a now-invalid key is not a real risk.

5. Clean up

Once consumers are updated and the new credential works:

  • Remove the rotation entry from TASKS.md
  • Update SECRETS.md with the new value + last-rotated date
  • If the leak was a class of mistake (hardcoded in a SKILL, committed in a config), add a defense:
    • Pre-commit hook to scan for sk-, <ApiKey>, etc.
    • Bundle behavioral rule against the pattern
    • Linter for the file type
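
As a rough sketch of the pre-commit defense in the list above, the scanner below flags lines that look like an sk- style key or a filled-in <ApiKey> element. The two regular expressions and the names SECRET_PATTERNS, find_secrets, and scan_files are illustrative assumptions, not an existing hook in this setup, and the patterns are far from a complete ruleset.

```python
import re
import sys

# Assumption: these two patterns only cover the examples named in the runbook.
SECRET_PATTERNS = {
    "sk- style API key": re.compile(r"\bsk-[A-Za-z0-9_-]{16,}"),
    "*arr <ApiKey> element": re.compile(r"<ApiKey>[0-9a-fA-F]{32}</ApiKey>"),
}


def find_secrets(text):
    """Return (line_number, pattern_name) for every suspicious line."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits


def scan_files(paths):
    """Print findings for each file and return the total number of hits."""
    total = 0
    for path in paths:
        try:
            with open(path, encoding="utf-8", errors="replace") as fh:
                hits = find_secrets(fh.read())
        except OSError as exc:
            print(f"{path}: unreadable ({exc})", file=sys.stderr)
            continue
        for lineno, name in hits:
            print(f"{path}:{lineno}: possible {name}")
        total += len(hits)
    return total
```

Wired into a hook, a non-zero result from scan_files on the staged file list would be the signal to block the commit.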
Cron job didn't fire
Diagnose by surface

The homelab has five scheduling surfaces. The “didn’t fire” question depends on which one.

| Surface | Where it lives | How to check |
| --- | --- | --- |
| Mac launchd | ~/Library/LaunchAgents/*.plist | launchctl list output filtered with grep -i <name> shows the last exit code; tail -50 ~/Library/Logs/<job>.log |
| Mac user cron | crontab -l on Mac Studio | tail -50 /tmp/cron-*.log if the job writes there; otherwise redirect the job's output to a log file and re-run (MAILTO="" only silences cron mail, it does not capture output) |
| CT100 cron | pct exec 100 -- crontab -l | Errors route to Gotify via cron-gotify-wrapper.sh (priority 5 → Telegram). Check the Telegram bot. |
| hpve cron | crontab -l on pve as root | Same wrapper as CT100: errors → Gotify |
| Cowork scheduled tasks | Cowork Settings → Scheduled Tasks | lastRunAt timestamp on each task; audit log in ~/Library/Application Support/Claude/local-agent-mode-sessions/... |
launchd didn't fire
# Show PID (column 1; - means not running) and last exit code (column 2)
launchctl list | grep -i com.bee

# Manually kickstart (run now)
launchctl kickstart -k gui/$(id -u)/com.bee.<job-name>

# If kickstart errors with "Could not find specified service", the plist isn't loaded:
launchctl bootstrap gui/$(id -u) ~/Library/LaunchAgents/com.bee.<job-name>.plist

# If the plist has a typo, log will say so
log show --predicate 'subsystem == "com.apple.xpc.launchd"' --info --last 1h | grep com.bee.<job-name>

Common launchd gotcha: the plist points at a script that doesn’t exist (e.g., com.bee.rebuild-mcp-venvs.plist references ~/scripts/rebuild-mcp-venvs.sh; make sure that file exists and is executable). The launchd job loads fine but every run silently fails.
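
A minimal sketch of how that gotcha could be caught before it bites: parse the plist with the standard-library plistlib and check that every path-like entry in Program / ProgramArguments exists. The function names are hypothetical, and treating only arguments that start with / or ~ as paths is a simplifying assumption.

```python
import os
import plistlib


def missing_paths(plist):
    """Return problems with the paths a parsed launchd plist asks launchd to run."""
    args = list(plist.get("ProgramArguments") or [])
    if plist.get("Program"):
        args.insert(0, plist["Program"])
    problems = []
    for index, arg in enumerate(args):
        if not isinstance(arg, str) or not arg.startswith(("/", "~")):
            continue  # flags and bare words are not file paths
        path = os.path.expanduser(arg)
        if not os.path.exists(path):
            problems.append(f"{arg}: does not exist")
        elif index == 0 and not os.access(path, os.X_OK):
            # Only the program itself must be executable; later args may be data.
            problems.append(f"{arg}: not executable")
    return problems


def check_plist_file(plist_path):
    """Load a .plist from disk and report missing or non-executable paths."""
    with open(os.path.expanduser(plist_path), "rb") as fh:
        return missing_paths(plistlib.load(fh))
```

For example, check_plist_file("~/Library/LaunchAgents/com.bee.rebuild-mcp-venvs.plist") would return an empty list when everything the plist references is in place.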

cron didn't fire
# On CT100 or hpve: list crontab
crontab -l

# The cron-gotify-wrapper.sh captures stderr to Gotify. If you don't see an
# error, the job ran cleanly OR the wrapper isn't installed.
# Verify wrapping:
crontab -l | head -10  # should see /usr/local/bin/cron-gotify-wrapper.sh <cmd>

# Validate the cron expression
# mcp__cron-validator__cron_validate("0 4 * * *")
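
When the cron-validator MCP isn't loaded, a local sanity check of the five fields is easy to sketch. This is not how the MCP validates; it is a simplified stand-in that assumes plain numeric fields with *, lists, ranges, and steps, and it rejects names such as MON or shortcuts such as @daily.

```python
FIELDS = [  # (name, min, max) for the five standard crontab fields
    ("minute", 0, 59),
    ("hour", 0, 23),
    ("day of month", 1, 31),
    ("month", 1, 12),
    ("day of week", 0, 7),
]


def _part_ok(part, lo, hi):
    """Validate one comma-separated piece such as '*', '*/5', '4', or '1-5/2'."""
    base, slash, step = part.partition("/")
    if slash and not (step.isdecimal() and int(step) > 0):
        return False
    if base == "*":
        return True
    start, dash, end = base.partition("-")
    if not start.isdecimal() or (dash and not end.isdecimal()):
        return False
    bounds = [int(start), int(end)] if dash else [int(start)]
    return all(lo <= b <= hi for b in bounds) and bounds == sorted(bounds)


def validate_cron(expr):
    """Return a list of error strings; an empty list means the fields look valid."""
    parts = expr.split()
    if len(parts) != len(FIELDS):
        return [f"expected 5 fields, got {len(parts)}"]
    errors = []
    for value, (name, lo, hi) in zip(parts, FIELDS):
        if not all(_part_ok(p, lo, hi) for p in value.split(",")):
            errors.append(f"bad {name} field: {value!r}")
    return errors
```

validate_cron("0 4 * * *") returns an empty list, while an out-of-range minute or a missing field yields a message naming the problem.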
Cowork scheduled task didn't fire
# Find today's session for the task
python3 << 'PY'
import datetime, glob, json, os

base = os.path.expanduser('~/Library/Application Support/Claude/local-agent-mode-sessions')
today = datetime.date.today()
task = 'daily-briefing'  # change as needed

def ts(ms):
    # createdAt / lastActivityAt are epoch milliseconds
    return datetime.datetime.fromtimestamp(ms / 1000)

for f in glob.glob(f'{base}/*/*/local_*.json'):
    try:
        with open(f) as fh:
            d = json.load(fh)
    except (OSError, ValueError):
        continue  # unreadable or non-JSON file: skip it
    if d.get('scheduledTaskId') != task or 'createdAt' not in d:
        continue
    if ts(d['createdAt']).date() != today:
        continue
    # Print start -> last activity, then the session file path
    print(ts(d['createdAt']), '->', ts(d.get('lastActivityAt', d['createdAt'])), f)
PY

If no session for today appeared, the Cowork scheduler didn’t fire; check Cowork Settings → Scheduled Tasks. If a session DID appear but the expected output is missing, read its audit.jsonl to see where it got stuck.

Fix

Most often the fix is one of:

  • Re-bootstrap the launchd plist after editing
  • Re-run manually to confirm the job works, then wait for next scheduled run
  • Update the Cowork task definition (Settings → Scheduled Tasks → Edit) if the cron expression is wrong

For broken behavior despite the job firing, see the specific runbooks (doc-sync auth fail, daily briefing not arriving, dictation not processing).

Daily briefing didn't arrive
Symptom

It’s after 4:30 AM and you don’t see:

  • Email from beedifferent5455@gmail.com with subject “Daily Briefing — …”
  • A current entry in ~/Sync/ED/morning-briefing.md (mtime should be today)
  • A Drafts note tagged “briefing”
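
The "mtime should be today" check from the list above can be expressed as a small helper. This is only a sketch; modified_today is a hypothetical name, and a missing file is treated as stale by assumption.

```python
import datetime
import os


def modified_today(path, today=None):
    """True if the file exists and its mtime falls on today's local date."""
    today = today or datetime.date.today()
    try:
        mtime = os.path.getmtime(os.path.expanduser(path))
    except OSError:
        return False  # a missing or unreadable file counts as stale
    return datetime.date.fromtimestamp(mtime) == today
```

Calling modified_today("~/Sync/ED/morning-briefing.md") after 4:30 AM gives a quick yes/no on whether the briefing file was written today.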
Diagnose
# Did the scheduled task actually fire?
# (Check Cowork scheduled-tasks list; lastRunAt should be today ~04:01)

# Did the briefing write a file?
ls -la ~/Sync/ED/morning-briefing.md
# stale mtime = task failed mid-run

# Pull the audit log for today's 04:01 session
python3 << 'PY'
import datetime, glob, json, os

base = os.path.expanduser('~/Library/Application Support/Claude/local-agent-mode-sessions')
today = datetime.date.today()
for f in glob.glob(f'{base}/*/*/local_*.json'):
    try:
        with open(f) as fh:
            d = json.load(fh)
    except (OSError, ValueError):
        continue  # unreadable or non-JSON file: skip it
    if d.get('scheduledTaskId') != 'daily-briefing' or 'createdAt' not in d:
        continue
    # createdAt is epoch milliseconds
    if datetime.datetime.fromtimestamp(d['createdAt'] / 1000).date() != today:
        continue
    sess_dir = f[:-len('.json')]  # session directory sits beside the JSON file
    audit = os.path.join(sess_dir, 'audit.jsonl')
    size = f'{os.path.getsize(audit)} bytes' if os.path.exists(audit) else 'missing'
    print('Session:', sess_dir)
    print('  audit.jsonl:', size)
PY

Read the last 20 events of that session’s audit.jsonl to see where it got stuck.
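
One way to pull those last 20 events, sketched under the assumption that audit.jsonl holds one JSON object per line. No event schema is assumed, so the helper returns the parsed objects as-is; last_events is a hypothetical name.

```python
import json
from collections import deque


def last_events(lines, count=20):
    """Return the last `count` events from an iterable of JSONL lines."""
    tail = deque(maxlen=count)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            tail.append(json.loads(line))
        except json.JSONDecodeError:
            # Keep unparseable lines visible instead of silently dropping them.
            tail.append({"unparsed": line[:200]})
    return list(tail)
```

An open file is an iterable of lines, so usage is: with open(sess_dir + '/audit.jsonl') as fh: events = last_events(fh), where sess_dir is the session directory printed by the script above.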

Common failure modes:

| Failure | Tell |
| --- | --- |
| Shell-quoting hell on *arr extraction | Many retries with osascript → ssh → pct → docker → curl strings, never reaches the Write step. Should NOT happen now: Section 1 uses ~/scripts/arr-briefing-data.py. If you see this, the SKILL.md was reverted. |
| MCP unavailable | homelab-snapshot MCP errored / not loaded. Fallback is to Read ~/Sync/ED/.homelab-snapshot.json directly. |
| Rate limit / API error | rate_limit_event records in audit, or HTTP errors. Wait or check anthropic.com status. |
| Session timed out / token exhausted | Session ran 7+ minutes and stopped mid-thought before reaching the Write step. |
Fix

If today’s run failed:

  1. Force a manual rerun β€” Cowork sidebar β†’ Scheduled Tasks β†’ daily-briefing β†’ Run now.
  2. While it runs, watch for the same failure pattern. If arr-briefing-data.py is the culprit, test it standalone: python3 ~/scripts/arr-briefing-data.py --hours 24.
  3. If ~/Sync/ED/.homelab-snapshot.json is missing/stale, the launchd job (com.bee.homelab-snapshot) didn’t run β€” kick it: launchctl kickstart -k gui/$(id -u)/com.bee.homelab-snapshot.
Verify

After the rerun:

ls -la ~/Sync/ED/morning-briefing.md     # mtime = today
ls -la ~/Sync/ED/todays-briefing.md       # mtime = today
head -20 ~/Sync/ED/morning-briefing.md
# inbox: subject "Daily Briefing — <today>"

The SKILL is hardened to write explicit “No dictation in the last 24 hours” and “No diary entry for this date” lines when those sources are empty β€” so even a no-content day produces a useful email, not silence.

Dictation not processing
Symptom

You recorded a JPR voice memo on Apple Watch / iPhone but:

  • It hasn’t appeared in ~/Sync/ED/dictation/processed/
  • ~/Sync/ED/daily-diary.md is stale
  • No tasks from dash commands made it to TASKS.md

The hourly process-dictation Cowork task runs at :04 past every hour, so recordings should appear within ~70 minutes of capture.

Diagnose

1. Did iCloud sync the file from your watch/phone to Mac?

ls -la ~/Library/Mobile\ Documents/iCloud~com~openplanetsoftware~just-press-record/Documents/$(date +%Y-%m-%d)/

If the day folder is empty or the file is .something.icloud (placeholder), iCloud hasn’t downloaded it. Force the download:

brctl download ~/Library/Mobile\ Documents/iCloud~com~openplanetsoftware~just-press-record/Documents/$(date +%Y-%m-%d)/

The pipeline runs brctl download automatically before scanning, but only on day directories it can already see; a never-synced day won’t be scanned.

2. Did the pipeline script run?

tail -20 /tmp/dictation-run.log

If you see “No new dictation files found” right after your recording was made, JPR-on-watch didn’t sync in time. Wait 5-15 minutes and the next hourly run should catch it.

3. Was the recording skipped?

The script skips files < 50 KB (accidental taps). Confirm size:

ls -la ~/Library/Mobile\ Documents/iCloud~com~openplanetsoftware~just-press-record/Documents/$(date +%Y-%m-%d)/*.m4a
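
To see at a glance which recordings fall under the skip threshold, a sketch like the following splits a day folder into kept and skipped files. It assumes "50 KB" means 50 * 1024 bytes (the pipeline script may use a different cutoff), and classify_recordings is a hypothetical helper, not part of process-dictation.sh.

```python
from pathlib import Path

MIN_BYTES = 50 * 1024  # assumption: "50 KB" means 50 * 1024 bytes


def classify_recordings(day_dir, min_bytes=MIN_BYTES):
    """Split a day folder's .m4a files into (kept, skipped) lists of (name, size)."""
    kept, skipped = [], []
    for path in sorted(Path(day_dir).expanduser().glob("*.m4a")):
        size = path.stat().st_size
        (skipped if size < min_bytes else kept).append((path.name, size))
    return kept, skipped
```

Anything in the skipped list was treated as an accidental tap, which explains a recording that synced but never produced a transcript.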

4. Was there an anomaly?

Recordings > 2h with < 5 min of detected speech trigger a Gotify alert. Check ~/Sync/ED/dictation/processed/ for an ## ⚠ ANOMALY entry, which means transcription ran but the audio was flagged as a left-running mic.
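
The anomaly rule is a two-threshold test, sketched here with the numbers stated above. How the pipeline measures "detected speech" is not described in this runbook, so both inputs are simply taken as seconds; the function name is hypothetical.

```python
def is_left_running_mic(duration_s, speech_s,
                        min_duration_s=2 * 3600, max_speech_s=5 * 60):
    """Apply the stated rule: longer than 2 h with under 5 min of detected speech."""
    return duration_s > min_duration_s and speech_s < max_speech_s
```

A three-hour recording with two minutes of speech is flagged; the same recording with ten minutes of speech, or a one-hour silent recording, is not.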

Fix

Force a manual run:

/bin/zsh ~/Sync/ED/dictation/process-dictation.sh

This handles iCloud download, transcription (Whisper large-v3 via mlx-whisper), chunking long recordings, and parsing dash commands. Watch the output for errors.

If transcription itself failed, check the venv:

ls -la ~/venvs/whisper-asr/bin/python3
~/venvs/whisper-asr/bin/python3 -c "import mlx_whisper; print(mlx_whisper.__version__)"

If the venv is broken, note that ~/scripts/rebuild-mcp-venvs.sh doesn’t cover whisper-asr, so rebuild manually:

python3.12 -m venv ~/venvs/whisper-asr
source ~/venvs/whisper-asr/bin/activate
pip install mlx-whisper
Verify

After the manual run:

# A processed report should exist with today's date prefix
ls -lat ~/Sync/ED/dictation/processed/ | head -3

# daily-diary.md should reflect today's content if there was any narrative
ls -la ~/Sync/ED/daily-diary.md
head -20 ~/Sync/ED/daily-diary.md

Date rule: all artifacts use the recording’s start time from the JPR folder name (Documents/YYYY-MM-DD/HH-MM-SS.m4a), never date +%Y-%m-%d or file mtime. A recording started at 11:50 PM May 21 finishing at 12:30 AM May 22 belongs to May 21 even when picked up by the 01:04 AM May 22 cron.
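
The date rule can be made concrete with a small parser that reads the start time from the path alone. This is a sketch, not the pipeline's actual code; recording_start is a hypothetical name and the year in the usage example is made up.

```python
import datetime
import re
from pathlib import PurePosixPath

_DAY = re.compile(r"^\d{4}-\d{2}-\d{2}$")
_TIME = re.compile(r"^\d{2}-\d{2}-\d{2}$")


def recording_start(path):
    """Parse the start time from .../YYYY-MM-DD/HH-MM-SS.m4a, ignoring file mtime."""
    p = PurePosixPath(path)
    day, stem = p.parent.name, p.stem
    if not (_DAY.match(day) and _TIME.match(stem)):
        raise ValueError(f"not a JPR-style path: {path}")
    return datetime.datetime.strptime(f"{day} {stem}", "%Y-%m-%d %H-%M-%S")
```

For a recording stored as Documents/2026-05-21/23-50-00.m4a, recording_start(...).date() is May 21 no matter when the file is processed or how long the recording ran.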

Farm network is down
Symptom
  • Home Assistant unreachable (http://192.168.0.10:8123 from Mac times out)
  • ha-mcp shows Server transport closed unexpectedly in ~/Library/Logs/Claude/mcp-server-ha-mcp.log
  • fpve.netbird.cloud not pingable from home
  • MCP health monitor sends Gotify alerts with ha-mcp failure counts
Diagnose from home
# Can the home Proxmox reach the farm Proxmox?
ssh pve 'ping -c 3 -W 2 fpve.netbird.cloud'
# 100% packet loss = NetBird mesh is broken between hpve and fpve

# Is the NetBird mesh API up?
curl -s -o /dev/null -w 'HTTP %{http_code}\n' https://app.netbird.io

# Check NetBird peer status (requires netbird MCP loaded)
# mcp__netbird__list_peers: fpve should appear with status: online

If fpve.netbird.cloud is the only unreachable peer:

  • NetBird daemon on fpve is down or disconnected, OR
  • Farm’s Starlink/Omada lost internet, OR
  • fpve itself is powered off / kernel-panicked

You can’t diagnose from home if the host is unreachable. You need farm-LAN access.

Fix (farm-side)

When physically on the farm or when LAN reachability returns:

# 1. Verify fpve is up
ssh root@192.168.0.191 'uptime; netbird status'

# 2. If NetBird daemon is dead, restart it
ssh root@192.168.0.191 'systemctl restart netbird; netbird status'

# 3. If fpve had a reboot, the IPv6 sysctls aren't persistent, so re-apply them
ssh root@192.168.0.191 'sysctl -w net.ipv6.conf.vmbr0.accept_ra=2 net.ipv6.conf.vmbr0.autoconf=1 net.ipv6.conf.vmbr0.accept_ra_defrtr=1 net.ipv6.conf.vmbr0.accept_ra_pinfo=1'

# 4. Check HA is up
ssh root@192.168.0.191 'pct exec 100 -- ping -c 2 192.168.0.10'
# HA itself is a separate VM/CT; should respond independently of fpve uptime
Verify from home
# Mesh restored
ssh pve 'ping -c 3 -W 2 fpve.netbird.cloud'

# HA reachable
curl -s -o /dev/null -w 'HTTP %{http_code}\n' --max-time 5 http://192.168.0.10:8123

# Restart Claude Desktop so ha-mcp reconnects
osascript -e 'tell application "Claude" to quit'
sleep 2
open -a Claude

In a new Cowork session, smoke-test ha-mcp: ask it to list HA areas. Should return 13 areas (Barn, Garage, Kitchen, etc.).

Fix doc-sync auth failure
Symptom

Morning email arrives but ~/Sync/ED/.doc-sync-log/YYYY-MM-DD.md is ~300 bytes containing one of:

  • Failed to authenticate. API Error: 401 The socket connection was closed unexpectedly...
  • Failed to authenticate. API Error: 401 <html><head><title>502 Bad Gateway</title>...cloudflare</html>
  • The Gotify alert: ⚠️ Doc-Sync YYYY-MM-DD — AUTH FAILED

The CLI mis-reports the real cause as “401” regardless of the underlying issue. The most likely cause is a corrupted API key file, not an actual auth-server problem.

Diagnose
# Inspect the key file: look for any garbage prefix
head -c 25 ~/.config/anthropic-api-key
echo

# Check file size: a clean key is exactly 108 bytes (no trailing newline)
wc -c ~/.config/anthropic-api-key

If the file starts with anything other than sk-ant-api03-, it’s corrupted. The historical bug was a literal -n prefix from a manual echo -n "$KEY" > file in a shell where -n was printed instead of treated as a flag.
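
The same checks can be bundled into one function over the raw file bytes. This is a sketch of the diagnosis described here, using the prefix and the 108-byte size stated in this runbook; diagnose_key_bytes is a hypothetical helper and does not contact the API.

```python
EXPECTED_PREFIX = b"sk-ant-api03-"
EXPECTED_SIZE = 108  # per this runbook: a clean key file is exactly 108 bytes


def diagnose_key_bytes(data):
    """Return a list of problems found in the raw bytes of the key file."""
    problems = []
    if data.startswith(b"-n"):
        problems.append("literal -n prefix (written by echo -n in the wrong shell)")
    elif not data.startswith(EXPECTED_PREFIX):
        problems.append(f"unexpected prefix: {data[:13]!r}")
    if data != data.rstrip():
        problems.append("trailing whitespace or newline")
    if len(data) != EXPECTED_SIZE:
        problems.append(f"size is {len(data)} bytes, expected {EXPECTED_SIZE}")
    return problems
```

Feed it the output of open(os.path.expanduser("~/.config/anthropic-api-key"), "rb").read(); an empty list means the file has the expected shape, which still says nothing about whether the key is valid.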

Test the key directly against the API:

KEY=$(cat ~/.config/anthropic-api-key)
curl -s -o /dev/null -w 'HTTP %{http_code}\n' \
    -H "x-api-key: $KEY" \
    -H 'anthropic-version: 2023-06-01' \
    https://api.anthropic.com/v1/models

HTTP 200 = key works. HTTP 401 = key is bad.

Fix

Rewrite the file cleanly with printf (which doesn’t have the -n ambiguity):

# Back up the broken version first
cp ~/.config/anthropic-api-key ~/.config/anthropic-api-key.bak.$(date +%Y%m%d-%H%M%S)

# Get the key from your password manager / wherever it lives, then:
printf '%s' 'sk-ant-api03-...' > ~/.config/anthropic-api-key
chmod 600 ~/.config/anthropic-api-key

# Verify
ls -la ~/.config/anthropic-api-key   # should show 108 bytes, mode 600
head -c 25 ~/.config/anthropic-api-key  # should start with sk-ant-api03-

Never use echo -n to write the key: different shells handle -n differently and some write it as a literal prefix.

Verify

Run doc-sync manually with yesterday’s date:

~/Sync/ED/skills/doc-sync/scripts/run.sh
tail -30 ~/Sync/ED/.doc-sync-log/.last-run.log

You should see Auth precheck OK (HTTP 200) in the log and the report should be 5–20 KB (not 300 bytes).

Why this is already hardened

As of 2026-05-25, run.sh has two defenses:

  1. Strip on load: sed -E 's/^-n[[:space:]]+//; s/[[:space:]]+$//' removes a stray -n prefix and any trailing whitespace before using the key.
  2. Step 0 fail-fast precheck: a single /v1/models request with --max-time 15. If it returns non-200, it writes the actual cause to the day’s report, sends a Gotify alert at priority 8, and exits 1 instead of burning 5+ minutes on a 1 MB prompt.
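
For readers less fluent in sed, the strip-on-load expression behaves roughly like this Python equivalent for a single-line key file. It illustrates the two substitutions only; run.sh itself uses sed, and normalize_key is a hypothetical name.

```python
import re


def normalize_key(raw):
    """Drop a leading '-n' followed by whitespace, then any trailing whitespace."""
    return re.sub(r"^-n\s+", "", raw).rstrip()
```

Note that the prefix is only removed when whitespace follows it, matching the + in the sed pattern, so a key that merely begins with the characters -n is left alone.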

So the failure mode going forward should be a clear error inside 15 seconds, not a silent stub at 3 AM.

Recover from a broken Bee Hub deploy
Symptom
  • ~/Library/Logs/bee-hub-deploy.log shows Hugo build FAILED (rc=1). Aborting deploy.
  • Live site at hub.edmd.me is stale (still serving the last good build)
  • A Gotify alert would only have fired if Hugo’s stderr were wrapped, which it currently isn’t; only validator failures alert

The good news: the deploy script has strict mode (--panicOnWarning) and the exit-code check aborts on any Hugo error before rsyncing. So a broken local build doesn’t propagate to live. The live site stays on the last good tree until you fix it.

Find the error
cd ~/Sync/ED/homelab/bee_hub
/opt/homebrew/bin/hugo --panicOnWarning --printPathWarnings 2>&1 | tail -20

Hugo’s strict-mode errors are usually one of:

| Error | Likely cause |
| --- | --- |
| shortcode "section" must be closed or self-closed | Missing close tag for a section shortcode (forgot the /section line) |
| failed to extract shortcode "<name>": shortcode "<name>" not found | Typo in shortcode name |
| failed to render shortcode "<name>" | Bad params or unclosed nested block |
| parse failed | YAML frontmatter syntax error, usually a missing quote or bad indent |
| duplicate path warning | Two pages compile to the same URL; check slug: overrides |
| template render error | A custom layout/partial references something that doesn’t exist |

The error message includes a file path and line number. Jump straight there.

Fix

Edit the file, fix the issue, run Hugo again locally before redeploying:

cd ~/Sync/ED/homelab/bee_hub
/opt/homebrew/bin/hugo --panicOnWarning --printPathWarnings 2>&1 | tail -5
# Want to see "Total in NNNms" and no ERROR lines

Once it builds clean, re-run the full deploy:

zsh ~/Sync/ED/homelab/bee_hub/deploy-vps.sh
tail -5 ~/Library/Logs/bee-hub-deploy.log

Should end with Deploy complete.

Emergency rollback

If you pushed a broken change before the strict-mode protection caught it (shouldn’t happen now), the live site lives on:

  • VPS (public): root@100.123.69.155:/var/www/bee-hub/
  • CT103 (internal): root@192.168.8.54:/var/www/bee-hub/

To roll back to a previous Hugo build:

# Move current tree aside, rebuild from a known-good git commit, redeploy
cd ~/Sync/ED/homelab/bee_hub
git stash             # set aside in-progress edits
git log --oneline -20 # find a good commit
git checkout <hash>
zsh deploy-vps.sh     # rebuilds + rsyncs the old tree
git checkout main     # come back to current
git stash pop         # restore edits

The deploy targets are rsync targets, not git checkouts on the remote, so “rollback” means re-deploying an older local build over the top.

Verify live state
curl -s -o /dev/null -w 'public: %{http_code} %{size_download} bytes\n' https://hub.edmd.me/
curl -s -o /dev/null -w 'internal: %{http_code} %{size_download} bytes\n' http://192.168.8.54/

# spot-check a known page
curl -s https://hub.edmd.me/runbooks/ | grep -c 'Runbooks'

Both should return 200 and a non-trivial size. The index regen log should also confirm: Wrote /Users/bee/Sync/ED/BEE_HUB_INDEX.md (322 pages, ...).

Reload Claude Desktop / MCPs
When to do this
  • Edited ~/Sync/ED/config/claude_desktop_config.json (added/removed/modified an MCP)
  • Edited an MCP server’s source code in ~/.mcp-servers/<name>/
  • Edited a SKILL.md and need it to appear in available_skills (also requires ~/scripts/sync-cowork-snapshot.sh first; see below)
  • A specific MCP shows Server transport closed unexpectedly in ~/Library/Logs/Claude/mcp-server-<name>.log
Procedure

Step 1: quit fully, not just close the window:

Cmd-Q from the Claude Desktop menu bar

If you see “Claude Desktop is still running” indicators (tray icon, dock bounce), kill from terminal:

osascript -e 'tell application "Claude" to quit'
# or, harder:
pkill -f 'Claude.app/Contents/MacOS/Claude'

Step 2: relaunch. Open /Applications/Claude.app or run open -a Claude.

Step 3: verify MCPs loaded. Open a Cowork session. The first system reminder of any session lists deferred MCP tools. Look for the MCP you expected.

If you only changed a SKILL, the change won’t appear in the new session unless you also:

~/scripts/sync-cowork-snapshot.sh

The Cowork session snapshot is keyed on stable UUIDs and is REUSED across sessions: Cmd-Q doesn’t rebuild it, and plugin reinstall doesn’t rebuild it. Only the explicit rsync does.

Verify

For a specific MCP, do a one-tool call in a new Cowork session. e.g. for tana-local: mcp__tana-local__list_workspaces. For memory: mcp__memory__memory_stats.

For SKILL availability: open a session and type /skill; the new SKILL should appear in the dropdown.

Gotchas
  • Tana startup race: if you launch Claude Desktop before Tana is fully open, tana-local will fail with ECONNREFUSED 127.0.0.1:8262 and bail. Launch Tana first, then Claude. The MCP doesn’t auto-reconnect, so restart Claude Desktop after the race.
  • ha-mcp connection failures: check that NetBird has the 192.168.0.0/24 route enabled and that fpve.netbird.cloud is reachable. The MCP itself wires correctly; the typical failure is the upstream HA being unreachable.
  • MCP venvs on MacBook may be stale after a Syncthing pull. The com.bee.rebuild-mcp-venvs launchd watcher handles it on file change, but you can force it: launchctl kickstart -k gui/$(id -u)/com.bee.rebuild-mcp-venvs.
Rotate Sonarr / Radarr / Lidarr API keys
When to run this
  • A key leaked into chat, into a doc, or into the homelab-config git repo.
  • A consumer (briefing helper, Prowlarr, etc.) started rejecting auth.
  • Routine rotation.
Fix

The *arr API keys live inside each container’s /config/config.xml (<ApiKey>...</ApiKey>). Rotating = generate a new key, write it back, restart the container, update any consumer that hardcoded the old value.

Generate new keys and rotate one at a time (so you can verify between):

# Generate a 32-char hex key
NEW_KEY=$(openssl rand -hex 16)
echo "$NEW_KEY"

# SSH to pve, edit Sonarr config.xml inside CT100
ssh pve "pct exec 100 -- bash -c 'sed -i.bak \"s|<ApiKey>[^<]*</ApiKey>|<ApiKey>$NEW_KEY</ApiKey>|\" /var/lib/docker/volumes/sonarr_config/_data/config.xml || docker exec sonarr sed -i.bak \"s|<ApiKey>[^<]*</ApiKey>|<ApiKey>$NEW_KEY</ApiKey>|\" /config/config.xml'"

# Restart Sonarr
ssh pve "pct exec 100 -- docker restart sonarr"

# Wait for it to come back, then verify the new key works
sleep 10
ssh pve "pct exec 100 -- curl -s 'http://localhost:8989/api/v3/system/status?apikey=$NEW_KEY'" | head -c 200

Repeat for Radarr (port 7878) and Lidarr (port 8686).
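
To make the sed substitution above easier to reason about, here is the same replacement sketched in Python on the config.xml text. It is an illustration only: new_api_key and replace_api_key are hypothetical names, and unlike the sed command this version refuses to proceed unless exactly one <ApiKey> element is present.

```python
import re
import secrets

API_KEY_RE = re.compile(r"<ApiKey>[^<]*</ApiKey>")


def new_api_key():
    """32 hex characters, the same shape as `openssl rand -hex 16`."""
    return secrets.token_hex(16)


def replace_api_key(config_xml, new_key):
    """Return config.xml text with the single ApiKey element replaced."""
    if not re.fullmatch(r"[0-9a-f]{32}", new_key):
        raise ValueError("expected a 32-character lowercase hex key")
    updated, count = API_KEY_RE.subn(f"<ApiKey>{new_key}</ApiKey>", config_xml)
    if count != 1:
        raise ValueError(f"expected exactly one <ApiKey> element, found {count}")
    return updated
```

Working on a copy of the file first and diffing the result is a cheap way to confirm that only the key changed before restarting the container.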

Update consumers

The daily-briefing helper (~/scripts/arr-briefing-data.py) extracts keys at runtime from config.xml via SSH, so it doesn’t need updating after rotation. That’s the whole point of not hardcoding.

What does need updating:

| Consumer | Where to update |
| --- | --- |
| Prowlarr | http://192.168.8.100:9696 → Settings → Apps → Sonarr/Radarr/Lidarr → API Key field |
| Recyclarr | /opt/recyclarr/recyclarr.yml: set new API keys, restart container |
| Any hardcoded use | grep -rln -e 'd792444549' -e 'b117993eb50' -e '3dc17d20ca664' ~/Sync/ED ~/scripts (finds leftovers from the May 2026 leak) |
| Bee Hub docs | None should hardcode, but search anyway: grep -rln '<ApiKey>' ~/Sync/ED/homelab/bee_hub/content |
Verify

Run the briefing helper and confirm it pulls real data:

python3 ~/scripts/arr-briefing-data.py --hours 168 | head -30

Should return JSON with non-empty sonarr.imports, etc. If you get {"error": "api key: ..."}, the key didn’t actually rotate or the container hasn’t restarted.

Record
Update ~/Sync/ED/SECRETS.md with the new keys (or note that they’re extracted at runtime, depending on the entry). If this was a leak-driven rotation, remove the rotation entry from TASKS.md once consumers are updated.
Syncthing not converging
Symptom
  • File you saved on Mac Studio not appearing on MacBook (or vice versa)
  • Syncthing UI shows “Out of Sync” for a folder with a large file count behind it
  • New SKILL edits aren’t showing up on MacBook even after waiting
Diagnose

Open the UIs side-by-side:

  • Mac Studio: http://127.0.0.1:8384
  • Proxmox (hub): http://192.168.8.221:8384
  • MacBook: http://127.0.0.1:8384 on the MacBook itself, or via ssh -L 8385:127.0.0.1:8384 macbook from Studio

Check:

| Indicator | Meaning |
| --- | --- |
| Folder shows “Out of Sync” | Real sync work pending; let it run (or kick it) |
| Folder shows “Up to Date” on both ends but file is missing | .stignore is filtering the file; check it |
| One device offline | Bring it online or wait |
| Folder shows huge queue size on one end | Probably blocked behind an excluded path that previously synced |
.stignore check

~/Sync/ED/.stignore excludes large subtrees from syncing (life_archive/data/, homelab/paperless-ngx/, etc.). If a file you want isn’t replicating, it might be under an excluded path.

cat ~/Sync/ED/.stignore

# Confirm whether a specific path is excluded
syncthing cli --home="$HOME/Library/Application Support/Syncthing" check ignore ~/Sync/ED/relative/path/to/file
# OR, simpler: just see if the file's parent is in stignore

To add an exclusion (e.g., a new bloat source):

# Edit ~/Sync/ED/.stignore: add the relative path (no leading slash)
echo 'new/bloat/path' >> ~/Sync/ED/.stignore
# Syncthing watches .stignore and reloads automatically

To remove an exclusion: edit the file, save. Next scan will pick up the previously-excluded content.

Force a rescan

When Syncthing thinks everything’s in sync but a file is clearly missing, force a rescan on the source device:

# Find the folder ID in the Syncthing UI URL or via the API
SYNC_API="$(grep -oE '<apikey>[^<]+' ~/Library/Application\ Support/Syncthing/config.xml | head -1 | sed 's/<apikey>//')"
curl -s -X POST -H "X-API-Key: $SYNC_API" 'http://localhost:8384/rest/db/scan?folder=<folder-id>'

For the Studio’s claude-ed folder, the typical sequence after a heavy-edit session is:

# Trigger rescan, then wait a minute and check that Proxmox picked it up
launchctl kickstart -k gui/$(id -u)/com.beedifferent.syncthing  # restart Syncthing if its index is stuck
Conflict files

If both devices edited the same file while disconnected, Syncthing keeps both as conflict copies. Find them:

find ~/Sync/ED -name '*.sync-conflict-*' | head -20

Resolve: review each, pick the right version, delete the loser, let Syncthing replicate the winner.
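
When there are many conflict copies, grouping them by the file they shadow makes the review quicker. The sketch below relies on Syncthing's conflict naming (name.sync-conflict-date-time-device.ext) and treats the device suffix as optional; the helper names are hypothetical.

```python
import re
from collections import defaultdict
from pathlib import Path

# Conflict copies are named <name>.sync-conflict-<date>-<time>-<device>.<ext>;
# the device suffix is treated as optional here.
CONFLICT_RE = re.compile(r"\.sync-conflict-\d{8}-\d{6}(-[A-Z0-9]+)?")


def original_name(conflict_name):
    """Strip the conflict marker so the copy maps back to its original file name."""
    return CONFLICT_RE.sub("", conflict_name, count=1)


def group_conflicts(root):
    """Map each original file path to the conflict copies found beside it."""
    groups = defaultdict(list)
    for path in sorted(Path(root).expanduser().rglob("*.sync-conflict-*")):
        groups[path.with_name(original_name(path.name))].append(path)
    return dict(groups)
```

group_conflicts("~/Sync/ED") returns one entry per contested file, so each original can be compared against all of its conflict copies in one pass.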