Office Workflow

The post-session pipeline merges Emlid Flow CSV exports with voice descriptions transcribed by Whisper large-v3 (via mlx-whisper), aligning the two by timestamp.

Directory Structure
~/Sync/farm/emlid-exports/
├── emlid-merge.py           # Main voice↔GPS alignment script
├── whisper-transcribe.py    # MLX-based ASR (Apple Silicon GPU)
├── brownsville-property.py  # Standalone QGIS project generator
├── setup-project.py         # QGIS Python Console version
├── brownsville-property.qgz # QGIS project file (QGIS now connects to PostGIS directly; GPKG retired)
├── sessions/                # CSV exports from Emlid Flow
├── voice-memos/             # JPR .m4a files + transcripts
│   └── YYYY-MM-DD/
│       ├── HH-MM-SS.m4a
│       └── transcripts/
│           └── HH-MM-SS.json   # Whisper large-v3 (MLX) output
└── processed/               # Merged output CSVs
    ├── all-points.csv       # Combined master file
    └── session-N-YYYY-MM-DD.csv

Processing Steps

1. Export CSV from Emlid Flow → copy to sessions/

From Emlid Flow on iPhone: tap the project → Export → CSV. AirDrop or share the file to the Mac Studio, then move it into sessions/.

2. Copy the day's Just Press Record (JPR) recordings from iCloud:

mkdir -p ~/Sync/farm/emlid-exports/voice-memos/YYYY-MM-DD
cp ~/Library/Mobile\ Documents/iCloud~com~openplanetsoftware~just-press-record/Documents/YYYY-MM-DD/*.m4a \
   ~/Sync/farm/emlid-exports/voice-memos/YYYY-MM-DD/

3. Transcribe each .m4a with Whisper large-v3 (MLX):

source ~/venvs/whisper-asr/bin/activate
cd ~/Sync/farm/emlid-exports
python3 whisper-transcribe.py voice-memos/YYYY-MM-DD/*.m4a

Writes one Whisper-format JSON file per recording to voice-memos/YYYY-MM-DD/transcripts/. About 2 hours of audio transcribes in under 60 seconds on Apple Silicon.

(Switched from Qwen3-ASR to Whisper large-v3 via mlx-whisper on 2026-05-18 for better accuracy and its anti-hallucination parameters; the venv was renamed from qwen-asr to whisper-asr across scripts, skills, and memory.)

4. Run the merge:

cd ~/Sync/farm/emlid-exports
python3 emlid-merge.py sessions/<csv> voice-memos/YYYY-MM-DD/*.m4a --out processed

5. Review output in processed/session-N-YYYY-MM-DD.csv
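The checks around steps 3 to 5 can be scripted. This is a minimal sketch, not part of whisper-transcribe.py or emlid-merge.py: transcript_path, untranscribed and unmatched_points are hypothetical names. It assumes the voice-memos/ layout from the directory tree and assumes that a point with no matched voice segment is written with an empty feature_name.

```python
from __future__ import annotations

import csv
from pathlib import Path


def transcript_path(m4a: Path) -> Path:
    """voice-memos/YYYY-MM-DD/HH-MM-SS.m4a -> voice-memos/YYYY-MM-DD/transcripts/HH-MM-SS.json"""
    return m4a.parent / "transcripts" / (m4a.stem + ".json")


def untranscribed(day_dir: Path) -> list[Path]:
    """Before step 4: .m4a files in a day folder that have no transcript JSON yet."""
    return sorted(p for p in day_dir.glob("*.m4a") if not transcript_path(p).exists())


def unmatched_points(session_csv: Path) -> list[str]:
    """After step 5: point_ids with a blank feature_name (assumed to mean that
    no voice segment fell inside the match window)."""
    with session_csv.open(newline="", encoding="utf-8") as f:
        return [row["point_id"] for row in csv.DictReader(f)
                if not row["feature_name"].strip()]
```

For example, untranscribed(Path("voice-memos/2026-05-18")) lists recordings to re-run through whisper-transcribe.py, and unmatched_points on a processed session CSV lists point IDs to check against the recordings.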

emlid-merge.py Details

The merge script (~/Sync/farm/emlid-exports/emlid-merge.py) does the following:

  • Reads Emlid Flow CSV (columns: Name, Latitude, Longitude, Ellipsoidal height, Averaging start, etc.)
  • Groups points into sessions, starting a new session at any 10-minute gap between consecutive timestamps
  • For each JPR recording: parses start time from the filename (HH-MM-SS) + parent folder date (YYYY-MM-DD), converts to UTC
  • Loads transcript JSON, converts each segment’s relative offset to absolute UTC time
  • For each GPS point, finds the nearest voice segment within a configurable symmetric window (default ±20 s)
  • Parses “feature name — description” from the segment text
  • Writes per-session CSVs with columns: point_id, timestamp_utc, lat, lon, elev_m, h_rms_m, solution, samples, feature_name, description, voice_offset_s, source_recording
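The session-grouping step can be illustrated with a short sketch. It is not the code in emlid-merge.py; group_sessions is a hypothetical name, and treating a gap of exactly 10 minutes as the same session (splitting only when the gap is strictly larger) is an assumption.

```python
from __future__ import annotations

from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=10)  # gap size from the description above


def group_sessions(times: list[datetime], gap: timedelta = SESSION_GAP) -> list[list[datetime]]:
    """Sort timestamps and split them into sessions; a new session starts when
    the gap to the previous point is larger than `gap` (assumed strict >)."""
    sessions: list[list[datetime]] = []
    for t in sorted(times):
        if sessions and t - sessions[-1][-1] <= gap:
            sessions[-1].append(t)  # close enough: same session
        else:
            sessions.append([t])    # first point, or gap exceeded: new session
    return sessions
```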

CLI flags: --out OUTDIR, --window SEC (match window, default 20), --tz IANA_NAME (time zone of the JPR filenames, default America/New_York)
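The matching and parsing steps can be sketched as follows, under stated assumptions: distance is measured from each segment's start time, voice_offset_s is segment start minus point time, and the name/description split happens at the first em dash (U+2014). Segment, nearest_segment and split_feature are hypothetical names; window_s plays the role of --window.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta

EM_DASH = "\u2014"  # separator between feature name and description


@dataclass
class Segment:
    """One transcript segment with an absolute UTC start time (hypothetical type)."""
    start_utc: datetime
    text: str


def nearest_segment(point_utc: datetime, segments: list[Segment],
                    window_s: float = 20.0) -> tuple[Segment, float] | None:
    """Return (segment, offset_s) for the segment whose start is closest to the
    GPS point, if |offset_s| <= window_s; otherwise None.
    offset_s = segment start minus point time (assumed sign convention)."""
    best = None
    for seg in segments:
        offset = (seg.start_utc - point_utc).total_seconds()
        if abs(offset) <= window_s and (best is None or abs(offset) < abs(best[1])):
            best = (seg, offset)
    return best


def split_feature(text: str) -> tuple[str, str]:
    """Split at the first em dash; text without one becomes the feature name
    with an empty description."""
    name, _sep, desc = text.partition(EM_DASH)
    return name.strip(), desc.strip()
```

Because the window is symmetric, a description spoken slightly before or after the point is averaged can still match; the closest one wins.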

JPR Timestamp Handling

Critical: JPR’s creation_time metadata marks the end of the recording, not the start. The script therefore uses the filename (HH-MM-SS) as the authoritative recording start time and the parent folder name (YYYY-MM-DD) as the date. Both are local time (America/New_York by default; see --tz) and are converted to UTC internally.

If your file isn’t from JPR or doesn’t follow the HH-MM-SS.m4a naming convention, the script will skip it with a warning.
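The timestamp rule above can be expressed with the standard-library zoneinfo module. This is a minimal sketch, assuming transcript segments carry a start field in seconds from the beginning of the recording (as Whisper JSON does). recording_start_utc and segment_times_utc are hypothetical names, and the tz parameter stands in for --tz.

```python
from __future__ import annotations

import re
import warnings
from datetime import datetime, timedelta, timezone
from pathlib import Path
from zoneinfo import ZoneInfo

NAME_RE = re.compile(r"^(\d{2})-(\d{2})-(\d{2})\.m4a$")  # JPR naming: HH-MM-SS.m4a


def recording_start_utc(m4a: Path, tz: str = "America/New_York") -> datetime | None:
    """Recording start = parent folder date + filename time, read as local time
    in `tz` and converted to UTC (JPR's creation_time is ignored).
    Returns None with a warning if the path isn't YYYY-MM-DD/HH-MM-SS.m4a."""
    m = NAME_RE.match(m4a.name)
    try:
        if m is None:
            raise ValueError("filename is not HH-MM-SS.m4a")
        day = datetime.strptime(m4a.parent.name, "%Y-%m-%d")
        hh, mm, ss = (int(g) for g in m.groups())
        local = day.replace(hour=hh, minute=mm, second=ss, tzinfo=ZoneInfo(tz))
    except ValueError as exc:
        warnings.warn(f"skipping {m4a}: {exc}")
        return None
    return local.astimezone(timezone.utc)


def segment_times_utc(start_utc: datetime, segments: list[dict]) -> list[datetime]:
    """Turn Whisper-style relative segment starts (seconds) into absolute UTC times."""
    return [start_utc + timedelta(seconds=seg["start"]) for seg in segments]
```

zoneinfo applies the DST offset in effect on the recording date; a local time in the repeated hour of the autumn DST change resolves to the first occurrence (fold=0).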

all-points.csv Master File

The processed/all-points.csv file combines all sessions into a single CSV for loading into QGIS or ArcGIS Pro. Columns:

point_id, timestamp_utc, lat, lon, elev_m, h_rms_m, solution, samples, code, feature_name, description, session

This file is the primary input for the QGIS project (brownsville-property.qgz) and for future PostGIS ingestion.
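Combining the per-session files can be sketched with the csv module. Assumptions: the session column holds the source file's stem, code is left blank when a session file has no such column, and per-session columns outside the master list (voice_offset_s, source_recording) are dropped. build_all_points is a hypothetical name, not necessarily how emlid-merge.py builds the file.

```python
import csv
from pathlib import Path

# Master column order from the all-points.csv description.
MASTER_COLUMNS = [
    "point_id", "timestamp_utc", "lat", "lon", "elev_m", "h_rms_m", "solution",
    "samples", "code", "feature_name", "description", "session",
]


def build_all_points(processed: Path) -> int:
    """Concatenate processed/session-*.csv into processed/all-points.csv and
    return the number of rows written."""
    rows = []
    for path in sorted(processed.glob("session-*.csv")):
        with path.open(newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                row["session"] = path.stem  # e.g. "session-1-2026-05-18" (assumed label)
                rows.append(row)
    with (processed / "all-points.csv").open("w", newline="", encoding="utf-8") as f:
        # restval fills columns a session file lacks (e.g. code); extrasaction
        # drops columns not in MASTER_COLUMNS (e.g. voice_offset_s).
        writer = csv.DictWriter(f, fieldnames=MASTER_COLUMNS, restval="", extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```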