Office Workflow

The post-session pipeline merges Emlid Flow CSV exports with voice descriptions transcribed by Whisper large-v3 (via mlx-whisper), aligning the two by timestamp.

Directory Structure

~/Sync/farm/emlid-exports/
├── emlid-merge.py          # Main voice↔GPS alignment script
├── whisper-transcribe.py   # MLX-based ASR (Apple Silicon GPU)
├── brownsville-property.py # Standalone QGIS project generator
├── setup-project.py        # QGIS Python Console version
├── brownsville-property.qgz # QGIS project file (QGIS now connects to PostGIS directly; GPKG retired)
├── sessions/               # CSV exports from Emlid Flow
├── voice-memos/            # JPR .m4a files + transcripts
│   └── YYYY-MM-DD/
│       ├── HH-MM-SS.m4a
│       └── transcripts/
│           └── HH-MM-SS.json   # Whisper large-v3 (MLX) output
└── processed/              # Merged output CSVs
    ├── all-points.csv      # Combined master file
    └── session-N-YYYY-MM-DD.csv

Processing Steps

1. Export CSV from Emlid Flow → copy to sessions/

From Emlid Flow on iPhone: tap the project → Export → CSV. AirDrop or share to the Mac Studio, then move to sessions/.

2. Copy JPR recordings from iCloud:

mkdir -p ~/Sync/farm/emlid-exports/voice-memos/YYYY-MM-DD
cp ~/Library/Mobile\ Documents/iCloud~com~openplanetsoftware~just-press-record/Documents/YYYY-MM-DD/*.m4a \
   ~/Sync/farm/emlid-exports/voice-memos/YYYY-MM-DD/

3. Transcribe each .m4a with Whisper large-v3 (MLX):

source ~/venvs/whisper-asr/bin/activate
cd ~/Sync/farm/emlid-exports
python3 whisper-transcribe.py voice-memos/YYYY-MM-DD/*.m4a

Writes one Whisper-format JSON transcript per recording (HH-MM-SS.json) to voice-memos/YYYY-MM-DD/transcripts/. Transcribes about 2 hours of audio in under 60 seconds on Apple Silicon.

(Switched from Qwen3-ASR to Whisper large-v3 via mlx-whisper on 2026-05-18 for better accuracy and its anti-hallucination parameters. The venv was renamed from qwen-asr to whisper-asr, and references in scripts, skills, and memory were updated to match.)
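As a rough sketch of how these transcripts get consumed downstream, the snippet below loads one transcript JSON and turns each segment's relative offset into an absolute UTC time. It assumes the Whisper-style layout (a top-level "segments" list whose items carry "start" and "end" in seconds plus a "text" string) and takes the recording start as a timezone-aware UTC datetime. load_segments is a hypothetical name, not a function from whisper-transcribe.py or emlid-merge.py.

```python
import json
from datetime import timedelta
from pathlib import Path


def load_segments(transcript_path, recording_start_utc):
    """Return (start_utc, end_utc, text) for every segment in a transcript JSON.

    Assumes the Whisper-style layout {"text": ..., "segments": [{"start": s,
    "end": s, "text": ...}, ...]} with offsets in seconds from the start of
    the recording. recording_start_utc must be a timezone-aware datetime.
    """
    data = json.loads(Path(transcript_path).read_text(encoding="utf-8"))
    segments = []
    for seg in data.get("segments", []):
        # Relative offset (seconds into the recording) -> absolute UTC time.
        start = recording_start_utc + timedelta(seconds=float(seg["start"]))
        end = recording_start_utc + timedelta(seconds=float(seg["end"]))
        segments.append((start, end, seg["text"].strip()))
    return segments
```

For example, a segment starting 1.5 s into a recording that began at 14:00:00 UTC maps to 14:00:01.5 UTC.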

4. Run the merge:

cd ~/Sync/farm/emlid-exports
python3 emlid-merge.py sessions/<csv> voice-memos/YYYY-MM-DD/*.m4a --out processed

5. Review output in processed/session-N-YYYY-MM-DD.csv

emlid-merge.py Details

The merge script (~/Sync/farm/emlid-exports/emlid-merge.py) does the following:

Reads Emlid Flow CSV (columns: Name, Latitude, Longitude, Ellipsoidal height, Averaging start, etc.)
Groups points into sessions, starting a new session wherever consecutive timestamps are more than 10 minutes apart
For each JPR recording: parses start time from the filename (HH-MM-SS) + parent folder date (YYYY-MM-DD), converts to UTC
Loads transcript JSON, converts each segment’s relative offset to absolute UTC time
For each GPS point, finds the nearest voice segment within a configurable symmetric window (default ±20s)
Parses “feature name — description” from the segment text
Writes per-session CSVs with columns: point_id, timestamp_utc, lat, lon, elev_m, h_rms_m, solution, samples, feature_name, description, voice_offset_s, source_recording
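A minimal sketch of the grouping, matching, and parsing steps listed above. It assumes sessions split on gaps strictly greater than 10 minutes, that "nearest" means the segment whose start time is closest to the point's timestamp, and that the spoken separator between feature name and description is transcribed as a spaced hyphen, en dash, or em dash. The function names are hypothetical; the real emlid-merge.py may do this differently.

```python
import re
from datetime import timedelta

SESSION_GAP = timedelta(minutes=10)
# Assumed separator: hyphen, en dash (U+2013) or em dash (U+2014) with spaces around it.
NAME_DESC_SPLIT = re.compile(r"\s+[-\u2013\u2014]\s+")


def group_sessions(timestamps, gap=SESSION_GAP):
    """Split timestamps into sessions wherever consecutive points are more than `gap` apart."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions


def nearest_segment(point_time, segments, window_s=20.0):
    """Return ((start, text), offset_s) for the closest segment start within +/- window_s, else None.

    segments: iterable of (start_utc, text). offset_s = segment start minus point time.
    """
    best = None
    for start, text in segments:
        offset = (start - point_time).total_seconds()
        if abs(offset) <= window_s and (best is None or abs(offset) < abs(best[1])):
            best = ((start, text), offset)
    return best


def split_feature(text):
    """Split '<feature name> <dash> <description>' into (feature_name, description)."""
    parts = NAME_DESC_SPLIT.split(text.strip(), maxsplit=1)
    if len(parts) == 2:
        return parts[0].strip(), parts[1].strip()
    return text.strip(), ""  # no separator: treat the whole segment as the name
```

Requiring spaces around the dash keeps hyphenated words (such as "post-hole") intact.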

CLI flags: --out OUTDIR, --window SEC (match window, default 20), --tz IANA_NAME (default: America/New_York)
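A sketch of an argparse interface that accepts these flags. The positional argument names and the --out default of processed are assumptions (the usage in step 4 always passes --out explicitly); the --window and --tz defaults come from the list above.

```python
import argparse


def build_parser():
    """Argument parser mirroring emlid-merge.py's documented flags (sketch)."""
    p = argparse.ArgumentParser(description="Align Emlid Flow points with JPR voice transcripts.")
    p.add_argument("csv", help="Emlid Flow CSV export")
    p.add_argument("recordings", nargs="+", help="JPR recordings, YYYY-MM-DD/HH-MM-SS.m4a")
    p.add_argument("--out", default="processed", help="output directory (default assumed)")
    p.add_argument("--window", type=float, default=20.0,
                   help="symmetric match window in seconds (default 20)")
    p.add_argument("--tz", default="America/New_York",
                   help="IANA zone of the JPR filenames (default America/New_York)")
    return p
```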

JPR Timestamp Handling

Critical: JPR's creation_time metadata records the end of the recording, not the start. The script therefore treats the filename (HH-MM-SS) as the authoritative recording start time and the parent folder name (YYYY-MM-DD) as the date. Both are in local time (America/New_York unless overridden with --tz) and are converted to UTC internally.

Files that aren't from JPR or don't follow the HH-MM-SS.m4a naming convention are skipped with a warning.
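A sketch of that start-time rule using the standard-library zoneinfo module for the local-to-UTC conversion. recording_start_utc is a hypothetical name, and returning None after logging a warning stands in for "skip". Local times that fall in an ambiguous DST hour resolve to the first occurrence (the fold=0 default).

```python
import logging
import re
from datetime import datetime, timezone
from pathlib import Path
from zoneinfo import ZoneInfo

log = logging.getLogger(__name__)

# Layout assumed by this sketch: .../YYYY-MM-DD/HH-MM-SS.m4a
NAME_RE = re.compile(r"^\d{2}-\d{2}-\d{2}\.m4a$")
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def recording_start_utc(path, tz_name="America/New_York"):
    """Return the recording start as an aware UTC datetime, or None to skip the file.

    Uses the filename and parent folder, never creation_time (which JPR
    sets to the end of the recording).
    """
    p = Path(path)
    if not NAME_RE.match(p.name) or not DATE_RE.match(p.parent.name):
        log.warning("skipping %s: expected YYYY-MM-DD/HH-MM-SS.m4a", p)
        return None
    try:
        local = datetime.strptime(f"{p.parent.name} {p.stem}", "%Y-%m-%d %H-%M-%S")
    except ValueError:  # e.g. 25-00-00 or 2026-02-30
        log.warning("skipping %s: invalid date or time in path", p)
        return None
    # Attach the local zone, then convert to UTC.
    return local.replace(tzinfo=ZoneInfo(tz_name)).astimezone(timezone.utc)
```

For example, voice-memos/2026-05-18/14-02-10.m4a (EDT, UTC-4) yields 18:02:10 UTC.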

all-points.csv Master File

The processed/all-points.csv file combines all sessions into a single CSV for loading into QGIS or ArcGIS Pro. Columns:

point_id, timestamp_utc, lat, lon, elev_m, h_rms_m, solution, samples,
code, feature_name, description, session

This file is the primary input for the QGIS project (brownsville-property.qgz) and for future PostGIS ingestion.
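One way all-points.csv could be rebuilt from the per-session files, as a sketch only. It assumes the session column holds the N from session-N-YYYY-MM-DD.csv, leaves code blank when a session file has no such column, and drops the per-session voice_offset_s and source_recording columns, which the master file does not list. build_all_points is a hypothetical helper.

```python
import csv
import re
from pathlib import Path

ALL_POINTS_COLUMNS = [
    "point_id", "timestamp_utc", "lat", "lon", "elev_m", "h_rms_m", "solution", "samples",
    "code", "feature_name", "description", "session",
]
SESSION_RE = re.compile(r"^session-(\d+)-\d{4}-\d{2}-\d{2}\.csv$")


def build_all_points(processed_dir):
    """Concatenate session-N-YYYY-MM-DD.csv files into all-points.csv in the same folder."""
    processed = Path(processed_dir)
    session_files = []
    for path in processed.glob("session-*.csv"):
        m = SESSION_RE.match(path.name)
        if m:
            session_files.append((int(m.group(1)), path))
    out_path = processed / "all-points.csv"
    with out_path.open("w", newline="", encoding="utf-8") as out:
        # restval fills missing columns (e.g. code); extrasaction drops unlisted ones.
        writer = csv.DictWriter(out, fieldnames=ALL_POINTS_COLUMNS, restval="", extrasaction="ignore")
        writer.writeheader()
        for session_no, path in sorted(session_files):
            with path.open(newline="", encoding="utf-8") as f:
                for row in csv.DictReader(f):
                    row["session"] = session_no  # assumed: session label = N from the filename
                    writer.writerow(row)
    return out_path
```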
