Most ambient AI medical scribes process audio only, omitting clinically important visual details. We developed a vision-enabled AI scribe using Google's Gemini model and Ray-Ban Meta smart glasses to document medication histories, a task requiring both audio and visual input. Ten clinical pharmacists video-recorded 110 simulated medication history interviews. Following iterative prompt engineering on 10 training recordings, the scribe was evaluated on 100 test recordings (2,160 data points) across patient details and medication-specific fields. The vision-enabled scribe achieved 98% overall accuracy (2,114/2,160 data points), ranging from 96% for patient details to 99% for dosing directions and indication. Video input significantly outperformed audio-only processing (98% vs 81%, P < 0.001), primarily through fewer omission errors (10 vs 358). Vision-enabled AI scribes substantially improved documentation accuracy for tasks requiring visual input, demonstrating potential to markedly reduce omission errors in clinical documentation.
Vision-enabled AI scribes reduce omissions in clinical conversations: evidence from simulated medication histories – npj Digital Medicine

