The role focuses on the camera flow inside our clients' mobile apps. We don't do single-photo verification - our models run over the live camera stream, every frame, in real time, on the user's phone. Everything the user sees and feels while they're framing the shot is the problem you'd be working on.
The interesting problem: model output is probabilistic and noisy, the scene is different every time, and the user is operating in the real world - on foot, in the rain, gloves on, one hand free. Visual UI alone usually isn't enough. Haptics, symbolic cues (think game HUD), and visual feedback all have to play together, and at any given moment you have to decide what to surface, on which channel, and in what order.
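To make that arbitration problem concrete, here is a minimal sketch of one possible shape for it: smooth the per-frame confidences with an exponential moving average, add two-threshold hysteresis so cues don't flicker on and off between frames, and surface only the highest-priority active cue per frame. The cue names, thresholds, priorities, and channel assignments are invented for illustration - they are not the actual product's cue set or values.

```kotlin
enum class Channel { HAPTIC, SYMBOL }

// A candidate cue with a fixed priority; lower number wins when several fire.
data class Cue(val name: String, val channel: Channel, val priority: Int)

/**
 * Turns noisy per-frame model confidences into stable on/off cue states.
 * EMA smoothing damps single-frame spikes; the two-threshold hysteresis
 * keeps a cue from flickering when confidence hovers near the boundary.
 */
class CueArbiter(
    private val alpha: Float = 0.3f,  // EMA weight of the newest frame
    private val onAt: Float = 0.7f,   // smoothed confidence needed to raise a cue
    private val offAt: Float = 0.4f,  // smoothed confidence below which a cue clears
) {
    private val ema = mutableMapOf<String, Float>()
    private val active = mutableSetOf<String>()

    /** Feed one frame's raw confidences; returns the single cue to surface, if any. */
    fun onFrame(confidences: Map<Cue, Float>): Cue? {
        for ((cue, raw) in confidences) {
            val prev = ema[cue.name] ?: raw
            val smoothed = prev + alpha * (raw - prev)
            ema[cue.name] = smoothed
            when {
                cue.name !in active && smoothed >= onAt -> active.add(cue.name)
                cue.name in active && smoothed < offAt -> active.remove(cue.name)
            }
        }
        // Surface at most one cue per frame: the highest-priority active one.
        return confidences.keys
            .filter { it.name in active }
            .minByOrNull { it.priority }
    }
}

fun main() {
    val tooFar = Cue("too_far", Channel.HAPTIC, priority = 1)
    val tooDark = Cue("too_dark", Channel.SYMBOL, priority = 2)
    val arbiter = CueArbiter()

    // Simulated noisy per-frame confidences.
    val frames = listOf(
        mapOf(tooFar to 0.9f, tooDark to 0.2f),
        mapOf(tooFar to 0.5f, tooDark to 0.8f),  // tooFar dips but holds via hysteresis
        mapOf(tooFar to 0.3f, tooDark to 0.9f),
    )
    for ((i, frame) in frames.withIndex()) {
        val cue = arbiter.onFrame(frame)
        println("frame $i -> ${cue?.name ?: "none"} on ${cue?.channel ?: "-"}")
    }
}
```

The point of the sketch is the split of responsibilities: the model stays noisy, the arbiter owns stability (smoothing plus hysteresis), and the channel decision reduces to a priority rule - each of those pieces is where the real design work in this role lives.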