Learning from Observer Gaze: Zero-Shot Attention Prediction Oriented by Human-Object Interaction Recognition

Sun Yat-sen University

Interactive Gaze (IG) Dataset

To study human visual attention oriented by human-object interaction (HOI), we collect Interactive Gaze (IG), the first large-scale interaction-centric gaze fixation dataset. IG comprises 6,299 interaction scenarios across 740 interaction categories, 80 object categories, and 132 action categories. It records the visual attention of 32 human observers as they interpret these interaction scenarios, yielding 530,000 fixation points. IG has substantial potential to bridge the domains of visual attention and action understanding and to advance both areas of study jointly.
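As an illustration of how fixation points like those described above are commonly turned into a dense attention target, the sketch below sums an isotropic Gaussian at each fixation and normalizes the result to a probability map. This is a generic technique, not the documented IG pipeline: the function name `fixation_density_map`, the `(x, y)` pixel-coordinate format, and the `sigma` value are assumptions made for the example.

```python
import math

def fixation_density_map(fixations, width, height, sigma=1.0):
    """Accumulate a Gaussian around each (x, y) fixation and normalize to sum to 1.

    Hypothetical helper: the coordinate format and sigma are assumptions,
    not taken from the IG dataset specification.
    """
    density = [[0.0] * width for _ in range(height)]
    for fx, fy in fixations:
        for y in range(height):
            for x in range(width):
                squared_dist = (x - fx) ** 2 + (y - fy) ** 2
                density[y][x] += math.exp(-squared_dist / (2 * sigma ** 2))
    total = sum(sum(row) for row in density)
    if total > 0:
        density = [[value / total for value in row] for row in density]
    return density

# Toy example: two observers fixate the same pixel, a third looks elsewhere.
density = fixation_density_map([(2, 1), (2, 1), (5, 3)], width=8, height=5)
```

The brute-force loop keeps the sketch dependency-free; for real image resolutions, a binary fixation map blurred with a library Gaussian filter would be the more practical choice.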

If you are interested in accessing the Interactive Gaze (IG) dataset, please apply here. We will review your application and respond as soon as possible.


Most existing attention prediction research focuses on salient instances such as humans and objects. However, the more complex interaction-oriented attention, which arises when human observers comprehend interactions between instances, remains largely unexplored. Such attention is equally crucial for advancing human-machine interaction and human-centered artificial intelligence. To bridge this gap, we first collect a novel gaze fixation dataset named IG, comprising 530,000 fixation points across 740 diverse interaction categories, which captures visual attention during human observers’ cognitive processing of interactions. Second, we introduce the zero-shot interaction-oriented attention prediction task (ZeroIA), which challenges models to predict visual cues for interactions not encountered during training. Third, we present the Interactive Attention model (IA), designed to emulate human observers’ cognitive processes in order to tackle the ZeroIA problem. Extensive experiments demonstrate that the proposed IA outperforms other state-of-the-art approaches in both ZeroIA and fully supervised settings. Finally, we apply interaction-oriented attention to the interaction recognition task itself. Further experimental results show that incorporating real human attention data from IG and attention labels generated by IA can enhance the performance and interpretability of existing state-of-the-art HOI models.
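To make the zero-shot setting concrete, here is a minimal sketch of a split in which some interaction categories are held out entirely from training. It assumes an interaction category is an (action, object) pair and that samples are `(image_id, action, object)` tuples; the function `zero_shot_split`, the `unseen_fraction` parameter, and the toy samples are all hypothetical, and the actual ZeroIA split protocol may differ.

```python
import random
from collections import defaultdict

def zero_shot_split(samples, unseen_fraction=0.2, seed=0):
    """Hold out whole interaction categories so they never appear in training.

    Hypothetical helper: treats each (action, object) pair as one category.
    """
    by_category = defaultdict(list)
    for image_id, action, obj in samples:
        by_category[(action, obj)].append(image_id)

    # Shuffle a sorted category list so the split is reproducible per seed.
    categories = sorted(by_category)
    random.Random(seed).shuffle(categories)
    n_unseen = max(1, round(len(categories) * unseen_fraction))
    unseen = set(categories[:n_unseen])

    train = [s for s in samples if (s[1], s[2]) not in unseen]
    test = [s for s in samples if (s[1], s[2]) in unseen]
    return train, test, unseen

# Toy samples (invented for illustration): 5 categories, 2 images each.
samples = [
    (0, "ride", "bicycle"), (1, "ride", "bicycle"),
    (2, "hold", "cup"), (3, "hold", "cup"),
    (4, "kick", "ball"), (5, "kick", "ball"),
    (6, "read", "book"), (7, "read", "book"),
    (8, "ride", "horse"), (9, "ride", "horse"),
]
train, test, unseen = zero_shot_split(samples)
```

Splitting by category rather than by image is what makes the evaluation zero-shot: every test interaction is one the model has never been trained on, even if its action or object appears in other training combinations.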


        title={Learning from Observer Gaze: Zero-Shot Attention Prediction Oriented by Human-Object Interaction Recognition},
        author={Zhou, Yuchen and Liu, Linkai and Gou, Chao},
        booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
