Few-Shot Fine-Grained Action Recognition via Bidirectional Attention and Contrastive Meta-Learning

08/15/2021
by   Jiahao Wang, et al.
7

Fine-grained action recognition is attracting increasing attention due to the emerging demand of specific action understanding in real-world applications, whereas the data of rare fine-grained categories is very limited. Therefore, we propose the few-shot fine-grained action recognition problem, aiming to recognize novel fine-grained actions with only few samples given for each class. Although progress has been made in coarse-grained actions, existing few-shot recognition methods encounter two issues handling fine-grained actions: the inability to capture subtle action details and the inadequacy in learning from data with low inter-class variance. To tackle the first issue, a human vision inspired bidirectional attention module (BAM) is proposed. Combining top-down task-driven signals with bottom-up salient stimuli, BAM captures subtle action details by accurately highlighting informative spatio-temporal regions. To address the second issue, we introduce contrastive meta-learning (CML). Compared with the widely adopted ProtoNet-based method, CML generates more discriminative video representations for low inter-class variance data, since it makes full use of potential contrastive pairs in each training episode. Furthermore, to fairly compare different models, we establish specific benchmark protocols on two large-scale fine-grained action recognition datasets. Extensive experiments show that our method consistently achieves state-of-the-art performance across evaluated tasks.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset
Success!
Error Icon An error occurred

Sign in with Google

×

Use your Google Account to sign in to DeepAI

×

Consider DeepAI Pro