CAA

Yifan Ren, Yazhou Yao, Xing Xu, Huimin Lu, Fumin Shen

doi:10.1145/3474085.3475616

Temporal action detection aims to locate the segments that contain action instances in an untrimmed video. Most existing approaches extract features for every candidate video segment and then classify each candidate separately, which can overlook the underlying relationships among candidates. In this paper, we propose a novel model termed Candidate-Aware Aggregation (CAA) to tackle this problem. In CAA, we design the Global Awareness (GA) module to exploit long-range relations among all candidates from a global perspective, which enhances the features of action instances. The GA module is then embedded into a multi-level hierarchical network named FENet, to aggregate local features in adjacent candidates and suppress background noise. As a result, the relationship among candidates is explicitly captured from both local and global perspectives, which yields more accurate predictions for the candidates. Extensive experiments on two popular benchmarks, ActivityNet-1.3 and THUMOS-14, demonstrate the superiority of CAA compared to state-of-the-art methods.
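The abstract does not say how the GA module computes relations among candidates. As an illustration only, the sketch below assumes a generic scaled dot-product self-attention with a residual connection, which is one common way to let each candidate's feature attend to all others. The function names `softmax` and `global_aggregate` and the toy feature vectors are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def global_aggregate(features):
    """Enhance each candidate feature with a weighted sum of all candidates.

    Assumption: plain scaled dot-product self-attention plus a residual
    connection. This is a stand-in for the idea of global, long-range
    aggregation, not the GA module as published.
    """
    dim = len(features[0])
    scale = math.sqrt(dim)
    enhanced = []
    for query in features:
        # Similarity of this candidate to every candidate (including itself).
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in features
        ]
        weights = softmax(scores)
        # Context vector: attention-weighted average of all candidate features.
        context = [
            sum(w * feat[d] for w, feat in zip(weights, features))
            for d in range(dim)
        ]
        # Residual connection keeps the candidate's own feature.
        enhanced.append([q + c for q, c in zip(query, context)])
    return enhanced


# Toy example: three candidate segments with 2-dimensional features.
candidates = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
enhanced = global_aggregate(candidates)
```

In this sketch the first two candidates, whose features are similar, weight each other more heavily than the third, so each enhanced feature reflects related candidates across the whole video rather than only its own segment.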
