Learning Multimodal Violence Detection under Weak Supervision
Not only look, but also listen: Learning multimodal violence detection under weak supervision
💡 We introduce HL-Net to simultaneously capture long-range relations and
local distance relations, which are based on a similarity prior and a proximity prior, respectively. Three parallel branches capture different relations among video snippets and integrate features:
- the holistic branch captures long-range dependencies using a similarity prior,
- the localized branch captures local positional relations using a proximity prior,
- the score branch dynamically captures the closeness of predicted scores.
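
To make the three priors concrete, here is a minimal sketch of how each branch could build a snippet-to-snippet relation matrix. It is an illustration under assumptions, not the paper's implementation: the function names, the cosine-similarity threshold of 0.7, the exponential distance kernel with `sigma`, and the row-wise softmax normalization are all choices made for this example.

```python
import math


def row_softmax(matrix):
    """Normalize each row so that its weights sum to 1."""
    out = []
    for row in matrix:
        peak = max(row)  # subtract the max for numerical stability
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def similarity_adjacency(features, threshold=0.7):
    """Holistic branch (similarity prior): relate snippets with similar features.

    `threshold` is an assumed value used to drop weak similarities.
    """
    def cosine(a, b):
        dot = sum(p * q for p, q in zip(a, b))
        norm_a = math.sqrt(sum(p * p for p in a))
        norm_b = math.sqrt(sum(q * q for q in b))
        return dot / max(norm_a * norm_b, 1e-8)

    sim = [[cosine(a, b) for b in features] for a in features]
    sim = [[v if v > threshold else 0.0 for v in row] for row in sim]
    return row_softmax(sim)


def proximity_adjacency(num_snippets, sigma=1.0):
    """Localized branch (proximity prior): weight decays with temporal distance.

    Depends only on snippet positions, not on feature content.
    """
    return [
        [math.exp(-abs(i - j) / sigma) for j in range(num_snippets)]
        for i in range(num_snippets)
    ]


def score_adjacency(scores):
    """Score branch: relate snippets whose predicted scores are close.

    `scores` are assumed to be per-snippet violence scores in [0, 1].
    """
    closeness = [[1.0 - abs(a - b) for b in scores] for a in scores]
    return row_softmax(closeness)
```

Each function returns a T × T matrix for T snippets. In a graph-style network, such a matrix would typically weight how much each snippet aggregates from the others; the similarity and score matrices change with the input, while the proximity matrix is fixed for a given T.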