Xing Zhang, Zuxuan Wu, Yu-Gang Jiang: SAM: Modeling Scene, Object and Action With Semantics Attention Modules for Video Recognition. IEEE Trans. Multim. 24: 313-322 (2022)