Huazhe Xu, Boyuan Chen, Yang Gao, Trevor Darrell: Zero-shot Policy Learning with Spatial Temporal Reward Decomposition on Contingency-aware Observation. ICRA 2021: 10786-10792