Multi-level contextual rnns with attention model for scene labeling
H Fan, X Mei, D Prokhorov… - IEEE Transactions on …, 2018 - ieeexplore.ieee.org
Contextual information in an image is crucial for improving scene labeling. While existing methods exploit only local context generated from a small area surrounding an image patch or a pixel, long-range and global contextual information is often ignored. To handle this issue, we propose a novel approach for scene labeling based on multi-level contextual recurrent neural networks (RNNs). We encode three kinds of contextual cues, viz., local context, global context, and image topic context, in structural RNNs to model long-range local and global dependencies in an image. In this way, our method is able to "see" the image from both long-range local and holistic views, and make a more reliable inference for image labeling. Besides, we integrate the proposed contextual RNNs into hierarchical convolutional neural networks, and exploit dependence relationships at multiple levels to provide rich spatial and semantic information. Moreover, we adopt an attention model to effectively merge the multiple levels and show that it outperforms average- and max-pooling fusion strategies. Extensive experiments demonstrate that the proposed approach achieves improved results on the CamVid, KITTI, SiftFlow, Stanford Background, and Cityscapes data sets.
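The attention-based fusion of multiple levels can be sketched as below. This is a minimal NumPy illustration, not the authors' exact formulation: it assumes per-pixel softmax weights computed over the levels from learned score maps, which are then used to take a weighted sum of the level-wise feature maps (reducing to average pooling when all scores are equal).

```python
import numpy as np

def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_fuse(features, scores):
    """Fuse L level-wise feature maps with per-pixel attention.

    features: (L, C, H, W) feature maps from L levels (hypothetical shapes).
    scores:   (L, H, W) attention score maps, one per level.
    Returns a fused (C, H, W) map: a softmax-weighted sum over levels.
    """
    w = softmax(scores, axis=0)                       # weights sum to 1 over levels
    return (w[:, None, :, :] * features).sum(axis=0)  # broadcast over channels

# Toy example: two levels, 3 channels, 4x4 spatial maps.
rng = np.random.default_rng(0)
feats = rng.standard_normal((2, 3, 4, 4))
scores = rng.standard_normal((2, 4, 4))
fused = attention_fuse(feats, scores)
print(fused.shape)  # (3, 4, 4)
```

With equal scores at every pixel, the softmax weights are uniform and the fusion degenerates to the average-pooling baseline the paper compares against; learned, spatially varying scores let each pixel favor the most informative level.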