Abstract:
Mid-level representations are used to map sets of local features into one global representation for a given media descriptor. In visual pattern recognition tasks, Bag-of-Words (BoW) is one popular strategy among the many methods available in the literature, mainly due to its simplicity in concept and implementation. Despite the overall good results achieved by BoW in many tasks, the method is unstable in high-dimensional feature spaces, and quantization errors are usually ignored in the final representation. To cope with these problems, we propose a new pooling function based on the distribution of feature points around codewords. We propose to use the standard deviation associated with each codeword to measure attribution discrepancy and to weight the impact that feature points will have on the final representation. The main contribution of this article is the study of more discriminative representations, which amplify the values of feature points close to codeword border regions. Experiments were conducted on a human action classification task, and the results demonstrate that our pooling strategy improved the classification rates by 25.6% on the UCF Sports dataset and by 21.4% on the UCF 11 dataset, with respect to the original pooling function used in BoW.
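The idea of weighting each feature point by the standard deviation of its codeword can be illustrated with a minimal sketch. The abstract does not give the pooling formula, so the weight used here (`1 + d / sigma_k`, where `d` is the distance from a feature to its nearest codeword and `sigma_k` is the standard deviation of those distances within codeword `k`) is an assumption chosen only to show the general mechanism, not the authors' function. The function names `nearest_codeword`, `bow_histogram`, and `std_weighted_pooling` are likewise hypothetical.

```python
import math
from statistics import pstdev


def nearest_codeword(x, codebook):
    """Return (index, distance) of the codeword closest to x (Euclidean)."""
    dists = [math.dist(x, c) for c in codebook]
    k = min(range(len(codebook)), key=dists.__getitem__)
    return k, dists[k]


def bow_histogram(features, codebook):
    """Classic BoW pooling: count hard assignments per codeword."""
    hist = [0.0] * len(codebook)
    for x in features:
        k, _ = nearest_codeword(x, codebook)
        hist[k] += 1.0
    return hist


def std_weighted_pooling(features, codebook):
    """Hypothetical pooling: features far from their codeword, relative to
    the spread of that codeword's cell, contribute more than one count."""
    assignments = [nearest_codeword(x, codebook) for x in features]

    # Collect the assignment distances that fall in each codeword's cell.
    per_word = [[] for _ in codebook]
    for k, d in assignments:
        per_word[k].append(d)

    # Population standard deviation of distances per codeword.
    sigmas = [pstdev(ds) if len(ds) > 1 else 0.0 for ds in per_word]

    pooled = [0.0] * len(codebook)
    for k, d in assignments:
        # Assumed weight (not from the paper); falls back to a plain
        # count when the spread is zero.
        weight = 1.0 + d / sigmas[k] if sigmas[k] > 0 else 1.0
        pooled[k] += weight
    return pooled


codebook = [(0.0, 0.0), (10.0, 0.0)]
features = [(1.0, 0.0), (3.0, 0.0), (9.0, 0.0)]
counts = bow_histogram(features, codebook)
weighted = std_weighted_pooling(features, codebook)
```

In this toy example, the plain histogram treats the two features assigned to the first codeword equally, whereas the weighted variant gives more mass to the one lying farther from the codeword, which is the kind of behavior the abstract describes for points near border regions.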
Date of Conference: 17-20 October 2017
Date Added to IEEE Xplore: 07 November 2017
ISBN Information:
Electronic ISSN: 2377-5416