Abstract
How would one describe an image? Interesting? Pleasant? Aesthetic? A number of studies have classified images with respect to such attributes. A common approach is to link lower-level image features with higher-level properties and train a computational model to perform classification using human-annotated ground truth. Although these studies produce algorithms with reasonable prediction performance, they provide few insights into why and how the algorithms work. The current study focuses on how multiple visual factors affect human perception of digital images. We extend an existing dataset with quantitative measures of human perception for 31 image attributes under six viewing conditions: intact, inverted, grayscale, inverted-and-grayscale, and images retaining mainly low- or high-spatial-frequency information. Statistical analyses indicate that holistic cues, color information, semantics, and saliency vary in importance across different types of attributes. Building on these insights, we develop an empirical model of human image perception and, motivated by it, design computational models that predict high-level image attributes. Extensive experiments demonstrate that understanding human visual perception helps create better computational models.
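The six viewing conditions described above can be generated with standard image transformations: grayscale conversion, vertical flipping (the "inversion" used in face-perception studies), and spatial-frequency filtering in the Fourier domain. The sketch below is illustrative only; the function names and the ideal (hard-cutoff) frequency mask are our assumptions, not the paper's exact stimulus-generation pipeline.

```python
import numpy as np

def to_grayscale(img):
    # img: H x W x 3 float array in [0, 1]; ITU-R BT.601 luma weights
    return img @ np.array([0.299, 0.587, 0.114])

def invert(img):
    # Flip the image upside down, as in face-inversion experiments
    return img[::-1]

def spatial_frequency_filter(gray, cutoff, keep="low"):
    # Keep only spatial frequencies below (keep="low") or above
    # (keep="high") `cutoff` cycles per image, using an ideal mask
    # applied in the Fourier domain.
    f = np.fft.fftshift(np.fft.fft2(gray))
    h, w = gray.shape
    yy, xx = np.ogrid[:h, :w]
    dist = np.hypot(yy - h / 2, xx - w / 2)
    mask = dist <= cutoff if keep == "low" else dist > cutoff
    return np.real(np.fft.ifft2(np.fft.ifftshift(f * mask)))
```

Because the low-pass and high-pass masks partition the frequency plane, the two filtered images sum back to the original grayscale image, which is a convenient sanity check on the implementation.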
Acknowledgements
This research is supported by the National Research Foundation, Prime Minister’s Office, Singapore, under its Strategic Capability Research Centres Funding Initiative. The authors want to thank Dr. Cheston Tan for his contribution to empirical modeling, and Dr. Ming Jiang, Dr. Seng-Beng Ho, and Dr. Tian-Tsong Ng for helpful discussions.
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Fan, S., Koenig, B.L., Zhao, Q. et al. A Deeper Look at Human Visual Perception of Images. SN COMPUT. SCI. 1, 58 (2020). https://doi.org/10.1007/s42979-019-0061-5