CS5670: Intro to Computer Vision
Instructor: Noah Snavely
Instructor
• Noah Snavely (snavely@cs.cornell.edu)
• Research interests:
– Computer vision and graphics
– 3D reconstruction and visualization of Internet photo collections
– Deep learning for computer graphics
– Virtual reality video
Today
1. What is computer vision?
2. Course overview
3. Image filtering
• Readings
– Szeliski, Chapter 1 (Introduction)
Every image tells a story
• Goal of computer vision: perceive the “story” behind the picture
• Compute properties of the world
– 3D shape
– Names of people or objects
– What happened?
The goal of computer vision
Can the computer match human perception?
• Yes and no (mainly no)
– computers can be better at “easy” things
– humans are much better at “hard” things
• But huge progress has been made
– Accelerating in the last 4 years due to deep learning
– What is considered “hard” keeps changing
Human perception has its shortcomings
Sinha and Poggio, Nature, 1996
But humans can tell a lot about a scene from a little information…
Source: “80 million tiny images” by Torralba, et al.
The goal of computer vision
• Compute the 3D shape of the world
The goal of computer vision
• Recognize objects and people
Terminator 2, 1991
slide credit: Fei-Fei, Fergus & Torralba
[Figure: image with labeled regions: sky, building, flag, face, banner, wall, street lamp, bus, cars]
slide credit: Fei-Fei, Fergus & Torralba
The goal of computer vision
• “Enhance” images
The goal of computer vision
• Forensics
Source: Nayar and Nishino, “Eyes for Relighting”
The goal of computer vision
• Improve photos (“Computational Photography”)
Low-light photography (credit: Hasinoff et al., SIGGRAPH ASIA 2016)
Super-resolution (source: 2d3)
Inpainting / image completion (image credit: Hays and Efros)
Why study computer vision?
• Billions of images/videos captured per day
• Huge number of useful applications
• The next slides show the current state of the art
Optical character recognition (OCR)
• If you have a scanner, it probably came with OCR software
Digit recognition, AT&T labs
http://www.research.att.com/~yann/
License plate readers
http://en.wikipedia.org/wiki/Automatic_number_plate_recognition
Sudoku grabber
http://sudokugrab.blogspot.com/
Source: S. Seitz
Automatic check processing
Face detection
• Nearly all cameras detect faces in real time
– (Why?)
Face Recognition
Who is she?
Source: S. Seitz
Vision-based biometrics
“How the Afghan Girl was Identified by Her Iris Patterns”
Source: S. Seitz
Leaf Recognition
Bird Identification
Merlin Bird ID (based on Cornell Tech technology!)
Special effects: camera tracking
Boujou, 2d3
Special effects: shape capture
The Matrix movies, ESC Entertainment, XYZRGB, NRC
Source: S. Seitz
Special effects: motion capture
Pirates of the Caribbean, Industrial Light and Magic
Source: S. Seitz
3D face tracking w/ consumer cameras
Snapchat Lenses
Face2Face system (Thies et al.)
Sports
Sportvision first down line
Nice explanation on www.howstuffworks.com
Source: S. Seitz
Vision-based interaction (and games)
Assistive technologies
Nintendo Wii has camera-based IR tracking built in. See Lee’s work at CMU on clever tricks for using it to create a multi-touch display!
Kinect
Smart cars
• Mobileye
• Tesla Autopilot
• Safety features in many high-end cars
Self-driving cars
Google Waymo
Robotics
NASA’s Mars Curiosity Rover
https://en.wikipedia.org/wiki/Curiosity_(rover)
Amazon Picking Challenge
http://www.robocup2016.org/en/events/amazon-picking-challenge/
Amazon Prime Air
Medical imaging
3D imaging: MRI, CT
Image guided surgery: Grimson et al., MIT
Source: S. Seitz
Virtual & Augmented Reality
6DoF head tracking
Hand & body tracking
3D scene understanding
3D-360 video capture
My own work
• Automatic 3D reconstruction from Internet photo collections
“Statue of Liberty”, “Half Dome, Yosemite”, “Colosseum, Rome”
Flickr photos
3D model
Photosynth
City-scale reconstruction
Reconstruction of Dubrovnik, Croatia, from ~40,000 images
Current state of the art
• You just saw examples of current systems.
– Most of these are less than 5 years old
• This is a very active and rapidly changing research area
– Many new apps in the next 5 years
• To learn more about vision applications and companies
– David Lowe maintains an excellent overview of vision companies
• http://www.cs.ubc.ca/spider/lowe/vision.html
Why is computer vision difficult?
Viewpoint variation
Scale
Illumination
Motion (Source: S. Lazebnik)
Intra-class variation
Background clutter
Occlusion
Challenges: local ambiguity
slide credit: Fei-Fei, Fergus & Torralba
But there are lots of cues we can exploit…
Source: S. Lazebnik
Bottom line
• Perception is an inherently ambiguous problem
– Many different 3D scenes could have given rise to a particular 2D picture
– We often need to use prior knowledge about the structure of the world
Image source: F. Durand
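The ambiguity can be made concrete with a small sketch. It assumes an ideal pinhole camera at the origin looking down the Z axis; the function name `project` and the `focal_length` parameter are illustrative choices, not something defined in these slides. Two different 3D points, one near and small-scale, one far and scaled up, land on exactly the same image location.

```python
def project(point, focal_length=1.0):
    """Pinhole projection of a 3D point (X, Y, Z) onto the image plane.

    Assumes an ideal pinhole camera at the origin looking along +Z
    (an illustrative model, not one specified in the slides).
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera (Z > 0)")
    return (focal_length * x / z, focal_length * y / z)


# A nearby point and a point four times farther along the same viewing ray.
near_small = (1.0, 0.5, 2.0)
far_large = tuple(4.0 * c for c in near_small)  # (4.0, 2.0, 8.0)

# Both print the same 2D coordinates: depth is lost in the projection.
print(project(near_small))
print(project(far_large))
```

Every point on the ray through the camera center projects to the same pixel, which is why a single image cannot determine 3D structure without prior knowledge or additional views.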
CS5670: Introduction to Computer Vision
Teaching Assistant
• Zhengqi Li (zl548@cornell.edu)
• Office hours:
When: TuTh 3:30 – 5pm
Where: Bear Hug
(starting next week)
Important notes
• Textbook:
Rick Szeliski, Computer Vision: Algorithms and Applications
online at: http://szeliski.org/Book/
• Course webpage:
http://www.cs.cornell.edu/courses/cs5670/2017sp/
• Announcements/grades via Piazza/CMS
https://piazza.com/class#fall2013/cs46705670
https://cms.csuglab.cornell.edu/
Course requirements
• Prerequisites—these are essential!
– Data structures
– A good working knowledge of Python programming
– Linear algebra
– Vector calculus
• Course does not assume prior imaging experience
– computer vision, image processing, graphics, etc.
Course overview (tentative)
1. Low-level vision
– image processing, edge detection, feature detection, cameras, image formation
2. Geometry and algorithms
– projective geometry, stereo, structure from motion, Markov random fields
3. Recognition
– face detection / recognition, category recognition, segmentation
4. Light, color, and reflectance
1. Low-level vision
• Basic image processing and image formation
Filtering, edge detection
[Figure: image * filter = filtered image]
Feature extraction
Image formation
Project: Hybrid images from image pyramids
[Figure: Gaussian pyramid levels at 1/2, 1/4, and 1/8 resolution]
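A Gaussian pyramid like the one pictured (1/2, 1/4, 1/8 resolution) can be sketched in a few lines. This is only an illustration under stated assumptions, not the project's reference solution: it uses plain Python lists instead of an image library, a 5-tap binomial kernel as a stand-in for a Gaussian, and edge replication at the borders. All function names are hypothetical.

```python
# 5-tap binomial kernel, a common approximation of a Gaussian (an assumption here).
KERNEL = (1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16)


def blur_1d(values):
    """Filter a 1D sequence with KERNEL, replicating the edge samples."""
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        for k, weight in enumerate(KERNEL):
            j = min(max(i + k - 2, 0), n - 1)  # clamp the index at the borders
            acc += weight * values[j]
        out.append(acc)
    return out


def blur(image):
    """Separable blur: filter every row, then every column."""
    rows = [blur_1d(row) for row in image]
    cols = [blur_1d(col) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def downsample(image):
    """Keep every second row and every second column."""
    return [row[::2] for row in image[::2]]


def gaussian_pyramid(image, levels):
    """Return [full, 1/2, 1/4, ...]: each level is the blurred, halved previous one."""
    pyramid = [image]
    for _ in range(levels):
        pyramid.append(downsample(blur(pyramid[-1])))
    return pyramid


# 16x16 checkerboard: the highest-frequency pattern the grid can hold.
image = [[float((r + c) % 2) for c in range(16)] for r in range(16)]
pyramid = gaussian_pyramid(image, 3)
sizes = [(len(level), len(level[0])) for level in pyramid]
print(sizes)  # [(16, 16), (8, 8), (4, 4), (2, 2)]
```

Blurring before subsampling is the important design choice: it removes detail that the coarser grid cannot represent, so the smaller levels do not alias.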
Project: Feature detection and matching
2. Geometry
Projective geometry
Stereo
Multi-view stereo
Structure from motion
Project: Creating panoramas
Project: Photometric Stereo
3. Recognition
Face detection and recognition
Single instance recognition
Category recognition
Sources: D. Lowe, L. Fei-Fei
Project: Deep Learning for Recognition
4. Light, color, and reflectance
Light & Color
Reflectance
Grading
• Occasional quizzes (at the beginning of class)
• One prelim, one final exam
– (considering final project instead of exam)
• Rough grade breakdown:
– Quizzes + class evaluation: ~5%
– Midterm: 15-20%
– Programming projects: 40-50%
– Final exam: 15-20%
Late policy
• Three free “slip days” will be available for the semester
• Late projects will be penalized by 5% for the first late day and 10% for each additional late day, and no extra credit will be awarded.
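The penalty arithmetic can be written out as a short sketch. The slide does not say how slip days and penalties interact, so this assumes that slip days absorb late days first, that the percentages are points deducted from the project score, and that the total is capped at 100. The function name and parameters are hypothetical.

```python
def late_penalty_percent(days_late, slip_days_used=0):
    """Penalty in percentage points for a project submitted late.

    Assumptions (not stated on the slide): slip days cancel late days
    one for one before any penalty applies, and the penalty is capped at 100.
    """
    if days_late < 0 or slip_days_used < 0:
        raise ValueError("day counts must be non-negative")
    penalized_days = max(days_late - slip_days_used, 0)
    if penalized_days == 0:
        return 0
    # 5% for the first penalized day, 10% for each day after that.
    return min(5 + 10 * (penalized_days - 1), 100)


for days in range(4):
    print(days, late_penalty_percent(days))  # 0 0, 1 5, 2 15, 3 25
```

Under these assumptions, a project that is four days late with all three slip days applied loses 5%.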
Academic Integrity
• Assignments will be done solo or in pairs (we’ll let you know for each project)
• Please do not leave any code public on GitHub (or the like) at the end of the semester!
• Please see the Cornell Code of Academic Integrity (http://cuinfo.cornell.edu/aic.cfm)
Questions?