
US006084979A

United States Patent [19]          [11] Patent Number: 6,084,979
Kanade et al.                      [45] Date of Patent: *Jul. 4, 2000

[54] METHOD FOR CREATING VIRTUAL REALITY

[75] Inventors: Takeo Kanade, Pittsburgh, Pa.; P. J. Narayanan, Calicut, Kerala, India; Peter W. Rander, Pittsburgh, Pa.

[73] Assignee: Carnegie Mellon University, Pittsburgh, Pa.

[*] Notice: This patent issued on a continued prosecution application filed under 37 CFR 1.53(d), and is subject to the twenty year patent term provisions of 35 U.S.C. 154(a)(2).

[21] Appl. No.: 08/671,791
[22] Filed: Jun. 20, 1996

[51] Int. Cl. ................ G06T 17/00
[52] U.S. Cl. ................ 382/154; 382/285; 345/424; 345/425; 348/48
[58] Field of Search ........ 382/103, 106, 153, 154, 173, 275, 100, 285, 293; 345/125, 419, 420, 421, 422, 423, 424, 425, 426, 427, 358; 348/42, 47, 48, 552; 434/23, 26, 37, 47, 79, 32, 63; 463/31, 32, 33; 364/468.04

[56] References Cited

U.S. PATENT DOCUMENTS

4,860,123   8/1989   McCalley et al. ........ 358/342
5,065,252   11/1991  Yoshio et al. .......... 358/335
5,247,651   9/1993   Clarisse ............... 305/500
5,320,538   6/1994   Baum ................... 434/307
5,442,456   8/1995   Hansen ................. 358/432
5,452,435   9/1995   Malouf et al. .......... 395/500
5,499,146   3/1996   Donahue et al. ......... 360/33.1
5,617,334   4/1997   Tseng et al. ........... 364/715.02
5,619,337   4/1997   Naimpally .............. 386/83
5,675,377   10/1997  Gibas .................. 348/47
5,714,997   2/1998   Anderson ............... 348/39
5,745,126   4/1998   Jain et al. ............ 345/952
5,953,054   9/1999   Mercier ................ 348/50

OTHER PUBLICATIONS

Robert Skerjanc, "Combined Motion and Depth Estimation Based on Multiocular Image Sequences for 3DTV", SPIE vol. 2177, pp. 35-44, Jul. 1994.
Steven J. Gortler et al., "The Lumigraph", Computer Graphics Proceedings, Annual Conference Series, 1996, pp. 43-54, Apr. 1996.
N. L. Chang et al., "Arbitrary View Generation for Three-Dimensional Scenes from Uncalibrated Video Cameras", International Conference on Acoustics, Speech, and Signal Processing, vol. 4, pp. 2455-2458, May 1995.
(List continued on next page.)

Primary Examiner: Leo H. Boudreau
Assistant Examiner: Brian P. Werner
Attorney, Agent, or Firm: Kirkpatrick & Lockhart LLP

[57] ABSTRACT

A method of virtualizing reality, i.e., a method of creating virtual reality from images of a real event, is comprised of the steps of capturing a plurality of images of each time instant of a real event using a plurality of cameras positioned at a plurality of angles. Each image is stored as intensity and/or color information. A suitable internal representation is computed from these images and the information regarding the camera angles. An image of each time instant may be generated from any viewing angle using the internal representation of it. The virtual viewpoints could be displayed on a single TV screen or using a stereoscopic display device for a true three-dimensional effect. The event thus virtualized can be navigated through, and interacted with, like any virtual reality system.

14 Claims, 16 Drawing Sheets

Representative drawing: flowchart of the isosurface extraction process (FIG. 8; see the Sheet 6 summary in the drawing sheets below).
OTHER PUBLICATIONS (continued)

Narayanan et al., "Synchronous Capture of Image Sequences from Multiple Cameras", CMU Technical Report CMU-RI-TR-95-25, Dec. 1995.
Hilton, A., "On Reliable Surface Reconstruction from Multiple Range Images", University of Surrey Technical Report VSSP-TR-5/95, Oct. 1995.
David A. Simon et al., "Real-Time 3-D Pose Estimation Using a High-Speed Range Sensor", 1994 IEEE International Conference on Robotics and Automation, pp. 2235-2241, 1994.
Satoh et al., "Passive Depth Acquisition for 3D Image Displays", IEICE Trans. Inf. & Syst., vol. E77-D, No. 9, pp. 949-957, Sep. 1994.
Robert Skerjanc, "Combined Motion and Depth Estimation Based on Multiocular Image Sequences for 3DTV", Proceedings of SPIE, vol. 2177, pp. 35-44, Jul. 1994.
Steven J. Gortler et al., "The Lumigraph", SIGGRAPH 96, Aug. 4-9, 1996, pp. 43-54.
Leonard McMillan & Gary Bishop, "Plenoptic Modeling: An Image-Based Rendering System", SIGGRAPH 95, Aug. 6, 1995, pp. 39-46.
Roger Y. Tsai, "A Versatile Camera Calibration Technique for High-Accuracy 3D Machine Vision Metrology Using Off-the-Shelf TV Cameras and Lenses", IEEE Journal of Robotics and Automation, vol. RA-3, No. 4, Aug. 1987.
Masatoshi Okutomi, "A Multiple-Baseline Stereo", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 15, No. 4, Apr. 1993.
William E. Lorensen, Harvey E. Cline, "Marching Cubes: A High Resolution 3D Surface Construction Algorithm", Computer Graphics, vol. 21, No. 4, Jul. 1987.
Takeo Kanade, "User Viewpoint", MasPar News, vol. 2, No. 2, Nov. 1991.
Hugues Hoppe, Tony DeRose, Tom Duchamp, Mark Halstead, Hubert Jin, John McDonald, Jean Schweitzer, Werner Stuetzle, "Piecewise Smooth Surface Reconstruction", Computer Graphics, SIGGRAPH '94.
U.S. Patent, Jul. 4, 2000, Drawing Sheets 1-16 of 16 (6,084,979). The text recoverable from the flowchart and block-diagram figures is summarized below; the remaining sheets carry image and photographic figures and are described in the Brief Description of the Drawings.

Sheet 2, FIG. 4 (block diagram): extraction and storage; camera-centered triangle mesh model; surface extraction (32); graphic workstation; monitor (CRT).

Sheet 4, FIG. 6 (stereo structure extraction): (36) for each neighboring camera used for stereo computations; (37) map a WxW window of the reference image centered at the current pixel to the other image; (40) compute a measure of similarity and save it for each depth level; (41) for each depth level, add the similarity measures of all cameras; (42) the depth level for which the sum is minimum gives the pixel's depth.

Sheet 6, FIG. 8 (isosurface extraction): START: select first voxel, first image; (44) project voxel into depth image; (45) create triangle from the three pixels nearest the projection; (46) is the triangle flat enough?; (47) compute the signed distance between the triangle and the voxel; (48) add the signed distance to the current value at the current voxel; loop by moving to the next voxel, and to the next image (first voxel), while voxels and images remain; (53) zero-crossing extraction from the voxel values (isosurface extraction); End.

Sheet 7, FIGS. 10 and 11: extracted surface; camera center; depths 1, 2 and 3.

Sheet 10: FIG. 15. Sheet 16: FIG. 24.

Sheet 11, FIG. 16 (camera-centered model from a depth map): (66) start with an empty triangle mesh and step through each depth pixel; (67) consider the 2x2 section with the current pixel as the top-left corner; (68) form two triangles by connecting the top-right corner to the bottom-left; (69) compute the depth difference between the neighbors of each triangle; (70) difference greater than threshold? if yes, (72) discard the triangle; if no, (71) add the triangle to the mesh and go to the next depth pixel.

Sheet 13, FIG. 19 (view generation): get the position of the viewer and the viewing direction from the interface; find the reference and supporting camera-based models; render the image from the reference model; mark hole regions explicitly; replace the marked hole regions by rendering the supporting models.

Sheet 14, FIG. 20 (model selection): compute the angle between the viewing direction and each camera direction; (60) find the minimum angle less than 180 degrees and select that model as the reference; (61) check the triangles formed by the reference camera and two neighbors; (62) find the triangle intersected by the extension of the viewing direction; (63) select the corresponding neighboring camera models as supporting.
METHOD FOR CREATING VIRTUAL REALITY

This invention was made with support from the United States Government under Grant Number N00014-95-1-0591 awarded by the Department of the Navy. The United States Government has certain rights in the invention.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention is directed to the domain of virtual reality and, more particularly, to processes for creating virtual world models from real world images.

2. Description of the Background

Virtual reality and the concept of virtual worlds has received much attention in the popular press and captured the imagination of the public at large. The ability to navigate through a world seen only on your computer screen, or through a special headset or visor, opens the door for an incredible variety of experiences. Add to the ability to navigate through a virtual environment the capability of picking up objects, or otherwise interacting with objects found in the virtual environment, and the basis for the enthusiasm for the technology becomes readily apparent.

To date, navigable virtual environments with interactive objects embedded therein have been extremely simplistic due to the tremendous amount of modeling that is required to even begin to approximate realistic-looking virtual objects, much less a realistic-looking and realistically behaving virtual environment. When one considers size, color, and texture for even a simple object, together with how the appearance of the object changes from different vantages, it quickly becomes apparent that the process of creating just virtual objects is a daunting task. As a result, today's virtual environments and objects are either very simplistic looking, as they are created using simplistic CAD models, or are extremely expensive, and sometimes both.

One project jointly developed by Apple Computer, Inc., The Massachusetts Institute of Technology, and the University of Regina in Canada, called the "Virtual Museum Project", has produced a successful navigable video environment. The virtual museum is a computer-based rendering of a museum which contains various objects of interest. The user can interactively move through the virtual museum and approach individual objects which can be viewed. The objects can be selected and viewed from a variety of perspectives. A complete description of the Virtual Museum Project is found in "The Virtual Museum: Interactive 3D Navigation of a Multi-Media Database," by Miller, et al.

U.S. Pat. No. 5,442,546, entitled Method and Apparatus for Multi-Level Navigable Video Environment, issued on Aug. 15, 1995 to Hansen. The Hansen patent describes a multi-level apparatus to "navigate" an environment consisting of video data. Using a touch-sensitive screen, the user can navigate through the video database of offerings in an interactive manner. Once the viewer has maneuvered to a certain position, a separate data track, for example, data about a particular object, may be viewed.

Although such systems may seem similar to virtual reality systems, they can more accurately be characterized as a video-based hypertext interface to a network of video data. Although it is interactive, it brings up only previously stored scenes in response to user inputs. The systems lack the ability to allow the user to move freely within the environment because of the infinite number of perspectives, i.e., camera angles, that must have been recorded to enable the user to assume such positions. Similarly, images of objects are recorded from predetermined vantages, thereby limiting examination of the objects to just those pre-recorded images.

Apple Computer has developed a software product called QuickTime VR® which allows a user to navigate within a scene. Scenes can be made from photographs, video stills, or computer renderings. To photograph a QuickTime VR® scene, the photographer places a camera on a tripod and shoots a series of pictures, turning the camera thirty degrees after each exposure. The images are digitized and input to the QuickTime VR® software, which electronically warps the images, maps the overlapping features, and stitches the images together. Warping makes the stitching possible, and it may appear that images from intermediate viewing positions are generated, but the resultant images are incorrect and contain distortion. In effect, straight lines become curved. When you open a scene with the QuickTime VR® player, the player corrects for the distortion by unwarping the part of the image being displayed. As you move around, the part of the image being displayed changes so as to keep up with your movements. With respect to objects, objects are composed of a large number of images, all taken from a slightly different angle. As you turn the object or tilt it up and down virtually, the QuickTime VR® software responds to your movements and displays the appropriate images.

The fundamental limitation of these approaches is that they do not have three-dimensional models of the environment. The patent to Hansen and the Apple QuickTime VR software overcome the problems associated with attempting to model complicated objects and environments by simply storing large numbers of images. However, they both suffer from the limitation that what can be viewed by the user is limited to the images that are pre-recorded. Accordingly, the need exists for a method which combines the ability to quickly capture a tremendous amount of detail in the way which photographs can capture detail, but which allows the user to view the objects from an infinite variety of vantage points the way a mathematically-modeled object or environment can be viewed.

SUMMARY OF THE INVENTION

The present invention is directed to a method of virtualizing reality, i.e., a method of creating virtual reality from images of a real event. The method is comprised of the steps of capturing a plurality of images of each time instant of a real event using a plurality of cameras positioned at a plurality of angles. Each image is stored as intensity and/or color information (hereinafter "image information"). A suitable internal representation is computed from these images and the information regarding the camera angles. An image of each time instant may be generated from any viewing angle using the internal representation of it. The virtual viewpoints could be displayed on a single TV screen or using a stereoscopic display device for a true three-dimensional effect. The event thus virtualized can be navigated through, and interacted with, like any virtual reality system.
One embodiment of the internal representation is in terms of a plurality of camera-centered descriptions, i.e., explicit depth information in association with the intensity information from the camera angle of the scene. These may be obtained using a passive depth recovery method like stereo. The view generation in this embodiment is comprised of a method to select the best camera-centered description for a given viewing angle, plus one or two supporting descriptions to fill the holes generated by moving away from the reference camera's viewpoint.

Another embodiment of the internal representation is in terms of a single object-centered model describing the three-dimensional structure of the event. One way to obtain such a model is by merging a plurality of depth information computed from a plurality of camera angles. The view generation strategy in this embodiment comprises a standard rendering tool, similar to those used in virtual reality. Object-centered models, like CAD models, can be modified, interacted with, and transported to novel situations.

The present invention is also directed to an apparatus for creating virtual reality. The apparatus is comprised of a plurality of cameras supported by a frame which enables the cameras to be oriented at any desired angle with respect to the object, event, or environment being virtualized. A first circuit generates a signal for synchronizing the operation of all the cameras. A second circuit generates a time stamp signal having a plurality of values. VCR's or other storage devices are provided for capturing a sequence of images from each of the plurality of cameras, with each of the images being associated with a value of the time stamp signal. An internal representation, of the type described above, is prepared from the captured images. A view generator may be used to reconstruct an object, event, or environment from any viewing angle from the internal representation.

The present invention overcomes two of the primary problems found in the prior art. In the present invention, the virtual world is created from images taken by cameras. Because of that, the virtual world has the fine detail which is found in the real world. However, because the information is stored as some form of internal representation, view generators, such as graphic workstations, can generate a view of the virtualized image from any angle or viewing position. That frees the user to explore the virtualized world from virtually any vantage point, not just the prerecorded vantage points. That permits the fine detail captured by the cameras to be viewed in a way that heretofore only CAD-modeled environments could be viewed. The present invention, therefore, represents a substantial advance over the prior art.

BRIEF DESCRIPTION OF THE DRAWINGS

For the present invention to be clearly understood and readily practiced, the present invention will be described in conjunction with the following Figures wherein:

FIG. 1 is a block diagram illustrating how images recorded by a plurality of cameras may be represented and viewed according to the teachings of the present invention;

FIG. 2 illustrates three different internal representations which can be used in conjunction with FIG. 1;

FIG. 3 illustrates a dome and camera arrangement used in conjunction with the method of the present invention;

FIG. 4 is a block diagram of the present invention;

FIGS. 5a-5e illustrate five images captured by five different cameras which can be used to compute one scene description;

FIG. 6 is a block diagram of a structure extraction process using an MBS technique;

FIG. 7 illustrates a depth map constructed from the images of FIGS. 5a-5e;

FIG. 8 is a block diagram of an isosurface extraction process;

FIG. 9 illustrates how two surface estimates contribute to a row of voxels;

FIG. 10 illustrates the extraction of the final surface at the zero crossings of the values accumulated in the voxels;

FIG. 11 illustrates projecting each voxel into the image plane; the range image is interpolated to compute the distance from the camera to the surface, from which the signed distance from the voxel to the surface is computed;

FIGS. 12a-12k illustrate eleven frames of a person swinging a bat;

FIGS. 13a and 13b illustrate the motion between image fields;

FIGS. 14a and 14b illustrate a typical range image and the corresponding intensity image, respectively;

FIGS. 15a-15l illustrate the sequence of extracted meshes from the eleven-frame sequence of FIGS. 12a-12k;

FIG. 16 is a block diagram of a process for computing a camera-centered model from a depth map;

FIG. 17 illustrates a reconstructed scene without discontinuity compensation;

FIG. 18 illustrates the reconstructed scene of FIG. 17 with the discontinuity omitted, i.e., a hole;

FIGS. 19 and 20 are block diagrams that describe the rendering scheme using multiple camera-centered models;

FIG. 21 illustrates the results of filling the holes of FIG. 18 using one supporting view;

FIGS. 22 and 23 illustrate the same baseball scene as shown in FIG. 21 but from viewpoints very different from the reference angle;

FIGS. 24a-24g illustrate seven scenes from a basketball sequence from the reference angle; and

FIGS. 25a-25g illustrate the same seven scenes as FIGS. 24a-24g, respectively, but from a synthetically-generated moving viewpoint.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

Introduction

The present invention is directed to a new visual medium which we refer to as virtualized reality. The visual media available today, e.g., paintings, photographs, moving pictures, television, video recordings, etc., share one common aspect. That common aspect is that the view of the scene is decided by a "director" while recording or transcribing the event, independent of the viewer. In virtualized reality, the selection of the viewing angle is delayed until view time, at which time the viewing angle is selected by the user.

To generate data for the virtualized reality medium, images are recorded using cameras positioned to cover the event from all sides. As used herein, images could be of discrete objects, environments, objects interacting with an environment, i.e., an event, etc. Each camera produces a series of images with each image being comprised of a plurality of pixels. The cameras used are passive in the sense that no light grid or other reference need be projected onto the image. A time-varying three-dimensional structure of the image, described in terms of the depth of each point in an image, and aligned with the pixels of the image, is computed for a few of the camera angles, preferably using a stereo method. The camera angles used for that computation are referred to as transcription angles. We refer to the combination of depth and the corresponding pixel information as a scene description. The collection of a number of scene descriptions, each from a different transcription angle, is called the virtualized world.
Once a real world object, environment, event, etc. has been virtualized, graphics techniques can render the virtualized model from any viewpoint. The scene description from the transcription angle closest to the viewer's position can be chosen dynamically for display in real-time by tracking the position and orientation of the viewer. The viewer, wearing a stereo viewing system, can freely move about in the virtualized world and observe it from a viewpoint chosen dynamically at view time. The depth information can be further manipulated to produce object-centered descriptions of everything within an image. Once such object-centered descriptions are produced, the objects are essentially reduced to CAD models, views of which can then be generated from any viewing angle.
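Concretely, a scene description pairs every pixel of the image information with a depth value from the same transcription angle. The following is a minimal sketch of how such a record might be held in memory; the use of NumPy arrays and the field and method names are illustrative assumptions, not taken from the patent.

```python
# A sketch of a "scene description": a depth map aligned pixel-for-pixel with
# the image information from one transcription angle, plus the calibration of
# the camera that produced it. Names and layout are illustrative assumptions.
from dataclasses import dataclass
import numpy as np

@dataclass
class SceneDescription:
    intensity: np.ndarray    # H x W (or H x W x 3 for color) image information
    depth: np.ndarray        # H x W depth map; depth[i, j] pairs with intensity[i, j]
    camera_pose: np.ndarray  # 4 x 4 camera-to-world transform (extrinsics)
    intrinsics: np.ndarray   # 3 x 3 projection matrix (focal length, aspect, center)

    def point_cloud(self):
        """Back-project every pixel to a 3-D point in the camera frame."""
        h, w = self.depth.shape
        fx, fy = self.intrinsics[0, 0], self.intrinsics[1, 1]
        cx, cy = self.intrinsics[0, 2], self.intrinsics[1, 2]
        j, i = np.meshgrid(np.arange(w), np.arange(h))
        z = self.depth
        x = (j - cx) * z / fx
        y = (i - cy) * z / fy
        return np.stack([x, y, z], axis=-1)
```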
A system 1 constructed according to the teachings of the present invention is illustrated in FIG. 1. The system 1 is comprised of a plurality of cameras 2 generating image information. Capturing the images is represented in FIG. 1 by a box 3 labeled scene transcription hardware. The information captured by the scene transcription hardware 3 is used by an internal representation generator 4 to create an internal representation. The internal representation generator 4 may be an appropriately programmed computer with the scene transcription data being supplied thereto. The type of scene transcription performed may be related to the intended internal representation. The internal representation is then used by a view generator 5 which produces an image, that may be viewed on an interactive screen 6, or a stereo pair, that may be viewed on a stereoscopic headset 7. The viewer provides position and other information, e.g., for picking up an object, through the interactive screen 6 or other type of input/interface device. The view generator 5 generates the view or stereo pair in response to the position input received from the user. The operation of the system illustrated in FIG. 1 is now further explained in conjunction with FIG. 2.

In FIG. 2, three different internal representations are illustrated. A plurality of images 9 are produced by the cameras 2. In the simplest internal representation, represented by arrow 10, a user 11 simply views one of the plurality of images 9 which corresponds most closely to the user's virtual location. In effect, arrow 10 represents the prior art in that the user 11 only sees prerecorded images. The method and apparatus of the present invention go beyond the prior art because of a unique scene transcription method and apparatus and because of the internal representations which may be produced. As shown in FIG. 2, an internal representation in the form of depth information is extracted from the plurality of images 9 which may be used to generate camera-centered models 12. The camera-centered models, i.e., the depth information, can be used to produce a view of an image which can be provided to the user 11 even when none of the plurality of images 9 represents the view seen from the virtual position of the viewer 11.

The method of the present invention may be carried a step further in which the internal representation 4 is implemented by creating object-centered models 13 from the depth information. When object-centered models are created, in effect CAD models of the objects are created. Thus, a view of the object from any viewing angle can be generated using standard CAD techniques. Both internal representations 12 and 13 enable the viewer 11 to move through the virtualized environment in a manner heretofore available only in systems which have been laboriously created using CAD models. Thus, the fine detail captured by cameras can be used to rapidly create a virtual world.

Scene Transcription

FIG. 3 illustrates a dome 15 which can be used to virtualize an event. Cameras 2 are placed all around the dome 15, providing views from angles surrounding the event. A prototype dome 15 has been built which is hemispherical in shape, five meters in diameter, and constructed from nodes of two types and rods of two lengths. In the prototype dome, fifty-one cameras have been provided. Fifty-one different depth maps for each scene are extracted using a clustering of neighboring cameras for each camera. The cameras are mounted on special L-shaped aluminum brackets which can be clamped onto the rods of the hemispherical dome anywhere on the rods of the dome.

To synchronously acquire a set of video streams, a single control signal can be supplied to the cameras to simultaneously acquire images and to the digitizing equipment to simultaneously capture the images. To implement that approach directly in digital recording hardware, the system would need to handle the real-time video streams from many cameras. For a single monochrome camera providing thirty images per second, 512x512 pixels per image with eight bits per pixel, the system would need to handle 7.5 Mbytes of image data per second. A sustained bandwidth to store the captured data onto a secondary storage device is beyond the capabilities of typical image capture and digital storage systems, even with the best loss-less compression technology available today. For example, our current system, a Sun Sparc 20 workstation with a KT model V300 digitizer, can capture and store only about 750 Kbytes per second. Specialized hardware could improve the throughput but at a substantially higher cost. Replicating such a setup to capture many video channels simultaneously is prohibitively expensive.
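The 7.5 Mbytes per second figure quoted above follows directly from the stated camera parameters; a quick check of the arithmetic:

```python
# One monochrome camera: 512 x 512 pixels, 8 bits (1 byte) per pixel, 30 images/s.
bytes_per_image = 512 * 512 * 1
bytes_per_second = bytes_per_image * 30    # 7,864,320 bytes per second
print(bytes_per_second / 2**20)            # ~7.5 Mbytes per second per camera
```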
We developed an off-line system, illustrated in FIG. 4, to synchronously record frames from multiple cameras 2. The cameras 2 are first synchronized to a common sync signal produced by a sync signal generator 16. The output of each camera 2 is time-stamped with a common Vertical Interval Time Code (VITC) produced by a time code generator 17 and recorded on tape using one VCR 18 for each camera 2. The resulting tapes 20 thus have each frame time stamped. The tapes 20 are digitized individually off-line in A/D device 22, which comprises a frame grabber, analog to digital converter, and software that interprets the VITC time code embedded in each frame. We can capture all frames of a tape 20 by playing the tape as many times as the speed of the A/D device 22 necessitates. The time code also allows us to correlate the frames across cameras, which is crucial when transcribing moving events. Interested readers can refer to a separate report by Narayanan et al. entitled "Synchronizing and Capturing Every Frame from Multiple Cameras", Robotics Technical Report, CMU-RI-TR-95-25, Carnegie Mellon University, 1995, which is hereby incorporated by reference, for more details on the synchronous multi-camera recording and digitizing setup. The digitized information is then stored in any suitable storage media 24. The components enclosed within the dotted box generally correspond to the box labelled scene transcription hardware 3 in FIG. 1.
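A short sketch of the frame-correlation step just described: each digitized frame carries the VITC time code recorded with it, and frames from all cameras that share a time code belong to the same time instant. The function and tuple layout are illustrative assumptions, not taken from the patent.

```python
# Group digitized frames by their VITC time stamp so that a stereo routine can
# be fed images from precisely the same time instant. Names are assumptions.
from collections import defaultdict

def group_frames_by_timecode(frames):
    """frames: iterable of (camera_id, vitc_timecode, image) tuples.
    Returns {vitc_timecode: {camera_id: image}}."""
    frames = list(frames)
    instants = defaultdict(dict)
    for camera_id, timecode, image in frames:
        instants[timecode][camera_id] = image
    # Keep only time instants for which every camera contributed a frame.
    cameras = {camera_id for camera_id, _, _ in frames}
    return {tc: imgs for tc, imgs in instants.items() if set(imgs) == cameras}
```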
Structure Extraction

FIGS. 5a-5e illustrate a frame as seen by five different cameras. An extraction technique, box 26 in FIG. 4, is performed on the stored image information. In the presently preferred embodiment, a stereo algorithm, which computes estimates of scene depth from correspondence among images of the scene, is used to extract depth information from the images of FIGS. 5a-5e. In FIG. 4, the line connecting storage device 24 with extraction technique box 26 is shown with a break to indicate that the extraction technique, which is implemented in software in the preferred embodiment, may be performed on a machine physically distinct from the storage device 24.
In the preferred embodiment of the present invention, steps 35 through 42 of a multi-baseline stereo (MBS) technique, shown in FIG. 6, are used to extract a three-dimensional structure from the multi-camera images of FIGS. 5a-5e. The MBS extraction technique is well known in the art, see Okutomi and Kanade, "A multiple-baseline stereo", IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(4):353-363 (1993), which is hereby incorporated by reference, such that it need not be described herein.

The present invention is not intended to be limited to the use of the MBS technique or even limited to a stereo technique. The choice of the MBS technique was motivated primarily by two factors. First, the MBS technique recovers dense depth maps, i.e., a depth estimate corresponding to every pixel in the intensity images, which is useful for image reconstruction. Second, the MBS technique takes advantage of the large number of cameras which we are using for scene transcription to thereby increase precision and reduce errors in depth estimation.

In our embodiment of the MBS technique, rather than send the MBS-computed depth maps directly on to the next processing stage, we manually edit the depth map to correct for errors that occur during automatic processing. While a good window size helps by reducing the number of errors to be corrected, it is less important in our approach because the user has the opportunity to correct the errors in the depth maps. Such manual editing, however, is not necessary. Rather, it is implemented in our embodiment to improve quality.

Before the MBS technique can be performed, the cameras must be calibrated. We perform both intrinsic and extrinsic camera calibration to obtain epipolar line constraints using an approach from Tsai. See R. Tsai, "A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses", IEEE Journal of Robotics and Automation, 3(4):323-344, 1987, which is hereby incorporated by reference. That approach accounts for camera pose in a world coordinate system (3D rotation and translation), scaling (focal length and aspect ratio), shifting of the image (image center), and lens distortion (a single coefficient for radial lens distortion). Rather than calibrate all parameters of each camera at the same time, we separate the process into two steps, one for intrinsic parameters (focal length, aspect ratio, image center, and lens distortion), and the second for extrinsic parameters (rotation and translation). In the first step, we have each camera image a calibration object with known locations of points in a 3D volume. We then extract the image projections of those 3D points. The calibration process adapts the camera model so that the 3D points project to the corresponding image points. When that process is complete, we position the cameras in the recording environment and perform the final calibration step, determining the camera pose relative to a coordinate system common to all the cameras. We calibrate using a portion of the lab floor visible to all cameras, having laid out marks on the floor with known separation to provide 3D calibration points as before. Using the recovered calibration, any point in the three-dimensional coordinate system of the reference camera can be mapped to a point in the three-dimensional coordinate system of any of the other cameras.
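A sketch of the camera-to-camera mapping that the recovered calibration makes possible follows. It assumes each camera k has been calibrated with a rotation R_k and translation t_k taking world coordinates into camera-k coordinates, plus a 3x3 intrinsic matrix K_k; this is a generic pinhole formulation offered for illustration, not the Tsai implementation itself, and it ignores the radial lens distortion coefficient.

```python
# Map a 3-D point from the reference camera's frame into another camera's
# frame and project it onto that camera's image plane. Generic pinhole model;
# names and conventions are assumptions, and lens distortion is omitted.
import numpy as np

def camera_to_camera(p_ref, R_ref, t_ref, R_other, t_other):
    """Map a 3-D point expressed in the reference camera's frame into the
    other camera's frame."""
    p_world = R_ref.T @ (p_ref - t_ref)    # reference camera -> world
    return R_other @ p_world + t_other     # world -> other camera

def project(p_cam, K):
    """Project a 3-D point in a camera's frame onto its image plane (pixels)."""
    uvw = K @ p_cam
    return uvw[:2] / uvw[2]
```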
To find correspondences, we again match a reference region to another image as a function of inverse depth c. To find the position in the second image corresponding to the inverse depth, we convert the reference point and inverse depth into a three-dimensional coordinate, apply the camera-to-camera mapping, and project the converted three-dimensional point into the other image. As with the parallel camera configuration, the full search is conducted by matching each reference image point to the other images for each possible c. We then add the match error curves from a set of image pairs and search for the minimum of the combined error function.
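The following is a simplified sketch of that multi-baseline search: for each candidate inverse depth, the reference pixel is back-projected, mapped into every other camera in the cluster, and a window match error is computed; the error curves of all image pairs are summed and the minimum of the combined curve gives the pixel's depth. A sum-of-squared-differences window match is used here only as an illustrative similarity measure, and the helpers reuse the camera_to_camera and project functions sketched in the calibration example above; all names and parameters are assumptions.

```python
# Multi-baseline depth search for a single reference pixel (illustrative only).
import numpy as np

def window(image, u, v, half=2):
    u, v = int(round(u)), int(round(v))
    return image[v - half:v + half + 1, u - half:u + half + 1].astype(float)

def pixel_depth(ref_img, ref_uv, others, K_ref, pose_ref, inv_depths):
    """others: list of (image, K, (R, t)) for the other cameras in the cluster.
    pose_ref: (R_ref, t_ref). Returns the depth minimizing the summed error."""
    u, v = ref_uv
    K_inv = np.linalg.inv(K_ref)
    ref_win = window(ref_img, u, v)
    best_depth, best_err = None, np.inf
    for inv_d in inv_depths:
        p_ref = K_inv @ np.array([u, v, 1.0]) / inv_d   # point at depth 1/inv_d
        err = 0.0
        for img, K, (R, t) in others:
            p_other = camera_to_camera(p_ref, *pose_ref, R, t)
            uo, vo = project(p_other, K)
            err += np.sum((ref_win - window(img, uo, vo)) ** 2)  # SSD match error
        if err < best_err:
            best_depth, best_err = 1.0 / inv_d, err
    return best_depth
```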
FIG. 7 illustrates the depth map recovered by applying that approach to the input images shown in FIGS. 5a-5e. The depth map has seventy-four levels for a depth range of two meters to five meters. The depth map is stored in a manner such that it is stored in association with the image information, e.g., intensity map, for the image from which the depth information was extracted. The process could be carried out five times, sequentially using each of the five images of FIGS. 5a-5e as the reference image to obtain five slightly different depth maps, with each depth map being stored in association with the intensity map for the reference image for that iteration.

Stereo techniques used to extract the scene structure require images corresponding to precisely the same time instant from every camera to be fed to them to accurately recover three-dimensional scene structure. The need to virtualize every frame in video streams containing fast moving events potentially exists to satisfactorily reproduce the motion. Therefore, the facility discussed hereinabove under the heading Scene Transcription which is set up to acquire scene descriptions should cover the action from all angles and should have the capability to record and digitize every frame of each video stream synchronously.

Recovery of Dynamic Scene Structures

A scene description consists of a depth map providing a dense three-dimensional structure of the scene aligned with the image information, e.g., intensity map, of the scene. A point (i, j) in the depth map gives the distance of the intensity image pixel (i, j) from the camera. One type of internal representation that can be created from the scene descriptions is an object-centered model, which is created from the steps of image fusion 30 and surface extraction 32 as shown in FIG. 4. The purpose of the step of image fusion 30 is to integrate all the scene descriptions into one object-centered 3D model of the scene, a problem for which numerous algorithms have been proposed. With errors in the depth maps, however, the fusion algorithm must be robust, even when the errors are systematic (as is the case with stereo) rather than random. Based on its resilience to noise, simplicity of design, and non-iterative operation, we use a technique that is somewhat similar to several known techniques, e.g., A. Hilton, "On Reliable Surface Reconstruction From Multiple Range Images," Technical Report VSSP-TR-5/95, University of Surrey (October 1995), and H. Hoppe et al., "Piecewise Smooth Surface Reconstruction," Computer Graphics SIGGRAPH '94, 295-302 (1994), which are hereby incorporated by reference. A weight may also be attached to each range estimate, allowing easy incorporation of range estimate reliability into the fusion process.

The surface extraction step 32 is shown in greater detail in FIG. 8. At step 44, each voxel in an object-centered 3D space is projected onto a tessellated surface which results from the fusion process. A triangle is created at step 45 from the three pixels nearest to the projection. If the triangle is sufficiently flat as determined by decision step 46, the weighted, signed distance between the triangle and the voxel is computed at step 47 and accumulated at step 48. The accumulation process is shown in FIG. 9. After accumulating across all voxels and images as shown by steps 49, 50, 51, 52, the voxels implicitly represent the surface by the zero crossings of their values. By finding the zero crossings, which is performed at step 53, the surface is extracted as shown in FIG. 10.

This process, implicit surface (or isosurface) extraction, is well studied and has standard solutions such as the marching cubes algorithm, W. Lorensen and H. Cline, "Marching Cubes: A High Resolution 3D Surface Construction Algorithm," Computer Graphics SIGGRAPH '87, 163-170 (July 1987); J. Bloomenthal, "An Implicit Surface Polygonizer," Graphics Gems IV, ed. P. Heckbert, 324-349 (1994) (ftp://ftpgraphics.standard.edu/pub/Graphics/GraphicsGems/GemIV/GGems.IV.tar.Z), both of which are hereby incorporated by reference. Such algorithms generate 3D triangle mesh representations of the implicit surfaces.

In our approach, we allow the algorithm to adjust all voxels in front of the surface as viewed from the sensor generating the surface. For voxels far in front of the surface, we clip the weighted, signed distance contribution of each viewpoint to a maximum so that this single view does not overwhelm all others in the fusion process. That modification gives significant improvement in the ability of the algorithm to reject the numerous outliers in our images, while not significantly degrading the recovered shape. In summary, we forward map (i.e., project) each voxel into the image using the known camera models and interpolate depth in the images, as shown in FIG. 11.

We tested our system on an 11-frame image sequence captured by 51 cameras mounted on a geodesic dome five meters in diameter. The scene contained one person swinging a baseball bat, as shown in the sequence of images in FIGS. 12a-12k. The 11 frames represent a sampling rate of six images per second, sufficient to capture the dominant motion in the scene. All 30 images per second could have been captured using our system, but we limited the initial tests to the slower sampling for practical considerations. Note that the sampling interval was not 1/6 second but rather 1/60th second, corresponding to one field of an NTSC video frame. Sampling for 1/6 second would have caused tremendous motion blur, making the images relatively useless. Each camera underwent a two-stage calibration process, as discussed above. Each camera had fixed gain and their shutters set to open for the full sampling interval (one field time, or 1/60 second). The lenses were set to manual focus and iris control, and adjusted to bring the scene into reasonable focus and intensity. The images from each camera have approximately 90 degrees horizontal by 70 degrees vertical field of view.

To capture the images, we used the synchronized multi-camera recording system discussed earlier, which captures NTSC video onto S-VHS VCRs. For this test, we determined that we only needed images at approximately 6 Hz, so we digitized the necessary frames, at 490 rows by 640 columns resolution, using the approach of repeatedly playing the videotape until the necessary frames were captured for each camera. Because NTSC video is interlaced, the even and odd rows of each NTSC frame are sampled one NTSC field time apart (1/60 second). The change in dynamic scenes during that time (as shown in FIGS. 13a and 13b) may be significant for stereo processing, so we separated each video frame into the two fields, each with resolution 245x640, and discarded the second field. We then smoothed and subsampled the first field by a factor of two horizontally, resulting in images of resolution 245x320. These are the images shown in FIGS. 12a-12k.

With all of the images collected, we next computed range images using the MBS technique. We grouped each camera with all of its immediate neighbors to create clusters of cameras wherein each cluster contained four to seven cameras. For each image in each sequence, we used these clusters to compute range images at 245x320 resolution, with depth resolution ranging from 155-264 levels, depending on the actual geometric configuration of each cluster. The depth search range began at 750 mm and ended at 5000 mm from each reference camera, measured along its optical axis. A typical range image and the corresponding intensity image are shown in FIGS. 14a and 14b, respectively.

With stereo processing complete, we passed the computed range images directly to the fusion process. Although the process can easily combine reliability estimates into the fusion process, we eliminated all weights for simplicity, thus treating all samples equally. The fusion process was run over the same 3D volume (a space of 6000 mm x 6000 mm x 6000 mm, which includes all of the sensor positions in addition to the scene) for each time frame. The process used 300x300x300 voxels to represent the space, corresponding to a voxel size of 20 mm x 20 mm x 20 mm.

Note that the voxels in our approach are treated independently, allowing us to process individual voxels or groups of voxels separately. This freedom allows us to group the voxels and the range images for effective use of available memory and of parallel hardware. By allocating only a few planes of voxels at a time, for example, we can greatly reduce memory usage during fusion. By distributing voxel planes over a number of processors, or completely independent computers, we easily achieve substantial parallel speedups. In this initial set of experiments, we decomposed several large voxel spaces into sets of planes and processed the sets on independent computers. That strategy provides nearly linear speedups with the number of computers used.

With the fusion process complete, we next applied the aforementioned Bloomenthal implementation of the Marching Cubes algorithm to extract the isosurface representing the scene structure. We then decimated the mesh using a simple mesh decimation program. The sequence of meshes is shown in FIGS. 15a-15l, where we have removed the background so that the person is more clearly visible. The player is clearly extracted from each time instant, although the bat is recovered poorly because the voxel size (2 cm on a side) is nearly as large as the bat. This behavior highlights the need for voxels to be much smaller than the smallest feature to be extracted from the scene.

Returning to FIG. 4, another type of internal representation is in terms of a plurality of camera-centered models 33, each computed from a scene description. FIG. 16 demonstrates how to convert the depth map into a camera-centered model. Starting at step 66, we generate a triangle mesh at steps 67, 68 by converting every 2x2 section of the depth map into two triangles. Table I illustrates how the mesh is defined. The (x, y, z) coordinates of each point in the image are computed from the image coordinates and the depth using the intrinsic parameters of the imaging system. Each vertex of the triangle also has a texture coordinate from the corresponding intensity image. That method results in 2x(m-1)x(n-1) triangles for a depth map of size m x n. The number of triangles for the depth map shown in FIG. 7 is approximately 200,000. Though this is a large number of triangles, the regularity makes it possible to render them efficiently on graphics workstations.
TABLE I
Triangle Mesh and Texture Coordinate Definition

A 2x2 block of depth pixels, with texture coordinates (u1, v1), (u2, v2), (u3, v3), (u4, v4) and corresponding three-dimensional points (x1, y1, z1), (x2, y2, z2), (x3, y3, z3), (x4, y4, z4), defines two triangles:

Triangle 1:
  Vertex 1: (x1, y1, z1), texture coord: (u1/m, v1/n)
  Vertex 2: (x2, y2, z2), texture coord: (u2/m, v2/n)
  Vertex 3: (x3, y3, z3), texture coord: (u3/m, v3/n)
Triangle 2:
  Vertex 1: (x2, y2, z2), texture coord: (u2/m, v2/n)
  Vertex 2: (x3, y3, z3), texture coord: (u3/m, v3/n)
  Vertex 3: (x4, y4, z4), texture coord: (u4/m, v4/n)

After the triangles have been formed, the depth difference between neighbors of each triangle is computed at step 69. If the difference is greater than a predetermined threshold, as determined by decision step 70, the triangle is discarded at step 72. If the difference is less than the predetermined threshold, the triangle is added to the mesh and the process proceeds to the next depth pixel as shown by step 71.

The number of triangles in a scene description can be reduced by adopting an algorithm developed by Garland and Heckbert that simplifies a general dense elevation/depth map into planar patches. See M. Garland and P. S. Heckbert, "Fast Polygonal Approximation of Terrains and Height Fields", Computer Science Tech Report, CMU-CS-95-181, Carnegie Mellon University, (1995), which is hereby incorporated by reference. The algorithm computes a triangulation using the smallest number of vertices given a measure for the maximum deviation from the original depth map. The procedure starts with two triangles defined by the outer four vertices. It repeatedly grows the triangle mesh by adding the vertex of maximum deviation and the corresponding triangle edges until the maximum deviation condition is reached. Using that technique, we have reduced mesh size by factors of 20 to 25 on typical scenes without affecting the visual quality of the output.

The rendering using the mesh described in Table I creates artificial surfaces at depth discontinuities. The decision step 70 of FIG. 16 is intended to correct that. If the triangle is discarded in the "yes" path, there will be no artificial surfaces, but there will be holes. In FIG. 17, for instance, the person and the wall appear to be connected. We, therefore, delete those artificial surfaces by not rendering the triangles that overlap discontinuities, resulting in "holes" as seen in FIG. 18. Fixing such holes is discussed in the view generation process next discussed.

View Generation

The present invention is capable of synthesizing an event, object, or environment from arbitrary viewing angles using the internal representations previously discussed. To render an object, event, or environment from viewing angles other than those where cameras are actually located from scene descriptions, a graphic workstation 28 (FIG. 4) may be provided to perform the function of the view generator 5 of FIG. 1. The graphic workstation 28 is preferably used in conjunction with the interactive monitor 6. The graphic workstation 28 may be physically part of the same machine storing the internal representation, or it may simply be in communication with the device storing the representation through a modem, broadcast channel, or any other suitable means of communication.

In general, the scene description is translated into an object type, such as a polygonal mesh. The image information, i.e., intensity and/or color information, is texture mapped onto the polygonal mesh, thereby generating visually realistic images of the scene. Graphic workstations have been chosen because of their specialized hardware to render images quickly. For example, a Silicon Graphics Onyx/RE2 can render close to one million texture-mapped triangles per second.

The following description explains in detail how views are generated from a plurality of camera-centered models. The reader should understand that the following description is one technique for creating realistic renderings from the stored information and that the present invention is not limited to the disclosed embodiment.

As shown in FIG. 19, the first step 56 in the view generation process is to determine the virtual position of the viewer. The next step, step 57, is to find the reference and supporting camera-centered models. The details of step 57 are shown in FIG. 20.

In FIG. 20, the first step, step 59, is to compute the angle between the viewing direction and each camera direction. Thereafter, at step 60, the model having a minimum angle less than 180 degrees is selected as the reference and the associated camera selected as the reference camera. At step 61, triangles formed by the reference camera and two neighboring cameras are checked, and the triangle intersected by an extension of the viewing angle is found at step 62. Finally, at step 63, the corresponding neighboring camera models are selected as the supporting models.

Returning to FIG. 19, the reference model is rendered at step 73. Step 73 may be performed by commercially available graphic workstations. The workstation, upon producing a triangular mesh, uses a texture mapping technique to paste an intensity and/or a color image onto the rendered polygon to generate a visually realistic image of the scene from an arbitrary viewing angle. The holes in the image are explicitly marked in step 74. The marked hole regions are filled in by corresponding renderings from the supporting models at step 75 to produce the image shown in FIG. 21.

There are at least two reasons for relying upon supporting models for generating new views. The first reason is to fill in holes caused by depth discontinuities. The second reason is that the intensity image used for texturing gets compressed or stretched when the viewing angle is far from the reference angle, resulting in poor quality of the synthesized image. If the viewer strays too far from the starting position, selection of the most direct reference angle for each viewing angle minimizes such degradation.
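The following sketch pulls together the selection of FIG. 20 and the hole-filling passes of FIG. 19: the reference model is the one whose camera direction makes the smallest angle with the desired viewing direction, the reference is rendered first, and the supporting models are rendered only into the marked hole pixels. The triangle-intersection test of steps 61-62 is simplified here to picking the two next-closest cameras, and render_model is assumed to exist and to return an image plus a boolean hole mask; both are illustrative assumptions, not the patent's exact procedure.

```python
# Reference/supporting model selection and two-pass hole filling (sketch).
import numpy as np

def select_models(view_dir, camera_dirs):
    """view_dir: unit 3-vector; camera_dirs: list of unit 3-vectors, one per
    camera-centered model. Returns (reference_index, [supporting indices])."""
    view_dir = np.asarray(view_dir, dtype=float)
    dirs = np.asarray(camera_dirs, dtype=float)
    angles = np.arccos(np.clip(dirs @ view_dir, -1.0, 1.0))  # angle to each camera
    order = np.argsort(angles)
    return int(order[0]), [int(order[1]), int(order[2])]

def render_view(reference, supporting, viewpoint, render_model):
    image, holes = render_model(reference, viewpoint)        # first pass
    for model in supporting:                                  # subsequent passes
        fill, fill_holes = render_model(model, viewpoint)
        usable = holes & ~fill_holes                          # only fill marked holes
        image[usable] = fill[usable]
        holes &= fill_holes                                   # pixels still unfilled
    return image, holes                                       # holes: occluded everywhere
```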
One way to fill in holes is to combine the scene descriptions from all transcription angles ahead of time to generate a model of the scene that contains all of the necessary detail. Several methods are available to register and model objects from multiple range images. See Hoppe, et al., "Piecewise Smooth Surface Reconstruction," Computer Graphics, SIGGRAPH '94, 295-302 (1994), which is hereby incorporated by reference. Such a consolidated model attempts to give one grand description of the entire world. We require only the best partial description of the world visible from a particular viewing angle at any time. Such a partial description is likely to be more accurate due to its limited scope. Inaccuracies in the recovery of the portion not seen will not affect it. Such a partial description is likely to be simpler than a consolidated model of the scene, lending easily to real-time view generation.

In our method, we prefer not to combine the triangle meshes generated using the reference and supporting scene descriptions into one triangle mesh. We render most of the view using the reference scene description in the first pass. While doing so, the pixels which would be mapped to holes, i.e., the pixels corresponding to triangles at depth discontinuities that we opt not to render, are identified and marked as discussed in conjunction with step 74 of FIG. 19. The view is rendered from the supporting scene descriptions in subsequent passes, limiting the rendering to those pixels in the identified holes. Comparing FIGS. 21 and 18, in FIG. 21 the background pattern and the right shoulder of the person have been filled properly. FIGS. 22 and 23 show the same baseball scene from viewpoints very different from the reference angle. The holes left in the image correspond to the portion of the scene occluded from both the reference and supporting transcription angles.

Image generation is more straightforward where the internal representation is an object-centered description. In that case, standard CAD modeling techniques can be used to render an image from any viewing angle.

The discussion to this point has been focused on generating a single, static scene. It is also possible to virtualize and generate moving scenes by virtualizing each frame separately. The resulting virtualized reality movie can be played with the viewer standing still anywhere in the virtual world and rendering each frame from the viewer's position. The virtualized reality movie can also be played with the viewer moving through the virtual world independently of the motion in the scene. FIGS. 24a-24g illustrate seven frames of a basketball sequence from the reference transcription angle while FIGS. 25a-25g illustrate the same seven frames from a synthetically-created moving viewpoint.

Conclusion

Virtualized reality, because it starts with a real world image and virtualizes it, allows viewers to move through a virtual world that contains all of the fine detail found in the real world. If sufficient cameras are provided for capturing the real world data, the scene can be viewed from any location by using a "soft" camera.

There are many applications of virtualized reality. Training can be made safer and more effective by enabling the trainee to move freely in a virtualized environment. A surgery, recorded in a manner in accordance with the teachings of the present invention, could be revisited by medical students repeatedly, viewing the surgery from positions of their choice. Tele-robotics maneuvers can be rehearsed in a virtualized environment providing tactile feedback so that the rehearsal feels every bit as real as the actual maneuver in the real world. True telepresence can be achieved by performing transcription and view generation in real time. An entirely new generation of entertainment media can be developed. Basketball enthusiasts, Broadway aficionados, etc. can be given the feeling of watching the event from their preferred seat, or from a seat that changes with the action.

While the present invention has been described in conjunction with preferred embodiments thereof, many modifications and variations will be apparent to those of ordinary skill in the art. The foregoing description and the following claims are intended to cover all such modifications and variations.

What is claimed is:

1. A method of creating virtual reality, comprising:
  positioning at least three cameras surrounding a visual object to be virtualized;
  synchronizing the operation of said at least three cameras;
  generating a plurality of digitized images for each sequence of shots of said visual object by a corresponding one of said at least three cameras;
  creating a plurality of discrete depth maps, wherein each depth map is extracted from an associated one of said plurality of digitized images;
  generating a plurality of weighted depth maps by attaching reliability values to each of said plurality of discrete depth maps; and
  preparing an object-centered model of said visual object including:
    fusing said each of said plurality of digitized images and a weighted depth map associated therewith, thereby creating a tessellated surface for said visual object, and
    extracting a virtualized surface for said visual object by selectively projecting each voxel in an object-centered three-dimensional space onto said tessellated surface and finding one or more voxels with zero-crossings.

2. The method of claim 1, further comprising storing said object-centered model.

3. The method of claim 1, further comprising manually editing at least one of said plurality of discrete depth maps prior to generating said plurality of weighted depth maps.

4. The method as in claim 1, wherein generating said plurality of digitized images includes:
  capturing a plurality of analog images from each of said at least three cameras in parallel;
  storing each of said plurality of analog images along with a time-stamp therefor; and
  generating said plurality of digitized images by digitizing said each of said plurality of analog images.

5. The method of claim 4, further comprising correlating each of said plurality of digitized images using time-stamps associated with corresponding analog images prior to creating said plurality of discrete depth maps.

6. The method of claim 1, wherein preparing said object-centered model includes:
  pre-computing visibility information for each shot of said visual object by each of said at least three cameras; and
  storing said visibility information as part of said object-centered model.

7. The method of claim 6, further comprising generating a three-dimensional virtualized image of said visual object from at least one viewing angle using said object-centered model thereof.

8. The method of claim 7, wherein generating said three-dimensional virtualized image includes rendering said three-dimensional virtualized image using CAD techniques.

9. The method as in claim 7, wherein generating said three-dimensional virtualized image includes:
  initially rendering a reference scene description of said three-dimensional virtualized image while omitting rendering of pixels that map to depth discontinuities; and
  subsequently rendering said pixels that map to depth discontinuities using supporting scene descriptions of said three-dimensional virtualized image.
10. A virtual reality system, comprising:
  at least three cameras positioned around a visual object to be virtualized;
  scene transcription hardware in communication with said cameras, wherein said scene transcription hardware includes:
    a sync signal generator in communication with said at least three cameras, wherein said sync signal generator produces a sync signal that synchronizes the operation of said at least three cameras,
    at least three storage devices, wherein each of said at least three storage devices is coupled to a corresponding one of said at least three cameras to record analog images supplied by said corresponding one of said at least three cameras on a respective magnetic storage medium,
    a time code generator coupled to said at least three storage devices, wherein said time code generator time stamps each of said analog images, and
    an A/D conversion device configured to operate on each said magnetic storage medium off-line, thereby converting said analog images into corresponding digitized images; and
  an internal representation generator in communication with said scene transcription hardware, said internal representation generator creating a plurality of discrete depth maps by extracting each depth map from an associated one of said digitized images, and wherein said internal representation generator is configured to create an object-centered model of said visual object to facilitate generation of a three-dimensional virtualized view thereof.

11. The system of claim 10, further comprising a view generator in communication with said internal representation hardware.

12. The system of claim 11, further comprising a stereoscopic headset in communication with said view generator.

13. The system of claim 11, further comprising a CRT screen in communication with said view generator.

14. The system of claim 10, wherein each storage device is a video cassette recorder and the respective storage medium therefor is a video cassette.
UNITED STATES PATENT AND TRADEMARK OFFICE
CERTIFICATE OF CORRECTION
PATENT NO. : 6,084,979 Page 1 of 1
DATED : July 4, 2000
INVENTOR(S) : Kanade et al.

It is certified that error appears in the above-identified patent and that said Letters Patent is
hereby corrected as shown below:

Title page,
OTHER PUBLICATIONS, delete "Acoutics" and replace therewith -- Acoustics --; and
in the Masatoshi Okutomi reference, delete "Stere" and replace therewith -- Stereo --.
Column 9
Line 14, delete "incorperated" and replace therewith -- incorporated --.
Line 16, after "In", insert -- our --.
Column 14
Line 55, after "dimensional", delete "visualized" and replace therewith -- virtualized --.

Signed and Sealed this


Twenty-eighth Day of May, 2002

JAMES E. ROGAN
Attesting Officer Director of the United States Patent and Trademark Office
