0% found this document useful (0 votes)
48 views18 pages

Method AS

This document introduces the CWO++ library, a C++ class library for computational wave optics. The library allows for diffraction calculations, such as the angular spectrum method and Fresnel diffraction, to be performed on CPUs and GPUs. It provides classes for general diffraction calculations ("CWO") and point light source-based calculations for holograms and computer-generated holograms ("cwoPLS" and "gwoPLS"). The library aims to improve the performance of large-scale diffraction calculations through parallelization on GPU hardware.

Uploaded by

carlosj114
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
48 views18 pages

Method AS

This document introduces the CWO++ library, a C++ class library for computational wave optics. The library allows for diffraction calculations, such as the angular spectrum method and Fresnel diffraction, to be performed on CPUs and GPUs. It provides classes for general diffraction calculations ("CWO") and point light source-based calculations for holograms and computer-generated holograms ("cwoPLS" and "gwoPLS"). The library aims to improve the performance of large-scale diffraction calculations through parallelization on GPU hardware.

Uploaded by

carlosj114
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 18

Computational wave optics library for C++: CWO++ library

Tomoyoshi Shimobabaa,∗, Jian Tong Wenga , Takahiro Sakuraia , Naohisa Okadaa , Takashi Nishitsujia , Naoki Takadab , Atsushi
Shirakic , Nobuyuki Masudaa, Tomoyoshi Itoa
a Department of Artificial Systems, Graduate School of Engineering, Chiba University, 1-33 Yayoi-cho, Inage-ku, Chiba, Chiba 263-8522, Japan
b Department of Informatics and Media Technology, Shohoku College, 428 Nurumizu, Atsugi, Kanagawa, 243–8501 Japan
c Department of Imformation and Computer Engineering, Kisarazu National College of Technology, Kiyomi-dai Higashi 2-11-1, Kisarazu, Chiba, 292-0041 Japan
arXiv:1107.5578v1 [physics.optics] 27 Jul 2011

Abstract
Diffraction calculations, such as the angular spectrum method, and Fresnel diffractions, are used for calculating scalar light prop-
agation. The calculations are used in wide-ranging optics fields: for example, computer generated holograms (CGHs), digital
holography, diffractive optical elements, microscopy, image encryption and decryption, three-dimensional analysis for optical de-
vices and so on. However, increasing demands made by large-scale diffraction calculations have rendered the computational power
of recent computers insufficient. We have already developed a numerical library for diffraction calculations using a graphic pro-
cessing unit (GPU), which was named the GWO library. However, this GWO library is not user-friendly, since it is based on C
language and was also run only on a GPU. In this paper, we develop a new C++ class library for diffraction and CGH calculations,
which is referred as to a CWO++ library, running on a CPU and GPU. We also describe the structure, performance, and usage
examples of the CWO++ library.
Keywords:
Diffraction, Digital holography, Digital holographic microscopy, Graphics processing unit, GPGPU, GPU computing, Holography,
Real-time holography, Scalar light propagation

1. Introduction 13]. To reconstruct 3D object information from a hologram in a


computer, we need to calculate diffraction calculations. Appli-
Scalar light propagation is calculated using several diffrac- cations of digital holography include Digital Holographic Mi-
tion calculations. These calculations, e.g. the angular spectrum croscopy (DHM) [14, 15], Digital Holographic Particle Track-
method and Fresnel diffractions, are used in wide-ranging op- ing Velocimetry (DHPIV) [16, 17] and so forth.
tics fields [1, 2], ultrasonic [3], X-ray [4] and so on. In op- Phase retrieval algorithm retrieves phase information of an
tics, its applications include Computer Generated Holograms object light from intensity patterns captured by CCD camera.
(CGH), digital holography, phase retrieval, image encryption In optics, Gerchberg-Saxton (GS) algorithm [18] and modified
and decryption, steganography, three-dimensional (3D) analy- algorithms, e.g. Fresnel ping-pong [19] and Yang-Gu [20] algo-
sis for optical devices, Diffractive Optical Elements (DOE), and rithms, are widely used for phase retrieval. The algorithms are
so on. based on iterative optimization: namely, they gradually retrieve
In CGH and digital holography, diffraction calculations are phase information by calculating diffraction calculations be-
vital. CGH are generated by calculating a diffraction calcula- tween certain intensity patterns (normally more than two) while
tion from a 3D object to a hologram plane on a computer. If we subject to amplitude and phase constraints. The applications of
apply CGH to a 3D display technique [5, 6], the technique be- the algorithms include, for example, wavefront reconstruction
comes attractive because a wavefront reconstructed from CGH [21, 22, 23], holographic projection [24, 25, 26, 27, 28, 29] and
is almost equivalent to an object light. However, the computa- so on.
tional time required for the diffraction calculation involved in Diffraction based encryption and decryption [30, 31, 32],
CGH hampers the realization of a practical 3D display using and steganography [33] were proposed . Diffraction-based tech-
CGH. Many methods have thus been proposed for accelerating niques have an interesting feature, and handle not only 2D but
the computational time [7, 8, 9, 10, 11]. also 3D images.
Digital holography is a well-known method for electroni- In 3D analysis for optical devices, such as optical Micro
cally recording existing 3D object information on a hologram, Electro Mechanical Systems (MEMS), DOE and so on, we can
which is captured by a Charge Coupled Device (CCD) and Com- obtain 3D light distribution to stack multiple two-dimensional
plementary Metal Oxide Semiconductor (CMOS) cameras [12, (2D) diffraction calculations along depth-direction [34]. For ex-
ample, several research works have analyzed the optical charac-
∗ Correspondingauthor teristics of a Digital Micro-mirror Device (DMD), which is one
Email address: shimobaba@faculty.chiba-u.jp (Tomoyoshi of the MEMS devices, using Fresnel diffraction and the angular
Shimobaba)

Preprint submitted to Elsevier July 29, 2011


spectrum method [35, 36]. In this paper, we develop a C++ class library for computa-
As mentioned, diffraction calculations are practically used tional wave optics involved in diffraction and CGH calculations,
in wide-ranging optics fields. The former can also accelerate which is referred to as a CWO++ library, running on CPU and
computational time using the Fast Fourier Transform (FFT) al- GPU. The CWO++ library, unlike the GWO library, is devel-
gorithm; however, if we wish to realize real-time 3D recon- oped using C++ and its structure, performance, and usage ex-
struction from holograms in digital holography, generate a large- amples are described.
area CGH, and obtain 3D light field from an optical device, re- In Section 2, we describe diffraction calculations, the struc-
cent computers lack sufficient computational power. ture of the CWO++ library, the class “CWO” for diffraction
Using hardware is an effective means to further boost com- calculations, and the subclass “cwoPLS” for Point Light Source
putational speed for CGH and diffraction calculations. In fact, (PLS)-based diffraction and CGH calculations on a CPU. In
we showed dramatically increased computational power to de- Section 3, we describe the “GWO” and “gwoPLS” classes for
sign and build special-purpose computers for CGH targeting a diffraction and CGH calculations on a GPU. In Section 4, we
3D display and named HOlographic ReconstructioN (HORN), describe the implementation of the “CWO” and “GWO” classes.
in order to overcome the computational cost of CGH. The HORN In Section 5, we describe field types which are held in the
computers designed by pipeline architecture can calculate light classes. In section 6, we show the performance of diffraction
intensities on CGH at high speed [37, 38, 39, 40, 41, 42]. The and CGH calculations on a CPU and GPU. In Section 7, we
HORN computers were implemented on a Field Programmable show the applications of the CWO++ library to holography. In
Gate Array (FPGA) board, except HORN-1 and -2. To date, we Section 8, we conclude this work.
have constructed six HORN computers, which have been able
to attain several thousand times the computational speed of re- 2. Detail of the CWO++ library
cent computers. Researchers also developed a special-purpose
computer, FFT-HORN, in order to accelerate Fresnel diffraction The CWO++ library mainly consists of two C++ classes:
in DHPIV [16, 43]. The FFT-HORN was able to reconstruct CWO and GWO. The CWO class calculates diffraction calcu-
256 × 256 images from holograms, which were captured by a lations on a CPU, has auxiliary functions and allows us to cal-
DHPIV optical system, at a rate of 30 frames per second. The culate the following diffractions:
FPGA-based approaches for both CGH and diffraction calcu- 1. Fresnel diffraction(Convolution form)
lations showed excellent computational speed, but are subject 2. Fresnel diffraction(Fourier form)
to the following restrictions: the high cost of developing the 3. Shifted Fresnel diffraction
FPGA board, long development term, long compilation of the 4. Angular spectrum method
hardware description language and mapping to FPGA times, 5. Shifted angular spectrum method
and technical know-how required for the FPGA technology. 6. 3D diffraction calculation
Conversely, recent GPUs allow us to use as a highly paral- The first to fifth diffractions are primary diffraction calculations,
lel processor, because the GPUs have many simple processors, on which the sixth diffraction calculation is based. The above
which can process 32- or 64-bit floating-point addition, multi- diffraction calculations can also be calculated on a GPU using
plication and multiply-add instructions. The approach of accel- the GWO class.
erating numerical simulations using a GPU chip is referred to as Table 1 shows the structure of the CWO++ library. The
General-Purpose computation on GPU (GPGPU) or GPU com- class “CWO” is the top class of the CWO++ library, while the
puting. The merits of GPGPU include its high computational other classes, which are “GWO”, “cwoPLS” and “gwoPLS”,
power, the low cost of the GPU board, short compilation time, are inherited from “CWO”. The “cwoPLS” and “gwoPLS”
and the short development term. classes are for PLS-based diffraction and CGH calculations on
We have already developed a numerical library for diffrac- a CPU and GPU, respectively.
tion calculations using a GPU, which was named the GWO “cwoComplex” and “cwoObjPoint” are data structures and
library [44]. The purpose of the GWO library is to facilitate their auxiliary functions for complex number and object points
access to GPU calculation power for optics engineers and re- for PLS, respectively and are distributed as open-source codes.
searchers lacking GPGPU. The GWO library has already been
distributed via the Internet and used to report some papers. For 2.1. Diffraction calculation
example, we reported on a real-time DHM system [45], diffrac- Figure 1 shows a diffraction calculation by monochromatic
tion calculations in a computer-aided design tool for develop- wave, whose wavelength is λ, between a source plane (aperture
ing a holographic display [46], a fast CGH calculation [47, 48] function) u1 (x1 , y1 ) and a destination plane u2 (x2 , y2 ).
and a DHM observable in multi-view and multi-resolution [49]. The CWO++ library allows us to calculate FFT-based diffrac-
Moreover, researchers studied Airy beams generation and their tion calculations. In addition, diffraction calculations are cate-
propagation feature in the simulation using the GWO library gorized into convolution and Fourier forms. The former cat-
[50, 51]. However, the GWO library is not user-friendly be- egory includes Fresnel diffraction (convolution form), Shifted
cause it is based on C language, e.g. the library user must man- Fresnel diffraction, Angular spectrum method and Shifted an-
age the CPU and GPU memory allocation personally and so gular spectrum method. The latter category includes Fresnel
on. In addition, the library is run on only a GPU, namely the diffraction (Fourier form). In the following subsections, we de-
diffraction and CGH calculations are not calculated on a CPU. scribe these diffractions.
2
Table 1: Classes of the CWO++ library. They are distributed as open-source codes.

Class Role Parent class Related source files


Diffraction calculation cwo.h
cwo on CPU None cwo.pp
Diffraction calculation gwo.h
gwo on GPU cwo gwo.cpp
PLS-based diffraction
and CGH calculations cwoPLS.h
cwoPLS on CPU cwo cwoPLS.cpp
PLS-based diffraction
and CGH calculations gwoPLS.h
gwoPLS on GPU gwo gwoPLS.cpp
cwoComplex Complex number None cwo.h
cwoObjPoint PLS None cwo.h
cwoVect Vector operations None cwo.h

In the numerical implementation, we need to discretize each


spatial variable and use FFT instead of Fourier transforms as
follows: The discretizing space variables are (x1 , y1 ) = (m1 ∆x1 ,
n1 ∆y1 ), where ∆x1 and ∆y1 are the sampling pitches and m1 , n1
are integer indices on the source plane. The discretizing space
variables are (x2 , y2 ) = (m2 ∆x2 , n2 ∆y2 ), where ∆x2 and ∆y2 are
the sampling pitches and m2 , n2 are integer indices on the desti-
nation plane. The ranges of integer indices are as follows:

Nx Nx
− ≤ m1 , m2 < − − 1,
2 2
Ny Ny
− ≤ m1 , m2 < − −1 (4)
2 2
where, N x and Ny are the numbers of horizontal and vertical
pixels on the source and destination planes, respectively.
Figure 1: Diffraction calculation by monochromatic wave, whose wavelength is
λ, between a source plane (aperture function) u1 (x1 , y1 ) and a destination plane
The discretized Fresnel diffraction of the convolution form
u2 (x2 , y2 ). is as follows:
exp(i 2π
λ z)
  
2.1.1. Fresnel diffraction (convolution form) u2 (m2 , n2 ) = FFT −1 FFT u(m1 , m1 )
iλz (5)
The convolution form of Fresnel diffraction is expressed by:  
FFT hF (m1 , m1 )
exp(i 2π
Z Z +∞
λ z)
u2 (x2 , y2 ) = u1 (x1 , y1 )
iλz −∞ (1)
π π
2 2
exp(i ((x2 − x1 ) + (y2 − y1 ) ))dx1 dy1 hF (m, n) = exp(i ((m∆x1 )2 + (n∆y1 )2 )) (6)
λz λz
The above equation is the convolution form, and can be ex- Note that the sampling pitches on the destination planes
pressed relative to the following equation according to convo- are the same as those on the source plane after the diffraction,
lution theorem: namely ∆x2 = ∆x1 and ∆y2 = ∆y1 .
exp(i 2π
λ z) −1
    
u2 (x2 , y2 ) = F F u(x1 , y1 ) F hF (x1 , y1 ) (2) 2.1.2. The Fresnel diffraction (Fourier form)
iλz
We can obtain the Fourier form of the Fresnel diffraction to
where, the operators F [·] and F −1 [·] indicate Fourier and in-
expand the quadratic term in Eq. (1). The form is expressed by:
verse Fourier transforms, respectively, hF (x, y) is the impulse
response function (also known as the point spread function) of exp(i 2π λ z) π
Eq. (1) as follows, u(x2 , y2 ) = exp(i (x22 + y22 ))
iλz λz (7)
Z Z +∞

π u′ (x1 , y1 ) exp(−i (x2 x1 + y2 x1 ))dx1 dy1 ,
hF (x, y) = exp(i (x2 + y2 )) (3) −∞ λz
λz
3
where u′ (x1 , y1 ) is defined as, 2.1.4. Angular spectrum method
The angular spectrum method can be devised from the Helm-
(x21 + y21 )
u′ (x1 , y1 ) = u(x1 , y1 ) exp(iπ ). (8) holtz equation and is suitable for computing diffraction at short
λz distance, which is impossible for Fresnel and Shifted Fresnel
This form is rewritten by the Fourier transform. The dis- diffractions. The angular spectrum method is expressed by:
cretized Fourier form of Fresnel diffraction is expressed by:
ZZ +∞
exp(i 2π
λ z) π u(x, y) = A( f x , fy , 0)HA( f x , fy )
u(m2 , n2 ) = exp(i ((m2 ∆x2 )2 + (n2 ∆y2 )2 )) −∞ (15)
iλz λz (9)
  exp(i 2π( f x x + fy y))d f x d fy ,
FFT u′ (m1 , n1 )
where, f x and fy are spatial frequencies, A( f x , fy , 0) is defined
Note that the sampling spacing on the destination plane is by A( f x , fy , 0) = F [u(x1 , y1 )], and HA ( f x , fy ) is the transfer
scaled to ∆x2 = ∆x1 /(λz) and ∆y2 = ∆y1 /(λz). function,
q
2.1.3. Shifted Fresnel diffraction HA ( f x , fy ) = exp(iz k2 − 4π2 ( f x2 + fy2 )) (16)
Shifted-Fresnel diffraction [52] enables arbitrary sampling
pitches to be set on the source and destination planes as well where, k = 2π/λ is the wave-number.
as a shift away from the propagation axis, which is referred as The discretizing frequencies are ( f x , fy ) = (m1 ∆ f x , n1 ∆ fy ),
to off-axis propagation. The equation is derived from a com- where ∆ f x and ∆ fy are the sampling pitches on the frequency
bination of Fourier form Fresnel diffraction and scaled Fourier domain. Therefore, the discretized angular spectrum method is
transform [53]. The same methods were proposed in somewhat expressed by:
different areas [54, 55]. The discretized shifted Fresnel diffrac-    
tion is expressed by the following equations: u(m, n) = FFT −1 FFT u(m, n) HA (m1 , n1 ) . (17)
    
u2 [m2 , n2 ] = CS FFT −1 FFT u′1 [m1 , n1 ] FFT hS [m1 , n1 ] 2.1.5. Shifted angular spectrum method
The angular spectrum method calculates the exact solution
(10)
in scalar light propagation because it is derived from the Helmholtz
equation without using any approximation, but only calculates
light propagation when close to a source plane due to the alias-
exp(ikz) π ing problem. In addition, the angular spectrum method does not
CS = exp(i (x21 + y21 )) calculate off-axis propagation.
iλz λz
2π The shifted angular spectrum method [56] method enables
exp(−i (O x0 p x0 x1 + Oy0 py0 y1 )) exp(−iπ(S x m21 + S y n21 )) off-axis propagation by applying a band-limited function to Eq.
λz
(11) (16). In addition, although the angular spectrum method, as
mentioned, triggers an aliasing error when calculating a long
propagation distance [57], the shifted angular spectrum method
π 2
u′1 [m1 , n1 ] = u1 [m1 , n1 ] exp(i
(x + y20 )) overcomes the aliasing problem, making it an excellent means
λz 0 of performing diffraction calculations. The discretized shifted
exp(−i2π(m0S x O x1 + n0 S y Oy1 )) (12)
angular spectrum method is expressed by:
exp(−iπ(S x m20 + S y n20 ))   
u(m2 , n2 ) = FFT −1 FFT u(m1 , n1 ) HS A (m1 , n1 )
m1 − c x n 1 − cy  (18)
h[m, n] = exp(iπ(S x m2 + S y n2 )) (13) Rect( )Rect( )
wx wy
where, (O x0 , Oy0 ) and (O x1 , Oy1 ) are the shift distances away
from the propagation axis and S x , S y , x0 , y0 , x1 , y1 are defined where, HS A is the shifted transfer function of the same,
as follows:
HS A (m1 , n1 ) = HA (m1 , n1 ) exp(2π(O xm1 ∆ f x + Oy n1 ∆ fy ))
p x0 p x1 py py
Sx = , Sy = 0 1 , (19)
λz λz
x 0 = m0 p x 0 + O x 0 , y 0 = n 0 p y 0 + Oy 0 , (14)
Two rectangle functions are band-limit functions with hori-
x 1 = m1 p x 1 + O x 1 , y 1 = n 1 p y 1 + Oy 1 . zontal and vertical band-widths of w x , wy and shift amounts of
c x , cy . See Ref. [56] for the determination of these parameters.
For more details, see Ref. [52].

4
2.1.6. Summary of diffraction calculation where, the operators max{·} and min{·} take the maximum and
As mentioned, many numerical diffraction calculations have minimum values in the argument. Finally, we save the intensity
been proposed to date, a classification of which is shown in Ta- field with 256 steps as a bitmap file “lena512x512 diffract.bmp”.
ble 2. Each diffraction must be used in terms of the number of Of course, we can also save the intensity field in other image
FFTs (namely, which is proportional to the calculation time), formats. Figure 2 (b) shows the diffracted image.
required memory, propagation distance, sampling pitches and Note that the sample code does not explicitly indicate the
on- or off-axis propagation. For example, if we calculate the wavelength and sampling pitches on the source and destina-
diffraction calculation with different sampling pitches on source tion planes. The default wavelength and sampling rates are 633
and destination planes, we must use Shifted Fresnel diffraction. nm and 10µm × 10µm. If we change the wavelength and sam-
Note that, in the diffraction calculations of the convolution pling pitch, we can use the CWO member functions “SetWave-
form, the area of the source and destination planes must be ex- Length” and “SetPitch”.
panded from N x × Ny to 2N x × 2Ny during the calculation, be-
cause we avoid aliasing by circular convolution. After the cal-
culation, we extract the exact diffracted area on the destination
plane with the size of N x × Ny . Therefore, we need FFTs and
memories of size 2N x × 2Ny during the calculation.

2.2. CWO class: Simple example using the CWO++ library of


the Fresnel diffraction calculation on the CPU
The CWO class provides diffraction calculations and auxil-
iary functions. We used the FFTW library [58] in the diffraction
calculations in the CWO class.
We show the following source-code of the Fresnel diffrac-
tion calculation on a CPU using the CWO++ library:
Listing 1: Fresnel diffraction calculation on a CPU using the CWO class. Figure 2: (a) Original image with 512 × 512 pixels. (b) Diffracted image with
the wavelength of 633 nm and the sampling pitches of 10µm × 10µm on the
1 CWO c; source and destination planes.
2 c.Load("lena512x512.bmp");
3 c.Diffract(0.2, CWO FRESNEL CONV);
4 c.Intensity(); 2.3. Three-dimensional diffraction calculations
5 c.Scale(255); Certain methods for calculating a scalar 3D field were pro-
6 c.Save("lena512x512_diffract.bmp"); posed [34, 59]. In Ref. [59], the method can calculate a 3D
diffraction calculation in a low numerical aperture (NA) system
In line 1, we define the instance “c” of the CWO class. In
using once 3D-FFT, so that the method is effective in terms of
the next line, using the member function of “Load” in the CWO
computational cost and memory amount. In the CWO++ li-
class, we store an original image (Fig.2(a)) of a bitmap file
brary, we provide a calculation of a scalar 3D diffraction field
“lena512 x512.bmp” with 512×512 pixels into the instance “c”.
by stacking 2D diffraction calculations as mentioned in a depth
“Load” can also read jpeg tiff formats, and so forth. See more
direction [34]. Figure 3 shows a scalar 3D diffraction field
details in Appendix B. In line 3, the CWO member function
by stacking 2D diffraction calculations. Unlike 2D diffraction
“Diffract” calculates the diffraction calculation according to the
calculations, the 3D diffraction calculation requires a sampling
first and second arguments of the function, which are the propa-
pitch ∆z in a depth direction.
gation distance and type of diffraction calculation, respectively.
List 2 shows the 3D diffraction calculation from a hologram,
Here, we select the propagation distance of 0.2 m and Fresnel
on which two small circles are recorded by the angular spec-
diffraction of the convolution form (CWO FRESNEL CONV).
trum method. In the calculation, the conditions are as follows:
The calculation results in a complex amplitude field.
the number of pixels in 3D diffracted field is 512 × 512 × 512,
To observe the light intensity field of the complex amplitude
the horizontal and vertical sampling pitches are 10µm × 10µm,
field as image data, the initial step involves the CWO mem-
the sampling pitch in a depth direction ∆z is 1cm, the distance
ber function “Intensity” calculating the absolute square of the
between the two small circles and the hologram is 10cm and
complex amplitude field in line 4. Namely, the calculation is
the radii of the two small circles are 270µm. We need to set the
expressed by:
large sampling pitch in a depth direction compared with the hor-
I(m2 , n2 ) = |u2 (m2 , n2 )|2 . (20) izontal and vertical sampling pitches because the optical system
has a low NA.
In the next, the CWO member function “Scale” converts In line 1, we prepare two instances “c1” and “c2” because
the intensity field with a large dynamic range into that with 256 we need to maintain a hologram and a 3D diffracted field, in-
steps. dividually. In line 2, we load the hologram to the instance c1.
I(m2 , n2 ) − min{I(m2 , n2 )} In line 3, we allocate the memory of 512 × 512 × 512 pixels
I256 (m2 , n2 ) = × 255, (21)
max{I(m2 , n2 )} − min{I(m2 , n2 )}
5
Table 2: Classification of diffraction calculation. In the diffraction calculations of the convolution form, the area of the source and destination planes must be
expanded from Nx × Ny to 2Nx × 2Ny during the calculation, because we avoid aliasing by circular convolution. After the calculation, we extract the exact diffracted
area on the destination plane with the size of Nx × Ny . Therefore, we need FFTs and memories of size 2Nx × 2Ny during the calculation.

Sampling Sampling
pitch on pitch on Number
Propagation source destination Off-axis of Required
Diffraction distance plane plane calculation FFTs memory
Fresnel
diffraction
(Convolution form) Fresnel ∆x × ∆y ∆x × ∆y N.A. 3 2N × 2N
Fresnel
diffraction
∆x ∆y
(Fourier form) Fresnel ∆x × ∆y λz × λz N.A. 1 N×N
Shifted
Fresnel
diffraction Fresnel Arbitrary Arbitrary Available 3 2N × 2N
Angular
spectrum
method Short distance ∆x × ∆y ∆x × ∆y N.A. 2 2N × 2N
Shifted
angular
spectrum method All ∆x × ∆y ∆x × ∆y Available 2 2N × 2N

for the 3D diffraction field. In line 4, we set horizontal, verti- Listing 2: 3D diffraction calculation based on the stack of 2D diffracted planes.
cal and depth-direction sampling pitches, respectively. In line 1 CWO c1,c2;
5, the 3D diffraction field is calculated by the member func- 2 c1.Load("hologram.bmp");
tion “Diffract3D”. The first argument is set to the instance of 3 c2.Create(512, 512, 512);
the CWO class maintaining the hologram. The second argu- 4 c2.SetPitch(10e−6, 10e−6, 0.01);
ment is the distance z as shown in Fig. 3 between the hologram 5 c2.Diffract3D(a1, −0.1, CWO ANGULAR);
and 3D diffracted field. The third argument concerns the type 6 c2.Save("3d.cwo");
of diffraction calculation. In line 6, we save the calculated 3D
diffraction field as a cwo file, which includes the 3D diffraction
field in complex amplitude in binary form. Figure 4 shows the
volume rendering from the cwo file. In the left figure, we can
see the two circles reconstructed head-on in the volume. In the
right figure, we can see a different view of the reconstructed
two circles.

Figure 4: Volume rendering of the 3D diffraction calculation by List 2.

2.4. Calculations of a complex amplitude field and a computer


generated hologram from point light sources
Current CGH calculations are mainly categorized into two
approaches: polygon-based [10, 64] and Point Light Sources
(PLSs) approaches [7, 9, 61]. Polygon-based approaches are
based on diffraction calculations using FFTs. In the subsection,
we focus on the PLS approach, which treats a 3D object as a
composite of multiple PLSs, and generates a CGH from a 3D
object more flexibly than polygon-based approaches.
The class “cwoPLS” calculates a complex amplitude field
and CGH from PLSs. In Fig. 5, the calculations assume that a
Figure 3: Scalar 3D diffraction field by stacking 2D diffraction calculations.
3D object is composed of PLSs with a number of N.
6
CGH and a 3D object of 0.2 m. In line 7, the 3D object data of
the dinosaur is read, while in the next line, the coordinates of
the 3D object are normalized from -4 mm (-1000 pixels) to 4
mm (1000 pixels). In line 9, the CGH is calculated by Eq. (23),
and in the next line, the calculated CGH pattern is normalized
to 256 steps by Eq. (21).
Figure 5(a) and (b) show the CGH generated by List.3 and
the reconstructed image from (a) using the CWO class.
Listing 3: CGH generation using the class “cwoPLS”.
1 cwoPLS c;
2 float p=4e−6;
3 float z=0.2;
4 c.Create(2048,2048);
5 c.SetDstPitch(p,p);
6 c.SetOffset(2500∗p, −512∗p, z);
7 c.Load("tyranno_000.3df");
8 c.ScalePoint(p∗1000);
9 c.PLS(CWO PLS FRESNEL CGH);
Figure 5: PLS-based diffraction and CGH calculations. 10 c.Scale(255);
11 c.Save("cgh.bmp");
We can calculate the complex amplitude field or CGH to
superimpose PLS distributions from one PLS to the destination
plane. The complex amplitude field u(x, y) from PLSs by Fres- 3. GWO and gwoPLS classes: Diffraction and CGH calcu-
nel diffraction is expressed as, lations on GPU

The current CWO++ library provides diffraction and CGH


N
exp(i 2π
λ z)
X 2π (x − x j )2 + (y − y j )2 calculations on NVIDIA GPU chips by the GWO and gwoPLS
u(x, y) = A j exp( ( )) classes. In this subsection, we briefly describe an NVIDIA
iλz j
λ 2z j
GPU, and subsequently show the source code of Fresnel diffrac-
(22) tion and CGH calculations on a GPU using the GWO and gwoP
LS classes.
where, (x, y) and (x j , y j , z j ) are coordinates on the destination
Although initially GPU chips were mainly used for render-
plane and a 3D object, A j is the light intensity of the 3D object.
ing 3D computer graphics, recent GPU chips have also been
A CGH calculation based on Fresnel diffraction is expressed as
used for numerical computation. In 2007, NVIDIA released a
[61],
new GPU architecture and its software development environ-
N
X 2π (x − x j )2 + (y − y j )2 ment, Compute Unified Device Architecture (CUDA). CUDA
I(x, y) = A j cos( ( )) (23) can be used to facilitate the programming of numerical compu-
j
λ 2z j
tations more than software previously developed, such as HLSL,
where, I(x, y) is the CGH pattern. The computational complex- Cg language and so forth. Since its release, many papers using
ity of the above formulas are O(NN x Ny ), where N x and Ny are NVIDIA GPU and CUDA have been published in optics. In
the horizontal and vertical pixel numbers of the CGH. particular, calculations of CGH [62, 63, 64, 65, 66] and recon-
List 3 shows CGH generation using the class “cwoPLS”. struction in digital holography [44, 45, 49, 67, 68, 69, 70] have
We generate a 2, 048 × 2, 048 pixels CGH from the 3D object of successfully accelerated these calculations.
a dinosaur with 11,646 points. The sampling pitch on the CGH, Figure 7 shows the structure of an NVIDIA GPU, featuring
wavelength (default value) and distance between the CGH and GPU chips with some Multi-Processors (MP). Moreover, the
the 3D object is 4µm × 4µm, 633nm and 0.2 m, respectively. MP includes Stream Processors (SP), where the number of SPs
The class treats the 3D object as the source and the CGH as the per MP differs depending on the version of the NVIDIA GPU
destination plane. chip. One SP can operate 32- or 64-bit floating-point addition,
In line 4 of List 3, the member function “Create” allocates multiplication and multiply-add instructions. SPs in an MP op-
the required memory of the CGH. In line 5, the member func- erate the same instruction in parallel, and process different data.
tion “SetDstPitch” is set to the sampling pitches on the destina- Namely, an MP is a Single Instruction Multiple Data (SIMD)
tion plane, namely the CGH. In line 6, the first and second argu- fashion processor. In addition, each multiprocessor can operate
ments of the member function “SetOffset” are set to the offsets the same or different processing, thus allowing the GPU chip to
of the 3D object, namely horizontal and vertical offsets of 1 cm be used as a highly parallel processor. The specifications of the
(2500 pixels) and -2 mm (-512 pixels) from the origin, respec- GPUs used in this paper are shown in Table 3.
tively. In addition, the third argument sets the distance between
7
Figure 6: (a) CGH generated by the class cwoPLS. (b) Reconstructed image from (a) using the class CWO.

Table 3: Specifications of the GPUs used in this paper.

NVIDIA Geforce GTX 460M GeForce GTX295 (1 chip) GeForce GTX 580
CUDA processors 192 240 512
Memory amout (GBytes) 1.5 0.896 1.53
Core clock(GHz) 1.35 1.24 1.54
Memory clock (GHz) 1.25 1 2
Memory band width (GByte /sec) 60 223.8 192.4

The host computer controls the GPU board and the commu- itly execute the following steps, which are hidden to CWO++
nication between the host computer and the GPU board via the library users:
PCI-express bus. The host computer can also directly access
the device memory on the GPU board, which is used for stor- 1. Allocating the required memories for calculation on the
ing input data, namely the location of the source plane or 3D device memory.
object, and the destination plane as computed by the GPU. The 2. Sending the input data (source plane or 3D object data)
multiprocessor has a shared memory, which is limited, but low to the allocated device memory.
latency and faster than the device memory. In the classes, these 3. Executing kernel functions. The result (complex ampli-
memories are used for fast calculation. tude field or CGH on the destination plane) is stored in
The CUDA compiler compiles a C-like language source the allocated device memory.
code for the instruction set of the GPU chip, which is referred 4. Receiving the result from the allocated device memory.
to as a “Kernel”. We download this “Kernel” to the GPU chip 5. Releasing the allocated device memories.
via the PCI-express bus. The kernel codes in the CWO++ li-
brary are collected in the form of a library and DLL files on 3.1. Fresnel diffraction and CGH calculations on a GPU using
Windows OS, named “gwo lib.lib” and “gwo lib.dll”. When the GWO and gwoPLS classes
we use the GWO or gwoPLS classes, we need to link these files We show the source-code of a Fresnel diffraction calcula-
to our program. See more details in Appendix A. tion on a GPU using the GWO class. This example is the same
The diffraction calculations, as mentioned, require some as Section 2.2 except using GPU, but somewhat different in
FFT operations. The CUDA compiler allows us to accelerate comparison to List 1.
the FFT algorithm on an NVIDIA GPU chip, which is named
Listing 4: Fresnel diffraction calculation on a GPU using class GWO.
CUFFT. It is similar to the FFTW library [58], and we use
CUFFT for the GWO and gwoPLS classes. 1 CWO c;
When the GWO and gwoPLS classes are used, they implic- 2 GWO g;
3 c.Load("lena512x512.bmp");
8
8 c.Load("tyranno_000.3df");
9 g.ScalePoint(p∗1000);
10 g.Send(c);
11 g.PLS(CWO PLS FRESNEL CGH);
12 g.Scale(255);
13 g.Recv(c);
14 c.Save("cgh.bmp");

4. Implementations of the CWO and GWO classes

Now, we describe the implementations of the CWO and


GWO classes. For simplicity, we select the implementations of
the Fresnel diffraction (Eq. (5)) on the CWO and GWO classes
as an example.
Figure 8 shows the implementations. Figure 8(a) shows a
portion of the CWO member function “Diffract”, the functions
in which are defined as virtual functions of C++. The functions
“FFT”, “FresnelConvProp”, “Mult”, “IFFT” and “FresnelCon-
Figure 7: Structure of NVIDIA GPU chip. vCoeff” calculate FFT, the impulse response of Eq. (3), com-
plex multiplication, inverse FFT and the coefficient of Eq. (5).
When the function “Diffract” in the CWO class is called,
4 g.Send(c); the virtual functions defined in the CWO class are called (Fig.
5 g.Diffract(0.2, CWO FRESNEL CONV); 8(b)). Moreover, these functions also call the functions defined
6 g.Intensity(); in the library and DLL files of “cwo lib” (Fig. 8(d)), which are
7 g.Scale(255); activated on a CPU.
8 g.Recv(c); Conversely, when the function “Diffract” in the GWO class
9 c.Save("lena512x512_diffract.bmp"); is called, the virtual functions defined in the GWO class are
called (Fig.8(c)). Moreover, these functions call those defined
As the initial change, we define the instance “g” of the in the library and DLL files of “gwo lib” (Fig.8(e)). These func-
GWO class, in line 2. As the second change, the source plane, tions are activated on a GPU. Therefore, the function “Diffract”
namely Fig.2(a), is sent to a GPU by using the GWO class in the GWO class calculates the Fresnel diffraction on a GPU.
member function “Send”. When calling the member function The classes shown in the Table 1 are distributed as open-
“Send”, the first step involves the automatic allocation of the re- source code, while the functions in the library and DLL files
quired device memory on the GPU to the instance “g”, via the of “cwo lib” and “gwo lib”, which are closed source code, are
CUDA API function “cudaMalloc”. Subsequently, the source distributed as binary files.
plane on the host memory is sent to the device memory on the The current CWO++ library is compatible with Intel and
GPU using the CUDA API function “cudaMemcpy”. AMD x86 CPUs and NVIDIA GPUs. If we port the CWO++
In lines 5 to 7 of List 4, we change the instance names in library to other hardware, we need to inherit the CWO class to
lines 3 to 5 of List 1 from the instance “c” of the CWO class a new class indicating the new hardware, then re-define virtual
to the instance “g” of the GWO class. Therefore, these func- functions and write functions corresponding to “gwo lib”.
tions calculate the diffraction, intensity and normalization on For instance, new GPUs of the HD5000 and HD6000 series
the GPU. Finally, we receive the calculation result from the made by AMD were released. These GPUs have new archi-
GPU to the host in line 8. tecture and software environment, Open Computing Language
Next, we show the source-code of CGH calculation on a (OpenCL). The architectures also differ from NVIDIA GPUs.
GPU using the GWO class. This example is the same as Section We have already reported the CGH calculation of Eq. (23) [71]
2.4 except using GPU. There are also some changes as com- and Fresnel diffraction calculation [72]. Although we do not
pared with List 3. implement the CWO++ library on the AMD GPUs, we will
Listing 5: CGH calculation on a GPU using class gwoPLS. make a new class for AMD GPUs using the above mentioned
method.
1 cwoPLS c;
2 gwoPLS g;
3 float p=4e−6; 5. Field type
4 float z=0.2;
5 c.Create(2048,2048); The classes in the CWO++ library, namely the CWO, GWO,
6 c.SetDstPitch(p,p); cwoPLS and gwoPLS classes, have field types indicating cur-
7 c.SetOffset(2500∗p, −512∗p, z); rent fields. There are three field types: complex amplitude

9
Figure 8: Implementations of Fesnel diffraction in CWO and GWO.

field (the predefined macro: CWO FLD COMPLEX ), inten-


sity field (the predefined macro: CWO FLD INTENSITY ) and
phase field (the predefined macro: CWO FLD PHASE).
The above classes hold the field u(x, y) for the source or des-
tination plane. If the field type is CWO FLD COMPLEX, the
field u(x, y) maintains a complex amplitude field as a complex
number array,

u(x, y) = re(x, y) + i im(x, y) (24)

where re(x, y) and im(x, y) indicate real and imaginary com-


ponents of the complex value on (x, y). Because the current
CWO++ library has single floating point precisions for the real
and imaginary components, the memory amount for the field
u(x, y) is 8N x Ny bytes where N x and Ny are the number of pix-
els in the field.
Figure 9: Field types and mutual conversions between each field.
If the field type is CWO FLD PHASE, the class maintains
the phase field θ(x, y) as a real number array,
2. If we use member functions “Phase()” when the current
−1 im(x, y)
θ(x, y) = tan (25) field type is a complex amplitude field, the field is con-
re(x, y) verted from this to an argument of the same and the field
If the field type is CWO FLD INTENSITY, the class main- type is set to CWO FLD PHASE.
tains a real number array a(x, y) except the phase field. The real 3. If we use member functions “Complex()” when the cur-
number arrays include, for example, image data, amplitude, real rent field type is an intensity field, the field is converted
or imaginary, only part of the complex amplitude field and so from this to a complex amplitude field according to u(x, y)
on. The memory required for the intensity and phase fields is = re(x, y) + i im(x, y), where re(x, y) = a(x, y) is the in-
4N x Ny bytes. tensity field and im(x, y) is zero and the field type is set
Figure 9 shows each field type and mutual conversions be- to CWO FLD COMPLEX.
tween each field. We briefly describe the mutual conversions as 4. If we use member functions “Complex()” when the cur-
follows: rent field type is a phase field, the field is converted from
this to a complex amplitude field according to u(x, y) =
1. If we use the member functions “Re()” when the current re(x, y)+i im(x, y), where re(x, y) = cos(θ(x, y)) and im(x, y)
field type is a complex amplitude field, the field is con- = sin(θ(x, y)) and the field type is set to CWO FLD COM
verted from this to a real part only and the field type is PLEX.
set to CWO FLD INTENSITY. 5. If we use member functions “Complex(CWO &a, CWO
&b)” when the current field types of classes “a” and “b”
10
are the intensity and phase fields respectively, the fields 20 c.Phase();
are converted to a complex amplitude field according to 21 c.Scale(255);
u(x, y) = a(x, y)cos(θ(x, y)) + i a(x, y)sin(θ(x, y)), where 22 c.Save("phase.bmp");
“a” and “b” hold a(x, y) and θ(x, y), respectively and the 23

field type is set to CWO FLD COMPLEX. 24 b=a;


6. If we use member functions “Complex(CWO &a, CWO 25 c=a;
&b)” when the current field types of classes “a” and “b” 26 b.Amp();
are both intensity fields, the fields are converted to a com- 27 c.Phase();
plex amplitude field according to u(x, y) = re(x, y) 28 a.Complex(b,c);
+i im(x, y), where re(x, y) and im(x, y) are the fields of “a” 29 a.Diffract(−0.1,CWO ANGULAR);
and “b”, respectively and the field type is set to CWO FLD 30 a.Re();
COMPLEX. 31 a.Scale(255);
32 a.Save("complex.bmp");
List 6 and Fig.10 show examples of mutual conversions be-
tween each field type and their results, respectively. Figure 10
(a) is an original image. We calculate the diffracted light of the
figure in lines 2 and 3 of List 6 because we observe the real 6. Performance
part, imaginary part, amplitude and phase of the diffracted field
In this section, we show the calculation times of each diffrac-
respectively.
tion calculation and CGH calculation on an Intel CPU and NVID
The real part of the diffracted light (Fig.10(b)) is obtained
IA GPUs. We used an Intel CPU, which was a Core i7 740
in lines 8 to 10, and, the imaginary part of the diffracted light
QM (with CPU clock frequency of 1.7GHz), and three GPUs,
(Fig. 10(c)) is obtained in lines 11 to 13. The amplitude
namely NVIDIA GeForce GTX 460M, GerForce GTX 295 and
q GeForce GTX580. The GPU specifications are shown in Table
re(x, y)2 + im(x, y)2 3.
In the diffraction calculations, the impulse response and trans-
of the diffracted light (Fig.10(d)) is obtained in lines 17 to 19, fer functions of each diffraction are only sufficient to calculate
and, the phase of the diffracted light (Fig. 10(e)) is obtained in them once when the parameters, which are the propagation dis-
lines 20 to 22. In lines 26 to 28, we show the generation of the tance, the sampling pitches, wavelength, offsets and so on, are
complex amplitude field from the instances “a” and “b”, which unchanged. Of course, we need to re-calculate the impulse re-
hold amplitude (CWO FLD INTENSITY) and phase (CWO FL sponse and transfer functions when the parameters are changed.
D PHASE) respectively. In lines 29 to 32, we show the result Therefore, we evaluated the calculation times of each diffrac-
of the back propagation result (Fig.10(f)) from the position of tion in both cases of re-calculation and once-calculation.
the complex amplitude field to that of the original image. In the For the evaluation, we used Lists 1 and 4 and changed the
result, although we observe a diffraction effect to some extent image size and diffraction type. Table 4 shows the calculation
at the edges of the figure, the result is almost the same as the times of diffraction calculations on the CPU and GPUs with
Fig.10(a). recalculation of the impulse and transfer functions. Table 5
Listing 6: Example of mutual conversions of field types. shows the calculation times of diffraction calculations on the
CPU and GPUs with the once-calculation of the impulse and
1 CWO a,b,c;
transfer functions. In the table, except for the diffraction type
2 a.Load("lena512x512.bmp");
CWO FRESNEL FOURIER, we expand the area of the source
3 a.Diffract(0.1,CWO ANGULAR);
and destination planes from N x × Ny to 2N x × 2Ny during the
4
calculation and avoid aliasing by circular convolution. In the
5 b=a;
CPU calculations, we measured the time in line 3 of List 1. In
6 c=a;
the GPU calculations, we measured the time in lines 4 and 5 of
7
List 4.
8 b.Re();
Table 6 shows the calculation times of CGH calculations on
9 b.Scale(255);
the CPU and GPUs. For the evaluation, we used Lists 3 and 5
10 b.Save("re.bmp");
and changed the number of PLSs. In the CPU calculations, we
11 c.Im();
measured the time in line 9 of List 3. In the GPU calculations,
12 c.Scale(255);
we measured the time in lines 10 and 11 of List 5.
13 c.Save("im.bmp");
The calculation times of each diffraction calculation and
14
CGH calculation on the GPUs were much faster than those of
15 b=a;
the CPU.
16 c=a;
17 b.Amp();
18 b.Scale(255);
19 b.Save("amp.bmp");

11
Table 4: Calculation times of each diffraction calculation on the CPU and GPUs with the re-calculation of the impulse and transfer functions.

Resolution CPU (ms) GPU (ms)


Intel Core i7 740QM GeForce GTX 460M GeForce GTX295 (1 chip) GeForce GTX580
Fresnel diffraction convolution form (CWO FRESNEL CONV)
512 × 512 248 15 5 3
3
1024 × 1024 1.24 × 10 47 15 10
2048 × 2048 6.12 × 103 177 67 38
Fresnel diffraction Fourier form (CWO FRESNEL FRESNEL)
512 × 512 51.7 2.4 1 1
1024 × 1024 227 6 2 2
2048 × 2048 984 19 9 8
Shifted Fresnel diffraction (CWO SHIFTED FRESNEL)
512 × 512 477 15 5 3
1024 × 1024 2.07 × 103 48 16 10
2048 × 2048 9.48 × 103 186 71 40
Angular spectrum method (CWO ANGULAR)
512 × 512 260 12 3 2
1024 × 1024 1.17 × 103 36 11 8
2048 × 2048 5.56 × 103 135 47 29
Shifted angular spectrum method (CWO SHIFTED ANGULAR)
512 × 512 269 15 4 3
1024 × 1024 1.23 × 103 44 14 9
2048 × 2048 5.66 × 103 157 58 35

Table 5: Calculation times of each diffraction calculation on the CPU and GPUs with the once-calculation of the impulse and transfer functions.

Resolution CPU (ms) GPU (ms)


Intel Core i7 740QM GeForce GTX 460M GeForce GTX295 (1 chip) GeForce GTX580
Fresnel diffraction convolution form (CWO FRESNEL CONV)
512 × 512 117 10 3 2
1024 × 1024 620 29 11 6
2048 × 2048 3.30 × 103 104 48 26
Fresnel diffraction Fourier form (CWO FRESNEL FRESNEL)
512 × 512 52 2 1 1
1024 × 1024 229 5.4 2 2
2048 × 2048 993 19 9 8
Shifted Fresnel diffraction (CWO SHIFTED FRESNEL)
512 × 512 346 10 3 2
1024 × 1024 1.50 × 103 30 11 7
2048 × 2048 6.91 × 103 110 51 28
Angular spectrum method (CWO ANGULAR)
512 × 512 121 9.5 3 2
1024 × 1024 624 26 10 6
2048 × 2048 3.30 × 103 97 44 24
Shifted angular spectrum method (CWO SHIFTED ANGULAR)
512 × 512 128 10 3 2
1024 × 1024 665 31 12 7
2048 × 2048 3.51 × 103 119 54 29

12
Figure 10: Resluts of mutual conversions between each field. (a) original image (b) real part of the diffracted light (c) imaginary part of the diffracted light (d)
amplitude of the diffracted light (e) phase of the diffracted light and (f) back propagation from the complex amplitude of (d) and (e).

Table 6: Calculation times of CGH calculation on the CPU and GPUs

CPU (ms) GPU (ms)


Number of PLSs Intel Core i7 740QM GeForce GTX 460M GeForce GTX295 (1 chip) GeForce GTX580
248 1.3 × 104 87 67 31
4596 2.2 × 105 650 562 230
11646 5.9 × 105 1.7 × 103 1.43 × 103 579

7. Applications to holography to the complex amplitude field (CWO FLD COMPLEX). In


lines 8 to 11, we calculate the reconstructed image from the
In this section. we show some applications to holography kinoform using the back propagation relative to the position of
using the CWO++ library and its performances on the CPU the original image. Figure 11 (a) and (b) show the kinoform
and GPUs. pattern and the reconstructed image from the kinoform.
Listing 7: Inline phase-only CGH
7.1. Inline phase-only CGH (Kinoform)
In this subsection, we show an example of generating an 1 CWO c;
inline phase-only CGH, also known as kinoform. A kinoform 2 c.Load("lena512x512.bmp");
is calculated only by extracting the phase of a diffracted light 3 c.SetRandPhase();
onto the kinoform plane. List 7 shows the generation of an 4 c.Diffract(0.1, CWO FRESNEL CONV);
inline phase-only CGH with 512 × 512 pixels from the original 5 c.Phase();
6
image (Fig. 2 (a)).
In line 3, we add a random phase to the original image using 7 c.Complex();
the function “SetRandPhase()” to spread the light. The function 8 c.Diffract(−0.1, CWO FRESNEL CONV);
automatically sets to the complex amplitude field (CWO FLD C 9 c.Intensity();
OMPLEX). The random phase is generated by Xorshift RNGs 10 c.Scale(255);
algorithm [73]. In lines 4 and 5, we calculate the kinoform by 11 c.Save("kinoform_reconst.bmp");
diffracting the original image at the propagation distance of 0.1
m, and subsequently extract the phase only from the diffracted 7.2. GS algorithm
light, which is the kinoform. In the subsection, we implement the GS algorithm on the
After line 7, these codes are for reconstruction from the ki- CPU and GPUs using the CWO++ library and show the per-
noform. In line 7, we convert the phase field (CWO FLD PHASE) formances of the algorithm on the CPU and GPUs. Although

13
In lines 15 and 16, we recalculate a new kinoform from the
new complex amplitude generated by line 13. Repeating the
above processes, the GS algorithms gradually improve the qual-
ity of the reconstructed images. In lines 19 to 23, we calculate
a final reconstructed image from the kinoform.
Listing 8: GS algorithm on CPU.
1 CWO a1,a2;
2 a1.Load("lena2048x2048.bmp");
3 a1.Sqrt();
4 a2=a1;
5 a2.SetRandPhase();
6 a2.Diffract(0.1,CWO ANGULAR);
Figure 11: (a) Kinoform and (b) reconstructed image from the kinoform.
7 a2.Phase();
8 for(int i=0;i<ite num;i++){
it is possible to obtain a complete reconstructed image from a 9 a2.Complex();
complex amplitude field, unfortunately, we lack an appropri- 10 a2.Diffract(−0.1,CWO ANGULAR);
ate electric device to display the amplitude and phase of the 11 a2.Phase();
complex amplitude field simultaneously. Therefore, we need to 12

select either the amplitude or the phase components of the com- 13 a2.Complex(a1,a2);
plex amplitude field, which will cause the reconstructed image 14

to deteriorate due to lack of information on the complex am- 15 a2.Diffract(0.1,CWO ANGULAR);


plitude field. We employ the GS algorithm as an iterative al- 16 a2.Phase();
gorithm [18, 19] in order to improve the deterioration of the 17 }
reconstructed image. 18

Figure 12 shows typical a GS algorithm. In the GS algo- 19 a2.Complex();


rithm for Fourier holograms, Fourier and inverse Fourier trans- 20 a2.Diffract(−0.1,CWO ANGULAR);
forms correspond to reconstructions from a hologram and holo- 21 a2.Intensity();
gram generation, respectively. In the subsection, instead of 22 a2.Scale(255);
Fourier and inverse Fourier transforms, we use the angular spec- 23 a2.Save("gs_on_cpu.bmp");
trum method and the back propagation of the same.
We start the iteration by adding a random phase to an input List 9 shows an example of the GS algorithm on a GPU.
image, and calculate the diffraction calculation from the latter. The example is almost the same as to List 8. The iteration of
We extract only the phase components (“Phase constraints” in lines 11 to 20 is executed on a GPU, so that the example will be
Fig.12) from the diffracted lights to generate a kinoform. The calculated faster than the CPU version of List 8.
kinoforms are reconstructed by inverse diffraction calculation. Listing 9: GS algorithm on GPU.
We replace the amplitude of the reconstructed light with the
1 CWO c1,c2;
original input image (“Amplitude constraint” in Fig.12). Re-
2 GWO g1,g2;
peating the above processes, the GS algorithms gradually im-
3 c1.Load("lena2048x2048.bmp");
prove the quality of the reconstructed images.
4 c1.Sqrt();
List 8 shows an example of the GS algorithm on a CPU. In
5 c2=c1;
lines 1 to 7, we load an input image, calculate its square root, set
6 c2.SetRandPhase();
a random phase to it, calculate the diffracted light to a kinoform
7 g1.Send(c1);
plane, and subsequently generate a kinoform only by extracting
8 g2.Send(c2);
the phase of the diffracted light. The information of the original
9 g2.Diffract0.1,CWO ANGULAR);
image is maintained in the instance “a1”, while instance “a2” is
10 g2.Phase();
used for the forward and back propagations.
11 for(int i=0;i<ite num;i++){
In lines 8 to 17, we execute the iteration, the number of
12 g2.Complex();
which is decided by “ite num”. In lines 9 and 11, we calculate
13 g2.Diffract(−0.1,CWO ANGULAR);
the reconstructed image by the back propagation of the angular
14 g2.Phase();
spectrum method at a propagation distance of -0.1 m and extract
15
only the phase information of the reconstructed light. In line
16 g2.Complex(g1,g2);
13, we replace the amplitude of the reconstructed light with the
17
original input image (“Amplitude constraint” in Fig.12), where
18 g2.Diffract(0.1,CWO ANGULAR);
instances “a1” and “a2” hold the field type of CWO FLD INTE
19 g2.Phase();
NSITY and CWO FLD PHASE, respectively.
20 }

14
Figure 12: GS algorithm in Fresnel region in order to improve the deterioration of a reconstructed image.

21 fairs and Communications, Strategic Information and Commu-


22 g2.Complex(); nications R&D Promotion Programme (SCOPE)(09150542), 2009.
23 g2.Diffract(−0.1,CWO ANGULAR);
24 g2.Intensity();
References
25 g2.Scale(255);
26 g2.Recv(c1); [1] J.W.Goodman, “Introduction to Fourier Optics (3rd ed.),” Robert & Com-
27 c1.Save("gs_on_gpu.bmp"); pany (2005).
[2] Okan K. Ersoy, “Diffraction, Fourier Optics And Imaging,” Wiley-
Interscience (2006).
Changing the resolution of the input image and the iteration [3] E. G. Williams, “Fourier Acoustics – Sound Radiation and Nearfield
number, we compare the calculation times of the GS algorithm Acoustical Holography,” Academic Press (1999).
on the CPU and GPUs, which are shown in Table 7. In the CPU [4] D.M. Paganin, “Coherent X-Ray Optics,” Oxford University Press
calculations, we measured the time in lines 3 to 22 of List 8. In (2006).
[5] C. Slinger, C. Cameron, M. Stanley, M, “Computer-Generated Hologra-
the GPU calculations, we measured the time in lines 4 to 26 of phy as a Generic Display Technology,” Computer 38, 46–53 (2005).
List 9. The calculation times on GPUs were much faster than [6] S. A. Benton et al., “Holographic Imaging,” Wiley-Interscience (2008).
those on the CPU. [7] S.C. Kim and E.S. Kim, “Effective generation of digital holograms of
Figures 13 (a), (b) and (c) show the reconstructed images three-dimensional objects using a novel look-up table method,” Appl.
Opt. 47, D55–D62 (2008) .
when the resolution of the original image was 2, 048 × 2, 048 [8] H. Sakata and Y. Sakamoto, “Fast computation method for a Fresnel holo-
pixels and the iteration numbers were 5, 20, and 40 respectively. gram using three-dimensional affine transformations in real space,” Appl.
Opt. 48, H212–H221 (2009).
[9] H. Yoshikawa, T. Yamaguchi, and R. Kitayama, “Real-Time Generation
8. Conclusion of Full color Image Hologram with Compact Distance Look-up Table,”
OSA Topical Meeting on Digital Holography and Three-Dimensional
We developed the CWO++ library using the C++ class li- Imaging 2009, DWC4 (2009).
[10] K. Matsushima and S. Nakahara, “Extremely high-definition full-parallax
brary to calculate the 2D and 3D diffraction and CGH calcu-
computer-generated hologram created by the polygon-based method,”
lations on CPU and GPU. Our previous C-language based li- Appl. Opt. 48, H54–H63 (2009).
brary, GWO, was not user-friendly because, for example, GWO [11] Y. Liu, J. Dong, Y. Pu, H. He, B. Chen, H. Wang, H. Zheng and Y. Yu,
library users have to manage the CPU and GPU memory al- “Fraunhofer computer-generated hologram for diffused 3D scene in Fres-
nel region,” Opt. Lett. 36, 2128–2130 (2011).
location by themselves and so on. The CWO++ library re- [12] U.Schnars and W. Juptner, “Direct recording of holograms by a CCD
mains user-friendly by concealing troublesome programming target and numerical Reconstruction,” Appl.Opt., 33, 2, 179–181 (1994).
within classes and the GPU calculation power while unaware [13] U.Schnars and W.Jueptner, “Digital Holography - Digital Holo-
of the GPGPU technique. Applications capable of applying the gram Recording, Numerical Reconstruction, and Related Techniques,”
Springer (2005).
CWO++ library cover a wide range of optics, ultrasonic and [14] M. K. Kim, “Principles and techniques of digital holographic mi-
X-ray fields and so on. In this paper, applications to hologra- croscopy,” SPIE Reviews 1, 018005 (2010).
phy are shown. The CWO++ library will be distributed from [15] M. Gustafsson, M. Sebesta, B. Bengtsson, S. G. Pettersson, P. Egel-
http://brains.te.chiba-u.jp/~shimo/cwo/. berg, and T. Lenart, “High-resolution digital transmission microscopy:
a Fourier holography approach,” Opt. Lasers Eng. 41, 553–563 (2004).
[16] N. Masuda, T. Ito, K. Kayama, H. Kono, S. Satake, T. Kunugi and Kazuho
Sato, “Special purpose computer for digital holographic particle tracking
Acknowledgments velocimetry,” Opt. Express 14, 587–592 (2006).
[17] S. Satake, H. Kanamori, T. Kunugi, K. Sato, T. Ito, and K. Yamamoto,
This research was partially supported by Japan Society for “Parallel computing of a digital hologram and particle searching for
the Promotion of Science (JSPS), Grant-in-Aid for Young Sci- microdigital-holographic particle-tracking velocimetry,” Appl. Opt. 46,
entists (B), 23700103, 2011, and, the Ministry of Internal Af- 538–543 (2007).

15
Table 7: Calculation times of the GS algorithm on the CPU and GPUs.

Number of iterations CPU (ms) GPU (ms)


Intel Core i7 740QM GeForce GTX 460M GeForce GTX295 (1 chip) GeForce GTX580
Resolution of Input image : 512 × 512
5 3.60 × 103 173 236 188
10 6.53 × 103 285 273 210
20 1.22 × 104 516 384 253
40 2.36 × 104 984 482 338
Resolution of Input image : 1, 024 × 1, 024
5 1.59 × 104 539 920 731
10 2.88 × 104 883 1.06 × 103 797
20 5.56 × 104 1.59 × 103 1.36 × 103 930
5 3
40 1.07 × 10 3.02 × 10 1.73 × 103 1.19 × 103
Resolution of Input image : 2, 048 × 2, 048
5 7.30 × 104 2.03 × 103 3.85 × 103 2.91 × 103
5 3
10 1.33 × 10 3.36 × 10 4.25 × 103 3.16 × 103
5 3
20 2.53 × 10 6.01 × 10 5.29 × 103 3.65 × 103
5 4
40 4.98 × 10 1.12 × 10 7.17 × 103 4.65 × 103

Figure 13: the reconstructed images when the resolution of the original image is 2, 048 × 2, 048 pixels and the numbers of iteration are 5, 20, and 40 respectively.

[18] J. R. Fienup, “ Phase retrieval algorithms: a comparison, ” Appl. Opt. 21, [27] M. Makowski, M. Sypek, I. Ducin, A. Fajst, A. Siemion, J. Suszek, and A.
2758–2769 (1982). Kolodziejczyk, “Experimental evaluation of a full-color compact lensless
[19] R. G. Dorsch, A. W. Lohmann, and S. Sinzinger, “Fresnel ping-pong algo- holographic display, ” Opt. Express 17, 20840–20846 (2009).
rithm for two-plane computer-generated hologram display, ” Appl. Opt. [28] T. Shimobaba, T. Takahashi, N. Masuda, and T. Ito, “Numerical study of
33, 869–875 (1994). color holographic projection using space-division method,” Opt. Express
[20] G. Yang, B. Dong, B. Gu, J. Zhuang, and O. K. Ersoy, “Gerchberg-Saxton 19, 10287-10292 (2011) .
and Yang-Gu algorithms for phase retrieval in a nonunitary transform sys- [29] T. Shimobaba, A. Gotchev, N. Masuda and T. Ito, “Proposal of zoomable
tem: a comparison, ” Appl. Opt. 33, 209–218 (1994). holographic projection method without zoom lens,” IDW’11 (The 18th
[21] G. Pedrini, W. Osten, and Y. Zhang, “Wave-front reconstruction from a international Display Workshop) (to be appeared in Dec. 2011).
sequence of interferograms recorded at different planes, ” Opt. Lett. 30, [30] O. Matoba and B. Javidi, “Encrypted optical memory system using three-
833–835 (2005). dimensional keys in the Fresnel domain, ” Opt. Lett. 24, 762–764 (1999).
[22] D. Zheng, Y. Zhang, J. Shen, C. Zhang and G. Pedrini, “ Wave field recon- [31] E. Tajahuerce and B. Javidi, “Encrypting three-dimensional information
struction from a hologram sequence, ”Opt. Communications 249 73–77 with digital holography, ” Appl. Opt. 39, 6595–6601 (2000).
(2005). [32] B. Javidi and Takanori Nomura, “Securing information by use of digital
[23] A. Grjasnow, A. Wuttig and R. Riesenberg, “ Phase resolving microscopy holography, ” Opt. Lett. 25, 28–30 (2000).
by multi-plane diffraction detection, ” J. Microscopy 231, 115–123 [33] H. Hamam, “Digital holography-based steganography, ” Opt. Lett. 35,
(2008). 4175–4177 (2010).
[24] E. Buckley, “Holographic Laser Projection, ” J. Display Technol. 99, 1–6 [34] R. Piestunand and J. Shamir, “Synthesis of three-dimensional light fields
(2010). and applications,” Proc. IEEE 90, 222–244 (2002).
[25] E. Buckley, “Holographic projector using one lens, ” Opt. Lett. 35, 3399– [35] T. P. Kurzweg, S. P. Levitan, P. J. Marchand, J. A. Martinez, K. R.
3401 (2010). Prough, D. M. Chiarulli, “A CAD Tool for Optical MEMS, ” Proc.36th
[26] M. Makowski, M. Sypek, and A. Kolodziejczyk, “Colorful reconstruc- ACM/IEEE conf. on Design automation, 879–884 (1999).
tions from a thin multi-plane phase hologram, ” Opt. Express 16, 11618– [36] T. P. Kurzweg, S. P. Levitan, J. A. Martinez, M. Kahrs, D. M. Chiarulli,
11623 (2008). “An Efficient Optical Propagation Technique for Optical MEM Simula-

16
tion, ” Fifth International Conference on Modeling and Simulation of Mi- [61] M. Lucente, “Interactive Computation of holograms using a Look-up Ta-
crosystems (MSM2002), 352–355 (2002). ble,” J. Electron. Imaging 2, 28–34 (1993).
[37] T. Ito, T. Yabe, M. Okazaki and M. Yanagi, “Special-purpose computer [62] M. Lucente and T. A. Galyean, “Rendering Interactive Holographic Im-
HORN-1 for reconstruction of virtual image in three dimensions,” Com- ages,” Proc. of SIGGRAPH 95 387–394 (1995).
put.Phys.Commun. 82, 104–110 (1994). [63] N. Masuda, T. Ito, T. Tanaka, A. Shiraki and T. Sugie, “Computer gen-
[38] T. Ito, H. Eldeib, K. Yoshida, S. Takahashi, T. Yabe and T. erated holography using a graphics processing unit,” Opt. Express 14,
Kunugi, “Special-Purpose Computer for Holography HORN-2,” Com- 587–592 (2006).
put.Phys.Commun. 93, 13–20 (1996). [64] L. Ahrenberg, P. Benzie, M. Magnor, J. Watson, “Computer generated
[39] T.Shimobaba, N.Masuda, T.Sugie, S.Hosono, S.Tsukui and T.Ito, holography using parallel commodity graphics hardware,” Opt. Express
“Special-Purpose Computer for Holography HORN-3 with PLD technol- 14, 7636–7641 (2006).
ogy,” Comput. Phys. commun. 130, pp. 75–82, (2000). [65] H. Kang, F. Yaras, and L. Onural, “Graphics processing unit accelerated
[40] T. Shimobaba, S. Hishinuma and T.Ito, “Special-Purpose Computer for computation of digital holograms,” Appl. Opt. 48, H137–H143 (2009).
Holography HORN-4 with recurrence algorithm,” Comput. Phys. Com- [66] Y. Pan, X. Xu, S. Solanki, X. Liang, R. Bin A. Tanjung, C. Tan, and T. C.
mun. 148, 160–170 (2002). Chong, “Fast CGH computation using S-LUT on GPU,” Opt. Express 17,
[41] T. Ito, N. Masuda, K. Yoshimura, A. Shiraki, T. Shimobaba and T. Sugie, 18543–18555 (2009).
“A special-purpose computer HORN-5 for a real-time electroholography,” [67] L. Ahrenberg, A. J. Page, B. M. Hennelly, J. B. McDonald, and T. J.
Opt. Express 13 1923-1932 (2005). Naughton, “Using Commodity Graphics Hardware for Real-Time Digital
[42] Y. Ichihashi, H. Nakayama, T. Ito, N. Masuda, T. Shimobaba, A. Shiraki Hologram View-Reconstruction,” J. Display Technol. 5, 111–119 (2009).
and T. Sugie, “HORN-6 special-purpose clustered computing system for [68] D. Carl, M. Fratz, M. Pfeifer, D. M. Giel, and H. Hofler, “Multiwave-
electroholography,” Opt. Express 17, 13895–13903 (2009). length digital holography with autocalibration of phase shifts and artificial
[43] Y. Abe, N. Masuda, H. Wakabayashi, Y. Kazo, T. Ito, S. Satake, T. Kunugi wavelengths,” Appl. Opt. 48, H1–H8 (2009).
and K. Sato, “Special purpose computer system for flow visualization [69] N. Pandey, D. P. Kelly, T. J. Naughton and B. M. Hennellyr, “Speed up of
using holography technology,” Opt. Express 16, 7686–7692 (2008). Fresnel transforms for digital holography using pre-computed chirp and
[44] T. Shimobaba, T. Ito, N. Masuda, Y. Abe, Y. Ichihashi, H. Nakayama, GPU processing,” Proc. SPIE 7442, 744205 (2009).
N. Takada, A.Shiraki and T. Sugie, “Numerical calculation library for [70] C. Trujillo, John F. Restrepo and J. Garcia-Sucerquia, “Real time nu-
diffraction integrals using the graphic processing unit: the GPU-based merical reconstruction of digitally recorded holograms in digital in-line
wave optics library,” Journal of Optics A: Pure and Applied Optics, 10, holographic microscopy by using a graphics processing unit,” Photonics
075308, 5pp, (2008). Letters of Poland 2, 177–179 (2010).
[45] T.Shimobaba, Y.Sato, J.Miura, M.Takenouchi, and T.Ito, “Real-time dig- [71] T. Shimobaba, T. Ito, N. Masuda, Y. Ichihashi, and N. Takada, “Fast cal-
ital holographic microscopy using the graphic processing unit,” Opt. Ex- culation of computer-generated-hologram on AMD HD5000 series GPU
press 16, 11776-11781 (2008) and OpenCL,” Opt. Express 18, 9955–9960 (2010).
[46] T.Shimobaba, J.Miura and T.Ito, “A computer aided design tool for de- [72] T. Nishitsuji, T. Shimobaba, T. Sakurai, N. Takada, N. Masuda, and T. Ito,
veloping an electroholographic display,” Journal of Optics A: Pure and “Fast calculation of Fresnel diffraction calculation using AMD GPU and
Applied Optics, 11, 085408 (5pp) (2009) OpenCL,” in Digital Holography and Three-Dimensional Imaging, OSA
[47] T. Shimobaba, N. Masuda and T. Ito, “Simple and fast calclulation algo- Techinal Digest (CD) (Optical Society of America, 2011), paper DWC20.
rithm for computer-generated hologram with wavefront recording plane,” [73] G. Marsaglia, “Xorshift RNGs,” Journal of Statistical Software 8, 1–6
Opt. Lett. 34, 3133–3135 (2009). (2003).
[48] T. Shimobaba, H. Nakayama, N. Masuda and T. Ito, “Rapid calcula- [74] CImg homepage, http://cimg.sourceforge.net/
tion of Fresnel computer-generated-hologram using look-up table and [75] ImageMagik homepage, http://www.imagemagick.org/script/index.php
wavefront-recording plane methods for three-dimensional display,” Op-
tics Express, 18, 19, 19504-19509 (2010).
[49] T. Shimobaba, N. Masuda, Y. Ichihashi and T. Ito, “Real-time digital holo- Appendix A. System requirements and installation
graphic microscopy observable in multi-view and multi-resolution,” Jour-
nal of Optics, 12, 065402 (4pp) (2010) System requirements for the CWO++ library are as follows:
[50] H. T. Dai, X. W. Sun, D. Luo, and Y. J. Liu, “Airy beams generated by
a binary phase element made of polymer-dispersed liquid crystals,” Opt. 1. OS : for Windows XP(32/ 64 bit), 7
Express 17, 19365-19370 (2009)
[51] D. Luoa, H.T . Dai , X. W. Sun, and H. V. Demira, “Electrically switch- 2. CUDA : CUDA 4.0 (32 bit version) (if the GWO class or
able finite energy Airy beams generated by a liquid crystal cell with pat- gwoPLS is used)
terned electrode,” Optics Communications, 283, 3846-3849 (2010).
[52] R. P. Muffoletto, J. M. Tyler, and J. E. Tohline, “Shifted Fresnel diffraction The installation of the CWO++ library involves the follow-
for computational holography,” Opt. Express 15, 5631–5640 (2007). ing steps:
[53] D. H. Bailey and P. N. Swarztrauber, “The Fractional Fourier Transform
and Applications,” SIAM Review 33, 389–404 (1991). 1. Create a project file of Visual C++.
[54] M. Leutenegger, R. Rao, R. A. Leitgeb, and T. Lasser, “Fast focus field 2. Ensure the following dll and library files are placed in
calculations,” Opt. Express 14, 11277–11291 (2006).
[55] J. F. Restrepo and J. Garcia-Sucerquia, “Magnified reconstruction of digi- your project directory:
tally recorded holograms by Fresnel Bluestein transform,” Appl. Opt. 49, (a) cwo.dll, cwo.lib, libfftw3f-3.dll (libfftw3f-3.dll can
6430–6435 (2010). be download from Ref. [58])
[56] K. Matsushima, “Shifted angular spectrum method for off-axis numerical
propagation,” Opt. Express 18, 18453–18463 (2010).
(b) gwo.dll, gwo.lib (if you use GPU version of the
[57] K. Matsushima and T. Shimobaba, “Band-limited angular spectrum CWO++ library)
method for numerical simulation of free-space propagation in far and near 3. Set library files (*.lib) to your VISUAL C++ project.
fields,” Opt. Express 17, 19662–19673 (2009).
[58] FFTW Home Page, http://www.fftw.org/.
4. Set the following C++ and header files to your project:
[59] J. Lin, X.-C. Yuan, S. S. Kou, C. J. R. Sheppard, O. G. Rodriguez-Herrera (a) cwo.h, cwo.cpp, cwo lib.h
and J. C. Dainty, “Direct calculation of a three-dimensional diffracted (b) gwo.h, gwo.cpp, gwo lib.h (if you use the GPU ver-
field,” Opt. Lett. 36, 1341–1343 (2011).
[60] L. Ahrenberg, P. Benzie, M. Magnor and J. Watson, “Computer generated
sion of the CWO library)
holograms from three dimensional meshes using an analytic light trans-
port model,” Appl. Opt. 47, 1567–1574 (2008).

17
Appendix B. Image formats

Using CImag library [74], CWO library allows us to read


and write image data in the following formats:
1. Bitmap
2. Jpeg
3. Png
4. Tiff
If you read and write image formats other than bitmap format,
you must install ImageMagik [75] on your computer.

18

You might also like