Abstract: Digital holography is a 3D imaging technique in which a laser beam with a plane
wavefront is emitted toward an object and the intensity of the diffracted wave, called a hologram,
is measured. The object’s 3D shape can be obtained by numerically analyzing the captured
holograms and recovering the incurred phase. Recently, deep learning (DL) methods have been
used for more accurate holographic processing. However, most supervised methods require large
datasets to train the model, which are rarely available in most DH applications due to the scarcity
of samples or privacy concerns. A few one-shot DL-based recovery methods exist with no reliance
on large datasets of paired images. Still, most of these methods neglect the underlying physics
laws that govern wave propagation. These methods offer a black-box operation, which is not
explainable, generalizable, or transferable to other samples and applications. In this work, we
propose a new DL architecture based on generative adversarial networks that uses a discriminative
network to realize a semantic measure of reconstruction quality, while using a generative network
as a function approximator to model the inverse of hologram formation. We impose smoothness
on the background part of the recovered image using a progressive masking module powered by
simulated annealing to enhance the reconstruction quality. The proposed method exhibits high
transferability to similar samples, which facilitates its fast deployment in time-sensitive
applications without the need to retrain the network from scratch. The results show a considerable
improvement over competitor methods in reconstruction quality (about 5 dB PSNR gain) and
robustness to noise (about 50% reduction in the PSNR-versus-noise degradation rate).
© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement
1. Introduction
Digital holography (DH) is a commonly used technique to extract the 3D shape of microscopic
objects, something not feasible with regular cameras. This powerful technique is used in various
applications, including micro-particle measurement [1,2], biology [3], encryption [4], and visual
identification tags [5]. The core idea behind DH is that a laser beam with a plane wavefront
experiences diffraction and phase shift when it encounters a microscopic object. The interfering
wave intensity, also called a hologram, is captured by a charge-coupled device (CCD) sensor array.
The goal of DH is to reconstruct the object’s 3D shape by processing the captured holograms
[6,7]. In short, if O(x, y) and H(x, y) are the object wave and the captured hologram, the goal
is to recover O(x, y) (or equivalently the 3D profile of the sample) from H(x, y), which involves
twin-image removal.
Compared to off-axis holography, digital inline holography (DIH) entails a much simpler
hologram recording method, emitting only one beam through the object and processing the
diffracted wave. However, it requires more complex numerical methods for phase recovery to
deconvolve the spatially overlapping zero-order and cross-correlated holographic terms. Some
methods rely on taking multiple images at different positions to enhance the phase recovery
performance [8]. In this work, we use transparent inline holography (Fig. 1) with single-shot
imaging and numerical reconstruction for its more straightforward design and its potential for
developing low-cost, compact, and portable readers appropriate for Internet of Things (IoT) and
supply chain applications [9], especially for dendritic tags, our custom-designed visual identifiers
[10].
The compressive sensing (CS) based method of [11] recovers the object wave by solving

    min_O ∥H − Tf(O)∥₂² + τ∥O∥_tv,

where Tf is the forward propagator, ∥·∥₂ is the ℓ2 norm, ∥·∥_tv is the total variation norm, and τ
is a tuning parameter. This method is more efficient than the iterative methods and hence is used
as a benchmark in some recent papers [5,12], including our comparative results in this
paper. However, it suffers from a few technical issues. For example, imposing explicit sparsity
constraints can cause edge distortion. Moreover, the results are sensitive to the choice of τ.
Although their one-shot inference was fast, the training time was fairly long, using about 6,000
image pairs (30,000 pairs after data augmentation).
Supervised DL methods, including the aforementioned ones, offer superior performance
in phase recovery. Nevertheless, they usually suffer from the obvious drawback of relying on
relatively large datasets for training. For instance, the models in [16,17] require about
10,000 training pairs. This requirement becomes problematic since such huge DH datasets rarely
exist for different sample types. Even if such datasets exist, the training time can be prohibitively
long for time-sensitive applications. For instance, the training time with a typical GPU (GeForce
GTX 1080) is about 14.5 hours for the model proposed in [18]. Since the training process is not
transferable and must be repeated for different setups and sample types, such a long training
phase is not practically desirable. In some other applications, such as authenticating objects
using nano-scaled 3D visual tags, data sharing can be prohibited for security reasons [10].
To address the scarcity of paired DH samples, some recent works utilize unpaired data
(unmatched holograms and samples) to train their networks [24]. Specifically, a cycle-generative
adversarial network (CycleGAN) is employed in [24] to reconstruct the object wave from the
hologram by training the model with holograms (denoted as domain X) and unmatched objects
(denoted as domain Y). In particular, two generators are used to learn the functions X → Y
and Y → X. A consistency loss is used to enforce the training progress X → Y → X̂ ≈ X. A
similar method based on CycleGAN, called PhaseGAN, is proposed in [25], which also uses
unpaired data for training and employs the near-field Fresnel propagator [26] as part of its framework.
Although these methods do not require matched object-hologram pairs, they still need large
datasets of unmatched hologram samples in the training phase.
Considering the difficulties of developing large DH datasets, some attempts have been made
recently to create unsupervised learning frameworks [5,27,28]. Most of these frameworks utilize
CNN architectures as their backbones since they can capture sufficient low-level image features
to reproduce uncorrupted and realistic image parts [29]. Often, a loss function is employed
to minimize the distance between the captured hologram and the artificial hologram obtained
by forward-propagating the recovered object wave. For example, our previous work [5] uses
an hourglass encoder-decoder structure to reconstruct the object wave from DIH holograms.
Inspired by the deep decoder concept proposed in [27], the reconstruction algorithm in [28]
abandons the encoder part and uses only a decoder with a fixed random tensor as its input.
Some classical regularization methods, such as total variation (TV) loss and weight decay, are
applied to partially address the noisy and incomplete signal problem. PhysenNet uses a U-net
architecture [30] to retrieve the phase information [31]. Most recently, an untrained CNN-based
network was employed in dual-wavelength DIH, which benefits from the CNN’s capability for image
reconstruction and denoising and the dual-wavelength setup’s capability for phase unwrapping
[12].
The generator in our network is a function approximator that models the inverse of the hologram
generation process (i.e., it maps the hologram to the complex-valued object wave), as opposed
to general GANs, where the generator network learns the data distribution to create new samples from noise.
Another drawback of most aforementioned DL-based methods is their lack of interpretability
and their disregard of physics knowledge. Therefore, there are always two risks: (i) over-fitting and
(ii) severe performance degradation under minor changes to sample characteristics and test
conditions. We address these issues in two different ways. First, we incorporate forward and
backward propagation into our model, following some recent works [5,28,31]. Secondly, we
implement a new spatial attention module using an adaptive masking process to split the object
pattern into foreground and background regions and impose smoothness on the image background.
The background mask update is driven by the reconstructed object wave quality and regulated
by simulated annealing (SA) optimization, starting with more aggressive updates and settling
into more conservative changes as the network converges. Imposing a smoothness
constraint on the gradually evolving background area makes our method fundamentally different
from some iterative methods that enforce physics-driven hologram formation equations on the
support region (i.e., the foreground) [34,35] or the entire image [36].
We show that our framework is generic and independent of the choice of the generator network.
In particular, we tested our framework with two recently developed generators, the fine-tuned
version of DeepDIH [5] and the deep compressed object decoder (DCOD) [28]. We also show
that adding a super-resolution layer to the utilized auto-encoder (AE) improves the quality of the
phase recovery.
This paper is organized as follows. Section 2 reviews the hologram formation process and
recasts it as a nonlinear inverse problem. Section 3 elaborates on the details of the proposed DL
method for phase recovery, highlighting its key features and differences from similar methods.
Experimental results for simulated holograms, publicly available samples, and our dendrite
samples are presented in Section 4, followed by concluding remarks in Section 5.
2. Problem formulation
The goal of this work is to design an unsupervised physics-driven DL network to reconstruct
the 3D surface of microscopic objects, especially dendrites, micro-scaled security tags used
to protect supply chains against cloning and counterfeit attacks (see Section 4.4 for details of
dendrites).
The incident wave passing through a thin transparent object can be characterized as a
complex-valued wave
O(x, y; z = 0) = R(x, y; z = 0) t(x, y), (1)
where R(x, y; z = 0) is the reference wave (i.e., the incident wave when the object is not present)
and t(x, y) = A(x, y) exp(jϕ(x, y)) is the perturbation term incurred by the object, comprising
attenuation A(x, y) and phase shift ϕ(x, y) [37]. After forward propagation described by the
angular spectrum method to distance z = d, O(x, y; z = d) is formed as follows:
O(x, y; z = d) = p(λ, z = d) ⊛ O(x, y; z = 0) = F⁻¹{P(λ, z = d) · F{O(x, y; z = 0)}}, (2)
where λ represents the wavelength and ⊛ is the convolution operator. F{·} and F⁻¹{·} denote
the direct and inverse Fourier transforms, respectively. Here, P(λ, z) = F{p(x, y, z)} is the transfer
function, defined as
P(λ, z) = exp( (2πjz/λ) √(1 − (λfx)² − (λfy)²) ), (3)
where fx and fy denote the spatial frequencies. The formed hologram in the detector plane is
H(x, y; λ, z) = |p(λ, z = d) ⊛ (O(x, y; z = 0) + R(x, y; z = 0))|². (4)
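As a concrete illustration, the following Python sketch implements the angular spectrum propagator of Eqs. (2)-(3) and the hologram formation of Eq. (4) with NumPy. The function names are our own, and the unit-amplitude plane reference wave (R = 1) and the suppression of evanescent components are simplifying assumptions, not prescriptions from this paper.

```python
import numpy as np

def angular_spectrum_propagate(field, wavelength, z, dx):
    """Propagate a complex field over distance z via the angular spectrum method (Eqs. (2)-(3))."""
    h, w = field.shape
    fx = np.fft.fftfreq(w, d=dx)   # spatial frequencies along x (cycles per unit length)
    fy = np.fft.fftfreq(h, d=dx)   # spatial frequencies along y
    FX, FY = np.meshgrid(fx, fy)
    arg = 1.0 - (wavelength * FX) ** 2 - (wavelength * FY) ** 2
    # Transfer function P(lambda, z); evanescent components (arg < 0) are suppressed here.
    P = np.exp(2j * np.pi * z / wavelength * np.sqrt(np.maximum(arg, 0.0)))
    return np.fft.ifft2(np.fft.fft2(field) * P)

def form_hologram(t, wavelength, z, dx):
    """Hologram intensity per Eq. (4), assuming a unit-amplitude plane reference wave R = 1."""
    O0 = t  # object wave at z = 0, Eq. (1) with R = 1
    return np.abs(angular_spectrum_propagate(O0 + 1.0, wavelength, z, dx)) ** 2
```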
Our ultimate goal is to recover the object-related perturbation t(x, y), or equivalently the complex-
valued object wave O(x, y), from the captured hologram H(x, y) in a manner consistent with Eqs. (1)-(4).
3. Proposed method
The essence of our method relies on using a GAN-based architecture with several key modifications.
More specifically, consider the chain O →[F(·)] H0 →[Pz(·)] H →[Pz⁻¹(·)] H0 →[GW(·)] Ô →[F̃(·)] Ĥ0 →[Pz(·)] Ĥ (Fig. 2),
where O ∈ R^{h×w×2} is the inaccessible and unknown complex-valued object wave with height h
and width w, H0 ∈ R^{h×w×1} is the produced hologram in the object plane, and H ∈ R^{h×w×1} is the
hologram in the sensor plane. Similarly, Ô, Ĥ0, and Ĥ are the reconstructed versions of the object
wave, the hologram in the object plane, and the hologram in the sensor plane, respectively. It is
noteworthy that a classic phase unwrapping algorithm based on the fast Fourier transform [38] is
applied to the phase of Ô. Forward and backward angular spectrum propagation (ASP) according
to Eqs. (2) and (3) are represented by Pz(·) and Pz⁻¹(·). Likewise, F(·): R^{h×w×2} → R^{h×w×1} represents
the hologram formation according to Eqs. (1)-(4). Our goal is to develop a generator network
GW(·): R^{h×w×1} → R^{h×w×2} that models the inverse of the hologram formation process to reproduce
the object wave Ô as close as possible to O under some distance measure d(Ô, O). However,
we cannot quantify d(Ô, O) since O is inaccessible. To address this issue, and noting that the
hologram formation process F(·) is known, we apply the same process to the reconstructed wave
Ô to obtain a corresponding reproduced hologram Ĥ = Pz(F̃[GW(Pz⁻¹(H))]). Then, we use the
surrogate distance d(Ĥ, H) instead of d(Ô, O) to assess the reconstruction quality. Finally, note
that we use F̃(·) for the numerical hologram formation to account for minor differences from the real
hologram formation due to mismatch in the parameters λ and z and to some idealistic assumptions
(e.g., a plane wavefront).
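The chain above can be summarized in a few lines of PyTorch-style code. This is only a sketch: asp_forward, asp_backward, and hologram_form are assumed helper functions implementing Pz(·), Pz⁻¹(·), and F̃(·), respectively, and taking the magnitude to keep Ĥ real-valued is an illustrative choice.

```python
import torch

def surrogate_distance(H, G, asp_forward, asp_backward, hologram_form):
    """Compute the surrogate distance d(H_hat, H) along the chain of Fig. 2."""
    H0 = asp_backward(H)    # back-propagate the captured hologram to the object plane
    O_hat = G(H0)           # G_W approximates the inverse of hologram formation
    H_hat = asp_forward(hologram_form(O_hat)).abs()  # H_hat = P_z(F~[G_W(P_z^-1(H))])
    return torch.mean((H_hat - H) ** 2), O_hat       # MSE as the distance measure
```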
Fig. 2. The overall block diagram of the hologram formation along with the proposed DL
architecture for phase recovery.
Fig. 3. The overall framework of our untrained GAN-based network, which consists of
an AE-based generator network G, a discriminator network D, and an SA-based adaptive masking
module.
The loss function of the proposed network includes the following terms:
• One term is the MSE distance between the reproduced and the captured holograms,
d₁(Ĥ, H) = MSE(Ĥ, H), used to directly train the AE-based generator, following the
physics-driven methods [5,28,31].
• Noting the limitations of the MSE and ℓ2 norms, we also use a discriminator network
DW(·): R^{h×w×2} → R to produce a learnable penalty term by maximizing the confusion
between the reproduced and captured holograms, which serves as an indicator of hologram
quality. Let DW(H) and DW(Ĥ) denote the probabilities of the captured and
reproduced holograms being real. Then, we maximize the first term and minimize
the second term when training the discriminator to distinguish between the real and
numerically regenerated holograms. Conversely, we maximize the second term when training
the generator to make the reproduced hologram as close as possible to the captured
hologram and fool the discriminator. This is equivalent to the conventional GAN formulation
L = min_{GW} max_{DW} E_{x∼p_data} log[DW(x)] + E_{z∼p_z} log[1 − DW(GW(z))], (5)
To summarize, the proposed network aims to solve the following optimization problem:

    min_{GW} max_{DW} log[DW(H)] + log[1 − DW(Ĥ)] + λ₁ L_Auto(Ĥ) + λ₂ L_B, (6)

where Ĥ = Pz(F̃[GW(Pz⁻¹(H))]) denotes the reproduced hologram, whose value depends
on the optimization variable GW. The first two terms represent the GAN framework loss, with the ultimate
goal of making the generator GW(·) as close as possible to the inverse of the hologram formation
F⁻¹(·) through iterative training of the generator and discriminator networks. We have used an
auto-encoder architecture for the generator following our previous work [5], whose loss function
is represented by L_Auto. Likewise, L_B represents the background loss term for points outside the
object mask, p ∉ S, with λ₁ and λ₂ being tuning parameters.
In the training phase, the losses of GW and DW are minimized sequentially. To avoid lazy
training of the generator and to achieve larger gradient variations, especially in the early training
steps, we solve an equivalent optimization problem in which the generator maximizes log[DW(Ĥ)]
rather than minimizing log[1 − DW(Ĥ)].
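In PyTorch terms, the sequential updates could look like the sketch below, where D is assumed to output the probability of its input being a real (captured) hologram; the non-saturating generator objective realizes the equivalent problem mentioned above. The small epsilon for numerical stability is our addition.

```python
import torch

def discriminator_loss(D, H, H_hat):
    """Train D to label the captured hologram as real and the reproduced one as fake (Eq. (5))."""
    real = D(H)
    fake = D(H_hat.detach())  # detach so generator weights receive no gradient
    return -(torch.log(real + 1e-8) + torch.log(1.0 - fake + 1e-8)).mean()

def generator_adversarial_loss(D, H_hat):
    """Non-saturating form: maximize log D(H_hat) instead of minimizing log(1 - D(H_hat))."""
    return -torch.log(D(H_hat) + 1e-8).mean()
```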
Since this network has only one fixed input and target, the GAN structure aims to map the
input to a reproduced domain as close as possible to the target, even without the L_Auto and L_B
terms. Adding these terms enhances the reconstruction quality by enforcing our prior knowledge.
Besides, since the discriminator DW extracts deep features via its multiple convolutional
layers, its similarity evaluation is intuitively more meaningful than the MSE or ℓ2 loss. Thus,
the network learns a more robust translation from the digital hologram to the object wave.
The auto-encoder loss term L_Auto(GW(H)) in Eq. (6) is used to directly minimize the gap
between the captured hologram and the numerically reconstructed hologram, independently of
the utilized discriminator:

    L_Auto(Ĥ) = d_MSE(H, Ĥ) = (1/(h·w)) ∥H − Ĥ∥₂², (9)

where the captured and reconstructed holograms (H, Ĥ) represent the AE input and
output after proper propagation.
Finally, we use a total variation (TV) loss to enforce smoothness on the image background, i.e.,
the pixels p = (x, y) ∉ S outside the region of interest (ROI), or image foreground.
This incorporates our prior knowledge that background pixels beyond the ROI experience zero shift,
and it improves the reconstruction quality. The TV loss for a complex-valued 2D signal z is

    L_B(z) = ∫_{Ω_B} ( |∇ℜ(z)| + |∇ℑ(z)| ) dx dy, (10)

where Ω_B denotes the support set of z, and ℜ(z) and ℑ(z) denote the real and imaginary parts of
z, respectively. In our case, the points z are taken from F̃(GW(Pz⁻¹(H))) and Ω_B = {(x, y) | 1 ≤ x ≤
w, 1 ≤ y ≤ h, (x, y) ∉ S}.
For discrete signals, we use the approximation |∇x ℜ(z)| = |ℜ(z)_{x+1,y} − ℜ(z)_{x,y}|. Noting that
|∇ℜ(z)| = (|∇x ℜ(z)|² + |∇y ℜ(z)|²)^{1/2}, Eq. (10) converts to

    L_B(z) = (1/|Ω_B|) Σ_{(x,y)∈Ω_B} ( |ℜ(z)_{x+1,y} − ℜ(z)_{x,y}|² + |ℜ(z)_{x,y+1} − ℜ(z)_{x,y}|²
             + |ℑ(z)_{x+1,y} − ℑ(z)_{x,y}|² + |ℑ(z)_{x,y+1} − ℑ(z)_{x,y}|² )^{1/2}, (11)

where |Ω_B| is the cardinality (the number of points) of the set Ω_B. For simplicity, we skip the
square-root operation and use the following computationally faster version:

    L_B = (1/|Ω_B|) Σ_{(x,y)∈Ω_B} |ℜ(z)_{x+1,y} − ℜ(z)_{x,y}|² + |ℜ(z)_{x,y+1} − ℜ(z)_{x,y}|²
          + |ℑ(z)_{x+1,y} − ℑ(z)_{x,y}|² + |ℑ(z)_{x,y+1} − ℑ(z)_{x,y}|². (12)
The details of the adaptive masking used to define the ROI are discussed below.
Fig. 4. The block diagram of the adaptive segmentation used to create the background loss. The
operator ⊗ denotes element-wise multiplication, indicating that all operations are applied
only on the background area. The mask update process is explained in Section 3.2.
The SA algorithm is initialized with temperature T₀ at time t = 0. We also set the first mask
M^(0) = [1]_{h×w}, assuming no foreground has been detected yet.
To update the mask at time t = 1, 2, 3, . . ., we compare the MSE distance between the
reproduced hologram Ĥ and the captured hologram H on the background areas determined once
by the previous mask M^(t−1) and once by the current mask proposal M̂^(t). Mathematically, we
compute δ_{t−1} = d_MSE(H, Ĥ; M^(t−1)) and δ̂_t = d_MSE(H, Ĥ; M̂^(t)). The inequality δ̂_t < δ_{t−1} means that
the consistency between the captured and reconstructed holograms improves under the current
mask proposal, so we accept the proposal and update the mask as M^(t) = M̂^(t). Otherwise, we lower
the temperature as T_t = T_{t−1}/log(1 + t) and then update the mask with probability e^{−(δ̂_t − δ_{t−1})/T_t}.
Thus, as time passes, the update probability declines. Algorithm 1 summarizes this procedure.
Algorithm 1. Adaptive Background Masking
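A minimal sketch of one SA acceptance step is given below; it follows the update rule described above, with variable names of our own choosing rather than the authors' exact pseudocode.

```python
import math
import random

def sa_mask_update(t, T_prev, mask_prev, mask_prop, delta_prev, delta_prop):
    """One acceptance step of the SA-based mask update.

    delta_prev / delta_prop: MSE(H, H_hat) on the background defined by the previous
    mask M^(t-1) and by the current proposal M^(t), respectively.
    """
    if delta_prop < delta_prev:
        return mask_prop, T_prev               # proposal improves consistency: accept
    T = T_prev / math.log(1 + t)               # cooling schedule T_t = T_{t-1} / log(1 + t)
    if random.random() < math.exp(-(delta_prop - delta_prev) / T):
        return mask_prop, T                    # occasionally accept a worse proposal
    return mask_prev, T                        # otherwise keep the previous mask
```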
The confirmed binary mask M^(t) is used to determine the background area at time point t for the
loss term L_B in Eq. (6), noting that the background area is flat and bears constant attenuation and
phase shift. This provides additional leverage for the optimization problem to converge faster.
This improvement is confirmed by our results in Section 4 (for instance, see Fig. 8 and Table 3).
both G and D to stabilize the training progress. The architectural details of G and D are given in
Tables 1 and 2. To show the generalizability of the architecture, we also used DCOD [28] as an
alternative generator network in our experiments.
The training strategy is shown in Fig. 5. The generator and discriminator are trained sequentially.
However, to avoid early convergence of the generator, we train the generator only once and
then train the discriminator for 5 consecutive iterations. Note that early convergence of the
generator is not desirable, since any mediocre generator can produce artificial results that fool
a discriminator that has not yet reached its optimal operation. Therefore, we let the discriminator
converge first and perform at its best, then train the generator accordingly to produce accurate
object waves from the captured holograms. The aforementioned masking update by the SA-based
algorithm is performed after updating the generator. It does not occur after every update, but
rather once after every k updates of the generator, as shown by the red intervals in Fig. 5.
Fig. 5. The training strategy. The masking update is activated once every k = 100 intervals
(shown in red). Each interval includes one iteration of the generator update (brown) followed
by five iterations of the discriminator update (blue). If the masking update is active (red
intervals), it is performed between the generator training and the discriminator training (yellow).
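Putting the pieces together, the schedule of Fig. 5 could be coded as in the sketch below. It reuses the helper functions from the earlier snippets; propose_and_update_mask (a wrapper around sa_mask_update), the loss weights, and the assumption that the generator output is a complex tensor are ours.

```python
import torch

def train(G, D, H, asp_forward, asp_backward, hologram_form,
          num_intervals=5000, k=100, lam1=1.0, lam2=1.0):
    """Alternating training schedule of Fig. 5 (one G step, then five D steps per interval)."""
    opt_G = torch.optim.Adam(G.parameters())
    opt_D = torch.optim.Adam(D.parameters())
    mask_bg = torch.ones_like(H)                  # M^(0): no foreground detected yet
    for t in range(1, num_intervals + 1):
        # One generator update.
        O_hat = G(asp_backward(H))
        H0_hat = hologram_form(O_hat)             # object-plane field F~(G_W(P_z^-1(H)))
        H_hat = asp_forward(H0_hat).abs()         # reproduced hologram, kept real-valued
        loss_G = (generator_adversarial_loss(D, H_hat)
                  + lam1 * torch.mean((H_hat - H) ** 2)           # L_Auto, Eq. (9)
                  + lam2 * background_tv_loss(H0_hat, mask_bg))   # L_B, Eq. (12)
        opt_G.zero_grad(); loss_G.backward(); opt_G.step()
        # SA-based mask update, activated once every k intervals.
        if t % k == 0:
            mask_bg = propose_and_update_mask(mask_bg, H, H_hat.detach())
        # Five discriminator updates.
        for _ in range(5):
            loss_D = discriminator_loss(D, H, H_hat)
            opt_D.zero_grad(); loss_D.backward(); opt_D.step()
```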
4. Experiment
In this section, we verify the performance of the proposed algorithm using simulated holograms,
publicly available samples, and our dendritic tags.
1/2.8″ complementary metal-oxide semiconductor (CMOS) sensor with a 2.0 µm × 2.0 µm pixel
size. This sensor delivers 22 frames per second (fps) at a resolution of
5 megapixels (2560 × 1920 pixels), which gives a 5120 µm × 3840 µm field of view (FOV).
The rolling shutter and variable exposure time also facilitate fast and accurate imaging.
Note that this architecture can be made compact for portable readers by substantially
reducing the distances.
Fig. 6. Experimental setup utilized for in-line holography: (a) using two lenses to enlarge
the beam intersection; (b) sample test.
As shown in Fig. 6(c), two convex lenses with focal lengths f1 = 25 mm and f2 = 150 mm
are used to expand the laser beam so that it fully covers the dendrite samples.
The lenses are located at a distance f1 + f2 = 175 mm from one another, so their focal points coincide
to retain the plane wavefront. The magnifying power of this system is MP = f2/f1 = 150/25 = 6,
which enlarges the beam intersection diameter from 3.5 mm to 21 mm. We use a viewing card at a
distance of 20 ft to verify that the magnified beam is properly collimated.
In Fig. 6(b), a sample slide is placed on the sample holder; the laser beam passes through the
sample, and the diffracted wave propagates to the sensor plane, forming the hologram. The captured
image is displayed on the computer in real time and is fed to the proposed DL-based recovery
algorithm. With an exposure time of 28 µs, the hologram is captured under clear and bright conditions.
The DL framework is developed in a Python environment using the PyTorch package and the Adam
optimizer. Training is performed on two Windows 10 machines, each with an NVIDIA RTX 2070
graphics card.
Fig. 7. A dendritic pattern grown on a synthetic paper and soaked with a liquid electrolyte.
It is noteworthy that, in general, there exist two main classes of untrained neural networks: one
with an encoder-decoder architecture, mainly based on deep autoencoders (e.g., DeepDIH [5]), and
another with only the decoder part, the so-called deep decoder (e.g., DCOD [28]).
In this experiment, we compare our model with two untrained DL methods (DeepDIH and
DCOD) as well as a CS-based method proposed in [11] using USAF target samples. In our
framework, we use the fine-tuned version of DeepDIH as the generator network, but we also
perform ablation analysis by replacing it with the DCOD.
The results in Fig. 8 and Table 3 demonstrate the superiority of the proposed method.
In particular, the PSNR of our method ranges from 25.7 dB to 29 dB, depending on the choice of
the generator and whether the adaptive masking module is activated, which is significantly
higher than the CS method (PSNR 14.6 dB), DeepDIH (PSNR 19.7 dB), and DCOD (PSNR 20.1
dB). A similar observation is made in Fig. 8, especially in the quality of the reconstructed object
phase. The main justification for this huge improvement is that untrained methods with deep
autoencoders and without proper regularization terms can easily be trapped into overfitting the noise,
especially if over-parameterized [27].
Fig. 8. The comparison of different methods, including (a) DeepDIH [5], (b) DCOD
[28], (c) proposed method using DCOD as generator, (d) proposed method with modified
DeepDIH as generator, and (e) same as (d) with the adaptive masking module. The first, second,
and third rows represent the reconstructed amplitude, the phase, and the amplitude of a selected zone,
respectively.
Although the DCOD method uses fewer parameters to alleviate the overfitting issue, it does
not employ complete knowledge of the hologram formation process and uses a random input.
In contrast, our method uses the back-propagated hologram as the generator input, meaning that
the generator network training starts from a reasonably good starting point and converges to a
better optimum.
Table 3. The comparison of different methods, including compressive sensing (CS) method [11],
DeepDIH [5], DCOD [28], proposed method with DCOD as generator, and proposed method with
modified DeepDIH as generator without and with adaptive masking module.
Method      CS       DeepDIH   DCOD     Ours (DCOD as G)   Ours w/o mask   Ours w/ mask
PSNR (dB)   14.590   19.657    20.056   25.728             26.325          29.019
Another drawback of the competitor methods is their use of the MSE loss, which does not
adequately capture the image reconstruction quality and may guide the network toward a wrong
convergence point. This issue is solved in our method by leveraging the underlying physics laws
and using a learnable distance measure through the discriminator network.
Finally, we observe a significant improvement from the utilized adaptive masking module, which
improves the reconstruction quality from a PSNR of 26.3 dB to as high as 29 dB. This highlights the
advantage of incorporating physical knowledge into the reconstruction process by adding more
constraints to the network weights through the background loss.
Figure 9 provides a closer look at the benefits of using the adaptive masking module and
applying the background loss to the USAF target. For better visibility, we compare three selected
parts of the reconstructed amplitude (middle) and the side view of the reconstructed object surface.
It is clearly seen that imposing the background loss smooths out the background part of the image
and improves the reconstruction quality without causing edge distortion.
Fig. 9. The comparison of the reconstructed object wave from captured hologram using
the proposed model without imposing background loss (top row) and with background loss
(bottom row). Left (a),(d): amplitude; Middle (b),(e): zoom-in details of amplitude; Right
(c),(f): side view of one row of the object blades’ surface.
We present the runtimes of the different approaches, measured on a Windows machine with an Intel
Core i7-8700K CPU and an RTX 2070 GPU, in Table 4. Our method with adaptive
masking needs about 30 minutes for GAN training and 6 minutes for the masking updates. This
time is relatively long but still reasonable for non-time-sensitive applications. To alleviate the
computational cost, we use transfer learning, as discussed in Section 4.6. With this accelerated
network, the reconstruction time reduces to about 4 minutes, comparable to DeepDIH [5].
Table 4. The runtime of different methods, including CS [11], DeepDIH [5], DCOD [28], and our
method with and without masking. We use 500 × 500 images with 5,000 iterations for all methods,
while 500 iterations are sufficient to produce high-quality results using transfer learning.
Method    CS         DeepDIH   DCOD      Ours (DCOD as G)   Ours w/o mask   Ours w/ mask   Ours fine-tuned
Runtime   ∼30 secs   ∼5 mins   ∼3 mins   ∼30 mins           ∼30 mins        ∼36 mins       ∼4 mins
the sensor field. All samples are placed at a distance of 5.5 mm (the closest possible) from
the CMOS sensor to avoid unnecessary diffraction of the object waves [28]. The parameters of
the framework are set accordingly; for example, we set the pixel size (2 µm), the wavelength (0.532
µm), and the sample-to-sensor distance (5,500 µm). We compare our method against
the aforementioned methods in Fig. 11, which shows that our method recovers a higher-quality
texture while maintaining a clean background.
Fig. 10. The reconstruction of the three real samples (S1: Zea Stem, S2: Onion Epidermis,
S3: Stomata-Vicia Faba Leaf). (a) Captured hologram; (b) reconstructed amplitude;
(c) reconstructed phase; (d) zoomed-in part.
Fig. 11. The comparison between different methods on the Onion Epidermis sample in terms
of the reconstructed phase.
We also used the same setup to capture holographic readings of dendrite samples (Fig. 12).
The results are presented after convergence, which occurs after 2,000 epochs. The results in
Figs. 10 and 12 demonstrate the end-to-end performance of the proposed GAN-based phase
recovery when applied to real holograms captured by our DIH setup.
Fig. 12. The reconstruction process of a dendrite sample. (a) A typical mica-substrate
dendrite sample; (b) captured hologram of a selected part; (c) reconstructed amplitude; (d) re-
constructed phase; (e) 3D view of the reconstructed object surface.
Fig. 13. Reconstructed amplitude of a cell sample by different approaches. The first row shows
the simulated hologram under different noise levels σ = 0, 5, 10, 15.
Fig. 14. Reconstructed phase of the dendrite sample with the corresponding 3D plots. The first row
shows the captured hologram with artificially added noise of standard deviation σ = 0, 5,
10, and 15, respectively.
The results in Figs. 13 and 14 show that the phase recovery of our algorithm is fairly robust
against noise levels up to σ = 10 ∼ 15 and significantly improves upon similar frameworks
such as DeepDIH and DCOD. Similar results are provided in Table 5, which shows better performance
for our method in both the SSIM and PSNR metrics. For instance, the SSIM of DeepDIH, DCOD,
ours using DCOD as generator, and ours using DeepDIH as generator for a dendrite sample
under noise level σ = 15 is respectively 0.453, 0.494, 0.708, and 0.763. This means that our
method using DCOD as the generator increases the performance of DCOD from SSIM = 0.494
to SSIM = 0.708 (a 43% improvement). The same applies to our method using DeepDIH
as the generator (a 68% improvement). As the noise level increases up to σ = 10, the performance
decay of our method is smaller than that of the DeepDIH and DCOD methods.
For instance, for the cell sample, DeepDIH shows around 3 dB decay for each Δσ = 5 increase
in the noise level, while ours shows only around 2 dB decay. This represents a 50% improvement
in the PSNR-versus-noise degradation rate. In the dendrite sample, from σ = 5 to σ = 10, the SSIM of
DeepDIH decreases by about 0.2, while that of ours decreases by only about 0.06, which is 70% smaller.
We conservatively state that the reconstruction quality is acceptable for noise levels up to
σ = 10, which incurs only around 4 dB decay in PSNR and around 0.2 SSIM loss.
Overall, the results confirm the robustness of the proposed model on noisy images. Part of this
robustness is inherited from the intrinsic noise removal capability of the AEs used as the generator in
our framework. Also, imposing the TV loss on the background section of the hologram removes
high-frequency noise from the image.
14.3 dB range. One intermediate solution is transfer learning, namely using the network
trained for S1 as the initialization for the other networks and performing 500 training iterations for
the new samples (f,j,n), which offers the best results (PSNR in the 25 dB to 30 dB range).
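In code, this transfer-learning protocol amounts to initializing from the weights trained on the similar sample and shortening the schedule; the file name, variable names, and the reuse of the train() sketch above are illustrative assumptions.

```python
import copy
import torch

# Reuse the weights trained on sample S1 as initialization, then fine-tune for 500 iterations.
G_new = copy.deepcopy(G_trained_on_S1)   # or: G_new.load_state_dict(torch.load("g_s1.pt"))
train(G_new, D, H_new, asp_forward, asp_backward, hologram_form,
      num_intervals=500)                 # 500 iterations suffice after transfer
```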
Fig. 15. The transferability of the DH-GAN. (a) Simulated hologram of sample S1;
(b) reconstructed phase of sample S1 using the fully trained model. Left side: each row
represents a sample (S2, S3, S4); the first column represents the captured hologram, and the next
three columns represent the results of the three testing scenarios.
To further investigate the transferability of the developed framework, we perform a test using
three sample types: 1) MNIST handwritten digits, 2) CCBD, and 3) USAF target. We
choose four samples of each type and train an independent network with fixed initialization
for each sample using 3,000 iterations. The weights are collected once every 100 iterations and
considered as data points.
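This visualization step can be reproduced with a standard PCA, sketched below under the assumption that checkpoints holds one flattened weight vector per collected data point.

```python
import numpy as np
from sklearn.decomposition import PCA

W = np.stack(checkpoints)                      # shape: (num_data_points, num_weights)
coords = PCA(n_components=2).fit_transform(W)  # project the weight vectors onto 2 principal axes
# coords[:, 0] and coords[:, 1] are scatter-plotted, colored by sample type, to produce Fig. 16.
```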
Figure 16 visualizes the resulting network weights in the 2D domain using principal component
analysis (PCA). The observation is quite interesting: the network weights corresponding
to the same sample type are aligned in the same direction, and different sample types are somewhat
separated into disjoint clusters. However, this is not universal; in some cases, the network
trained for one sample type (like blue) can also be used for a different type (like red) if the
parameters are close enough in some compact space. Therefore, we should exercise caution when
using transfer learning.
Transferability is more challenging for real samples because of varying recording conditions.
Here, we evaluate different methods on the real samples S3 (Stomata-Vicia Faba Leaf) and S2
(Onion Epidermis) shown in Fig. 10. All models are fully trained using S3, then fine-tuned for
S2 with 500 iterations. The reconstructed phase is shown in Fig. 17. Compared with retraining
for 500 iterations starting from random initialization (shown in the first row of Fig. 17), all
untrained networks demonstrate reasonable transferability, although the two samples S2 and S3
are morphologically different (substantially different textures).
Fig. 16. The 2D visualization of the network weights using PCA. Each data point represents
the vector of network weights after 100 iterations. Different colors represent different sample
types: MNIST handwritten digits (green), CCBD (blue), and USAF target (red).
The weights trained for similar patterns cluster radially with the same orientation.
Our proposed method exhibits a significant improvement over the DCOD method in recovering
details. Compared with DeepDIH, our proposed method recovers more depth contrast information.
We also observe that our framework can boost the performance of DCOD, which can be attributed
to the robustness to noisy reading conditions introduced by our design.
Fig. 17. Phase reconstruction via transfer learning for different methods. All methods are
fully trained on sample S3 (Stomata-Vicia Faba Leaf), then fine-tuned on sample S2 (Onion
Epidermis) from Fig. 10.
5. Conclusion
In this paper, we implemented a GAN-based framework to recover the 3D surface of micro-scaled
objects from holographic readings. Our method offers several novel features that yield phase
retrieval quality far beyond the current practice.
First, we utilized an AE-based generator network as a function approximator (to map real-valued
holograms into complex-valued object waves), in contrast to regular supervised GAN networks,
where the generator acts as a density estimator of the data samples. Secondly, we implemented a
progressive masking method powered by simulated annealing that extracts image foregrounds
(e.g., fractal patterns in dendrite samples). This feature facilitates imposing smoothness through
the TV loss on background areas, which further improves the reconstruction and noise removal quality.
The proposed method outperforms both conventional and DL-based methods designed for
phase recovery from one-shot imaging under similar conditions. Our method achieves a 10 dB
gain in PSNR over the CS-based method [11] and about a 5 dB gain over the most recent untrained
deep learning methods, such as DeepDIH [5] and DCOD [28]. An additional 3 dB gain is observed
when activating the adaptive masking module. Moreover, our model is sufficiently robust against
noise and tolerates AWGN up to σ = 10. It shows only about 0.4 dB decay per unit increase in the
noise standard deviation, lower than similar methods. Our method elevates DL-based digital
holography to higher levels with a modest computational increment. Furthermore, we explored
transfer learning to enable fast utilization of the proposed method in time-constrained applications.
Our experiments show that using a model trained for a similar sample can offer reasonable
reconstruction quality. Using transfer learning by borrowing network weights trained for a similar
sample and performing an additional 500 iterations for the new sample brings a considerable gain
of about 12 dB compared to independent training with 500 iterations. This observation suggests
that the developed model is highly transferable between samples of the same type, but
transferability across different sample types needs further investigation.
Funding. U.S. Department of Agriculture (2020-67017-33078).
Acknowledgments. The authors would like to thank Dr. Bruce Gao for his comments on developing the test setup
and experiment scenarios.
Disclosures. The authors have no known competing financial interests or personal relationships that could have
appeared to influence the work reported in this paper.
Data availability. Data underlying the results presented in this paper are not publicly available at this time but may
be obtained from the authors upon reasonable request.
References
1. J. K. Wallace, S. Rider, E. Serabyn, J. Kühn, K. Liewer, J. Deming, G. Showalter, C. Lindensmith, and J. Nadeau,
“Robust, compact implementation of an off-axis digital holographic microscope,” Opt. Express 23(13), 17367–17378
(2015).
2. N. Patel, S. Rawat, M. Joglekar, V. Chhaniwal, S. K. Dubey, T. O’Connor, B. Javidi, and A. Anand, “Compact
and low-cost instrument for digital holographic microscopy of immobilized micro-particles,” Opt. Lasers Eng. 137,
106397 (2021).
3. W. Xu, M. Jericho, I. Meinertzhagen, and H. Kreuzer, “Digital in-line holography for biological applications,” Proc.
Natl. Acad. Sci. 98(20), 11301–11305 (2001).
4. A. Alfalou and C. Brosseau, “Optical image compression and encryption methods,” Adv. Opt. Photonics 1(3),
589–636 (2009).
5. H. Li, X. Chen, Z. Chi, C. Mann, and A. Razi, “Deep dih: single-shot digital in-line holography reconstruction by
deep learning,” IEEE Access 8, 202648–202659 (2020).
6. M. K. Kim, “Principles and techniques of digital holographic microscopy,” SPIE Rev. 1(1), 018005 (2010).
7. C. J. Mann, L. Yu, C.-M. Lo, and M. K. Kim, “High-resolution quantitative phase-contrast microscopy by digital
holography,” Opt. Express 13(22), 8693–8698 (2005).
8. G. Koren, F. Polack, and D. Joyeux, “Iterative algorithms for twin-image elimination in in-line holography using
finite-support constraints,” J. Opt. Soc. Am. A 10(3), 423–433 (1993).
9. N. Bari, G. Mani, and S. Berkovich, “Internet of things as a methodological concept,” in Fourth International
Conference on Computing for Geospatial Research and Application (IEEE, 2013), pp. 48–55.
10. Z. Chi, A. Valehi, H. Peng, M. Kozicki, and A. Razi, “Consistency penalized graph matching for image-based
identification of dendritic patterns,” IEEE Access 8, 118623–118637 (2020).
11. W. Zhang, L. Cao, D. J. Brady, H. Zhang, J. Cang, H. Zhang, and G. Jin, “Twin-image-free holography: a compressive
sensing approach,” Phys. Rev. Lett. 121(9), 093902 (2018).
12. C. Bai, T. Peng, J. Min, R. Li, Y. Zhou, and B. Yao, “Dual-wavelength in-line digital holography with untrained deep
neural networks,” Photonics Res. 9(12), 2501–2510 (2021).
13. G. Situ, “Deep holography,” Light: Adv. Manuf. 3(2), 1 (2022).
14. T. Shimobaba, D. Blinder, T. Birnbaum, I. Hoshi, H. Shiomi, P. Schelkens, and T. Ito, “Deep-learning computational
holography: A review,” Front. Photonics 3, 8 (2022).
15. T. Zeng, Y. Zhu, and E. Y. Lam, “Deep learning for digital holography: a review,” Opt. Express 29(24), 40572–40593
(2021).
16. H. Wang, M. Lyu, and G. Situ, “eholonet: a learning-based end-to-end approach for in-line digital holographic
reconstruction,” Opt. Express 26(18), 22603–22614 (2018).
17. R. Horisaki, R. Takagi, and J. Tanida, “Deep-learning-generated holography,” Appl. Opt. 57(14), 3859–3863 (2018).
18. Y. Rivenson, Y. Zhang, H. Günaydın, D. Teng, and A. Ozcan, “Phase recovery and holographic image reconstruction
using deep learning in neural networks,” Light: Sci. Appl. 7(2), 17141 (2017).
19. Y. Zhang, H. Wang, and M. Shan, “Deep-learning-enhanced digital holographic autofocus imaging,” in Proceedings
of the 2020 4th International Conference on Digital Signal Processing (2020), pp. 56–60.
20. K. Wang, J. Dou, Q. Kemao, J. Di, and J. Zhao, “Y-net: a one-to-two deep learning framework for digital holographic
reconstruction,” Opt. Lett. 44(19), 4765–4768 (2019).
21. Z. Ren, Z. Xu, and E. Y. Lam, “End-to-end deep learning framework for digital holographic reconstruction,” Adv.
Photonics 1(01), 1–016004 (2019).
22. H. Chen, L. Huang, T. Liu, and A. Ozcan, “Fourier imager network (fin): A deep neural network for hologram
reconstruction with superior external generalization,” Light: Sci. Appl. 11(1), 254 (2022).
23. Y. Wu, Y. Luo, G. Chaudhari, Y. Rivenson, A. Calis, K. de Haan, and A. Ozcan, “Bright-field holography: cross-
modality deep learning enables snapshot 3d imaging with bright-field contrast using a single hologram,” Light: Sci.
Appl. 8(1), 25–27 (2019).
24. D. Yin, Z. Gu, Y. Zhang, F. Gu, S. Nie, J. Ma, and C. Yuan, “Digital holographic reconstruction based on deep
learning framework with unpaired data,” IEEE Photonics J. 12(2), 1–12 (2020).
25. Y. Zhang, M. A. Noack, P. Vagovic, K. Fezzaa, F. Garcia-Moreno, T. Ritschel, and P. Villanueva-Perez, “Phasegan:
A deep-learning phase-retrieval approach for unpaired datasets,” Opt. Express 29(13), 19593–19604 (2021).
26. F. A. Jenkins and H. E. White, “Fundamentals of optics,” Indian J. Phys. 25, 265–266 (1957).
27. R. Heckel and P. Hand, “Deep decoder: Concise image representations from untrained non-convolutional networks,”
arXiv, arXiv:1810.03982 (2018).
28. F. Niknam, H. Qazvini, and H. Latifi, “Holographic optical field recovery using a regularized untrained deep decoder
network,” Sci. Rep. 11(1), 10903–10913 (2021).
29. D. Ulyanov, A. Vedaldi, and V. Lempitsky, “Deep image prior,” in Proceedings of the IEEE conference on computer
vision and pattern recognition (2018), pp. 9446–9454.
30. O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in
International Conference on Medical image computing and computer-assisted intervention (Springer, 2015), pp.
234–241.
31. F. Wang, Y. Bian, H. Wang, M. Lyu, G. Pedrini, W. Osten, G. Barbastathis, and G. Situ, “Phase imaging with an
untrained neural network,” Light: Sci. Appl. 9(1), 77 (2020).
32. G. Palubinskas, “Image similarity/distance measures: what is really behind mse and ssim?” Int. J. Image Data Fusion
8(1), 32–53 (2017).
33. J. Johnson, A. Alahi, and L. Fei-Fei, “Perceptual losses for real-time style transfer and super-resolution,” in European
conference on computer vision (Springer, 2016), pp. 694–711.
34. R. W. Gerchberg, “A practical algorithm for the determination of phase from image and diffraction plane pictures,”
Optik 35(2), 237–246 (1972).
35. Z. Zalevsky, D. Mendlovic, and R. G. Dorsch, “Gerchberg–saxton algorithm applied in the fractional fourier or the
fresnel domain,” Opt. Lett. 21(12), 842–844 (1996).
36. T. Latychevskaia, “Iterative phase retrieval for digital holography: tutorial,” J. Opt. Soc. Am. A 36(12), D31–D40
(2019).
37. T. Latychevskaia and H.-W. Fink, “Practical algorithms for simulation and reconstruction of digital in-line holograms,”
Appl. Opt. 54(9), 2424–2434 (2015).
38. M. A. Schofield and Y. Zhu, “Fast phase unwrapping algorithm for interferometric applications,” Opt. Lett. 28(14),
1194–1196 (2003).
39. S. Cai, Z. Mao, Z. Wang, M. Yin, and G. E. Karniadakis, “Physics-informed neural networks (pinns) for fluid
mechanics: A review,” Acta Mech. Sinica pp. 1–12 (2022).
40. S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate
shift,” in International conference on machine learning (PMLR, 2015), pp. 448–456.
41. M. N. Kozicki, “Dendritic structures and tags,” (2021). US Patent 11,170,190.
42. M. N. Kozicki, “Dendritic tags,” (2022). US Patent App. 17/311,154.
43. A. Razi and Z. Chi, “Methods and systems for generating unclonable optical tags,” (2022). US Patent App. 17/505,547.
44. A. Valehi, A. Razi, B. Cambou, W. Yu, and M. Kozicki, “A graph matching algorithm for user authentication in data
networks using image-based physical unclonable functions,” in Computing Conference (IEEE, 2017), pp. 863–870.
45. H. Wang, X. Chen, and A. Razi, “Fast key points detection and matching for tree-structured images,” arXiv,
arXiv:2211.03242 (2022).
46. “Cil project: P1170”.