Research Article Vol. 31, No. 6 / 13 Mar 2023 / Optics Express 10114

DH-GAN: a physics-driven untrained generative adversarial network for holographic imaging

Xiwen Chen,1,† Hao Wang,1,† Abolfazl Razi,1,* Michael Kozicki,2 and Christopher Mann3

1 School of Computing, Clemson University, 821 McMillan Rd., Clemson, SC 29631, USA
2 School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ 85287, USA
3 Department of Applied Physics and Materials Science, Northern Arizona University, 1900 S Knoles Dr., Flagstaff, AZ 86011, USA
† These authors contributed equally.
* arazi@clemson.edu

10.1364/opticaopen.22009556

Abstract: Digital holography is a 3D imaging technique in which a laser beam with a plane wavefront is emitted toward an object and the intensity of the diffracted waveform, called a hologram, is measured. The object's 3D shape can be obtained by numerically analyzing the captured holograms and recovering the incurred phase. Recently, deep learning (DL) methods have been used for more
accurate holographic processing. However, most supervised methods require large datasets
to train the model, and such datasets are rarely available in most DH applications due to the scarcity of
samples or privacy concerns. A few one-shot DL-based recovery methods exist with no reliance
on large datasets of paired images. Still, most of these methods often neglect the underlying
physics law that governs wave propagation. These methods offer a black-box operation, which
is not explainable, generalizable, or transferable to other samples and applications. In this
work, we propose a new DL architecture based on generative adversarial networks that uses a
discriminative network for realizing a semantic measure for reconstruction quality while using a
generative network as a function approximator to model the inverse of hologram formation. We
impose smoothness on the background part of the recovered image using a progressive masking
module powered by simulated annealing to enhance the reconstruction quality. The proposed
method exhibits high transferability to similar samples, which facilitates its fast deployment in
time-sensitive applications without the need for retraining the network from scratch. The results
show a considerable improvement to competitor methods in reconstruction quality (about 5 dB
PSNR gain) and robustness to noise (about 50% reduction in PSNR vs noise increase rate).

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction
Digital holography (DH) is a commonly used technique to capture the 3D shape of microscopic
objects, something not feasible with regular cameras. This powerful technique is used in various
applications, including micro-particle measurement [1,2], biology [3], encryption [4], and visual
identification tags [5]. The core idea behind DH is that a laser beam with a plane wavefront
experiences diffraction and phase shift when it encounters a microscopic object. The interfering
wave intensity, also called hologram, is captured by a charge-coupled device (CCD) sensor array.
The goal of DH is to reconstruct the object’s 3D shape by processing the captured holograms
[6,7]. In short, if O(x, y) and H(x, y) are the object wave and the captured hologram, the goal
is recovering O(x, y) (or equivalently the 3D facet of the sample) from H(x, y), which involves
twin-image removal.
Compared to off-axis holography, digital inline holography (DIH) entails a much easier
hologram rendering method by emitting only one beam through the object and processing the
diffracted wave. However, it requires more complex numerical methods for phase recovery to

#480894 https://doi.org/10.1364/OE.480894
Journal © 2023 Received 18 Nov 2022; revised 3 Feb 2023; accepted 5 Feb 2023; published 6 Mar 2023

deconvolve the spatially overlapping zero-order and cross-correlated holographic terms. Some
methods rely on taking multiple images at different positions to enhance the phase recovery
performance [8]. In this work, we use transparent inline holography (Fig. 1) with single-shot
imaging and numerical reconstruction for its more straightforward design and potential for
developing low-cost compact and portable readers appropriate for Internet of Things (IoT) and
supply chain applications [9], especially for dendritic tags, our custom-designed visual identifiers
[10].

Fig. 1. A typical in-line digital holography setup.

1.1. Related work on conventional DH phase recovery


Recently, a physics-driven compressive sensing (CS)-based method has been proposed to solve the twin-image problem using single-shot imaging [11]. Specifically, the authors observed that the real object wave $RR^*O = O$ has sharp edges, while the twin virtual image $RRO^*$ is diffused when mapped to a sparse representation. Here, $O$ and $R$ represent the object and reference waves, and $*$ is the complex conjugate operator. The total variation (TV) loss is applied to the complex-valued object wave to impose sparsity. Moreover, a two-step iterative shrinkage/thresholding (TwIST) algorithm is used to optimize the objective function

$$\hat{U} = \arg\min_U \left\{ \tfrac{1}{2}\|H - T_f(U)\|_2^2 + \tau \|U\|_{tv} \right\},$$

where $T_f$ is the forward propagator, $\|\cdot\|_2$ is the $\ell_2$ norm, $\|\cdot\|_{tv}$ is the total variation norm, and $\tau$ is a tuning parameter. This method is more efficient than the iterative methods and hence is used
as a benchmark method in some recent papers [5,12] including our comparative results in this
paper. However, it suffers from a few technical issues. For example, imposing explicit sparsity
constraints can cause the edge distortion problem. Moreover, the results are sensitive to the
choice of τ.

1.2. Related work on deep learning-based DH


Recently, deep learning (DL) methods have been used for computational holography due to their
superior performance in many visual computing and image processing tasks [13,14]. In contrast
to the conventional phase recovery algorithms that mainly rely on theoretical knowledge and
phase propagation models, supervised DL methods often use large-scale datasets for training a
black-box model to solve the inverse problem numerically. Therefore, prior knowledge about the
propagation model and the system parameters is not necessary to construct DL networks [15].
For example, the authors of [16–22] used different implementations of convolutional neural
networks (CNN) taking advantage of the CNN’s capability in developing multi-frequency and
multi-scale feature maps. They usually customize the network or apply proper regularization
terms to reconstruct the object wave from the captured hologram. For instance, [20] proposed
a Y-like network with two output heads that can reconstruct intensity and phase information
simultaneously. Digital holographic reconstruction is extended to multi-sectional objects in [21].
More recently, spatial Fourier transform modules are utilized in addition to convolutional layers
to handle spatial information better [22]. A generative adversarial network (GAN) is proposed in [23] to generate bright-field microscopy images at different depths, free of the artifacts and noise of the captured hologram. The GAN network learns the statistical distribution of the training samples.

Although their one-shot inference was fast, the training time was fairly long, using about 6,000
image pairs (30,000 pairs after data augmentation).
Supervised DL methods, including the aforementioned methods, offer superior performance
in phase recovery. Nevertheless, they usually suffer from the obvious drawback of reliance on
relatively large datasets for training purposes. For instance, the models in [16,17] require about
10,000 training pairs. This requirement becomes problematic since such huge DH datasets rarely
exist for different sample types. Even if such datasets exist, the training time can be prohibitively
long for time-sensitive applications. For instance, the training time on a typical GPU (GeForce GTX 1080) is about 14.5 hours for the model proposed in [18]. Since the training process is not
transferable and should be repeated for different setups and sample types, such a long training
phase is not practically desirable. In some other applications, such as authenticating objects
using nano-scaled 3D visual tags, data sharing can be prohibited for security reasons [10].
To address the scarcity of paired DH samples, some recent works utilize unpaired data
(unmatched holograms and samples) to train their network [24]. Specifically, a cycle-generative
adversarial network (CycleGAN) is employed in [24] to reconstruct the object wave from the
hologram by training the model with holograms (denoted as domain X) and unmatched objects
(denoted as domain Y). Particularly, two generators are used to learn the functions X → Y
and Y → X. A consistency loss is used to enforce the training progress X → Y → X̂ ≈ X. A
similar method based on CycleGAN, called PhaseGAN, is proposed in [25], which used unpaired
data for training. The near-field Fresnel propagator [26] is employed as part of their framework.
Although these methods do not require matched object-hologram samples, they still need large
datasets of unmatched hologram samples in the training phase.
Considering the difficulties of developing large DH datasets, some attempts have been made
recently to create unsupervised learning frameworks [5,27,28]. Most of these frameworks utilize
CNN architectures as their backbones since they can capture sufficient low-level image features
to reproduce uncorrupted and realistic image parts [29]. Often, a loss function is employed
to minimize the distance between the captured hologram and the artificial hologram obtained
by forward-propagating the recovered object wave. For example, our previous work [5] uses
an hourglass encoder-decoder structure to reconstruct the object wave from DIH holograms.
Inspired by the deep decoder concept proposed in [27], the reconstruction algorithm in [28]
abandoned the encoder part and only used a decoder with a fixed random tensor as its input.
Some classical regularization methods, such as total variation (TV) loss and weight decay, are
applied to partially solve the noisy and incomplete signal problem. PhysenNet used a U-net
architecture [30] to retrieve the phase information [31]. Most recently, an untrained CNN-based network was employed in dual-wavelength DIH, which benefits from the CNN's capability of image reconstruction and denoising and the dual-wavelength setup's capability of phase unwrapping [12].

1.3. Summary of our contributions


Despite their innovative design and reconstruction efficiency, most of these methods suffer
from critical shortcomings. First, these untrained networks often use a loss function based on
the mean-squared errors (MSE), L2-norm, or similar distance measures between the captured
hologram and the reproduced hologram. This class of loss functions is not capable of measuring
structural similarities [32] and is not fully consistent with human perception. The perceptual loss proposed in [33], which uses a pre-trained feature extraction backbone to measure the loss, is a reasonable solution to this issue. Inspired by this work in developing a semantic
similarity measure, we propose an untrained and physics-driven learning framework based on
GAN architecture for one-shot DH reconstruction. In our method, the discriminator network
contributes a learnable penalty term to evaluate the similarity between the reproduced and the
captured holograms. As we will discuss later in section 3.1, the role of the generator network

in our network is that of a function approximator to model the inverse of the hologram generation
process (i.e., mapping the hologram to complex-valued object wave), as opposed to general
GANs, where the generator network learns the data distribution to create new samples from noise.
Another drawback of most aforementioned DL-based methods is their lack of interpretability
and their disregard of physics knowledge. Therefore, there are always two risks: (i) over-fitting and
(ii) severe performance degradation under minor changes to sample characteristics and test
conditions. We address these issues in two different ways. First, we incorporate forward and
backward propagation into our model, following some recent works [5,28,31]. Secondly, we
implement a new spatial attention module using an adaptive masking process to split the object
pattern into foreground and background regions and impose smoothness on the image background.
The background mask update is performed based on the reconstructed object wave quality and is regulated by simulated annealing (SA) optimization, starting with more aggressive updates and settling into more conservative changes as the network converges. Imposing a smoothness constraint on the gradually evolving background area makes our method fundamentally different from some iterative methods that enforce physics-driven hologram formation equations on the support region (i.e., the foreground) [34,35] or the entire image [36].
We show that our framework is generic and independent of the choice of the generator network.
In particular, we tested our framework with two recently developed generators, the fine-tuned
version of DeepDIH [5] and the deep compressed object decoder (DCOD) [28]. We also show
that adding a super-resolution layer to the utilized auto-encoder (AE) improves the quality of the
phase recovery.
This paper is organized as follows. Section 2 reviews the hologram formation process and
recasts it as a nonlinear inverse problem. Section 3 elaborates on the details of the proposed DL
method for phase recovery, highlighting its key features and differences from similar methods.
Experimental results for simulated holograms, publicly available samples, and our dendrite
samples are presented in Section 4 followed by concluding remarks in Section 5.

2. Problem formulation
The goal of this work is to design an unsupervised physics-driven DL network to reconstruct
the 3D surface of microscopic objects, especially dendrites, micro-scaled security tags used
to protect supply chains against cloning and counterfeit attacks (see Section 4.2 for details of dendrites).
The incident wave passing through a thin transparent object can be characterized as a complex-valued wave

$$O(x, y; z = 0) = R(x, y; z = 0)\, t(x, y), \qquad (1)$$
where R(x, y; z = 0) is the reference wave (i.e., the incident wave if the object is not present)
and t(x, y) = A(x, y)exp(jϕ(x, y)) is the incurred perturbation term caused by the object. t(x, y)
includes attenuation A(x, y) and phase shift ϕ(x, y) [37]. After performing forward-propagation
described by the angular spectrum method at distance z = d, O(x, y; z = d) is formed as follows:

$$O(x, y; z = d) = p(\lambda, z = d) \circledast O(x, y; z = 0) = \mathcal{F}^{-1}\{P(\lambda, z = d) \cdot \mathcal{F}\{O(x, y; z = 0)\}\}, \qquad (2)$$

where $\lambda$ represents the wavelength and $\circledast$ is the convolution operator. $\mathcal{F}\{\cdot\}$ and $\mathcal{F}^{-1}\{\cdot\}$ denote the direct and inverse Fourier transforms, respectively. Here, $P(\lambda, z) = \mathcal{F}\{p(x, y, z)\}$ is the transfer function, defined as

$$P(\lambda, z) = \exp\left(\frac{2\pi j z}{\lambda}\sqrt{1 - (\lambda f_x)^2 - (\lambda f_y)^2}\right), \qquad (3)$$

where $f_x$ and $f_y$ denote the spatial frequencies. The formed hologram in the detector plane is

$$H(x, y; \lambda, z) = |p(\lambda, z = d) \circledast (O(x, y; z = 0) + R(x, y; z = 0))|^2. \qquad (4)$$

Our ultimate goal is to recover the object-related perturbation t(x, y), or equivalently the complex-valued object wave O(x, y), from the captured hologram H(x, y), in a manner consistent with Eqs. (1)–(4).
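For concreteness, the following is a minimal NumPy sketch of the propagation model in Eqs. (2)–(4); the function names and the suppression of evanescent components are our own illustrative choices, not the implementation used in this paper.

```python
import numpy as np

def transfer_function(shape, wavelength, z, pixel_size):
    """Angular spectrum transfer function P(lambda, z) of Eq. (3)."""
    h, w = shape
    fx = np.fft.fftfreq(w, d=pixel_size)            # spatial frequencies f_x
    fy = np.fft.fftfreq(h, d=pixel_size)            # spatial frequencies f_y
    FX, FY = np.meshgrid(fx, fy)
    arg = 1.0 - (wavelength * FX) ** 2 - (wavelength * FY) ** 2
    P = np.exp(2j * np.pi * z / wavelength * np.sqrt(np.maximum(arg, 0.0)))
    P[arg < 0] = 0.0                                # drop evanescent components
    return P

def propagate(field, wavelength, z, pixel_size):
    """Forward ASP of Eq. (2); a negative z realizes the backward operator."""
    P = transfer_function(field.shape, wavelength, z, pixel_size)
    return np.fft.ifft2(P * np.fft.fft2(field))

def form_hologram(obj_wave, ref_wave, wavelength, d, pixel_size):
    """Hologram intensity on the sensor plane, Eq. (4)."""
    return np.abs(propagate(obj_wave + ref_wave, wavelength, d, pixel_size)) ** 2

# Example with the setup parameters of Section 4: 532 nm wavelength,
# 2 um pixels, and 5.5 mm sample-to-sensor distance.
# H = form_hologram(O0, np.ones_like(O0), 532e-9, 5.5e-3, 2e-6)
```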

3. Proposed method
The essence of our method relies on using a GAN-based architecture with several key modifications.
More specifically, consider the chain (Fig. 2)

$$O \xrightarrow{F(\cdot)} H_0 \xrightarrow{P_z(\cdot)} H \xrightarrow{P_z^{-1}(\cdot)} H_0 \xrightarrow{G_W(\cdot)} \hat{O} \xrightarrow{\tilde{F}(\cdot)} \hat{H}_0 \xrightarrow{P_z(\cdot)} \hat{H},$$

where $O \in \mathbb{R}^{h\times w\times 2}$ is the inaccessible and unknown complex-valued object wave with height $h$ and width $w$, $H_0 \in \mathbb{R}^{h\times w\times 1}$ is the produced hologram in the object plane, and $H \in \mathbb{R}^{h\times w\times 1}$ is the hologram in the sensor plane. Similarly, $\hat{O}$, $\hat{H}_0$, and $\hat{H}$ are the reconstructed versions of the object wave, the hologram in the object plane, and the hologram in the sensor plane. It is noteworthy that a classic phase unwrapping algorithm based on the fast Fourier transform [38] is applied to the phase of $\hat{O}$. Forward and backward angular spectrum propagation (ASP) according to Eqs. (2) and (3) are represented by $P_z(\cdot)$ and $P_z^{-1}(\cdot)$. Likewise, $F(\cdot): \mathbb{R}^{h\times w\times 2} \mapsto \mathbb{R}^{h\times w\times 1}$ represents the hologram formation according to Eqs. (1)–(4). Our goal is to develop a generator network $G_W(\cdot): \mathbb{R}^{h\times w\times 1} \mapsto \mathbb{R}^{h\times w\times 2}$ that models the inverse of the hologram formation process to reproduce the object wave $\hat{O}$ as close as possible to $O$ under some distance measure $d(\hat{O}, O)$. However, we cannot quantify $d(\hat{O}, O)$ since $O$ is inaccessible. To address this issue, and noting that the hologram formation process $F(\cdot)$ is known, we apply the same process to the reconstructed wave $\hat{O}$ to obtain a corresponding reproduced hologram $\hat{H} = P_z\big(\tilde{F}[G_W(P_z^{-1}(H))]\big)$. Then, we use the surrogate distance $d(\hat{H}, H)$ instead of $d(\hat{O}, O)$ to assess the reconstruction quality. Finally, note that we used $\tilde{F}(\cdot)$ for numerical hologram formation to account for minor differences with the real hologram formation due to parameter mismatch in $\lambda$ and $z$, and for adopting some idealistic assumptions (e.g., a plane wavefront).

Fig. 2. The overall block diagram of the hologram formation along with the proposed DL
architecture for phase recovery.

3.1. Optimization through loss function


Figures 2 and 3 present the details of the proposed DL architecture for DIH phase recovery. The loss term used in the generator network $G_W(\cdot)$ includes the following components.

Fig. 3. The overall framework of our untrained GAN-based network which consists of
AE-based generator network G, a discriminator network D, and a SA-based adaptive masking
module.

• One term is the MSE distance between the reproduced and the captured holograms, $d_1(\hat{H}, H) = \mathrm{MSE}(\hat{H}, H)$, used to directly train the AE-based generator, following the physics-driven methods [5,28,31].
• Noting the limitations of MSE and the $\ell_2$ norm, we also use a discriminator network $D_W(\cdot): \mathbb{R}^{h\times w\times 2} \mapsto \mathbb{R}$ to produce a learnable penalty term by maximizing the confusion between the reproduced and captured holograms, which can serve as an indicator of hologram quality. Let $D_W(H)$ and $D_W(\hat{H})$ be the probabilities of the captured and reproduced holograms being real. Then, we must maximize the first term and minimize the second term when training the discriminator to distinguish between the real and numerically regenerated holograms. However, we maximize the second term when training the generator to make the reproduced holograms as close as possible to the captured hologram so as to fool the discriminator. This is equivalent to the conventional GAN formulation

$$\mathcal{L} = \min_{G_W} \max_{D_W} \; \mathbb{E}_{x\sim P_{data}} \log[D_W(x)] + \mathbb{E}_{z\sim p_z} \log[1 - D_W(G_W(z))], \qquad (5)$$

with a few modifications.


• Finally, to incorporate our prior knowledge, we use a new term that imposes smoothness on the image background. This embraces the fact that in most real scenarios, the samples are supported by a transparent glass slide, meaning that the background of the reconstructed object should present no phase shift. In other words, $t(x, y) = A(x, y)e^{j\phi(x, y)} = 1 \Rightarrow O(x, y) = R(x, y)$ based on Eq. (1), which means zero phase shift in the object wave for all pixels outside the object boundary, $(x, y) \notin S$. This approach is inspired by physics-informed neural networks (PINN) [39], which use boundary conditions to solve partial differential equations (PDEs). Our approach to detecting the image background is discussed in Section 3.2.

To summarize, the proposed network aims to solve the following optimization problem:

$$\mathcal{L} = \min_{G_W} \max_{D_W} \; \log[D_W(H)] + \log[1 - D_W(\hat{H})] + \lambda_1 L_{Auto}(\hat{H}) + \lambda_2 L_B(G_W(H)), \qquad (6)$$

where $\hat{H} = P_z\big(\tilde{F}[G_W(P_z^{-1}(H))]\big)$ denotes the reproduced hologram, noting that its value depends on the optimizer $G_W$. The first two terms represent the GAN framework loss, with the ultimate goal of making the generator $G_W(\cdot)$ as close as possible to the inverse of hologram formation $F^{-1}(\cdot)$ through iterative training of the generator and discriminator networks. We have used an auto-encoder architecture for the generator following our previous work [5], whose loss function is represented by $L_{Auto}$. Likewise, $L_B$ represents the background loss term for points outside the object mask, $p \notin S$, with $\lambda_1$ and $\lambda_2$ being tuning parameters.
In the training phase, the objectives of $G_W$ and $D_W$ are optimized sequentially:

$$L_{G_W} = \min_{G} \; \log[1 - D_W(\hat{H})] + \lambda_1 L_{Auto}(\hat{H}) + \lambda_2 L_B(G_W(H)),$$
$$L_{D_W} = \max_{D} \; \log[D_W(H)] + \log[1 - D_W(\hat{H})]. \qquad (7)$$

To avoid the lazy training of the generator and achieve larger gradient variations, especially in the early training steps, we solve the following equivalent optimization problem:

$$L_{G_W} = \min_{G} \; -\log[D_W(\hat{H})] + \lambda_1 L_{Auto}(\hat{H}) + \lambda_2 L_B(G_W(H)). \qquad (8)$$

Since this network has only one fixed input and target, the GAN structure aims to map the
input to a reproduced domain as close as possible to the target, even without the LAuto and LB
terms. Adding these terms enhances the reconstruction quality by enforcing our prior knowledge.
Besides, since the discriminator $D_W$ extracts deep features via its multiple convolutional layers, its similarity evaluation is intuitively more meaningful than the MSE or $\ell_2$ loss. Thus, the network learns a more robust translation from the digital hologram to the object wave.
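As an illustration, the adversarial terms of Eqs. (7) and (8) can be written compactly in PyTorch as follows; this is a minimal sketch assuming D ends in a sigmoid, and the eps guard against log(0) is our own addition.

```python
import torch

eps = 1e-8  # numerical guard against log(0); our own addition

def d_loss(D, H, H_hat):
    """Discriminator objective (second line of Eq. (7)); returned negated
    so that a standard optimizer can minimize it."""
    return -(torch.log(D(H) + eps)
             + torch.log(1.0 - D(H_hat.detach()) + eps)).mean()

def g_adv_loss(D, H_hat):
    """Non-saturating adversarial term of Eq. (8): minimize -log D(H_hat)."""
    return -torch.log(D(H_hat) + eps).mean()
```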
The auto-encoder loss term $L_{Auto}$ in Eq. (6) is used to directly minimize the gap between the captured hologram and the numerically reconstructed hologram, independent of the utilized discriminator:

$$L_{Auto}(\hat{H}) = d_{MSE}(H, \hat{H}) = \frac{1}{h \times w}\|H - \hat{H}\|_2^2, \qquad (9)$$

where the captured and reconstructed holograms (H, Ĥ) are representatives of the AE input and
output after proper propagation.
Finally, we use the total variation (TV) loss to enforce smoothness on the image background, i.e., the pixels $p = (x, y) \notin S$ outside the region of interest (ROI), or the image foreground. This incorporates our prior knowledge about the zero phase shift of background pixels beyond the ROI and improves the reconstruction quality. The TV loss for a complex-valued 2D signal $z$ is

$$L_B(z) = \int_{\Omega_B} \big( |\nabla \Re(z)| + |\nabla \Im(z)| \big)\, dx\, dy, \qquad (10)$$

where $\Omega_B$ denotes the support set of $z$, and $\Re(z)$ and $\Im(z)$ denote the real and imaginary parts of $z$, respectively. In our case, the points $z$ are taken from $\tilde{F}(G_W(P_z^{-1}(H)))$ and $\Omega_B = \{(x, y)\,|\, 1 \le x \le w,\, 1 \le y \le h,\, (x, y) \notin S\}$.

For discrete signals, we use the approximation $|\nabla_x \Re(z)| = |\Re(z)_{x+1,y} - \Re(z)_{x,y}|$. Noting that $|\nabla \Re(z)| = \big(|\nabla_x \Re(z)|^2 + |\nabla_y \Re(z)|^2\big)^{1/2}$, Eq. (10) converts to

$$L_B(z) = \frac{1}{|\Omega_B|}\sum_{x,y\in\Omega_B} \Big( \big|\Re(z)_{x+1,y} - \Re(z)_{x,y}\big|^2 + \big|\Re(z)_{x,y+1} - \Re(z)_{x,y}\big|^2 + \big|\Im(z)_{x+1,y} - \Im(z)_{x,y}\big|^2 + \big|\Im(z)_{x,y+1} - \Im(z)_{x,y}\big|^2 \Big)^{1/2}, \qquad (11)$$

where $|\Omega_B|$ is the cardinality (the number of points) of the set $\Omega_B$. For simplicity, we skip the square-root operation and use the following version, which is computationally faster:

$$L_B = \frac{1}{|\Omega_B|}\sum_{x,y\in\Omega_B} \big|\Re(z)_{x+1,y} - \Re(z)_{x,y}\big|^2 + \big|\Re(z)_{x,y+1} - \Re(z)_{x,y}\big|^2 + \big|\Im(z)_{x+1,y} - \Im(z)_{x,y}\big|^2 + \big|\Im(z)_{x,y+1} - \Im(z)_{x,y}\big|^2. \qquad (12)$$

The details of the adaptive masking used to define the ROI are discussed below.

3.2. Adaptive masking by K-means and simulated annealing


The background loss LB in Eq. (6) operates on the background area of the output image, as
shown in Fig. 4. The background area is determined by a binary mask M (t) , where t is a discrete
number denoting the mask update time point. To this end, a binary mask M̂ (t) is developed by
applying K-means segmentation (with K = 2) to $|\hat{O}_0|$, the amplitude of the reconstructed object
wave at z = 0. We consider the resulting mask as a "proposal mask", which may or may not
be accepted. Rejection means that we use the previously formed mask M (t−1) to calculate the
background loss. To avoid instability of the results and unnecessary mask updates, we use a
mechanism that tends to make more frequent (aggressive) updates at the beginning and less
frequent (conservative) updates when the algorithm converges to reasonably good results. A
natural way of implementing such a mechanism is using the simulated annealing (SA) algorithm,
where the variation rate decline is controlled by temperature cooling.

Fig. 4. The block-diagram of the adaptive segmentation to create background loss. The
operator ⊗ denotes element-wise multiplication, indicating that all operations are only
applied on the background area. The mask update process is explained in Section 3.2.

The SA algorithm is initialized with temperature $T_0$ at time $t = 0$. We also set the first mask $M^{(0)} = [1]_{h\times w}$, assuming no foreground has been detected yet.
To update the mask at time t = 1, 2, 3, . . . , we compare the MSE distance between the
reproduced hologram Ĥ and the captured hologram H on the background areas determined once
by the previous mask $M^{(t-1)}$ and next by the current mask proposal $\hat{M}^{(t)}$. Mathematically, we compute $\delta_{t-1} = d_{MSE}(H, \hat{H}; M^{(t-1)})$ and $\hat{\delta}_t = d_{MSE}(H, \hat{H}; \hat{M}^{(t)})$. The inequality $\hat{\delta}_t < \delta_{t-1}$ means that the consistency between the captured and reconstructed holograms improves by using the current mask proposal, so we accept the proposal and update the mask: $M^{(t)} = \hat{M}^{(t)}$. Otherwise, we lower the temperature as $T_t = T_{t-1}/\log(1 + t)$, and then update the mask with probability $e^{-(\hat{\delta}_t - \delta_{t-1})/T_t}$. This means that as time passes, the update probability declines. The summary of Algorithm 1 is presented below.
Algorithm 1. Adaptive Background Masking
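Since the pseudocode listing of Algorithm 1 is not reproduced in this version, the following Python sketch captures our reading of the update rule described above. The use of scikit-learn's KMeans and the choice of the more populous cluster as the background are illustrative assumptions; mse_fn(mask) is a hypothetical callback returning $d_{MSE}(H, \hat{H})$ restricted to the background defined by the mask.

```python
import math
import random
import numpy as np
from sklearn.cluster import KMeans

def update_mask(t, T, mask_prev, amp_O0, mse_fn):
    """One SA-regulated mask update (our reading of Algorithm 1).
    t: update time point (t >= 1); T: current temperature;
    mask_prev: M^(t-1); amp_O0: |O_hat_0|, the amplitude at z = 0."""
    # Proposal mask: 2-class K-means segmentation of the amplitude image.
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(
        amp_O0.reshape(-1, 1)).reshape(amp_O0.shape)
    bg_label = np.argmax(np.bincount(labels.ravel()))  # assume background is larger
    mask_prop = (labels == bg_label).astype(np.float32)

    delta_prev = mse_fn(mask_prev)     # d_MSE(H, H_hat; M^(t-1))
    delta_prop = mse_fn(mask_prop)     # d_MSE(H, H_hat; M_hat^(t))
    if delta_prop < delta_prev:
        return mask_prop, T            # accept: the proposal improves consistency
    T = T / math.log(1 + t)            # cool the temperature
    if random.random() < math.exp(-(delta_prop - delta_prev) / T):
        return mask_prop, T            # occasional uphill acceptance
    return mask_prev, T                # reject: keep the previous mask
```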

The confirmed binary mask $M^{(t)}$ is used to determine the background area at time point $t$ for the loss term $L_B$ in Eq. (6), noting that the background area is flat and bears constant attenuation and phase shift. This provides additional leverage for the optimization problem to converge faster. This improvement is confirmed by our results in Section 4.3 (for instance, see Fig. 8 and Table 3).

3.3. Network architecture


The network consists of a generator G and a discriminator D (Fig. 2). Although the proposed framework is general and any typical generative network and binary classifier can be used for G and D, here we provide the details of the utilized networks for the sake of completeness. We use a modified version of the auto-encoder (AE) in [5] as our generator network (Fig. 3). The AE network consists of 8 convolutional layers in the encoder and 8 in the decoder part. Max pooling and transposed convolution operators are used to perform downsampling and upsampling, respectively. One key modification we made is adding 2 more convolutional layers and 1 more transposed convolutional layer to enable super-resolution, which brings further improvement at a reasonably low computation cost.
The discriminator network D uses an architecture similar to the encoder part of the AE-based
generator G. It consists of 8 convolutional layers, a global pooling layer, and a dense layer. It
outputs a single value that represents the evaluation score. Batch Normalization [40] is used for

both G and D to stabilize the training progress. The architectural details for G and D are given in
Tables 1 and 2. To show the generalizability of the architecture, we also used DCOD [28] as an
alternative generator network in our experiments.

Table 1. The architectural details of generator G. It utilizes an hourglass autoencoder structure. K1 and K2 denote the kernel size, and Cin and Cout denote the input and output channels, respectively. Layers marked with * are used for super-resolution.

Layer Type                 Kernel Size (K1 × K2 × Cin × Cout)
Conv2d+BatchNorm+ReLU      5 × 5 × 2 × 32
Conv2d+BatchNorm+ReLU      3 × 3 × 32 × 32
MaxPool2d                  -
Conv2d+BatchNorm+ReLU      3 × 3 × 32 × 64
Conv2d+BatchNorm+ReLU      3 × 3 × 64 × 64
MaxPool2d                  -
Conv2d+BatchNorm+ReLU      3 × 3 × 64 × 128
Conv2d+BatchNorm+ReLU      3 × 3 × 128 × 128
MaxPool2d                  -
Conv2d+BatchNorm+ReLU      3 × 3 × 128 × 128
Conv2d+BatchNorm+Tanh      3 × 3 × 128 × 16
Conv2d+BatchNorm+ReLU      3 × 3 × 16 × 128
Conv2d+BatchNorm+ReLU      3 × 3 × 128 × 128
ConvTranspose2d            stride = 2
Conv2d+BatchNorm+ReLU      3 × 3 × 128 × 64
Conv2d+BatchNorm+ReLU      3 × 3 × 64 × 64
ConvTranspose2d            stride = 2
Conv2d+BatchNorm+ReLU      3 × 3 × 64 × 32
Conv2d+BatchNorm+ReLU      3 × 3 × 32 × 32
ConvTranspose2d            stride = 2
Conv2d+BatchNorm+ReLU*     3 × 3 × 32 × 16
Conv2d+BatchNorm+ReLU*     3 × 3 × 16 × 16
ConvTranspose2d*           stride = 2
Conv2d+BatchNorm+ReLU      3 × 3 × 16 × 16
Conv2d+BatchNorm+ReLU      3 × 3 × 16 × 16
Conv2d                     3 × 3 × 16 × 2

The training strategy is shown in Fig. 5. The generator and discriminator are trained sequentially.
However, to avoid the early convergence of the generator, we train the generator only once,
then train the discriminator for 5 consecutive iterations. Note that the early convergence of the
generator is not desirable, since any mediocre generator can produce artificial results that can fool
a discriminator that has not yet reached its optimal operation. Therefore, we let the discriminator
converge first and perform its best, then train the generator accordingly to produce accurate
object waves from the captured holograms.

Table 2. The architectural details of discriminator D. It outputs a score evaluating the similarity of the input to the target.

Layer Type                 Kernel Size (K1 × K2 × Cin × Cout)
Conv2d+BatchNorm+ReLU      5 × 5 × 2 × 32
Conv2d+BatchNorm+ReLU      3 × 3 × 32 × 32
MaxPool2d                  -
Conv2d+BatchNorm+ReLU      3 × 3 × 32 × 64
Conv2d+BatchNorm+ReLU      3 × 3 × 64 × 64
MaxPool2d                  -
Conv2d+BatchNorm+ReLU      3 × 3 × 64 × 128
Conv2d+BatchNorm+ReLU      3 × 3 × 128 × 128
MaxPool2d                  -
Conv2d+BatchNorm+ReLU      3 × 3 × 128 × 128
Conv2d+BatchNorm           3 × 3 × 128 × 16
GlobalPool2d               -
Fully-connected            1 × 1 × 16 × 1

The aforementioned masking update by the SA-based algorithm is performed after updating the generator. This does not occur after every update, but rather once after every k updates of the generator, as shown by the red intervals in Fig. 5.
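The schedule of Fig. 5 can be summarized by the following skeleton, which glues together the earlier sketches (d_loss, g_adv_loss, tv_background_loss, update_mask). G, D, forward_model, H, H_back, T0, amp_fn, and mse_fn are assumed to be defined; the learning rates and loss weights are placeholders, not the paper's tuned values, and numpy/torch conversions around the mask update are elided for brevity.

```python
import torch

opt_G = torch.optim.Adam(G.parameters(), lr=1e-3)   # placeholder learning rates
opt_D = torch.optim.Adam(D.parameters(), lr=1e-3)
k, lam1, lam2, T = 100, 1.0, 0.1, T0                # placeholder weights

mask_bg = torch.ones_like(H)                        # M^(0): no foreground yet
for it in range(5000):
    # One generator update per interval (Eq. (8)).
    opt_G.zero_grad()
    O_hat = G(H_back)                               # reconstructed object wave
    H_hat = forward_model(O_hat)                    # reproduced hologram
    loss_G = (g_adv_loss(D, H_hat)
              + lam1 * torch.mean((H - H_hat) ** 2)          # L_Auto, Eq. (9)
              + lam2 * tv_background_loss(O_hat, mask_bg))   # L_B, Eq. (12)
    loss_G.backward()
    opt_G.step()

    # SA-regulated mask update once every k intervals (red slots in Fig. 5).
    if it % k == 0:
        mask_bg, T = update_mask(it // k + 1, T, mask_bg, amp_fn(O_hat), mse_fn)

    # Five discriminator updates per interval (Eq. (7)).
    for _ in range(5):
        opt_D.zero_grad()
        loss_D = d_loss(D, H, H_hat)
        loss_D.backward()
        opt_D.step()
```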

Fig. 5. The training strategy. The masking update is activated once every k = 100 intervals
(shown by red). Each interval includes one iteration of the generator update (brown) followed
by five iterations of the discriminator update (blue). If masking update is active (in red
intervals), it is performed between the generator training and discriminator training (yellow).

4. Experiment
In this section, we verify the performance of the proposed algorithm using simulated holograms,
publicly available samples, and our dendritic tags.

4.1. Experiment setup


Our experimental setup is shown in Fig. 6. The laser module CPS532-C2 is used to generate a single-wavelength (532 nm) laser beam with a round shape of diameter 3.5 mm. The laser module provides 0.9 mW of output power from a typical USB port. The USB-based powering facilitates taking clear holograms in normal conditions. We use a digital camera A55050U, which employs a

1/2.8” Complementary Metal-Oxide Semiconductor (CMOS) sensor with 2.0 µm × 2.0 µm pixel size. This sensor captures 22 frames per second (fps) at a resolution of 5 megapixels (2560 × 1920 pixels), which gives a 5120 µm × 3840 µm field of view (FOV).
Rolling shutter and variable exposure time also provide convenience for fast and accurate imaging.
Note that this architecture can be made compact by substantially lowering the distances for
portable readers.

Fig. 6. Utilized experimental setup for in-line holography, (a) using two lenses to enlarge
the beam intersection (b) sample test.

As shown in Fig. 6, two convex lenses with focal lengths of f1 = 25 mm and f2 = 150 mm are used to expand the laser beam so that it fully covers the dendrite samples. The lenses are located at a distance f1 + f2 = 175 mm from one another, so their focal points coincide to retain the plane wavefront. The magnifying power of this system is MP = f2/f1 = 150/25 = 6, which enlarges the laser intersection diameter from 3.5 mm to 21 mm. We use a viewing card at a distance of 20 ft to verify that the magnified beam is properly collimated.
In Fig. 6(b), a sample slide is placed on the sample holder; the laser beam passes through the sample, and the diffracted wave propagates to the sensor plane, forming the hologram. The captured image is displayed on the computer in real time and is fed to the proposed DL-based recovery algorithm. With an exposure time of 28 µs, the hologram is captured in clear and bright conditions.
The DL framework is developed in a Python environment using the PyTorch package and the Adam optimizer. Training is performed using two Windows 10 machines with NVIDIA RTX 2070 graphics cards.

4.2. Dendrite samples


In addition to simulated and public holograms, we use dendrite samples in our experiments.
Dendrites are visual identifiers that are formed by growing tree-shaped metallic fractal patterns
by inducing regulated voltage on electrolyte solutions with different propensities [41]. These tags
can be efficiently produced in large volumes on multiple substrate materials (e.g., mica, synthetic
paper, etc.) with different granularity and density [42]. A dendrite sample is shown in Fig. 7.
Dendrites have specific features such as extremely high entropy owing to their inherent randomness, self-similarity, and unclonability due to their 3D facets and non-resolution granularity. These
features make this patented technology an appropriate choice for security solutions, including
identification tags, visual authentication, random generators, and producing physical unclonable
functions (PUFs) with robust security keys [10,43].
We have previously shown dendrites’ utility as 2D authentication identifiers [10,44,45], but
exploiting information-rich features from dendrites to achieve unclonability requires specific
technologies such as digital holography, as presented in our previous work [5].

4.3. Test with simulated holograms


First, we compare the performance of our method against the most powerful untrained methods,
where the sample hologram is sufficient to recover the phase with no need for a training dataset.

Fig. 7. A dendritic pattern grown on a synthetic paper and soaked with a liquid electrolyte.

It is noteworthy that, in general, there exist two main classes of untrained neural networks: one with an encoder-decoder architecture, mainly based on deep autoencoders (e.g., DeepDIH [5]), and another class with only the decoder part, the so-called deep decoder (e.g., DCOD [28]).
In this experiment, we compare our model with two untrained DL methods (DeepDIH and
DCOD) as well as a CS-based method proposed in [11] using USAF target samples. In our
framework, we use the fine-tuned version of DeepDIH as the generator network, but we also
perform ablation analysis by replacing it with the DCOD.
The results in Fig. 8 and Table 3 demonstrate the superiority of our proposed method. Particularly, the PSNR of our method ranges from 25.7 dB to 29 dB, depending on the choice of the generator and activating/deactivating the adaptive masking module, which is significantly higher than the CS method (PSNR 14.6 dB), DeepDIH (PSNR 19.7 dB), and DCOD (PSNR 20.1 dB). A similar observation is made in Fig. 8, especially in the quality of the reconstructed object phase. The main justification for this huge improvement is that an untrained method with a deep autoencoder without proper regularization terms can easily be trapped in overfitting the noise, especially if over-parameterized [27].

Fig. 8. The comparison of different methods, including (a) DeepDIH [5], (b) DCOD
[28], (c) proposed method using DCOD as generator, (d) proposed method with modified
DeepDIH as generator, and (e) same as (d) with adaptive masking module. First, second,
and third rows represent the reconstructed amplitude, phase, and amplitude of a selected zone,
respectively.

Although the DCOD method uses fewer parameters to alleviate the overfitting issue, it does
not employ complete knowledge about the hologram formation process and uses random input.
In contrast, our method uses the back-propagated holograms as the generator input, meaning that

Table 3. The comparison of different methods, including the compressive sensing (CS) method [11], DeepDIH [5], DCOD [28], the proposed method with DCOD as generator, and the proposed method with modified DeepDIH as generator, without and with the adaptive masking module.

Method      CS      DeepDIH   DCOD     Ours (DCOD as G)   Ours w/o mask   Ours w/ mask
PSNR (dB)   14.590  19.657    20.056   25.728             26.325          29.019

the generator network training starts from a reasonably good starting point and converges to a better
optimum.
Another drawback of the competitor methods is using the MSE loss, which does not adequately capture the image reconstruction quality and may guide the network toward a wrong convergence point. This
issue is solved in our method by leveraging the underlying physics law and using a learnable
distance measure through the discriminator network.
Finally, we observe a significant improvement for the utilized adaptive masking module that
improves the reconstruction quality from PSNR 26.3 dB to as high as 29 dB. This highlights the
advantage of incorporating physical knowledge into the reconstruction process by adding more
constraints to the network weights through background loss.
Figure 9 provides a closer look at the benefits of using the adaptive masking module and applying the background loss to the USAF target. For better visibility, we compare three selected parts of the reconstructed amplitude (middle) and the side view of the reconstructed object surface. It is clearly seen that imposing the background loss smooths out the background part of the image and improves the reconstruction quality without causing edge distortion.

Fig. 9. The comparison of the reconstructed object wave from captured hologram using
the proposed model without imposing background loss (top row) and with background loss
(bottom row). Left (a),(d): amplitude; Middle (b),(e): zoom-in details of amplitude; Right
(c),(f): side view of one row of the object blades’ surface.

We present the runtime of different approaches using a Windows machine with an Intel Core i7-8700K CPU and an RTX 2070 GPU in Table 4. Our method with adaptive masking needs about 30 minutes for training the GAN and 6 minutes for the masking updates. This time is relatively long but is still reasonable for non-time-sensitive applications. To alleviate the computational cost, we use transfer learning, as discussed in Section 4.6. With this accelerated network, the reconstruction time reduces to about 4 minutes, comparable to DeepDIH [5].

4.4. Test with real samples


To prove the applicability of our model in real-world scenarios, we have tested different types
of samples, including S1: Zea Stem, S2: Onion Epidermis, and S3: Stomata-Vicia Faba Leaf
(Fig. 10). The average cell sample size is 2 mm × 2 mm, equivalent to 1000 × 1000 pixels in

Table 4. The runtime of different methods, including CS [11], DeepDIH [5], DCOD [28], and our method with and without masking. We use 500 × 500 images with 5000 iterations for all methods, while 500 iterations are sufficient to produce high-quality results using transfer learning.

Method    CS        DeepDIH   DCOD      Ours (DCOD as G)   Ours w/o mask   Ours w/ mask   Ours fine-tuned
Runtime   ∼30 secs  ∼5 mins   ∼3 mins   ∼30 mins           ∼30 mins        ∼36 mins       ∼4 mins

the sensor field. All samples have been placed at a distance of 5.5 mm (the closest possible) from the CMOS sensor to avoid unnecessary diffraction of the object waves [28]. The parameters of the framework are set accordingly: pixel size (2 µm), wavelength (0.532 µm), and sample-to-sensor distance (5,500 µm). We compare our method against the aforementioned methods in Fig. 11, which shows that our method recovers a higher-quality texture while maintaining a clean background.

Fig. 10. The reconstruction of the three real samples (S1: Zea Stem, S2: Onion Epidermis, S3: Stomata-Vicia Faba Leaf). (a) Captured hologram; (b) reconstructed amplitude; (c) reconstructed phase; (d) zoomed-in part.

Fig. 11. The comparison between different methods on Onion Epidermis sample in terms
of reconstructed phase.

We also used the same setup to capture holographic readings of dendrite samples (Fig. 12).
The results are presented after convergence, which occurs after 2,000 epochs. The results in
Figs. 10 and 12 demonstrate the end-to-end performance of the proposed GAN-based phase
recovery when applied to real holograms captured by our DIH setup.

Fig. 12. The reconstruction process of a dendrite sample. (a) A typical mica-substrate
dendrite sample; (b) captured hologram of select part; (c) reconstructed amplitude; (d) re-
constructed phase; (e) 3D view of the reconstructed object surface.

4.5. Robustness to noise


Like regular images, holographic readings can be noisy due to illumination conditions, dusty lenses, sensor noise, and other imaging artifacts. We examine the impact of noise to ensure that reasonable noise levels do not substantially degrade the reconstruction quality. To this end, we add additive white Gaussian noise (AWGN) of different levels (standard deviation: σ = 5, σ = 10, and σ = 15) to the simulated holograms. The results for the cell and dendrite samples are respectively presented in Figs. 13 and 14, and summarized in Table 5.
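A minimal sketch of this noise injection follows, assuming the hologram is stored in 8-bit gray levels so that σ is expressed in gray levels (our assumption):

```python
import numpy as np

def add_awgn(hologram, sigma, seed=None):
    """Contaminate a hologram with AWGN of standard deviation sigma and
    clip back to the valid 8-bit intensity range."""
    rng = np.random.default_rng(seed)
    noisy = hologram.astype(np.float32) + rng.normal(0.0, sigma, hologram.shape)
    return np.clip(noisy, 0.0, 255.0)

# The three test levels of this section:
# noisy_5, noisy_10, noisy_15 = (add_awgn(H, s) for s in (5, 10, 15))
```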

Table 5. Comparison of different methods in reconstructing the phase contaminated with different noise levels.

Sample     Method                 Metric   σ = 0    σ = 5    σ = 10   σ = 15
Cell       DeepDIH                PSNR     26.008   22.933   20.663   18.311
                                  SSIM     0.807    0.699    0.585    0.491
           DCOD                   PSNR     25.169   21.881   18.481   16.216
                                  SSIM     0.748    0.664    0.481    0.423
           Ours (DCOD as G)       PSNR     27.979   24.794   20.939   19.392
                                  SSIM     0.869    0.732    0.568    0.492
           Ours (DeepDIH as G)    PSNR     27.817   26.062   23.732   19.793
                                  SSIM     0.941    0.842    0.716    0.548
Dendrite   DeepDIH                PSNR     30.113   29.207   21.784   16.72
                                  SSIM     0.916    0.875    0.671    0.453
           DCOD                   PSNR     30.54    28.793   21.597   17.118
                                  SSIM     0.922    0.865    0.639    0.494
           Ours (DCOD as G)       PSNR     30.99    28.691   25.98    21.846
                                  SSIM     0.931    0.872    0.76     0.708
           Ours (DeepDIH as G)    PSNR     32.994   30.071   28.092   23.976
                                  SSIM     0.969    0.911    0.846    0.763

Fig. 13. Reconstructed amplitude of a cell sample by different approaches. The first row is
the simulated hologram under different noise levels σ = 0, 5, 10, 15.

Fig. 14. Reconstructed phase of the dendrite sample with their 3D plot. The first row is the
captured hologram with artificially added noise with standard deviations σ = 0, σ = 5, σ =
10, and σ = 15, respectively.

The results in Figs. 13 and 14 show that the phase recovery of our algorithm is fairly robust
against noise levels up to σ = 10 ∼ 15, and significantly improves upon the similar frameworks
such as DeepDIH and DCOD. Similar results are provided in Table 5, which shows better performance for our methods in both SSIM and PSNR metrics. For instance, the SSIM of DeepDIH, DCOD,
ours using DCOD as generator, and ours using DeepDIH as generator for a dendrite sample

under noise level σ = 15 is respectively 0.453, 0.494, 0.708, and 0.763. It means that our
method when using DCOD as generator increases the performance of DCOD from SSIM=0.494
to SSIM=0.708 (43% improvement). The same applies to our method when using DeepDIH
as generator (68% improvement). By increasing the noise level up to σ = 10, the performance
decay of our method is smaller than that of the DeepDIH and DCOD methods.
For instance, for the cell sample, DeepDIH shows around 3 dB decay for each ∆σ = 5 increase
in the noise level, while ours only shows around 2 dB decay. This represents a 50% improvement
in PSNR vs noise increase rate. In the dendrite sample, from σ = 5 to σ = 10, the SSIM of
DeepDIH decreases about 0.2, while that of ours only decreases about 0.06, which is 70% smaller.
We conservatively conclude that the reconstruction quality is acceptable for noise levels up to σ = 10, which incurs only around 4 dB decay in PSNR and around 0.2 SSIM loss.
The results overall confirm the robustness of the proposed model for noisy images. Part of this
robustness is inherited from the intrinsic noise removal capability of AEs used as a generator in
our framework. Also, imposing TV loss on the background section of the hologram removes
high-frequency noise from the image.

4.6. One-shot training and transfer learning


A key challenge of DL-based phase recovery methods, compared to conventional numerical methods, is their limited generalizability and transferability to other experiments due to DL methods' unexplainable and black-box nature. This matter can be problematic in real-time applications, since the time-consuming training phase should be repeated for every new sample. The proposed method, like some other untrained methods, partially alleviates this issue by incorporating the underlying physics laws. Originally, our model takes 3000-5000 iterations (∼30 minutes) to reconstruct a hologram with random initialization. With one-shot training, we expect the model trained for the first hologram reconstruction to be directly usable for all other new holograms with the same recording conditions. With transfer learning, the model is initialized with the weights obtained from the reconstruction of the previous hologram, and then fine-tunes itself to the new hologram, which takes fewer iterations.
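In code, these usage modes amount to a choice of initialization and iteration budget; the sketch below uses hypothetical checkpoint paths and a train_step glue function standing in for one interval of the Section 3.3 schedule.

```python
import torch

# "One-shot Train": reuse the generator trained on the first hologram as-is.
G.load_state_dict(torch.load("g_trained_on_first_sample.pth"))  # hypothetical path
O_hat_new = G(H_back_new)          # direct inference on a new hologram

# "Transfer Learning": start from the same weights, then fine-tune briefly.
for it in range(500):              # 500 iterations instead of 3000-5000
    train_step(G, D, H_new)        # hypothetical wrapper around the GAN updates
```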
To investigate the transferability of our method, we develop an experiment with the following three testing scenarios, using simulated holograms of 4 randomly selected neuro samples taken from the CCBD dataset [46].
I. "One-shot Train" model: The hologram of sample S1 is used to train the DH-GAN model
and reconstruct S1 amplitude and phase as usual with 3,000 iterations. This generator part
of the model is used to reconstruct the amplitude and phase of the holograms of samples
S2-S4 (one model for all).
II. "Retrain:500" model: the network is initialized with random weights, then the reconstruction
is performed independently for each sample using 500 iterations (four different models,
one for each sample).
III. "One-shot Train+500" model: we use the model trained for sample S1 to initialize the
network for other samples S2-S4, then fine-tune the model to perform reconstruction with
extra 500 iterations for each sample separately.
The results of this experiment are shown in Fig. 15 and Table 6. The results in Fig. 15(a,b) show an excellent reconstruction quality for DH-GAN with 3,000 iterations. However, for fast deployment, one may not afford repeating 3,000 training iterations for every new sample. In this case, one potential solution would be using the trained network for the other samples (e,i,m). The results are quite impressive (PSNR in the 19 dB to 20.2 dB range) and can be acceptable for many applications. Indeed, it outperforms independent networks trained for new samples using only 500 training iterations and random initialization (d,h,l), which achieve a PSNR in the 12.8 dB to

14.3 dB range. One intermediate solution would be transfer learning, namely using the network
trained for S1 as initialization for other networks and performing 500 training iterations for new
samples (f,j,n), which offers the best results (PSNR in the 25 dB to 30 dB range).

Fig. 15. The transferability of the DH-GAN. (a) Simulated hologram of sample S1; (b) reconstructed phase of sample S1 using the fully trained model. Left side: each row represents a sample (S2, S3, S4); the first column represents the captured hologram, and the next three columns represent the results of the three testing scenarios.

Table 6. The performance (in PSNR) of the three transfer learning scenarios presented in Fig. 15.

Scenario             S1       S2       S3       S4
Retrain: 500         -        13.028   14.240   12.866
One-shot Train       31.286   19.772   20.261   18.998
One-shot Train+500   -        29.476   25.162   25.576

To further investigate the transferability of the developed framework, we perform a test using three sample types, including: 1) MNIST handwritten digits, 2) CCBD, and 3) USAF target. We choose four samples of each type and train an independent network with fixed initialization for each sample using 3,000 iterations. The weights are collected once per 100 iterations, with each collection considered a data point.
Figure 16 visualizes the resulting network weights in the 2D domain using principal component analysis (PCA). The observation is quite interesting: the network weights corresponding to the same sample type are aligned in the same direction, and different sample types are somewhat separated into disjoint clusters. However, this is not universal, and in some cases the network trained for one sample type (like blue) can also be used for a different type (like red) if the parameters are close enough in some compact space. Therefore, we should take caution when using transfer learning.
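For reference, the visualization of Fig. 16 can be reproduced along these lines, where snapshots is the list of weight vectors collected once per 100 iterations across all trained networks (our reconstruction of the procedure):

```python
import numpy as np
from sklearn.decomposition import PCA

def flatten_weights(model):
    """Concatenate all network parameters into one vector (one data point)."""
    return np.concatenate([p.detach().cpu().numpy().ravel()
                           for p in model.parameters()])

X = np.stack(snapshots)                          # (num_snapshots, num_weights)
coords = PCA(n_components=2).fit_transform(X)    # 2D embedding as in Fig. 16
```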
Transferability is more challenging for real samples, because of varying recording conditions.
Here, we evaluate different methods on the real samples S3 (Stomata-Vicia Faba Leaf) and S2 (Onion Epidermis) shown in Fig. 10. All models are fully trained using S3, then fine-tuned for
S2 with 500 iterations. The reconstructed phase is shown in Fig. 17. Compared with retraining
with 500 iterations starting from random initialization (shown in the first row of Fig. 17), all

Fig. 16. The 2D visualization of the network weights using PCA. Each data point represents the vector of network weights after 100 iterations. Different colors represent different sample types, including MNIST handwritten digits (green), CCBD (blue), and USAF target (red). The weights trained for similar patterns cluster radially with the same orientation.

untrained networks demonstrate reasonable transferability, although the two samples S2 and S3 are morphologically different (substantially different textures). Our proposed method exhibits a significant improvement over the DCOD method in recovering details. Compared with DeepDIH, our proposed method shows higher-contrast depth information. We also observe that our framework can boost the performance of DCOD, which can be attributed to the robustness to noisy reading conditions introduced by our design.

Fig. 17. Phase reconstruction by transfer learning for different methods. All methods are fully trained on sample S3 (Stomata-Vicia Faba Leaf), then fine-tuned on sample S2 (Onion Epidermis) in Fig. 10.

5. Conclusion
In this paper, we implemented a GAN-based framework to recover the 3D surface of micro-scaled
objects from holographic readings. Our method offers several novel features that yield phase
retrieval quality far beyond the current practice.
First, we utilized an AE-based generator network as a function approximator (to map real-valued
holograms into complex-valued object waves) in contrast to regular supervised GAN networks,
where the generator acts as a density estimator of data samples. Secondly, we implemented a
progressive masking method powered by simulated annealing that extracts image foregrounds (e.g., fractal patterns in dendrite samples). This feature facilitates imposing smoothness through the TV loss on background areas, which further improves the reconstruction and noise removal quality.

The proposed method outperforms both conventional and DL-based methods designed for phase recovery from one-shot imaging under similar conditions. Our method achieves a 10 dB gain in PSNR over the CS-based method [11] and about a 5 dB gain over the most recent untrained deep learning methods such as DeepDIH [5] and DCOD [28]. An additional 3 dB gain is observed when activating the adaptive masking module. Moreover, our model is sufficiently robust against noise and tolerates AWGN up to σ = 10, showing only about 0.4 dB decay per unit noise variance increase, lower than similar methods. Our method elevates DL-based digital holography to a higher level with a modest computation increment. Furthermore, we explored transfer learning to enable fast utilization of the proposed method in time-constrained applications. Our experiments show that using a model trained for a similar sample can offer a reasonable reconstruction quality. Using transfer learning, by borrowing network weights trained for a similar sample and performing an additional 500 iterations for the new sample, brings a considerable gain of about 12 dB compared to independent training with 500 iterations. This observation suggests that the developed model is highly transferable between samples of the same type, but transferability across different sample types needs further investigation.
Funding. U.S. Department of Agriculture (2020-67017-33078).
Acknowledgments. The authors would like to thank Dr. Bruce Gao for his comments on developing the test setup
and experiment scenarios.
Disclosures. The authors have no known competing financial interests or personal relationships that could have
appeared to influence the work reported in this paper.
Data availability. Data underlying the results presented in this paper are not publicly available at this time but may
be obtained from the authors upon reasonable request.

References
1. J. K. Wallace, S. Rider, E. Serabyn, J. Kühn, K. Liewer, J. Deming, G. Showalter, C. Lindensmith, and J. Nadeau,
“Robust, compact implementation of an off-axis digital holographic microscope,” Opt. Express 23(13), 17367–17378
(2015).
2. N. Patel, S. Rawat, M. Joglekar, V. Chhaniwal, S. K. Dubey, T. O’Connor, B. Javidi, and A. Anand, “Compact
and low-cost instrument for digital holographic microscopy of immobilized micro-particles,” Opt. Lasers Eng. 137,
106397 (2021).
3. W. Xu, M. Jericho, I. Meinertzhagen, and H. Kreuzer, “Digital in-line holography for biological applications,” Proc.
Natl. Acad. Sci. 98(20), 11301–11305 (2001).
4. A. Alfalou and C. Brosseau, “Optical image compression and encryption methods,” Adv. Opt. Photonics 1(3),
589–636 (2009).
5. H. Li, X. Chen, Z. Chi, C. Mann, and A. Razi, “Deep DIH: single-shot digital in-line holography reconstruction by
deep learning,” IEEE Access 8, 202648–202659 (2020).
6. M. K. Kim, “Principles and techniques of digital holographic microscopy,” SPIE Rev. 1(1), 018005 (2010).
7. C. J. Mann, L. Yu, C.-M. Lo, and M. K. Kim, “High-resolution quantitative phase-contrast microscopy by digital
holography,” Opt. Express 13(22), 8693–8698 (2005).
8. G. Koren, F. Polack, and D. Joyeux, “Iterative algorithms for twin-image elimination in in-line holography using
finite-support constraints,” J. Opt. Soc. Am. A 10(3), 423–433 (1993).
9. N. Bari, G. Mani, and S. Berkovich, “Internet of things as a methodological concept,” in Fourth International
Conference on Computing for Geospatial Research and Application (IEEE, 2013), pp. 48–55.
10. Z. Chi, A. Valehi, H. Peng, M. Kozicki, and A. Razi, “Consistency penalized graph matching for image-based
identification of dendritic patterns,” IEEE Access 8, 118623–118637 (2020).
11. W. Zhang, L. Cao, D. J. Brady, H. Zhang, J. Cang, H. Zhang, and G. Jin, “Twin-image-free holography: a compressive
sensing approach,” Phys. Rev. Lett. 121(9), 093902 (2018).
12. C. Bai, T. Peng, J. Min, R. Li, Y. Zhou, and B. Yao, “Dual-wavelength in-line digital holography with untrained deep
neural networks,” Photonics Res. 9(12), 2501–2510 (2021).
13. G. Situ, “Deep holography,” Light: Adv. Manuf. 3(2), 1 (2022).
14. T. Shimobaba, D. Blinder, T. Birnbaum, I. Hoshi, H. Shiomi, P. Schelkens, and T. Ito, “Deep-learning computational
holography: A review,” Front. Photonics 3, 8 (2022).
15. T. Zeng, Y. Zhu, and E. Y. Lam, “Deep learning for digital holography: a review,” Opt. Express 29(24), 40572–40593
(2021).
16. H. Wang, M. Lyu, and G. Situ, “eHoloNet: a learning-based end-to-end approach for in-line digital holographic
reconstruction,” Opt. Express 26(18), 22603–22614 (2018).
17. R. Horisaki, R. Takagi, and J. Tanida, “Deep-learning-generated holography,” Appl. Opt. 57(14), 3859–3863 (2018).
18. Y. Rivenson, Y. Zhang, H. Günaydın, D. Teng, and A. Ozcan, “Phase recovery and holographic image reconstruction
using deep learning in neural networks,” Light: Sci. Appl. 7(2), 17141 (2017).
19. Y. Zhang, H. Wang, and M. Shan, “Deep-learning-enhanced digital holographic autofocus imaging,” in Proceedings
of the 2020 4th International Conference on Digital Signal Processing (2020), pp. 56–60.
20. K. Wang, J. Dou, Q. Kemao, J. Di, and J. Zhao, “Y-net: a one-to-two deep learning framework for digital holographic
reconstruction,” Opt. Lett. 44(19), 4765–4768 (2019).
21. Z. Ren, Z. Xu, and E. Y. Lam, “End-to-end deep learning framework for digital holographic reconstruction,” Adv.
Photonics 1(01), 1–016004 (2019).
22. H. Chen, L. Huang, T. Liu, and A. Ozcan, “Fourier imager network (FIN): A deep neural network for hologram
reconstruction with superior external generalization,” Light: Sci. Appl. 11(1), 254 (2022).
23. Y. Wu, Y. Luo, G. Chaudhari, Y. Rivenson, A. Calis, K. de Haan, and A. Ozcan, “Bright-field holography: cross-
modality deep learning enables snapshot 3d imaging with bright-field contrast using a single hologram,” Light: Sci.
Appl. 8(1), 25–27 (2019).
24. D. Yin, Z. Gu, Y. Zhang, F. Gu, S. Nie, J. Ma, and C. Yuan, “Digital holographic reconstruction based on deep
learning framework with unpaired data,” IEEE Photonics J. 12(2), 1–12 (2020).
25. Y. Zhang, M. A. Noack, P. Vagovic, K. Fezzaa, F. Garcia-Moreno, T. Ritschel, and P. Villanueva-Perez, “PhaseGAN:
A deep-learning phase-retrieval approach for unpaired datasets,” Opt. Express 29(13), 19593–19604 (2021).
26. F. A. Jenkins and H. E. White, “Fundamentals of optics,” Indian J. Phys. 25, 265–266 (1957).
27. R. Heckel and P. Hand, “Deep decoder: Concise image representations from untrained non-convolutional networks,”
arXiv, arXiv:1810.03982 (2018).
28. F. Niknam, H. Qazvini, and H. Latifi, “Holographic optical field recovery using a regularized untrained deep decoder
network,” Sci. Rep. 11(1), 10903–10913 (2021).
29. D. Ulyanov, A. Vedaldi, and V. Lempitsky, “Deep image prior,” in Proceedings of the IEEE conference on computer
vision and pattern recognition (2018), pp. 9446–9454.
30. O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in
International Conference on Medical image computing and computer-assisted intervention (Springer, 2015), pp.
234–241.
31. F. Wang, Y. Bian, H. Wang, M. Lyu, G. Pedrini, W. Osten, G. Barbastathis, and G. Situ, “Phase imaging with an
untrained neural network,” Light: Sci. Appl. 9(1), 77 (2020).
32. G. Palubinskas, “Image similarity/distance measures: what is really behind mse and ssim?” Int. J. Image Data Fusion
8(1), 32–53 (2017).
33. J. Johnson, A. Alahi, and L. Fei-Fei, “Perceptual losses for real-time style transfer and super-resolution,” in European
conference on computer vision (Springer, 2016), pp. 694–711.
34. R. W. Gerchberg, “A practical algorithm for the determination of phase from image and diffraction plane pictures,”
Optik 35(2), 237–246 (1972).
35. Z. Zalevsky, D. Mendlovic, and R. G. Dorsch, “Gerchberg–Saxton algorithm applied in the fractional Fourier or the
Fresnel domain,” Opt. Lett. 21(12), 842–844 (1996).
36. T. Latychevskaia, “Iterative phase retrieval for digital holography: tutorial,” J. Opt. Soc. Am. A 36(12), D31–D40
(2019).
37. T. Latychevskaia and H.-W. Fink, “Practical algorithms for simulation and reconstruction of digital in-line holograms,”
Appl. Opt. 54(9), 2424–2434 (2015).
38. M. A. Schofield and Y. Zhu, “Fast phase unwrapping algorithm for interferometric applications,” Opt. Lett. 28(14),
1194–1196 (2003).
39. S. Cai, Z. Mao, Z. Wang, M. Yin, and G. E. Karniadakis, “Physics-informed neural networks (PINNs) for fluid
mechanics: A review,” Acta Mech. Sinica, 1–12 (2022).
40. S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate
shift,” in International conference on machine learning (PMLR, 2015), pp. 448–456.
41. M. N. Kozicki, “Dendritic structures and tags,” (2021). US Patent 11,170,190.
42. M. N. Kozicki, “Dendritic tags,” (2022). US Patent App. 17/311,154.
43. A. Razi and Z. Chi, “Methods and systems for generating unclonable optical tags,” (2022). US Patent App. 17/505,547.
44. A. Valehi, A. Razi, B. Cambou, W. Yu, and M. Kozicki, “A graph matching algorithm for user authentication in data
networks using image-based physical unclonable functions,” in Computing Conference (IEEE, 2017), pp. 863–870.
45. H. Wang, X. Chen, and A. Razi, “Fast key points detection and matching for tree-structured images,” arXiv,
arXiv:2211.03242 (2022).
46. “CIL project: P1170.”
