Add photo2cartoon model (PaddlePaddle#117)

* Add photo2cartoon model
* Resolve conflicts
* Remove comments
* Add photo2cartoon tutorial
* update p2c tutorials
SlamDunkDcc · Dec 29, 2020 · 792b38a
1 parent 5519d09
commit 792b38a

Showing 21 changed files with 1,900 additions and 0 deletions.
diff --git a/README.md b/README.md
@@ -86,6 +86,14 @@ GAN-Generative Adversarial Network, was praised by "the Father of Convolutional
   <img src='./docs/imgs/ugatit.png'width='700' height='250'/>
 </div>
 
+
+### Realistic face cartoonization
+
+<div align='center'>
+  <img src='./docs/imgs/photo2cartoon.png' width='700' height='250'/>
+</div>
+
+
 ### Photo animation
 
 <div align='center'>

diff --git a/README_cn.md b/README_cn.md
@@ -98,6 +98,14 @@ GAN (Generative Adversarial Network), praised by "the father of convolutional networks" **Yann LeCun (杨立昆)
   <img src='./docs/imgs/ugatit.png'width='700' height='250'/>
 </div>
 
+
+### Realistic face cartoonization
+
+<div align='center'>
+  <img src='./docs/imgs/photo2cartoon.png' width='700' height='250'/>
+</div>
+
+
 ### Photo animation
 
 <div align='center'>

diff --git a/configs/ugatit_photo2cartoon.yaml b/configs/ugatit_photo2cartoon.yaml
@@ -0,0 +1,85 @@
+epochs: 300
+output_dir: output_dir
+adv_weight: 1.0
+cycle_weight: 50.0
+identity_weight: 10.0
+cam_weight: 1000.0
+
+model:
+  name: UGATITModel
+  generator:
+    name: ResnetUGATITP2CGenerator
+    input_nc: 3
+    output_nc: 3
+    ngf: 32
+    n_blocks: 4
+    img_size: 256
+    light: True
+  discriminator_g:
+    name: UGATITDiscriminator
+    input_nc: 3
+    ndf: 32
+    n_layers: 7
+  discriminator_l:
+    name: UGATITDiscriminator
+    input_nc: 3
+    ndf: 32
+    n_layers: 5
+
+dataset:
+  train:
+    name: UnpairedDataset
+    dataroot: data/photo2cartoon
+    num_workers: 0
+    phase: train
+    max_dataset_size: inf
+    direction: AtoB
+    input_nc: 3
+    output_nc: 3
+    serial_batches: False
+    transforms:
+      - name: Resize
+        size: [286, 286]
+        interpolation: 'bilinear' #'bicubic' #cv2.INTER_CUBIC
+      - name: RandomCrop
+        size: [256, 256]
+      - name: RandomHorizontalFlip
+        prob: 0.5
+      - name: Transpose
+      - name: Normalize
+        mean: [127.5, 127.5, 127.5]
+        std: [127.5, 127.5, 127.5]
+  test:
+    name: SingleDataset
+    dataroot: data/photo2cartoon/testA
+    max_dataset_size: inf
+    direction: AtoB
+    input_nc: 3
+    output_nc: 3
+    serial_batches: False
+    transforms:
+      - name: Resize
+        size: [256, 256]
+        interpolation: 'bilinear' #cv2.INTER_CUBIC
+      - name: Transpose
+      - name: Normalize
+        mean: [127.5, 127.5, 127.5]
+        std: [127.5, 127.5, 127.5]
+
+optimizer:
+  name: Adam
+  beta1: 0.5
+  weight_decay: 0.0001
+
+lr_scheduler:
+  name: linear
+  learning_rate: 0.0001
+  start_epoch: 150
+  decay_epochs: 150
+
+log_config:
+  interval: 10
+  visiual_interval: 500
+
+snapshot_config:
+  interval: 30
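Note on the `lr_scheduler` block in this config: the sketch below shows how a linear schedule with `start_epoch: 150` and `decay_epochs: 150` could behave over the 300 configured epochs. It assumes a CycleGAN-style rule (constant rate, then linear decay toward zero). The function name `linear_decay_lr` is hypothetical, and the exact rule used by ppgan's `linear` scheduler may differ.

```python
def linear_decay_lr(epoch, base_lr=0.0001, start_epoch=150, decay_epochs=150):
    """Learning rate for a 0-based epoch under an assumed CycleGAN-style linear rule."""
    # Number of epochs already spent in the decay phase (0 while the rate is constant).
    decayed = max(0, epoch + 1 - start_epoch)
    # The scale factor falls linearly from 1 toward 0 without reaching exactly 0.
    factor = 1.0 - decayed / float(decay_epochs + 1)
    return base_lr * factor


if __name__ == "__main__":
    for epoch in (0, 149, 150, 225, 299):
        print(epoch, linear_decay_lr(epoch))
```

Under this assumption the rate stays at `learning_rate` for the first 150 epochs and shrinks every epoch afterwards, so the two halves of training (`epochs: 300`) split evenly between the constant and decaying phases.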
diff --git a/docs/en_US/tutorials/photo2cartoon.md b/docs/en_US/tutorials/photo2cartoon.md
@@ -0,0 +1,81 @@
+# Photo2cartoon
+
+## 1 Principle
+
+  The aim of portrait cartoon stylization is to transform real photos into cartoon images while preserving the portrait's ID information and texture details. We use a Generative Adversarial Network to learn the mapping from photo to cartoon. Because paired data is difficult to obtain and the input and output shapes do not correspond, we adopt unpaired image translation.
+
+  Recently, Kim et al. proposed a novel normalization function (AdaLIN) and an attention module in the paper "U-GAT-IT" and achieved exquisite selfie2anime results. Unlike the exaggerated anime style, our cartoon style is more realistic and contains unequivocal ID information.
+
+  We propose a Soft Adaptive Layer-Instance Normalization (Soft-AdaLIN) method, which fuses the statistics of encoding features and decoding features in de-standardization.
+
+  Based on U-GAT-IT, two hourglass modules are introduced before the encoder and two after the decoder to progressively improve performance.
+
+  In the original [project](https://github.com/minivision-ai/photo2cartoon), we add a Face ID Loss (the cosine distance between the ID features of the input image and the cartoon image) to achieve identity invariance. (Face ID Loss is not included in this repo; please refer to photo2cartoon.)
+
+  ![](../../imgs/photo2cartoon_pipeline.png)
+
+  We also pre-process the data into a fixed pattern to reduce the difficulty of optimization. For details, see below.
+
+  ![](../../imgs/photo2cartoon_data_process.jpg)
+
+## 2 How to use 
+
+### 2.1 Test
+
+  ```
+  from ppgan.apps import Photo2CartoonPredictor
+
+  p2c = Photo2CartoonPredictor()
+  p2c.run('test_img.jpg')
+  ```
+
+### 2.2 Train
+
+  Prepare Datasets:
+
+  Training data contains portrait photos (domain A) and cartoon images (domain B), and can be downloaded from [Baidu Netdisk](https://pan.baidu.com/s/1RqB4MNMAY_yyXAIS3KBXqw) (password: fo8u).
+  The dataset is structured as follows:
+
+  ```
+  ├── data
+      └── photo2cartoon
+          ├── trainA
+          ├── trainB
+          ├── testA
+          └── testB
+  ```
+
+  Train:
+
+  ```
+     python -u tools/main.py --config-file configs/ugatit_photo2cartoon.yaml
+  ```
+
+
+## 3 Results
+
+![](../../imgs/photo2cartoon.png)
+
+## 4 Download
+
+| Model | Link |
+|---|---|
+| photo2cartoon_genA2B | [photo2cartoon_genA2B](https://paddlegan.bj.bcebos.com/models/photo2cartoon_genA2B_weight.pdparams) |
+
+
+# References
+
+- [U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation](https://arxiv.org/abs/1907.10830)
+
+  ```
+  @inproceedings{Kim2020U-GAT-IT:,
+    title={U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation},
+    author={Junho Kim and Minjae Kim and Hyeonwoo Kang and Kwang Hee Lee},
+    booktitle={International Conference on Learning Representations},
+    year={2020}
+  }
+  ```
+
+
+# Authors
+[minivision-ai](https://github.com/minivision-ai), [haoqiang](https://github.com/hao-qiang)
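Note on Soft-AdaLIN as described in section 1 of this tutorial: the sketch below illustrates the idea for a single sample in plain Python. It standardizes features with an AdaLIN-style blend of instance and layer statistics, then de-standardizes with a scale and shift that mix encoder-derived and decoder-derived values. The blend weight `w`, the `(gamma, beta)` pairs, and all function names are assumptions made for illustration; the generator added in this commit may compute them differently.

```python
import math


def _mean_var(values):
    """Return the mean and biased variance of a flat list of floats."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var


def adalin_standardize(x, rho, eps=1e-5):
    """Blend instance-norm and layer-norm standardization for one sample.

    x   : list of channels, each a flat list of spatial values
    rho : per-channel weights in [0, 1]; 1 means pure instance norm
    """
    all_values = [v for channel in x for v in channel]
    ln_mean, ln_var = _mean_var(all_values)  # statistics over the whole sample
    out = []
    for channel, r in zip(x, rho):
        in_mean, in_var = _mean_var(channel)  # statistics of this channel only
        out.append([
            r * (v - in_mean) / math.sqrt(in_var + eps)
            + (1 - r) * (v - ln_mean) / math.sqrt(ln_var + eps)
            for v in channel
        ])
    return out


def soft_adalin(x, rho, enc_stats, dec_stats, w):
    """De-standardize with a blend of encoder and decoder (gamma, beta) pairs.

    enc_stats, dec_stats : per-channel (gamma, beta) tuples (assumed given)
    w                    : hypothetical per-channel blend weight; 1 = encoder only
    """
    normed = adalin_standardize(x, rho)
    out = []
    for channel, (eg, eb), (dg, db), wc in zip(normed, enc_stats, dec_stats, w):
        gamma = wc * eg + (1 - wc) * dg
        beta = wc * eb + (1 - wc) * db
        out.append([gamma * v + beta for v in channel])
    return out


if __name__ == "__main__":
    features = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]]
    result = soft_adalin(features, rho=[1.0, 1.0],
                         enc_stats=[(2.0, 1.0)] * 2, dec_stats=[(0.0, 0.0)] * 2,
                         w=[0.5, 0.5])
    print(result)
```

With `rho` set to 1 the standardized channels have zero mean, so each output channel is centered on the blended `beta`; moving `w` toward 0 or 1 shifts the result toward the decoder or encoder values respectively.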
diff --git a/docs/imgs/photo2cartoon.png b/docs/imgs/photo2cartoon.png
diff --git a/docs/imgs/photo2cartoon_data_process.jpg b/docs/imgs/photo2cartoon_data_process.jpg
diff --git a/docs/imgs/photo2cartoon_pipeline.png b/docs/imgs/photo2cartoon_pipeline.png
diff --git a/docs/zh_CN/tutorials/photo2cartoon.md b/docs/zh_CN/tutorials/photo2cartoon.md
@@ -0,0 +1,77 @@
+# Photo2cartoon
+
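+## 1 Principle
+
+  The goal of portrait cartoon stylization is to convert a real photo into a non-photorealistic cartoon image while preserving the ID information and texture details of the original. In general, pix2pix methods trained on paired data achieve good image translation results, but in this task the input and output contours do not correspond one-to-one (for example, cartoon-style eyes are larger and chins are slimmer), and paired data is difficult and costly to draw, so we adopt an unpaired image translation method.
+
+  The recent paper U-GAT-IT proposes a normalization method, AdaLIN, which automatically adjusts the relative weight of Instance Norm and Layer Norm and, combined with an attention mechanism, achieves exquisite portrait-to-anime style transfer. To obtain a realistic portrait cartoon style, we made customized modifications to U-GAT-IT.
+
+  We propose the Soft-AdaLIN (Soft Adaptive Layer-Instance Normalization) normalization method, which fuses the encoder's mean and variance (photo features) with the decoder's mean and variance (cartoon features) during de-standardization.
+
+  In terms of model structure, on top of U-GAT-IT we add two hourglass modules before the encoder and two after the decoder, progressively improving the model's feature abstraction and reconstruction ability.
+
+  In the [original project](https://github.com/minivision-ai/photo2cartoon) we also add a Face ID Loss: a pretrained face recognition model extracts ID features from the photo and the cartoon, and the cosine distance between them constrains the generated cartoon so that it looks more like the person. (Face ID Loss is not yet included in the Paddle version; please refer to the original project.)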
+
+  ![](../../imgs/photo2cartoon_pipeline.png)
+
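+  Because experimental data is scarce, we process the data into a fixed pattern to reduce training difficulty. We first detect the face and its landmarks, rotate the image to align it according to the landmarks, crop it to a uniform standard, and then feed the cropped head into a portrait segmentation model (trained with the PaddleSeg framework) to remove the background.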
+
+  ![](../../imgs/photo2cartoon_data_process.jpg)
+
+## 2 How to use
+
+### 2.1 Test
+  ```
+  from ppgan.apps import Photo2CartoonPredictor
+
+  p2c = Photo2CartoonPredictor()
+  p2c.run('test_img.jpg')
+  ```
+
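+### 2.2 Train
+
+  Prepare datasets:
+
+  The model is trained on unpaired data, which can be downloaded from [Baidu Netdisk](https://pan.baidu.com/s/1RqB4MNMAY_yyXAIS3KBXqw) (extraction code: fo8u).
+  The dataset is organized as follows: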
+  ```
+  ├── data
+      └── photo2cartoon
+          ├── trainA
+          ├── trainB
+          ├── testA
+          └── testB
+  ```
+
+  Train the model:
+  ```
+     python -u tools/main.py --config-file configs/ugatit_photo2cartoon.yaml
+  ```
+
+
+## 3 Results
+
+![](../../imgs/photo2cartoon.png)
+
+## 4 Download
+| Model | Download link |
+|---|---|
+| photo2cartoon_genA2B | [download link](https://paddlegan.bj.bcebos.com/models/photo2cartoon_genA2B_weight.pdparams) |
+
+
+# References
+
+- [U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation](https://arxiv.org/abs/1907.10830)
+
+  ```
+  @inproceedings{Kim2020U-GAT-IT:,
+    title={U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation},
+    author={Junho Kim and Minjae Kim and Hyeonwoo Kang and Kwang Hee Lee},
+    booktitle={International Conference on Learning Representations},
+    year={2020}
+  }
+  ```
+
+
+# Authors
+[minivision-ai](https://github.com/minivision-ai), [haoqiang](https://github.com/hao-qiang)
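Note on the Face ID Loss mentioned in section 1 of this tutorial: the loss is described as the cosine distance between the ID features of the photo and the cartoon. The sketch below computes that distance for two plain feature vectors, taking cosine distance as one minus cosine similarity. The function name and the example vectors are hypothetical; in the original project the features would come from a pretrained face recognition model, which is not part of this commit.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Placeholder vectors standing in for ID features of a photo and its cartoon.
    photo_id = [0.2, 0.9, 0.1]
    cartoon_id = [0.25, 0.85, 0.05]
    print(cosine_distance(photo_id, cartoon_id))
```

The distance is 0 when the two feature vectors point in the same direction and grows as they diverge, so minimizing it pushes the cartoon's ID features toward those of the input photo.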
diff --git a/ppgan/apps/__init__.py b/ppgan/apps/__init__.py
@@ -21,5 +21,6 @@
 from .face_parse_predictor import FaceParsePredictor
 from .animegan_predictor import AnimeGANPredictor
 from .midas_predictor import MiDaSPredictor
+from .photo2cartoon_predictor import Photo2CartoonPredictor
 from .styleganv2_predictor import StyleGANv2Predictor
 from .pixel2style2pixel_predictor import Pixel2Style2PixelPredictor
diff --git a/ppgan/apps/photo2cartoon_predictor.py b/ppgan/apps/photo2cartoon_predictor.py
@@ -0,0 +1,77 @@
+# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import os
+
+import cv2
+from PIL import Image
+import numpy as np
+
+import paddle
+from paddle.utils.download import get_path_from_url
+from ppgan.faceutils.dlibutils import align_crop
+from ppgan.faceutils.face_segmentation import FaceSeg
+from ppgan.models.generators import ResnetUGATITP2CGenerator
+
+from .base_predictor import BasePredictor
+
+
+P2C_WEIGHT_URL = "https://paddlegan.bj.bcebos.com/models/photo2cartoon_genA2B_weight.pdparams"
+
+
+class Photo2CartoonPredictor(BasePredictor):
+    def __init__(self, output_path='output', weight_path=None):
+        self.output_path = output_path
+        if not os.path.exists(self.output_path):
+            os.makedirs(self.output_path)
+
+        if weight_path is None:
+            cur_path = os.path.abspath(os.path.dirname(__file__))
+            weight_path = get_path_from_url(P2C_WEIGHT_URL, cur_path)
+
+        self.genA2B = ResnetUGATITP2CGenerator()
+        params = paddle.load(weight_path)
+        self.genA2B.set_state_dict(params)
+        self.genA2B.eval()
+
+        self.faceseg = FaceSeg()
+
+    def run(self, image_path):
+        image = Image.open(image_path)
+        face_image = align_crop(image)
+        face_mask = self.faceseg(face_image)
+
+        face_image = cv2.resize(face_image, (256, 256), interpolation=cv2.INTER_AREA)
+        face_mask = cv2.resize(face_mask, (256, 256))[:, :, np.newaxis] / 255.
+        face = (face_image * face_mask + (1 - face_mask) * 255) / 127.5 - 1
+
+        face = np.transpose(face[np.newaxis, :, :, :], (0, 3, 1, 2)).astype(np.float32)
+        face = paddle.to_tensor(face)
+
+        # inference
+        with paddle.no_grad():
+            cartoon = self.genA2B(face)[0][0]
+
+        # post-process
+        cartoon = np.transpose(cartoon.numpy(), (1, 2, 0))
+        cartoon = (cartoon + 1) * 127.5
+        cartoon = (cartoon * face_mask + (1 - face_mask) * 255).astype(np.uint8)
+
+        photo_save_path = os.path.join(self.output_path, 'p2c_photo.png')
+        cv2.imwrite(photo_save_path, cv2.cvtColor(face_image, cv2.COLOR_RGB2BGR))
+        cartoon_save_path = os.path.join(self.output_path, 'p2c_cartoon.png')
+        cv2.imwrite(cartoon_save_path, cv2.cvtColor(cartoon, cv2.COLOR_RGB2BGR))
+
+        print("Cartoon image has been saved at '{}'.".format(cartoon_save_path))
+        return cartoon
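Note on the arithmetic in `run`: the pre-processing composites the face onto a white background using the segmentation mask and maps pixel values from [0, 255] to [-1, 1], and the post-processing inverts that mapping. The sketch below reproduces the same arithmetic on single scalar values so the ranges are easy to check. The helper names are hypothetical and do not appear in the predictor.

```python
def composite_on_white(pixel, mask):
    """Keep the pixel where mask is 1 and fade to white (255) where mask is 0."""
    return pixel * mask + (1 - mask) * 255


def normalize(value):
    """Map a pixel value in [0, 255] to the [-1, 1] range fed to the generator."""
    return value / 127.5 - 1


def denormalize(value):
    """Map a generator output in [-1, 1] back to [0, 255]."""
    return (value + 1) * 127.5


if __name__ == "__main__":
    for pixel, mask in ((100, 1.0), (100, 0.5), (100, 0.0)):
        blended = composite_on_white(pixel, mask)
        print(pixel, mask, blended, normalize(blended))
```

Because background pixels become 255 before normalization, they enter the generator as 1.0, which matches the white background that `run` composites again after inference.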
diff --git a/ppgan/faceutils/__init__.py b/ppgan/faceutils/__init__.py
@@ -15,3 +15,4 @@
 from . import dlibutils as dlib
 from . import mask
 from . import image
+from . import face_segmentation
diff --git a/ppgan/faceutils/dlibutils/__init__.py b/ppgan/faceutils/dlibutils/__init__.py
@@ -13,3 +13,4 @@
 # limitations under the License.
 
 from .dlib_utils import detect, crop, landmarks, crop_from_array
+from .face_align import align_crop