Add photo2cartoon model (PaddlePaddle#117)
* Add photo2cartoon model
* Resolve conflicts
* Remove comments
* Add photo2cartoon tutorial
* update p2c tutorials
hao-qiang authored Dec 29, 2020
1 parent 5519d09 commit 792b38a
Showing 21 changed files with 1,900 additions and 0 deletions.
8 changes: 8 additions & 0 deletions README.md
@@ -86,6 +86,14 @@ GAN-Generative Adversarial Network, was praised by "the Father of Convolutional
<img src='./docs/imgs/ugatit.png'width='700' height='250'/>
</div>


### Realistic face cartoonization

<div align='center'>
<img src='./docs/imgs/photo2cartoon.png'width='700' height='250'/>
</div>


### Photo animation

<div align='center'>
8 changes: 8 additions & 0 deletions README_cn.md
@@ -98,6 +98,14 @@ GAN (Generative Adversarial Network), praised by "the Father of Convolutional Networks" **Yann LeCun(杨立昆)
<img src='./docs/imgs/ugatit.png'width='700' height='250'/>
</div>


### Realistic face cartoonization

<div align='center'>
<img src='./docs/imgs/photo2cartoon.png'width='700' height='250'/>
</div>


### Photo animation

<div align='center'>
85 changes: 85 additions & 0 deletions configs/ugatit_photo2cartoon.yaml
@@ -0,0 +1,85 @@
epochs: 300
output_dir: output_dir
adv_weight: 1.0
cycle_weight: 50.0
identity_weight: 10.0
cam_weight: 1000.0

model:
  name: UGATITModel
  generator:
    name: ResnetUGATITP2CGenerator
    input_nc: 3
    output_nc: 3
    ngf: 32
    n_blocks: 4
    img_size: 256
    light: True
  discriminator_g:
    name: UGATITDiscriminator
    input_nc: 3
    ndf: 32
    n_layers: 7
  discriminator_l:
    name: UGATITDiscriminator
    input_nc: 3
    ndf: 32
    n_layers: 5

dataset:
  train:
    name: UnpairedDataset
    dataroot: data/photo2cartoon
    num_workers: 0
    phase: train
    max_dataset_size: inf
    direction: AtoB
    input_nc: 3
    output_nc: 3
    serial_batches: False
    transforms:
      - name: Resize
        size: [286, 286]
        interpolation: 'bilinear' #'bicubic' #cv2.INTER_CUBIC
      - name: RandomCrop
        size: [256, 256]
      - name: RandomHorizontalFlip
        prob: 0.5
      - name: Transpose
      - name: Normalize
        mean: [127.5, 127.5, 127.5]
        std: [127.5, 127.5, 127.5]
  test:
    name: SingleDataset
    dataroot: data/photo2cartoon/testA
    max_dataset_size: inf
    direction: AtoB
    input_nc: 3
    output_nc: 3
    serial_batches: False
    transforms:
      - name: Resize
        size: [256, 256]
        interpolation: 'bilinear' #cv2.INTER_CUBIC
      - name: Transpose
      - name: Normalize
        mean: [127.5, 127.5, 127.5]
        std: [127.5, 127.5, 127.5]

optimizer:
  name: Adam
  beta1: 0.5
  weight_decay: 0.0001

lr_scheduler:
  name: linear
  learning_rate: 0.0001
  start_epoch: 150
  decay_epochs: 150

log_config:
  interval: 10
  visiual_interval: 500

snapshot_config:
  interval: 30
81 changes: 81 additions & 0 deletions docs/en_US/tutorials/photo2cartoon.md
@@ -0,0 +1,81 @@
# Photo2cartoon

## 1 Principle

Portrait cartoon stylization aims to transform real photos into cartoon images while preserving the subject's ID information and texture details. We use a Generative Adversarial Network (GAN) to learn the mapping from photo to cartoon. Because paired data is difficult to obtain and the shapes of input and output do not correspond, we adopt an unpaired image translation approach.

Recently, Kim et al. proposed a novel normalization function (AdaLIN) and an attention module in the paper "U-GAT-IT" and achieved exquisite selfie2anime results. Unlike the exaggerated anime style, our cartoon style is more realistic and preserves unequivocal ID information.

We propose a Soft Adaptive Layer-Instance Normalization (Soft-AdaLIN) method, which fuses the statistics of the encoding features and the decoding features during denormalization.
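
As a rough illustration, AdaLIN as defined in the U-GAT-IT paper computes `gamma * (rho * IN(x) + (1 - rho) * LN(x)) + beta`, where `IN` and `LN` are instance and layer normalization. The sketch below implements that formula in plain Python and then shows one plausible reading of the "soft" fusion: a convex combination of scale/shift parameters derived from the encoder and the decoder. The helper `soft_blend`, the blend weight `w`, and all numeric values are assumptions for illustration, not the implementation in this repo.

```python
from statistics import fmean, pvariance


def adalin(x, gamma, beta, rho, eps=1e-5):
    """Apply AdaLIN to x, a list of C channels, each a flat list of H*W floats."""
    # Instance-norm statistics: one mean/variance per channel.
    in_stats = [(fmean(ch), pvariance(ch)) for ch in x]
    # Layer-norm statistics: a single mean/variance over all channels.
    flat = [v for ch in x for v in ch]
    ln_mean, ln_var = fmean(flat), pvariance(flat)
    ln_std = (ln_var + eps) ** 0.5

    out = []
    for c, ch in enumerate(x):
        in_mean, in_var = in_stats[c]
        in_std = (in_var + eps) ** 0.5
        out.append([
            gamma[c] * (rho[c] * (v - in_mean) / in_std
                        + (1.0 - rho[c]) * (v - ln_mean) / ln_std) + beta[c]
            for v in ch
        ])
    return out


def soft_blend(encoder_param, decoder_param, w):
    """Assumed fusion: per-channel convex combination with weight w in [0, 1]."""
    return [(1.0 - wi) * d + wi * e
            for e, d, wi in zip(encoder_param, decoder_param, w)]


# Two channels with four spatial positions each (made-up values).
features = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]]
gamma = soft_blend([1.0, 1.0], [2.0, 0.5], w=[0.5, 0.5])
beta = soft_blend([0.0, 0.0], [0.1, -0.1], w=[0.5, 0.5])
print(adalin(features, gamma, beta, rho=[0.9, 0.9]))
```

With `rho` close to 1 the output follows instance normalization, and with `rho` close to 0 it follows layer normalization; in the paper `rho` is a learned parameter constrained to `[0, 1]`.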

Based on U-GAT-IT, two hourglass modules are introduced before the encoder and after the decoder to progressively improve performance.

In the original [project](https://github.com/minivision-ai/photo2cartoon), a Face ID Loss (the cosine distance between the ID features of the input image and the cartoon image) is added to preserve identity. (The Face ID Loss is not included in this repo; please refer to photo2cartoon.)
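
To make the cosine-distance idea concrete, here is a minimal sketch that measures how far apart two ID feature vectors are. It is not part of this repo: the function name `cosine_distance`, the `eps` guard, and the feature values are hypothetical, and the real loss would operate on embeddings from a pretrained face recognition model.

```python
import math


def cosine_distance(a, b, eps=1e-12):
    """Return 1 - cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # eps guards against division by zero for all-zero vectors.
    return 1.0 - dot / max(norm_a * norm_b, eps)


photo_id = [0.2, 0.9, 0.1]      # made-up ID features of the input photo
cartoon_id = [0.25, 0.8, 0.3]   # made-up ID features of the cartoon
print(cosine_distance(photo_id, cartoon_id))
```

The distance is 0 when the two vectors point in the same direction and grows toward 2 as they point in opposite directions, so minimizing it pushes the cartoon's ID features toward those of the photo.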

![](../../imgs/photo2cartoon_pipeline.png)

We also pre-process the data into a fixed pattern to make optimization easier. The figure below shows the details.

![](../../imgs/photo2cartoon_data_process.jpg)

## 2 How to use

### 2.1 Test

```
from ppgan.apps import Photo2CartoonPredictor
p2c = Photo2CartoonPredictor()
p2c.run('test_img.jpg')
```
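
For orientation, `Photo2CartoonPredictor.run` in `ppgan/apps/photo2cartoon_predictor.py` composites the segmented face onto a white background and maps pixel values to `[-1, 1]` before inference, then reverses the mapping afterwards. The sketch below reproduces that arithmetic for a single pixel value; the function names `preprocess_pixel` and `postprocess_pixel` are hypothetical and exist only for this illustration.

```python
def preprocess_pixel(value, mask):
    """Map a pixel value in [0, 255] and a mask weight in [0, 1] to [-1, 1]."""
    # Background (mask == 0) becomes white (255) before normalization.
    composited = value * mask + (1 - mask) * 255
    return composited / 127.5 - 1


def postprocess_pixel(output, mask):
    """Map a generator output in [-1, 1] back to an integer in [0, 255]."""
    restored = (output + 1) * 127.5
    # The background is forced to white again after inference.
    return int(restored * mask + (1 - mask) * 255)


print(preprocess_pixel(0, 1.0), preprocess_pixel(255, 1.0), preprocess_pixel(40, 0.0))
print(postprocess_pixel(-1.0, 1.0), postprocess_pixel(1.0, 1.0), postprocess_pixel(0.3, 0.0))
```

Forcing the background to white on both sides of the generator means only the face region carries information the network has to translate.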

### 2.2 Train

Prepare Datasets:

The training data consists of portrait photos (domain A) and cartoon images (domain B), and can be downloaded from [Baidu Netdisk](https://pan.baidu.com/s/1RqB4MNMAY_yyXAIS3KBXqw) (password: fo8u).
The dataset is structured as follows:

```
├── data
    └── photo2cartoon
        ├── trainA
        ├── trainB
        ├── testA
        └── testB
```
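
Before starting a long training run, it can help to confirm that the folders listed above exist. The following is a small standard-library sketch; the helper name `missing_subsets` is hypothetical and not part of this repo.

```python
from pathlib import Path

# Sub-directories named in the dataset layout.
EXPECTED_SUBSETS = ("trainA", "trainB", "testA", "testB")


def missing_subsets(dataroot="data/photo2cartoon"):
    """Return the expected sub-directories that do not exist under dataroot."""
    root = Path(dataroot)
    return [name for name in EXPECTED_SUBSETS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_subsets()
    if missing:
        print("missing folders:", ", ".join(missing))
    else:
        print("dataset layout looks complete")
```

The default `dataroot` matches the `dataroot: data/photo2cartoon` entry in `configs/ugatit_photo2cartoon.yaml`.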

Train:

```
python -u tools/main.py --config-file configs/ugatit_photo2cartoon.yaml
```
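
The `lr_scheduler` block in `configs/ugatit_photo2cartoon.yaml` (`learning_rate: 0.0001`, `start_epoch: 150`, `decay_epochs: 150`) describes a schedule that holds the learning rate constant and then decays it linearly over the remaining epochs. The sketch below assumes the CycleGAN-style rule `1 - max(0, epoch + 1 - start_epoch) / (decay_epochs + 1)`; that formula and the function name `linear_decay_lr` are assumptions for illustration, not a copy of the scheduler used by this repo.

```python
def linear_decay_lr(epoch, base_lr=0.0001, start_epoch=150, decay_epochs=150):
    """Learning rate for a zero-based epoch under an assumed linear-decay rule."""
    # Number of epochs past the point where decay begins (0 before that).
    overshoot = max(0, epoch + 1 - start_epoch)
    factor = 1.0 - overshoot / float(decay_epochs + 1)
    return base_lr * factor


for epoch in (0, 149, 150, 224, 299):
    print(epoch, linear_decay_lr(epoch))
```

Under this assumption the rate stays at `0.0001` through epoch 149 and shrinks every epoch afterwards, staying slightly above zero at the last of the 300 epochs.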


## 3 Results

![](../../imgs/photo2cartoon.png)

## 4 Download

| model | link |
|---|---|
| photo2cartoon_genA2B | [photo2cartoon_genA2B](https://paddlegan.bj.bcebos.com/models/photo2cartoon_genA2B_weight.pdparams) |


# References

- [U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation](https://arxiv.org/abs/1907.10830)

```
@inproceedings{Kim2020U-GAT-IT:,
title={U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation},
author={Junho Kim and Minjae Kim and Hyeonwoo Kang and Kwang Hee Lee},
booktitle={International Conference on Learning Representations},
year={2020}
}
```


# Authors
[minivision-ai](https://github.com/minivision-ai), [haoqiang](https://github.com/hao-qiang)
Binary file added docs/imgs/photo2cartoon.png
Binary file added docs/imgs/photo2cartoon_data_process.jpg
Binary file added docs/imgs/photo2cartoon_pipeline.png
77 changes: 77 additions & 0 deletions docs/zh_CN/tutorials/photo2cartoon.md
@@ -0,0 +1,77 @@
# Photo2cartoon

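## 1 Principle

The goal of portrait cartoon stylization is to convert a real photo into a non-photorealistic cartoon-style image while preserving the ID information and texture details of the original. In general, pix2pix methods trained on paired data achieve good image translation results, but in this task the input and output contours do not correspond one to one: for example, cartoon-style eyes are larger and chins are slimmer. Paired data is also difficult and costly to draw, so we adopt an unpaired image translation method.

The recent paper U-GAT-IT proposes a normalization method, AdaLIN, which automatically adjusts the ratio between Instance Norm and Layer Norm and, combined with an attention mechanism, achieves exquisite portrait-to-anime translation. To obtain a realistic portrait cartoon style, we made customized modifications to U-GAT-IT.

We propose a normalization method called Soft-AdaLIN (Soft Adaptive Layer-Instance Normalization), which fuses the mean and variance of the encoder (photo features) with the mean and variance of the decoder (cartoon features) during denormalization.

In terms of model structure, building on U-GAT-IT, we add two hourglass modules before the encoder and two after the decoder, progressively improving the model's feature abstraction and reconstruction capabilities.

In the [original project](https://github.com/minivision-ai/photo2cartoon) we also add a Face ID Loss: a pretrained face recognition model extracts ID features from the photo and the cartoon, and the cosine distance between them constrains the generated cartoon so that it looks more like the subject. (The Face ID Loss is not yet included in the Paddle version; please refer to the original project.)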

![](../../imgs/photo2cartoon_pipeline.png)

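Because experimental data is scarce, we process the data into a fixed pattern to make training easier. We first detect the face and its landmarks in the image, rotate the image to correct it according to the landmarks, and crop it to a uniform standard. The cropped head image is then fed to a portrait segmentation model (trained with the PaddleSeg framework) to remove the background.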

![](../../imgs/photo2cartoon_data_process.jpg)
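
The rotation-correction step of the preprocessing can be illustrated with a short sketch. It assumes the in-plane tilt is estimated from the two eye centers; the function name `roll_angle_degrees`, the eye-based estimate, and the coordinates are assumptions for illustration and do not describe how `align_crop` in this repo works internally.

```python
import math


def roll_angle_degrees(left_eye, right_eye):
    """Return the tilt, in degrees, of the line joining two eye centers.

    Both arguments are hypothetical (x, y) pixel coordinates with y
    pointing down, as is usual for image coordinates.
    """
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    # atan2 handles every quadrant, including a vertical eye line (dx == 0).
    return math.degrees(math.atan2(dy, dx))


# A face tilted so that the right eye sits lower in the image than the left.
tilt = roll_angle_degrees((100, 120), (160, 150))
print(f"measured tilt: {tilt:.1f} degrees")
```

Rotating the image about the midpoint between the eyes so that this tilt becomes zero levels the face before the uniform crop is applied.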

## 2 How to use

### 2.1 Test
```
from ppgan.apps import Photo2CartoonPredictor
p2c = Photo2CartoonPredictor()
p2c.run('test_img.jpg')
```

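### 2.2 Train

Prepare datasets:

The model is trained on unpaired data, which can be downloaded from [Baidu Netdisk](https://pan.baidu.com/s/1RqB4MNMAY_yyXAIS3KBXqw) (extraction code: fo8u).
The dataset is organized as follows: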
```
├── data
    └── photo2cartoon
        ├── trainA
        ├── trainB
        ├── testA
        └── testB
```

Train the model:
```
python -u tools/main.py --config-file configs/ugatit_photo2cartoon.yaml
```


## 3 Results

![](../../imgs/photo2cartoon.png)

## 4 Download
| model | link |
|---|---|
| photo2cartoon_genA2B | [download link](https://paddlegan.bj.bcebos.com/models/photo2cartoon_genA2B_weight.pdparams) |


# References

- [U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation](https://arxiv.org/abs/1907.10830)

```
@inproceedings{Kim2020U-GAT-IT:,
title={U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation},
author={Junho Kim and Minjae Kim and Hyeonwoo Kang and Kwang Hee Lee},
booktitle={International Conference on Learning Representations},
year={2020}
}
```


# Authors
[minivision-ai](https://github.com/minivision-ai), [haoqiang](https://github.com/hao-qiang)
1 change: 1 addition & 0 deletions ppgan/apps/__init__.py
@@ -21,5 +21,6 @@
from .face_parse_predictor import FaceParsePredictor
from .animegan_predictor import AnimeGANPredictor
from .midas_predictor import MiDaSPredictor
from .photo2cartoon_predictor import Photo2CartoonPredictor
from .styleganv2_predictor import StyleGANv2Predictor
from .pixel2style2pixel_predictor import Pixel2Style2PixelPredictor
77 changes: 77 additions & 0 deletions ppgan/apps/photo2cartoon_predictor.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,77 @@
# copyright (c) 2020 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os

import cv2
from PIL import Image
import numpy as np

import paddle
from paddle.utils.download import get_path_from_url
from ppgan.faceutils.dlibutils import align_crop
from ppgan.faceutils.face_segmentation import FaceSeg
from ppgan.models.generators import ResnetUGATITP2CGenerator

from .base_predictor import BasePredictor


P2C_WEIGHT_URL = "https://paddlegan.bj.bcebos.com/models/photo2cartoon_genA2B_weight.pdparams"


class Photo2CartoonPredictor(BasePredictor):
    def __init__(self, output_path='output', weight_path=None):
        self.output_path = output_path
        if not os.path.exists(self.output_path):
            os.makedirs(self.output_path)

        # Download the pretrained generator weights when none are supplied.
        if weight_path is None:
            cur_path = os.path.abspath(os.path.dirname(__file__))
            weight_path = get_path_from_url(P2C_WEIGHT_URL, cur_path)

        self.genA2B = ResnetUGATITP2CGenerator()
        params = paddle.load(weight_path)
        self.genA2B.set_state_dict(params)
        self.genA2B.eval()

        self.faceseg = FaceSeg()

    def run(self, image_path):
        image = Image.open(image_path)
        face_image = align_crop(image)
        face_mask = self.faceseg(face_image)

        # pre-process: white background outside the mask, values scaled to [-1, 1]
        face_image = cv2.resize(face_image, (256, 256), interpolation=cv2.INTER_AREA)
        face_mask = cv2.resize(face_mask, (256, 256))[:, :, np.newaxis] / 255.
        face = (face_image * face_mask + (1 - face_mask) * 255) / 127.5 - 1

        # HWC -> NCHW
        face = np.transpose(face[np.newaxis, :, :, :], (0, 3, 1, 2)).astype(np.float32)
        face = paddle.to_tensor(face)

        # inference
        with paddle.no_grad():
            cartoon = self.genA2B(face)[0][0]

        # post-process
        cartoon = np.transpose(cartoon.numpy(), (1, 2, 0))
        cartoon = (cartoon + 1) * 127.5
        cartoon = (cartoon * face_mask + (1 - face_mask) * 255).astype(np.uint8)

        photo_save_path = os.path.join(self.output_path, 'p2c_photo.png')
        cv2.imwrite(photo_save_path, cv2.cvtColor(face_image, cv2.COLOR_RGB2BGR))
        cartoon_save_path = os.path.join(self.output_path, 'p2c_cartoon.png')
        cv2.imwrite(cartoon_save_path, cv2.cvtColor(cartoon, cv2.COLOR_RGB2BGR))

        print("Cartoon image has been saved at '{}'.".format(cartoon_save_path))
        return cartoon
1 change: 1 addition & 0 deletions ppgan/faceutils/__init__.py
@@ -15,3 +15,4 @@
from . import dlibutils as dlib
from . import mask
from . import image
from . import face_segmentation
1 change: 1 addition & 0 deletions ppgan/faceutils/dlibutils/__init__.py
@@ -13,3 +13,4 @@
# limitations under the License.

from .dlib_utils import detect, crop, landmarks, crop_from_array
from .face_align import align_crop