forked from PaddlePaddle/PaddleGAN

Commit: add styleclip; update 2022; add weight url; update doc & img url (7 changed files, 1,061 additions, 61 deletions)
```python
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserve.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import argparse

import paddle
from ppgan.apps import StyleGANv2ClipPredictor

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--latent",
                        type=str,
                        help="path to the image latent codes")

    parser.add_argument("--neutral", type=str, help="neutral description")
    parser.add_argument("--target", type=str, help="target description")
    parser.add_argument("--beta_threshold",
                        type=float,
                        default=0.12,
                        help="beta threshold for channel editing")

    parser.add_argument("--direction_offset",
                        type=float,
                        default=5.0,
                        help="offset value of the edited attribute")

    parser.add_argument("--direction_path",
                        type=str,
                        default=None,
                        help="path to latent editing directions")

    parser.add_argument("--output_path",
                        type=str,
                        default='output_dir',
                        help="path to the output image dir")

    parser.add_argument("--weight_path",
                        type=str,
                        default=None,
                        help="path to the model checkpoint")

    parser.add_argument("--model_type",
                        type=str,
                        default=None,
                        help="type of model for loading a pretrained model")

    parser.add_argument("--size",
                        type=int,
                        default=1024,
                        help="resolution of the output image")

    parser.add_argument("--style_dim",
                        type=int,
                        default=512,
                        help="number of style dimensions")

    parser.add_argument("--n_mlp",
                        type=int,
                        default=8,
                        help="depth of the mlp layers")

    parser.add_argument("--channel_multiplier",
                        type=int,
                        default=2,
                        help="channel multiplier")

    parser.add_argument("--cpu",
                        dest="cpu",
                        action="store_true",
                        help="cpu mode.")

    args = parser.parse_args()

    if args.cpu:
        paddle.set_device('cpu')

    predictor = StyleGANv2ClipPredictor(
        output_path=args.output_path,
        weight_path=args.weight_path,
        model_type=args.model_type,
        seed=None,
        size=args.size,
        style_dim=args.style_dim,
        n_mlp=args.n_mlp,
        channel_multiplier=args.channel_multiplier,
        direction_path=args.direction_path)
    predictor.run(args.latent, args.neutral, args.target, args.direction_offset,
                  args.beta_threshold)
```
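The `--latent` argument points to a NumPy `.npy` file. A minimal sketch of preparing such a file, assuming a random array stands in for a real code produced by Pixel2Style2Pixel, and assuming the W+ code for a 1024-pixel generator has 18 style layers of dimension 512:

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for a real latent from Pixel2Style2Pixel (dst.npy):
# 18 style layers, 512-dimensional style vectors (assumed shape).
latent = np.random.randn(18, 512).astype("float32")

out_dir = tempfile.mkdtemp()
latent_path = os.path.join(out_dir, "dst.npy")
np.save(latent_path, latent)

# The script would receive latent_path via --latent and load it internally.
loaded = np.load(latent_path)
print(loaded.shape)  # (18, 512)
```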
# StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery

## Introduction

StyleGAN V2 is an image generation model. The CLIP-guided editing module uses attribute manipulation vectors obtained with the CLIP (Contrastive Language-Image Pre-training) model to map text prompts to input-agnostic directions in StyleGAN's style space, enabling interactive text-driven image manipulation.

This module uses a pretrained StyleGAN V2 generator and the Pixel2Style2Pixel model for image encoding. At present, only a portrait editing model (trained on the FFHQ dataset) is available.

The Paddle-CLIP and dlib packages are required by this module:

```
pip install -e .
pip install paddleclip
pip install dlib-bin
```
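In the global-directions variant of StyleCLIP, the text-driven edit direction is derived from the CLIP embeddings of the neutral and target prompts: the normalized difference of the two embeddings gives an input-agnostic direction. A minimal NumPy sketch of that idea, with random vectors standing in for real CLIP text embeddings:

```python
import numpy as np

def text_edit_direction(e_neutral, e_target):
    """Normalized difference of two prompt embeddings: a sketch of the
    StyleCLIP global-directions idea (real embeddings come from CLIP)."""
    delta = e_target - e_neutral
    return delta / np.linalg.norm(delta)

# Random stand-ins for CLIP embeddings of, e.g., "face" and "young face".
rng = np.random.default_rng(0)
e_neutral = rng.standard_normal(512)
e_target = rng.standard_normal(512)

direction = text_edit_direction(e_neutral, e_target)
print(round(float(np.linalg.norm(direction)), 6))  # 1.0
```

The unit-norm direction is then translated into per-channel edits in StyleGAN's style space, which is what the provided `direction` weight file stores.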
|
||
## How to use | ||
|
||
### Editing | ||
|
||
``` | ||
cd applications/ | ||
python -u tools/styleganv2clip.py \ | ||
--latent <<PATH TO STYLE VECTOR> \ | ||
--output_path <DIRECTORY TO STORE OUTPUT IMAGE> \ | ||
--weight_path <YOUR PRETRAINED MODEL PATH> \ | ||
--model_type ffhq-config-f \ | ||
--size 1024 \ | ||
--style_dim 512 \ | ||
--n_mlp 8 \ | ||
--channel_multiplier 2 \ | ||
--direction_path <PATH TO STORE ATTRIBUTE DIRECTIONS> \ | ||
--neutral <DESCRIPTION OF THE SOURCE IMAGE> \ | ||
--target <DESCRIPTION OF THE TARGET IMAGE> \ | ||
--beta_threshold 0.12 \ | ||
--direction_offset 5 | ||
--cpu | ||
``` | ||
|
||
**params:** | ||
- latent: The path of the style vector which represents an image. Come from `dst.npy` generated by Pixel2Style2Pixel or `dst.fitting.npy` generated by StyleGANv2 Fitting module | ||
- output_path: the directory where the generated images are stored | ||
- weight_path: pretrained StyleGANv2 model path | ||
- model_type: inner model type, currently only `ffhq-config-f` is available. | ||
- direction_path: The path of CLIP mapping vector | ||
- stat_path: The path of latent statisitc file | ||
- neutral: Description of the source image,for example: face | ||
- target: Description of the target image,for example: young face | ||
- beta_threshold: editing threshold of the attribute channels | ||
- direction_offset: Offset strength of the attribute | ||
- cpu: whether to use cpu inference, if not, please remove it from the command | ||
|
||
>inherited params for the pretrained StyleGAN model | ||
- size: model parameters, output image resolution | ||
- style_dim: model parameters, dimensions of style z | ||
- n_mlp: model parameters, the number of multi-layer perception layers for style z | ||
- channel_multiplier: model parameters, channel product, affect model size and the quality of generated pictures | ||
|
||
### Results | ||
|
||
Input portrait: | ||
<div align="center"> | ||
<img src="../../imgs/stylegan2fitting-sample.png" width="300"/> | ||
</div> | ||
|
||
with | ||
> direction_offset = [ -1, 0, 1, 2, 3, 4, 5] | ||
> beta_threshold = 0.1 | ||
edit from 'face' to 'boy face': | ||
|
||
![stylegan2clip-sample-boy](https://user-images.githubusercontent.com/29187613/187344690-6709fba5-6e21-4bc0-83d1-5996947c99a4.png) | ||
|
||
|
||
edit from 'face' to 'happy face': | ||
|
||
![stylegan2clip-sample-happy](https://user-images.githubusercontent.com/29187613/187344681-6509f01b-0d9e-4dea-8a97-ee9ca75d152e.png) | ||
|
||
|
||
edit from 'face' to 'angry face': | ||
|
||
![stylegan2clip-sample-angry](https://user-images.githubusercontent.com/29187613/187344686-ff5047ab-5499-420d-ad02-e0908ac71bf7.png) | ||
|
||
edit from 'face' to 'face with long hair': | ||
|
||
![stylegan2clip-sample-long-hair](https://user-images.githubusercontent.com/29187613/187344684-4e452631-52b0-47cf-966e-3216c0392815.png) | ||
|
||
|
||
|
||
edit from 'face' to 'face with curly hair': | ||
|
||
![stylegan2clip-sample-curl-hair](https://user-images.githubusercontent.com/29187613/187344677-c9a3aa9f-1f3c-41b3-a1f0-fcd48a9c627b.png) | ||
|
||
|
||
edit from 'head with black hair' to 'head with gold hair': | ||
|
||
![stylegan2clip-sample-gold-hair](https://user-images.githubusercontent.com/29187613/187344678-5220e8b2-b1c9-4f2f-8655-621b6272c457.png) | ||
|
||
## Make Attribute Direction Vector

For details, please refer to [Puzer/stylegan-encoder](https://github.com/Puzer/stylegan-encoder/blob/master/Learn_direction_in_latent_space.ipynb).

Pretrained weights for the `stylegan2` model trained on the `ffhq-config-f` dataset are currently provided:

direction: https://paddlegan.bj.bcebos.com/models/stylegan2-ffhq-config-f-styleclip-global-directions.pdparams

stats: https://paddlegan.bj.bcebos.com/models/stylegan2-ffhq-config-f-styleclip-stats.pdparams

## Training

1. Extract the style latent vector stats:
```
python styleclip_getf.py
```
2. Calculate the mapping vector using the CLIP model:
```
python ppgan/apps/styleganv2clip_predictor.py extract
```

## Reference

- 1. [StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery](https://arxiv.org/abs/2103.17249)

```
@article{Patashnik2021StyleCLIPTM,
  title={StyleCLIP: Text-Driven Manipulation of StyleGAN Imagery},
  author={Or Patashnik and Zongze Wu and Eli Shechtman and Daniel Cohen-Or and D. Lischinski},
  journal={2021 IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2021},
  pages={2065-2074}
}
```

- 2. [Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation](https://arxiv.org/abs/2008.00951)

```
@article{richardson2020encoding,
  title={Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation},
  author={Richardson, Elad and Alaluf, Yuval and Patashnik, Or and Nitzan, Yotam and Azar, Yaniv and Shapiro, Stav and Cohen-Or, Daniel},
  journal={arXiv preprint arXiv:2008.00951},
  year={2020}
}
```