User profiles for Xiuye Gu

Xiuye Gu

Google
Verified email at google.com
Cited by 3313

Open-vocabulary object detection via vision and language knowledge distillation

X Gu, TY Lin, W Kuo, Y Cui - arXiv preprint arXiv:2104.13921, 2021 - arxiv.org
We aim at advancing open-vocabulary object detection, which detects objects described by
arbitrary text inputs. The fundamental challenge is the availability of training data. It is costly …

Scaling open-vocabulary image segmentation with image-level labels

G Ghiasi, X Gu, Y Cui, TY Lin - European Conference on Computer Vision, 2022 - Springer
We design an open-vocabulary image segmentation model to organize an image into meaningful
regions indicated by arbitrary texts. Recent works (CLIP and ALIGN), despite attaining …

Photorealistic video generation with diffusion models

A Gupta, L Yu, K Sohn, X Gu, M Hahn, FF Li… - … on Computer Vision, 2024 - Springer
We present WALT, a diffusion transformer for photorealistic video generation from text
prompts. Our approach has two key design decisions. First, we use a causal encoder to jointly …

VideoPoet: A large language model for zero-shot video generation

D Kondratyuk, L Yu, X Gu, J Lezama, J Huang… - arXiv preprint arXiv …, 2023 - arxiv.org
We present VideoPoet, a language model capable of synthesizing high-quality video, with
matching audio, from a large variety of conditioning signals. VideoPoet employs a decoder-…

F-VLM: Open-vocabulary object detection upon frozen vision and language models

W Kuo, Y Cui, X Gu, AJ Piergiovanni… - arXiv preprint arXiv …, 2022 - arxiv.org
We present F-VLM, a simple open-vocabulary object detection method built upon Frozen
Vision and Language Models. F-VLM simplifies the current multi-stage training pipeline by …

DaTaSeg: Taming a universal multi-dataset multi-task segmentation model

X Gu, Y Cui, J Huang, A Rashwan… - Advances in …, 2023 - proceedings.neurips.cc
Observing the close relationship among panoptic, semantic and instance segmentation
tasks, we propose to train a universal multi-dataset multi-task segmentation model: DaTaSeg. …

HPLFlowNet: Hierarchical permutohedral lattice FlowNet for scene flow estimation on large-scale point clouds

X Gu, Y Wang, C Wu, YJ Lee… - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com
We present a novel deep neural network architecture for end-to-end scene flow estimation
that directly operates on large-scale 3D point clouds. Inspired by Bilateral Convolutional …

Language Model Beats Diffusion--Tokenizer is Key to Visual Generation

…, D Minnen, Y Cheng, V Birodkar, A Gupta, X Gu… - arXiv preprint arXiv …, 2023 - arxiv.org
While Large Language Models (LLMs) are the dominant models for generative tasks in
language, they do not perform as well as diffusion models on image and video generation. To …

Pixel-aligned language model

J Xu, X Zhou, S Yan, X Gu, A Arnab… - Proceedings of the …, 2024 - openaccess.thecvf.com
Large language models have achieved great success in recent years, and so have their variants in
vision. Existing vision-language models can describe images in natural language, answer …

Password-conditioned anonymization and deanonymization with face identity transformers

X Gu, W Luo, MS Ryoo, YJ Lee - European Conference on Computer Vision, 2020 - Springer
Cameras are prevalent in our daily lives, and enable many useful systems built upon computer
vision technologies such as smart cameras and home robots for service applications. …