Zhiwu Lu

Professor, Renmin University of China
Verified email at ruc.edu.cn
Cited by 6879

Pre-trained models: Past, present and future

…, W Han, M Huang, Q Jin, Y Lan, Y Liu, Z Liu, Z Lu… - AI Open, 2021 - Elsevier
Large-scale pre-trained models (PTMs) such as BERT and GPT have recently achieved
great success and become a milestone in the field of artificial intelligence (AI). Owing to …

Counterfactual VQA: A cause-effect look at language bias

Y Niu, K Tang, H Zhang, Z Lu… - Proceedings of the …, 2021 - openaccess.thecvf.com
Recent VQA models may tend to rely on language bias as a shortcut and thus fail to
sufficiently learn the multi-modal knowledge from both vision and language. In this paper, we …

Towards artificial general intelligence via a multimodal foundation model

N Fei, Z Lu, Y Gao, G Yang, Y Huo, J Wen, H Lu… - Nature …, 2022 - nature.com
The fundamental goal of artificial intelligence (AI) is to mimic the core cognitive activities of
humans. Despite tremendous success in AI research, most existing methods have only …

Learning depth-guided convolutions for monocular 3d object detection

…, Y Huo, H Yi, Z Wang, J Shi, Z Lu… - Proceedings of the …, 2020 - openaccess.thecvf.com
3D object detection from a single image without LiDAR is a challenging task due to the lack
of accurate depth information. Conventional 2D convolutions are unsuitable for this task …

Learning from weak and noisy labels for semantic segmentation

Z Lu, Z Fu, T Xiang, P Han, L Wang… - IEEE transactions on …, 2016 - ieeexplore.ieee.org
A weakly supervised semantic segmentation (WSSS) method aims to learn a segmentation
model from weak (image-level) as opposed to strong (pixel-level) labels. By avoiding the …

Z-score normalization, hubness, and few-shot learning

N Fei, Y Gao, Z Lu, T Xiang - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com
The goal of few-shot learning (FSL) is to recognize a set of novel classes with only a few
labeled samples by exploiting a large set of abundant base class samples. Adopting a meta-…

WenLan: Bridging vision and language by large-scale multi-modal pre-training

Y Huo, M Zhang, G Liu, H Lu, Y Gao, G Yang… - arXiv preprint arXiv …, 2021 - arxiv.org
Multi-modal pre-training models have been intensively explored to bridge vision and
language in recent years. However, most of them explicitly model the cross-modal interaction …

COTS: Collaborative two-stream vision-language pre-training model for cross-modal retrieval

H Lu, N Fei, Y Huo, Y Gao, Z Lu… - Proceedings of the …, 2022 - openaccess.thecvf.com
Large-scale single-stream pre-training has shown dramatic performance in image-text retrieval.
Regrettably, it faces low inference efficiency due to heavy attention layers. Recently, two-…

Large-scale few-shot learning: Knowledge transfer with class hierarchy

A Li, T Luo, Z Lu, T Xiang… - Proceedings of the ieee …, 2019 - openaccess.thecvf.com
Recently, large-scale few-shot learning (FSL) has become topical. It has been discovered that, for a
large-scale FSL problem with 1,000 classes in the source domain, a strong baseline emerges, …

VDT: General-purpose video diffusion transformers via mask modeling

H Lu, G Yang, N Fei, Y Huo, Z Lu, P Luo… - arXiv preprint arXiv …, 2023 - arxiv.org
This work introduces Video Diffusion Transformer (VDT), which pioneers the use of transformers
in diffusion-based video generation. It features transformer blocks with modularized …