User profiles for Zhiwu Lu
Zhiwu Lu, Professor, Renmin University of China. Verified email at ruc.edu.cn. Cited by 6879.
Pre-trained models: Past, present and future
Large-scale pre-trained models (PTMs) such as BERT and GPT have recently achieved
great success and become a milestone in the field of artificial intelligence (AI). Owing to …
Counterfactual VQA: A cause-effect look at language bias
Recent VQA models may tend to rely on language bias as a shortcut and thus fail to
sufficiently learn the multi-modal knowledge from both vision and language. In this paper, we …
Towards artificial general intelligence via a multimodal foundation model
The fundamental goal of artificial intelligence (AI) is to mimic the core cognitive activities of humans. Despite tremendous success in AI research, most existing methods have only …
Learning depth-guided convolutions for monocular 3d object detection
3D object detection from a single image without LiDAR is a challenging task due to the lack
of accurate depth information. Conventional 2D convolutions are unsuitable for this task …
Learning from weak and noisy labels for semantic segmentation
A weakly supervised semantic segmentation (WSSS) method aims to learn a segmentation
model from weak (image-level) as opposed to strong (pixel-level) labels. By avoiding the …
Z-score normalization, hubness, and few-shot learning
The goal of few-shot learning (FSL) is to recognize a set of novel classes with only a few labeled samples by exploiting a large set of abundant base class samples. Adopting a meta-…
WenLan: Bridging vision and language by large-scale multi-modal pre-training
Multi-modal pre-training models have been intensively explored to bridge vision and
language in recent years. However, most of them explicitly model the cross-modal interaction …
COTS: Collaborative two-stream vision-language pre-training model for cross-modal retrieval
Large-scale single-stream pre-training has shown dramatic performance in image-text retrieval.
Regrettably, it faces low inference efficiency due to heavy attention layers. Recently, two-…
Large-scale few-shot learning: Knowledge transfer with class hierarchy
Recently, large-scale few-shot learning (FSL) has become topical. It is discovered that, for a large-scale FSL problem with 1,000 classes in the source domain, a strong baseline emerges, …
VDT: General-purpose video diffusion transformers via mask modeling
This work introduces Video Diffusion Transformer (VDT), which pioneers the use of transformers
in diffusion-based video generation. It features transformer blocks with modularized …