User profiles for Zhiwu Lu
Zhiwu Lu, Professor, Renmin University of China. Verified email at ruc.edu.cn. Cited by 6879.
Pre-trained models: Past, present and future
Large-scale pre-trained models (PTMs) such as BERT and GPT have recently achieved
great success and become a milestone in the field of artificial intelligence (AI). Owing to …
Counterfactual VQA: A cause-effect look at language bias
Recent VQA models may tend to rely on language bias as a shortcut and thus fail to
sufficiently learn the multi-modal knowledge from both vision and language. In this paper, we …
Towards artificial general intelligence via a multimodal foundation model
The fundamental goal of artificial intelligence (AI) is to mimic the core cognitive activities of humans. Despite tremendous success in AI research, most existing methods have only …
Learning depth-guided convolutions for monocular 3d object detection
3D object detection from a single image without LiDAR is a challenging task due to the lack
of accurate depth information. Conventional 2D convolutions are unsuitable for this task …
Learning from weak and noisy labels for semantic segmentation
A weakly supervised semantic segmentation (WSSS) method aims to learn a segmentation
model from weak (image-level) as opposed to strong (pixel-level) labels. By avoiding the …
Z-score normalization, hubness, and few-shot learning
The goal of few-shot learning (FSL) is to recognize a set of novel classes with only a few labeled samples by exploiting a large set of abundant base class samples. Adopting a meta-…
WenLan: Bridging vision and language by large-scale multi-modal pre-training
Multi-modal pre-training models have been intensively explored to bridge vision and
language in recent years. However, most of them explicitly model the cross-modal interaction …
COTS: Collaborative two-stream vision-language pre-training model for cross-modal retrieval
Large-scale single-stream pre-training has shown dramatic performance in image-text retrieval.
Regrettably, it faces low inference efficiency due to heavy attention layers. Recently, two-…
Large-scale few-shot learning: Knowledge transfer with class hierarchy
Recently, large-scale few-shot learning (FSL) has become topical. It is discovered that, for a large-scale FSL problem with 1,000 classes in the source domain, a strong baseline emerges, …
VDT: General-purpose video diffusion transformers via mask modeling
This work introduces Video Diffusion Transformer (VDT), which pioneers the use of transformers
in diffusion-based video generation. It features transformer blocks with modularized …