
Fuli Luo (罗福莉)

Peking University
Verified email at pku.edu.cn
Cited by 22669

Incorporating glosses into neural word sense disambiguation

F Luo, T Liu, Q Xia, B Chang, Z Sui - … of the 56th Annual Meeting of …, 2018 - aclanthology.org
Word Sense Disambiguation (WSD) aims to identify the correct meaning of polysemous words
in a particular context. Lexical resources like WordNet, which have proved to be of great …

DeepSeek-V2: A strong, economical, and efficient mixture-of-experts language model

…, D Guo, D Yang, D Chen, D Ji, E Li, F Lin, F Luo… - arXiv preprint arXiv …, 2024 - arxiv.org
We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized
by economical training and efficient inference. It comprises 236B total parameters, of which …

DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning

…, D Dai, D Chen, D Ji, E Li, F Lin, F Dai, F Luo… - arXiv preprint arXiv …, 2025 - arxiv.org
General reasoning represents a long-standing and formidable challenge in artificial
intelligence. Recent breakthroughs, exemplified by large language models (LLMs) and chain-of-…

DeepSeek-Coder: When the large language model meets programming – the rise of code intelligence

…, W Zhang, G Chen, X Bi, Y Wu, YK Li, F Luo… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid development of large language models has revolutionized code intelligence in
software development. However, the predominance of closed-source models has restricted …

Raise a child in large language model: Towards effective and generalizable fine-tuning

R Xu, F Luo, Z Zhang, C Tan, B Chang… - Proceedings of the …, 2021 - aclanthology.org
Recent pretrained language models extend from millions to billions of parameters. Thus the
need to fine-tune an extremely large pretrained model with a limited training corpus arises in …

DeepSeekMoE: Towards ultimate expert specialization in mixture-of-experts language models

…, X Yu, Y Wu, Z Xie, YK Li, P Huang, F Luo… - Proceedings of the …, 2024 - aclanthology.org
In the era of large language models, Mixture-of-Experts (MoE) is a promising architecture for
managing computational costs when scaling up model parameters. However, conventional …

DeepSeek LLM: Scaling open-source language models with longtermism

…, W Liu, X Liu, X Liu, Y Liu, H Lu, S Lu, F Luo… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid development of open-source large language models (LLMs) has been truly remarkable.
However, the scaling law described in previous literature presents varying conclusions…

DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning

…, C Ruan, D Dai, D Chen, D Ji, E Li, F Lin, F Dai, F Luo… - Nature, 2025 - nature.com
General reasoning represents a long-standing and formidable challenge in artificial intelligence
(AI). Recent breakthroughs, exemplified by large language models (LLMs) and …

DeepSeek-V3 technical report

…, D Yang, D Chen, D Ji, E Li, F Lin, F Dai, F Luo… - arXiv preprint arXiv …, 2024 - arxiv.org
We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B
total parameters, of which 37B are activated for each token. To achieve efficient inference and cost-…

DeepSeek-Coder-V2: Breaking the barrier of closed-source models in code intelligence

…, Y Wang, C Deng, J Li, C Zhao, C Ruan, F Luo… - arXiv preprint arXiv …, 2024 - arxiv.org
We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language
model that achieves performance comparable to GPT4-Turbo in code-specific tasks. …