Fuli Luo (罗福莉), Peking University. Verified email at pku.edu.cn. Cited by 22669.
Incorporating glosses into neural word sense disambiguation
Word Sense Disambiguation (WSD) aims to identify the correct meaning of polysemous words
in a particular context. Lexical resources like WordNet, which have proved to be of great …
DeepSeek-V2: A strong, economical, and efficient mixture-of-experts language model
We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model characterized
by economical training and efficient inference. It comprises 236B total parameters, of which …
DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning
General reasoning represents a long-standing and formidable challenge in artificial
intelligence. Recent breakthroughs, exemplified by large language models (LLMs) and chain-of-…
DeepSeek-Coder: When the large language model meets programming -- the rise of code intelligence
The rapid development of large language models has revolutionized code intelligence in
software development. However, the predominance of closed-source models has restricted …
Raise a child in large language model: Towards effective and generalizable fine-tuning
Recent pretrained language models extend from millions to billions of parameters. Thus the
need to fine-tune an extremely large pretrained model with a limited training corpus arises in …
DeepSeekMoE: Towards ultimate expert specialization in mixture-of-experts language models
In the era of large language models, Mixture-of-Experts (MoE) is a promising architecture for
managing computational costs when scaling up model parameters. However, conventional …
DeepSeek LLM: Scaling open-source language models with longtermism
The rapid development of open-source large language models (LLMs) has been truly remarkable.
However, the scaling law described in previous literature presents varying conclusions…
DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning
General reasoning represents a long-standing and formidable challenge in artificial intelligence
(AI). Recent breakthroughs, exemplified by large language models (LLMs)1,2 and …
DeepSeek-V3 technical report
We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B
total parameters, of which 37B are activated for each token. To achieve efficient inference and cost-…
DeepSeek-Coder-V2: Breaking the barrier of closed-source models in code intelligence
We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language
model that achieves performance comparable to GPT4-Turbo in code-specific tasks. …