Stars
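Train an LLM from scratch with DeepSpeed, through the pretraining and SFT stages, to verify the LLM's ability to learn knowledge, understand language, and answer questions.
"Build a Large Language Model (From Scratch)" is an e-book that explores the principles and implementation of large language models in depth. It suits learners who want a thorough understanding of the architecture, training process, and application development of large models such as GPT. To make this valuable textbook accessible to more Chinese readers, I decided to translate it into Chinese and share it as open source on GitHub.
Train your own BitBrain (a mini LLM) 🧠 with as little as an RTX 3090 (work in progress).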
Awesome papers involving LLMs in Social Science.
Drawing Bayesian networks, graphical models, tensors, technical frameworks, and illustrations in LaTeX.
Chinese Political Hate Speech Detection Trained with Flair NLP
MS-Agent: Lightweight Framework for Empowering Agents with Autonomous Exploration in Complex Task Scenarios
🚀🚀 "Large model": train a 26M-parameter GPT completely from scratch in just 2 hours! 🌏
PyTorch AMP / Activation Checkpointing / Gradient Accumulation
Network communication information intervention based on reinforcement learning
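A record of the competitions I have entered, including but not limited to NLP competitions hosted on platforms such as Kaggle, Alibaba Tianchi, and iFLYTEK.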
[ICLR 2025 Oral] This is the official repo for the paper "LLM-SR" on Scientific Equation Discovery and Symbolic Regression with Large Language Models
Code for the AAAI Workshop WMAC paper "Simulating Rumor Spreading in Social Networks using LLM agents"
A Comprehensive Library for Memory of LLM-based Agents.
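From a nobody to a large language model (LLM) hero~ Stay tuned for more!!!
Interview questions (with answers) for large-model algorithm roles: common questions and concepts explained. "large-model interview questions", "algorithm role interviews", "common interview questions", "large-model algorithm interviews", "large-model application fundamentals"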
(ICLR'25) A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents