  • Facebook; University de Montreal, Harbin Institute of Technology
  • Menlo Park, California, US

Organizations

@mila-iqia

Retrieval and Retrieval-augmented LLMs

Python 11,302 834 Updated Dec 15, 2025

[TMLR'24] A CommonSense Reasoning Dataset pertaining to Physical Commonsense affordance of objects.

10 2 Updated Jun 26, 2024

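LLM API management and distribution system supporting mainstream models including OpenAI, Azure, Anthropic Claude, Google Gemini, DeepSeek, ByteDance Doubao, ChatGLM, ERNIE Bot, iFLYTEK Spark, Tongyi Qianwen, 360 Zhinao, and Tencent Hunyuan. It provides unified API adaptation and can be used for key management and redistribution. Ships as a single executable with a Docker image, for one-click deployment that works out of the box. LLM API management & k…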

JavaScript 29,705 5,719 Updated Jan 9, 2026

Supercharge Your LLM Application Evaluations 🚀

Python 12,640 1,249 Updated Jan 31, 2026

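A GUI tool for extracting hard-coded subtitles (hardsub) from videos and generating srt files. No third-party API is required; text recognition runs locally. A deep-learning-based video subtitle extraction framework covering subtitle region detection and subtitle content extraction.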

Python 8,423 863 Updated Aug 21, 2025

A data set based on all arXiv publications, pre-processed for NLP, including structured full-text and citation network

Python 297 19 Updated Sep 28, 2024

Code and datasets for paper "K2: A Foundation Language Model for Geoscience Knowledge Understanding and Utilization" in WSDM-2024

Python 208 18 Updated Jun 5, 2024

A library for efficient similarity search and clustering of dense vectors.

C++ 39,125 4,242 Updated Feb 14, 2026

A Snake Game in c++ and Qt for cs1605 homework

C++ 2 1 Updated Jan 5, 2024

Repo for the paper "Large Language Models Struggle to Learn Long-Tail Knowledge"

Python 78 8 Updated Apr 12, 2023

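A visual no-code/code-free web crawler/spider. 易采集: a visual browser automation testing, data collection, and crawler tool that lets you design and run crawling tasks graphically without writing code. Also known as ServiceWrapper, an intelligent service wrapping system for web applications.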

JavaScript 44,090 5,394 Updated Feb 3, 2026

Code for the ProteinMPNN paper

Jupyter Notebook 1,624 437 Updated Aug 14, 2024

MD5 links for a Chinese book corpus

Python 217 23 Updated Jan 31, 2024

JARVIS, a system to connect LLMs with ML community. Paper: https://arxiv.org/pdf/2303.17580.pdf

Python 24,547 2,065 Updated Jul 29, 2025

Chinese LLaMA & Alpaca large language models, with local CPU/GPU training and deployment

Python 18,964 1,866 Updated Jul 15, 2025

A tool for extracting plain text from Wikipedia dumps

Python 3,969 1,006 Updated May 23, 2024

Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"

Python 13,259 884 Updated Dec 17, 2024

BELLE: Be Everyone's Large Language model Engine (an open-source Chinese conversational large language model)

HTML 8,280 768 Updated Oct 16, 2024

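MNBVC (Massive Never-ending BT Vast Chinese corpus): a very large-scale Chinese corpus, benchmarked against the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures and even "Martian script" (火星文) data. It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wiki, classical poetry, lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.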

4,121 286 Updated Jan 31, 2026

GLM-130B: An Open Bilingual Pre-Trained Model (ICLR 2023)

Python 7,673 607 Updated Jul 25, 2023

Making large AI models cheaper, faster and more accessible

Python 41,346 4,533 Updated Jan 19, 2026

An update of the 2008 edition of the Shanghai Jiao Tong University Survival Manual (《上海交通大学生存手册》). The gitbook is published at https://survivesjtu.gitbook.io/survivesjtumanual/

5,373 573 Updated Dec 6, 2025

Collections of resources from Joint Laboratory of HIT and iFLYTEK Research (HFL)

Markdown 378 40 Updated Mar 9, 2023

The ultimate Vim configuration (vimrc)

Vim Script 31,691 7,300 Updated Oct 6, 2024

Collection of works from VIPL-AVSU

50 5 Updated Aug 2, 2025

Code for prefix beam search tutorial by @labodk

Python 187 36 Updated Dec 9, 2020

AMD ROCm™ Software - GitHub Home

Shell 6,172 517 Updated Feb 17, 2026

A comprehensive mapping database of English to Chinese technical vocabulary in the artificial intelligence domain

2,009 338 Updated Dec 30, 2022