hantek
  • Facebook; Université de Montréal; Harbin Institute of Technology
  • Menlo Park, California, US

Organizations

@mila-iqia

Retrieval and Retrieval-augmented LLMs

Python 11,839 890 Updated Apr 22, 2026

[TMLR'24] A CommonSense Reasoning Dataset pertaining to Physical Commonsense affordance of objects.

10 1 Updated Jun 26, 2024

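LLM API management and distribution system supporting mainstream models such as OpenAI, Azure, Anthropic Claude, Google Gemini, DeepSeek, ByteDance Doubao, ChatGLM, ERNIE Bot (文心一言), iFLYTEK Spark, Tongyi Qianwen, 360 Zhinao, and Tencent Hunyuan. It provides unified API adaptation and can be used for key management and redistribution. Single executable, Docker image provided, one-click deployment, works out of the box. LLM API management & k…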

JavaScript 35,055 6,646 Updated Jan 9, 2026

Supercharge Your LLM Application Evaluations 🚀

Python 14,413 1,490 Updated Feb 24, 2026

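Extracts hard-coded subtitles from videos and generates srt files. No third-party API registration is required; text recognition runs locally. A deep-learning-based video subtitle extraction framework that includes subtitle region detection and subtitle content extraction. A GUI tool for extracting hard-coded subtitles (hardsubs) from videos and generating srt files.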

Python 9,016 914 Updated Apr 9, 2026

A data set based on all arXiv publications, pre-processed for NLP, including structured full-text and citation network

Python 303 19 Updated Sep 28, 2024

Code and datasets for paper "K2: A Foundation Language Model for Geoscience Knowledge Understanding and Utilization" in WSDM-2024

Python 212 20 Updated Jun 5, 2024

A library for efficient similarity search and clustering of dense vectors.

C++ 40,321 4,424 Updated Jun 17, 2026

A Snake game in C++ and Qt for cs1605 homework

C++ 2 1 Updated Jan 5, 2024

Repo for the paper "Large Language Models Struggle to Learn Long-Tail Knowledge"

Python 77 8 Updated Apr 12, 2023

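A visual no-code/code-free web crawler/spider. 易采集: a visual tool for browser automation testing, data collection, and web crawling that lets you design and run crawler tasks graphically without writing code. Also known as ServiceWrapper, an intelligent service-wrapping system for web applications.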

JavaScript 44,109 5,380 Updated May 22, 2026

Code for the ProteinMPNN paper

Jupyter Notebook 1,767 476 Updated Aug 14, 2024

MD5 links for a Chinese book corpus

Python 217 21 Updated Jan 31, 2024

JARVIS, a system to connect LLMs with ML community. Paper: https://arxiv.org/pdf/2303.17580.pdf

Python 24,861 2,147 Updated Jul 29, 2025

Chinese LLaMA & Alpaca large language models, with local CPU/GPU training and deployment (Chinese LLaMA & Alpaca LLMs)

Python 18,941 1,852 Updated Apr 19, 2026

A tool for extracting plain text from Wikipedia dumps

Python 3,989 1,005 Updated May 23, 2024

Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"

Python 13,602 910 Updated Dec 17, 2024

BELLE: Be Everyone's Large Language model Engine (an open-source Chinese conversational large model)

HTML 8,273 761 Updated Oct 16, 2024

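MNBVC (Massive Never-ending BT Vast Chinese corpus): a very large-scale Chinese corpus, benchmarked against the 40T of data used to train ChatGPT. MNBVC covers not only mainstream culture but also niche subcultures and even "Martian script" (火星文). It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wikis, classical poetry, lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.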

4,212 288 Updated May 23, 2026

GLM-130B: An Open Bilingual Pre-Trained Model (ICLR 2023)

Python 7,650 603 Updated Jul 25, 2023

Making large AI models cheaper, faster and more accessible

Python 41,396 4,510 Updated May 25, 2026

An update of the 2008 edition of the Shanghai Jiao Tong University Survival Manual (《上海交通大学生存手册》). The GitBook is published at https://survivesjtu.gitbook.io/survivesjtumanual/

5,674 585 Updated Apr 21, 2026

Collections of resources from Joint Laboratory of HIT and iFLYTEK Research (HFL)

Markdown 379 40 Updated Mar 9, 2023

The ultimate Vim configuration (vimrc)

Vim Script 31,773 7,258 Updated Oct 6, 2024

Collection of works from VIPL-AVSU

50 5 Updated May 27, 2026

Code for prefix beam search tutorial by @labodk

Python 186 36 Updated Dec 9, 2020

AMD ROCm™ Software - GitHub Home

Shell 6,625 570 Updated Jun 17, 2026

A comprehensive mapping database of English to Chinese technical vocabulary in the artificial intelligence domain

2,027 340 Updated Dec 30, 2022