ww0987

Stars

haibojin001 / AmazonCOREBench

Python 2 Updated Apr 8, 2026

Gan-Xing / CodexBridge

WeChat-to-Codex bridge for running Codex app-server from chat, with threads, slash commands, approvals, agents, automation, uploads, and assistant records.

TypeScript 282 43 Updated Jun 12, 2026

openai / evals

Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

Python 18,721 2,990 Updated Apr 14, 2026

hugohe3 / ppt-master

AI generates a real, editable PowerPoint from any document — native shapes & animations, speaker notes voiced as audio narration, and the option to follow your own .pptx template, not slide images …

Python 29,149 2,548 Updated Jun 18, 2026

jina-ai / reader

Convert any URL to an LLM-friendly input with a simple prefix https://r.jina.ai/

TypeScript 11,298 835 Updated May 22, 2026

vibrantlabsai / ragas

Supercharge Your LLM Application Evaluations 🚀

Python 14,429 1,494 Updated Feb 24, 2026

EdinburghNLP / awesome-hallucination-detection

List of papers on hallucination detection in LLMs.

1,105 90 Updated Jun 6, 2026

Arize-ai / LibreEval

Jupyter Notebook 9 2 Updated Jan 8, 2026

yuh-zha / AlignScore

ACL2023 - AlignScore, a metric for factual consistency evaluation.

Python 164 30 Updated Mar 11, 2024

vectara / hallucination-leaderboard

Leaderboard Comparing LLM Performance at Producing Hallucinations when Summarizing Short Documents

Python 3,280 106 Updated May 11, 2026

mlflow / mlflow

The open source AI engineering platform for agents, LLMs, and ML models. MLflow enables teams of all sizes to debug, evaluate, monitor, and optimize production-quality AI applications while control…

Python 26,608 5,864 Updated Jun 18, 2026

krillinai / GEO

A comprehensive guide to Generative Engine Optimization (GEO) — optimizing content for AI-driven search engines like ChatGPT, Gemini, and Perplexity. #AEO

70 12 Updated Nov 15, 2025

rom1504 / img2dataset

Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.

Python 4,423 375 Updated Oct 19, 2025

cvdfoundation / open-images-dataset

Open Images is a dataset of ~9 million images that have been annotated with image-level labels and bounding boxes spanning thousands of classes.

1,125 169 Updated May 4, 2020

apple / pico-banana-400k

Python 1,838 81 Updated Dec 16, 2025

jiahuigeng / VSCBench

Python 5 Updated Jun 9, 2025

oneal2000 / UniFact

Unified Automated Evaluation for Hallucination Detection and Fact Verification

Python 7 Updated Jan 26, 2026

nctu-eva-lab / VHD11K

Official implementation of T2Vs Meet VLMs: A Scalable Multimodal Dataset for Visual Harmfulness Recognition

Jupyter Notebook 20 2 Updated Oct 23, 2024

whitzard-ai / jade-db

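"Stones from other hills may serve to polish jade": a series on large-model evaluation and governance released by the Fudan JADE team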

Jupyter Notebook 514 35 Updated May 14, 2026

RUCAIBox / HaluEval

This is the repository of HaluEval, a large-scale hallucination evaluation benchmark for Large Language Models.

Python 592 45 Updated Feb 12, 2024

hkust-nlp / felm

GitHub repository for "FELM: Benchmarking Factuality Evaluation of Large Language Models" (NeurIPS 2023)

Python 64 1 Updated Dec 25, 2023

ponhvoan / iris

Python 3 Updated Mar 19, 2026

hiyouga / LlamaFactory

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python 72,281 8,847 Updated Jun 17, 2026

karpathy / nanoGPT

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Python 59,829 10,325 Updated Nov 12, 2025

wanglne / DELMAN

🔥[ACL 2025 Findings] DELMAN: Dynamic Defense Against Large Language Model Jailbreaking with Model Editing

Python 9 Updated May 12, 2026

AnswerDotAI / RAGatouille

Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-of-use, backed by research.

Python 3,934 271 Updated May 17, 2025

amazon-science / RefChecker

RefChecker provides an automatic checking pipeline and a benchmark dataset for detecting fine-grained hallucinations generated by Large Language Models.

Python 433 44 Updated May 16, 2025
