Python 2 Updated Apr 8, 2026

WeChat-to-Codex bridge for running Codex app-server from chat, with threads, slash commands, approvals, agents, automation, uploads, and assistant records.

TypeScript 282 43 Updated Jun 12, 2026

Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

Python 18,721 2,990 Updated Apr 14, 2026

AI generates a real, editable PowerPoint from any document — native shapes & animations, speaker notes voiced as audio narration, and the option to follow your own .pptx template, not slide images …

Python 29,149 2,548 Updated Jun 18, 2026

Convert any URL to an LLM-friendly input with a simple prefix https://r.jina.ai/

TypeScript 11,298 835 Updated May 22, 2026

Supercharge Your LLM Application Evaluations 🚀

Python 14,429 1,494 Updated Feb 24, 2026

List of papers on hallucination detection in LLMs.

1,105 90 Updated Jun 6, 2026
Jupyter Notebook 9 2 Updated Jan 8, 2026

ACL2023 - AlignScore, a metric for factual consistency evaluation.

Python 164 30 Updated Mar 11, 2024

Leaderboard Comparing LLM Performance at Producing Hallucinations when Summarizing Short Documents

Python 3,280 106 Updated May 11, 2026

The open source AI engineering platform for agents, LLMs, and ML models. MLflow enables teams of all sizes to debug, evaluate, monitor, and optimize production-quality AI applications while control…

Python 26,608 5,864 Updated Jun 18, 2026

A comprehensive guide to Generative Engine Optimization (GEO) — optimizing content for AI-driven search engines like ChatGPT, Gemini, and Perplexity. #AEO

70 12 Updated Nov 15, 2025

Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.

Python 4,423 375 Updated Oct 19, 2025

Open Images is a dataset of ~9 million images that have been annotated with image-level labels and bounding boxes spanning thousands of classes.

1,125 169 Updated May 4, 2020
Python 1,838 81 Updated Dec 16, 2025
Python 5 Updated Jun 9, 2025

Unified Automated Evaluation for Hallucination Detection and Fact Verification

Python 7 Updated Jan 26, 2026

Official implementation of T2Vs Meet VLMs: A Scalable Multimodal Dataset for Visual Harmfulness Recognition

Jupyter Notebook 20 2 Updated Oct 23, 2024

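"Stones from other hills may serve to polish jade": the large-model evaluation and governance series released by the JADE team at Fudan University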

Jupyter Notebook 514 35 Updated May 14, 2026

This is the repository of HaluEval, a large-scale hallucination evaluation benchmark for Large Language Models.

Python 592 45 Updated Feb 12, 2024

GitHub repository for "FELM: Benchmarking Factuality Evaluation of Large Language Models" (NeurIPS 2023)

Python 64 1 Updated Dec 25, 2023
Python 3 Updated Mar 19, 2026

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python 72,281 8,847 Updated Jun 17, 2026

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Python 59,829 10,325 Updated Nov 12, 2025

🔥[ACL 2025 Findings] DELMAN: Dynamic Defense Against Large Language Model Jailbreaking with Model Editing

Python 9 Updated May 12, 2026

Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-of-use, backed by research.

Python 3,934 271 Updated May 17, 2025

RefChecker provides automatic checking pipeline and benchmark dataset for detecting fine-grained hallucinations generated by Large Language Models.

Python 433 44 Updated May 16, 2025