Organizations

@blackinai @mini-configs @gsgcommunity @brazilinai @equitable-ai-research @equity-ai-hub

Starred repositories

A booklet on machine learning systems design with exercises. NOT the repo for the book "Designing Machine Learning Systems", which is `dmls-book`

HTML 10,431 1,616 Updated Apr 15, 2023

https://huyenchip.com/ml-interviews-book/

HTML 4,646 673 Updated Mar 21, 2025

A short 6-step curriculum I built to teach myself & others the basics of mech interp

Jupyter Notebook 7 Updated May 21, 2026

Streamlit web app for the Inspect Evals dashboard

Python 7 1 Updated Apr 17, 2026

🪨 why use many token when few token do trick — Claude Code skill that cuts 65% of tokens by talking like caveman

JavaScript 73,215 4,130 Updated Jun 12, 2026

Holistic Evaluation of Language Models (HELM) is an open source Python framework created by the Center for Research on Foundation Models (CRFM) at Stanford for holistic, reproducible and transparen…

Python 2,826 398 Updated Jun 5, 2026

An extremely fast Python package and project manager, written in Rust.

Rust 86,437 3,205 Updated Jun 16, 2026

ControlArena is a collection of settings, model organisms and protocols - for running control experiments.

Python 201 121 Updated Jun 9, 2026

Moral Operational Reasoning Assessment for Language Systems

Python 20 3 Updated Apr 8, 2026

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal

Jupyter Notebook 980 142 Updated Aug 16, 2024

Open source code for propensity evaluation

Python 18 3 Updated Apr 25, 2026

WMDP is an LLM proxy benchmark for hazardous knowledge in bio, cyber, and chemical security. We also release code for RMU, an unlearning method which reduces LLM performance on WMDP while retaining …

Jupyter Notebook 175 46 Updated May 29, 2025

This is the repository of HaluEval, a large-scale hallucination evaluation benchmark for Large Language Models.

Python 591 45 Updated Feb 12, 2024

AssetOpsBench - Industry 4.0: A unified benchmark and framework for building, orchestrating, and evaluating domain-specific AI agents for Industry 4.0 asset operations and maintenance, with 460+ sc…

Python 1,809 269 Updated Jun 16, 2026
Python 141 21 Updated Jul 7, 2025

A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)

Python 3,499 262 Updated Feb 8, 2026

Hugo Noir is a clean, minimalistic theme for Hugo with a focus on readability and simplicity.

HTML 97 37 Updated Feb 3, 2026

Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

Python 18,701 2,990 Updated Apr 14, 2026

Cheatsheets for AI and Machine Learning

661 220 Updated Jul 4, 2025

Fluid Language Model Benchmarking

Python 30 4 Updated Sep 16, 2025
Jupyter Notebook 1,133 729 Updated Jun 16, 2026

All Stanford Cheatsheets: Artificial Intelligence, Transformers, LLMs, Deep Learning, Machine Learning, Probabilities, Statistics, Algebra and Calculus.

916 183 Updated Jan 6, 2026

GraFiTe is a platform to track and manage domain-specific model issues for continuous LLM evaluation.

TypeScript 10 Updated Jun 16, 2026
JavaScript 4,226 1,919 Updated Jun 21, 2024

Get your documents ready for gen AI

Python 61,659 4,308 Updated Jun 16, 2026

Mellea is a library for writing generative programs.

Python 469 127 Updated Jun 15, 2026

Supercharge Your LLM Application Evaluations 🚀

Python 14,384 1,482 Updated Feb 24, 2026

🪢 Open source AI engineering platform: LLM evals, observability, metrics, prompt management, playground, datasets. Integrates with OpenTelemetry, LangChain, OpenAI SDK, LiteLLM, and more. 🍊YC W23

TypeScript 29,164 3,024 Updated Jun 16, 2026

Dark Flavored - Academic Project Website Template

JavaScript 17 2 Updated Sep 30, 2024

⚠️ Deprecated: This library's functionality has been rolled into Mellea (https://github.com/generative-computing/mellea)

Python 57 30 Updated May 18, 2026