mirianfsilva

Mírian Silva mirianfsilva

MSc in AI/ML Fairness, ethics @ufmg. Computational Mathematician. Former AI Engineer @IBM Research | Member of @blackinai

266 followers · 74 following

Achievements

x2 x2

Highlights

Developer Program Member
Pro

Lists (7)

Starred repositories

chiphuyen / machine-learning-systems-design

A booklet on machine learning systems design with exercises. NOT the repo for the book "Designing Machine Learning Systems", which is `dmls-book`

HTML 10,431 1,616 Updated Apr 15, 2023

chiphuyen / ml-interviews-book

https://huyenchip.com/ml-interviews-book/

HTML 4,646 673 Updated Mar 21, 2025

catmcgee / learn-mechanistic-interpretability

A short 6-step curriculum I built to teach myself & others the basics of mech interp

Jupyter Notebook 7 Updated May 21, 2026

ArcadiaImpact / inspect_evals_dashboard

Streamlit web app for the Inspect Evals dashboard

Python 7 1 Updated Apr 17, 2026

JuliusBrussee / caveman

🪨 why use many token when few token do trick — Claude Code skill that cuts 65% of tokens by talking like caveman

JavaScript 73,215 4,130 Updated Jun 12, 2026

stanford-crfm / helm

Holistic Evaluation of Language Models (HELM) is an open source Python framework created by the Center for Research on Foundation Models (CRFM) at Stanford for holistic, reproducible and transparen…

Python 2,826 398 Updated Jun 5, 2026

astral-sh / uv

An extremely fast Python package and project manager, written in Rust.

Rust 86,437 3,205 Updated Jun 16, 2026

UKGovernmentBEIS / control-arena

ControlArena is a collection of settings, model organisms and protocols - for running control experiments.

Python 201 121 Updated Jun 9, 2026

The-Responsible-AI-Initiative / LLM_Ethics_Benchmark

Moral Operational Reasoning Assessment for Language Systems

Python 20 3 Updated Apr 8, 2026

centerforaisafety / HarmBench

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal

Jupyter Notebook 980 142 Updated Aug 16, 2024

scaleapi / propensity-evaluation

Open source code for propensity evaluation

Python 18 3 Updated Apr 25, 2026

centerforaisafety / wmdp

WMDP is an LLM proxy benchmark for hazardous knowledge in bio, cyber, and chemical security. We also release code for RMU, an unlearning method which reduces LLM performance on WMDP while retaining …

Jupyter Notebook 175 46 Updated May 29, 2025

RUCAIBox / HaluEval

This is the repository of HaluEval, a large-scale hallucination evaluation benchmark for Large Language Models.

Python 591 45 Updated Feb 12, 2024

IBM / AssetOpsBench

AssetOpsBench - Industry 4.0: A unified benchmark and framework for building, orchestrating, and evaluating domain-specific AI agents for Industry 4.0 asset operations and maintenance, with 460+ sc…

Python 1,809 269 Updated Jun 16, 2026

dsbowen / strong_reject

Python 141 21 Updated Jul 7, 2025

THUDM / AgentBench

A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)

Python 3,499 262 Updated Feb 8, 2026

prxshetty / hugo-noir

Hugo Noir is a clean, minimalistic theme for Hugo with a focus on readability and simplicity.

HTML 97 37 Updated Feb 3, 2026

openai / evals

Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

Python 18,701 2,990 Updated Apr 14, 2026

SamBelkacem / AI-ML-cheatsheets

Cheatsheets for AI and Machine Learning

661 220 Updated Jul 4, 2025

allenai / fluid-benchmarking

Fluid Language Model Benchmarking

Python 30 4 Updated Sep 16, 2025

callummcdougall / ARENA_3.0

Jupyter Notebook 1,133 729 Updated Jun 16, 2026

analyticalrohit / AI-ML-Cheatsheets

All Stanford Cheatsheets: Artificial Intelligence, Transformers, LLMs, Deep Learning, Machine Learning, Probabilities, Statistics, Algebra and Calculus.

916 183 Updated Jan 6, 2026

IBM / grafite

GraFiTe is a platform to track and manage domain-specific model issues for continuous LLM evaluation.

TypeScript 10 Updated Jun 16, 2026

nerfies / nerfies.github.io

JavaScript 4,226 1,919 Updated Jun 21, 2024

docling-project / docling

Get your documents ready for gen AI

Python 61,659 4,308 Updated Jun 16, 2026

generative-computing / mellea

Mellea is a library for writing generative programs.

Python 469 127 Updated Jun 15, 2026

vibrantlabsai / ragas

Supercharge Your LLM Application Evaluations 🚀

Python 14,384 1,482 Updated Feb 24, 2026

langfuse / langfuse

🪢 Open source AI engineering platform: LLM evals, observability, metrics, prompt management, playground, datasets. Integrates with OpenTelemetry, LangChain, OpenAI SDK, LiteLLM, and more. 🍊YC W23

TypeScript 29,164 3,024 Updated Jun 16, 2026