🔍 Visualize attention patterns in transformer models to better understand how LLMs process text inputs with interactive heatmaps and comparisons.
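A minimal sketch of the general workflow such a tool builds on, not this project's own code: pull per-layer attention weights from a Hugging Face encoder via `output_attentions=True` and render one head as a matplotlib heatmap (the model choice here is arbitrary).

```python
# Minimal sketch (not this repo's code): extract attention weights from a
# Hugging Face model and render one head as a heatmap with matplotlib.
import torch
import matplotlib.pyplot as plt
from transformers import AutoModel, AutoTokenizer

model_name = "distilbert-base-uncased"  # any encoder that returns attentions
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_attentions=True)

text = "Attention heatmaps reveal which tokens the model attends to."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions: tuple of (batch, heads, seq, seq), one entry per layer
attn = outputs.attentions[-1][0, 0]            # last layer, first head
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

plt.imshow(attn, cmap="viridis")
plt.xticks(range(len(tokens)), tokens, rotation=90)
plt.yticks(range(len(tokens)), tokens)
plt.colorbar(label="attention weight")
plt.tight_layout()
plt.show()
```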
💥 Optimize linear attention models with efficient Triton-based implementations in PyTorch, compatible across NVIDIA, AMD, and Intel platforms.
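For context, a plain-PyTorch sketch of the linear-attention identity that kernels like these accelerate, using the φ(x) = elu(x) + 1 feature map from Katharopoulos et al. (2020); the actual Triton kernels are far more involved (chunking, causal variants, fused I/O).

```python
# Plain-PyTorch sketch of the linear attention identity such kernels speed up;
# phi(x) = elu(x) + 1 follows Katharopoulos et al. (2020). Non-causal
# (full-sequence) form; causal variants keep running sums instead.
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    # q, k, v: (batch, heads, seq_len, head_dim)
    q, k = F.elu(q) + 1, F.elu(k) + 1
    kv = torch.einsum("bhnd,bhne->bhde", k, v)            # sum_n phi(k_n) v_n^T
    z = 1.0 / (torch.einsum("bhnd,bhd->bhn", q, k.sum(dim=2)) + eps)
    return torch.einsum("bhnd,bhde,bhn->bhne", q, kv, z)

q = k = v = torch.randn(1, 4, 128, 64)
out = linear_attention(q, k, v)   # O(n * d^2) instead of O(n^2 * d)
print(out.shape)                  # torch.Size([1, 4, 128, 64])
```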
🐙 Implements Flash Attention with an attention sink for gpt-oss-20b; includes test.py. Work in progress: backward pass, varlen support, and a community sync to return only softmax_lse.
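A rough, non-Flash sketch of the sink idea as it is usually described for gpt-oss-style attention: a per-head sink logit joins the softmax normalization but contributes no value vector. The real fused kernel (and this repo's exact handling) will differ; causal masking is omitted here.

```python
# Rough sketch of softmax attention with a per-head "sink" logit: the sink
# participates in the softmax normalization but contributes no value vector.
# Illustrative only; the fused Flash Attention kernel works very differently.
import torch

def attention_with_sink(q, k, v, sink_logit, scale):
    # q, k, v: (heads, seq, head_dim); sink_logit: (heads,)
    scores = torch.einsum("hqd,hkd->hqk", q, k) * scale
    sink = sink_logit[:, None, None].expand(-1, scores.shape[1], 1)
    logits = torch.cat([sink, scores], dim=-1)        # prepend sink column
    probs = torch.softmax(logits, dim=-1)[..., 1:]    # drop the sink's share
    return torch.einsum("hqk,hkd->hqd", probs, v)

h, n, d = 4, 16, 32
out = attention_with_sink(torch.randn(h, n, d), torch.randn(h, n, d),
                          torch.randn(h, n, d), torch.zeros(h), d ** -0.5)
print(out.shape)  # torch.Size([4, 16, 32])
```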
Advanced sparse modern Hopfield models delivering fast associative memory with energy-efficient inference.
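The dense modern Hopfield retrieval step these models build on is ξ_new = X softmax(β Xᵀξ) (Ramsauer et al., "Hopfield Networks is All You Need"); the sketch below shows that dense reference behavior, with the understanding that the sparse variants swap softmax for a sparse map such as sparsemax or entmax.

```python
# Dense modern Hopfield retrieval step: xi_new = X @ softmax(beta * X^T @ xi).
# The sparse models described above replace softmax with a sparse map
# (e.g. sparsemax / entmax); this shows only the dense reference behavior.
import torch

def hopfield_retrieve(X, xi, beta=8.0):
    # X: (d, N) stored patterns as columns; xi: (d,) query pattern
    p = torch.softmax(beta * (X.T @ xi), dim=0)   # association weights over patterns
    return X @ p                                  # retrieved (denoised) pattern

d, N = 64, 10
X = torch.randn(d, N)
noisy = X[:, 0] + 0.1 * torch.randn(d)
retrieved = hopfield_retrieve(X, noisy)
# High cosine similarity to the stored pattern indicates successful retrieval.
print(torch.nn.functional.cosine_similarity(retrieved, X[:, 0], dim=0))
```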
Keras beit, caformer, CMT, CoAtNet, convnext, davit, dino, efficientdet, edgenext, efficientformer, efficientnet, eva, fasternet, fastervit, fastvit, flexivit, gcvit, ghostnet, gpvit, hornet, hiera, iformer, inceptionnext, lcnet, levit, maxvit, mobilevit, moganet, nat, nfnets, pvt, swin, tinynet, tinyvit, uniformer, volo, vanillanet, yolor, yolov7, yolov8, yolox, gpt2, llama2; alias kecam
Implementation of Danijar's latest iteration of his Dreamer line of work
🧑🏫 60+ Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), gans(cyclegan, stylegan2, ...), 🎮 reinforcement learning (ppo, dqn), capsnet, distillation, ... 🧠
This repository contains the code to reproduce the experiments from the article "Dynamical Mean-Field Theory of Self-Attention Neural Networks".
Train a GPT from scratch on your laptop
A code deep-dive on one of the key innovations from Deepseek - Multihead Latent Attention (MLA)
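A hedged, conceptual sketch of MLA's core trick: keys and values are down-projected to a small per-token latent (which is what gets cached) and up-projected per head at attention time. RoPE decoupling, KV-cache handling, and the other DeepSeek details covered in the deep-dive are omitted, and all module names below are made up for illustration.

```python
# Conceptual sketch of Multi-Head Latent Attention: K/V are compressed to a
# small per-token latent and expanded per head at attention time, shrinking
# the KV cache. RoPE decoupling and causal masking are deliberately omitted.
import torch
import torch.nn as nn

class TinyMLA(nn.Module):
    def __init__(self, d_model=256, n_heads=4, d_latent=32):
        super().__init__()
        self.h, self.dh = n_heads, d_model // n_heads
        self.w_q = nn.Linear(d_model, d_model)
        self.w_kv_down = nn.Linear(d_model, d_latent)   # the latent that would be cached
        self.w_k_up = nn.Linear(d_latent, d_model)
        self.w_v_up = nn.Linear(d_latent, d_model)
        self.w_o = nn.Linear(d_model, d_model)

    def forward(self, x):
        b, n, _ = x.shape
        split = lambda t: t.view(b, n, self.h, self.dh).transpose(1, 2)
        c = self.w_kv_down(x)                           # (b, n, d_latent)
        q, k, v = split(self.w_q(x)), split(self.w_k_up(c)), split(self.w_v_up(c))
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.dh ** -0.5, dim=-1)
        return self.w_o((attn @ v).transpose(1, 2).reshape(b, n, -1))

x = torch.randn(2, 16, 256)
print(TinyMLA()(x).shape)  # torch.Size([2, 16, 256])
```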
A complete implementation of the "Attention Is All You Need" Transformer model from scratch using PyTorch. This project focuses on building and training a Transformer for neural machine translation (English-to-Italian) on the OpusBooks dataset.
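The model's central building block, scaled dot-product attention, reduces to a few lines; the sketch below follows the paper's formula Attention(Q, K, V) = softmax(QKᵀ/√d_k)V, while the repository's full implementation adds multi-head projections, masking, and dropout around it.

```python
# Scaled dot-product attention as defined in "Attention Is All You Need":
#   Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v, weights

q = k = v = torch.randn(2, 8, 10, 64)   # (batch, heads, seq, d_k)
out, w = scaled_dot_product_attention(q, k, v)
print(out.shape, w.shape)               # (2, 8, 10, 64) (2, 8, 10, 10)
```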
Scenic: A Jax Library for Computer Vision Research and Beyond
Linear-time sequence modeling that replaces attention's O(n²d) complexity with O(nd) summation-based aggregation. Demonstrates constraint-driven emergence: how functional representations can develop from optimization pressure and architectural constraints alone, without explicit pairwise interactions.
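Illustrative only, since the project's exact mechanism isn't spelled out here: one simple O(nd) summation-based aggregator is a causal prefix mean computed with a single cumulative sum, replacing any pairwise token-to-token interaction.

```python
# Illustrative only: one way to replace pairwise attention with O(n*d)
# summation-based aggregation is a causal prefix mean of token features.
# This sketches the general idea, not this project's specific mechanism.
import torch

def causal_prefix_mean(x):
    # x: (batch, seq_len, d) -> each position sees the mean of itself
    # and all earlier positions; cost is one cumulative sum, O(n*d).
    csum = x.cumsum(dim=1)
    counts = torch.arange(1, x.size(1) + 1, device=x.device).view(1, -1, 1)
    return csum / counts

x = torch.randn(2, 16, 64)
print(causal_prefix_mean(x).shape)  # torch.Size([2, 16, 64])
```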
Vision Transformers for image classification, image segmentation, and object detection.
A simple and minimal open-source implementation of "Introducing LFM2: The Fastest On-Device Foundation Models on the Market" from Liquid AI, in PyTorch
Implementation of the ScreenAI model from the paper: "A Vision-Language Model for UI and Infographics Understanding"
We introduce VLM-Mamba, the first Vision-Language Model built entirely on State Space Models (SSMs), specifically leveraging the Mamba architecture.
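For orientation, a toy diagonal state-space recurrence h_t = a·h_{t−1} + b·x_t, y_t = c·h_t, which is the primitive SSM layers discretize; Mamba additionally makes the parameters input-dependent ("selective") and runs a hardware-aware parallel scan, none of which this sequential loop shows.

```python
# Toy diagonal state-space recurrence underlying SSM layers:
#   h_t = a * h_{t-1} + b * x_t,   y_t = c * h_t
# Mamba makes a, b, c input-dependent ("selective") and uses a hardware-aware
# scan; this sequential loop only illustrates the recurrence itself.
import torch

def diagonal_ssm(x, a, b, c):
    # x: (seq_len, d); a, b, c: (d,) per-channel parameters
    h = torch.zeros_like(x[0])
    ys = []
    for x_t in x:
        h = a * h + b * x_t
        ys.append(c * h)
    return torch.stack(ys)

x = torch.randn(32, 16)
a, b, c = torch.rand(16) * 0.9, torch.ones(16), torch.ones(16)
print(diagonal_ssm(x, a, b, c).shape)  # torch.Size([32, 16])
```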
Helps preserve your attention when watching long videos by skipping humming ("hum"s), using the HUMAwareVAD2025 model