- Tehran University
- Tehran, Iran
- https://arhosseini77.github.io/
- in/arh77
- arhosseini_77
- https://t.me/arhosseini_77
Starred repositories
🔥 A Survey on AI Auto-Research
Triton kernels and PyTorch ops for Block Attention Residuals (AttnRes)
Specification and documentation for Agent Skills
This is an end-to-end course on AI Agents and Agentic AI, with 15+ AI Agent projects covering real-time use cases and industry expertise.
CVPR-NTIRE 2026 Challenge on Video Saliency Prediction
Fully featured mobile and web client for Codex and Claude Code, with realtime voice and encryption
An AI skill that provides design intelligence for building professional UI/UX across multiple platforms
A Curated List of Awesome Works in World Modeling, Aiming to Serve as a One-stop Resource for Researchers, Practitioners, and Enthusiasts Interested in World Modeling.
Sharp Monocular View Synthesis in Less Than a Second
The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…
Regular expressions (or Regexes) are patterns used to match character combinations in strings. In this challenge, I learned to build a Regex engine from scratch by recreating grep, a CLI tool for r…
Control Google Meet with customizable keyboard shortcuts. Toggle mic, camera, and navigate meetings from anywhere with global hotkeys.
Official inference repo for FLUX.2 models
[ACL 2025 🔥] A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding
Official implementation of Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents (NeurIPS 2025)
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
Artifact for "Marconi: Prefix Caching for the Era of Hybrid LLMs" [MLSys '25 Outstanding Paper Award, Honorable Mention]
Dream to Control: Learning Behaviors by Latent Imagination
The official code of "Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning"
A Pocket-Sized MLLM for Ultra-Efficient Image and Video Understanding on Your Phone
cjeen / LoRAEdit
Forked from tdrussell/diffusion-pipe
We achieve high-quality first-frame guided video editing given a reference image, while maintaining flexibility for incorporating additional reference conditions.
[CVPR 2025] RelationField: Relate Anything in Radiance Fields
[ICCV 25] SpectralAR: Spectral Autoregressive Visual Generation
Official Repository of End-to-End Implicit Neural Representations for Classification (CVPR 2025)
[CVPR2025] Official Implementation of ILLUME+
The simplest, fastest repository for training/finetuning small-sized VLMs.