🌐 Build and share your personal website with ease using mjsushanth.github.io, a simple and effective static site generator.
Image captioning with Vision Transformer encoder + GPT decoder. Cross-attention, beam search, attention map visualization, BLEU/CIDEr metrics.
Early identification of IPO short-term adjustment risk via nonlinear price formation analysis
PyTorch implementation of CL-ViT and FF-ViT models
🚀 Cross attention map tools for huggingface/diffusers
PyTorch Implementation of SD-VSum and S-VideoXum Dataset Distribution from "SD-VSum: A Method and Dataset for Script-Driven Video Summarization" (ACM Multimedia 2025)
Repo for portfolio, containing working redirects to all projects.
Framework that includes a new Contextualization module (CARU) to enrich embedding data with a lightweight architecture (Multi-Head Cross-Attention), and a module for weighting BPR triplets (TIL)
Photometry Guided Cross Attention Transformers for Astronomical Image Processing
A Bidirectional LSTM-CNN Ensemble with Cross-Attention Gating and Multi-Horizon Feature Fusion for Heterogeneous Retail Demand Forecasting
Official repository for "The Strawberry Problem 🍓: Emergence of Character-level Understanding in Tokenized Language Models"
TensorFlow implementation of 'Robust Image Watermarking based on Cross-Attention and Invariant Domain Learning'
A complete implementation of the "Attention Is All You Need" Transformer model from scratch using PyTorch. This project focuses on building and training a Transformer for neural machine translation (English-to-Italian) on the OpusBooks dataset.
[ICIP 2025] Official implementation of RT-X Net: RGB-Thermal cross attention network for Low-Light Image Enhancement
"FG2024: Dynamic Cross Attention for Person Verification"
TCR Epitope Generation Model with Top-K Prediction
Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️
Multimodal transformer for financial time-series prediction with dual configuration systems (YAML/programmatic), sophisticated data processing pipelines, file caching, and advanced numerical data augmentation
[IV 2025, Oral] Official code of "6Img-to-3D: Few-Image Large-Scale Outdoor Novel View Synthesis"