Jaydeep Raijada jaydeepraijada

Hi, I'm Jaydeep

I train small language models and study how far reinforcement learning and post-training can push them. Most of my work sits at the intersection of post-training methods (SFT, RLHF, GRPO, reward modeling) and sub-1B models, testing whether techniques developed for large models still work at roughly 70x smaller scale.

Currently an Analyst at Lowe's (production NLP & ML), doing post-training and RL research on my own time. Top 100 @ HuggingFace x Meta OpenEnv Hackathon (Bangalore).

Featured work

Project	Description
SHADE-GYM	OpenEnv-native RL gym for AI safety. Trained a 1.5B LoRA monitor via GRPO, improving AUROC from 0.500 to 0.893 and closing ~40% of the gap to a frontier model at <0.1% of the cost.
post-training-experiments	Domain-adapted SmolLM-135M using QLoRA, achieving −20.1% perplexity and +25.4% ROUGE-L, with a writeup analyzing which post-training interventions actually moved the needle.
Diffusion	Built diffusion language models from scratch, including a ~150M ModernBERT-based model and a 45M first-principles implementation.
TenderIQ	Production document-evaluation system featuring multi-tier OCR, RAG, and human-in-the-loop review for scalable document analysis.

Find me

jaydeepraijada.com · HuggingFace · Substack · LinkedIn
