jaydeepraijada/README.md

Hi, I'm Jaydeep

I train small language models and study how far reinforcement learning and post-training can push them. Most of my work sits at the intersection of post-training methods (SFT, RLHF, GRPO, reward modeling) and sub-1B models, testing whether techniques developed for large models still work at roughly 70x smaller scale.

Currently an Analyst at Lowe's (production NLP & ML), doing post-training and RL research on my own time. Top 100 @ HuggingFace x Meta OpenEnv Hackathon (Bangalore).

Featured work

| Project | Description |
| --- | --- |
| SHADE-GYM | OpenEnv-native RL gym for AI safety. Trained a 1.5B LoRA monitor via GRPO, improving AUROC from 0.500 → 0.893, closing ~40% of the gap to a frontier model at <0.1% of the cost. |
| post-training-experiments | Domain-adapted SmolLM-135M using QLoRA, achieving −20.1% perplexity and +25.4% ROUGE-L, with a writeup analyzing which post-training interventions actually moved the needle. |
| Diffusion | Built diffusion language models from scratch, including a ~150M ModernBERT-based model and a 45M first-principles implementation. |
| TenderIQ | Production document-evaluation system featuring multi-tier OCR, RAG, and human-in-the-loop review for scalable document analysis. |

Find me

jaydeepraijada.com · HuggingFace · Substack · LinkedIn

Pinned

  1. Diffusion (Public)

    Jupyter Notebook, 4 stars, 3 forks

  2. post-training-experiments (Public)

    Python

  3. SHADE-GYM (Public)

    Forked from Mayankpratapsingh022/SHADE-GYM

    SHADE-Arena asks whether a frontier monitor model can catch a frontier agent attempting hidden harmful side-tasks on top of benign user requests.

    Jupyter Notebook

  4. TenderIQ (Public)

    Python

  5. Learning_RL (Public)

    Learning RL: from-scratch implementations to mini-experiments

    Python