Hi, I'm Tanish Jain.

I'm a Senior AI/ML Engineer at Wiley. I build RAG systems for scholarly content, design the evaluation frameworks that tell us whether they work, and publish research on the side.

Email me · GitHub · LinkedIn · Google Scholar

About

I'm a production ML engineer who still does research. At Wiley, I work on the systems that bring our scholarly and educational content into AI products: retrieval, evaluation, generation, and the plumbing in between.

I build systems end to end, and evaluation is what I care about most. The interesting part of shipping a retrieval or generation system isn't the first working prototype; it's the harness around it that tells you whether a change helped or hurt. That means Q&A benchmarks at collection scale, an LLM-as-judge layer over strict matching, and sweeps of the retrieval configuration space before anything touches production. Those are the things that keep the rest of the work honest.

Before Wiley I did an MS in CS at Stanford (AI track) and a BS in Electrical Engineering at UCSD. I've worked on robot simulation (iGibson 2.0 at CoRL 2021), bioelectronics (wireless smart bandages, Nature Biotechnology), and deep learning for general aviation. Lately I've been working on an independent project about outcome switching in clinical trials. Paper in progress.

Experience

Where I've worked.

Senior AI/ML Engineer · Wiley
July 2022 – Present

I work on retrieval, evaluation, and applied ML for Wiley’s RAG products. The goal across everything: ground generative AI in scholarly sources we trust.
- 0.85 Retrieval MRR
- 0.94 Recall@10
- 2.9s p95 latency
- +13 to 27pp LLM-judge recall lift
- +68% Customer-record coverage
- Designed and shipped a corporate RAG product on Weaviate and AWS Bedrock. Migrated the Domains API off AWS Knowledge Base with a zero-downtime cutover, and added federated patent and PubMed search with per-customer entitlement controls.
- Built the retrieval evaluation framework used across Wiley’s RAG systems: Q&A benchmarks of 1,000+ questions per collection, an LLM-as-judge layer whose lenient recall runs 13 to 27pp above strict matching, and 12-configuration retrieval sweeps that gate production releases.
- Built the collection-aware retrieval layer: hybrid search with RRF, metadata pre-filtering, and strict tenant isolation with zero cross-collection contamination. Root-caused a reranker regression (−8 to −14pp Recall@10) that led the team to pull reranking from the default pipeline.
- Evaluated scientific (SciBERT, PubMedBERT, BioLinkBERT) and commercial (Gemini, Cohere, Titan v2, Voyage) embeddings on domain benchmarks. Shipped a hierarchical chunking strategy adapted to journal articles, systematic reviews, and long-form content.
- Earlier: productionized graph-neural-network enrichment pipelines that lifted customer-record coverage by 68%, and shipped LLM-powered review and fraud-detection systems with privacy safeguards and bias testing.
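
To make the retrieval terms above concrete, here is a minimal Python sketch of reciprocal rank fusion (the RRF used in hybrid search) and of the MRR and Recall@10 metrics. It is illustrative only, not the production code: the document IDs and rankings are hypothetical, relevance is assumed to be binary, and the RRF constant k=60 is a common default rather than a confirmed setting.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with 1-based ranks. k=60 is a commonly used default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID so output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def reciprocal_rank(ranking, relevant):
    """Return 1 / rank of the first relevant doc, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (ranking, relevant_set) pairs, one per query."""
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


def recall_at_k(ranking, relevant, k=10):
    """Fraction of a query's relevant docs that appear in its top-k results."""
    if not relevant:
        return 0.0
    return sum(1 for doc_id in ranking[:k] if doc_id in relevant) / len(relevant)


if __name__ == "__main__":
    # Hypothetical query 1: a keyword (BM25-style) list and a dense-vector list.
    keyword = ["d3", "d1", "d7", "d2"]
    dense = ["d1", "d4", "d3", "d9"]
    fused = rrf_fuse([keyword, dense])
    print(fused)  # ['d1', 'd3', 'd4', 'd7', 'd2', 'd9']

    # Hypothetical query 2: its first relevant doc sits at rank 2.
    runs = [(fused, {"d1", "d4"}), (["d5", "d6", "d8"], {"d6"})]
    print(mean_reciprocal_rank(runs))  # 0.75
    print(recall_at_k(fused, {"d1", "d4"}, k=10))  # 1.0
```

RRF looks only at ranks, so it can merge keyword and vector results without calibrating their raw scores against each other.
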
Research Assistant · Gurtner Lab, Stanford Medicine
June 2021 – July 2022

Trained ML models on real-time sensor data from prototype smart bandages to predict wound-healing trajectories, outperforming predictions based on clinician observation by 12pp. The work fed into the Nature Biotechnology paper on wireless, closed-loop smart bandages.
Research Assistant · Changing Cities Research Lab, Stanford Sociology
March 2021 – July 2022

Built a computer vision model that scores neighborhood conditions from Google Street View imagery across three cities. Reached 94% recall, 9pp above the prior single-city benchmark. Used to study links between urban environment and well-being.
Research · Stanford Vision and Learning Lab
January 2021 – 2022

Co-authored iGibson 2.0, an object-centric simulator for robot learning of household tasks. Accepted at CoRL 2021.

Selected work

Projects & publications.

A mix of current research, published papers, and a patent, grouped together because several fall into more than one category.

The Moving Goalpost Tracker
2026

A pipeline that compares the endpoints registered on ClinicalTrials.gov with the outcomes reported in the published papers, using embedding similarity and an LLM judge, and flags trials whose reported primary endpoint differs from the pre-registered one. It targets Phase III oncology trials. Paper in progress.
- Clinical Trials
- LLM-as-judge
- NLP
- Repo (private, opens on publication)
iGibson 2.0: Object-centric simulation for robot learning
2021

Open-source simulation environment for embodied AI research. Features object states, logic functions for task specification, and a VR interface for human demonstrations.
- Robotics
- Simulation
- Paper
- Project page
Wireless closed-loop smart bandage
2023

Flexible bioelectronic system with wireless sensing and stimulation that accelerated wound healing in preclinical models. I contributed the ML models that predict healing trajectories from sensor data.
- Bioelectronics
- Sensors
- Paper
Gait-correcting insole for Parkinson’s
2023

Smart insole that measures gait parameters and delivers haptic cues as continuous physical therapy.
- Wearables
- US Patent Application 17/407,615
Unstable-approach detection in aircraft
2022

Deep-learning model that classifies in-flight approach trajectories in real time from flight-recorder telemetry. It triggers go-around warnings before landing, the phase of flight where the margin for correction is smallest.
- Aviation
- Time series
- Paper
Online Link Prediction with Graph Neural Networks
2021

Walk-through of GraphSAGE on the ogbl-ddi drug–drug interaction graph. Covers sampling strategies, negative sampling, and evaluation trade-offs.
- GNNs
- Blog post

Earlier work

Evaluating ML-based skin cancer diagnosis
2024

Evaluated two deep-learning skin-lesion classifiers for explainability and fairness on the HAM10000 dataset. Found significant false-positive and false-negative disparities between light and dark skin tones, and showed that a Calibrated Equalized Odds post-processing step narrows the gap.
- Fairness
- Medical imaging
Classification of cellular states for image-based microfluidic sorting
2022

Trained cell-type classifiers on ~2M images from the Steinmetz lab’s Image Cell Sorting (ICS) platform, demonstrating the feasibility of deep learning on high-speed (15k cells/s) microfluidic imagery.
- Computer Vision
- Biology
Indianajones.ai: mapping water-management features in the South Indian Neolithic
2020

Fully convolutional networks on multi-temporal PlanetScope imagery, detecting the seasonal spectral signature of water to surface candidate Neolithic–Iron Age water-management sites in the Deccan.
- Remote sensing
- Archaeology
Protectionism in defense aerospace: a comparative analysis of India and Israel
2019

Six-month comparative policy analysis of India’s and Israel’s defense-aerospace postures in a multipolar world, co-written with Shlok Misra. Finalist team at the USAIRE Student Award, presented in Paris.
- Policy
- Aerospace

Contact

Get in touch.

I'm open to conversations about AI/ML engineering roles, retrieval and evaluation work, and applied ML research. The fastest way to reach me is email.

tanishj@stanford.edu · LinkedIn · GitHub · Google Scholar
