Hi, I'm Tanish Jain.
I'm a Senior AI/ML Engineer at Wiley. I build RAG systems for scholarly content, design the evaluation frameworks that tell us whether they work, and publish research on the side.
About
I'm a production ML engineer who still does research. At Wiley, I work on the systems that bring our scholarly and educational content into AI products: retrieval, evaluation, generation, and the plumbing in between.
I build systems end to end, and evaluation is the part I care most about. The interesting work in shipping a retrieval or generation system isn't the first working prototype; it's the harness around it that tells you whether a change helped or hurt. That means Q&A benchmarks at collection scale, LLM-as-judge layered over strict matching, and sweeps of the retrieval configuration space before anything touches production. These are the things that keep the rest of the work honest.
Before Wiley I did an MS in CS at Stanford (AI track) and a BS in Electrical Engineering at UCSD. I've worked on robot simulation (iGibson 2.0 at CoRL 2021), bioelectronics (wireless smart bandages, Nature Biotechnology), and deep learning for general aviation. Lately I've been working on an independent project about outcome switching in clinical trials. Paper in progress.
Experience
Where I've worked.
-
Senior AI/ML Engineer · Wiley
July 2022 – Present
I work on retrieval, evaluation, and applied ML for Wiley’s RAG products. The goal across everything: ground generative AI in scholarly sources we trust.
- 0.85 Retrieval MRR
- 0.94 Recall@10
- 2.9s p95 latency
- +13 to 27pp LLM-judge recall lift
- +68% Customer-record coverage
- Designed and shipped a corporate RAG product on Weaviate and AWS Bedrock. Migrated the Domains API off AWS Knowledge Base with a zero-downtime cutover, and added federated patent and PubMed search with per-customer entitlement controls.
- Built the retrieval evaluation framework used across Wiley’s RAG systems. Designed Q&A benchmarks of 1,000+ questions per collection, an LLM-as-judge layer that recovers 13 to 27pp of recall that strict matching misses, and 12-configuration retrieval sweeps that gate production releases.
- Built the collection-aware retrieval layer: hybrid search with RRF, metadata pre-filtering, and strict tenant isolation with zero cross-collection contamination. Root-caused a reranker regression (−8 to −14pp Recall@10) that led the team to pull reranking from the default pipeline.
- Evaluated scientific (SciBERT, PubMedBERT, BioLinkBERT) and commercial (Gemini, Cohere, Titan v2, Voyage) embeddings on domain benchmarks. Shipped a hierarchical chunking strategy adapted to journal articles, systematic reviews, and long-form content.
- Earlier: productionized graph-neural-network enrichment pipelines that lifted customer-record coverage by 68%, and shipped LLM-powered review and fraud-detection systems with privacy safeguards and bias testing.
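For readers who want the definitions behind the retrieval terms above, here is a minimal Python sketch of reciprocal rank fusion (RRF) and of the two metrics quoted, Recall@k and MRR. It is illustrative only and not the production code: the function names are hypothetical, and the constant `k=60` is a commonly used RRF default that is assumed here rather than taken from the systems described.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document IDs with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in.
    k=60 is an assumed default, not a value from the source.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def recall_at_k(ranked, relevant, k=10):
    """Fraction of the relevant documents that appear in the top k results."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def mean_reciprocal_rank(runs):
    """Average of 1 / rank of the first relevant hit over (ranked, relevant) pairs.

    A query with no relevant document in its ranking contributes 0.
    """
    if not runs:
        return 0.0
    total = 0.0
    for ranked, relevant in runs:
        relevant = set(relevant)
        for rank, doc_id in enumerate(ranked, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(runs)


if __name__ == "__main__":
    # Hypothetical keyword and vector rankings for a single query.
    keyword_hits = ["a", "b", "c"]
    vector_hits = ["b", "c", "a"]
    fused = rrf_fuse([keyword_hits, vector_hits])
    print(fused)
    print(recall_at_k(fused, {"a", "z"}, k=2))
    print(mean_reciprocal_rank([(fused, {"a"})]))
```

In a hybrid setup, one input list would typically come from keyword search and the other from vector search; RRF only needs the ranks, so the two scoring scales never have to be normalized against each other.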
-
Research Assistant · Gurtner Lab, Stanford Medicine
June 2021 – July 2022
Trained ML models to predict wound-healing trajectories from real-time sensor data on prototype smart bandages; the models outperformed clinicians' observation-based predictions by 12pp. The work fed into the Nature Biotechnology paper on wireless closed-loop smart bandages.
-
Research Assistant · Changing Cities Research Lab, Stanford Sociology
March 2021 – July 2022
Built a computer vision model that scores neighborhood conditions from Google Street View imagery across three cities. It reached 94% recall, 9pp above the prior single-city benchmark, and was used to study links between urban environment and well-being.
-
Research · Stanford Vision and Learning Lab
January 2021 – 2022
Co-authored iGibson 2.0, an object-centric simulator for robot learning of household tasks. Accepted at CoRL 2021.
Selected work
Projects & publications.
A mix of current research, published papers, and a patent. Grouped together because several fall into more than one category.
-
The Moving Goalpost Tracker
2026
A pipeline that compares ClinicalTrials.gov registered endpoints to published paper outcomes using embedding similarity and an LLM judge, flagging trials where the reported primary endpoint differs from the pre-registered one. Targeting Phase III oncology trials. Paper in progress.
-
iGibson 2.0: Object-centric simulation for robot learning
2021
Open-source simulation environment for embodied AI research. Features object states, logic functions for task specification, and a VR interface for human demonstrations.
-
Wireless closed-loop smart bandage
2023
Flexible bioelectronic system with wireless sensing and stimulation that accelerated wound healing in preclinical models. I contributed the ML models that predict healing trajectories from sensor data.
-
Gait-correcting insole for Parkinson’s
2023
Smart insole that measures gait parameters and delivers haptic cues as continuous physical therapy.
-
Unstable-approach detection in aircraft
2022
Deep-learning model that classifies in-flight approach trajectories in real time from flight-recorder telemetry. Triggers go-around warnings before landing, where the margin for correction is smallest.
-
Online Link Prediction with Graph Neural Networks
2021
Walk-through of GraphSAGE on the ogb-ddi drug–drug interaction graph. Covers sampling strategies, negative sampling, and evaluation trade-offs.
Earlier work
-
Evaluating ML-based skin cancer diagnosis
2024
Evaluated two deep-learning skin-lesion classifiers for explainability and fairness on the HAM10000 dataset. Found significant disparities in false-positive and false-negative rates between light and dark skin tones, and showed that a Calibrated Equalized Odds postprocessing step narrows the gap.
-
Classification of cellular states for image-based microfluidic sorting
2022
Trained cell-type classifiers on ~2M images from the Steinmetz lab’s Image Cell Sorting (ICS) platform, demonstrating feasibility of deep learning on high-speed (15k cells/s) microfluidic imagery.
-
Indianajones.ai: mapping water-management features in the South Indian Neolithic
2020
Applied fully convolutional networks to multi-temporal PlanetScope imagery, detecting the seasonal spectral signature of water to surface candidate Neolithic–Iron Age water-management sites in the Deccan.
-
Protectionism in defense aerospace: a comparative analysis of India and Israel
2019
Six-month comparative policy analysis of India’s and Israel’s defense-aerospace postures in a multipolar world, co-written with Shlok Misra. Finalist team at the USAIRE Student Award, presented in Paris.
Contact
Get in touch.
I'm open to conversations about AI/ML engineering roles, retrieval and evaluation work, and applied ML research. The fastest way to reach me is email.