saikat107/saikat107

🌟 Profile Summary

saikat107

👋 About me

Hi, I am Saikat, a Senior Researcher in the Research in Software Engineering (RiSE) group at Microsoft Research, working on the reliability of large language models for code and on post-training. I bring 10 years of experience in training and evaluating code models, with a focus on improving the correctness and fidelity of generated programs under real-world constraints.

My work guides code generation models with static and dynamic correctness signals (from tests, program analysis, and verification) and applies these signals during fine-tuning and reinforcement learning. I view reliability as fundamentally a training problem, driven by structured and composable feedback.

Earlier, I graduated with a Ph.D. in Computer Science from Columbia University, advised by Professor Baishakhi Ray. I wrote my Ph.D. thesis on Learning to Edit Code.

🌐 Website: saikatc.info

👀 Research Focus

  • Post-training for code: SFT, RLHF/GRPO, reward modeling, reranking, retrieval-augmented fine-tuning
  • Correctness supervision: Reward design using test generation, execution feedback, mutation testing, specification inference, and program analysis
  • Agent-driven testing: DeepTest (symbolic analysis + LLMs for testing production code at scale)
  • Formal verification: DeepProof (post-training models for theorem proving in F*, Rocq, and Lean)
  • Systems: PyTorch, Megatron-LM, Ray, distributed GPU clusters, Kubernetes

📒 Selected Highlights

  • πŸ† ICSE'25 β€” Neural Synthesis for Proof-Oriented Programming [Distinguished Paper Award]
  • πŸ† ISSTA'23 β€” Contrastive Learning for Code Understanding [Distinguished Paper Award]
  • πŸ“„ ACL'25 β€” Teaching an Old LLM Secure Coding via Localized Preference Optimization
  • πŸ“„ EMNLP'23 β€” Ranking LLM-Generated Loop Invariants
  • πŸ“„ ICSE'24 β€” Causal Learning for Code Understanding
  • πŸ“„ FSE'22 β€” NatGen: Semantic Rewriting for Pretraining of Code Models
  • πŸ“„ NAACL'21 β€” Unified Pretraining for Code Understanding and Generation

πŸ‘ Open to Collaboration

  • Use of LLMs for program synthesis, editing, and verification
  • Reinforcement learning with execution and correctness feedback for code
  • Formal methods meets machine learning (proof generation, specification mining)

✨ Connect with me

Website   Google Scholar   LinkedIn   Twitter
