Machine Learning Researcher | Reinforcement Learning | LLM Systems
ML researcher focused on reinforcement learning and LLM-based systems.
- Research: RL, RLHF, agent systems, vulnerability discovery
- Building: production-grade ML systems & high-performance inference
- Teaching: ML & RL courses with 500+ students
I work at the intersection of:
- theory (RL, optimization, convergence)
- systems (LLM infra, high-load serving)
- applied research (agents, code intelligence, security)
Personal website: https://mountainai.tech/
- Reinforcement Learning (from bandits → PPO / GRPO / RLHF)
- LLM agents & tool-use systems
- Offline / Online RL for language models
- Structural reasoning & context reconstruction in code
- RL Atlas: interactive visualization of RL algorithms
  → https://github.com/pyshka501/mountainai-rl-atlas
- 🤗 HuggingFace Spaces (interactive demos & experiments)
  → https://huggingface.co/pyshka501/spaces
- 🧪 Research on blind vulnerability discovery with LLMs
  (context reconstruction as core bottleneck)
- ⚡ High-performance LLM inference systems
  (latency optimization, multi-token prediction, scaling)
ML / Research
- PyTorch, RL (TD, MC, PG, PPO, RLHF)
- Transformers, LLM fine-tuning, evaluation
- Statistical modeling & optimization
Systems / Infra
- vLLM, Triton, high-load inference
- Docker, FastAPI, distributed systems
- Performance optimization & scaling
Data
- pandas, NumPy, scikit-learn
- Large-scale dataset processing
Author and instructor of multiple courses:
- Reinforcement Learning: from Bandits to PPO/GRPO & RLHF
- Machine Learning (fundamentals → production)
- NLP & Semantic Search
Courses launched and scaled to 500+ students
- Mathematical rigor + intuition
- Practical assignments & real implementations
- Focus on modern ML systems and real-world use cases
- RL / LLM research
- Agent systems & reasoning
- ML infra & optimization
- Early-stage AI products