I hold a Ph.D. in Artificial Intelligence from Sapienza University of Rome, specializing in Reinforcement Learning (RL) for Large Language Model (LLM) post-training and alignment. My research and engineering work focuses on developing Transformer-based Foundation Models and building autonomous reasoning agents.
My main areas of work are:
- LLM Alignment & RL: Designing policy optimization algorithms (PPO, GRPO, GTPO) to mitigate LLM policy collapse and improve reasoning capabilities.
- LLM Architecture & Retrieval: Building LLM architectures from scratch (Sparse MoE, RoPE, GQA) and implementing complex RAG frameworks for knowledge extraction.
- AI for Cybersecurity: Engineering Foundation Models for cyber attack prediction and modeling Knowledge Graphs for Cyber Threat Intelligence (CTI).
- Machine Learning & Frameworks: Python, PyTorch, TensorFlow, JAX/Flax, HuggingFace, LangChain, Unsloth, vLLM, TRL.
- Cybersecurity & Data: NetworkX, MITRE ATT&CK, MBC, CAPEC, Metasploit, Pwngdb, SQL, MongoDB, Neo4j.
- DevOps & Tools: Docker, Linux, Git.
- GTPO: Trajectory-Based Policy Optimization: Designed a KL-free policy optimization algorithm for LLM post-training that mitigates policy collapse. It boosted reasoning performance by up to 15% on OOD benchmarks (AIME2024, AIME2025, AMC) compared to GRPO.
- MORSE: Mixture-of-RAG-Security-Experts: Developed a dual-cascaded RAG framework with 7 parallel retrievers tailored for cybersecurity Q&A. It outperformed GPT-4 by 15% in response accuracy for general and multi-hop cybersecurity questions.
- DantinoX: From-Scratch LLM: Built an LLM architecture from scratch in JAX/Flax featuring Sparse MoE, RoPE, attention gating, and GQA. Improved hardware throughput via Sliding Window Attention, static KV caching, and gradient checkpointing.
- TITAN: Context-Aware Reasoning for CTI: Architected a Knowledge Graph reasoning framework for Cyber Threat Intelligence, automating complex threat analysis by modeling relationships across IoCs, TTPs, and CVEs.
- CNR-IIT & NetGroup | AI Researcher: Engineered an LLM-driven framework that automates the translation of natural language requirements into structured XACML access control policies.
- Horus Project | AI Researcher: Architected and trained a custom Transformer-based Foundation Model from scratch, specifically designed for proactive cyber-attack prediction.
- GTPO vs GRPO: A Smarter Path to Stable Reasoning LLMs: An in-depth analysis of how conflict masks and entropy regularization address GRPO's gradient conflicts and policy collapse.
- REINFORCE vs. Posterior Token Targets: Two Paths to Steering Language Models: An exploration of how to adjust the probabilities a model assigns to tokens at each step in order to steer its behavior.
- Email: marco.simoni0711@gmail.com
- Website: winstonsmith1897
- Google Scholar: Marco Simoni
- LinkedIn: Marco Simoni
- X: Marco Simoni