Marco Simoni winstonsmith1897

Marco Simoni

Ph.D. in Artificial Intelligence | LLM Alignment & Foundation Models

I hold a Ph.D. in Artificial Intelligence from Sapienza University of Rome, specializing in Reinforcement Learning (RL) for Large Language Model (LLM) post-training and alignment. My research and engineering work focuses on developing Transformer-based Foundation Models and building autonomous reasoning agents.

Core Focus

My work focuses on:

LLM Alignment & RL: Designing policy optimization algorithms (PPO, GRPO, GTPO) to mitigate LLM policy collapse and improve reasoning capabilities.
LLM Architecture & Retrieval: Building LLM architectures from scratch (Sparse MoE, RoPE, GQA) and implementing complex RAG frameworks for knowledge extraction.
AI for Cybersecurity: Engineering Foundation Models for cyber attack prediction and modeling Knowledge Graphs for Cyber Threat Intelligence (CTI).

Technical Skills

Machine Learning & Frameworks: Python, PyTorch, TensorFlow, JAX/Flax, HuggingFace, LangChain, Unsloth, vLLM, TRL.
Cybersecurity & Data: NetworkX, MITRE ATT&CK, MBC, CAPEC, Metasploit, Pwngdb, SQL, MongoDB, Neo4j.
DevOps & Tools: Docker, Linux, Git.

Featured Projects

GTPO: Trajectory-Based Policy Optimization: Designed a KL-free policy optimization algorithm for LLM post-training that mitigates policy collapse. Compared to GRPO, it improved reasoning performance by up to 15% on out-of-distribution (OOD) benchmarks (AIME2024, AIME2025, AMC).
MORSE: Mixture-of-RAG-Security-Experts: Developed a dual-cascaded RAG framework with 7 parallel retrievers tailored for cybersecurity Q&A. It outperformed GPT-4 by 15% in response accuracy for general and multi-hop cybersecurity questions.
DantinoX: From-Scratch LLM: Built an LLM architecture from scratch in JAX/Flax featuring Sparse MoE, RoPE, attention gating, and GQA. Maximized hardware throughput via Sliding Window Attention, Static KV Caching, and Gradient Checkpointing.
TITAN: Context-Aware Reasoning for CTI: Architected a Knowledge Graph reasoning framework for Cyber Threat Intelligence, automating complex threat analysis by modeling relationships across IoCs, TTPs, and CVEs.

Research Experience

CNR-IIT & NetGroup | AI Researcher: Engineered an LLM-driven framework that automates the translation of natural-language requirements into structured XACML access control policies.
Horus Project | AI Researcher: Architected and trained a custom Transformer-based Foundation Model from scratch, specifically designed for proactive cyber-attack prediction.

Technical Writing

GTPO vs GRPO: A Smarter Path to Stable Reasoning LLMs: An in-depth analysis of how conflict masks and entropy regularization address GRPO's gradient conflicts and policy collapse.
REINFORCE vs. Posterior Token Targets: Two Paths to Steering Language Models: An exploration of how to change the probabilities a model assigns to different tokens at each step in order to steer its behavior effectively.

Contact & Links

Email: marco.simoni0711@gmail.com
Website: winstonsmith1897
Google Scholar: Marco Simoni
LinkedIn: Marco Simoni
X: Marco Simoni
