Currently researching LLM evaluation consistency and inference reliability.
Recent work:
- Safety Stability Index: 18-28% of safety prompts flip between refusal and compliance depending on the random seed
- vLLM Failure Modes: GPU memory threshold analysis for inference reliability