
Bio 🌱

Hi!👋😊

I'm José, an NLP researcher deeply passionate about exploring the possibilities of natural language processing. My PhD focused on summarization and attention-based models, but my work spans a wide range of NLP topics, including: 📚 [Zero- and few-shot] Text Classification, 📜 Automatic Summarization, 😊 Sentiment and Emotion Analysis, 🌟 Figurative Language Understanding, 🗣️ Dialogue Systems, 📄 Information Extraction, and 🤖 Machine-Generated Multimodal Content Detection.

Since 2016, my research has centered on the intersection of deep learning and NLP, striving to develop efficient solutions for complex language challenges. I'm also dedicated to advancing NLP for Spanish and co-official languages in Spain, working on initiatives that bridge linguistic and technological gaps.

Over the years, I've been an active participant in shared tasks across a variety of NLP domains. I was part of the winning teams in several competitions, including TASS 2017 to 2020, IroSVA, COSET, and SemEval 2024 Task 8. I’ve also achieved strong results in other SemEval challenges, including 2017, 2018 (1), 2018 (2), and 2019.

In 2023, I began organizing a series of shared tasks at the Iberian Languages Evaluation Forum (IberLEF) focused on machine-generated text detection and attribution, including AuTexTification, IberAuTexTification, and MIMIC. I was also on the program committee for the GenAI content detection task at COLING 2025, and I am one of the three organizers of IberLEF from 2025 to 2027.

Outside of research, I’m passionate about teaching. I currently teach courses on information retrieval, intelligent agents, and programming at Universidad Europea, as well as advanced machine learning techniques in the Master’s in Big Data program at Universidad de Barcelona. I am also a recurring invited speaker at the UPV's Master's in Artificial Intelligence, where I give a talk on language modeling and embeddings.

I'm also proud to share that my PhD thesis was awarded cum laude and received the best NLP thesis award from the Spanish Society for Natural Language Processing.

Works 👨🏻‍🔧

Here is some of my work with public source code, along with a few publications, from these years:

Work	Repo	Paper	Journal/Conference
BERT for tweets before HuggingFace's era	Link	Link	Neurocomputing
Hierarchical attention-based models for summarization	Link, Link	Link, Link, Link	Intelligent & Fuzzy Systems, IberSpeech 2022
Spanish and Catalan datasets for summarization	Link	Link	NAACL
Source summary entity aggregations in abstractive summarization	Link	Link	COLING
Transformer-based contextualization for irony detection	Link	Link	Information Processing & Management
LLMixtic, winning system at SemEval 2024 Task 8	Link	Link	Proceedings of SemEval 2024
TextMachina, a framework to build MGT datasets	Link	Link	KES 2024
Text & Multimodal machine-generated content detection & attribution	Link, Link, Link	Link, Link	SEPLN
IberBench, a benchmark of LLMs in Iberian languages	Link	Link	Computer Speech & Language
Copy mechanism for Transformers	Link	N/A	N/A
MinGRU implementation	Link	N/A	N/A
Tuning LLMs by Proxy implementation	Link	N/A	N/A
Implementation of Group Relative Policy Optimization from DeepSeek-R1-Zero	Link, Link	N/A	N/A