Official repository for the paper "Distilling Knowledge from Large Language Models: A Concept Bottleneck Model for Hate and Counter Speech Recognition", in which we propose Speech Concept Bottleneck Models (SCBMs), a novel approach for automated hate and counter speech recognition.
✨ SCBM brings interpretability to hate and counter speech recognition by routing decisions through human-readable adjective concepts.
🔗 Paper: https://www.sciencedirect.com/science/article/pii/S030645732500250X
- Adjective tools
- ✨ Adjective generation: AdjectiveSetGeneration
- 📖 Adjective definitions: AdjectiveDefinition
- SCBM representations and models
- 🦙 LLaMA-based feature extraction: Llama
- 🎯 SCBM and SCBM-T training: SCBM(T)
- 🧪 Prompt/persona sensitivity: prompt-sensitibity
- Baselines and zero-shot
- ⚡ Transformer baselines: Transformers_baseline
- 🔎 Zero-shot (OpenAI/LLaMA): zero-shot-evaluation
- 🧩 ICL & CoT experiments: ICL & CoT experiments
- 🗂️ Datasets overview: Tasks
- Quickstart
- Model Architecture
- Results
- Training & Evaluation
- SCBM Representation Computation
- Training and Evaluation of SCBM and SCBM-T
- Zero-shot Evaluation on GPT Family
- Citation
Follow these steps to reproduce the main pipeline end-to-end.
- Install dependencies:

```bash
pip install -r requirements.txt
```

- Compute SCBM features with LLaMA (writes .pickle next to CSV):

```bash
# fish shell syntax; in bash use: export HF_USER=your-username
set -x HF_USER your-username
set -x HF_TOKEN your-token

python Llama/main.py \
  --input_files ./Tasks/germeval/train.csv \
  --use_context false \
  --adjectives_file ./AdjectiveSetGeneration/adjectives.csv
```

- Train SCBM variants:

```bash
# SCBM (HS_CS)
python "SCBM(T)/SCBM.py" \
  --train_file_name ./Tasks/hs_cs/train.csv \
  --test_file_name ./Tasks/hs_cs/test.csv \
  --use_regularization false \
  --output_dir ./SCBM(T)
```
SCBM is designed for hate and counter speech recognition by integrating human-interpretable adjective-based concepts as a bottleneck layer between input text and classification.
SCBM leverages adjective-based representations as semantically meaningful bottleneck concepts, derived probabilistically from LLMs, and then classifies texts via a transparent, lightweight classifier that learns to prioritize key adjectives. This yields competitive hate and counter speech recognition performance with strong interpretability compared to black-box transformer models.
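At inference time, the bottleneck reduces to a linear map over the adjective-probability vector followed by a softmax; a minimal NumPy sketch (hypothetical function and variable names, not the repository's implementation):

```python
import numpy as np

def scbm_forward(adjective_probs, W, b):
    """Classify one text from its adjective-probability bottleneck vector.

    adjective_probs: (n_adjectives,) LLM-derived concept scores in [0, 1]
    W: (n_classes, n_adjectives) per-adjective weights (the interpretable part)
    b: (n_classes,) bias
    Returns class probabilities.
    """
    logits = W @ adjective_probs + b
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()

def top_adjectives(W, class_idx, adjective_names, k=10):
    """Rank adjectives by weight magnitude for one class (a simple explanation)."""
    order = np.argsort(-np.abs(W[class_idx]))[:k]
    return [adjective_names[i] for i in order]
```

Because the classifier operates directly on named concepts, inspecting the weight matrix (as in `top_adjectives`) is what makes per-class explanations possible.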
We summarize quantitative performance across datasets and show qualitative explanation examples.
Performance of all explored approaches in our paper across all employed datasets in terms of macro-$F_1$ score. The best-performing approach in each category is highlighted in italics, and the best-performing approach per dataset is highlighted in bold.
| Group | Method | LLM | GermEval | ELF22 | HS-CS | CONAN | TSNH |
|---|---|---|---|---|---|---|---|
| | Random | | 0.488 | 0.515 | 0.347 | 0.109 | 0.503 |
| I | SVM | | 0.648±0.000 | 0.553±0.000 | 0.426±0.000 | 0.364±0.000 | 0.696±0.007 |
| | LR | | 0.586±0.000 | 0.556±0.000 | 0.413±0.000 | 0.322±0.000 | 0.693±0.007 |
| | RF | | 0.535±0.009 | 0.531±0.027 | 0.323±0.014 | 0.259±0.005 | 0.689±0.005 |
| | GB | | 0.571±0.002 | 0.547±0.027 | 0.374±0.008 | 0.368±0.005 | 0.668±0.008 |
| | MLP | | 0.648±0.003 | 0.542±0.010 | 0.398±0.003 | 0.386±0.011 | 0.672±0.005 |
| II | SVM | Llama 2 | 0.695±0.029 | 0.356±0.000 | 0.504±0.000 | 0.593±0.000 | 0.637±0.090 |
| | | Llama 3.1 | 0.779±0.000 | 0.669±0.000 | 0.577±0.000 | 0.602±0.000 | 0.724±0.010 |
| | LR | Llama 2 | 0.693±0.029 | 0.356±0.000 | 0.504±0.000 | 0.593±0.000 | 0.646±0.093 |
| | | Llama 3.1 | 0.777±0.000 | 0.671±0.000 | 0.577±0.000 | 0.602±0.000 | 0.723±0.009 |
| | RF | Llama 2 | 0.689±0.028 | 0.646±0.010 | 0.466±0.012 | 0.394±0.012 | 0.604±0.010 |
| | | Llama 3.1 | 0.757±0.004 | 0.671±0.003 | 0.487±0.009 | 0.486±0.012 | 0.719±0.005 |
| | GB | Llama 2 | 0.729±0.019 | 0.561±0.000 | 0.500±0.001 | 0.481±0.000 | 0.642±0.092 |
| | | Llama 3.1 | 0.766±0.000 | 0.577±0.001 | 0.562±0.002 | 0.534±0.002 | 0.721±0.006 |
| | MLP | Llama 2 | 0.743±0.017 | 0.396±0.079 | 0.481±0.011 | 0.627±0.011 | 0.640±0.096 |
| | | Llama 3.1 | 0.762±0.018 | 0.654±0.017 | 0.556±0.014 | 0.618±0.018 | 0.728±0.007 |
| III | XLM-RoBERTa-base | | 0.747±0.017 | 0.645±0.018 | 0.524±0.008 | 0.729±0.016 | 0.747±0.013 |
| | BERT-base | | 0.654±0.040 | 0.670±0.008 | 0.543±0.004 | 0.721±0.022 | 0.752±0.022 |
| | XLM-RoBERTa-large | | 0.786±0.004 | 0.680±0.008 | 0.572±0.021 | 0.746±0.020 | 0.781±0.009 |
| | BERT-large | | 0.676±0.014 | 0.683±0.009 | 0.545±0.011 | 0.744±0.007 | 0.773±0.008 |
| IV | GPT 3.5 | | 0.686±0.003 | 0.469±0.078 | 0.247±0.012 | 0.291±0.067 | 0.508±0.022 |
| | GPT 4o | | 0.833±0.025 | 0.500±0.039 | 0.267±0.014 | 0.361±0.140 | 0.560±0.017 |
| | GPT 4o (ICL) | | 0.854±0.002 | 0.651±0.005 | 0.390±0.006 | 0.763±0.007 | 0.642±0.026 |
| | GPT o3-mini (CoT) | | 0.666±0.165 | 0.606±0.004 | 0.301±0.008 | 0.542±0.012 | 0.503±0.009 |
| | Llama 3.1 | | 0.700±0.112 | 0.510±0.013 | 0.270±0.018 | 0.203±0.017 | 0.438±0.081 |
| V | HSCBM | Llama 2 | 0.746±0.004 | 0.673±0.007 | 0.536±0.005 | 0.616±0.011 | 0.705±0.013 |
| | | Llama 3.1 | 0.781±0.003 | 0.693±0.011 | 0.581±0.008 | 0.630±0.006 | 0.739±0.008 |
| | HSCBM-R | Llama 2 | 0.745±0.002 | 0.638±0.027 | 0.523±0.004 | 0.611±0.011 | 0.705±0.009 |
| | | Llama 3.1 | 0.779±0.002 | 0.683±0.006 | 0.574±0.010 | 0.610±0.008 | 0.735±0.008 |
| | HSCBMT | Llama 2 | 0.766±0.004 | 0.658±0.008 | 0.542±0.016 | 0.723±0.016 | 0.709±0.104 |
| | | Llama 3.1 | 0.768±0.009 | 0.685±0.012 | 0.551±0.013 | 0.714±0.016 | 0.763±0.011 |
| | HSCBMT-R | Llama 2 | 0.757±0.009 | 0.637±0.003 | 0.526±0.023 | 0.710±0.013 | 0.710±0.107 |
| | | Llama 3.1 | 0.769±0.008 | 0.666±0.012 | 0.540±0.011 | 0.710±0.009 | 0.760±0.006 |
Top-10 most relevant adjectives for individual input samples from each class of the HS-CS dataset provided by SCBM. For comparison, we provide LIME explanations for the same samples generated from the fine-tuned XLM-RoBERTa model.
Transformer baselines are provided in `Transformers_baseline/` and operate over the CSVs under `Tasks/`.
- Baselines (train/dev split): `run_transformers.py`
- Baselines (5-fold CV, e.g., TSNH): `run_transformers-crossval.py`
Examples:

```bash
# ELF22 split
python Transformers_baseline/run_transformers.py \
  --train_file ./Tasks/elf22/train.csv \
  --dev_file ./Tasks/elf22/test.csv \
  --output_file ./Transformers_baseline/elf22_baselines.pickle

# TSNH cross-validation
python Transformers_baseline/run_transformers-crossval.py \
  --train_file ./Tasks/tsnh/TSNH_uniform.csv \
  --output_file ./Transformers_baseline/tsnh_cv_baselines.pickle
```
Use `Llama/main.py` to compute SCBM adjective-probability representations with Llama-3.1-8B-Instruct. The script reads one or more CSVs from `Tasks/` and writes a sibling `.pickle` with `id`, `values` (probability vectors), and `text` (for no-context runs).
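Assuming each feature file is a pickled dict keyed exactly by those three fields (an assumption based on the description above), a small round-trip sketch of the layout:

```python
import os
import pickle
import tempfile

# Hypothetical contents of a feature file, mirroring the fields described above:
# "id", "values" (one adjective-probability vector per text), and "text".
features = {
    "id": [0, 1],
    "values": [[0.91, 0.05, 0.40], [0.10, 0.80, 0.33]],
    "text": ["example hate comment", "example counter speech"],
}

path = os.path.join(tempfile.mkdtemp(), "train.csv.pickle")
with open(path, "wb") as f:
    pickle.dump(features, f)

with open(path, "rb") as f:
    loaded = pickle.load(f)

print(sorted(loaded))            # ['id', 'text', 'values']
print(len(loaded["values"][0]))  # adjective-vector length (3 in this toy example)
```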
Environment (first run clones the model):

```bash
set -x HF_USER your-username
set -x HF_TOKEN your-token
```
Examples:

```bash
# No-context (e.g., GermEval)
python Llama/main.py \
  --input_files ./Tasks/germeval/test.csv \
  --use_context false \
  --adjectives_file ./AdjectiveSetGeneration/adjectives.csv \
  --repository meta-llama/Llama-3.1-8B-Instruct \
  --batch_size 244

# Context (e.g., HS_CS)
python Llama/main.py \
  --input_files "[\"./Tasks/hs_cs/train.csv\", \"./Tasks/hs_cs/test.csv\"]" \
  --use_context true \
  --adjectives_file ./AdjectiveSetGeneration/adjectives.csv
```
SCBM variants live in `SCBM(T)/`:

- `SCBM.py`: classifier over adjective-probability features
- `SCBMT.py`: text + features fusion variant

These scripts expect the `.pickle` feature files created by the Llama step (same basename as the CSV, e.g., `train.csv.pickle`).
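A tiny helper (hypothetical, not part of the repository) to check that the expected sibling feature files exist before launching training:

```python
from pathlib import Path

def feature_file_for(csv_path: str) -> Path:
    # Same basename with ".pickle" appended: train.csv -> train.csv.pickle
    return Path(csv_path + ".pickle")

def missing_features(csv_paths):
    """Return the CSVs whose feature file is absent, i.e. the Llama step still needs to run."""
    return [p for p in csv_paths if not feature_file_for(p).exists()]
```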
Examples:

```bash
# SCBM (features only)
python "SCBM(T)/SCBM.py" \
  --train_file_name ./Tasks/hs_cs/train.csv \
  --test_file_name ./Tasks/hs_cs/test.csv \
  --use_regularization false \
  --output_dir ./SCBM(T)

# SCBM-T (text + features)
python "SCBM(T)/SCBMT.py" \
  --train_file_name ./Tasks/hs_cs/train.csv \
  --test_file_name ./Tasks/hs_cs/test.csv \
  --use_regularization false \
  --output_dir ./SCBM(T)
```
Zero-shot baselines live in `zero-shot-evaluation/` and support both OpenAI Chat Completions and local LLaMA-3.1.

Scripts:

- `openai-zero-shot.py`: uses OpenAI Chat Completions. Model is configurable (e.g., `gpt-3.5-turbo`, `chatgpt-4o-latest`).
- `llama-zero-shot.py`: uses a local pipeline for `meta-llama/Llama-3.1-8B-Instruct`.
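Zero-shot classification over Chat Completions amounts to a system prompt listing the candidate labels plus the text to classify. A sketch of such a message builder (illustrative wording, not the repository's exact prompt):

```python
def build_zero_shot_messages(text: str, labels: list) -> list:
    """Build a Chat Completions message list for zero-shot label prediction.

    The prompt wording here is illustrative only; see openai-zero-shot.py
    for the prompts actually used in the experiments.
    """
    system = (
        "You are a content-moderation classifier. "
        f"Answer with exactly one label from: {', '.join(labels)}."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": f"Text: {text}\nLabel:"},
    ]
```

The returned list can be passed as `messages` to `client.chat.completions.create(...)` with whichever model is configured.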
If you use this repository in your research, please cite:
```bibtex
@article{distilling-scbm,
  title   = {Distilling knowledge from large language models: A concept bottleneck model for hate and counter speech recognition},
  journal = {Information Processing & Management},
  volume  = {63},
  number  = {2, Part A},
  pages   = {104309},
  year    = {2026},
  issn    = {0306-4573},
  doi     = {10.1016/j.ipm.2025.104309},
  url     = {https://www.sciencedirect.com/science/article/pii/S030645732500250X},
  author  = {Roberto Labadie-Tamayo and Djordje Slijepčević and Xihui Chen and Adrian Jaques Böck and Andreas Babic and Liz Freimann and Christiane Atzmüller and Matthias Zeppelzauer},
}
```