Official repository for the paper "Distilling Knowledge from Large Language Models: A Concept Bottleneck Model for Hate and Counter Speech Recognition", in which we propose Speech Concept Bottleneck Models (SCBMs), a novel approach for automated hate and counter speech recognition.
✨ SCBM brings interpretability to hate and counter speech recognition by routing decisions through human-readable adjective concepts.
🔗 Paper: https://www.sciencedirect.com/science/article/pii/S030645732500250X
- Adjective tools
- ✨ Adjective generation: AdjectiveSetGeneration
- 📖 Adjective definitions: AdjectiveDefinition
- SCBM representations and models
- 🦙 LLaMA-based feature extraction: Llama
- 🎯 SCBM and SCBM-T training: SCBM(T)
- 🧪 Prompt/persona sensitivity: prompt-sensitibity
- Baselines and zero-shot
- ⚡ Transformer baselines: Transformers_baseline
- 🔎 Zero-shot (OpenAI/LLaMA): zero-shot-evaluation
- 🧩 ICL & CoT experiments: ICL & CoT experiments
- 🗂️ Datasets overview: Tasks
- Quickstart
- Model Architecture
- Results
- Training & Evaluation
- SCBM Representation Computation
- Training and Evaluation of SCBM and SCBM-T
- Zero-shot Evaluation on GPT Family
- Citation
Follow these steps to reproduce the main pipeline end-to-end.
- Install dependencies:

```bash
pip install -r requirements.txt
```

- Compute SCBM features with LLaMA (writes .pickle next to CSV):

```bash
# fish shell syntax; in bash use: export HF_USER=your-username
set -x HF_USER your-username
set -x HF_TOKEN your-token

python Llama/main.py \
  --input_files ./Tasks/germeval/train.csv \
  --use_context false \
  --adjectives_file ./AdjectiveSetGeneration/adjectives.csv
```

- Train SCBM variants:

```bash
# SCBM (HS_CS)
python "SCBM(T)/SCBM.py" \
  --train_file_name ./Tasks/hs_cs/train.csv \
  --test_file_name ./Tasks/hs_cs/test.csv \
  --use_regularization false \
  --output_dir ./SCBM(T)
```
SCBM is designed for hate and counter speech recognition by integrating human-interpretable adjective-based concepts as a bottleneck layer between input text and classification.
SCBM leverages adjective-based representations as semantically meaningful bottleneck concepts, derived probabilistically from LLMs, and then classifies texts via a transparent, lightweight classifier that learns to prioritize key adjectives. This yields competitive hate and counter speech recognition performance with strong interpretability compared to black-box transformer models.
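At inference time, the bottleneck reduces to a linear map over the adjective-probability vector followed by a softmax; a minimal NumPy sketch (hypothetical function and variable names, not the repository's implementation):

```python
import numpy as np

def scbm_forward(adjective_probs, W, b):
    """Classify one text from its adjective-probability bottleneck vector.

    adjective_probs: (n_adjectives,) LLM-derived concept scores in [0, 1]
    W: (n_classes, n_adjectives) per-adjective weights (the interpretable part)
    b: (n_classes,) bias
    Returns class probabilities.
    """
    logits = W @ adjective_probs + b
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()

def top_adjectives(W, class_idx, adjective_names, k=10):
    """Rank adjectives by weight magnitude for one class (a simple explanation)."""
    order = np.argsort(-np.abs(W[class_idx]))[:k]
    return [adjective_names[i] for i in order]
```

Because the classifier operates directly on named concepts, inspecting the weight matrix (as in `top_adjectives`) is what makes per-class explanations possible.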
We summarize quantitative performance across datasets and show qualitative explanation examples.
Performance of all explored approaches in our paper across all employed datasets in terms of macro-$F_1$ score. The best-performing approach in each category is highlighted in italics, and the best-performing approach per dataset is highlighted in bold.
| Group | Method | LLM | GermEval | ELF22 | HS-CS | CONAN | TSNH |
|---|---|---|---|---|---|---|---|
| | Random | | 0.488 | 0.515 | 0.347 | 0.109 | 0.503 |
| I | SVM | | 0.648±0.000 | 0.553±0.000 | 0.426±0.000 | 0.364±0.000 | 0.696±0.007 |
| | LR | | 0.586±0.000 | 0.556±0.000 | 0.413±0.000 | 0.322±0.000 | 0.693±0.007 |
| | RF | | 0.535±0.009 | 0.531±0.027 | 0.323±0.014 | 0.259±0.005 | 0.689±0.005 |
| | GB | | 0.571±0.002 | 0.547±0.027 | 0.374±0.008 | 0.368±0.005 | 0.668±0.008 |
| | MLP | | 0.648±0.003 | 0.542±0.010 | 0.398±0.003 | 0.386±0.011 | 0.672±0.005 |
| II | SVM | Llama 2 | 0.695±0.029 | 0.356±0.000 | 0.504±0.000 | 0.593±0.000 | 0.637±0.090 |
| | | Llama 3.1 | 0.779±0.000 | 0.669±0.000 | 0.577±0.000 | 0.602±0.000 | 0.724±0.010 |
| | LR | Llama 2 | 0.693±0.029 | 0.356±0.000 | 0.504±0.000 | 0.593±0.000 | 0.646±0.093 |
| | | Llama 3.1 | 0.777±0.000 | 0.671±0.000 | 0.577±0.000 | 0.602±0.000 | 0.723±0.009 |
| | RF | Llama 2 | 0.689±0.028 | 0.646±0.010 | 0.466±0.012 | 0.394±0.012 | 0.604±0.010 |
| | | Llama 3.1 | 0.757±0.004 | 0.671±0.003 | 0.487±0.009 | 0.486±0.012 | 0.719±0.005 |
| | GB | Llama 2 | 0.729±0.019 | 0.561±0.000 | 0.500±0.001 | 0.481±0.000 | 0.642±0.092 |
| | | Llama 3.1 | 0.766±0.000 | 0.577±0.001 | 0.562±0.002 | 0.534±0.002 | 0.721±0.006 |
| | MLP | Llama 2 | 0.743±0.017 | 0.396±0.079 | 0.481±0.011 | 0.627±0.011 | 0.640±0.096 |
| | | Llama 3.1 | 0.762±0.018 | 0.654±0.017 | 0.556±0.014 | 0.618±0.018 | 0.728±0.007 |
| III | XLM-RoBERTa-base | | 0.747±0.017 | 0.645±0.018 | 0.524±0.008 | 0.729±0.016 | 0.747±0.013 |
| | BERT-base | | 0.654±0.040 | 0.670±0.008 | 0.543±0.004 | 0.721±0.022 | 0.752±0.022 |
| | XLM-RoBERTa-large | | 0.786±0.004 | 0.680±0.008 | 0.572±0.021 | 0.746±0.020 | 0.781±0.009 |
| | BERT-large | | 0.676±0.014 | 0.683±0.009 | 0.545±0.011 | 0.744±0.007 | 0.773±0.008 |
| IV | GPT 3.5 | | 0.686±0.003 | 0.469±0.078 | 0.247±0.012 | 0.291±0.067 | 0.508±0.022 |
| | GPT 4o | | 0.833±0.025 | 0.500±0.039 | 0.267±0.014 | 0.361±0.140 | 0.560±0.017 |
| | GPT 4o (ICL) | | 0.854±0.002 | 0.651±0.005 | 0.390±0.006 | 0.763±0.007 | 0.642±0.026 |
| | GPT o3-mini (CoT) | | 0.666±0.165 | 0.606±0.004 | 0.301±0.008 | 0.542±0.012 | 0.503±0.009 |
| | Llama 3.1 | | 0.700±0.112 | 0.510±0.013 | 0.270±0.018 | 0.203±0.017 | 0.438±0.081 |
| V | HSCBM | Llama 2 | 0.746±0.004 | 0.673±0.007 | 0.536±0.005 | 0.616±0.011 | 0.705±0.013 |
| | | Llama 3.1 | 0.781±0.003 | 0.693±0.011 | 0.581±0.008 | 0.630±0.006 | 0.739±0.008 |
| | HSCBM-R | Llama 2 | 0.745±0.002 | 0.638±0.027 | 0.523±0.004 | 0.611±0.011 | 0.705±0.009 |
| | | Llama 3.1 | 0.779±0.002 | 0.683±0.006 | 0.574±0.010 | 0.610±0.008 | 0.735±0.008 |
| | HSCBMT | Llama 2 | 0.766±0.004 | 0.658±0.008 | 0.542±0.016 | 0.723±0.016 | 0.709±0.104 |
| | | Llama 3.1 | 0.768±0.009 | 0.685±0.012 | 0.551±0.013 | 0.714±0.016 | 0.763±0.011 |
| | HSCBMT-R | Llama 2 | 0.757±0.009 | 0.637±0.003 | 0.526±0.023 | 0.710±0.013 | 0.710±0.107 |
| | | Llama 3.1 | 0.769±0.008 | 0.666±0.012 | 0.540±0.011 | 0.710±0.009 | 0.760±0.006 |
Top-10 most relevant adjectives for individual input samples from each class of the HS-CS dataset provided by SCBM. For comparison, we provide LIME explanations for the same samples generated from the fine-tuned XLM-RoBERTa model.
Transformer baselines are provided in `Transformers_baseline/` and operate over the CSVs under `Tasks/`.
- Baselines (train/dev split): `run_transformers.py`
- Baselines (5-fold CV, e.g., TSNH): `run_transformers-crossval.py`
Examples:

```bash
# ELF22 split
python Transformers_baseline/run_transformers.py \
  --train_file ./Tasks/elf22/train.csv \
  --dev_file ./Tasks/elf22/test.csv \
  --output_file ./Transformers_baseline/elf22_baselines.pickle

# TSNH cross-validation
python Transformers_baseline/run_transformers-crossval.py \
  --train_file ./Tasks/tsnh/TSNH_uniform.csv \
  --output_file ./Transformers_baseline/tsnh_cv_baselines.pickle
```
Use `Llama/main.py` to compute SCBM adjective-probability representations with Llama-3.1-8B-Instruct. The script reads one or more CSVs from `Tasks/` and writes a sibling `.pickle` with `id`, `values` (probability vectors), and `text` (for no-context runs).
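Assuming each feature file is a pickled dict keyed exactly by those three fields (an assumption based on the description above), a small round-trip sketch of the layout:

```python
import os
import pickle
import tempfile

# Hypothetical contents of a feature file, mirroring the fields described above:
# "id", "values" (one adjective-probability vector per text), and "text".
features = {
    "id": [0, 1],
    "values": [[0.91, 0.05, 0.40], [0.10, 0.80, 0.33]],
    "text": ["example hate comment", "example counter speech"],
}

path = os.path.join(tempfile.mkdtemp(), "train.csv.pickle")
with open(path, "wb") as f:
    pickle.dump(features, f)

with open(path, "rb") as f:
    loaded = pickle.load(f)

print(sorted(loaded))            # ['id', 'text', 'values']
print(len(loaded["values"][0]))  # adjective-vector length (3 in this toy example)
```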
Environment (first run clones the model):

```bash
set -x HF_USER your-username
set -x HF_TOKEN your-token
```
Examples:

```bash
# No-context (e.g., GermEval)
python Llama/main.py \
  --input_files ./Tasks/germeval/test.csv \
  --use_context false \
  --adjectives_file ./AdjectiveSetGeneration/adjectives.csv \
  --repository meta-llama/Llama-3.1-8B-Instruct \
  --batch_size 244

# Context (e.g., HS_CS)
python Llama/main.py \
  --input_files "[\"./Tasks/hs_cs/train.csv\", \"./Tasks/hs_cs/test.csv\"]" \
  --use_context true \
  --adjectives_file ./AdjectiveSetGeneration/adjectives.csv
```
SCBM variants live in `SCBM(T)/`:

- `SCBM.py`: classifier over adjective-probability features
- `SCBMT.py`: text + features fusion variant

These scripts expect the `.pickle` feature files created by the Llama step (same basename as the CSV, e.g., `train.csv.pickle`).
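A tiny helper (hypothetical, not part of the repository) to check that the expected sibling feature files exist before launching training:

```python
from pathlib import Path

def feature_file_for(csv_path: str) -> Path:
    # Same basename with ".pickle" appended: train.csv -> train.csv.pickle
    return Path(csv_path + ".pickle")

def missing_features(csv_paths):
    """Return the CSVs whose feature file is absent, i.e. the Llama step still needs to run."""
    return [p for p in csv_paths if not feature_file_for(p).exists()]
```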
Examples:

```bash
# SCBM (features only)
python "SCBM(T)/SCBM.py" \
  --train_file_name ./Tasks/hs_cs/train.csv \
  --test_file_name ./Tasks/hs_cs/test.csv \
  --use_regularization false \
  --output_dir ./SCBM(T)

# SCBM-T (text + features)
python "SCBM(T)/SCBMT.py" \
  --train_file_name ./Tasks/hs_cs/train.csv \
  --test_file_name ./Tasks/hs_cs/test.csv \
  --use_regularization false \
  --output_dir ./SCBM(T)
```
Zero-shot baselines live in `zero-shot-evaluation/` and support both OpenAI Chat Completions and local LLaMA-3.1.

Scripts:

- `openai-zero-shot.py`: uses OpenAI Chat Completions. Model is configurable (e.g., `gpt-3.5-turbo`, `chatgpt-4o-latest`).
- `llama-zero-shot.py`: uses a local pipeline for `meta-llama/Llama-3.1-8B-Instruct`.
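Zero-shot classification over Chat Completions amounts to a system prompt listing the candidate labels plus the text to classify. A sketch of such a message builder (illustrative wording, not the repository's exact prompt):

```python
def build_zero_shot_messages(text: str, labels: list) -> list:
    """Build a Chat Completions message list for zero-shot label prediction.

    The prompt wording here is illustrative only; see openai-zero-shot.py
    for the prompts actually used in the experiments.
    """
    system = (
        "You are a content-moderation classifier. "
        f"Answer with exactly one label from: {', '.join(labels)}."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": f"Text: {text}\nLabel:"},
    ]
```

The returned list can be passed as `messages` to `client.chat.completions.create(...)` with whichever model is configured.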
If you use this repository in your research, please cite:
```bibtex
@article{distilling-scbm,
  title   = {Distilling knowledge from large language models: A concept bottleneck model for hate and counter speech recognition},
  journal = {Information Processing & Management},
  volume  = {63},
  number  = {2, Part A},
  pages   = {104309},
  year    = {2026},
  issn    = {0306-4573},
  doi     = {10.1016/j.ipm.2025.104309},
  url     = {https://www.sciencedirect.com/science/article/pii/S030645732500250X},
  author  = {Roberto Labadie-Tamayo and Djordje Slijepčević and Xihui Chen and Adrian Jaques Böck and Andreas Babic and Liz Freimann and Christiane Atzmüller and Matthias Zeppelzauer},
}
```