
SCBM

Official repository for the paper "Distilling Knowledge from Large Language Models: A Concept Bottleneck Model for Hate and Counter Speech Recognition", in which we propose Speech Concept Bottleneck Models (SCBMs), a novel approach for automated hate and counter speech recognition.

✨ SCBM brings interpretability to hate and counter speech recognition by routing decisions through human-readable adjective concepts.

🔗 Paper: https://www.sciencedirect.com/science/article/pii/S030645732500250X


Table of Contents

  1. Quickstart
  2. Model Architecture
  3. Results
  4. Training & Evaluation
  5. SCBM Representation Computation
  6. Training and Evaluation of SCBM and SCBM-T
  7. Zero-shot Evaluation on GPT Family
  8. Citation

Quickstart

Follow these steps to reproduce the main pipeline end-to-end.

  1. Install dependencies:
pip install -r requirements.txt
  2. Compute SCBM features with LLaMA (writes a .pickle next to each CSV):
export HF_USER=your-username   # bash/zsh; in fish: set -x HF_USER your-username
export HF_TOKEN=your-token
python Llama/main.py \
  --input_files ./Tasks/germeval/train.csv \
  --use_context false \
  --adjectives_file ./AdjectiveSetGeneration/adjectives.csv
  3. Train SCBM variants:
# SCBM (HS_CS)
python "SCBM(T)/SCBM.py" \
  --train_file_name ./Tasks/hs_cs/train.csv \
  --test_file_name ./Tasks/hs_cs/test.csv \
  --use_regularization false \
  --output_dir ./SCBM(T)

Model Architecture

SCBM is designed for hate and counter speech recognition by integrating human-interpretable adjective-based concepts as a bottleneck layer between input text and classification.

(Figure: overview of the SCBM architecture with the adjective-based concept bottleneck between input text and classification.)

SCBM leverages adjective-based representations as semantically meaningful bottleneck concepts derived probabilistically from LLMs, then classifies texts via a transparent, lightweight classifier that learns to prioritize key adjectives. This yields competitive hate and counter speech recognition performance with strong interpretability compared to black-box transformer models.
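
To make the bottleneck concrete, below is a minimal sketch of such a transparent classifier in PyTorch. It is an illustration under our own assumptions (a single linear layer over the adjective probabilities; the class name and sizes are invented), not the actual implementation in SCBM(T)/SCBM.py.

# Minimal sketch: linear classifier over adjective probabilities.
# Illustrative only; not the repo's implementation.
import torch
import torch.nn as nn

class AdjectiveBottleneckClassifier(nn.Module):
    def __init__(self, num_adjectives: int, num_classes: int):
        super().__init__()
        # One weight per (adjective, class) pair keeps the model transparent:
        # each logit is a weighted sum of adjective probabilities.
        self.linear = nn.Linear(num_adjectives, num_classes)

    def forward(self, adjective_probs: torch.Tensor) -> torch.Tensor:
        # adjective_probs: (batch, num_adjectives), values in [0, 1]
        return self.linear(adjective_probs)

model = AdjectiveBottleneckClassifier(num_adjectives=200, num_classes=2)
logits = model(torch.rand(4, 200))  # dummy batch of 4 texts

Because each logit is a weighted sum of adjective probabilities, the learned weights directly indicate which adjectives drive each class.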

Results

We summarize quantitative performance across datasets and show qualitative explanation examples.

Overall performance

Performance of all explored approaches in our paper across all employed datasets in terms of macro-$F_1$ score. The best-performing approach in each category is highlighted in italics, and the best-performing approach per dataset is highlighted in bold.

|     | Method | GermEval | ELF22 | HS-CS | CONAN | TSNH |
|-----|--------|----------|-------|-------|-------|------|
|     | Random | 0.488 | 0.515 | 0.347 | 0.109 | 0.503 |
| I   | SVM | *0.648±0.000* | 0.553±0.000 | *0.426±0.000* | 0.364±0.000 | *0.696±0.007* |
| I   | LR | 0.586±0.000 | *0.556±0.000* | 0.413±0.000 | 0.322±0.000 | 0.693±0.007 |
| I   | RF | 0.535±0.009 | 0.531±0.027 | 0.323±0.014 | 0.259±0.005 | 0.689±0.005 |
| I   | GB | 0.571±0.002 | 0.547±0.027 | 0.374±0.008 | 0.368±0.005 | 0.668±0.008 |
| I   | MLP | *0.648±0.003* | 0.542±0.010 | 0.398±0.003 | *0.386±0.011* | 0.672±0.005 |
| II  | SVM (Llama 2) | 0.695±0.029 | 0.356±0.000 | 0.504±0.000 | 0.593±0.000 | 0.637±0.090 |
| II  | SVM (Llama 3.1) | *0.779±0.000* | 0.669±0.000 | *0.577±0.000* | 0.602±0.000 | 0.724±0.010 |
| II  | LR (Llama 2) | 0.693±0.029 | 0.356±0.000 | 0.504±0.000 | 0.593±0.000 | 0.646±0.093 |
| II  | LR (Llama 3.1) | 0.777±0.000 | *0.671±0.000* | *0.577±0.000* | 0.602±0.000 | 0.723±0.009 |
| II  | RF (Llama 2) | 0.689±0.028 | 0.646±0.010 | 0.466±0.012 | 0.394±0.012 | 0.604±0.010 |
| II  | RF (Llama 3.1) | 0.757±0.004 | *0.671±0.003* | 0.487±0.009 | 0.486±0.012 | 0.719±0.005 |
| II  | GB (Llama 2) | 0.729±0.019 | 0.561±0.000 | 0.500±0.001 | 0.481±0.000 | 0.642±0.092 |
| II  | GB (Llama 3.1) | 0.766±0.000 | 0.577±0.001 | 0.562±0.002 | 0.534±0.002 | 0.721±0.006 |
| II  | MLP (Llama 2) | 0.743±0.017 | 0.396±0.079 | 0.481±0.011 | *0.627±0.011* | 0.640±0.096 |
| II  | MLP (Llama 3.1) | 0.762±0.018 | 0.654±0.017 | 0.556±0.014 | 0.618±0.018 | *0.728±0.007* |
| III | XLM-RoBERTa-base | 0.747±0.017 | 0.645±0.018 | 0.524±0.008 | 0.729±0.016 | 0.747±0.013 |
| III | BERT-base | 0.654±0.040 | 0.670±0.008 | 0.543±0.004 | 0.721±0.022 | 0.752±0.022 |
| III | XLM-RoBERTa-large | *0.786±0.004* | 0.680±0.008 | *0.572±0.021* | *0.746±0.020* | ***0.781±0.009*** |
| III | BERT-large | 0.676±0.014 | *0.683±0.009* | 0.545±0.011 | 0.744±0.007 | 0.773±0.008 |
| IV  | GPT 3.5 | 0.686±0.003 | 0.469±0.078 | 0.247±0.012 | 0.291±0.067 | 0.508±0.022 |
| IV  | GPT 4o | 0.833±0.025 | 0.500±0.039 | 0.267±0.014 | 0.361±0.140 | 0.560±0.017 |
| IV  | GPT 4o (ICL) | ***0.854±0.002*** | *0.651±0.005* | *0.390±0.006* | ***0.763±0.007*** | *0.642±0.026* |
| IV  | GPT o3-mini (CoT) | 0.666±0.165 | 0.606±0.004 | 0.301±0.008 | 0.542±0.012 | 0.503±0.009 |
| IV  | Llama 3.1 | 0.700±0.112 | 0.510±0.013 | 0.270±0.018 | 0.203±0.017 | 0.438±0.081 |
| V   | HSCBM (Llama 2) | 0.746±0.004 | 0.673±0.007 | 0.536±0.005 | 0.616±0.011 | 0.705±0.013 |
| V   | HSCBM (Llama 3.1) | *0.781±0.003* | ***0.693±0.011*** | ***0.581±0.008*** | 0.630±0.006 | 0.739±0.008 |
| V   | HSCBM-R (Llama 2) | 0.745±0.002 | 0.638±0.027 | 0.523±0.004 | 0.611±0.011 | 0.705±0.009 |
| V   | HSCBM-R (Llama 3.1) | 0.779±0.002 | 0.683±0.006 | 0.574±0.010 | 0.610±0.008 | 0.735±0.008 |
| V   | HSCBMT (Llama 2) | 0.766±0.004 | 0.658±0.008 | 0.542±0.016 | *0.723±0.016* | 0.709±0.104 |
| V   | HSCBMT (Llama 3.1) | 0.768±0.009 | 0.685±0.012 | 0.551±0.013 | 0.714±0.016 | *0.763±0.011* |
| V   | HSCBMT-R (Llama 2) | 0.757±0.009 | 0.637±0.003 | 0.526±0.023 | 0.710±0.013 | 0.710±0.107 |
| V   | HSCBMT-R (Llama 3.1) | 0.769±0.008 | 0.666±0.012 | 0.540±0.011 | 0.710±0.009 | 0.760±0.006 |

Example explanations (HS-CS)

Top-10 most relevant adjectives for individual input samples from each class of the HS-CS dataset provided by SCBM. For comparison, we provide LIME explanations for the same samples generated from the fine-tuned XLM-RoBERTa model.

(Figure: top-10 adjective relevances from SCBM and LIME explanations from the fine-tuned XLM-RoBERTa model, for one sample per HS-CS class.)
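
The following hypothetical sketch shows how such top-10 rankings can be derived from a linear bottleneck classifier like the one sketched above: the contribution of an adjective is its activation times its class weight. Function and variable names are illustrative, not the repo's API.

# Hypothetical: rank adjectives by contribution for one sample.
import numpy as np

def top_k_adjectives(weights, probs, adjectives, predicted_class, k=10):
    # weights: (num_classes, num_adjectives) from a trained linear layer
    # probs: (num_adjectives,) adjective probabilities for one sample
    contributions = weights[predicted_class] * probs
    top = np.argsort(contributions)[::-1][:k]
    return [(adjectives[i], float(contributions[i])) for i in top]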

Training & Evaluation

Transformer baselines are provided in Transformers_baseline/ and operate over the CSVs under Tasks/.

  • Baselines (train/dev split): run_transformers.py
  • Baselines (5-fold CV, e.g., TSNH): run_transformers-crossval.py

Examples:

# ELF22 split
python Transformers_baseline/run_transformers.py \
  --train_file ./Tasks/elf22/train.csv \
  --dev_file ./Tasks/elf22/test.csv \
  --output_file ./Transformers_baseline/elf22_baselines.pickle

# TSNH cross-validation
python Transformers_baseline/run_transformers-crossval.py \
  --train_file ./Tasks/tsnh/TSNH_uniform.csv \
  --output_file ./Transformers_baseline/tsnh_cv_baselines.pickle
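
For orientation, here is a minimal sketch of what fine-tuning such a baseline involves, using standard Hugging Face APIs. The column names ("text", "label") and hyperparameters are assumptions for illustration; run_transformers.py remains the authoritative script.

# Sketch of fine-tuning a baseline; column names and hyperparameters are
# illustrative assumptions, not those of run_transformers.py.
import pandas as pd
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=2)

train = Dataset.from_pandas(pd.read_csv("./Tasks/elf22/train.csv"))
train = train.map(
    lambda batch: tokenizer(batch["text"], truncation=True,
                            padding="max_length", max_length=128),
    batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="./out", num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=train,
)
trainer.train()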

SCBM Representation Computation

Use Llama/main.py to compute SCBM adjective-probability representations with Llama-3.1-8B-Instruct. The script reads one or more CSVs from Tasks/ and writes a sibling .pickle with id, values (probability vectors), and text (for no-context runs).

Environment variables (the first run downloads the model):

export HF_USER=your-username   # bash/zsh; in fish: set -x HF_USER your-username
export HF_TOKEN=your-token

Examples:

# No-context (e.g., GermEval)
python Llama/main.py \
  --input_files ./Tasks/germeval/test.csv \
  --use_context false \
  --adjectives_file ./AdjectiveSetGeneration/adjectives.csv \
  --repository meta-llama/Llama-3.1-8B-Instruct \
  --batch_size 244

# Context (e.g., HS_CS)
python Llama/main.py \
  --input_files "[\"./Tasks/hs_cs/train.csv\", \"./Tasks/hs_cs/test.csv\"]" \
  --use_context true \
  --adjectives_file ./AdjectiveSetGeneration/adjectives.csv
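
As background on what this step computes: one plausible way to derive an adjective probability from an instruction-tuned LLM is to pose a yes/no question and compare the next-token logits of "yes" and "no". The sketch below illustrates this idea; the actual prompt template and scoring in Llama/main.py may differ.

# Hypothetical sketch of scoring one (text, adjective) pair; the real
# prompt and scoring in Llama/main.py may differ.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "meta-llama/Llama-3.1-8B-Instruct"
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16)

def adjective_prob(text: str, adjective: str) -> float:
    prompt = (f"Is the following text {adjective}? Answer yes or no.\n"
              f"Text: {text}\nAnswer:")
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]  # next-token logits
    yes_id = tok.encode(" yes", add_special_tokens=False)[0]
    no_id = tok.encode(" no", add_special_tokens=False)[0]
    pair = torch.stack([logits[yes_id], logits[no_id]])
    return torch.softmax(pair, dim=0)[0].item()  # P("yes" | prompt)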

Training and Evaluation of SCBM and SCBM-T

SCBM variants live in SCBM(T)/:

  • SCBM.py: classifier over adjective-probability features
  • SCBMT.py: text + features fusion variant

These scripts expect the .pickle feature files created by the Llama step (same basename as the CSV, e.g., train.csv.pickle).
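
To sanity-check a feature file before training, it can be loaded directly. This assumes a dict-like object with the fields described in the previous section (id, values, text); the exact container may differ.

# Quick inspection of a computed feature file.
import pickle

with open("./Tasks/hs_cs/train.csv.pickle", "rb") as f:
    features = pickle.load(f)

print(features["id"][:3])          # sample ids
print(len(features["values"][0]))  # vector length = number of adjectives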

Examples:

# SCBM (features only)
python "SCBM(T)/SCBM.py" \
  --train_file_name ./Tasks/hs_cs/train.csv \
  --test_file_name ./Tasks/hs_cs/test.csv \
  --use_regularization false \
  --output_dir ./SCBM(T)

# SCBM-T (text + features)
python "SCBM(T)/SCBMT.py" \
  --train_file_name ./Tasks/hs_cs/train.csv \
  --test_file_name ./Tasks/hs_cs/test.csv \
  --use_regularization false \
  --output_dir ./SCBM(T)

Zero-shot Evaluation on GPT Family

Zero-shot baselines live in zero-shot-evaluation/ and support both OpenAI Chat Completions and local LLaMA-3.1.

Scripts

  • openai-zero-shot.py: Uses OpenAI Chat Completions. The model is configurable (e.g., gpt-3.5-turbo, chatgpt-4o-latest); a minimal call is sketched below.
  • llama-zero-shot.py: Uses a local pipeline for meta-llama/Llama-3.1-8B-Instruct.
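
For reference, here is a minimal sketch of a zero-shot classification call with the OpenAI client; the actual prompt and label set in openai-zero-shot.py may differ.

# Hedged sketch of a zero-shot call; prompt and labels are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def classify(text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": ("Classify the text as hate speech, counter speech, "
                         "or neither. Reply with the label only.")},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content.strip()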

Citation

If you use this repository in your research, please cite:

@article{distilling-scbm,
  title   = {Distilling knowledge from large language models: A concept bottleneck model for hate and counter speech recognition},
  journal = {Information Processing \& Management},
  volume  = {63},
  number  = {2, Part A},
  pages   = {104309},
  year    = {2026},
  issn    = {0306-4573},
  doi     = {10.1016/j.ipm.2025.104309},
  url     = {https://www.sciencedirect.com/science/article/pii/S030645732500250X},
  author  = {Roberto Labadie-Tamayo and Djordje Slijepčević and Xihui Chen and Adrian Jaques Böck and Andreas Babic and Liz Freimann and Christiane Atzmüller and Matthias Zeppelzauer},
}
