Skip to content

Natural Language Processing (NLP) analysis of climate-related corporate disclosures using ClimateBERT and fine-tuned models. πŸ“ŠπŸŒ

License

Notifications You must be signed in to change notification settings

villacampaporta/climateBERT-analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

19 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

🌍 Climate Disclosure Analysis with NLP

This repository contains the code and data for the paper:
"Impact of EU Non-Financial Reporting Regulation on Spanish Companies’ Environmental Disclosure: A Cutting-Edge Natural Language Processing Approach"
Published in Environmental Sciences Europe
πŸ“„ Read the Paper

πŸ“Œ Overview

This study investigates the impact of EU regulations on environmental disclosure quality among Spanish companies from 2015 to 2022. We leverage Natural Language Processing (NLP), specifically ClimateBERT and a fine-tuned version on ClimaText, to analyze 729 corporate sustainability reports.

The study reveals how mandatory disclosure requirements improve specificity and commitment in climate risk reporting, aligning companies with international sustainability frameworks.

πŸ— Repository Structure

climateBERT-analysis/
│── Climate_Disclosure_Analysis.ipynb  # Main Jupyter Notebook
│── data/  # Folder for input reports (not included)
│── results/  # Folder for generated outputs (e.g., graphs, tables)
│── requirements.txt  # Python dependencies
│── LICENSE  # Repository license
│── README.md  # This file
│── Paper.pdf  # Optional: Paper PDF
  • πŸ“ data/ – (Not included) Place corporate reports here before running the notebook.
  • πŸ“ results/ – Stores generated figures and tables from the analysis.
  • πŸ“„ Climate_Disclosure_Analysis.ipynb – The core analysis notebook.
  • πŸ“„ requirements.txt – Required dependencies for running the notebook.

πŸš€ Getting Started

1️⃣ Clone this repository

git clone https://github.com/villacampaporta/climateBERT-analysis.git
cd climateBERT-analysis

2️⃣ Install dependencies

Ensure you have Python 3.8+ installed. Then, install the required packages:

pip install -r requirements.txt

3️⃣ Prepare the data

  • Download corporate sustainability reports from:
  • Place PDF reports inside the data/ folder.

4️⃣ Run the Jupyter Notebook

Start Jupyter and open Climate_Disclosure_Analysis.ipynb:

jupyter notebook

Run the notebook step by step to reproduce the analysis.


βš™ Features & Methods

βœ… Natural Language Processing (NLP)

  • Transformers-based models:
  • Text extraction & preprocessing:
    • PyPDF2, pytesseract, wand, Pillow for PDF processing
    • nltk, transformers, scikit-learn for NLP

πŸ“Š Analysis Highlights

  • Comparing voluntary vs. mandatory disclosure (2015–2022)
  • Measuring commitment, specificity, and neutrality in reports
  • Assessing alignment with international sustainability standards

πŸ“Š Example Results

πŸ“Œ How has environmental disclosure changed under regulation?

Year Average Commitment Score Reports Analyzed
2015 2.1 75
2018 3.4 110
2022 4.8 130

πŸ“Œ Model Performance: ClimateBERT vs. Fine-Tuned Model

Model Accuracy Commitment Sensitivity
ClimateBERT 85.2% Medium
Fine-Tuned Model 91.5% High

πŸ”₯ Reproducibility

To replicate this study:

  1. Use Python 3.8+ and install requirements.txt
  2. Place reports in data/
  3. Run Climate_Disclosure_Analysis.ipynb in Jupyter
  4. Analyze the outputs in results/

πŸ“ Citation

If you use this code or dataset, please cite:

@article{VillacampaPorta2025,
  author    = {Javier Villacampa-Porta and MarΓ­a Coronado-Vaca and Eduardo C. Garrido-MerchΓ‘n},
  title     = {Impact of EU Non-Financial Reporting Regulation on Spanish Companies’ Environmental Disclosure: A Cutting-Edge Natural Language Processing Approach},
  journal   = {Environmental Sciences Europe},
  year      = {2025},
  doi       = {10.1186/s12302-025-01067-z}
}

πŸ“œ License

This repository is licensed under the MIT License. See LICENSE for details.


🀝 Contributing

Feel free to contribute! Open an issue or pull request if you have suggestions for improvements.


πŸ“¬ Contact

For questions or collaboration inquiries, contact:
πŸ“§ [Email] πŸ”— [LinkedIn]


πŸš€ Ready to Explore Climate Disclosures with AI? Let's Get Started!

About

Natural Language Processing (NLP) analysis of climate-related corporate disclosures using ClimateBERT and fine-tuned models. πŸ“ŠπŸŒ

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published