From Images to Insights: Explainable Biodiversity Monitoring with Plain Language Habitat Explanations
An End-to-End Framework for Interpretable Ecological Inference
Species recognition → Causal analysis of habitat preferences → Human-readable ecological explanations
- Our paper has been accepted at the AISE 2025 workshop of the European Conference on Artificial Intelligence (ECAI 2025).
BioX is an ecological AI pipeline that transforms species images into causal habitat insights. It contains:
- Species recognition from images (via BioCLIP)
- Presence & background point collection (via GBIF + spatial sampling)
- Bioclimatic variable extraction (BIO1–BIO19 from WorldClim)
- Causal graph discovery among climate variables (via NOTEARS)
- Causal inference on species presence and climate variables (via DoWhy; see the sketch after this list)
- Explanation generation in natural language (rule-based or LLMs)
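To make the causal steps concrete, here is a minimal, self-contained DoWhy sketch on synthetic data. The two BIO variables, the hand-written GML graph (BioX learns this structure with NOTEARS instead), and the `backdoor.linear_regression` estimator are illustrative assumptions, not the pipeline's actual configuration:

```python
import numpy as np
import pandas as pd
from dowhy import CausalModel

# Synthetic stand-in for the pipeline's point data: presence/background points
# with two bioclimatic covariates (the real pipeline uses BIO1-BIO19).
rng = np.random.default_rng(0)
n = 500
bio1 = rng.normal(10, 5, n)                  # annual mean temperature
bio12 = rng.normal(800, 200, n) + 20 * bio1  # annual precipitation, partly driven by BIO1
presence = rng.binomial(1, 1 / (1 + np.exp(-(0.3 * bio1 + 0.002 * bio12 - 5))))
df = pd.DataFrame({"BIO1": bio1, "BIO12": bio12, "presence": presence})

# Hand-written causal graph in GML; in BioX this structure comes from NOTEARS.
graph = """graph [directed 1
  node [id "BIO1" label "BIO1"]
  node [id "BIO12" label "BIO12"]
  node [id "presence" label "presence"]
  edge [source "BIO1" target "BIO12"]
  edge [source "BIO1" target "presence"]
  edge [source "BIO12" target "presence"]
]"""

model = CausalModel(data=df, treatment="BIO1", outcome="presence", graph=graph)
estimand = model.identify_effect()
estimate = model.estimate_effect(estimand, method_name="backdoor.linear_regression")
print(f"Estimated effect of BIO1 on presence: {estimate.value:.4f}")
```

The same pattern extends to all 19 BIO variables once the causal graph has been learned.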
- `run.py`: Run the full pipeline from image to explanation.
- `config.py`: Configuration for paths, model parameters, etc.
- `modules/`: All core modules
  - `species_recognition.py`: Run BioCLIP on input images
  - `data_collection.py`: Get GBIF data (`gbif_extractor.py`), background points (`background_generator.py`), and BIO variables (`bioclim_extractor.py`)
  - `notears_bios.py`: Learn the causal structure among BIO variables
  - `bio2p_dowhy.py`: Perform causal inference with DoWhy
  - `explain.py`: Generate human-readable habitat explanations
- `utils/logger_setup.py`: Unified logging
- `test/`: Put input image(s) here
- `requirements.txt`: Python dependencies
- Microsoft Windows 11
- Python version: 3.10
- PyTorch version: 2.7.0 (Official Website)
- CUDA version: 11.8
- GPU: NVIDIA RTX A5000
Clone the repo and create a virtual environment:
```bash
git clone https://github.com/Yutong-Zhou-cv/BioX.git
cd BioX
conda env create -f environment.yaml
conda activate gta
pip install -r requirements.txt
```

Place your species image(s) in the `test/` folder.
Supported formats: "jpg", "png", "jpeg", "bmp".
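To check your inputs quickly, here is a tiny sketch (a hypothetical helper; `run.py` performs its own discovery) that lists the images the pipeline would pick up:

```python
from pathlib import Path

# Collect input images from test/ by extension.
SUPPORTED = {".jpg", ".jpeg", ".png", ".bmp"}
images = sorted(p for p in Path("test").iterdir()
                if p.is_file() and p.suffix.lower() in SUPPORTED)
print(f"Found {len(images)} image(s):", *images, sep="\n")
```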
This program requires several external datasets. If automatic download fails due to network restrictions, please download them manually as described below.
Download the Natural Earth land mask (ne_10m_land.zip) from: https://naturalearth.s3.amazonaws.com/10m_physical/ne_10m_land.zip
Unzip the archive and confirm that `ne_10m_land.shp` is placed directly under `temp/`.
Download the BIO variables (wc2.1_2.5m_bio.zip) from: https://geodata.ucdavis.edu/climate/worldclim/2_1/base/
Download the archive and extract all of its files under `temp/wc2.1_2.5m_bio/`.
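If you prefer to script this manual step, here is a small stdlib-only sketch mirroring the two downloads above (the `fetch_and_unzip` helper is hypothetical, not part of the repo):

```python
import urllib.request
import zipfile
from pathlib import Path

def fetch_and_unzip(url: str, dest: Path) -> None:
    """Download a zip archive (if not already present) and extract it into dest."""
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / url.rsplit("/", 1)[-1]
    if not archive.exists():
        urllib.request.urlretrieve(url, archive)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)

# Natural Earth land mask: ne_10m_land.shp should end up directly under temp/.
fetch_and_unzip(
    "https://naturalearth.s3.amazonaws.com/10m_physical/ne_10m_land.zip",
    Path("temp"),
)
# WorldClim BIO variables: extracted under temp/wc2.1_2.5m_bio/.
fetch_and_unzip(
    "https://geodata.ucdavis.edu/climate/worldclim/2_1/base/wc2.1_2.5m_bio.zip",
    Path("temp/wc2.1_2.5m_bio"),
)
```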
In config.py, set the explanation mode:
```python
EXPLAIN_PARMS = {
    "explanation_mode": "llm",  # "llm" (uses a local LLM via Ollama) or "rule" (template-based)
    "llm_model": "llama3.3:70b",
}
```

- `"llm"`: Use an LLM (e.g., Llama 3) to generate rich, biologically grounded narratives (sketched below).
- `"rule"`: Use concise, template-based explanations for speed or offline use.
If you want the paper-style narratives, use "llm" and set up Ollama as follows.
Download from https://ollama.com/download and verify installation:
```bash
ollama --version
```

Linux only (optional, to run Ollama as a background service):

```bash
sudo systemctl enable --now ollama
systemctl status ollama
```

The default model in `config.py` is `llama3.3:70b` (very large). If your machine is resource-limited, pull a smaller model and update `llm_model` accordingly:
```bash
# Large (example from config)
ollama pull llama3.3:70b

# Smaller alternatives (pick one you can run)
ollama pull llama3.2:3b
ollama pull llama3.1:8b
```

Quick test (should print a response):

```bash
ollama run llama3.2:3b "hello"
```

If you want the pipeline to fall back automatically from "llm" to "rule" when Ollama isn't reachable, add the following to your explanation module (e.g., in `explain.py`):
```python
import json
import socket
import urllib.request


def _ollama_available(host="127.0.0.1", port=11434,
                      http_url="http://127.0.0.1:11434/api/tags", timeout=1.5):
    """Return True if a local Ollama server is reachable and serving its API."""
    try:
        # Cheap TCP probe first, then confirm the HTTP API answers with valid JSON.
        with socket.create_connection((host, port), timeout=timeout):
            pass
        with urllib.request.urlopen(http_url, timeout=timeout) as r:
            if r.status == 200:
                json.loads(r.read().decode("utf-8"))
                return True
    except Exception:
        return False
    return False


def pick_explanation_mode(explain_parms):
    """Resolve the configured mode, falling back to "rule" when Ollama is down."""
    mode = explain_parms.get("explanation_mode", "rule")
    if mode == "llm" and not _ollama_available():
        print("[Explain] Ollama not detected; falling back to rule-based explanations.")
        return "rule"
    return mode
```

Call `pick_explanation_mode(EXPLAIN_PARMS)` wherever the mode is read, then run the full pipeline:

```bash
python run.py
```

If you use BioX in your research, please cite:
Zhou, Y. & Ryo, M. (2025). From Images to Insights: Explainable Biodiversity Monitoring with Plain Language Habitat Explanations.
arXiv:2506.10559
```bibtex
@article{zhou2025images,
  title={From Images to Insights: Explainable Biodiversity Monitoring with Plain Language Habitat Explanations},
  author={Zhou, Yutong and Ryo, Masahiro},
  journal={arXiv preprint arXiv:2506.10559},
  year={2025}
}
```

"If I have seen further, it is by standing on the shoulders of giants." — Isaac Newton
This project builds upon a rich ecosystem of open ecological, statistical, and machine-learning tools. We are deeply grateful to the scientific community for these foundational contributions:
```bibtex
@inproceedings{stevens2024bioclip,
  title={BioCLIP: A vision foundation model for the tree of life},
  author={Stevens, Samuel and Wu, Jiaman and Thompson, Matthew J and Campolongo, Elizabeth G and Song, Chan Hee and Carlyn, David Edward and Dong, Li and Dahdul, Wasila M and Stewart, Charles and Berger-Wolf, Tanya and others},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={19412--19424},
  year={2024}
}
```

- GBIF | Global Biodiversity Information Facility [Website]
- WorldClim | Global climate and weather data [Website]
- Natural Earth | Land [Website]
- DAGs with NO TEARS: Continuous Optimization for Structure Learning, Xun Zheng et al. [Paper] [Code]
```bibtex
@inproceedings{zheng2018dags,
  author={Zheng, Xun and Aragam, Bryon and Ravikumar, Pradeep and Xing, Eric P.},
  booktitle={Advances in Neural Information Processing Systems},
  title={DAGs with NO TEARS: Continuous Optimization for Structure Learning},
  year={2018}
}
```

```bibtex
@misc{sharma2020dowhyendtoendlibrarycausal,
  title={DoWhy: An End-to-End Library for Causal Inference},
  author={Amit Sharma and Emre Kiciman},
  year={2020},
  eprint={2011.04216},
  archivePrefix={arXiv},
  primaryClass={stat.ME},
  url={https://arxiv.org/abs/2011.04216},
}
```