This guide will walk you through the steps required to use the benchmark package, including setting up environments, configuring datasets, and properly updating variables to execute desired benchmarks.
Before using the benchmark package, ensure you have the correct environment set up. You can create and activate a conda environment as follows:
- Install Conda.
- Create a new environment and install necessary dependencies.
Example:
```shell
conda create --name benchmark_env python=3.8
conda activate benchmark_env
pip install -r requirements.txt
```

Ensure all libraries required for benchmarking are listed in the `requirements.txt` file.
The required moa.jar file is available on Dropbox. Place it at the root of this package, or update the jar path in the experiment files.
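As a minimal sketch, you can verify the jar is in place before launching an experiment. The function name `check_jar` is hypothetical; match the path to wherever your experiment files expect the jar:

```python
import os

def check_jar(path):
    """Return True if the MOA jar exists at the given path.

    'path' should point at moa.jar, e.g. the package root by default.
    """
    return os.path.isfile(path)

# Example: warn early instead of failing mid-benchmark.
if not check_jar("moa.jar"):
    print("moa.jar not found; download it and place it at the package root, "
          "or update the jar path in the experiment files.")
```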
The datasets can be downloaded from Dropbox.
Make sure that all datasets you want to use for benchmarking are placed inside the benchmarking_datasets directory, structured as follows:

```
benchmarking_datasets/classification/
    dataset1.csv
    dataset2.csv
    ...
```
Ensure that the datasets are properly formatted and ready for use.
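As a quick sanity check (a sketch, not part of the package), you can verify that each CSV has a header row and a consistent number of columns per row using only the standard library:

```python
import csv

def validate_csv(path):
    """Return True if the CSV at 'path' has a header row and every
    data row has the same number of columns as the header."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if not header:
            return False  # empty file
        return all(len(row) == len(header) for row in reader)
```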
After placing the datasets in the benchmarking_datasets directory, update the dataset_path variables in each benchmark script so they point to the correct dataset files.
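A minimal sketch of how such paths might be assembled (the variable names `DATA_DIR` and `dataset_paths` are illustrative, not the script's actual names):

```python
import os

# Build paths relative to the expected dataset directory.
DATA_DIR = os.path.join("benchmarking_datasets", "classification")

dataset_paths = {
    "dataset1": os.path.join(DATA_DIR, "dataset1.csv"),
    "dataset2": os.path.join(DATA_DIR, "dataset2.csv"),
}
```

Using `os.path.join` keeps the paths portable across operating systems.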
Different machine learning libraries may have different learner (algorithm) names. Update the learner variables in the benchmarking script based on the specific library you are using.
For example:

- For scikit-learn, use `RandomForestClassifier`.
- For river, use `AdaptiveRandomForestClassifier`.
Make sure the correct learner name is assigned in each script for consistency.
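One way to keep these names consistent is a small lookup table, sketched below (the `learners` dict is a hypothetical helper, not something the package defines):

```python
# Map each library to the learner (algorithm) name it expects.
learners = {
    "scikit-learn": "RandomForestClassifier",
    "river": "AdaptiveRandomForestClassifier",
}

library = "scikit-learn"
learner = learners[library]  # -> "RandomForestClassifier"
```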
Each learner may have different parameter combinations for benchmarking. The arguments variable should be updated to reflect these combinations in the format:
```
'learner name': [list of different parameter combinations]
```
Example:
```python
'RandomForestClassifier': [
    {'n_estimators': 100, 'max_depth': 10},
    {'n_estimators': 200, 'max_depth': 20}
]
```
Ensure that you list all relevant combinations for the learners you are benchmarking.
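To make the format concrete, here is a sketch of how a benchmark loop might expand such an `arguments` dictionary into individual runs (the flattening itself is illustrative; the actual scripts may iterate differently):

```python
arguments = {
    "RandomForestClassifier": [
        {"n_estimators": 100, "max_depth": 10},
        {"n_estimators": 200, "max_depth": 20},
    ],
}

# Each benchmark run pairs a learner name with one parameter combination.
runs = [
    (learner, params)
    for learner, combos in arguments.items()
    for params in combos
]
```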
To execute only the desired benchmarks, modify the `if __name__ == "__main__":` block in the script so that it includes only the learners, datasets, and arguments you wish to run.
This ensures that only the specific benchmarks are executed when the script is run, preventing unnecessary or irrelevant executions.
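The pattern looks roughly like the sketch below; `run_benchmark` is a hypothetical stand-in for whatever entry point the benchmark scripts actually expose:

```python
def run_benchmark(learner, dataset_path, params):
    """Hypothetical stand-in for the package's benchmark routine."""
    return f"{learner} on {dataset_path} with {params}"

if __name__ == "__main__":
    # Only the entries listed here run when the script is executed directly;
    # importing the module elsewhere triggers nothing.
    print(run_benchmark(
        "RandomForestClassifier",
        "benchmarking_datasets/classification/dataset1.csv",
        {"n_estimators": 100, "max_depth": 10},
    ))
```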
Following these steps will ensure that your benchmarking environment, datasets, learners, and parameters are properly configured, allowing you to run the desired benchmarks efficiently. Make sure to revisit and update these variables as needed for different benchmarking scenarios.