Investigating work relating to learning across time scales, temporal abstraction, and event modeling.
Clone and navigate to the project:
git clone https://github.com/APRashedAhmed/time-scales.git
cd time-scales
Install the conda environment, any extra dependencies, and the repo:
# Install the desired base environment
./run install conda-envs/ts102u2.yaml --dev --jupyter
Create softlinks to the data and weights directories on data_cifs:
cd time-scales
ln -s /media/data_cifs/projects/prj_timescales/arasheda/data .
ln -s /media/data_cifs/projects/prj_timescales/arasheda/weights .
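Before training, it can help to confirm that both links resolve. The following is a minimal sketch in Python, assuming it is run from the repo root. `check_links` is a hypothetical helper for illustration and is not part of the repo:

```python
from pathlib import Path


def check_links(repo_root=".", names=("data", "weights")):
    """Return {name: status} for each expected softlink under repo_root.

    Hypothetical helper for illustration; not part of the timescales package.
    """
    status = {}
    for name in names:
        link = Path(repo_root) / name
        if not link.is_symlink():
            status[name] = "missing or not a softlink"
        elif not link.exists():  # exists() follows the link, so False means dangling
            status[name] = "broken (target not found)"
        else:
            status[name] = f"ok -> {link.resolve()}"
    return status


if __name__ == "__main__":
    for name, state in check_links().items():
        print(f"{name}: {state}")
```

A "broken" status usually means the data_cifs share is not mounted on the current machine.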
The repo uses the bouncing ball task repo to generate bouncing ball sequences. To install it as a package:
# Make sure the environment is activated
conda activate ts102u2
# Clone the repo
git clone git@github.com:APRashedAhmed/Bouncing-Ball-Task.git
# Navigate into the directory
cd Bouncing-Ball-Task
# Install the package
pip install -e .
[OPTIONAL] Create an env file for environment variables that aren't tracked by git. Start by copying the template and adding the relevant information:
$ cp .env.template .env
$ more .env
# This is a template of the file that can be used for storing private and user
# specific environment variables, like keys or system paths. By default .env
# will be excluded from version control the variables declared in .env are
# loaded automatically in run.py
NEPTUNE_API_TOKEN=""
SLACK_WEBHOOK_URL=""
Train model with default configuration:
# Default
./run experiment
# Train using GPU 3
./run experiment --cuda 3
# Train using both GPU 5 and 6
./run experiment --cuda 5,6 -a trainer.gpus=2
# Rerun the experiment 10 times
./run experiment -n 10
Train model with chosen experiment configuration from configs/experiment/:
./run experiment -a experiment=experiment_name
You can override any parameter from the command line like this:
./run experiment -a trainer.max_epochs=20 datamodule.batch_size=64
To see more examples of how to use the run experiment API, run with the -e argument:
./run experiment -e
Since the project was installed, all its components are importable:
# Import the top-level module
import timescales as ts
# Import a specific model
from timescales.models import LinearDecoder
# Import a specific datamodule
from timescales.datamodules import SAYCamDataModule
Some portions of the project were done using Jupyter notebooks. To include the
Jupyter packages, run the install script with the -j or --jupyter flag:
./run install <path_to_yaml> --jupyter
To install the dependencies manually, run the following command to update the existing conda environment:
conda env update -f conda-envs/jupyter.yaml
See the README file in directories with Jupyter notebooks (workbooks, for example) for more details.
Additional requirements are necessary for development. To include the development
packages, run the install script with the -d or --dev flags:
./run install <path_to_yaml> --dev
To install the dependencies manually, run the following command to update the existing conda environment:
conda env update -f conda-envs/dev.yaml
Now install the pre-commit hooks for the project:
pre-commit install
The minimal workflow is as follows:
- Write your PyTorch Lightning model (see linear_decoder.py for example)
- Write your PyTorch Lightning datamodule (see saycam.py for example)
- Write your experiment workflow if needed (see train_test.py for example)
- Write your experiment config, containing paths to your model and datamodule (see temporal_classification.py for example)
- Run the experiment with the corresponding config:
./run experiment -a experiment=<experiment_name>
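The `-a` arguments used in the commands above are dotted `key=value` paths such as `trainer.max_epochs=20`. As a rough mental model only, the sketch below shows how such overrides could be folded into a nested config using plain Python dicts. It is not Hydra's implementation and does not describe how `run.py` works internally. `apply_overrides` and the default values are hypothetical:

```python
import ast
import copy


def apply_overrides(config, overrides):
    """Return a copy of config with dotted key=value overrides applied.

    Hypothetical illustration of dotted-path overrides; Hydra's own parser
    is far more capable (config groups, sweeps, type checks).
    """
    result = copy.deepcopy(config)
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        try:
            value = ast.literal_eval(raw_value)  # "20" -> 20, "0.1" -> 0.1
        except (ValueError, SyntaxError):
            value = raw_value  # keep plain strings such as names as-is
        node = result
        *parents, leaf = dotted_key.split(".")
        for key in parents:
            node = node.setdefault(key, {})  # create intermediate levels on demand
        node[leaf] = value
    return result


# Made-up defaults for illustration, not the repo's actual config values
defaults = {"trainer": {"max_epochs": 10}, "datamodule": {"batch_size": 32}}
config = apply_overrides(
    defaults, ["trainer.max_epochs=20", "datamodule.batch_size=64"]
)
print(config)
```

The input config is deep-copied so the defaults stay untouched, which mirrors the idea that command-line overrides apply to a single run and do not edit the files under configs/.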
When committing changes, a set of precommit hooks will be run, which broadly check for code oversights and formatting, before executing the commit (see .pre-commit-config.yaml for the full list of hooks used in this project):
# Commit command that passes
$ git commit -m '<commit message>'
Trim Trailing Whitespace.................................................Passed
Debug Statements (Python)................................................Passed
Detect Private Key.......................................................Passed
Check Yaml...............................................................Passed
Check for merge conflicts................................................Passed
Fix End of Files.........................................................Passed
black....................................................................Passed
isort....................................................................Passed
[<branch> <commit hash>] <commit message>
If a commit does not pass all the hooks, the commit is aborted and hooks that can fix problems automatically (black and isort in the example below) modify the failing files to comply with the hook requirements:
# Commit command that has two failures
$ git commit -m '<commit message>'
Trim Trailing Whitespace.................................................Passed
Debug Statements (Python)................................................Passed
Detect Private Key.......................................................Passed
Check Yaml...............................................................Passed
Check for merge conflicts................................................Passed
Fix End of Files.........................................................Passed
black....................................................................Failed
- hook id: black
- files were modified by this hook
reformatted timescales/datamodules/saycam.py
All done! ✨ 🍰 ✨
1 file reformatted, 48 files left unchanged.
isort....................................................................Failed
- hook id: isort
- files were modified by this hook
Fixing timescales/datamodules/video.py
Checking git status confirms the previously staged files have new changes:
$ git status
On branch <branch>
Your branch is up-to-date with 'origin/<branch>'.
Changes to be committed:
(use "git reset HEAD <file>..." to unstage)
modified: timescales/datamodules/saycam.py
modified: timescales/datamodules/video.py
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)
modified: timescales/datamodules/saycam.py
modified: timescales/datamodules/video.py
The modified files need to be re-staged before running the commit again, after which the precommit hooks should pass:
# Stage the files
$ git add timescales/datamodules/saycam.py timescales/datamodules/video.py
# Commit with additional changes that now pass precommit hooks
$ git commit -m '<commit message>'
Trim Trailing Whitespace.................................................Passed
Debug Statements (Python)................................................Passed
Detect Private Key.......................................................Passed
Check Yaml...............................................................Passed
Check for merge conflicts................................................Passed
Fix End of Files.........................................................Passed
black....................................................................Passed
isort....................................................................Passed
[<branch> <commit hash>] <commit message>
The hooks can also be run directly without executing a commit:
# Run commit hooks on currently staged files
pre-commit run
# Run commit hooks on all valid files
pre-commit run -a
The repo is structured as follows:
├── .autoenv.template           # Template to auto activate a conda environment
│                               # and enable hydra tab completion
│
├── conda-envs/                 # Directory for conda environments
│   ├── cuda10.yaml             # CUDA 10 environment
│   ├── cuda11.yaml             # Identical to above but with CUDA 11
│   └── dev.yaml                # Development tools
│
├── configs/                    # Hydra configuration files
│   ├── callbacks/              # Callback configs
│   ├── datamodule/             # Datamodule configs
│   ├── experiment/             # Experiment configs
│   ├── logger/                 # Logger configs (neptune only for now)
│   ├── model/                  # Model configs
│   ├── trainer/                # Trainer configs
│   │
│   └── config.yaml             # Main project configuration file
│
├── data/                       # Directory for all data used in the repo
│   ├── raw/                    # Raw, unaltered data
│   ├── interim/                # Data that has not been fully processed
│   └── processed/              # Fully processed and usable data
│
├── .env.template               # Template of file for storing private
│                               # environment variables
│
├── .gitignore                  # List of files/folders ignored by git
├── LICENSE                     # Brown software license
│
├── logs/                       # Directory for logs
│   ├── multiruns/              # Multirun logs
│   └── runs/                   # Single run logs
│
├── models/ -> weights/         # Softlink to weights directory (legacy)
├── notebooks/                  # Directory for jupyter notebooks. Naming
│                               # convention is a number (for ordering), the
│                               # creator's initials, and a short `-` delimited
│                               # description, e.g. `1.0-apra-initial-data-exploration.ipynb`
│
├── notes.md                    # Development notes for the repo
├── .pre-commit-config.yaml     # Config for precommit hooks
├── pytest.ini                  # Pytest config file
├── README.md                   # README for the repo
│
├── run.py                      # Run any pipeline with chosen experiment
│                               # configuration
│
├── setup.py                    # Python setup file for installing the repo
│
├── timescales/                 # Main code (importables) for the repo
│   ├── babyvision/             # Code taken from the Lake repo
│   ├── base/                   # Base classes
│   ├── constants.py            # Constants
│   ├── datamodules/            # Pytorch Lightning datamodules
│   ├── experiments/            # Specific experiment workflows
│   ├── index.py                # Paths to locations outside the importables
│   ├── loss.py                 # Custom loss functions
│   ├── metrics.py              # Custom metrics
│   ├── models/                 # Pytorch Lightning models
│   ├── tests/                  # Unit tests
│   └── utils/                  # Utility scripts
│
└── weights                     # Directory of saved weights and checkpoints
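The notebook naming convention described for notebooks/ (a number for ordering, the creator's initials, and a short `-` delimited description) can be checked mechanically. The sketch below is one possible reading of that convention as a regular expression. The lowercase-only restriction and the `parse_notebook_name` helper are assumptions for illustration and are not defined by the repo:

```python
import re

# Assumed pattern: <number>-<initials>-<dash-delimited-description>.ipynb
# The lowercase-only restriction is an assumption, not a rule stated by the repo.
NOTEBOOK_NAME = re.compile(
    r"^(?P<order>\d+(?:\.\d+)*)"
    r"-(?P<initials>[a-z]+)"
    r"-(?P<description>[a-z0-9]+(?:-[a-z0-9]+)*)"
    r"\.ipynb$"
)


def parse_notebook_name(filename):
    """Return the name's parts as a dict, or None if it does not match."""
    match = NOTEBOOK_NAME.match(filename)
    return match.groupdict() if match else None


# The example name from the convention parses into its three parts
print(parse_notebook_name("1.0-apra-initial-data-exploration.ipynb"))
# A name that ignores the convention yields None
print(parse_notebook_name("Untitled.ipynb"))
```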
Hydra creates a new working directory for every executed run.
By default, logs have the following structure:
│
├── logs
│   ├── runs                    # Folder for logs generated from single runs
│   │   ├── 2021-02-15          # Date of executing run
│   │   │   ├── 16-50-49        # Hour of executing run
│   │   │   │   ├── .hydra      # Hydra logs
│   │   │   │   ├── wandb       # Weights&Biases logs
│   │   │   │   ├── checkpoints # Training checkpoints
│   │   │   │   └── ...         # Any other thing saved during training
│   │   │   ├── ...
│   │   │   └── ...
│   │   ├── ...
│   │   └── ...
│   │
│   └── multiruns               # Folder for logs generated from multiruns (sweeps)
│       ├── 2021-02-15_16-50-49 # Date and hour of executing sweep
│       │   ├── 0               # Job number
│       │   │   ├── .hydra      # Hydra logs
│       │   │   ├── wandb       # Weights&Biases logs
│       │   │   ├── checkpoints # Training checkpoints
│       │   │   └── ...         # Any other thing saved during training
│       │   ├── 1
│       │   ├── 2
│       │   └── ...
│       ├── ...
│       └── ...
│
You can change this structure by modifying the paths in the main project configuration file (configs/config.yaml).
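Because single-run folders are named by date and time, the most recent run can be located by parsing those names. The sketch below assumes the default logs/runs/<date>/<time> layout shown above is unchanged. `latest_run_dir` is a hypothetical helper and is not part of the repo:

```python
from datetime import datetime
from pathlib import Path


def latest_run_dir(logs_root="logs"):
    """Return the newest logs/runs/<date>/<time> directory, or None if none exist.

    Hypothetical helper; assumes the default date/time folder naming.
    """
    candidates = []
    for run_dir in Path(logs_root).glob("runs/*/*"):
        if not run_dir.is_dir():
            continue
        try:
            stamp = datetime.strptime(
                f"{run_dir.parent.name} {run_dir.name}", "%Y-%m-%d %H-%M-%S"
            )
        except ValueError:
            continue  # skip folders that do not follow the date/time naming
        candidates.append((stamp, run_dir))
    if not candidates:
        return None
    return max(candidates, key=lambda pair: pair[0])[1]


if __name__ == "__main__":
    print(latest_run_dir())
```

Sorting on the parsed timestamp instead of the folder's modification time keeps the result stable if older runs are touched later, for example when checkpoints are copied or inspected.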