OpenSet-AdaTime

Unified benchmark for closed-set, open-set (OSDA), partial (PDA) and universal (UniDA) domain adaptation on time-series Human Activity Recognition (HAR). It combines re-implemented scenario-native methods (OSBP, TSFA, SPADA, PDAAN) with seven UniDA methods (UDA, OVANet, DANCE, PPOT, UniOT, UniJDOT, RAINCOAT) and eighteen closed-set baselines under a single training/evaluation protocol, plus a hardest-first curriculum sweep over private-class counts.

1. Requirements

The codebase is developed against the following pinned versions. Matching them exactly is the safest bet, though anything reasonably close should also work.

Package	Version
python	3.9
torch	2.7.1+cu118
torchvision	0.22.1+cu118
torchmetrics	1.8.2
numpy	2.0.2
pandas	2.3.3
scipy	1.13.1
scikit-learn	1.6.1
scikit-image	0.24.0
matplotlib	3.9.4
seaborn	0.13.2
POT (`ot`)	0.9.6
optuna	4.8.0
mlflow	3.1.4
tqdm	4.67.1
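
To see how far an environment drifts from the pins above, a small check along the following lines may help. It is a sketch rather than part of the repository: the helper names (installed_version, find_mismatches) are made up for illustration, and the keys are assumed to be the PyPI distribution names (POT being the distribution that provides the `ot` module).

```python
from importlib import metadata

# Pins copied from the table above; keys are assumed PyPI distribution names.
PINNED = {
    "torch": "2.7.1+cu118",
    "torchvision": "0.22.1+cu118",
    "torchmetrics": "1.8.2",
    "numpy": "2.0.2",
    "pandas": "2.3.3",
    "scipy": "1.13.1",
    "scikit-learn": "1.6.1",
    "scikit-image": "0.24.0",
    "matplotlib": "3.9.4",
    "seaborn": "0.13.2",
    "POT": "0.9.6",
    "optuna": "4.8.0",
    "mlflow": "3.1.4",
    "tqdm": "4.67.1",
}


def installed_version(dist_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each package whose installed version differs from its pin to (pinned, found)."""
    mismatches = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(PINNED).items():
        print(f"{name}: pinned {wanted}, found {found or 'not installed'}")
```

The comparison is an exact string match, so a CPU-only torch build will be reported as a mismatch even if the release number agrees.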

Start an MLflow server first

Every entry point (main.py, main_sweep.py, run_curriculum.py, extract_best_hparams.py) logs to MLflow over HTTP at http://127.0.0.1:5001. Launch the tracking server before training:

mlflow server --host 127.0.0.1 --port 5001

If you want a different port, change it both on the server command line and in the mlflow.set_tracking_uri(...) call at the top of main.py, main_sweep.py, run_curriculum.py and extract_best_hparams.py.
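
Since the entry points assume the server is already running, it can be handy to fail fast when it is not. The sketch below only checks that something accepts TCP connections on the configured host and port; it does not verify that the listener is actually MLflow. The function name tracking_server_is_up is hypothetical and not part of the repository.

```python
import socket
from urllib.parse import urlparse

TRACKING_URI = "http://127.0.0.1:5001"  # default used throughout this README


def tracking_server_is_up(uri=TRACKING_URI, timeout=2.0):
    """Return True if a TCP connection to the URI's host and port succeeds."""
    parsed = urlparse(uri)
    host = parsed.hostname or "127.0.0.1"
    # Fall back to the scheme's usual port when the URI does not name one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Calling tracking_server_is_up() at the start of a wrapper script and aborting when it returns False avoids discovering a missing server partway into a long run.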

2. Datasets

Three public HAR datasets are used. Everything is resampled to 50 Hz and cut into windows of 150 samples (3 s):

Dataset	Subjects	Sensor location	Notes
RealWorld	15	waist (accel + gyro)	~41,900 windows. Split into `_male` / `_female`
Pamap2	9	wrist + chest + ankle	~19,100 windows after dropping the "transient" class
MHEALTH	10	chest + ankle + wrist	~4,600 windows
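
The windowing arithmetic (150 samples at 50 Hz = 3 s) can be sketched as follows. The stride is an assumption: this README does not say whether windows overlap, so the sketch defaults to non-overlapping windows and exposes stride as a parameter. Function names are illustrative and do not come from preprocessing/.

```python
SAMPLING_RATE_HZ = 50   # from the text above
WINDOW_SAMPLES = 150    # from the text above


def window_seconds(window=WINDOW_SAMPLES, rate_hz=SAMPLING_RATE_HZ):
    """Duration of one window in seconds."""
    return window / rate_hz


def window_starts(n_samples, window=WINDOW_SAMPLES, stride=WINDOW_SAMPLES):
    """Start indices of every full window; `stride` is an assumed parameter."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    return list(range(0, n_samples - window + 1, stride))


def segment(signal, window=WINDOW_SAMPLES, stride=WINDOW_SAMPLES):
    """Cut a 1-D sequence into full windows, dropping any trailing remainder."""
    return [signal[s:s + window] for s in window_starts(len(signal), window, stride)]
```

With these defaults a 10 s recording (500 samples) yields three windows, and the last 50 samples are discarded.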

The four classes shared by all three datasets — lying / sitting / walking / running — form the closed-set label space; every other class is treated as private to its source or target in OSDA/PDA/UniDA.
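
The split into shared and private classes described above amounts to simple set arithmetic. The sketch below uses placeholder names for the extra classes (src_extra_a and so on); they are not the datasets' real label lists, and split_label_space is a hypothetical helper.

```python
SHARED = frozenset({"lying", "sitting", "walking", "running"})  # from the text above


def split_label_space(source_classes, target_classes, shared=SHARED):
    """Return (shared, source_private, target_private) as sorted lists."""
    source, target = set(source_classes), set(target_classes)
    # Every shared class must be present on both sides.
    missing = shared - (source & target)
    if missing:
        raise ValueError(f"shared classes absent from one side: {sorted(missing)}")
    return sorted(shared), sorted(source - shared), sorted(target - shared)


# Placeholder class names for illustration only, not real dataset labels.
source_demo = ["lying", "sitting", "walking", "running", "src_extra_a"]
target_demo = ["lying", "sitting", "walking", "running", "tgt_extra_a", "tgt_extra_b"]
shared, source_private, target_private = split_label_space(source_demo, target_demo)
```

Note that a class present in both source and target but outside the four shared classes ends up private on both sides, which matches the rule stated above.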

Preprocessing

The raw → windowed pipeline lives in preprocessing/:

preprocessing/raw_data_processing.py — per-dataset parsers (downloads, resampling, segmentation).
preprocessing/run_datasets.py — driver: builds *_processed.pkl files ready for the dataloader.
preprocessing/split_realworld_by_gender.py — produces RealWorld_male_processed.pkl and RealWorld_female_processed.pkl for the RealWorld male→female protocol used in the hyper-parameter sweep.

Run them once to produce the *_processed.pkl files; everything downstream expects them at --data_path (default ../dataset).
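
Before training, it may be worth confirming that the processed files are where the loaders expect them. The sketch below assumes every dataset follows the <name>_processed.pkl pattern that the two RealWorld split files use; the list of names is a guess and should be adjusted to whatever run_datasets.py actually writes.

```python
from pathlib import Path

DEFAULT_DATA_PATH = Path("../dataset")  # default of --data_path per the text above

# Assumption: all datasets use the <name>_processed.pkl naming pattern.
EXPECTED_NAMES = ["RealWorld", "RealWorld_male", "RealWorld_female", "Pamap2", "MHEALTH"]


def missing_processed_files(data_path=DEFAULT_DATA_PATH, names=EXPECTED_NAMES):
    """Return the expected *_processed.pkl paths that do not exist yet."""
    data_path = Path(data_path)
    expected = [data_path / f"{name}_processed.pkl" for name in names]
    return [p for p in expected if not p.is_file()]
```

An empty return value means all expected files were found; anything else lists the paths still to be generated.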

3. Algorithms

All algorithms live in algorithms/algorithms.py. Each is a subclass of Algorithm and declares which scenario(s) it covers via the SCENARIO attribute (used by the --scenario shortcut in main.py).

Family	Methods
Closed-set	NO_ADAPT, TARGET_ONLY, DANN, CDAN, DDC, Deep_Coral, DSAN, HoMM, MMDA, DIRT, AdvSKM, DAAN, CoDATS, CoTMix, CLUDA, SASA, SSSS_TSA, SWL_Adapt, ACON, uDAR
OSDA-native	OSBP, TSFA
PDA-native	SPADA, PDAAN
UniDA	UDA, OVANet, DANCE, PPOT, UniOT, UniJDOT, RAINCOAT
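
How a SCENARIO attribute can drive a "run every method for this scenario" shortcut is sketched below. This is not the code in algorithms/algorithms.py: the Algorithm class here is a stand-in, the two subclasses are hypothetical, and SCENARIO is assumed to be a tuple of scenario names.

```python
class Algorithm:
    """Stand-in for the repository's base class; only SCENARIO is modelled here."""
    SCENARIO = ()  # assumed to be a tuple of scenario names


# Hypothetical subclasses; the real ones live in algorithms/algorithms.py.
class DemoOpenSetMethod(Algorithm):
    SCENARIO = ("OSDA",)


class DemoUniversalMethod(Algorithm):
    SCENARIO = ("OSDA", "PDA", "UniDA")


def all_subclasses(cls):
    """Recursively collect every subclass of `cls`."""
    found = []
    for sub in cls.__subclasses__():
        found.append(sub)
        found.extend(all_subclasses(sub))
    return found


def methods_for_scenario(scenario, base=Algorithm):
    """Names of all algorithms whose SCENARIO attribute lists `scenario`."""
    return sorted(c.__name__ for c in all_subclasses(base) if scenario in c.SCENARIO)
```

Keeping the scenario declaration on the class means a new method becomes visible to the shortcut as soon as it is defined, with no separate registry to update.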

Backbones live in models/models.py. The default is FNO (Fourier Neural Operator); CNN is also available via --backbone CNN.

4. Training procedure

The full pipeline is sweep → extract → train → curriculum → plot.

4.1 Train a single configuration

main.py runs one (source, target, algorithm, scenario) configuration with --num_runs random seeds and logs everything to MLflow.

# One method on one pair:
python main.py \
    --source_dataset RealWorld \
    --target_dataset Pamap2 \
    --da_method UniJDOT \
    --scenario UniDA \
    --backbone FNO \
    --num_runs 5 \
    --exp_name EXP1

# Or run every method registered for a scenario:
python main.py --source_dataset RealWorld --target_dataset Pamap2 \
               --scenario OSDA --da_method ALL --num_runs 5

Hyper-parameters for the training run come from configs/hparams.py (get_hparams_class(source_dataset, backbone)), which is the only place main.py / run_curriculum.py look. best_hparams.json (see §4.2) is a sweep artifact — you have to copy its values into configs/hparams.py yourself for them to take effect.

4.2 Hyper-parameter sweep + extraction

We pick hyper-parameters once, on the RealWorld male → female within-dataset split, then reuse them for every cross-dataset run. main_sweep.py runs Optuna trials and extract_best_hparams.py distills the best trial into best_hparams.json.

# Sweep one method (50 Bayesian trials, 3 seeds each):
python main_sweep.py \
    --source_dataset RealWorld_male \
    --target_dataset RealWorld_female \
    --scenario UniDA \
    --da_method UniJDOT \
    --num_runs 3 \
    --num_sweeps 50 \
    --hp_search_strategy bayes \
    --metric_to_minimize H_score \
    --exp_name sweep_unijdot

# Extract the best trial for that algorithm/scenario into best_hparams.json:
python extract_best_hparams.py \
    --exp_name sweep_unijdot \
    --source_dataset RealWorld_male \
    --target_dataset RealWorld_female

Because main.py and run_curriculum.py never read best_hparams.json, inspect it after the sweep and copy the relevant values into configs/hparams.py (under alg_hparams[<method>] in the class for the matching source dataset) before re-running.
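
To make the manual copy step less error-prone, the values can be printed in a paste-ready form. The layout of best_hparams.json is not documented here, so the sketch assumes a flat {method: {hparam: value}} mapping; if the real file nests by scenario or dataset, the lookup needs adapting. Both helper names are hypothetical.

```python
import json
from pathlib import Path


def load_method_hparams(json_path, method):
    """Read one method's entry. Assumes a {method: {hparam: value}} layout."""
    data = json.loads(Path(json_path).read_text())
    if method not in data:
        raise KeyError(f"{method!r} not found; available keys: {sorted(data)}")
    return data[method]


def as_paste_ready(method, hparams):
    """Format the values as an alg_hparams assignment to copy by hand."""
    body = ", ".join(f"{k!r}: {v!r}" for k, v in sorted(hparams.items()))
    return f"alg_hparams[{method!r}] = {{{body}}}"
```

For example, print(as_paste_ready("UniJDOT", load_method_hparams("best_hparams.json", "UniJDOT"))) emits a single line that can be compared against the existing entry in configs/hparams.py.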

4.3 Curriculum runs over private-class counts

run_curriculum.py adds the top-n hardest private classes (as ranked by FNO mean-cosine distance to the source-known prototypes) and runs one training experiment per n. One CLI invocation = one process, so multiple n values can be fanned out by a shell loop.
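
One way to read the ranking criterion is sketched below: each private class is represented by a single feature vector (for instance its mean backbone feature), scored by its average cosine distance to the source-known prototypes, and sorted. Two things are assumptions here: that "mean-cosine distance" averages over prototypes, and that a smaller distance means a harder class (hence the closest_first switch). The actual ranking code may differ on both points.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def mean_distance_to_prototypes(class_feature, prototypes):
    """Average cosine distance from one class feature to every known prototype."""
    return sum(cosine_distance(class_feature, p) for p in prototypes) / len(prototypes)


def rank_private_classes(private_features, prototypes, closest_first=True):
    """Order private classes by mean distance to the known prototypes.

    closest_first=True treats a small distance as "hard" (an assumption).
    """
    scores = {name: mean_distance_to_prototypes(f, prototypes)
              for name, f in private_features.items()}
    return sorted(scores, key=scores.get, reverse=not closest_first)


def top_n(ranking, n):
    """The first n entries of a hardest-first ranking."""
    return ranking[:n]
```

Given such a ranking, the curriculum step for a given n simply takes top_n(ranking, n).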

# OSDA: 4 hardest target-privates added to the target side, 5 seeds.
python run_curriculum.py \
    --source_dataset RealWorld \
    --target_dataset Pamap2 \
    --scenario OSDA --strategy hard --n_unknown 4 \
    --da_method UniJDOT --backbone FNO --num_runs 5

Scenarios:

OSDA — grows the target side by n target-private classes.
PDA — grows the source side by n source-private classes.
UniDA — does both, with the same n per side.
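
The three growth rules above can be written as one small function. This is a sketch of the logic as described, not the implementation in run_curriculum.py; the rankings are assumed to be hardest-first lists of private class names, and the names in the usage note are placeholders.

```python
SHARED = ["lying", "sitting", "walking", "running"]  # closed-set label space


def curriculum_label_sets(scenario, n, source_ranking, target_ranking, shared=SHARED):
    """Return (source_classes, target_classes) for one curriculum step.

    Rankings are hardest-first lists of private class names.
    """
    if scenario not in {"OSDA", "PDA", "UniDA"}:
        raise ValueError(f"unknown scenario: {scenario}")
    source, target = list(shared), list(shared)
    if scenario in {"PDA", "UniDA"}:
        source += source_ranking[:n]   # grow the source side
    if scenario in {"OSDA", "UniDA"}:
        target += target_ranking[:n]   # grow the target side
    return source, target
```

For instance, curriculum_label_sets("UniDA", 2, ["s1", "s2", "s3"], ["t1", "t2", "t3"]) adds s1 and s2 to the source and t1 and t2 to the target, while n = 0 reduces every scenario to the closed-set case.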

Ranking JSONs are read from feature_distance_4known_fno_mean/ (OSDA target ranking) and feature_distance_4known_fno_mean_pda/ (PDA source ranking).

The example shell wrappers under bashes_and_logs/run_curriculum_*.sh show how a full sweep over all six pairs × all methods × all n is launched.
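
For those who prefer Python over shell loops, the same fan-out can be sketched as a command planner. The flags are the ones shown in the run_curriculum.py example above; the dataset identifiers (in particular "MHEALTH" as a CLI value) and the choice of n values are assumptions, and the shipped shell wrappers remain the reference.

```python
import itertools
import shlex
import sys

DATASETS = ["RealWorld", "Pamap2", "MHEALTH"]  # CLI spellings assumed


def ordered_pairs(datasets=DATASETS):
    """All (source, target) pairs with source != target: 3 datasets give 6 pairs."""
    return list(itertools.permutations(datasets, 2))


def curriculum_command(source, target, scenario, n, method, backbone="FNO", num_runs=5):
    """Build the argv list for one run_curriculum.py invocation."""
    return [
        sys.executable, "run_curriculum.py",
        "--source_dataset", source,
        "--target_dataset", target,
        "--scenario", scenario, "--strategy", "hard", "--n_unknown", str(n),
        "--da_method", method, "--backbone", backbone, "--num_runs", str(num_runs),
    ]


def plan(scenario, methods, n_values):
    """Every command for one scenario: pairs x methods x n."""
    return [curriculum_command(s, t, scenario, n, m)
            for (s, t), m, n in itertools.product(ordered_pairs(), methods, n_values)]


if __name__ == "__main__":
    # Dry run: print the commands instead of launching them.
    for cmd in plan("OSDA", ["UniJDOT"], [1, 2, 3, 4]):
        print(shlex.join(cmd))
```

Each printed line is one process; passing a command list to subprocess.run(cmd, check=True) would launch it.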

4.4 Aggregating + plotting results

Once the MLflow store has the runs, dump them to CSV and plot:

# Export every run's metrics + params into analysis/runs.csv.
python plotting/export_mlruns_csv.py

# Parse stdout training logs in experiments_logs/ into a per-step CSV
# (used for loss-curve plots).
python plotting/parse_training_logs.py

Visualizations (all write to figures/):

# H-score / OS* / UNK / F1 curves vs. n, per pair (2x3 grid):
python plotting/plot_curves.py --experiment OSDA --metric H_score
python plotting/plot_curves.py --experiment UniDA --metric H_score
python plotting/plot_curves.py --experiment PDA  --metric target_f1

# Bar versions of the same (closed-set has no n axis):
python plotting/plot_bars.py --experiment closed_set --metric target_f1
python plotting/plot_bars.py --experiment UniDA      --metric H_score

# Curriculum-order chip figure (hardest → easiest per pair):
python plotting/plot_curriculum_chips.py

# Per-method loss-curve diagnostics:
python plotting/plot_loss_curves.py --preset ppot_collapse
python plotting/plot_loss_curves.py --preset raincoat_phases

# Train-duration breakdown + t-SNE feature visualization:
python plotting/plot_train_duration.py
python plotting/make_tsne.py --source_dataset RealWorld --target_dataset Pamap2 \
                             --da_method UniJDOT --scenario UniDA --n_unknown 4

# LaTeX result tables for the thesis appendix:
python plotting/make_latex_tables.py        # writes tables/*.tex + extended_results.tex
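
Several of the plots above report H_score alongside OS* and UNK. In the open-set and universal DA literature the H-score is conventionally the harmonic mean of the known-class accuracy (OS*) and the unknown-class accuracy (UNK); the sketch below assumes this benchmark follows that convention, which this README does not state explicitly.

```python
def h_score(known_acc, unknown_acc):
    """Harmonic mean of known-class accuracy (OS*) and unknown-class accuracy (UNK)."""
    if known_acc < 0 or unknown_acc < 0:
        raise ValueError("accuracies must be non-negative")
    total = known_acc + unknown_acc
    if total == 0:
        return 0.0
    return 2.0 * known_acc * unknown_acc / total
```

Because it is a harmonic mean, the score collapses to zero when either side fails completely: a model that labels everything as known gets UNK = 0 and therefore H = 0, however high its OS*.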

5. Repository layout

algorithms/                 — every DA algorithm (one file)
configs/                    — per-dataset / per-hparam configs + sweep spaces
dataloader/                 — windowed-data loaders
models/                     — CNN / FNO backbones, RAINCOAT + TSFA blocks
preprocessing/              — raw HAR → windowed .pkl pipeline
trainers/                   — Trainer (single run) + sweep Trainer
analysis/                   — runs.csv + training_logs.csv + EDA outputs
plotting/                   — every plot/export/aggregation script
feature_distance_4known_fno_mean/      — OSDA ranking JSON
feature_distance_4known_fno_mean_pda/  — PDA ranking JSON
figures/                    — generated plots
tables/                     — generated LaTeX tables
bashes_and_logs/            — all sweep wrappers + their stdout logs
main.py                     — single-config trainer
main_sweep.py               — Optuna hyper-parameter sweep
extract_best_hparams.py     — sweep → best_hparams.json
run_curriculum.py           — n-private-class curriculum runner
best_hparams.json           — sweep artifact (not auto-loaded; copy into configs/hparams.py)
