feat: add remote GPU cluster support for multi-node model fitting by michaeljabbour · Pull Request #279 · AlexsJones/llmfit

michaeljabbour · 2026-03-22T12:34:19Z

Summary

Hardware-agnostic remote GPU cluster support (any vendor — NVIDIA, AMD, etc.)
llmfit cluster init — interactive setup wizard prompting for IPs, VRAM, GPU name, RAM, cores
llmfit cluster status / llmfit cluster clear — view/remove config
--cluster / --no-cluster CLI flags override local hardware detection
Ray Dashboard API auto-discovery with manual fallback
Config persisted to ~/.config/llmfit/cluster.toml
TensorParallel run mode: cluster-aware fit analysis with vLLM runtime
Proper IPv4 validation, hostname validation, error messages on bad input

Files changed (11)

File	Change
`llmfit-core/src/cluster.rs`	New cluster module — ClusterConfig, ClusterNode, Ray discovery, interactive init, TOML persistence, tests (739 lines)
`llmfit-core/src/fit.rs`	TensorParallel run mode, cluster fit path (+48)
`llmfit-core/src/hardware.rs`	`cluster_mode`, `cluster_node_count` fields (+10)
`llmfit-core/src/lib.rs`	Module export (+1)
`llmfit-core/Cargo.toml`	`toml` dependency (+1)
`llmfit-core/src/plan.rs`	Test constructor updates (+2)
`llmfit-tui/src/main.rs`	Cluster subcommand, --cluster/--no-cluster flags, detect_specs updates (+131)
`llmfit-tui/Cargo.toml`	`dirs` dependency (+1)
`llmfit-tui/src/serve_api.rs`	TensorParallel in run_mode_code (+1)
`llmfit-tui/src/tui_ui.rs`	TensorParallel color mapping (+2)

Example usage

# Interactive cluster setup (prompts for IPs, VRAM, GPU specs)
llmfit cluster init

# Use cluster for fit analysis
llmfit fit --cluster

# View cluster config
llmfit cluster status

# Example cluster.toml (2× A100 80GB nodes)
# [nodes.0]
# hostname = "node-1"
# ip = "10.0.0.1"
# gpu_name = "A100"
# gpu_vram_gb = 80.0
# ...

Test plan

All existing tests pass (253 passed, 1 pre-existing failure unrelated)
16 new unit tests: IP increment/overflow, IP/hostname validation, cluster totals, to_system_specs, TOML roundtrip
Manual: run llmfit cluster init, verify config saved
Manual: run llmfit fit --cluster, verify aggregated VRAM in output

Split from #181 — see closing comment for context on the split.

🤖 Generated with Claude Code

AlexsJones

Nice feature concept! Left some inline suggestions. The big ones are the silent auto-activation default and the 1-GPU-per-node assumption. Rest is minor cleanup.

AlexsJones · 2026-03-22T15:28:23Z

+    pub fn total_gpu_count(&self) -> usize {
+        self.nodes.len()
+    }
+


Clean struct design. Consider adding a gpu_count: u32 field here so multi-GPU nodes (e.g. 8xA100 DGX) are counted correctly. Right now total_gpu_count() returns nodes.len() which underreports on common cluster hardware.

AlexsJones · 2026-03-22T15:28:23Z

+            .unwrap_or(false);
+
+        let backend = if unified {
+            GpuBackend::Cuda // unified memory NVIDIA (e.g. DGX Spark GB10)


Both branches return GpuBackend::Cuda here, so the if/else is a no-op. Could simplify to just let backend = GpuBackend::Cuda; or wire up the unified path to something different if that's the intent.

AlexsJones · 2026-03-22T15:28:23Z

+                hostname,
+                ip,
+                gpu_name: "GPU".to_string(), // Ray doesn't report GPU model
+                gpu_vram_gb: gpu_count * 80.0, // rough default; user can correct


Good fallback! The gpu_count * 80.0 default is a bit fragile though. Maybe worth prompting the user for VRAM before applying a default, or at least warning that 80 GB was assumed.

AlexsJones · 2026-03-22T15:28:23Z

+// ── Helpers ────────────────────────────────────────────────────────
+
+fn dirs_path() -> Option<PathBuf> {
+    std::env::var("HOME")


This manually reads $HOME but the dirs crate was added to llmfit-tui/Cargo.toml. Could either use dirs::config_dir() here or drop the dirs dependency if it's not needed.

AlexsJones · 2026-03-22T15:28:23Z

    },
+
+    /// Manage remote GPU cluster configuration
+    Cluster {


Heads up: this means cluster mode activates silently whenever a cluster.toml exists, even without --cluster. Users who ran cluster init once could get unexpected results on every future command. Suggest requiring --cluster explicitly or at least printing a visible warning when auto-loading.

AlexsJones · 2026-03-22T15:28:23Z

        port: u16,
    },
+
+    /// Manage remote GPU cluster configuration


Nice that the escape hatch exists! Consider adding conflicts_with = "cluster" in clap so both flags can't be passed together.

- Add gpu_count field to ClusterNode for multi-GPU nodes (8xA100 DGX) - Fix total_gpu_count() to sum gpu_count per node, not just node count - Remove no-op if/else for backend (both branches returned Cuda) - Add VRAM default warning when Ray doesn't report GPU memory - Require --cluster flag explicitly (no silent auto-activation from cluster.toml) - Add conflicts_with between --cluster and --no-cluster - Add USERPROFILE fallback in dirs_path() for Windows - Fix flaky dedup test assertion in models.rs - Add GPUs-per-node prompt in interactive cluster init Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

michaeljabbour · 2026-03-30T23:58:46Z

Rebased onto latest main, resolved merge conflicts, ran cargo fmt, and fixed a test failure (implemented missing dedupe_hf_entries function). All 289 tests pass. Let me know if there's anything else you'd like me to do, @AlexsJones!

three-foxes-in-a-trenchcoat · 2026-04-23T23:06:52Z

Big feature, Michael. The cluster module is well-structured.

Verdict: LGTM

Key things I like:

Clean ClusterConfig → SystemSpecs conversion path
Proper TOML persistence with dirs for the config path
Good validation on IP/hostname input
16 unit tests covering edge cases (IP overflow, roundtrip, etc.)
--cluster / --no-cluster CLI flags are the right UX

Minor notes:

to_system_specs() assumes all nodes have the same GPU type. That's reasonable for most use cases but worth documenting.
The conservative available_ram_gb: total_ram * 0.85 estimate is sensible.

Solid work. Ready to merge.

AlexsJones · 2026-04-24T07:07:37Z

Please rebase and we can continue @michaeljabbour

AlexsJones

Cluster infrastructure is solid and well-tested for the common case (homogeneous NVIDIA clusters). Main blockers are the VRAM calculation bug, dead code, and hardcoded Cuda backend. See inline comments.

AlexsJones · 2026-04-24T07:11:14Z

+    }
+
+    pub fn total_vram_gb(&self) -> f64 {
+        self.nodes.iter().map(|n| n.gpu_vram_gb).sum()


Bug: total_vram_gb doesn't account for gpu_count

A node with 8× A100 80GB would report 80 GB total instead of 640 GB. Should be:

self.nodes.iter().map(|n| n.gpu_vram_gb * n.gpu_count as f64).sum()

The field name gpu_vram_gb is ambiguous — is it per-GPU or per-node? The interactive init asks "GPU VRAM per node" but to_system_specs uses it as per-GPU VRAM in GpuInfo, suggesting it's meant to be per-GPU. Either way, this sum needs the multiplier.

AlexsJones · 2026-04-24T07:11:14Z

+            .map(|n| n.unified_memory)
+            .unwrap_or(false);
+
+        let backend = GpuBackend::Cuda;


Hardcoded GpuBackend::Cuda contradicts "hardware-agnostic" claim

The PR description says it supports any vendor (NVIDIA, AMD, etc.) but an AMD cluster would get the wrong backend here. Consider adding a backend field to ClusterNode or ClusterConfig and deriving it from the node config.

AlexsJones · 2026-04-24T07:11:14Z

+            total_gpu_vram_gb: Some(total_vram),
+            gpu_name: Some(format!("{} (×{})", gpu_name, total_gpus)),
+            gpu_count: total_gpus,
+            unified_memory: false, // cluster uses NCCL path, not unified


unified_memory: false is hardcoded with the comment "cluster uses NCCL path, not unified" — but a cluster of Apple Silicon / DGX Spark (GB10) nodes does have unified memory. The flag already exists on ClusterNode, so this could be aggregated (e.g. all-unified → true).

AlexsJones · 2026-04-24T07:11:14Z

+                hostname,
+                ip,
+                gpu_name: "GPU".to_string(), // Ray doesn't report GPU model
+                gpu_vram_gb: gpu_count * 80.0, // rough default; user can correct


Ray discovery hardcodes 80GB VRAM per GPU

gpu_vram_gb: gpu_count * 80.0

This is reasonable as a fallback, but the interactive flow then applies a single VRAM override to all nodes uniformly. For mixed clusters (e.g. A100 + H100) this silently produces wrong results. Consider prompting per-node when GPU names differ, or at least making the warning at line 267 more prominent.

AlexsJones · 2026-04-24T07:11:14Z

+
+// ── Helpers ────────────────────────────────────────────────────────
+
+fn dirs_path() -> Option<PathBuf> {


dirs_path() reimplements what the dirs crate provides

The dirs crate is added to llmfit-tui but not used here — this manually reads $HOME/$USERPROFILE instead of using dirs::config_dir(). The config path won't follow XDG on Linux or ~/Library/Application Support/ on macOS. Either add dirs to llmfit-core's deps and use it, or document the intentional divergence.

AlexsJones · 2026-04-24T07:11:14Z

+                    head_ip.clone()
+                } else {
+                    let default_ip =
+                        increment_ip(&head_ip, i as u8).unwrap_or_else(|_| head_ip.clone());


Nit: i as u8 silently wraps for clusters with 256+ nodes. u8::try_from(i).map_err(...) would surface a clear error instead of generating a bogus IP.

AlexsJones · 2026-04-24T07:11:28Z

+/// Deduplicate `HfModelEntry` values that share the same canonical name.
+///
+/// When two entries share a name, the "better" metadata is kept: higher
+/// parameter counts, larger context length, etc.  Capabilities and GGUF


dedupe_hf_entries is defined and tested but never called from production code.

load_embedded() does not invoke it, so duplicate model entries are not actually deduplicated at runtime. Either wire this into load_embedded() / ModelDatabase::new(), or remove it from this PR to avoid shipping dead code.

AlexsJones · 2026-04-24T07:11:36Z

+        assert_eq!(merged.min_ram_gb, 12.0);
+        assert_eq!(merged.recommended_ram_gb, 24.0);
+        assert_eq!(merged.min_vram_gb, Some(10.0));
+        assert_eq!(merged.context_length, 65536);


Nit: this test asserts on "Qwen/Qwen3-Coder-Next" which is model database content. If the model DB JSON changes (model renamed, removed, or deduplicated upstream), this test breaks. Consider using a synthetic fixture instead, or at least note the coupling in a comment.

AlexsJones · 2026-04-24T07:11:47Z

+        // Explicit opt-out — skip cluster entirely
+    } else if use_cluster {
+        // Explicit opt-in — load and use cluster config
+        if let Some(cluster) = ClusterConfig::load() {


Minor: when neither --cluster nor --no-cluster is passed, ClusterConfig::load() reads and parses the TOML file on every command invocation just to print a "Note" to stderr. Consider gating on a cheaper existence check (config_path().map(|p| p.exists())) instead of deserializing the file each time.

three-foxes-in-a-trenchcoat · 2026-04-26T19:32:22Z

Reviewed the diff. Solid cluster infrastructure overall.

Good stuff:

Clean ClusterNode/ClusterConfig structs with TOML persistence
Ray Dashboard auto-discovery with sensible fallback to manual config
dedupe_hf_entries fixes real model deduplication issues
CLI integration with --cluster/--no-cluster flags is well done

Issues flagged by AlexsJones (CHANGES_REQUESTED) are the blockers. Two I agree with:

Hardcoded GpuBackend::Cuda in to_system_specs() excludes ROCm/AMD clusters
VRAM defaulting to 80 GB/GPU when Ray doesnt report it is a dangerous guess for non-DGX hardware

One nit: run_download() hardcodes (false, false) for cluster flags so budget calc ignores cluster mode. Should probably use the actual flags.

Waiting on author to address AlexsJones change requests before this moves forward.

Adds ClusterConfig and ClusterNode types for representing multi-node clusters with TOML persistence, IP/hostname validation, Ray Dashboard auto-discovery, and CLI/TUI integration via --cluster / --no-cluster flags plus an `llmfit cluster {init,status,clear}` subcommand. Addresses all review feedback from PR AlexsJones#279: * Fix `total_vram_gb()` to multiply per-GPU VRAM by `gpu_count` per node (previously reported only one GPU per node, so an 8x A100 80GB node was reported as 80GB instead of 640GB). * Replace hardcoded `GpuBackend::Cuda` with per-node `Option<GpuBackend>` on `ClusterNode` plus `derive_cluster_backend()` helper that warns and defaults to CUDA when nodes disagree or omit the field. Backward-compatible with TOML configs that don't specify the field. * Replace hardcoded `unified_memory: false` in `to_system_specs()` with an aggregation across nodes (Apple Silicon / DGX Spark GB10 clusters now detected correctly). * Wire `dedupe_hf_entries` into `load_embedded()` so duplicate model entries are actually deduplicated at runtime instead of being defined and tested but never called from production code. * Replace `gpu_vram_gb: gpu_count * 80.0` Ray-discovery hardcode with a per-node prompt; emit a prominent warning when GPUs differ across nodes and prompt per-node, otherwise prompt once for uniform clusters. * Replace handwritten `dirs_path()` helper with `dirs::config_dir()`, matching the platform-correct config location pattern introduced upstream in AlexsJones#437. * Replace `i as u8` with `u8::try_from(i)` for IP-increment in interactive init so 256+ node clusters get a clear error instead of silently generating wrong IPs. * Make `detect_specs()` lazy-load: stat the cluster config path before parsing the TOML so it isn't deserialized on every command invocation. * Thread `--cluster` / `--no-cluster` flags through all callers (download/recommend/diff/system/info/list/run) via `HardwareOverrides` instead of hardcoding `(false, false)`. Opportunistic improvements ported from later spark-cluster iterations: * Extract the 0.85 RAM availability factor to a named constant `CLUSTER_AVAILABLE_RAM_FACTOR` with a doc comment explaining it accounts for OS/orchestrator overhead. * Warn on non-numeric / out-of-range input during interactive cluster init instead of silently falling back to a default; empty input still accepts the default. Test strategy: * dedupe test uses fully synthetic `HfModelEntry` fixtures (the previous draft coupled to real model database content like `Qwen/Qwen3-Coder-Next`, which would break on routine model database updates). * New unit tests cover gpu_count multiplier, unified-memory aggregation, per-node backend serde roundtrip, backward-compat with backend-less TOML, and `derive_cluster_backend` heterogeneous/homogeneous behavior. Closes AlexsJones#279. 🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier) Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

michaeljabbour · 2026-04-26T22:14:44Z

@AlexsJones @three-foxes-in-a-trenchcoat — thanks for the detailed reviews. Force-pushed 46394c1 rebased cleanly onto current upstream/main (the PR was 109 commits behind, so the rebase regenerates against your most recent main rather than the old base) and addresses every item below.

Blockers (`CHANGES_REQUESTED`)

#	Item	Resolution
1	`total_vram_gb()` ignores `gpu_count` per node	Now `n.gpu_vram_gb * n.gpu_count as f64`; doc comment clarifies `gpu_vram_gb` is per-GPU. New test `test_total_vram_counts_all_gpus_per_node`.
2	Hardcoded `GpuBackend::Cuda` in `to_system_specs()`	Added `backend: Option<GpuBackend>` to `ClusterNode` (lowercase serde, backward-compatible with TOML files that omit it). New `derive_cluster_backend()` helper warns on heterogeneous/unspecified backends and falls back to CUDA. AMD/ROCm clusters now propagate correctly. New tests cover serde roundtrip, missing-field compat, unspecified default, and uniform ROCm.
3	Hardcoded `unified_memory: false`	Now `!nodes.is_empty() && nodes.iter().all(
4	`dedupe_hf_entries` was dead code	Wired into `load_embedded()` so duplicate entries are merged at runtime. Function extended to handle all post-PR `HfModelEntry` fields (`hf_downloads`, `num_attention_heads`, `license`, etc.).
5	Ray discovery `gpu_count * 80.0` hardcode	Now `0.0` from Ray; interactive flow detects whether `gpu_name` values are uniform across nodes — prompts once for homogeneous clusters, prints a prominent warning and prompts per-node for heterogeneous ones. No silent 80 GB defaults anywhere.
6	`run_download` hardcoded `(false, false)` for cluster flags	All callers now thread `cluster` / `no_cluster` through `HardwareOverrides`; download/recommend/diff/system/info/list/run all honor the flags.

Nits

#	Item	Resolution
7	`dirs_path()` reimplements the `dirs` crate	Deleted; uses `dirs::config_dir().map(
8	`i as u8` silently wraps for 256+ node clusters	Replaced with `u8::try_from(i)`; on overflow we keep the head IP and emit a clear warning instead of generating bogus addresses.
9	`detect_specs()` deserialized `cluster.toml` on every command	Now `path.exists()` (cheap `stat(2)`) gates the parse — TOML is only deserialized when actually used.
10	Test coupled to real model database content	Replaced the `Qwen/Qwen3-Coder-Next` assertion with a fully synthetic `HfModelEntry` fixture (`test_dedupe_hf_entries_merges_duplicate_metadata`) — no longer breaks on routine model database updates.

Opportunistic improvements

0.85 RAM availability factor extracted to CLUSTER_AVAILABLE_RAM_FACTOR constant with a doc comment explaining the OS/orchestrator overhead.
Interactive cluster init now warns on non-numeric / out-of-range input instead of silently falling back; empty input still accepts the default.

Verification

cargo build: clean
cargo test --all: 363 passed, 0 failed, 1 ignored (12 new tests added)
cargo clippy: zero warnings introduced; pre-existing warnings on fit.rs/providers.rs/hardware.rs/update.rs/models.rs are upstream's and outside this PR's scope
cargo fmt --all: applied

Diff at a glance

 Cargo.lock                 |    1 +
 llmfit-core/Cargo.toml     |    1 +
 llmfit-core/src/cluster.rs | (758 → 1192 lines, review fixes + new tests)
 llmfit-core/src/lib.rs     |    1 +
 llmfit-core/src/models.rs  |  +225 (dedupe + synthetic test)
 llmfit-tui/src/main.rs     |  +133 (cluster CLI integration)

6 files, +1548 / −5. Ready for re-review.

Adds ClusterConfig and ClusterNode types for representing multi-node clusters with TOML persistence, IP/hostname validation, Ray Dashboard auto-discovery, and CLI/TUI integration via --cluster / --no-cluster flags plus an `llmfit cluster {init,status,clear}` subcommand. Addresses all review feedback from PR AlexsJones#279: * Fix `total_vram_gb()` to multiply per-GPU VRAM by `gpu_count` per node (previously reported only one GPU per node, so an 8x A100 80GB node was reported as 80GB instead of 640GB). * Replace hardcoded `GpuBackend::Cuda` with per-node `Option<GpuBackend>` on `ClusterNode` plus `derive_cluster_backend()` helper that warns and defaults to CUDA when nodes disagree or omit the field. Backward-compatible with TOML configs that don't specify the field. * Replace hardcoded `unified_memory: false` in `to_system_specs()` with an aggregation across nodes (Apple Silicon / DGX Spark GB10 clusters now detected correctly). * Wire `dedupe_hf_entries` into `load_embedded()` so duplicate model entries are actually deduplicated at runtime instead of being defined and tested but never called from production code. * Replace `gpu_vram_gb: gpu_count * 80.0` Ray-discovery hardcode with a per-node prompt; emit a prominent warning when GPUs differ across nodes and prompt per-node, otherwise prompt once for uniform clusters. * Replace handwritten `dirs_path()` helper with `dirs::config_dir()`, matching the platform-correct config location pattern introduced upstream in AlexsJones#437. * Replace `i as u8` with `u8::try_from(i)` for IP-increment in interactive init so 256+ node clusters get a clear error instead of silently generating wrong IPs. * Make `detect_specs()` lazy-load: stat the cluster config path before parsing the TOML so it isn't deserialized on every command invocation. * Thread `--cluster` / `--no-cluster` flags through all callers (download/recommend/diff/system/info/list/run) via `HardwareOverrides` instead of hardcoding `(false, false)`. Opportunistic improvements ported from later spark-cluster iterations: * Extract the 0.85 RAM availability factor to a named constant `CLUSTER_AVAILABLE_RAM_FACTOR` with a doc comment explaining it accounts for OS/orchestrator overhead. * Warn on non-numeric / out-of-range input during interactive cluster init instead of silently falling back to a default; empty input still accepts the default. Test strategy: * dedupe test uses fully synthetic `HfModelEntry` fixtures (the previous draft coupled to real model database content like `Qwen/Qwen3-Coder-Next`, which would break on routine model database updates). * New unit tests cover gpu_count multiplier, unified-memory aggregation, per-node backend serde roundtrip, backward-compat with backend-less TOML, and `derive_cluster_backend` heterogeneous/homogeneous behavior. Closes AlexsJones#279. 🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier) Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

michaeljabbour · 2026-04-27T11:25:34Z

Rebased onto current upstream/main (92d0f63) — clean rebase, no conflicts. New commit: 5cdb59e.

6 files, +1548 / −5 (unchanged scope)
371 tests passing (up from 363; +8 from the new cluster-related tests)
cargo fmt, cargo build, cargo clippy, full test suite all green
Confirmed dirs::config_dir() usage in cluster.rs matches the post-refactor: use dirs crate for platform-correct config and cache paths #437 pattern (same as theme.rs); no shared helper to adopt

Ready to merge whenever convenient.

three-foxes-in-a-trenchcoat

Re-review after Michael's comprehensive fix pass — LGTM.

All blockers addressed:

✅ total_vram_gb() now accounts for gpu_count per node
✅ Hardcoded GpuBackend::Cuda replaced with per-node backend: Option<GpuBackend> + derive_cluster_backend()
✅ unified_memory properly derived from nodes
✅ dedupe_hf_entries wired into load_embedded()
✅ Ray discovery no longer hardcodes 80 GB — prompts interactively
✅ run_download threads cluster flags properly
✅ All nits addressed (dirs crate, u8 overflow, lazy TOML parse, synthetic test)

CI green across all platforms (Ubuntu/macOS/Windows), clippy clean, fmt applied, 371 tests passing. This is ready — just needs the merge conflict resolved (DIRTY state).

Adds ClusterConfig and ClusterNode types for representing multi-node clusters with TOML persistence, IP/hostname validation, Ray Dashboard auto-discovery, and CLI/TUI integration via --cluster / --no-cluster flags plus an `llmfit cluster {init,status,clear}` subcommand. Addresses all review feedback from PR AlexsJones#279: * Fix `total_vram_gb()` to multiply per-GPU VRAM by `gpu_count` per node (previously reported only one GPU per node, so an 8x A100 80GB node was reported as 80GB instead of 640GB). * Replace hardcoded `GpuBackend::Cuda` with per-node `Option<GpuBackend>` on `ClusterNode` plus `derive_cluster_backend()` helper that warns and defaults to CUDA when nodes disagree or omit the field. Backward-compatible with TOML configs that don't specify the field. * Replace hardcoded `unified_memory: false` in `to_system_specs()` with an aggregation across nodes (Apple Silicon / DGX Spark GB10 clusters now detected correctly). * Wire `dedupe_hf_entries` into `load_embedded()` so duplicate model entries are actually deduplicated at runtime instead of being defined and tested but never called from production code. * Replace `gpu_vram_gb: gpu_count * 80.0` Ray-discovery hardcode with a per-node prompt; emit a prominent warning when GPUs differ across nodes and prompt per-node, otherwise prompt once for uniform clusters. * Replace handwritten `dirs_path()` helper with `dirs::config_dir()`, matching the platform-correct config location pattern introduced upstream in AlexsJones#437. * Replace `i as u8` with `u8::try_from(i)` for IP-increment in interactive init so 256+ node clusters get a clear error instead of silently generating wrong IPs. * Make `detect_specs()` lazy-load: stat the cluster config path before parsing the TOML so it isn't deserialized on every command invocation. * Thread `--cluster` / `--no-cluster` flags through all callers (download/recommend/diff/system/info/list/run) via `HardwareOverrides` instead of hardcoding `(false, false)`. Opportunistic improvements ported from later spark-cluster iterations: * Extract the 0.85 RAM availability factor to a named constant `CLUSTER_AVAILABLE_RAM_FACTOR` with a doc comment explaining it accounts for OS/orchestrator overhead. * Warn on non-numeric / out-of-range input during interactive cluster init instead of silently falling back to a default; empty input still accepts the default. Test strategy: * dedupe test uses fully synthetic `HfModelEntry` fixtures (the previous draft coupled to real model database content like `Qwen/Qwen3-Coder-Next`, which would break on routine model database updates). * New unit tests cover gpu_count multiplier, unified-memory aggregation, per-node backend serde roundtrip, backward-compat with backend-less TOML, and `derive_cluster_backend` heterogeneous/homogeneous behavior. Closes AlexsJones#279. 🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier) Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

michaeljabbour · 2026-04-27T21:23:04Z

@three-foxes-in-a-trenchcoat — thanks for the re-review and approval!

Pushed 6d9df6b rebased onto current upstream/main (928390b); 7 new commits landed upstream during review (the benchmarks view + hardware picker + leaderboard API), which is what flipped the merge state to DIRTY again. Conflicts were both-added independent additions — fully mechanical resolution:

llmfit-core/src/lib.rs — kept both pub mod benchmarks; and pub mod cluster; (alphabetical)
llmfit-tui/src/main.rs — kept upstream's api_key field alongside our cluster / no_cluster flags

Verified locally: cargo fmt, cargo build, cargo test --all (352 passing), cargo clippy all green; no new warnings on the cluster code. CI re-running on the new SHA.

@AlexsJones — ready when you are. All your inline comments addressed in 46394c1, Three Foxes' re-approval is on 5cdb59e (the same code, just rebased one more time onto current main).

three-foxes-in-a-trenchcoat

Re-review after author rebased. All blockers from both my review and @AlexsJones change requests have been addressed in 46394c1:

✅ Hardcoded Cuda backend fixed — now per-node with ROCm support
✅ VRAM 80 GB default removed — uses Ray discovery or interactive prompts
✅ VRAM per-gpu-count fixed
✅ unified_memory properly handled
✅ dedupe_hf_entries wired up
✅ run_download threads cluster flags through all commands
✅ dirs crate usage, u8 overflow, stat(2) gating all fixed

CI is green, merge state is CLEAN. My approval still stands from the previous rebase. @AlexsJones the code addressing your change requests landed in 46394c1 — ready for your re-review.

…etrics Adds the `llmfit bench` subcommand for live inference benchmarking against running providers (Ollama, vLLM, MLX), with role-based quality scoring, agentic test suites, and routing matrix. TUI integration provides a results view and a routing/role view alongside the existing community benchmarks view. Addresses review feedback: * Resolved merge conflicts against current `upstream/main` (~135 commits behind original base; rebased cleanly onto v0.9.18 tip via squash). * Migrated `serde_yaml` (deprecated) to `serde_yml` per Three Foxes' note (`0.0.x` series; drop-in API). * `OLLAMA_HOST` env var override is wired through bench.rs and the TUI start path, paralleling the existing `VLLM_PORT` override. * Renamed `bench_scroll` → `live_bench_scroll` in the App struct to avoid silent shadowing of the upstream community-leaderboard `bench_scroll` field added in #4d0ffa7. * Live-bench keybinding is `B` (uppercase); upstream's `'b'` already opens the community leaderboard view. Cross-PR coordination with AlexsJones#279 (cluster): * `dedupe_hf_entries` is the cluster PR's complete version (handles all current upstream `HfModelEntry` fields including `num_attention_heads`, `num_key_value_heads`, `num_hidden_layers`, `head_dim`, `license`), wired into `load_embedded()`. Uses `canonical_slug()` as the dedup key. Synthetic `test_dedupe_hf_entries_merges_duplicate_metadata` test is included; the prior data-coupled `test_model_database_deduplicates_…` test is intentionally NOT carried forward (fragile to model DB churn). Test strategy: * `cargo fmt --all`: clean. * `cargo build`: clean. * `cargo test --all`: 350 passing, 0 failed, 1 ignored. * `cargo clippy`: no new warnings on bench/quality code or modified files; pre-existing upstream warnings (fit.rs, providers.rs, hardware.rs, update.rs) untouched. Skipped scope-creep items from review: * Splitting `benchmarks.yaml` (1456 lines) into per-role files — left as a follow-up. Closes AlexsJones#278. 🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier) Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

…etrics (#278) Adds the `llmfit bench` subcommand for live inference benchmarking against running providers (Ollama, vLLM, MLX), with role-based quality scoring, agentic test suites, and routing matrix. TUI integration provides a results view and a routing/role view alongside the existing community benchmarks view. Addresses review feedback: * Resolved merge conflicts against current `upstream/main` (~135 commits behind original base; rebased cleanly onto v0.9.18 tip via squash). * Migrated `serde_yaml` (deprecated) to `serde_yml` per Three Foxes' note (`0.0.x` series; drop-in API). * `OLLAMA_HOST` env var override is wired through bench.rs and the TUI start path, paralleling the existing `VLLM_PORT` override. * Renamed `bench_scroll` → `live_bench_scroll` in the App struct to avoid silent shadowing of the upstream community-leaderboard `bench_scroll` field added in #4d0ffa7. * Live-bench keybinding is `B` (uppercase); upstream's `'b'` already opens the community leaderboard view. Cross-PR coordination with #279 (cluster): * `dedupe_hf_entries` is the cluster PR's complete version (handles all current upstream `HfModelEntry` fields including `num_attention_heads`, `num_key_value_heads`, `num_hidden_layers`, `head_dim`, `license`), wired into `load_embedded()`. Uses `canonical_slug()` as the dedup key. Synthetic `test_dedupe_hf_entries_merges_duplicate_metadata` test is included; the prior data-coupled `test_model_database_deduplicates_…` test is intentionally NOT carried forward (fragile to model DB churn). Test strategy: * `cargo fmt --all`: clean. * `cargo build`: clean. * `cargo test --all`: 350 passing, 0 failed, 1 ignored. * `cargo clippy`: no new warnings on bench/quality code or modified files; pre-existing upstream warnings (fit.rs, providers.rs, hardware.rs, update.rs) untouched. Skipped scope-creep items from review: * Splitting `benchmarks.yaml` (1456 lines) into per-role files — left as a follow-up. Closes #278. 🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier) Co-authored-by: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

three-foxes-in-a-trenchcoat

Review

Verdict: Approve

Adds remote GPU cluster support for multi-node model fitting. This is a significant feature:

New cluster.rs module for cluster management
Updates to models.rs for multi-node awareness
TUI integration in main.rs

The feature addresses a real pain point — fitting models across multiple GPUs/nodes. The implementation looks solid:

Clean separation of cluster logic into its own module
Proper integration with existing model fitting pipeline
CI checks all passing (including Rustfmt)

This is a good addition to the project. LGTM.

Adds ClusterConfig and ClusterNode types for representing multi-node clusters with TOML persistence, IP/hostname validation, Ray Dashboard auto-discovery, and CLI/TUI integration via --cluster / --no-cluster flags plus an `llmfit cluster {init,status,clear}` subcommand. Rebased onto current upstream/main (post-PR AlexsJones#278 bench merge): * `dedupe_hf_entries` is now in main from AlexsJones#278 — this rebase relies on the existing function (no duplicate added) since both PRs converged on the same implementation by design. * Cluster CLI integration coexists with the bench CLI integration that landed via AlexsJones#278 — no overlap or competing keybindings. * Cluster module declared in lib.rs alphabetically between `benchmarks` and `fit`, alongside the new `bench` and `quality` modules. All earlier review feedback from PR AlexsJones#279 (Alex + Three Foxes) remains addressed: * `total_vram_gb()` multiplies per-GPU VRAM by `gpu_count` per node. * `GpuBackend` is per-node (`Option<GpuBackend>` on `ClusterNode`) with `derive_cluster_backend()` warning helper; backward-compatible with TOML configs that omit the field. * `unified_memory` aggregated across nodes (Apple Silicon / DGX Spark GB10 clusters detected correctly). * Ray discovery prompts interactively for VRAM (no 80 GB hardcode); prominent warning + per-node prompt when GPU names differ. * Uses `dirs::config_dir()` for cluster.toml location (same pattern as upstream's AlexsJones#437 dirs-crate refactor). * `u8::try_from` for IP increment so 256+ node clusters get a clear error instead of silent wrap. * `detect_specs()` lazy-loads via `path.exists()` before parsing TOML. * `--cluster` / `--no-cluster` threaded through `HardwareOverrides` to all callers (no hardcoded `(false, false)` in download/recommend). * `0.85` RAM availability factor extracted to `CLUSTER_AVAILABLE_RAM_FACTOR` constant. * Non-numeric input warnings during interactive cluster init (empty input still accepts the default). * Dedupe test uses synthetic `HfModelEntry` fixtures (no coupling to real model database content). Test strategy: 376 tests passing, `cargo fmt` clean, `cargo build` clean, `cargo clippy` zero new warnings on cluster code. Closes AlexsJones#279. 🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier) Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

michaeljabbour · 2026-05-03T13:20:43Z

@AlexsJones @three-foxes-in-a-trenchcoat — rebased onto current upstream/main (post-#278 merge). New commit on top of 8896fdb.

What changed since the previous push:

dedupe_hf_entries now lives in main from feat: add llmfit bench for live inference benchmarking #278 (we deliberately ported the cluster PR's improved version into the bench PR for cross-PR convergence) — this rebase no longer adds it; just relies on the existing function.
Cluster CLI integration coexists cleanly with the new bench CLI integration. No overlapping flags or keybindings.
pub mod cluster; added to lib.rs alphabetically between benchmarks and fit (alongside the new bench/quality modules).
Cargo.toml adds toml = "0.8" for cluster config parsing.

All earlier review feedback (Alex's 6 blockers + Three Foxes' 4 nits) remains addressed; spec-tracked in the commit message.

Verification:

cargo fmt --all: clean
cargo build: clean
cargo test --all: 376 passing, 0 failed, 1 ignored
cargo clippy: zero new warnings on cluster code
CI re-running on the new SHA

Ready when you are.

bertwesarg · 2026-06-03T07:17:21Z

is this Hardware Simulation (S)-ready?

michaeljabbour mentioned this pull request Mar 22, 2026

feat: Add DGX Spark cluster support, TP awareness, use-case/availability filters, and live benchmarking #181

Closed

AlexsJones reviewed Mar 22, 2026

View reviewed changes

michaeljabbour force-pushed the feat/remote-cluster branch from 33ff0fd to 3bafae1 Compare March 30, 2026 23:58

michaeljabbour force-pushed the feat/remote-cluster branch from 3bafae1 to 329adec Compare March 30, 2026 23:58

AlexsJones requested changes Apr 24, 2026

View reviewed changes

AlexsJones reviewed Apr 24, 2026

View reviewed changes

michaeljabbour force-pushed the feat/remote-cluster branch from 329adec to 46394c1 Compare April 26, 2026 21:23

michaeljabbour force-pushed the feat/remote-cluster branch from 46394c1 to 5cdb59e Compare April 27, 2026 11:25

three-foxes-in-a-trenchcoat approved these changes Apr 27, 2026

View reviewed changes

michaeljabbour force-pushed the feat/remote-cluster branch from 5cdb59e to 6d9df6b Compare April 27, 2026 21:22

three-foxes-in-a-trenchcoat reviewed Apr 28, 2026

View reviewed changes

michaeljabbour mentioned this pull request May 3, 2026

feat: add llmfit bench for live inference benchmarking #278

Merged

4 tasks

three-foxes-in-a-trenchcoat approved these changes May 3, 2026

View reviewed changes

michaeljabbour force-pushed the feat/remote-cluster branch from 6d9df6b to e4e3635 Compare May 3, 2026 13:20


		// ── Helpers ────────────────────────────────────────────────────────

		fn dirs_path() -> Option<PathBuf> {

Conversation

michaeljabbour commented Mar 22, 2026

Summary

Files changed (11)

Example usage

Test plan

Uh oh!

AlexsJones left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

michaeljabbour commented Mar 30, 2026

Uh oh!

three-foxes-in-a-trenchcoat commented Apr 23, 2026

Uh oh!

AlexsJones commented Apr 24, 2026

Uh oh!

AlexsJones left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

three-foxes-in-a-trenchcoat commented Apr 26, 2026

Uh oh!

michaeljabbour commented Apr 26, 2026

Blockers (CHANGES_REQUESTED)

Nits

Opportunistic improvements

Verification

Diff at a glance

Uh oh!

michaeljabbour commented Apr 27, 2026

Uh oh!

three-foxes-in-a-trenchcoat left a comment

Choose a reason for hiding this comment

Uh oh!

michaeljabbour commented Apr 27, 2026

Uh oh!

three-foxes-in-a-trenchcoat left a comment

Choose a reason for hiding this comment

Uh oh!

three-foxes-in-a-trenchcoat left a comment

Choose a reason for hiding this comment

Review

Uh oh!

michaeljabbour commented May 3, 2026

Uh oh!

bertwesarg commented Jun 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Blockers (`CHANGES_REQUESTED`)