Skip to content

feat: add remote GPU cluster support for multi-node model fitting#279

Open
michaeljabbour wants to merge 1 commit into
AlexsJones:mainfrom
michaeljabbour:feat/remote-cluster
Open

feat: add remote GPU cluster support for multi-node model fitting#279
michaeljabbour wants to merge 1 commit into
AlexsJones:mainfrom
michaeljabbour:feat/remote-cluster

Conversation

@michaeljabbour

Copy link
Copy Markdown
Contributor

Summary

  • Hardware-agnostic remote GPU cluster support (any vendor — NVIDIA, AMD, etc.)
  • llmfit cluster init — interactive setup wizard prompting for IPs, VRAM, GPU name, RAM, cores
  • llmfit cluster status / llmfit cluster clear — view/remove config
  • --cluster / --no-cluster CLI flags override local hardware detection
  • Ray Dashboard API auto-discovery with manual fallback
  • Config persisted to ~/.config/llmfit/cluster.toml
  • TensorParallel run mode: cluster-aware fit analysis with vLLM runtime
  • Proper IPv4 validation, hostname validation, error messages on bad input

Files changed (11)

File Change
llmfit-core/src/cluster.rs New cluster module — ClusterConfig, ClusterNode, Ray discovery, interactive init, TOML persistence, tests (739 lines)
llmfit-core/src/fit.rs TensorParallel run mode, cluster fit path (+48)
llmfit-core/src/hardware.rs cluster_mode, cluster_node_count fields (+10)
llmfit-core/src/lib.rs Module export (+1)
llmfit-core/Cargo.toml toml dependency (+1)
llmfit-core/src/plan.rs Test constructor updates (+2)
llmfit-tui/src/main.rs Cluster subcommand, --cluster/--no-cluster flags, detect_specs updates (+131)
llmfit-tui/Cargo.toml dirs dependency (+1)
llmfit-tui/src/serve_api.rs TensorParallel in run_mode_code (+1)
llmfit-tui/src/tui_ui.rs TensorParallel color mapping (+2)

Example usage

# Interactive cluster setup (prompts for IPs, VRAM, GPU specs)
llmfit cluster init

# Use cluster for fit analysis
llmfit fit --cluster

# View cluster config
llmfit cluster status

# Example cluster.toml (2× A100 80GB nodes)
# [nodes.0]
# hostname = "node-1"
# ip = "10.0.0.1"
# gpu_name = "A100"
# gpu_vram_gb = 80.0
# ...

Test plan

  • All existing tests pass (253 passed, 1 pre-existing failure unrelated)
  • 16 new unit tests: IP increment/overflow, IP/hostname validation, cluster totals, to_system_specs, TOML roundtrip
  • Manual: run llmfit cluster init, verify config saved
  • Manual: run llmfit fit --cluster, verify aggregated VRAM in output

Split from #181 — see closing comment for context on the split.

🤖 Generated with Claude Code

@AlexsJones AlexsJones left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice feature concept! Left some inline suggestions. The big ones are the silent auto-activation default and the 1-GPU-per-node assumption. Rest is minor cleanup.

pub fn total_gpu_count(&self) -> usize {
self.nodes.len()
}

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean struct design. Consider adding a gpu_count: u32 field here so multi-GPU nodes (e.g. 8xA100 DGX) are counted correctly. Right now total_gpu_count() returns nodes.len() which underreports on common cluster hardware.

Comment thread llmfit-core/src/cluster.rs Outdated
.unwrap_or(false);

let backend = if unified {
GpuBackend::Cuda // unified memory NVIDIA (e.g. DGX Spark GB10)

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both branches return GpuBackend::Cuda here, so the if/else is a no-op. Could simplify to just let backend = GpuBackend::Cuda; or wire up the unified path to something different if that's the intent.

Comment thread llmfit-core/src/cluster.rs Outdated
hostname,
ip,
gpu_name: "GPU".to_string(), // Ray doesn't report GPU model
gpu_vram_gb: gpu_count * 80.0, // rough default; user can correct

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good fallback! The gpu_count * 80.0 default is a bit fragile though. Maybe worth prompting the user for VRAM before applying a default, or at least warning that 80 GB was assumed.

Comment thread llmfit-core/src/cluster.rs Outdated
// ── Helpers ────────────────────────────────────────────────────────

fn dirs_path() -> Option<PathBuf> {
std::env::var("HOME")

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This manually reads $HOME but the dirs crate was added to llmfit-tui/Cargo.toml. Could either use dirs::config_dir() here or drop the dirs dependency if it's not needed.

Comment thread llmfit-tui/src/main.rs
},

/// Manage remote GPU cluster configuration
Cluster {

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heads up: this means cluster mode activates silently whenever a cluster.toml exists, even without --cluster. Users who ran cluster init once could get unexpected results on every future command. Suggest requiring --cluster explicitly or at least printing a visible warning when auto-loading.

Comment thread llmfit-tui/src/main.rs
port: u16,
},

/// Manage remote GPU cluster configuration

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice that the escape hatch exists! Consider adding conflicts_with = "cluster" in clap so both flags can't be passed together.

michaeljabbour added a commit to michaeljabbour/llmfit-spark that referenced this pull request Mar 22, 2026
- Add gpu_count field to ClusterNode for multi-GPU nodes (8xA100 DGX)
- Fix total_gpu_count() to sum gpu_count per node, not just node count
- Remove no-op if/else for backend (both branches returned Cuda)
- Add VRAM default warning when Ray doesn't report GPU memory
- Require --cluster flag explicitly (no silent auto-activation from cluster.toml)
- Add conflicts_with between --cluster and --no-cluster
- Add USERPROFILE fallback in dirs_path() for Windows
- Fix flaky dedup test assertion in models.rs
- Add GPUs-per-node prompt in interactive cluster init

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
michaeljabbour added a commit to michaeljabbour/llmfit-spark that referenced this pull request Mar 30, 2026
- Add gpu_count field to ClusterNode for multi-GPU nodes (8xA100 DGX)
- Fix total_gpu_count() to sum gpu_count per node, not just node count
- Remove no-op if/else for backend (both branches returned Cuda)
- Add VRAM default warning when Ray doesn't report GPU memory
- Require --cluster flag explicitly (no silent auto-activation from cluster.toml)
- Add conflicts_with between --cluster and --no-cluster
- Add USERPROFILE fallback in dirs_path() for Windows
- Fix flaky dedup test assertion in models.rs
- Add GPUs-per-node prompt in interactive cluster init

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@michaeljabbour

Copy link
Copy Markdown
Contributor Author

Rebased onto latest main, resolved merge conflicts, ran cargo fmt, and fixed a test failure (implemented missing dedupe_hf_entries function). All 289 tests pass. Let me know if there's anything else you'd like me to do, @AlexsJones!

@three-foxes-in-a-trenchcoat

Copy link
Copy Markdown
Collaborator

Big feature, Michael. The cluster module is well-structured.

Verdict: LGTM

Key things I like:

  • Clean ClusterConfigSystemSpecs conversion path
  • Proper TOML persistence with dirs for the config path
  • Good validation on IP/hostname input
  • 16 unit tests covering edge cases (IP overflow, roundtrip, etc.)
  • --cluster / --no-cluster CLI flags are the right UX

Minor notes:

  • to_system_specs() assumes all nodes have the same GPU type. That's reasonable for most use cases but worth documenting.
  • The conservative available_ram_gb: total_ram * 0.85 estimate is sensible.

Solid work. Ready to merge.

@AlexsJones

Copy link
Copy Markdown
Owner

Please rebase and we can continue @michaeljabbour

@AlexsJones AlexsJones left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cluster infrastructure is solid and well-tested for the common case (homogeneous NVIDIA clusters). Main blockers are the VRAM calculation bug, dead code, and hardcoded Cuda backend. See inline comments.

Comment thread llmfit-core/src/cluster.rs Outdated
}

pub fn total_vram_gb(&self) -> f64 {
self.nodes.iter().map(|n| n.gpu_vram_gb).sum()

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: total_vram_gb doesn't account for gpu_count

A node with 8× A100 80GB would report 80 GB total instead of 640 GB. Should be:

self.nodes.iter().map(|n| n.gpu_vram_gb * n.gpu_count as f64).sum()

The field name gpu_vram_gb is ambiguous — is it per-GPU or per-node? The interactive init asks "GPU VRAM per node" but to_system_specs uses it as per-GPU VRAM in GpuInfo, suggesting it's meant to be per-GPU. Either way, this sum needs the multiplier.

Comment thread llmfit-core/src/cluster.rs Outdated
.map(|n| n.unified_memory)
.unwrap_or(false);

let backend = GpuBackend::Cuda;

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hardcoded GpuBackend::Cuda contradicts "hardware-agnostic" claim

The PR description says it supports any vendor (NVIDIA, AMD, etc.) but an AMD cluster would get the wrong backend here. Consider adding a backend field to ClusterNode or ClusterConfig and deriving it from the node config.

Comment thread llmfit-core/src/cluster.rs Outdated
total_gpu_vram_gb: Some(total_vram),
gpu_name: Some(format!("{} (×{})", gpu_name, total_gpus)),
gpu_count: total_gpus,
unified_memory: false, // cluster uses NCCL path, not unified

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unified_memory: false is hardcoded with the comment "cluster uses NCCL path, not unified" — but a cluster of Apple Silicon / DGX Spark (GB10) nodes does have unified memory. The flag already exists on ClusterNode, so this could be aggregated (e.g. all-unified → true).

Comment thread llmfit-core/src/cluster.rs Outdated
hostname,
ip,
gpu_name: "GPU".to_string(), // Ray doesn't report GPU model
gpu_vram_gb: gpu_count * 80.0, // rough default; user can correct

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ray discovery hardcodes 80GB VRAM per GPU

gpu_vram_gb: gpu_count * 80.0

This is reasonable as a fallback, but the interactive flow then applies a single VRAM override to all nodes uniformly. For mixed clusters (e.g. A100 + H100) this silently produces wrong results. Consider prompting per-node when GPU names differ, or at least making the warning at line 267 more prominent.

Comment thread llmfit-core/src/cluster.rs Outdated

// ── Helpers ────────────────────────────────────────────────────────

fn dirs_path() -> Option<PathBuf> {

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dirs_path() reimplements what the dirs crate provides

The dirs crate is added to llmfit-tui but not used here — this manually reads $HOME/$USERPROFILE instead of using dirs::config_dir(). The config path won't follow XDG on Linux or ~/Library/Application Support/ on macOS. Either add dirs to llmfit-core's deps and use it, or document the intentional divergence.

Comment thread llmfit-core/src/cluster.rs Outdated
head_ip.clone()
} else {
let default_ip =
increment_ip(&head_ip, i as u8).unwrap_or_else(|_| head_ip.clone());

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: i as u8 silently wraps for clusters with 256+ nodes. u8::try_from(i).map_err(...) would surface a clear error instead of generating a bogus IP.

Comment thread llmfit-core/src/models.rs Outdated
/// Deduplicate `HfModelEntry` values that share the same canonical name.
///
/// When two entries share a name, the "better" metadata is kept: higher
/// parameter counts, larger context length, etc. Capabilities and GGUF

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dedupe_hf_entries is defined and tested but never called from production code.

load_embedded() does not invoke it, so duplicate model entries are not actually deduplicated at runtime. Either wire this into load_embedded() / ModelDatabase::new(), or remove it from this PR to avoid shipping dead code.

Comment thread llmfit-core/src/models.rs Outdated
assert_eq!(merged.min_ram_gb, 12.0);
assert_eq!(merged.recommended_ram_gb, 24.0);
assert_eq!(merged.min_vram_gb, Some(10.0));
assert_eq!(merged.context_length, 65536);

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this test asserts on "Qwen/Qwen3-Coder-Next" which is model database content. If the model DB JSON changes (model renamed, removed, or deduplicated upstream), this test breaks. Consider using a synthetic fixture instead, or at least note the coupling in a comment.

Comment thread llmfit-tui/src/main.rs Outdated
// Explicit opt-out — skip cluster entirely
} else if use_cluster {
// Explicit opt-in — load and use cluster config
if let Some(cluster) = ClusterConfig::load() {

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: when neither --cluster nor --no-cluster is passed, ClusterConfig::load() reads and parses the TOML file on every command invocation just to print a "Note" to stderr. Consider gating on a cheaper existence check (config_path().map(|p| p.exists())) instead of deserializing the file each time.

@three-foxes-in-a-trenchcoat

Copy link
Copy Markdown
Collaborator

Reviewed the diff. Solid cluster infrastructure overall.

Good stuff:

  • Clean ClusterNode/ClusterConfig structs with TOML persistence
  • Ray Dashboard auto-discovery with sensible fallback to manual config
  • dedupe_hf_entries fixes real model deduplication issues
  • CLI integration with --cluster/--no-cluster flags is well done

Issues flagged by AlexsJones (CHANGES_REQUESTED) are the blockers. Two I agree with:

  • Hardcoded GpuBackend::Cuda in to_system_specs() excludes ROCm/AMD clusters
  • VRAM defaulting to 80 GB/GPU when Ray doesnt report it is a dangerous guess for non-DGX hardware

One nit: run_download() hardcodes (false, false) for cluster flags so budget calc ignores cluster mode. Should probably use the actual flags.

Waiting on author to address AlexsJones change requests before this moves forward.

michaeljabbour added a commit to michaeljabbour/llmfit-spark that referenced this pull request Apr 26, 2026
Adds ClusterConfig and ClusterNode types for representing multi-node
clusters with TOML persistence, IP/hostname validation, Ray Dashboard
auto-discovery, and CLI/TUI integration via --cluster / --no-cluster flags
plus an `llmfit cluster {init,status,clear}` subcommand.

Addresses all review feedback from PR AlexsJones#279:

* Fix `total_vram_gb()` to multiply per-GPU VRAM by `gpu_count` per node
  (previously reported only one GPU per node, so an 8x A100 80GB node was
  reported as 80GB instead of 640GB).
* Replace hardcoded `GpuBackend::Cuda` with per-node `Option<GpuBackend>`
  on `ClusterNode` plus `derive_cluster_backend()` helper that warns and
  defaults to CUDA when nodes disagree or omit the field. Backward-compatible
  with TOML configs that don't specify the field.
* Replace hardcoded `unified_memory: false` in `to_system_specs()` with an
  aggregation across nodes (Apple Silicon / DGX Spark GB10 clusters now
  detected correctly).
* Wire `dedupe_hf_entries` into `load_embedded()` so duplicate model entries
  are actually deduplicated at runtime instead of being defined and tested
  but never called from production code.
* Replace `gpu_vram_gb: gpu_count * 80.0` Ray-discovery hardcode with a
  per-node prompt; emit a prominent warning when GPUs differ across nodes
  and prompt per-node, otherwise prompt once for uniform clusters.
* Replace handwritten `dirs_path()` helper with `dirs::config_dir()`,
  matching the platform-correct config location pattern introduced upstream
  in AlexsJones#437.
* Replace `i as u8` with `u8::try_from(i)` for IP-increment in interactive
  init so 256+ node clusters get a clear error instead of silently
  generating wrong IPs.
* Make `detect_specs()` lazy-load: stat the cluster config path before
  parsing the TOML so it isn't deserialized on every command invocation.
* Thread `--cluster` / `--no-cluster` flags through all callers
  (download/recommend/diff/system/info/list/run) via `HardwareOverrides`
  instead of hardcoding `(false, false)`.

Opportunistic improvements ported from later spark-cluster iterations:

* Extract the 0.85 RAM availability factor to a named constant
  `CLUSTER_AVAILABLE_RAM_FACTOR` with a doc comment explaining it accounts
  for OS/orchestrator overhead.
* Warn on non-numeric / out-of-range input during interactive cluster init
  instead of silently falling back to a default; empty input still accepts
  the default.

Test strategy:

* dedupe test uses fully synthetic `HfModelEntry` fixtures (the previous
  draft coupled to real model database content like `Qwen/Qwen3-Coder-Next`,
  which would break on routine model database updates).
* New unit tests cover gpu_count multiplier, unified-memory aggregation,
  per-node backend serde roundtrip, backward-compat with backend-less TOML,
  and `derive_cluster_backend` heterogeneous/homogeneous behavior.

Closes AlexsJones#279.

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
@michaeljabbour

Copy link
Copy Markdown
Contributor Author

@AlexsJones @three-foxes-in-a-trenchcoat — thanks for the detailed reviews. Force-pushed 46394c1 rebased cleanly onto current upstream/main (the PR was 109 commits behind, so the rebase regenerates against your most recent main rather than the old base) and addresses every item below.

Blockers (CHANGES_REQUESTED)

# Item Resolution
1 total_vram_gb() ignores gpu_count per node Now n.gpu_vram_gb * n.gpu_count as f64; doc comment clarifies gpu_vram_gb is per-GPU. New test test_total_vram_counts_all_gpus_per_node.
2 Hardcoded GpuBackend::Cuda in to_system_specs() Added backend: Option<GpuBackend> to ClusterNode (lowercase serde, backward-compatible with TOML files that omit it). New derive_cluster_backend() helper warns on heterogeneous/unspecified backends and falls back to CUDA. AMD/ROCm clusters now propagate correctly. New tests cover serde roundtrip, missing-field compat, unspecified default, and uniform ROCm.
3 Hardcoded unified_memory: false Now `!nodes.is_empty() && nodes.iter().all(
4 dedupe_hf_entries was dead code Wired into load_embedded() so duplicate entries are merged at runtime. Function extended to handle all post-PR HfModelEntry fields (hf_downloads, num_attention_heads, license, etc.).
5 Ray discovery gpu_count * 80.0 hardcode Now 0.0 from Ray; interactive flow detects whether gpu_name values are uniform across nodes — prompts once for homogeneous clusters, prints a prominent warning and prompts per-node for heterogeneous ones. No silent 80 GB defaults anywhere.
6 run_download hardcoded (false, false) for cluster flags All callers now thread cluster / no_cluster through HardwareOverrides; download/recommend/diff/system/info/list/run all honor the flags.

Nits

# Item Resolution
7 dirs_path() reimplements the dirs crate Deleted; uses `dirs::config_dir().map(
8 i as u8 silently wraps for 256+ node clusters Replaced with u8::try_from(i); on overflow we keep the head IP and emit a clear warning instead of generating bogus addresses.
9 detect_specs() deserialized cluster.toml on every command Now path.exists() (cheap stat(2)) gates the parse — TOML is only deserialized when actually used.
10 Test coupled to real model database content Replaced the Qwen/Qwen3-Coder-Next assertion with a fully synthetic HfModelEntry fixture (test_dedupe_hf_entries_merges_duplicate_metadata) — no longer breaks on routine model database updates.

Opportunistic improvements

  • 0.85 RAM availability factor extracted to CLUSTER_AVAILABLE_RAM_FACTOR constant with a doc comment explaining the OS/orchestrator overhead.
  • Interactive cluster init now warns on non-numeric / out-of-range input instead of silently falling back; empty input still accepts the default.

Verification

  • cargo build: clean
  • cargo test --all: 363 passed, 0 failed, 1 ignored (12 new tests added)
  • cargo clippy: zero warnings introduced; pre-existing warnings on fit.rs/providers.rs/hardware.rs/update.rs/models.rs are upstream's and outside this PR's scope
  • cargo fmt --all: applied

Diff at a glance

 Cargo.lock                 |    1 +
 llmfit-core/Cargo.toml     |    1 +
 llmfit-core/src/cluster.rs | (758 → 1192 lines, review fixes + new tests)
 llmfit-core/src/lib.rs     |    1 +
 llmfit-core/src/models.rs  |  +225 (dedupe + synthetic test)
 llmfit-tui/src/main.rs     |  +133 (cluster CLI integration)

6 files, +1548 / −5. Ready for re-review.

michaeljabbour added a commit to michaeljabbour/llmfit-spark that referenced this pull request Apr 27, 2026
Adds ClusterConfig and ClusterNode types for representing multi-node
clusters with TOML persistence, IP/hostname validation, Ray Dashboard
auto-discovery, and CLI/TUI integration via --cluster / --no-cluster flags
plus an `llmfit cluster {init,status,clear}` subcommand.

Addresses all review feedback from PR AlexsJones#279:

* Fix `total_vram_gb()` to multiply per-GPU VRAM by `gpu_count` per node
  (previously reported only one GPU per node, so an 8x A100 80GB node was
  reported as 80GB instead of 640GB).
* Replace hardcoded `GpuBackend::Cuda` with per-node `Option<GpuBackend>`
  on `ClusterNode` plus `derive_cluster_backend()` helper that warns and
  defaults to CUDA when nodes disagree or omit the field. Backward-compatible
  with TOML configs that don't specify the field.
* Replace hardcoded `unified_memory: false` in `to_system_specs()` with an
  aggregation across nodes (Apple Silicon / DGX Spark GB10 clusters now
  detected correctly).
* Wire `dedupe_hf_entries` into `load_embedded()` so duplicate model entries
  are actually deduplicated at runtime instead of being defined and tested
  but never called from production code.
* Replace `gpu_vram_gb: gpu_count * 80.0` Ray-discovery hardcode with a
  per-node prompt; emit a prominent warning when GPUs differ across nodes
  and prompt per-node, otherwise prompt once for uniform clusters.
* Replace handwritten `dirs_path()` helper with `dirs::config_dir()`,
  matching the platform-correct config location pattern introduced upstream
  in AlexsJones#437.
* Replace `i as u8` with `u8::try_from(i)` for IP-increment in interactive
  init so 256+ node clusters get a clear error instead of silently
  generating wrong IPs.
* Make `detect_specs()` lazy-load: stat the cluster config path before
  parsing the TOML so it isn't deserialized on every command invocation.
* Thread `--cluster` / `--no-cluster` flags through all callers
  (download/recommend/diff/system/info/list/run) via `HardwareOverrides`
  instead of hardcoding `(false, false)`.

Opportunistic improvements ported from later spark-cluster iterations:

* Extract the 0.85 RAM availability factor to a named constant
  `CLUSTER_AVAILABLE_RAM_FACTOR` with a doc comment explaining it accounts
  for OS/orchestrator overhead.
* Warn on non-numeric / out-of-range input during interactive cluster init
  instead of silently falling back to a default; empty input still accepts
  the default.

Test strategy:

* dedupe test uses fully synthetic `HfModelEntry` fixtures (the previous
  draft coupled to real model database content like `Qwen/Qwen3-Coder-Next`,
  which would break on routine model database updates).
* New unit tests cover gpu_count multiplier, unified-memory aggregation,
  per-node backend serde roundtrip, backward-compat with backend-less TOML,
  and `derive_cluster_backend` heterogeneous/homogeneous behavior.

Closes AlexsJones#279.

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
@michaeljabbour

Copy link
Copy Markdown
Contributor Author

Rebased onto current upstream/main (92d0f63) — clean rebase, no conflicts. New commit: 5cdb59e.

  • 6 files, +1548 / −5 (unchanged scope)
  • 371 tests passing (up from 363; +8 from the new cluster-related tests)
  • cargo fmt, cargo build, cargo clippy, full test suite all green
  • Confirmed dirs::config_dir() usage in cluster.rs matches the post-refactor: use dirs crate for platform-correct config and cache paths #437 pattern (same as theme.rs); no shared helper to adopt

Ready to merge whenever convenient.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review after Michael's comprehensive fix pass — LGTM.

All blockers addressed:

  • total_vram_gb() now accounts for gpu_count per node
  • ✅ Hardcoded GpuBackend::Cuda replaced with per-node backend: Option<GpuBackend> + derive_cluster_backend()
  • unified_memory properly derived from nodes
  • dedupe_hf_entries wired into load_embedded()
  • ✅ Ray discovery no longer hardcodes 80 GB — prompts interactively
  • run_download threads cluster flags properly
  • ✅ All nits addressed (dirs crate, u8 overflow, lazy TOML parse, synthetic test)

CI green across all platforms (Ubuntu/macOS/Windows), clippy clean, fmt applied, 371 tests passing. This is ready — just needs the merge conflict resolved (DIRTY state).

michaeljabbour added a commit to michaeljabbour/llmfit-spark that referenced this pull request Apr 27, 2026
Adds ClusterConfig and ClusterNode types for representing multi-node
clusters with TOML persistence, IP/hostname validation, Ray Dashboard
auto-discovery, and CLI/TUI integration via --cluster / --no-cluster flags
plus an `llmfit cluster {init,status,clear}` subcommand.

Addresses all review feedback from PR AlexsJones#279:

* Fix `total_vram_gb()` to multiply per-GPU VRAM by `gpu_count` per node
  (previously reported only one GPU per node, so an 8x A100 80GB node was
  reported as 80GB instead of 640GB).
* Replace hardcoded `GpuBackend::Cuda` with per-node `Option<GpuBackend>`
  on `ClusterNode` plus `derive_cluster_backend()` helper that warns and
  defaults to CUDA when nodes disagree or omit the field. Backward-compatible
  with TOML configs that don't specify the field.
* Replace hardcoded `unified_memory: false` in `to_system_specs()` with an
  aggregation across nodes (Apple Silicon / DGX Spark GB10 clusters now
  detected correctly).
* Wire `dedupe_hf_entries` into `load_embedded()` so duplicate model entries
  are actually deduplicated at runtime instead of being defined and tested
  but never called from production code.
* Replace `gpu_vram_gb: gpu_count * 80.0` Ray-discovery hardcode with a
  per-node prompt; emit a prominent warning when GPUs differ across nodes
  and prompt per-node, otherwise prompt once for uniform clusters.
* Replace handwritten `dirs_path()` helper with `dirs::config_dir()`,
  matching the platform-correct config location pattern introduced upstream
  in AlexsJones#437.
* Replace `i as u8` with `u8::try_from(i)` for IP-increment in interactive
  init so 256+ node clusters get a clear error instead of silently
  generating wrong IPs.
* Make `detect_specs()` lazy-load: stat the cluster config path before
  parsing the TOML so it isn't deserialized on every command invocation.
* Thread `--cluster` / `--no-cluster` flags through all callers
  (download/recommend/diff/system/info/list/run) via `HardwareOverrides`
  instead of hardcoding `(false, false)`.

Opportunistic improvements ported from later spark-cluster iterations:

* Extract the 0.85 RAM availability factor to a named constant
  `CLUSTER_AVAILABLE_RAM_FACTOR` with a doc comment explaining it accounts
  for OS/orchestrator overhead.
* Warn on non-numeric / out-of-range input during interactive cluster init
  instead of silently falling back to a default; empty input still accepts
  the default.

Test strategy:

* dedupe test uses fully synthetic `HfModelEntry` fixtures (the previous
  draft coupled to real model database content like `Qwen/Qwen3-Coder-Next`,
  which would break on routine model database updates).
* New unit tests cover gpu_count multiplier, unified-memory aggregation,
  per-node backend serde roundtrip, backward-compat with backend-less TOML,
  and `derive_cluster_backend` heterogeneous/homogeneous behavior.

Closes AlexsJones#279.

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
@michaeljabbour

Copy link
Copy Markdown
Contributor Author

@three-foxes-in-a-trenchcoat — thanks for the re-review and approval!

Pushed 6d9df6b rebased onto current upstream/main (928390b); 7 new commits landed upstream during review (the benchmarks view + hardware picker + leaderboard API), which is what flipped the merge state to DIRTY again. Conflicts were both-added independent additions — fully mechanical resolution:

  • llmfit-core/src/lib.rs — kept both pub mod benchmarks; and pub mod cluster; (alphabetical)
  • llmfit-tui/src/main.rs — kept upstream's api_key field alongside our cluster / no_cluster flags

Verified locally: cargo fmt, cargo build, cargo test --all (352 passing), cargo clippy all green; no new warnings on the cluster code. CI re-running on the new SHA.

@AlexsJones — ready when you are. All your inline comments addressed in 46394c1, Three Foxes' re-approval is on 5cdb59e (the same code, just rebased one more time onto current main).

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review after author rebased. All blockers from both my review and @AlexsJones change requests have been addressed in 46394c1:

  • ✅ Hardcoded Cuda backend fixed — now per-node with ROCm support
  • ✅ VRAM 80 GB default removed — uses Ray discovery or interactive prompts
  • ✅ VRAM per-gpu-count fixed
  • ✅ unified_memory properly handled
  • ✅ dedupe_hf_entries wired up
  • ✅ run_download threads cluster flags through all commands
  • ✅ dirs crate usage, u8 overflow, stat(2) gating all fixed

CI is green, merge state is CLEAN. My approval still stands from the previous rebase. @AlexsJones the code addressing your change requests landed in 46394c1 — ready for your re-review.

michaeljabbour added a commit to michaeljabbour/llmfit-spark that referenced this pull request May 3, 2026
…etrics

Adds the `llmfit bench` subcommand for live inference benchmarking against
running providers (Ollama, vLLM, MLX), with role-based quality scoring,
agentic test suites, and routing matrix. TUI integration provides a results
view and a routing/role view alongside the existing community benchmarks
view.

Addresses review feedback:

* Resolved merge conflicts against current `upstream/main` (~135 commits
  behind original base; rebased cleanly onto v0.9.18 tip via squash).
* Migrated `serde_yaml` (deprecated) to `serde_yml` per Three Foxes' note
  (`0.0.x` series; drop-in API).
* `OLLAMA_HOST` env var override is wired through bench.rs and the TUI
  start path, paralleling the existing `VLLM_PORT` override.
* Renamed `bench_scroll` → `live_bench_scroll` in the App struct to avoid
  silent shadowing of the upstream community-leaderboard `bench_scroll`
  field added in #4d0ffa7.
* Live-bench keybinding is `B` (uppercase); upstream's `'b'` already opens
  the community leaderboard view.

Cross-PR coordination with AlexsJones#279 (cluster):

* `dedupe_hf_entries` is the cluster PR's complete version (handles all
  current upstream `HfModelEntry` fields including `num_attention_heads`,
  `num_key_value_heads`, `num_hidden_layers`, `head_dim`, `license`),
  wired into `load_embedded()`. Uses `canonical_slug()` as the dedup key.
  Synthetic `test_dedupe_hf_entries_merges_duplicate_metadata` test is
  included; the prior data-coupled `test_model_database_deduplicates_…`
  test is intentionally NOT carried forward (fragile to model DB churn).

Test strategy:

* `cargo fmt --all`: clean.
* `cargo build`: clean.
* `cargo test --all`: 350 passing, 0 failed, 1 ignored.
* `cargo clippy`: no new warnings on bench/quality code or modified files;
  pre-existing upstream warnings (fit.rs, providers.rs, hardware.rs,
  update.rs) untouched.

Skipped scope-creep items from review:

* Splitting `benchmarks.yaml` (1456 lines) into per-role files — left as a
  follow-up.

Closes AlexsJones#278.

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
AlexsJones pushed a commit that referenced this pull request May 3, 2026
…etrics (#278)

Adds the `llmfit bench` subcommand for live inference benchmarking against
running providers (Ollama, vLLM, MLX), with role-based quality scoring,
agentic test suites, and routing matrix. TUI integration provides a results
view and a routing/role view alongside the existing community benchmarks
view.

Addresses review feedback:

* Resolved merge conflicts against current `upstream/main` (~135 commits
  behind original base; rebased cleanly onto v0.9.18 tip via squash).
* Migrated `serde_yaml` (deprecated) to `serde_yml` per Three Foxes' note
  (`0.0.x` series; drop-in API).
* `OLLAMA_HOST` env var override is wired through bench.rs and the TUI
  start path, paralleling the existing `VLLM_PORT` override.
* Renamed `bench_scroll` → `live_bench_scroll` in the App struct to avoid
  silent shadowing of the upstream community-leaderboard `bench_scroll`
  field added in #4d0ffa7.
* Live-bench keybinding is `B` (uppercase); upstream's `'b'` already opens
  the community leaderboard view.

Cross-PR coordination with #279 (cluster):

* `dedupe_hf_entries` is the cluster PR's complete version (handles all
  current upstream `HfModelEntry` fields including `num_attention_heads`,
  `num_key_value_heads`, `num_hidden_layers`, `head_dim`, `license`),
  wired into `load_embedded()`. Uses `canonical_slug()` as the dedup key.
  Synthetic `test_dedupe_hf_entries_merges_duplicate_metadata` test is
  included; the prior data-coupled `test_model_database_deduplicates_…`
  test is intentionally NOT carried forward (fragile to model DB churn).

Test strategy:

* `cargo fmt --all`: clean.
* `cargo build`: clean.
* `cargo test --all`: 350 passing, 0 failed, 1 ignored.
* `cargo clippy`: no new warnings on bench/quality code or modified files;
  pre-existing upstream warnings (fit.rs, providers.rs, hardware.rs,
  update.rs) untouched.

Skipped scope-creep items from review:

* Splitting `benchmarks.yaml` (1456 lines) into per-role files — left as a
  follow-up.

Closes #278.

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-authored-by: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review

Verdict: Approve

Adds remote GPU cluster support for multi-node model fitting. This is a significant feature:

  • New cluster.rs module for cluster management
  • Updates to models.rs for multi-node awareness
  • TUI integration in main.rs

The feature addresses a real pain point — fitting models across multiple GPUs/nodes. The implementation looks solid:

  • Clean separation of cluster logic into its own module
  • Proper integration with existing model fitting pipeline
  • CI checks all passing (including Rustfmt)

This is a good addition to the project. LGTM.

Adds ClusterConfig and ClusterNode types for representing multi-node
clusters with TOML persistence, IP/hostname validation, Ray Dashboard
auto-discovery, and CLI/TUI integration via --cluster / --no-cluster
flags plus an `llmfit cluster {init,status,clear}` subcommand.

Rebased onto current upstream/main (post-PR AlexsJones#278 bench merge):

* `dedupe_hf_entries` is now in main from AlexsJones#278 — this rebase relies on
  the existing function (no duplicate added) since both PRs converged
  on the same implementation by design.
* Cluster CLI integration coexists with the bench CLI integration that
  landed via AlexsJones#278 — no overlap or competing keybindings.
* Cluster module declared in lib.rs alphabetically between `benchmarks`
  and `fit`, alongside the new `bench` and `quality` modules.

All earlier review feedback from PR AlexsJones#279 (Alex + Three Foxes) remains
addressed:

* `total_vram_gb()` multiplies per-GPU VRAM by `gpu_count` per node.
* `GpuBackend` is per-node (`Option<GpuBackend>` on `ClusterNode`) with
  `derive_cluster_backend()` warning helper; backward-compatible with
  TOML configs that omit the field.
* `unified_memory` aggregated across nodes (Apple Silicon / DGX Spark
  GB10 clusters detected correctly).
* Ray discovery prompts interactively for VRAM (no 80 GB hardcode);
  prominent warning + per-node prompt when GPU names differ.
* Uses `dirs::config_dir()` for cluster.toml location (same pattern as
  upstream's AlexsJones#437 dirs-crate refactor).
* `u8::try_from` for IP increment so 256+ node clusters get a clear
  error instead of silent wrap.
* `detect_specs()` lazy-loads via `path.exists()` before parsing TOML.
* `--cluster` / `--no-cluster` threaded through `HardwareOverrides` to
  all callers (no hardcoded `(false, false)` in download/recommend).
* `0.85` RAM availability factor extracted to `CLUSTER_AVAILABLE_RAM_FACTOR`
  constant.
* Non-numeric input warnings during interactive cluster init (empty
  input still accepts the default).
* Dedupe test uses synthetic `HfModelEntry` fixtures (no coupling to
  real model database content).

Test strategy: 376 tests passing, `cargo fmt` clean, `cargo build`
clean, `cargo clippy` zero new warnings on cluster code.

Closes AlexsJones#279.

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
@michaeljabbour michaeljabbour force-pushed the feat/remote-cluster branch from 6d9df6b to e4e3635 Compare May 3, 2026 13:20
@michaeljabbour

Copy link
Copy Markdown
Contributor Author

@AlexsJones @three-foxes-in-a-trenchcoat — rebased onto current upstream/main (post-#278 merge). New commit on top of 8896fdb.

What changed since the previous push:

  • dedupe_hf_entries now lives in main from feat: add llmfit bench for live inference benchmarking #278 (we deliberately ported the cluster PR's improved version into the bench PR for cross-PR convergence) — this rebase no longer adds it; just relies on the existing function.
  • Cluster CLI integration coexists cleanly with the new bench CLI integration. No overlapping flags or keybindings.
  • pub mod cluster; added to lib.rs alphabetically between benchmarks and fit (alongside the new bench/quality modules).
  • Cargo.toml adds toml = "0.8" for cluster config parsing.

All earlier review feedback (Alex's 6 blockers + Three Foxes' 4 nits) remains addressed; spec-tracked in the commit message.

Verification:

  • cargo fmt --all: clean
  • cargo build: clean
  • cargo test --all: 376 passing, 0 failed, 1 ignored
  • cargo clippy: zero new warnings on cluster code
  • CI re-running on the new SHA

Ready when you are.

@bertwesarg

Copy link
Copy Markdown

is this Hardware Simulation (S)-ready?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants