A lightweight, LinearSVC-style crate for Rust:

- Linear SVM with hinge / squared-hinge loss
- Two solvers: Pegasos (primal SGD) and DCD (LIBLINEAR-style dual coordinate descent with shrinking)
- CSR sparse input
- Multiclass strategies: `Binary`, `OneVsRest`, `OneVsOne`
- Builder-style params with a `Solver::Auto` heuristic
- Optional Platt calibration stub for probabilities (Binary)
```rust
use light_svm::{CsrMatrix, LinearSVC, ClassStrategy, SvmParams, Loss, Solver, PlattCalibrator, DecisionScores};

let x = CsrMatrix::from_dense(&vec![vec![2.0, 1.0], vec![-1.0, -2.0]], 0.0);
let y = vec![1, -1];

let params = SvmParams::builder()
    .c(1.0)
    .loss(Loss::Hinge)
    .fit_intercept(true)
    .tol(1e-3)
    .solver(Solver::Auto)
    .build();

let mut svc = LinearSVC::builder()
    .class_strategy(ClassStrategy::Binary)
    .params(params)
    .build();
svc.fit(&x, &y);

// Decision function
match svc.decision_function(&x) {
    DecisionScores::Binary { classes, scores } => {
        println!("classes={:?}, scores={:?}", classes, &scores[..]);
    }
    _ => unreachable!(),
}

// Calibrated probabilities (Binary)
let scores = match svc.decision_function(&x) {
    DecisionScores::Binary { scores, .. } => scores,
    _ => unreachable!(),
};
let pos = *y.iter().max().unwrap();
let y01: Vec<u8> = y.iter().map(|&yy| if yy == pos { 1 } else { 0 }).collect();
let calib = PlattCalibrator::fit(&scores, &y01);
svc.with_calibration(calib);
let proba = svc.predict_proba(&x); // Vec<[P(neg), P(pos)]>

// One-vs-rest multiclass
let mut svc2 = LinearSVC::builder()
    .class_strategy(ClassStrategy::OneVsRest)
    .c(1.0).loss(Loss::Hinge).fit_intercept(true)
    .tol(1e-3).solver(Solver::Auto)
    .build();
```

`decision_function` returns a strategy-aligned enum:
- `DecisionScores::Binary { classes: [neg, pos], scores: Vec<f32> }`: `scores[i]` is the raw margin `w·x_i + b`; positive means `pos`.
- `DecisionScores::OneVsRest { classes: Vec<i32>, scores: Vec<Vec<f32>> }`: `scores` is shaped rows × classes, aligned to `classes`.
- `DecisionScores::OneVsOne { pairs: Vec<(i32,i32)>, scores: Vec<Vec<f32>> }`: `scores` is rows × pairs; a positive score is a vote for the first class in the pair.
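For OneVsOne, a predicted class comes from majority voting over the pairwise scores. A sketch of that aggregation for a single row, following the convention above (the function name and tie-breaking rule are illustrative, not the crate's API):

```rust
use std::collections::HashMap;

/// Majority vote over OneVsOne pairwise scores for one sample.
/// A positive score votes for the first class in the pair.
fn ovo_predict(pairs: &[(i32, i32)], row_scores: &[f32]) -> i32 {
    let mut votes: HashMap<i32, u32> = HashMap::new();
    for (&(a, b), &s) in pairs.iter().zip(row_scores) {
        let winner = if s > 0.0 { a } else { b };
        *votes.entry(winner).or_insert(0) += 1;
    }
    // Break ties toward the smaller class label for determinism.
    votes
        .into_iter()
        .max_by(|&(c1, v1), &(c2, v2)| v1.cmp(&v2).then(c2.cmp(&c1)))
        .map(|(c, _)| c)
        .unwrap()
}
```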
- `LinearSVC::predict_proba(&self, x)` returns `Vec<[P(neg), P(pos)]>` for Binary models.
- It uses a stored `PlattCalibrator`, attached via `svc.with_calibration(calib)`.
- Alternatively, call `svc.predict_proba_with(x, &calib)` without storing one.
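For intuition, Platt scaling maps a raw margin to a probability through a fitted sigmoid. A minimal sketch of the mapping, not the crate's internals; the parameters `a` and `b` here stand in for whatever the calibrator fits (`a` is typically negative so larger margins yield higher positive probability):

```rust
/// Map a raw SVM margin to [P(neg), P(pos)] via a Platt-style sigmoid:
/// P(y = +1 | s) = 1 / (1 + exp(a*s + b)).
fn platt_probability(score: f32, a: f32, b: f32) -> [f32; 2] {
    let p_pos = 1.0 / (1.0 + (a * score + b).exp());
    [1.0 - p_pos, p_pos] // same [P(neg), P(pos)] layout as predict_proba
}
```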
Note
Multiclass probability calibration (OvR/OvO) often uses one-vs-rest Platt or isotonic with normalization (e.g., pairwise coupling). The crate keeps a light stub for Binary; multiclass calibration can be added later.
- `tol` controls when DCD stops via the projected-gradient (PG) gap: stop when `PGmax - PGmin ≤ tol`.
- Practical defaults:
  - `1e-2` for quick training / rough models.
  - `1e-3` (default) for balanced speed/accuracy.
  - `1e-4` for tighter convergence (slower).
- Pegasos ignores `tol`; use `max_epochs` to trade accuracy against time.
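The stopping rule above can be sketched as follows. This is illustrative, not the crate's internals: each dual variable `alpha_i` is boxed to `[0, C]`, so at the bounds only gradient directions that point into the feasible region count toward the gap:

```rust
/// Projected gradient for a dual variable alpha in [0, C].
fn projected_gradient(g: f32, alpha: f32, c: f32) -> f32 {
    if alpha <= 0.0 {
        g.min(0.0) // at the lower bound, only negative gradients matter
    } else if alpha >= c {
        g.max(0.0) // at the upper bound, only positive gradients matter
    } else {
        g
    }
}

/// DCD-style stop check: PGmax - PGmin <= tol.
fn should_stop(grads: &[f32], alphas: &[f32], c: f32, tol: f32) -> bool {
    let pgs: Vec<f32> = grads
        .iter()
        .zip(alphas)
        .map(|(&g, &a)| projected_gradient(g, a, c))
        .collect();
    let pg_max = pgs.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let pg_min = pgs.iter().cloned().fold(f32::INFINITY, f32::min);
    pg_max - pg_min <= tol
}
```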
- Set per-class C directly: `.c_by_class(neg, pos)` or `.c_neg(v)`, `.c_pos(v)`.
- If provided, these override `class_weight_*`.
- If not provided: `C_-1 = c * class_weight_neg`, `C_+1 = c * class_weight_pos`.
- DCD uses `C_i` exactly; Pegasos scales updates by `(C_i / c)`.
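The resolution order above (explicit override wins, otherwise scale by class weight) can be sketched as a small helper with hypothetical names:

```rust
/// Effective per-class C: an explicit per-class override takes precedence;
/// otherwise the global C is scaled by the class weight.
fn effective_c(c: f32, class_weight: f32, override_c: Option<f32>) -> f32 {
    override_c.unwrap_or(c * class_weight)
}
```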
- `.eval_every(k)` with `.verbose(true)` prints metrics every `k` passes:
  - `pgmax`, `pgmin`, `kkt = pgmax - pgmin`
  - Primal / dual objectives and the duality gap
- The binary `LinearSvm` summary carries `kkt_history` and `gap_history`.
`Solver::Auto` picks DCD for sparse, in-memory problems with ≤ 200k features, ≤ 2e8 nonzeros, and density ≤ 1%; otherwise it falls back to Pegasos.
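The heuristic as stated can be sketched like this; the enum and function are illustrative stand-ins, not the crate's types:

```rust
#[derive(Debug, PartialEq)]
enum ChosenSolver {
    Dcd,
    Pegasos,
}

/// Apply the Solver::Auto thresholds: <= 200k features, <= 2e8 nonzeros,
/// and density <= 1% selects DCD; anything else falls back to Pegasos.
fn auto_solver(n_features: usize, nnz: usize, n_rows: usize) -> ChosenSolver {
    let density = nnz as f64 / (n_rows as f64 * n_features as f64);
    if n_features <= 200_000 && nnz <= 200_000_000 && density <= 0.01 {
        ChosenSolver::Dcd
    } else {
        ChosenSolver::Pegasos
    }
}
```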
- Add a new optimizer by implementing `fit_binary_*` in `solver.rs` and routing via `Solver`.
- Add kernel SVMs with a `Kernel` trait and a `KernelSvm` model; the `LinearSVC` API remains.
- Add multiclass probability calibration (pairwise coupling / isotonic) as a follow-up.
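One possible shape for that `Kernel` extension, sketched with an RBF kernel as the example implementor. None of this exists in the crate yet; it is only a design sketch:

```rust
/// A kernel evaluates a similarity between two dense feature vectors.
trait Kernel {
    fn eval(&self, a: &[f32], b: &[f32]) -> f32;
}

/// RBF kernel: k(a, b) = exp(-gamma * ||a - b||^2).
struct Rbf {
    gamma: f32,
}

impl Kernel for Rbf {
    fn eval(&self, a: &[f32], b: &[f32]) -> f32 {
        let sq: f32 = a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum();
        (-self.gamma * sq).exp()
    }
}
```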
The repo keeps shared CSV fixtures under `tests/data/`. If you run the Python sandboxes or the Rust examples with `--write-data`, they (re)generate the expected files in that directory, and the integration tests read from the same location. Make sure the `tests/data` folder exists before running the examples:
```shell
python sandbox/data_flair.py --write-data       # writes tests/data/train.csv + tests/data/test.csv
python sandbox/iris_multiclass.py --write-data  # writes tests/data/iris_train.csv + iris_test.csv
```

Tip: Rayon-powered helpers are enabled by default (disable with `--no-default-features` if desired); you can still set `RUSTFLAGS="-C target-feature=+avx2"` to squeeze the most out of SIMD-heavy sections on capable CPUs.