A lightweight, LinearSVC-style crate for Rust:

- Linear SVM with hinge / squared-hinge loss
- Two solvers: Pegasos (primal SGD) and DCD (LIBLINEAR-style dual coordinate descent with shrinking)
- CSR sparse input
- Multiclass strategies: `Binary`, `OneVsRest`, `OneVsOne`
- Builder-style params with a `Solver::Auto` heuristic
- Optional Platt calibration stub for probabilities (Binary)
```rust
use light_svm::{CsrMatrix, LinearSVC, ClassStrategy, SvmParams, Loss, Solver, PlattCalibrator, DecisionScores};

let x = CsrMatrix::from_dense(&vec![vec![2.0, 1.0], vec![-1.0, -2.0]], 0.0);
let y = vec![1, -1];

let params = SvmParams::builder()
    .c(1.0)
    .loss(Loss::Hinge)
    .fit_intercept(true)
    .tol(1e-3)
    .solver(Solver::Auto)
    .build();

let mut svc = LinearSVC::builder()
    .class_strategy(ClassStrategy::Binary)
    .params(params)
    .build();
svc.fit(&x, &y);

// Decision function
match svc.decision_function(&x) {
    DecisionScores::Binary { classes, scores } => {
        println!("classes={:?}, scores={:?}", classes, &scores[..]);
    }
    _ => unreachable!(),
}

// Calibrated probabilities (Binary)
let scores = match svc.decision_function(&x) {
    DecisionScores::Binary { scores, .. } => scores,
    _ => unreachable!(),
};
let pos = *y.iter().max().unwrap();
let y01: Vec<u8> = y.iter().map(|&yy| if yy == pos { 1 } else { 0 }).collect();
let calib = PlattCalibrator::fit(&scores, &y01);
svc.with_calibration(calib);
let proba = svc.predict_proba(&x); // Vec<[P(neg), P(pos)]>

// One-vs-rest multiclass
let mut svc2 = LinearSVC::builder()
    .class_strategy(ClassStrategy::OneVsRest)
    .c(1.0).loss(Loss::Hinge).fit_intercept(true)
    .tol(1e-3).solver(Solver::Auto)
    .build();
```

`decision_function` returns a strategy-aligned enum:
- `DecisionScores::Binary { classes: [neg, pos], scores: Vec<f32> }`: `scores[i]` is the raw margin `w·x_i + b`; positive means `pos`.
- `DecisionScores::OneVsRest { classes: Vec<i32>, scores: Vec<Vec<f32>> }`: `scores` is shaped rows × classes, aligned to `classes`.
- `DecisionScores::OneVsOne { pairs: Vec<(i32,i32)>, scores: Vec<Vec<f32>> }`: `scores` is rows × pairs; a positive score is a vote for the first class in the pair.
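For OneVsOne, a predicted class comes from majority voting over the pairwise scores. A sketch of that aggregation for a single row, following the convention above (the function name and tie-breaking rule are illustrative, not the crate's API):

```rust
use std::collections::HashMap;

/// Majority vote over OneVsOne pairwise scores for one sample.
/// A positive score votes for the first class in the pair.
fn ovo_predict(pairs: &[(i32, i32)], row_scores: &[f32]) -> i32 {
    let mut votes: HashMap<i32, u32> = HashMap::new();
    for (&(a, b), &s) in pairs.iter().zip(row_scores) {
        let winner = if s > 0.0 { a } else { b };
        *votes.entry(winner).or_insert(0) += 1;
    }
    // Break ties toward the smaller class label for determinism.
    votes
        .into_iter()
        .max_by(|&(c1, v1), &(c2, v2)| v1.cmp(&v2).then(c2.cmp(&c1)))
        .map(|(c, _)| c)
        .unwrap()
}
```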
- `LinearSVC::predict_proba(&self, x)` returns `Vec<[P(neg), P(pos)]>` for Binary models.
- It uses a stored `PlattCalibrator`, attached via `svc.with_calibration(calib)`.
- Alternatively, call `svc.predict_proba_with(x, &calib)` without storing one.
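For intuition, Platt scaling maps a raw margin to a probability through a fitted sigmoid. A minimal sketch of the mapping, not the crate's internals; the parameters `a` and `b` here stand in for whatever the calibrator fits (`a` is typically negative so larger margins yield higher positive probability):

```rust
/// Map a raw SVM margin to [P(neg), P(pos)] via a Platt-style sigmoid:
/// P(y = +1 | s) = 1 / (1 + exp(a*s + b)).
fn platt_probability(score: f32, a: f32, b: f32) -> [f32; 2] {
    let p_pos = 1.0 / (1.0 + (a * score + b).exp());
    [1.0 - p_pos, p_pos] // same [P(neg), P(pos)] layout as predict_proba
}
```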
Note
Multiclass probability calibration (OvR/OvO) often uses one-vs-rest Platt or isotonic with normalization (e.g., pairwise coupling). The crate keeps a light stub for Binary; multiclass calibration can be added later.
- `tol` controls when DCD stops via the projected-gradient (PG) gap: stop when `PGmax - PGmin ≤ tol`.
- Practical defaults:
  - `1e-2` for quick training / rough models.
  - `1e-3` (default) for balanced speed/accuracy.
  - `1e-4` for tighter convergence (slower).
- Pegasos ignores `tol`; use `max_epochs` to trade accuracy against time.
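The stopping rule above can be sketched as follows. This is illustrative, not the crate's internals: each dual variable `alpha_i` is boxed to `[0, C]`, so at the bounds only gradient directions that point into the feasible region count toward the gap:

```rust
/// Projected gradient for a dual variable alpha in [0, C].
fn projected_gradient(g: f32, alpha: f32, c: f32) -> f32 {
    if alpha <= 0.0 {
        g.min(0.0) // at the lower bound, only negative gradients matter
    } else if alpha >= c {
        g.max(0.0) // at the upper bound, only positive gradients matter
    } else {
        g
    }
}

/// DCD-style stop check: PGmax - PGmin <= tol.
fn should_stop(grads: &[f32], alphas: &[f32], c: f32, tol: f32) -> bool {
    let pgs: Vec<f32> = grads
        .iter()
        .zip(alphas)
        .map(|(&g, &a)| projected_gradient(g, a, c))
        .collect();
    let pg_max = pgs.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let pg_min = pgs.iter().cloned().fold(f32::INFINITY, f32::min);
    pg_max - pg_min <= tol
}
```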
- Set per-class C directly: `.c_by_class(neg, pos)` or `.c_neg(v)`, `.c_pos(v)`.
- If provided, these override `class_weight_*`.
- If not provided: `C_-1 = c * class_weight_neg`, `C_+1 = c * class_weight_pos`.
- DCD uses `C_i` exactly; Pegasos scales updates by `(C_i / c)`.
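The resolution order above (explicit override wins, otherwise scale by class weight) can be sketched as a small helper with hypothetical names:

```rust
/// Effective per-class C: an explicit per-class override takes precedence;
/// otherwise the global C is scaled by the class weight.
fn effective_c(c: f32, class_weight: f32, override_c: Option<f32>) -> f32 {
    override_c.unwrap_or(c * class_weight)
}
```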
- `.eval_every(k)` with `.verbose(true)` prints metrics every `k` passes:
  - `pgmax`, `pgmin`, `kkt = pgmax - pgmin`
  - Primal / dual objectives and the duality gap
- The binary `LinearSvm` summary carries `kkt_history` and `gap_history`.
`Solver::Auto` picks DCD for sparse, in-memory problems with ≤ 200k features, ≤ 2e8 nonzeros, and density ≤ 1%; otherwise it falls back to Pegasos.
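The heuristic as stated can be sketched like this; the enum and function are illustrative stand-ins, not the crate's types:

```rust
#[derive(Debug, PartialEq)]
enum ChosenSolver {
    Dcd,
    Pegasos,
}

/// Apply the Solver::Auto thresholds: <= 200k features, <= 2e8 nonzeros,
/// and density <= 1% selects DCD; anything else falls back to Pegasos.
fn auto_solver(n_features: usize, nnz: usize, n_rows: usize) -> ChosenSolver {
    let density = nnz as f64 / (n_rows as f64 * n_features as f64);
    if n_features <= 200_000 && nnz <= 200_000_000 && density <= 0.01 {
        ChosenSolver::Dcd
    } else {
        ChosenSolver::Pegasos
    }
}
```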
- Add a new optimizer by implementing `fit_binary_*` in `solver.rs` and routing via `Solver`.
- Add kernel SVMs with a `Kernel` trait and a `KernelSvm` model; the `LinearSVC` API remains.
- Add multiclass probability calibration (pairwise coupling / isotonic) as a follow-up.
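One possible shape for that `Kernel` extension, sketched with an RBF kernel as the example implementor. None of this exists in the crate yet; it is only a design sketch:

```rust
/// A kernel evaluates a similarity between two dense feature vectors.
trait Kernel {
    fn eval(&self, a: &[f32], b: &[f32]) -> f32;
}

/// RBF kernel: k(a, b) = exp(-gamma * ||a - b||^2).
struct Rbf {
    gamma: f32,
}

impl Kernel for Rbf {
    fn eval(&self, a: &[f32], b: &[f32]) -> f32 {
        let sq: f32 = a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum();
        (-self.gamma * sq).exp()
    }
}
```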
The repo keeps shared CSV fixtures under `tests/data/`. If you run the Python sandboxes or the Rust examples with `--write-data`, they (re)generate the expected files in that directory, and the integration tests read from the same location. Make sure the `tests/data` folder exists before running the examples:
```shell
python sandbox/data_flair.py --write-data       # writes tests/data/train.csv + tests/data/test.csv
python sandbox/iris_multiclass.py --write-data  # writes tests/data/iris_train.csv + iris_test.csv
```

Tip: Rayon-powered helpers are enabled by default (disable with `--no-default-features` if desired); you can still set `RUSTFLAGS="-C target-feature=+avx2"` to squeeze the most out of SIMD-heavy sections on capable CPUs.