This repository contains the implementation for the IJCAI 2025 paper:
Hiroki Shiraishi, Hisao Ishibuchi, and Masaya Nakata. 2025. X-KAN: Optimizing Local Kolmogorov-Arnold Networks via Evolutionary Rule-Based Machine Learning. In Proceedings of the 34th International Joint Conference on Artificial Intelligence (IJCAI '25) Main Track. Pages 8930-8938. https://doi.org/10.24963/ijcai.2025/993.
An extended version of this paper, including appendices, is available at: https://doi.org/10.48550/arXiv.2505.14273.
This repository provides the implementation of X-KAN, a novel function approximation method that integrates KAN (Kolmogorov-Arnold Networks) [1] with an evolutionary rule-based machine learning algorithm called XCSF (X Classifier System for Function Approximation) [2] [3]. The evolutionary framework is written in Julia, with Python integration for the KAN models.
X-KAN is a function approximation method that combines KAN's high representational power with XCSF's adaptive partitioning capability. Unlike conventional neural networks that rely on a single global model, X-KAN represents the entire input space using multiple local KAN models, as shown above. Specifically:
- Localized KAN Modeling: Each rule in X-KAN consists of an antecedent defined as an n-dimensional hyperrectangle (IF part) and a consequent (THEN part) implemented as a single KAN model (see the sketch below).
- Dual Optimization Framework: Utilizing the XCSF framework, X-KAN constructs general and accurate local KAN models through evolutionary optimization of rule antecedents and backpropagation-based consequent refinement.
- Divide-and-Conquer Architecture: Unlike global models, X-KAN integrates multiple specialized local KAN models through adaptive input space partitioning.
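To make the rule structure concrete, here is a minimal conceptual sketch in Julia; the type and field names (`LocalRule`, `lower`, `upper`, `kan`, `fitness`) are hypothetical and do not reflect the repository's actual implementation:

```julia
# Minimal conceptual sketch of an X-KAN rule (hypothetical names, not the
# repository's actual types). Each rule pairs an n-dimensional
# hyperrectangular antecedent with its own local KAN consequent.
struct LocalRule
    lower::Vector{Float64}   # lower bounds l_i of the antecedent intervals
    upper::Vector{Float64}   # upper bounds u_i of the antecedent intervals
    kan::Function            # stands in for a trained local KAN model
    fitness::Float64         # rule quality F in (0, 1]
end

# A rule matches an input x iff x lies inside its hyperrectangle.
matches(r::LocalRule, x::AbstractVector) = all(r.lower .<= x .<= r.upper)
```

Training then amounts to evolving the hyperrectangles while each rule's KAN is fitted only on the data falling inside its own box.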
X-KAN has demonstrated superior performance compared to XCSF, Multi-Layer Perceptron (MLP) [4], and KAN across various benchmark problems, achieving significant reductions in approximation error while maintaining compact rulesets.
A schematic illustration of X-KAN is shown above. The run cycle depends on the type of run: training or testing. During the training mode, X-KAN searches for an accurate and maximally general ruleset through the following steps:
- Match Set Formation: For each training data point, X-KAN identifies the set of rules whose local region (antecedent) matches the input.
- Covering Operation: If no existing rule matches the input, a new rule is created to cover that region.
- Rule Update: The local KAN model in each matching rule is updated using backpropagation, and the rule's fitness is updated using the Widrow-Hoff learning rule.
- Evolutionary Search: Periodically, the algorithm selects parent rules based on fitness, applies crossover and mutation to create new rules, and integrates them into the population.
- Subsumption: More general rules can absorb similar, more specific rules to keep the ruleset compact (see the sketch after this list).
- Rule Compaction: After training, redundant rules are removed to produce a concise and efficient set of local models.
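The exact subsumption condition is not spelled out in this README; below is a hedged sketch assuming X-KAN follows the standard XCS test [6], in which an accurate, sufficiently experienced rule may absorb any rule whose hyperrectangle it fully contains. `theta_sub` is the usual XCS subsumption threshold and is an assumption here, as it is not listed among the hyperparameters below.

```julia
# Hedged sketch of the standard XCS subsumption test [6]; assumed, not taken
# from the repository. Rule A (bounds lA/uA, error errA, experience expA)
# may subsume rule B (bounds lB/uB) iff A is accurate, sufficiently
# experienced, and at least as general as B.
function can_subsume(lA, uA, errA, expA, lB, uB; e0=0.02, theta_sub=100)
    accurate     = errA < e0                         # error below target e0
    experienced  = expA > theta_sub                  # matched often enough
    more_general = all(lA .<= lB) && all(uB .<= uA)  # A's box encloses B's
    return accurate && experienced && more_general
end
```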
During the testing mode, X-KAN predicts the output for each test input as follows:
- Match Set Formation: For each test input, X-KAN finds the rules whose local region includes the input.
- Single Winner Selection: The rule with the highest fitness in the match set is selected as the "winner."
- Prediction: The winner's KAN model is used to make the final prediction for the input (see the sketch below).
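Put together, test-time inference reduces to a few lines. The sketch below uses hypothetical names and stand-in types, not the repository's actual API:

```julia
# Hedged sketch of X-KAN's single-winner inference (hypothetical names).
struct Rule
    lower::Vector{Float64}   # antecedent lower bounds
    upper::Vector{Float64}   # antecedent upper bounds
    fitness::Float64         # rule fitness F
    kan::Function            # stands in for the rule's local KAN model
end

matches(r::Rule, x) = all(r.lower .<= x .<= r.upper)

function predict(population::Vector{Rule}, x::Vector{Float64})
    matchset = filter(r -> matches(r, x), population)  # match set formation
    isempty(matchset) && error("no matching rule")     # covering handles this during training
    winner = argmax(r -> r.fitness, matchset)          # single winner selection
    return winner.kan(x)                               # winner's KAN predicts
end
```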
- Ubuntu 24.04.1 LTS or WSL2
- Julia v1.10.5 LTS or higher
- Python 3.9.7 or higher
Once Julia is installed, you can install the required Julia packages with the following command via the command line interface.
```bash
julia packages.jl
```
Once Python is installed, you can install the required Python packages with the following command via the command line interface.
```bash
pip install -r requirements.txt
```
You can train an X-KAN model using any tabular dataset (e.g., yourfile.csv) for regression via the command line interface. Note that:
- The rightmost column of the CSV file must represent the actual real value to be predicted, while all other columns should constitute the input features.
- Any missing values in the dataset should be represented by a question mark (?).
- Ensure that the CSV file does not contain a header row; the data should start from the first line.
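For instance, a valid three-feature dataset would look like the snippet below (the values are purely illustrative): the target is in the rightmost column, a `?` marks a missing value, and there is no header row.

```
0.12,0.85,0.33,1.204
0.47,?,0.91,0.876
0.68,0.02,0.55,-0.412
```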
Here are examples of how to run the models:
```bash
julia ./main.jl --csv=yourfile.csv
julia ./main.jl --csv=yourfile.csv --compare_kan=true
julia ./main.jl --help
```
To reproduce the results from the paper:
```bash
julia ./main.jl --all=true
julia ./main.jl --all=true --compare_kan=true
```
These commands will run experiments on the 8 datasets used in the main paper and output the results.
Upon completion of the experiment, X-KAN generates the following files:
- `summary.csv`
- `compacted_summary.csv`
- `global_kan_summary.csv`
- `classifier.csv`
- `compacted_classifier.csv`
- `parameters.csv`
- `time.csv`
X-KAN tracks the following features every epoch:
- Iteration (`Iteration`): The number of training instances used up to the epoch
- Average Training Mean Absolute Error (`TrainErr`): MAE on the training dataset
- Average Testing Mean Absolute Error (`TestErr`): MAE on the testing dataset
- Average Population Size (`PopSize`): The number of rules in the population (ruleset)
- Occurrence Rate of the Covering Operation (`CovOccRate`)
- Number of Subsumption Operations (`SubOccNum`)
These values are saved as `summary.csv`. An example of log output during the experiment is shown below.
```
      Epoch   Iteration    TrainErr     TestErr     PopSize  CovOccRate   SubOccNum
=========== =========== =========== =========== =========== =========== ===========
          1         900    0.166245    0.229590      19.000       0.006           1
          2        1800    0.166245    0.229590      23.000         0.0          14
          3        2700    0.149212    0.214724      18.000         0.0          22
          4        3600    0.109026    0.189545      21.000         0.0          32
          5        4500    0.091613    0.182431      23.000       0.001          31
          6        5400    0.082146    0.203242      30.000       0.001          21
          7        6300    0.094334    0.211943      22.000       0.003          22
          8        7200    0.080364    0.204120      21.000       0.001          23
          9        8100    0.090401    0.208691      21.000         0.0          36
         10        9000    0.093264    0.184493      18.000         0.0          34
```
After applying the single winner-based rule compaction algorithm [5] to the final ruleset, the performance of the compacted ruleset is saved as `compacted_summary.csv`. An example of the output is shown below.
```
X-KAN after Rule Compaction:
Training Mean Absolute Error: 0.09326392201698498
Testing Mean Absolute Error: 0.17995177819339678
Compacted Ruleset Size: 12
```
For comparative analysis, when running X-KAN with the benchmarking flag `--compare_kan=true`, the system executes both X-KAN and a conventional single global KAN model. The performance metrics of the global KAN model, including training MAE and testing MAE, are saved as `global_kan_summary.csv`. An example is shown below.
```
A Single Global KAN Model:
Training Mean Absolute Error: 0.20299688265732746
Testing Mean Absolute Error: 0.24679258616589186
```
An $n$-dimensional rule (a.k.a. classifier) $k$ is represented as:

$$\text{IF}\ \mathbf{A}^k\ \text{THEN}\ y\ \text{WITH}\ F^k,$$

where:
- Rule-Antecedent (IF part) $\mathbf{A}^k=(\mathbf{l}^k,\mathbf{u}^k)=\bigl(l_{i}^{k}, u_{i}^{k}\bigr)_{i=1}^n$:
  - Determines whether the rule matches a given input
  - Each input $x_i$ is conditioned by $\bigl(l_{i}^{k}, u_{i}^{k}\bigr)$, an interval with a lower bound $l_i^k$ and an upper bound $u_i^k$
- Rule-Consequent (THEN part) $y$:
  - Indicates a local KAN model
- Rule-Fitness (WITH part) $F^k\in(0,1]$:
  - Denotes the quality reflecting accuracy $\kappa^k$ and generality ${\rm num}^k$
Each rule $k$ also maintains the following parameters:
- Error $\epsilon^k\in\mathbb{R}_0^+$: Represents the absolute approximation error of the consequent function approximation model
- Accuracy $\kappa^k\in(0,1]$: Calculated based on the error
- Experience ${\rm exp}^k\in\mathbb{N}_0$: Counts the number of times since its creation that the rule has belonged to a match set
- Time Stamp ${\rm ts}^k\in\mathbb{N}$: Denotes the time-step of the last execution of the EA to create new rules, where the rule was considered as a candidate to be a parent
- Match Set Size ${\rm ms}^k\in\mathbb{R}^+$: Averages the sizes of the match sets in which the rule has participated
- Generality (a.k.a. Numerosity) ${\rm num}^k\in\mathbb{N}_0$: Indicates the number of rules that have been subsumed
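This README does not give the exact update formulas. Below is a hedged sketch assuming the standard XCS accuracy function and Widrow-Hoff fitness update [6]; `alpha` and `nu` are the usual XCS constants and are assumptions here, as they do not appear in the hyperparameter table below, while `e0` and `beta` mirror that table.

```julia
# Hedged sketch of the standard XCS accuracy and fitness updates [6],
# assumed rather than taken from the repository. alpha and nu are the
# usual XCS constants; e0 and beta match the hyperparameter table below.
accuracy(err; e0=0.02, alpha=0.1, nu=5) =
    err < e0 ? 1.0 : alpha * (err / e0)^(-nu)

# Widrow-Hoff (delta-rule) step: move each rule's fitness F toward its
# accuracy relative to the other rules in the same match set.
function update_fitness!(F, kappa, num; beta=0.2)
    total = sum(kappa .* num)             # normalizer over the match set
    for k in eachindex(F)
        rel = kappa[k] * num[k] / total   # relative accuracy of rule k
        F[k] += beta * (rel - F[k])       # Widrow-Hoff update with rate beta
    end
    return F
end
```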
The acquired X-KAN ruleset (i.e., the final rule population) is saved as `classifier.csv`, as shown below.
| ID | Antecedent | Consequent | Fitness | Error | Accuracy | Experience | Time Stamp | Match Set Size | Numerosity |
|---|---|---|---|---|---|---|---|---|---|
| 98 | [0.209:0.438, 0.0:0.915] | KAN | 0.410814138 | 0.071178369 | 0.280984242 | 1514 | 8976 | 9.296128125 | 3 |
| 145 | [0.0:0.255, 0.0:0.915] | KAN | 0.733842598 | 0.148575569 | 0.134611633 | 1372 | 8959 | 9.208115507 | 7 |
| 181 | [0.0:0.115, 0.0:0.287] | KAN | 0.476522753 | 0.015659895 | 1 | 167 | 8748 | 8.880699421 | 1 |
| 275 | [0.548:0.95, 0.915:0.995] | KAN | 0.781893166 | 0.003271927 | 1 | 140 | 8992 | 6.978722149 | 5 |
| 352 | [0.209:0.381, 0.0:0.915] | KAN | 0.629774426 | 0.067386367 | 0.296795939 | 511 | 8976 | 10.80689915 | 6 |
| 401 | [0.0:0.592, 0.915:0.998] | KAN | 0.972902217 | 0.008250442 | 1 | 147 | 8872 | 9.76802591 | 6 |
| 446 | [0.62:1.0, 0.0:0.818] | KAN | 0.893397233 | 0.135653658 | 0.147434285 | 801 | 8969 | 4.545329739 | 4 |
| 476 | [0.896:1.0, 0.563:1.0] | KAN | 0.591219282 | 0.030992238 | 0.645322875 | 105 | 8992 | 8.398823403 | 3 |
| 595 | [0.589:0.62, 0.0:0.818] | KAN | 0.468114703 | 0.00256358 | 1 | 28 | 8965 | 7.348897755 | 3 |
| 623 | [0.403:0.607, 0.0:0.915] | KAN | 0.484197517 | 0.032312569 | 0.618954189 | 183 | 8965 | 7.110068367 | 3 |
| 657 | [0.589:0.62, 0.0:0.873] | KAN | 0.206033376 | 0.00256358 | 1 | 18 | 8965 | 7.348897755 | 1 |
| 718 | [0.828:0.913, 0.563:1.0] | KAN | 0.408082902 | 0.007539413 | 1 | 10 | 8969 | 8.189997977 | 1 |
| 722 | [0.403:0.62, 0.0:0.818] | KAN | 0.28634317 | 0.034280678 | 0.583419026 | 50 | 8965 | 7.123276783 | 2 |
| 735 | [0.796:1.0, 0.563:1.0] | KAN | 0.075661865 | 0.108398092 | 0.184505092 | 11 | 8992 | 8.034556613 | 1 |
| 748 | [0.403:0.607, 0.0:1.0] | KAN | 0.080864096 | 0.058110521 | 0.344171757 | 11 | 8965 | 7.110068367 | 1 |
| 752 | [0.403:0.62, 0.0:0.915] | KAN | 0.098179685 | 0.035562109 | 0.562396344 | 4 | 8965 | 7.178635651 | 1 |
| 753 | [0.913:1.0, 0.0:0.818] | KAN | 0.12839579 | 0.10062216 | 0.198763374 | 2 | 8969 | 7.041598705 | 1 |
| 754 | [0.62:0.828, 0.513:1.0] | KAN | 0.143667379 | 0.068123007 | 0.29358657 | 3 | 8969 | 5.446660514 | 1 |
The compacted X-KAN ruleset is saved as `compacted_classifier.csv`, as shown below.
| ID | Antecedent | Consequent | Fitness | Error | Accuracy | Experience | Time Stamp | Match Set Size | Numerosity |
|---|---|---|---|---|---|---|---|---|---|
| 145 | [0.0:0.255, 0.0:0.915] | KAN | 0.733842598 | 0.148575569 | 0.134611633 | 1372 | 8959 | 9.208115507 | 7 |
| 748 | [0.403:0.607, 0.0:1.0] | KAN | 0.080864096 | 0.058110521 | 0.344171757 | 11 | 8965 | 7.110068367 | 1 |
| 476 | [0.896:1.0, 0.563:1.0] | KAN | 0.591219282 | 0.030992238 | 0.645322875 | 105 | 8992 | 8.398823403 | 3 |
| 401 | [0.0:0.592, 0.915:0.998] | KAN | 0.972902217 | 0.008250442 | 1 | 147 | 8872 | 9.76802591 | 6 |
| 623 | [0.403:0.607, 0.0:0.915] | KAN | 0.484197517 | 0.032312569 | 0.618954189 | 183 | 8965 | 7.110068367 | 3 |
| 275 | [0.548:0.95, 0.915:0.995] | KAN | 0.781893166 | 0.003271927 | 1 | 140 | 8992 | 6.978722149 | 5 |
| 718 | [0.828:0.913, 0.563:1.0] | KAN | 0.408082902 | 0.007539413 | 1 | 10 | 8969 | 8.189997977 | 1 |
| 446 | [0.62:1.0, 0.0:0.818] | KAN | 0.893397233 | 0.135653658 | 0.147434285 | 801 | 8969 | 4.545329739 | 4 |
| 754 | [0.62:0.828, 0.513:1.0] | KAN | 0.143667379 | 0.068123007 | 0.29358657 | 3 | 8969 | 5.446660514 | 1 |
| 98 | [0.209:0.438, 0.0:0.915] | KAN | 0.410814138 | 0.071178369 | 0.280984242 | 1514 | 8976 | 9.296128125 | 3 |
| 352 | [0.209:0.381, 0.0:0.915] | KAN | 0.629774426 | 0.067386367 | 0.296795939 | 511 | 8976 | 10.80689915 | 6 |
| 595 | [0.589:0.62, 0.0:0.818] | KAN | 0.468114703 | 0.00256358 | 1 | 28 | 8965 | 7.348897755 | 3 |
X-KAN has the following hyperparameters [6]:
- $N$ (`N`): Maximum size of the ruleset
- $\beta$ (`beta`): Learning rate for $F^k$ and ${\rm ms}^k$
- $\epsilon_0$ (`e0`): Target error, under which $\kappa^k$ is set to 1
- $\theta_{\rm EA}$ (`theta_EA`): EA invocation frequency. The EA is applied in a match set when the average time since the last EA in the match set is greater than $\theta_{\rm EA}$
- $\chi$ (`chi`): Probability of applying crossover in the EA
- $\mu$ (`mu`): Probability of mutating an allele in the offspring
- $m_0$ (`m0`): Maximum change amount in mutation
- $\theta_{\rm del}$ (`theta_del`): Deletion threshold. If ${\rm exp}^k$ is greater than $\theta_{\rm del}$, $F^k$ may be considered in the rule's probability of deletion
- $\delta$ (`delta`): Fraction of the mean fitness in the ruleset below which the fitness of a rule may be considered in its probability of deletion
- $P_\#$ (`P_hash`): Probability of using $[0,1]$ in one attribute in $\mathbf{A}^k$ when covering (see the sketch after this list)
- $r_0$ (`r0`): Maximum range in covering
- $F_I$ (`F_I`): Initial fitness of a rule
- $\tau$ (`tau`): Tournament size in parent selection
- $doSubsumption$ (`doSubsumption`): Whether subsumption is performed
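As a hedged illustration of how `P_hash` and `r0` interact during covering, here is a sketch assuming standard XCSF-style interval covering; the function name and structure are hypothetical, not the repository's actual code, and the defaults mirror the parameter table below.

```julia
# Hedged sketch of XCSF-style covering around an unmatched input x,
# assuming the standard scheme (not the repository's actual code).
# Each dimension either becomes "don't care" (full range [0,1], with
# probability P_hash) or a random interval of spread up to r0 around x.
function cover(x::Vector{Float64}; P_hash=0.8, r0=1.0)
    lower = similar(x); upper = similar(x)
    for i in eachindex(x)
        if rand() < P_hash
            lower[i], upper[i] = 0.0, 1.0                   # don't care
        else
            lower[i] = clamp(x[i] - rand() * r0, 0.0, 1.0)  # random spread below x_i
            upper[i] = clamp(x[i] + rand() * r0, 0.0, 1.0)  # random spread above x_i
        end
    end
    return lower, upper
end
```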
The hyperparameter values used are saved as `parameters.csv`, as shown below.
| Parameter | Value |
|---|---|
| N | 50 |
| beta | 0.2 |
| e0 | 0.02 |
| theta_EA | 100 |
| chi | 0.8 |
| mu | 0.04 |
| m0 | 0.1 |
| theta_del | 100 |
| delta | 0.1 |
| P_hash | 0.8 |
| r0 | 1.0 |
| F_I | 0.01 |
| tau | 0.4 |
| doSubsumption | TRUE |
The runtime (sec.) is saved as `time.csv`, as shown below.
```
44.36
```
We use the implementation of KAN from the pykan repository as the rule consequent in X-KAN.
The copyright of this X-KAN repository belongs to the authors in the Evolutionary Intelligence Research Group (Nakata Lab) at Yokohama National University, Japan. You are free to use this code for research purposes. In such cases, we kindly request that you cite the following paper:
Hiroki Shiraishi, Hisao Ishibuchi, and Masaya Nakata. 2025. X-KAN: Optimizing Local Kolmogorov-Arnold Networks via Evolutionary Rule-Based Machine Learning. In Proceedings of the 34th International Joint Conference on Artificial Intelligence (IJCAI '25) Main Track. Pages 8930-8938. https://doi.org/10.24963/ijcai.2025/993.
```bibtex
@inproceedings{shiraishi2025xkan,
  title     = {X-KAN: Optimizing Local Kolmogorov-Arnold Networks via Evolutionary Rule-Based Machine Learning},
  author    = {Shiraishi, Hiroki and Ishibuchi, Hisao and Nakata, Masaya},
  booktitle = {Proceedings of the Thirty-Fourth International Joint Conference on
               Artificial Intelligence, {IJCAI-25}},
  publisher = {International Joint Conferences on Artificial Intelligence Organization},
  editor    = {James Kwok},
  pages     = {8930--8938},
  year      = {2025},
  month     = {8},
  note      = {Main Track},
  doi       = {10.24963/ijcai.2025/993},
  url       = {https://doi.org/10.24963/ijcai.2025/993},
}
```

[1] Liu, Ziming, et al. "KAN: Kolmogorov-Arnold Networks." arXiv preprint arXiv:2404.19756 (2024). https://doi.org/10.48550/arXiv.2404.19756

[2] Wilson, Stewart W. "Classifiers that approximate functions." Natural Computing 1.2 (2002): 211-234. https://doi.org/10.1023/A:1016535925043

[3] Preen, Richard J., Stewart W. Wilson, and Larry Bull. "Autoencoding with a classifier system." IEEE Transactions on Evolutionary Computation 25.6 (2021): 1079-1090. https://doi.org/10.1109/TEVC.2021.3079320

[4] Cybenko, George. "Approximation by superpositions of a sigmoidal function." Mathematics of Control, Signals and Systems 2.4 (1989): 303-314. https://doi.org/10.1007/BF02551274

[5] Orriols-Puig, Albert, Jorge Casillas, and Ester Bernadó-Mansilla. "Fuzzy-UCS: A Michigan-style learning fuzzy-classifier system for supervised learning." IEEE Transactions on Evolutionary Computation 13.2 (2008): 260-283. https://doi.org/10.1109/TEVC.2008.925144

[6] Butz, Martin V., and Stewart W. Wilson. "An algorithmic description of XCS." Soft Computing 6 (2002): 144-153. https://doi.org/10.1007/s005000100111