Jurij Jukić

My name is Jurij (pronounced “Yuri”).

My current goal is to create a static, predictive theory of training. Training seems to be governed by only a few variables (architecture, seed, data) with a lot of invariance across hyperparameters, seeds or datasets.

My suspicion is that training as we currently do it is not computationally irreducible: given a better mathematical understanding of it, we could literally “skip” the training, predict the solution, and jump straight to it. Whether or not this is possible, trying to answer the question will be informative.

We know that finding the optimal weights is NP-hard for adversarially constructed datasets, but this may not be the case for most datasets. In fact, we have no a priori reason to believe that current architectures are the best fit for the data.

So, why can’t we predict the outcome of training?

[Post] Analytic View of Training

In this post I introduce an analytic way of looking at training: a set of “accounting” equations that connect what happens during training without going into the mechanics of it. No matmul, no circuits, only n-dimensional spaces where dots move around. Simple calculus and Euclidean geometry, purely observational.

[Paper] Key Search Might Explain Neural Network Training

[Video] Brief History of Complexity; Logical Depth and Neural Networks

I go through the history of complexity: entropy, Kolmogorov complexity, Levin complexity, minimum description length, epiplexity, logical depth, and multiscale logical depth. I compare program length, runtime, and precision aspects of these theories. I relate them to neural network training dynamics and conjecture that logical depth is the most useful one.

[Post] Logical Depth as a Framework for Understanding Neural Networks

In this post I briefly introduce how we can use Charles Bennett’s logical depth as a general framework to understand neural networks. I find it incredibly rich, and one can apply it in many interesting ways when thinking about the training pipeline and interpretability.

Intro - entropy and Kolmogorov complexity

If one were to try to mathematically describe a neural network, neither entropy nor Kolmogorov complexity seems sufficient. Entropy could be described as a measure of average surprise. Order is, on average, very unsurprising, because we can easily predict where the particles are located (assuming a physics metaphor). Disorder is, on average, surprising, because we can never really predict where the next particle will show up. Neither end of the spectrum of entropy, order or disorder, seems to capture the intelligence that is contained within a neural network. ...
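As a rough illustration of "entropy as average surprise", here is a minimal Python sketch. It is not taken from the post: the function name `shannon_entropy` and the two toy strings are assumptions made for the example, and the estimate uses only symbol frequencies, so it ignores the order in which symbols appear.

```python
import math
from collections import Counter


def shannon_entropy(symbols):
    """Average surprise in bits per symbol, estimated from symbol frequencies.

    Hypothetical helper for illustration only; it treats the input as an
    i.i.d. sample and ignores the order of the symbols.
    """
    counts = Counter(symbols)
    total = len(symbols)
    # The surprise of a symbol with probability p is log2(1 / p) bits;
    # entropy is that surprise averaged over the distribution.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# "Order": every position holds the same symbol, so nothing is surprising.
ordered = "a" * 16

# "Disorder" (toy stand-in): 16 equally likely symbols, so each one is
# maximally surprising for an alphabet of this size.
disordered = "abcdefghijklmnop"

h_ordered = shannon_entropy(ordered)
h_disordered = shannon_entropy(disordered)

print(f"ordered:    {h_ordered:.2f} bits/symbol")
print(f"disordered: {h_disordered:.2f} bits/symbol")
```

The ordered string scores zero and the uniform one scores the maximum for a 16-symbol alphabet. Neither number says anything about structure: the second string is simply the alphabet in order, yet this frequency-based estimate rates it as maximally "disordered".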