
Residual Calculation Fix and Miscellaneous Improvements#45

Merged
pberkes merged 10 commits into pberkes:master from TheMatt2:master
Sep 12, 2022

Conversation

@TheMatt2
Contributor

Fixes an issue with how residuals are calculated and improves the reliability of test cases.

Residual Calculation Fix

tl;dr

The residual value returned from np.linalg.lstsq(x, y, rcond=-1) for the Polynomial and
Exponential classes cannot be meaningfully compared with those of the other complexity classes.

coeff, residuals, rank, s = np.linalg.lstsq(x, y, rcond=-1)

The issue is that the transform used by the Polynomial and Exponential complexity classes to
correctly generate the coefficients invalidates the residual calculation.

big_O/big_o/complexities.py

Lines 189 to 190 in 0ed1e3d

def _transform_time(self, t):
    return np.log(t)

The solution is to recalculate the residuals manually:

# Recompute the residuals in the original (untransformed) time domain.
ref_t = self.compute(n)
residuals = np.sum((ref_t - t) ** 2)
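
To make the mismatch concrete, here is a minimal sketch (with made-up timing data, not from this PR) comparing the residual np.linalg.lstsq() reports for a log-transformed fit against the residual measured in the original time domain:

import numpy as np

# Made-up timings that roughly follow a power law.
n = np.linspace(1, 10, 50)
t = 2.0 * n ** 1.5 + 0.1

# Fit log(t) = c0 + c1*log(n), as the Polynomial class does.
x = np.vstack([np.ones(len(n)), np.log(n)]).T
y = np.log(t)
coeff, lstsq_residuals, rank, s = np.linalg.lstsq(x, y, rcond=-1)
print("log-space residual:", lstsq_residuals[0])

# Residual in the original time domain: apply the inverse transform first.
ref_t = np.exp(x @ coeff)
print("time-domain residual:", np.sum((ref_t - t) ** 2))
# The two values live on different scales and cannot be compared directly.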

Steps to Reproduce

Here is an example that demonstrates big_O misclassifying a quadratic function because of this issue.

import big_o
import numpy as np
import matplotlib.pyplot as plt

# Set seed for consistent results
rng = np.random.default_rng(1234)

measures = np.linspace(1, 10, 100)

# Quadratic Function
func = lambda x: x**2

# Add randomness so it's not exactly a quadratic
times = func(measures + np.abs(rng.standard_normal(measures.size)) * .1) + np.abs(rng.standard_normal(measures.size))

# Run big_o to get the best result
best, fitted = big_o.infer_big_o_class(measures, times, verbose=True)
# Observe best is *not* quadratic

# Look up the fitted functions for quadratic and exponential
quadratic = next(complexity for complexity in fitted if isinstance(complexity, big_o.complexities.Quadratic))
exponential = next(complexity for complexity in fitted if isinstance(complexity, big_o.complexities.Exponential))

# Plot the "best", second best, and the actual solution
plt.plot(measures, times, "-", label="Original")
plt.plot(measures, best.compute(measures), "--", label="Best " + str(best))
plt.plot(measures, quadratic.compute(measures), "--", label=str(quadratic))
plt.plot(measures, exponential.compute(measures), "--", label=str(exponential))
plt.xlabel("n")
plt.ylabel("t (sec)")
plt.legend()
plt.show()
Verbose output:

Constant: time = 39 (sec) (r=90378.409421)
Linear: time = -23 + 11*n (sec) (r=4052.834495)
Quadratic: time = 1.2 + 1*n^2 (sec) (r=89.693176)
Cubic: time = 10 + 0.1*n^3 (sec) (r=2269.462214)
Polynomial: time = 1.6 * x^1.8 (sec) (r=1.323089)
Logarithmic: time = -30 + 45*log(n) (sec) (r=19352.077840)
Linearithmic: time = -4.4 + 4.3*n*log(n) (sec) (r=1389.322845)
Exponential: time = 2.8 * 1.5^n (sec) (r=6.137839)

As you can see from the calculated residuals, the best-matching function should be Polynomial with a residual of 1.32, followed by Exponential with a residual of 6.14, and followed distantly by Quadratic with a residual of 89.70.
This is surprising, since the original function is quadratic, yet Polynomial and Exponential were found to be better matches.

If we look at the graph, we can see this is an issue.

Big O Graph Demonstrating Classification Bug

Blue is the original function.
Green is the Quadratic complexity, which matches the original input quite well.
Orange is the Polynomial complexity, which drifts away from the original function at the tail end.
Red is the Exponential complexity, which does not match the original function well at all, yet had a lower residual than Quadratic.

This is an issue. big_o.infer_big_o_class should have identified Quadratic as the best complexity.

The Math

coeff, residuals, rank, s = np.linalg.lstsq(x, y, rcond=-1)

$n_i$ is the input measurement ($i$ is the index for multiple measurements)

$t_i$ is the time measured for the corresponding $n_i$

For each complexity class, there are functions $F(n_i)$ and $T(t_i)$ that transform $n_i$
and $t_i$ before they are passed to np.linalg.lstsq().

The results of np.linalg.lstsq() are coeff, referenced below as $c$, and residuals,
referenced as $r$.

The other outputs from np.linalg.lstsq() are unused.

$$x_i = F(n_i)$$ $$y_i = T(t_i)$$

| Complexity | $F(n)$ | $T(t)$ | $T^{-1}(t')$ |
| --- | --- | --- | --- |
| Constant | $1$ | $t$ | $t'$ |
| Linear | $n$ | $t$ | $t'$ |
| Quadratic | $n^2$ | $t$ | $t'$ |
| Cubic | $n^3$ | $t$ | $t'$ |
| Logarithmic | $\ln(n)$ | $t$ | $t'$ |
| Linearithmic | $n\ln(n)$ | $t$ | $t'$ |
| Polynomial | $\ln(n)$ | $\ln(t)$ | $e^{t'}$ |
| Exponential | $n$ | $\ln(t)$ | $e^{t'}$ |

The residual from np.linalg.lstsq() is the sum of squared differences between the fitted values and the transformed measurements. The fitted value for each data point is

$$t_i' = c_{1}F(n_i)+c_0$$

so for a given $F(n_i)$ and $T(t_i)$ with the resulting coefficients $c$, the residual is:

$$r=\sum_{i=0}^{k}(t_i'-T(t_i))^2$$

However, the error, i.e. the distance between the function represented by the complexity class and the measured data, is calculated by applying $T^{-1}$ to $t_i'$ rather than applying $T$ to $t_i$:

$$s=\sum_{i=0}^{k}(T^{-1}(t_i')-t_i)^2$$

For most of the complexity classes, where $T(t) = t$, this error calculation is equal to the residual calculation.

But the two are not equal for Polynomial and Exponential, as shown in the table below:

| Complexity | Residual | Error | Residual = Error |
| --- | --- | --- | --- |
| Constant | $\sum_{i=0}^{k}(t_i'-t_i)^2$ | $\sum_{i=0}^{k}(t_i'-t_i)^2$ | Yes |
| Linear | $\sum_{i=0}^{k}(t_i'-t_i)^2$ | $\sum_{i=0}^{k}(t_i'-t_i)^2$ | Yes |
| Quadratic | $\sum_{i=0}^{k}(t_i'-t_i)^2$ | $\sum_{i=0}^{k}(t_i'-t_i)^2$ | Yes |
| Cubic | $\sum_{i=0}^{k}(t_i'-t_i)^2$ | $\sum_{i=0}^{k}(t_i'-t_i)^2$ | Yes |
| Logarithmic | $\sum_{i=0}^{k}(t_i'-t_i)^2$ | $\sum_{i=0}^{k}(t_i'-t_i)^2$ | Yes |
| Linearithmic | $\sum_{i=0}^{k}(t_i'-t_i)^2$ | $\sum_{i=0}^{k}(t_i'-t_i)^2$ | Yes |
| Polynomial | $\sum_{i=0}^{k}(t_i'-\ln(t_i))^2$ | $\sum_{i=0}^{k}(e^{t_i'}-t_i)^2$ | No |
| Exponential | $\sum_{i=0}^{k}(t_i'-\ln(t_i))^2$ | $\sum_{i=0}^{k}(e^{t_i'}-t_i)^2$ | No |

So for Polynomial and Exponential, the residual must be recalculated to be equal to the error.
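
As a quick numeric illustration (made-up values): if a measurement is $t_i = 100$ and the fit predicts $T^{-1}(t_i') = 110$, the log-space residual contribution is $(\ln 110 - \ln 100)^2 \approx 0.009$, while the true error contribution is $(110 - 100)^2 = 100$.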

Test Cases Improvements

From testing, the test big_o.test.test_big_o.TestBigO.test_big_o in particular fails often.

% python -m unittest -v
test_big_o (big_o.test.test_big_o.TestBigO) ... FAIL
test_big_o_return_raw_data_default (big_o.test.test_big_o.TestBigO) ... ok
test_big_o_return_raw_data_false (big_o.test.test_big_o.TestBigO) ... ok
test_big_o_return_raw_data_true (big_o.test.test_big_o.TestBigO) ... ok
test_infer_big_o (big_o.test.test_big_o.TestBigO) ... ok
test_measure_execution_time (big_o.test.test_big_o.TestBigO) ... ok
test_compute (big_o.test.test_complexities.TestComplexities) ... ok
test_not_fitted (big_o.test.test_complexities.TestComplexities) ... ok
test_str_includes_units (big_o.test.test_complexities.TestComplexities) ... ok
test_eq (big_o.test.test_complexityclass_comparisons.TestComplexities) ... ok
test_g (big_o.test.test_complexityclass_comparisons.TestComplexities) ... ok
test_ge (big_o.test.test_complexityclass_comparisons.TestComplexities) ... ok
test_l (big_o.test.test_complexityclass_comparisons.TestComplexities) ... ok
test_le (big_o.test.test_complexityclass_comparisons.TestComplexities) ... ok

======================================================================
FAIL: test_big_o (big_o.test.test_big_o.TestBigO)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "big_O/big_o/test/test_big_o.py", line 80, in test_big_o
    self.assertEqual(class_, res_class.__class__)
AssertionError: <class 'big_o.complexities.Quadratic'> != <class 'big_o.complexities.Cubic'>

----------------------------------------------------------------------
Ran 14 tests in 8.803s

In testing, this test passed in only 29 out of 100 runs.
Specs below:

  • Python 3.10.6
  • macOS 12.4
  • MacBook Air 2017
  • i7-5650U (frequency limited to 2.2 GHz in software)
  • On battery power

The problem appears to be that the quadratic function used for testing is too fast for
accurate timing.

The code being used is:

def dummy_quadratic_function(n):
    for i in range(n):
        for j in range(n):
            # Dummy operation with quadratic complexity.
            8282828 * 2322

The problem with this function is that Python constant-folds the multiplication and then discards the unused result, leaving only two empty loops.

This can be demonstrated by using dis to look at the disassembly:

>>> def dummy_quadratic_function(n): 
...      for i in range(n): 
...          for j in range(n): 
...              # Dummy operation with quadratic complexity. 
...              8282828 * 2322
... 
>>> import dis
>>> dis.dis(dummy_quadratic_function)
  2           0 LOAD_GLOBAL              0 (range)
              2 LOAD_FAST                0 (n)
              4 CALL_FUNCTION            1
              6 GET_ITER
        >>    8 FOR_ITER                 9 (to 28)
             10 STORE_FAST               1 (i)

  3          12 LOAD_GLOBAL              0 (range)
             14 LOAD_FAST                0 (n)
             16 CALL_FUNCTION            1
             18 GET_ITER
        >>   20 FOR_ITER                 2 (to 26)
             22 STORE_FAST               2 (j)

  5          24 JUMP_ABSOLUTE           10 (to 20)

  3     >>   26 JUMP_ABSOLUTE            4 (to 8)

  2     >>   28 LOAD_CONST               0 (None)
             30 RETURN_VALUE

Python has optimized the function down to just the two for loops, which is fast enough to cause the classification to be inconsistent.

This is fixed by accumulating the results in a variable using simple addition, which Python
cannot optimize away. The new function also adds a constant component:

def dummy_quadratic_function(n):
    # Dummy operation with quadratic complexity.

    # Constant component of quadratic function
    dummy_constant_function(n)

    x = 0
    for i in range(n):
        for j in range(n):
            for k in range(20):
                x += 1
    return x // 20
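
As a sanity check, timing the fixed function should now scale roughly quadratically. A minimal standalone sketch (it inlines the helper and omits the constant component from the test module):

import timeit

def dummy_quadratic_function(n):
    # Same shape as the fixed test helper, minus the constant component.
    x = 0
    for i in range(n):
        for j in range(n):
            for k in range(20):
                x += 1
    return x // 20

# Doubling n should roughly quadruple the runtime now that the inner
# work cannot be optimized away.
for n in (100, 200, 400):
    print(n, timeit.timeit(lambda: dummy_quadratic_function(n), number=1))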

Other Improvements

Simplicity Bias

Add simplicity_bias as a parameter of big_o.infer_big_o_class() to allow the user to change the bias toward choosing a simpler complexity class when the residuals are close. The comparison in question:

if residuals < best_residuals - 1e-6:
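
Presumably the hard-coded 1e-6 becomes the new parameter (a sketch; the exact code in the PR may differ):

# A candidate class only displaces the current best if it improves the
# residuals by more than simplicity_bias, so near-ties resolve in favor
# of the simpler class tried first.
if residuals < best_residuals - simplicity_bias:
    best_residuals = residuals
    best_class = complexity_class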

big_o.test.test_big_o.TestBigO.test_big_o(): change the NumPy sort from mergesort to heapsort

As reflected in the comments, mergesort can look closer to a linear function depending on the input set.

# In the best case, TimSort is linear, so we fix a random array to
# make sure we hit a close-to-worst case scenario, which is
# O(n*log(n)).
random_state = np.random.RandomState(89342787)
random_array = random_state.rand(100000)

The NumPy documentation suggests a possible reason for this: the merge sort may actually be implemented as a radix sort or Timsort. Radix sort is a linear-time sorting algorithm, and Timsort, in the right situation, can also behave like one.

https://numpy.org/doc/stable/reference/generated/numpy.sort.html

Note that both ‘stable’ and ‘mergesort’ use timsort or radix sort under the covers

This commit changes the sort to heapsort, which is reliable enough that it is no longer necessary to fix the random order of the list being sorted.
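
A minimal sketch of the change (the actual test code may differ):

import numpy as np

# Heapsort is O(n*log(n)) for any input order, so the array contents no
# longer need a fixed random seed to dodge Timsort's linear best case.
random_array = np.random.rand(100000)
np.sort(random_array, kind='heapsort')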

big_o.test.test_big_o.TestBigO.test_measure_execution_time(): set n_timings to 5 for more reliable testing

ns, t = big_o.measure_execution_time(
f, datagen.n_,
min_n=1, max_n=5, n_measures=5, n_repeats=1,
)

It appears that the test is more reliable if the timing is run multiple times, which is done by setting n_timings=5.
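
The updated call would then read (a sketch based on the description above; the actual diff may differ):

ns, t = big_o.measure_execution_time(
    f, datagen.n_,
    min_n=1, max_n=5, n_measures=5, n_repeats=1, n_timings=5,
)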

Allow list parameters to be passed to complexity.fit()

Currently ComplexityClass.fit() requires two NumPy arrays to be passed as arguments.
That often forces the calling code to explicitly import NumPy just to create the arrays.

This PR adds a call to np.asanyarray() on the inputs in complexity.fit(), so calling code can pass inputs as normal Python iterables.
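
With that change, a call like the following (a hypothetical usage sketch) works without importing NumPy in the calling code:

from big_o import complexities

# Plain Python lists are coerced with np.asanyarray() inside fit().
linear = complexities.Linear()
residuals = linear.fit([1, 2, 3, 4], [1.1, 2.0, 2.9, 4.2])
print(linear, residuals)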

Test cases for this are included.

@pberkes
Owner

pberkes commented Aug 22, 2022

Hi @TheMatt2 ! Thank you for the PR and the thorough description, I'll have a look as soon as possible.

In the meantime, the description of the first part seems similar to a previous issue that you fixed in #44. Would you mind briefly explaining the difference?

@TheMatt2
Contributor Author

TheMatt2 commented Aug 23, 2022

Hi @pberkes !

Apologies, I realize this is quite a lengthy PR. It fixes a somewhat subtle bug that nonetheless has a significant impact on classification accuracy, so I wanted to make sure to include a mathematical breakdown.

This PR is similar to PR #44 in that both fix issues dealing with correctly accounting for the logarithmic transform used (only) by Polynomial and Exponential complexities.

The difference is that PR #44 fixes Complexity.compute(), while this PR fixes Complexity.fit().

Unlike PR #44, this PR actually corrects how big_o decides the classification of a given function.

@TheMatt2
Contributor Author

TheMatt2 commented Sep 1, 2022

Update on this?

@pberkes
Owner

pberkes commented Sep 4, 2022

I'm sorry, @TheMatt2, I've been traveling; I'll have a look this week

@pberkes
Owner

pberkes commented Sep 5, 2022

Hi @TheMatt2 , thanks for the PR! Good catch with the residuals, and the Python optimization problem is fun.
This PR fixes 3 separate issues; in the future, please submit each in a separate PR for faster review.

My main concern is that if I set _recalculate_residuals = False in Polynomial and Exponential, all tests still pass. Could you please add a test that would fail without _recalculate_residuals?

As for design, I'm not a big fan of having that behavior controlled by a class attribute. The solutions I propose are 1) recalculate the residuals always; there's some overhead but it's the cleanest; or 2) define a method _compute_fit_residuals(n, t, lstsq_residuals) whose basic implementation is return lstsq_residuals[0]. What do you think? I'm open for discussion on this point.

I'll add other minor comments inline

@TheMatt2
Contributor Author

TheMatt2 commented Sep 6, 2022

Hi @pberkes ,

Apologies for lumping multiple issues into this one PR; in hindsight, I should have used branches to track the changes separately. Now I know for next time.

I follow that using _recalculate_residuals may seem a bit counter-intuitive. I wanted to avoid creating a dedicated internal function for this purpose, as it could imply that each class should have a different implementation, like _transform_n().
However, there are only two ways to calculate residuals:

  1. Using the result of np.linalg.lstsq()
  2. residuals = np.sum((ref_t - t) ** 2)

To denote that the correct behavior is only going to be one of these two behaviors, I thought a class level boolean value would be appropriate.

Technically, the code could check if ComplexityClass._transform_time() has been overridden and use that to determine the behavior, but I don't think making the check implicit is an improvement.

Obviously, always recalculating the residuals solves this problem, but it seems a waste to redo the calculation when the majority of complexity classes can just use the value from np.linalg.lstsq(). As best I can see, there is no way to have numpy compute the coefficients without a residuals calculation.

I think keeping it as a class variable would be best, though I accept your recommendation to name it _recalculate_fit_residuals.

It might be worth acknowledging that there is another option: explicitly override fit() in the subclasses.

class Polynomial(ComplexityClass):
    def fit(self, n, t):
        """ Fit complexity class parameters to timing data.

        Input:
        ------

        n -- Array of values of N for which execution time has been measured.

        t -- Array of execution times for each N in seconds.

        Output:
        -------

        residuals -- Sum of square errors of fit
        """
        # Explicitly recalculate the residuals, since the time transform
        # makes the lstsq residuals incomparable.
        super().fit(n, t)
        ## Alternatively
        # ComplexityClass.fit(self, n, t)
        ref_t = self.compute(n)
        residuals = np.sum((ref_t - t) ** 2)
        return residuals

@TheMatt2
Contributor Author

TheMatt2 commented Sep 6, 2022

As for the test cases passing even if _recalculate_residuals = False, I will look into this.
I had thought the changes to the test cases would make them fail in this case.

@TheMatt2
Contributor Author

TheMatt2 commented Sep 6, 2022

I am unable to replicate the tests passing when _recalculate_residuals = False.

When I run the test set I (correctly) get the failure message:

(venv) big_O % python -m unittest 
.......F........
======================================================================
FAIL: test_compute (big_o.test.test_complexities.TestComplexities)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "big_O/big_o/test/test_complexities.py", line 35, in test_compute
    assert_allclose(residuals, np.sum((y - ref_y) ** 2), rtol=1e-07, atol=1e-08,
  File "big_O/venv/lib/python3.9/site-packages/numpy/testing/_private/utils.py", line 1527, in assert_allclose
    assert_array_compare(compare, actual, desired, err_msg=str(err_msg),
  File "big_O/venv/lib/python3.9/site-packages/numpy/testing/_private/utils.py", line 844, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Not equal to tolerance rtol=1e-07, atol=1e-08
compute() residuals failed to match expected values for class <class 'big_o.complexities.Exponential'>
Mismatched elements: 1 / 1 (100%)
Max absolute difference: 3.25244183e+70
Max relative difference: 1.
 x: array(6.265939e-27)
 y: array(3.252442e+70)

----------------------------------------------------------------------
Ran 16 tests in 45.164s

FAILED (failures=1)

This failure is expected if the residual calculation is not performed correctly.
Would you mind adding some detail about the circumstances under which the test passes?

Note that this test directly calls ComplexityClass.fit() and Complexity.compute(), so this should not change with timing variations between systems.

@pberkes
Owner

pberkes commented Sep 9, 2022

The test case for Exponential fails when you set _recalculate_residuals to False, but not the one for Polynomial.


@pberkes
Owner

pberkes commented Sep 9, 2022

I'm ok with renaming it to _recalculate_fit_residuals and going with your current solution.

@TheMatt2
Contributor Author

TheMatt2 commented Sep 10, 2022

I see you are correct. Good catch.

Apologies, I had thought I tested that, but I guess I accidentally tested the Exponential case twice.

For the test case, the answer for Polynomial with _recalculate_residuals = False is off by 10 orders of magnitude, but is so close to 0 that the check doesn't catch it. Will fix. Oops.

>>> class_
<class 'big_o.complexities.Polynomial'>
>>> np.sum((y - ref_y) ** 2)
5.0832678484432925e-18
>>> residuals
1.7620118668727588e-28

Fixes an issue where the existing test did not detect incorrect residuals in some cases; the residuals involved were too small.
Makes the linear test use a larger range, which seems to decrease the chance of a misclassification on a system with noisy timing.
@TheMatt2
Contributor Author

I think that covers all requested changes.
Let me know if there is anything else you would like me to change.

@pberkes
Owner

pberkes commented Sep 12, 2022

LGTM, thank you!

@pberkes pberkes merged commit fd8c994 into pberkes:master Sep 12, 2022
@DanielHabenicht

Thanks everybody for providing this fix! I was going down the same rabbit hole as @TheMatt2, wondering why it would not classify my linear values as a linear complexity class, but as polynomial instead.

The culprit now is: this fix was never released in a current big-O pip package, so everybody might still be running into this problem (just like I am). Is there a timeline for when the next release is due, @pberkes? (I can see there is #46.)
