
Residual Calculation Fix and Miscellaneous Improvements#45

Merged
pberkes merged 10 commits into pberkes:master from TheMatt2:master
Sep 12, 2022

Conversation

@TheMatt2
Contributor

Fixes an issue with how residuals are calculated and improves the reliability of test cases.

Residual Calculation Fix

tl;dr

The residual value returned from np.linalg.lstsq(x, y, rcond=-1) for the Polynomial and
Exponential classes cannot be meaningfully compared with those of the other complexity classes.

coeff, residuals, rank, s = np.linalg.lstsq(x, y, rcond=-1)

The issue is that the transform used by the Polynomial and Exponential complexity classes to
correctly generate the coefficients invalidates the residual calculation.

big_O/big_o/complexities.py

Lines 189 to 190 in 0ed1e3d

def _transform_time(self, t):
    return np.log(t)

The solution is to recalculate the residuals manually:

# Recompute the residuals in the original (untransformed) time domain.
ref_t = self.compute(n)
residuals = np.sum((ref_t - t) ** 2)
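
To make the mismatch concrete, here is a minimal sketch (with made-up timing data, not from this PR) comparing the residual np.linalg.lstsq() reports for a log-transformed fit against the residual measured in the original time domain:

import numpy as np

# Made-up timings that roughly follow a power law.
n = np.linspace(1, 10, 50)
t = 2.0 * n ** 1.5 + 0.1

# Fit log(t) = c0 + c1*log(n), as the Polynomial class does.
x = np.vstack([np.ones(len(n)), np.log(n)]).T
y = np.log(t)
coeff, lstsq_residuals, rank, s = np.linalg.lstsq(x, y, rcond=-1)
print("log-space residual:", lstsq_residuals[0])

# Residual in the original time domain: apply the inverse transform first.
ref_t = np.exp(x @ coeff)
print("time-domain residual:", np.sum((ref_t - t) ** 2))
# The two values live on different scales and cannot be compared directly.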

Steps to Reproduce

Here is an example that demonstrates big_O misclassifying a quadratic function because of this issue.

import big_o
import numpy as np
import matplotlib.pyplot as plt

# Set seed for consistent results
rng = np.random.default_rng(1234)

measures = np.linspace(1, 10, 100)

# Quadratic Function
func = lambda x: x**2

# Add randomness so it's not exactly a quadratic
times = func(measures + np.abs(rng.standard_normal(measures.size)) * .1) + np.abs(rng.standard_normal(measures.size))

# Run big_o to get the best result
best, fitted = big_o.infer_big_o_class(measures, times, verbose=True)
# Observe best is *not* quadratic

# Look up the fitted functions for quadratic and exponential
quadratic = next(complexity for complexity in fitted if isinstance(complexity, big_o.complexities.Quadratic))
exponential = next(complexity for complexity in fitted if isinstance(complexity, big_o.complexities.Exponential))

# Plot the "best", second best, and the actual solution
plt.plot(measures, times, "-", label="Original")
plt.plot(measures, best.compute(measures), "--", label="Best " + str(best))
plt.plot(measures, quadratic.compute(measures), "--", label=str(quadratic))
plt.plot(measures, exponential.compute(measures), "--", label=str(exponential))
plt.xlabel("n")
plt.ylabel("t (sec)")
plt.legend()
plt.show()
Verbose output:

Constant: time = 39 (sec) (r=90378.409421)
Linear: time = -23 + 11*n (sec) (r=4052.834495)
Quadratic: time = 1.2 + 1*n^2 (sec) (r=89.693176)
Cubic: time = 10 + 0.1*n^3 (sec) (r=2269.462214)
Polynomial: time = 1.6 * x^1.8 (sec) (r=1.323089)
Logarithmic: time = -30 + 45*log(n) (sec) (r=19352.077840)
Linearithmic: time = -4.4 + 4.3*n*log(n) (sec) (r=1389.322845)
Exponential: time = 2.8 * 1.5^n (sec) (r=6.137839)

As you can see from the calculated residuals, the best-matching function should be Polynomial with a residual of 1.32, followed by Exponential with a residual of 6.14, and followed distantly by Quadratic with a residual of 89.70.
This is surprising, since the original function is quadratic, yet Polynomial and Exponential were found to be better matches.

If we look at the graph, we can see this is an issue.

Big O Graph Demonstrating Classification Bug

Blue is the original function.
Green is the Quadratic complexity, which matches the original input quite well.
Orange is the Polynomial complexity, which drifts away from the original function at the tail end.
Red is the Exponential complexity, which does not match the original function well at all, yet had a lower residual than Quadratic.

This is an issue. big_o.infer_big_o_class should have identified Quadratic as the best complexity.

The Math

coeff, residuals, rank, s = np.linalg.lstsq(x, y, rcond=-1)

$n_i$ is the input measurement ($i$ is the index for multiple measurements)

$t_i$ is the time measured for the corresponding $n_i$

For each complexity class, there are functions $F(n_i)$ and $T(t_i)$ that transform $n_i$
and $t_i$ before they are passed to np.linalg.lstsq().

The results of np.linalg.lstsq() are coeff, referenced below as $c$, and residuals,
referenced as $r$.

The other outputs from np.linalg.lstsq() are unused.

$$x_i = F(n_i)$$ $$y_i = T(t_i)$$

| Complexity | $F(n)$ | $T(t)$ | $T^{-1}(t')$ |
| --- | --- | --- | --- |
| Constant | $1$ | $t$ | $t'$ |
| Linear | $n$ | $t$ | $t'$ |
| Quadratic | $n^2$ | $t$ | $t'$ |
| Cubic | $n^3$ | $t$ | $t'$ |
| Logarithmic | $\ln(n)$ | $t$ | $t'$ |
| Linearithmic | $n\ln(n)$ | $t$ | $t'$ |
| Polynomial | $\ln(n)$ | $\ln(t)$ | $e^{t'}$ |
| Exponential | $n$ | $\ln(t)$ | $e^{t'}$ |

The residual from np.linalg.lstsq() is the sum of squared differences between the fitted values and the transformed measurements. The fitted value for each data point is

$$t_i' = c_{1}F(n_i)+c_0$$

so for a given $F(n_i)$ and $T(t_i)$ with the resulting coefficients $c$, the residual is:

$$r=\sum_{i=0}^{k}(t_i'-T(t_i))^2$$

However, the error, i.e. the distance between the function represented by the complexity class and the measured data, is calculated by applying $T^{-1}$ to $t_i'$ rather than applying $T$ to $t_i$:

$$s=\sum_{i=0}^{k}(T^{-1}(t_i')-t_i)^2$$

For most of the complexity classes, where $T(t) = t$, this error calculation is equal to the residual calculation.

But the two are not equal for Polynomial and Exponential, as shown in the table below:

| Complexity | Residual | Error | Residual = Error |
| --- | --- | --- | --- |
| Constant | $\sum_{i=0}^{k}(t_i'-t_i)^2$ | $\sum_{i=0}^{k}(t_i'-t_i)^2$ | Yes |
| Linear | $\sum_{i=0}^{k}(t_i'-t_i)^2$ | $\sum_{i=0}^{k}(t_i'-t_i)^2$ | Yes |
| Quadratic | $\sum_{i=0}^{k}(t_i'-t_i)^2$ | $\sum_{i=0}^{k}(t_i'-t_i)^2$ | Yes |
| Cubic | $\sum_{i=0}^{k}(t_i'-t_i)^2$ | $\sum_{i=0}^{k}(t_i'-t_i)^2$ | Yes |
| Logarithmic | $\sum_{i=0}^{k}(t_i'-t_i)^2$ | $\sum_{i=0}^{k}(t_i'-t_i)^2$ | Yes |
| Linearithmic | $\sum_{i=0}^{k}(t_i'-t_i)^2$ | $\sum_{i=0}^{k}(t_i'-t_i)^2$ | Yes |
| Polynomial | $\sum_{i=0}^{k}(t_i'-\ln(t_i))^2$ | $\sum_{i=0}^{k}(e^{t_i'}-t_i)^2$ | No |
| Exponential | $\sum_{i=0}^{k}(t_i'-\ln(t_i))^2$ | $\sum_{i=0}^{k}(e^{t_i'}-t_i)^2$ | No |

So for Polynomial and Exponential, the residual must be recalculated to be equal to the error.
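
As a quick numeric illustration (made-up values): if a measurement is $t_i = 100$ and the fit predicts $T^{-1}(t_i') = 110$, the log-space residual contribution is $(\ln 110 - \ln 100)^2 \approx 0.009$, while the true error contribution is $(110 - 100)^2 = 100$.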

Test Cases Improvements

From testing, the test big_o.test.test_big_o.TestBigO.test_big_o in particular fails often.

% python -m unittest -v
test_big_o (big_o.test.test_big_o.TestBigO) ... FAIL
test_big_o_return_raw_data_default (big_o.test.test_big_o.TestBigO) ... ok
test_big_o_return_raw_data_false (big_o.test.test_big_o.TestBigO) ... ok
test_big_o_return_raw_data_true (big_o.test.test_big_o.TestBigO) ... ok
test_infer_big_o (big_o.test.test_big_o.TestBigO) ... ok
test_measure_execution_time (big_o.test.test_big_o.TestBigO) ... ok
test_compute (big_o.test.test_complexities.TestComplexities) ... ok
test_not_fitted (big_o.test.test_complexities.TestComplexities) ... ok
test_str_includes_units (big_o.test.test_complexities.TestComplexities) ... ok
test_eq (big_o.test.test_complexityclass_comparisons.TestComplexities) ... ok
test_g (big_o.test.test_complexityclass_comparisons.TestComplexities) ... ok
test_ge (big_o.test.test_complexityclass_comparisons.TestComplexities) ... ok
test_l (big_o.test.test_complexityclass_comparisons.TestComplexities) ... ok
test_le (big_o.test.test_complexityclass_comparisons.TestComplexities) ... ok

======================================================================
FAIL: test_big_o (big_o.test.test_big_o.TestBigO)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "big_O/big_o/test/test_big_o.py", line 80, in test_big_o
    self.assertEqual(class_, res_class.__class__)
AssertionError: <class 'big_o.complexities.Quadratic'> != <class 'big_o.complexities.Cubic'>

----------------------------------------------------------------------
Ran 14 tests in 8.803s

In testing, this test passed in only 29 out of 100 runs.
Specs below:

  • Python 3.10.6
  • macOS 12.4
  • MacBook Air 2017
  • i7-5650U (frequency limited to 2.2 GHz in software)
  • On battery power

The problem appears to be that the quadratic function used for testing is too fast for
accurate timing.

The code being used is:

def dummy_quadratic_function(n):
    for i in range(n):
        for j in range(n):
            # Dummy operation with quadratic complexity.
            8282828 * 2322

The problem with this function is that Python constant-folds the multiplication and then discards the unused result, leaving only two empty loops.

This can be demonstrated by using dis to look at the disassembly:

>>> def dummy_quadratic_function(n): 
...      for i in range(n): 
...          for j in range(n): 
...              # Dummy operation with quadratic complexity. 
...              8282828 * 2322
... 
>>> import dis
>>> dis.dis(dummy_quadratic_function)
  2           0 LOAD_GLOBAL              0 (range)
              2 LOAD_FAST                0 (n)
              4 CALL_FUNCTION            1
              6 GET_ITER
        >>    8 FOR_ITER                 9 (to 28)
             10 STORE_FAST               1 (i)

  3          12 LOAD_GLOBAL              0 (range)
             14 LOAD_FAST                0 (n)
             16 CALL_FUNCTION            1
             18 GET_ITER
        >>   20 FOR_ITER                 2 (to 26)
             22 STORE_FAST               2 (j)

  5          24 JUMP_ABSOLUTE           10 (to 20)

  3     >>   26 JUMP_ABSOLUTE            4 (to 8)

  2     >>   28 LOAD_CONST               0 (None)
             30 RETURN_VALUE

Python has optimized the function down to just the two for loops, which is fast enough to cause the classification to be inconsistent.

This is fixed by accumulating the results in a variable using simple addition, which Python
cannot optimize away. The new function also adds a constant component:

def dummy_quadratic_function(n):
    # Dummy operation with quadratic complexity.

    # Constant component of quadratic function
    dummy_constant_function(n)

    x = 0
    for i in range(n):
        for j in range(n):
            for k in range(20):
                x += 1
    return x // 20
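
As a sanity check, timing the fixed function should now scale roughly quadratically. A minimal standalone sketch (it inlines the helper and omits the constant component from the test module):

import timeit

def dummy_quadratic_function(n):
    # Same shape as the fixed test helper, minus the constant component.
    x = 0
    for i in range(n):
        for j in range(n):
            for k in range(20):
                x += 1
    return x // 20

# Doubling n should roughly quadruple the runtime now that the inner
# work cannot be optimized away.
for n in (100, 200, 400):
    print(n, timeit.timeit(lambda: dummy_quadratic_function(n), number=1))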

Other Improvements

Simplicity Bias

Add simplicity_bias as a parameter of big_o.infer_big_o_class() to allow the user to change the bias toward choosing a simpler complexity class when the residuals are close. The comparison in question:

if residuals < best_residuals - 1e-6:
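
Presumably the hard-coded 1e-6 becomes the new parameter (a sketch; the exact code in the PR may differ):

# A candidate class only displaces the current best if it improves the
# residuals by more than simplicity_bias, so near-ties resolve in favor
# of the simpler class tried first.
if residuals < best_residuals - simplicity_bias:
    best_residuals = residuals
    best_class = complexity_class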

big_o.test.test_big_o.TestBigO.test_big_o(): change the NumPy sort from mergesort to heapsort

As reflected in the comments, mergesort can look closer to a linear function depending on the input set.

# In the best case, TimSort is linear, so we fix a random array to
# make sure we hit a close-to-worst case scenario, which is
# O(n*log(n)).
random_state = np.random.RandomState(89342787)
random_array = random_state.rand(100000)

The NumPy documentation suggests a possible reason for this: the merge sort may actually be implemented as a radix sort or Timsort. Radix sort is a linear-time sorting algorithm, and Timsort, in the right situation, can also behave like one.

https://numpy.org/doc/stable/reference/generated/numpy.sort.html

Note that both ‘stable’ and ‘mergesort’ use timsort or radix sort under the covers

This commit changes the sort to heapsort, which is reliable enough that it is no longer necessary to fix the random order of the list being sorted.
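
A minimal sketch of the change (the actual test code may differ):

import numpy as np

# Heapsort is O(n*log(n)) for any input order, so the array contents no
# longer need a fixed random seed to dodge Timsort's linear best case.
random_array = np.random.rand(100000)
np.sort(random_array, kind='heapsort')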

big_o.test.test_big_o.TestBigO.test_measure_execution_time(): set n_timings to 5 for more reliable testing

ns, t = big_o.measure_execution_time(
f, datagen.n_,
min_n=1, max_n=5, n_measures=5, n_repeats=1,
)

It appears that the test is more reliable if the timing is run multiple times, which is done by setting n_timings=5.
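
The updated call would then read (a sketch based on the description above; the actual diff may differ):

ns, t = big_o.measure_execution_time(
    f, datagen.n_,
    min_n=1, max_n=5, n_measures=5, n_repeats=1, n_timings=5,
)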

Allow list parameters to be passed to complexity.fit()

Currently ComplexityClass.fit() requires two NumPy arrays to be passed as arguments.
That often forces the calling code to explicitly import NumPy just to create the arrays.

This PR adds a call to np.asanyarray() on the inputs in complexity.fit(), so calling code can pass inputs as normal Python iterables.
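
With that change, a call like the following (a hypothetical usage sketch) works without importing NumPy in the calling code:

from big_o import complexities

# Plain Python lists are coerced with np.asanyarray() inside fit().
linear = complexities.Linear()
residuals = linear.fit([1, 2, 3, 4], [1.1, 2.0, 2.9, 4.2])
print(linear, residuals)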

Test cases for this are included.

@pberkes
Owner

pberkes commented Aug 22, 2022

Hi @TheMatt2 ! Thank you for the PR and the thorough description, I'll have a look as soon as possible.

In the meantime, the description of the first part seems similar to a previous issue that you fixed in #44. Would you mind briefly explaining the difference?

@TheMatt2
Contributor Author

TheMatt2 commented Aug 23, 2022

Hi @pberkes !

Apologies, I realize this is quite a lengthy PR. It fixes a somewhat subtle bug that nonetheless has a significant impact on classification accuracy, so I wanted to make sure to include a mathematical breakdown.

This PR is similar to PR #44 in that both fix issues dealing with correctly accounting for the logarithmic transform used (only) by Polynomial and Exponential complexities.

The difference is that PR #44 fixes Complexity.compute(), while this PR fixes Complexity.fit().

Unlike PR #44, this PR actually corrects how big_o decides the classification of a given function.

@TheMatt2
Contributor Author

TheMatt2 commented Sep 1, 2022

Update on this?

@pberkes
Owner

pberkes commented Sep 4, 2022

I'm sorry, @TheMatt2, I've been traveling; I'll have a look this week

@pberkes
Owner

pberkes commented Sep 5, 2022

Hi @TheMatt2 , thanks for the PR! Good catch with the residuals, and the Python optimization problem is fun.
This PR fixes 3 separate issues; in the future, please submit each in a separate PR for faster review.

My main concern is that if I set _recalculate_residuals = False in Polynomial and Exponential, all tests still pass. Could you please add a test that would fail without _recalculate_residuals?

As for design, I'm not a big fan of having that behavior controlled by a class attribute. The solutions I propose are 1) recalculate the residuals always; there's some overhead but it's the cleanest; or 2) define a method _compute_fit_residuals(n, t, lstsq_residuals) whose basic implementation is return lstsq_residuals[0]. What do you think? I'm open for discussion on this point.

I'll add other minor comments inline

@TheMatt2
Contributor Author

TheMatt2 commented Sep 6, 2022

Hi @pberkes ,

Apologies for lumping multiple issues into this one PR; in hindsight, I should have used branches to track the changes separately. Now I know for next time.

I follow that using _recalculate_residuals may seem a bit counter-intuitive. I wanted to avoid creating a dedicated internal function for this purpose, as it could imply that each class should have a different implementation, like _transform_n().
However, there are only two ways to calculate residuals:

  1. Using the result of np.linalg.lstsq()
  2. residuals = np.sum((ref_t - t) ** 2)

To denote that the correct behavior is only going to be one of these two behaviors, I thought a class level boolean value would be appropriate.

Technically, the code could check if ComplexityClass._transform_time() has been overridden and use that to determine the behavior, but I don't think making the check implicit is an improvement.

Obviously, always recalculating the residuals solves this problem, but it seems a waste to redo the calculation when the majority of complexity classes can just use the value from np.linalg.lstsq(). As best I can see, there is no way to have numpy compute the coefficients without a residuals calculation.

I think keeping it as a class variable would be best, though I accept your recommendation to name it _recalculate_fit_residuals.

It might be worth acknowledging that there is another option: explicitly override fit() in the subclasses.

class Polynomial(ComplexityClass):
    def fit(self, n, t):
        """ Fit complexity class parameters to timing data.

        Input:
        ------

        n -- Array of values of N for which execution time has been measured.

        t -- Array of execution times for each N in seconds.

        Output:
        -------

        residuals -- Sum of square errors of fit
        """
        # Explicitly recalculate the residuals, since the time transform
        # makes the lstsq residuals incomparable.
        super().fit(n, t)
        ## Alternatively
        # ComplexityClass.fit(self, n, t)
        ref_t = self.compute(n)
        residuals = np.sum((ref_t - t) ** 2)
        return residuals

@TheMatt2
Contributor Author

TheMatt2 commented Sep 6, 2022

As for the test cases passing even if _recalculate_residuals = False, I will look into this.
I had thought the changes to the test cases would make them fail in this case.

@TheMatt2
Contributor Author

TheMatt2 commented Sep 6, 2022

I am unable to replicate the tests passing when _recalculate_residuals = False.

When I run the test set I (correctly) get the failure message:

(venv) big_O % python -m unittest 
.......F........
======================================================================
FAIL: test_compute (big_o.test.test_complexities.TestComplexities)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "big_O/big_o/test/test_complexities.py", line 35, in test_compute
    assert_allclose(residuals, np.sum((y - ref_y) ** 2), rtol=1e-07, atol=1e-08,
  File "big_O/venv/lib/python3.9/site-packages/numpy/testing/_private/utils.py", line 1527, in assert_allclose
    assert_array_compare(compare, actual, desired, err_msg=str(err_msg),
  File "big_O/venv/lib/python3.9/site-packages/numpy/testing/_private/utils.py", line 844, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Not equal to tolerance rtol=1e-07, atol=1e-08
compute() residuals failed to match expected values for class <class 'big_o.complexities.Exponential'>
Mismatched elements: 1 / 1 (100%)
Max absolute difference: 3.25244183e+70
Max relative difference: 1.
 x: array(6.265939e-27)
 y: array(3.252442e+70)

----------------------------------------------------------------------
Ran 16 tests in 45.164s

FAILED (failures=1)

This failure is expected if the residual calculation is not performed correctly.
Would you mind adding some detail about the circumstances under which the test passes?

Note that this test directly calls ComplexityClass.fit() and Complexity.compute(), so this should not change with timing variations between systems.

@pberkes
Owner

pberkes commented Sep 9, 2022

The test case for Exponential fails when you set _recalculate_residuals to False, but not the one for Polynomial.


@pberkes
Owner

pberkes commented Sep 9, 2022

I'm ok with renaming it to _recalculate_fit_residuals and going with your current solution.

@TheMatt2
Contributor Author

TheMatt2 commented Sep 10, 2022

I see you are correct. Good catch.

Apologies, I had thought I tested that, but I guess I accidentally tested the Exponential case twice.

For the test case, the answer for Polynomial with _recalculate_residuals = False is off by 10 orders of magnitude, but is so close to 0 that the check doesn't catch it. Will fix. Oops.

>>> class_
<class 'big_o.complexities.Polynomial'>
>>> np.sum((y - ref_y) ** 2)
5.0832678484432925e-18
>>> residuals
1.7620118668727588e-28

Fixes an issue where the existing test did not detect incorrect residuals in some cases; the residuals involved were too small.
Makes the linear test use a larger range, which seems to decrease the chance of a misclassification on a system with noisy timing.
@TheMatt2
Contributor Author

I think that covers all requested changes.
Let me know if there is anything else you would like me to change.

@pberkes
Owner

pberkes commented Sep 12, 2022

LGTM, thank you!

@pberkes pberkes merged commit fd8c994 into pberkes:master Sep 12, 2022
@DanielHabenicht

Thanks everybody for providing this fix! I was going down the same rabbit hole as @TheMatt2, wondering why it would not classify my linear values as a linear complexity class, but as polynomial instead.

The culprit now is: this fix was never released in a current big-O pip package, so everybody might still be running into this problem (just like I am). Is there a timeline for when the next release is due, @pberkes? (I can see there is #46.)
