issue-62 [DDFG] Complete MAR missingness generation#87
Conversation
eltonlaw
left a comment
There was a problem hiding this comment.
Test looks good, added a few comments.
| model = sm.OLS(y, x).fit() | ||
|
|
||
| # Use model to get predictions for where data is missing | ||
| corrupt_data = pd.DataFrame(corrupt_data) |
There was a problem hiding this comment.
I'm not averse to adding more dependencies to the project if required, but I don't think Pandas is needed in this case. We have a python implementation to get the indices of all NaN's take a look at impyute.util.find_null
| x = np.array(complete_data_mar[:, 0]).reshape((-1, 1)) | ||
| y = np.array(complete_data_mar[:, 1]) | ||
| x = sm.add_constant(x) | ||
| model = sm.OLS(y, x).fit() |
There was a problem hiding this comment.
Same as the comment on the use of pandas, statsmodel isn't a dependency, though we do use scikit-learn which has an OLS implementation as well.
| import statsmodels.api as sm | ||
|
|
||
| def test_corrupt(): | ||
| # Generate data |
There was a problem hiding this comment.
Could you add all this data generation stuff to conftest.py? Check out pytest fixtures
| from impyute.dataset.corrupt import Corruptor | ||
| import statsmodels.api as sm | ||
|
|
||
| def test_corrupt(): |
| full_model_coef = [full_model.params[1]-full_model.bse[1], full_model.params[1]+full_model.bse[1]] | ||
| imp_model_coef = imp_model.params[1] | ||
|
|
||
| assert full_model_coef[0] <= imp_model_coef <= full_model_coef[1], "MAR coefficient not within true values" |
| pass | ||
| output = self.data.copy() | ||
| nrow, ncol = self.data.shape | ||
| for i in range(0, nrow): |
There was a problem hiding this comment.
This isn't a blocker, since I guess it works but two nested for loops is pretttttty messy, You can achieve the same thing if you put the percentile & assign stuff in a separate function and do a numpy.apply_along_axis
In [4]: np.apply_along_axis(lambda x: x+1, -1, np.array([[1, 2, 3], [4, 5, 6]]))
Out[4]:
array([[2, 3, 4],
[5, 6, 7]])
naive implementation of corruptor.mar() & test case