issue-62 [DDFG] Complete MAR missingness generation by jeanielim · Pull Request #87 · eltonlaw/impyute

jeanielim · 2019-08-02T17:35:44Z

Naive implementation of Corruptor.mar() and a test case.

eltonlaw

Test looks good, added a few comments.

eltonlaw · 2019-08-03T02:53:45Z

+    model = sm.OLS(y, x).fit()
+
+    # Use model to get predictions for where data is missing
+    corrupt_data = pd.DataFrame(corrupt_data)


I'm not averse to adding more dependencies to the project if required, but I don't think pandas is needed in this case. We already have a Python implementation that returns the indices of all NaNs; take a look at impyute.util.find_null.
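
For illustration, here is a minimal NumPy-only sketch of locating missing values without a DataFrame. It uses plain `np.isnan` and `np.argwhere` rather than `impyute.util.find_null` itself, and the array `data` and the name `null_xy` are made up for the example.

```python
import numpy as np

# Hypothetical 2x3 array with two missing entries
data = np.array([[1.0, np.nan, 3.0],
                 [np.nan, 5.0, 6.0]])

# Each row of null_xy is a (row, column) pair pointing at a NaN
null_xy = np.argwhere(np.isnan(data))

for row, col in null_xy:
    print(f"missing value at row {row}, column {col}")
```

The resulting index pairs can be used to look up the rows that need predictions, which is all the DataFrame was being used for here.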

eltonlaw · 2019-08-03T02:55:29Z

+    x = np.array(complete_data_mar[:, 0]).reshape((-1, 1))
+    y = np.array(complete_data_mar[:, 1])
+    x = sm.add_constant(x)
+    model = sm.OLS(y, x).fit()


Same as the comment on the use of pandas: statsmodels isn't a dependency, though we do use scikit-learn, which has an OLS implementation as well.
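
A rough sketch of what the scikit-learn version of the fit could look like, assuming a single predictor column. The synthetic `x` and `y` below are invented for the example and are not the PR's data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data for illustration: y = 2x + 1 plus a little noise
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
y = 2.0 * x[:, 0] + 1.0 + rng.normal(scale=0.1, size=200)

# LinearRegression fits an intercept by default, so there is no
# equivalent of sm.add_constant to call first
model = LinearRegression().fit(x, y)

slope = model.coef_[0]
intercept = model.intercept_
predictions = model.predict(x)
```

One thing to keep in mind: `LinearRegression` does not expose coefficient standard errors the way the statsmodels results object does (`bse`), so a test that relies on them would need to compute them separately or use a different tolerance.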

eltonlaw · 2019-08-03T02:59:53Z

+import statsmodels.api as sm
+
+def test_corrupt():
+    # Generate data


Could you move all this data generation into conftest.py? Check out pytest fixtures.
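
Something along these lines could live in conftest.py. This is only a sketch: the helper `make_linear_data`, the fixture name `linear_data`, and the data-generating process are all hypothetical and would need to match what the test actually generates.

```python
import numpy as np
import pytest


def make_linear_data(n_rows=100, seed=0):
    """Build a two-column array where column 1 depends linearly on column 0."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n_rows)
    y = 2.0 * x + rng.normal(scale=0.5, size=n_rows)
    return np.column_stack([x, y])


@pytest.fixture
def linear_data():
    """Fixture wrapper so tests can request the data by argument name."""
    return make_linear_data()


def test_linear_data_shape(linear_data):
    """pytest injects the fixture's return value as the argument."""
    assert linear_data.shape == (100, 2)
```

Keeping the generation in a plain helper and wrapping it in the fixture means the same data can also be built outside of pytest if needed.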

eltonlaw · 2019-08-03T03:00:28Z

+from impyute.dataset.corrupt import Corruptor
+import statsmodels.api as sm
+
+def test_corrupt():


docstring please

eltonlaw · 2019-08-03T03:03:37Z

+    full_model_coef = [full_model.params[1]-full_model.bse[1], full_model.params[1]+full_model.bse[1]]
+    imp_model_coef = imp_model.params[1]
+
+    assert full_model_coef[0] <= imp_model_coef <= full_model_coef[1], "MAR coefficient not within true values"


this is a pretty neat test!

eltonlaw · 2019-08-03T03:09:47Z

-        pass
+        output = self.data.copy()
+        nrow, ncol = self.data.shape
+        for i in range(0, nrow):


This isn't a blocker, since I guess it works, but two nested for loops is pretty messy. You can achieve the same thing if you put the percentile and assignment logic in a separate function and use numpy.apply_along_axis:

In [4]: np.apply_along_axis(lambda x: x+1, -1, np.array([[1, 2, 3], [4, 5, 6]]))
Out[4]:
array([[2, 3, 4],
       [5, 6, 7]])
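
To make the suggestion more concrete, here is a hedged sketch of the pattern with a row-wise helper. The masking rule (blank out column 1 when column 0 is above its median) is an assumption made for the example and is not necessarily the rule used in this PR; `mask_row`, `data`, and `threshold` are hypothetical names.

```python
import numpy as np


def mask_row(row, threshold):
    """Return a copy of row with column 1 set to NaN if column 0 exceeds threshold."""
    out = row.astype(float)  # astype returns a copy, so the input row is untouched
    if out[0] > threshold:
        out[1] = np.nan
    return out


data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0],
                 [4.0, 40.0]])

# Missingness in column 1 depends only on the observed column 0
threshold = np.percentile(data[:, 0], 50)

# Extra positional arguments after the array are forwarded to mask_row
corrupted = np.apply_along_axis(mask_row, 1, data, threshold)
```

Note that `apply_along_axis` still loops in Python under the hood, so this is mainly a readability improvement over nested loops; a boolean mask over the whole array would be another option.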

jeanielim added 2 commits August 2, 2019 10:29

naive implementation of Corruptor.mar()

492d3b4

add test case for corruptor.mar function

3ec5726

eltonlaw requested changes Aug 3, 2019

jeanielim wants to merge 2 commits into eltonlaw:master from jeanielim:master

2 participants
