Conversation
This PR partially deals with #69 by adding cross-validated logp for SRM.

@qihongl, could you please review?

Thanks @qihongl! It looks like we did not add it yet. I may work on it this weekend or next week.

@lcnature Ah, do you mean there are some new updates that haven't been added to this PR?

@lcnature, a reminder about this pull request. It seems close to completion.

    ll_score : Log-likelihood of test-subject's data
    """

Might it be better to have input validation here to make sure all d in data have the same number of samples?
I noticed that later, on line 616, you are skipping any d in data that is None. I was wondering if this could be done earlier, such as removing the None entries at the very beginning, or warning the user that there is a None in the data list? Feel free to decide what's more user-friendly.
Ah, and this function probably wants to assert len(data) >= 2.
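
Something along these lines is roughly what I have in mind. This is only a minimal sketch: it assumes `data` is a list of 2D (voxels x samples) arrays, and the helper name `_check_input_data` is made up for illustration, not taken from this PR:

```python
import logging

logger = logging.getLogger(__name__)


def _check_input_data(data):
    """Sketch of input validation for a list of per-subject data arrays."""
    if len(data) < 2:
        raise ValueError("Data from at least two subjects is required.")
    if any(d is None for d in data):
        # Alternatively, drop the None entries here instead of only warning.
        logger.warning("Some subjects have no data (None entries in `data`).")
    # All non-None subjects must have the same number of samples (columns).
    sample_counts = {d.shape[1] for d in data if d is not None}
    if len(sample_counts) > 1:
        raise ValueError("All subjects must have the same number of samples; "
                         "got %s" % sorted(sample_counts))
```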

    w = []
    for subject in range(subjects):
        _w = self._update_transform_subject(data[subject], self.s_)
        w.append(_w)

Not necessary, but I think the following is slightly faster:

    w = [self._update_transform_subject(data[subject], self.s_) for subject in range(subjects)]
|
    x, mu, rho2, trace_xtx = self._init_structures(data, subjects)
    sigma_s = self.sigma_s_

Not necessary, but how about just using self.sigma_s_ directly on line 597?
|
    # Invert Sigma_s using Cholesky factorization
    (chol_sigma_s, lower_sigma_s) = scipy.linalg.cho_factor(
        sigma_s, check_finite=False)
    wt_invpsi_x = np.zeros((self.features, samples))
    trace_xt_invsigma2_x = 0.0
    for subject in range(subjects):
        if data[subject] is not None:
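
(Side note for review: the `cho_factor` call in the quoted diff is the standard SciPy pattern for working with a symmetric positive definite covariance without forming an explicit inverse. A self-contained sketch of that pattern, not part of the PR, in case it helps:

```python
import numpy as np
import scipy.linalg

rng = np.random.RandomState(0)
a = rng.randn(5, 5)
sigma_s = a @ a.T + 5 * np.eye(5)   # symmetric positive definite

# Factor once, then reuse the factorization to solve Sigma_s @ X = B.
chol_sigma_s, lower_sigma_s = scipy.linalg.cho_factor(
    sigma_s, check_finite=False)
inv_sigma_s = scipy.linalg.cho_solve(
    (chol_sigma_s, lower_sigma_s), np.eye(5), check_finite=False)

assert np.allclose(sigma_s @ inv_sigma_s, np.eye(5))
```
)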

Hi, I'm not familiar with MPI, but I think the code makes sense (in terms of matching the formulas in the paper)!
I think some input validation for data at the beginning of this function would be useful here. And I have two other very minor comments. Ah, and a test would be nice. Perhaps a test that computes the score for some toy data?
Thanks!
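
A rough sketch of the kind of toy-data test being suggested above. `SRM` is brainiak's class, but the `score` method name, its signature, and the toy dimensions below are assumptions about this PR rather than its confirmed API:

```python
import numpy as np
from brainiak.funcalign.srm import SRM


def test_score_on_toy_data():
    rng = np.random.RandomState(0)
    voxels, samples, features, subjects = 30, 40, 3, 3
    shared = rng.randn(features, samples)
    # Each subject: a random mixing of the shared response plus small noise.
    data = [rng.randn(voxels, features) @ shared
            + 0.1 * rng.randn(voxels, samples) for _ in range(subjects)]

    model = SRM(n_iter=5, features=features)
    model.fit(data)

    # New data generated from the same shared response; the cross-validated
    # log-likelihood should at least be a finite number.
    test_data = [rng.randn(voxels, features) @ shared
                 + 0.1 * rng.randn(voxels, samples) for _ in range(subjects)]
    ll = model.score(test_data)   # hypothetical method name
    assert np.isfinite(ll)
```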

This adds a function to SRM that calculates the log-likelihood of test subjects' data for model selection.
We followed the style of scikit-learn's GaussianMixture.score: https://scikit-learn.org/stable/modules/generated/sklearn.mixture.GaussianMixture.html#sklearn.mixture.GaussianMixture.score
Co-authored-by: @lcnature
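
For reference, the scikit-learn convention linked above is fit followed by score(X), which returns a log-likelihood of held-out data. The GaussianMixture part below is real scikit-learn API; the commented SRM usage is only an assumed analogue of what this PR adds (method name and data format are not confirmed here):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
X_train, X_test = rng.randn(200, 2), rng.randn(50, 2)

gmm = GaussianMixture(n_components=2, random_state=0).fit(X_train)
print(gmm.score(X_test))   # mean per-sample log-likelihood of held-out data

# Assumed analogous SRM usage (hypothetical names and data format):
#   model = SRM(n_iter=10, features=50)
#   model.fit(train_data)        # list of (voxels x samples) arrays, one per subject
#   ll = model.score(test_data)  # log-likelihood of test subjects' data
```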