Expose "vocabulary" parameter to "StringEncoder"#1819
Conversation
|
hello @emassoulie, thank you for your contribution 🎉 ! I think it's also worth to add an entry in the changelog :). |
| Used during randomized svd. Pass an int for reproducible results across | ||
| multiple function calls. | ||
|
|
||
| vocabulary : Mapping or iterable, default=None |
There was a problem hiding this comment.
| vocabulary : Mapping or iterable, default=None | |
| vocabulary_ : Mapping or iterable, default=None |
The scikit-learn convention requires to have an underscore at the end of attributes that are derived from the data
There was a problem hiding this comment.
Here vocabulary is a user-provided value though, it's not derived from the data. In fact, we never define a vocabulary_ in fit_transform.
|
|
| - Computing the associations in :class:`TableReport` is now deterministic and can | ||
| be controlled by the new parameter ``subsampling_seed`` of the global configuration. | ||
| :pr:`1775` by :user:`Thomas S. <thomass-dev>`. | ||
| - The :class: `StringEncoder` now exposes the ``vocabulary`` parameter from the parent |
There was a problem hiding this comment.
The changelog entry was added in the wrong place, it should be at the top of the file in the proper section
| ngram_range=self.ngram_range, | ||
| analyzer=self.analyzer, | ||
| stop_words=self.stop_words, | ||
| if self.vocabulary is None: |
There was a problem hiding this comment.
I think that having something like
if self.vocabulary is not None:
raise ValueError(...)and then continuing with self.vectorizer_ would be more readable
rcap107
left a comment
There was a problem hiding this comment.
Looks good to me, thanks a lot @emassoulie
Co-authored-by: Riccardo Cappuzzo <riccardo.cappuzzo@gmail.com>
Co-authored-by: Riccardo Cappuzzo <riccardo.cappuzzo@gmail.com>
Response to issue 1792