Avoid scikit-learn UserWarning for vectorizer parameter token_pattern #729

Merged: osma merged 2 commits into main from fix-sklearn-userwarning-token-pattern on Aug 16, 2023
@osma (Member) commented Aug 16, 2023

scikit-learn vectorizers used by Annif (CountVectorizer, TfidfVectorizer) trigger this warning:

UserWarning: The parameter 'token_pattern' will not be used since 'tokenizer' is not None'

This is a bit surprising, since we are not setting token_pattern ourselves, but its default value is a regular expression rather than None, so scikit-learn treats it as set. This PR fixes the warning by explicitly passing token_pattern=None wherever Annif code sets the tokenizer parameter.
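
The behaviour described above can be reproduced outside Annif with a minimal sketch like the following. It is not the code changed in this PR: `simple_tokenizer` and `count_token_pattern_warnings` are hypothetical helpers made up for illustration, and the sketch assumes a scikit-learn version that emits this warning and accepts `token_pattern=None`.

```python
import warnings

from sklearn.feature_extraction.text import CountVectorizer


def simple_tokenizer(text):
    """Hypothetical stand-in for a custom tokenizer: split on whitespace."""
    return text.lower().split()


def count_token_pattern_warnings(**vectorizer_kwargs):
    """Fit a CountVectorizer with a custom tokenizer and count the
    warnings whose message mentions token_pattern."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        vectorizer = CountVectorizer(tokenizer=simple_tokenizer, **vectorizer_kwargs)
        vectorizer.fit(["Annif suggests subjects", "subjects for documents"])
    return sum("token_pattern" in str(w.message) for w in caught)


# Default token_pattern (a regex string) combined with a custom tokenizer
default_count = count_token_pattern_warnings()

# Explicit token_pattern=None, as this PR does in the Annif calling code
fixed_count = count_token_pattern_warnings(token_pattern=None)

print("warnings with default token_pattern:", default_count)
print("warnings with token_pattern=None:", fixed_count)
```

The same keyword argument should apply to TfidfVectorizer, which shares its tokenization parameters with CountVectorizer.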

@osma osma added the bug label Aug 16, 2023
@osma osma added this to the 1.0 milestone Aug 16, 2023
@osma osma self-assigned this Aug 16, 2023
codecov (bot) commented Aug 16, 2023

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (40cc2fd) 99.67% compared to head (a84e466) 99.67%.
Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #729   +/-   ##
=======================================
  Coverage   99.67%   99.67%           
=======================================
  Files          89       89           
  Lines        6397     6401    +4     
=======================================
+ Hits         6376     6380    +4     
  Misses         21       21           

| Files Changed | Coverage Δ |
|---|---|
| annif/lexical/mllm.py | 100.00% <ø> (ø) |
| annif/backend/mixins.py | 97.82% <100.00%> (+0.20%) ⬆️ |

@osma osma changed the title fix scikit-learn UserWarning for vectorizer parameter token_pattern Avoid scikit-learn UserWarning for vectorizer parameter token_pattern Aug 16, 2023
@sonarqubecloud


Kudos, SonarCloud Quality Gate passed!

0 Bugs (rating A)
0 Vulnerabilities (rating A)
0 Security Hotspots (rating A)
0 Code Smells (rating A)

No Coverage information
0.0% Duplication

@osma osma merged commit fef6e43 into main Aug 16, 2023
@osma osma deleted the fix-sklearn-userwarning-token-pattern branch August 16, 2023 07:30