EstNLTK analyzer by osma · Pull Request #818 · NatLibFi/Annif

osma · 2024-11-12T19:07:49Z

This PR adds a new analyzer to support lemmatization using EstNLTK, a natural language analysis toolkit for the Estonian language.

Note that the indirect dependencies of EstNLTK are quite large, at roughly 500MB of libraries.

codecov · 2024-11-12T19:17:51Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.63%. Comparing base (d907024) to head (407a318).
Report is 10 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #818   +/-   ##
=======================================
  Coverage   99.63%   99.63%           
=======================================
  Files          93       95    +2     
  Lines        7141     7170   +29     
=======================================
+ Hits         7115     7144   +29     
  Misses         26       26


osma · 2024-11-21T14:32:31Z

The test coverage is not 100%. I think that's due to how optional backends are handled in the initialization code, which I copied from the spaCy analyzer. The same problem already exists for spaCy, so I think it would make sense to fix that first in main, then apply the same solution in this PR.

osma · 2024-11-22T13:30:11Z

The initialization problem was fixed for spaCy in PR #820, already merged to main.
I rebased this PR branch and adapted it accordingly. I think all is now well.
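
For readers unfamiliar with the pattern, here is a minimal sketch of how an optional analyzer dependency can be detected without importing it. This is not the code from PR #820 or from Annif itself; the class names, the `package` attribute and the `available_analyzers` helper are hypothetical, and the standard-library `json` module stands in for an installed optional package such as EstNLTK.

```python
import importlib.util


class OptionalAnalyzer:
    """Hypothetical base class for an analyzer backed by an optional package."""

    # Importable top-level package the analyzer needs (set by subclasses).
    package = ""

    @classmethod
    def is_available(cls) -> bool:
        # find_spec() returns None when a top-level package cannot be found,
        # so the check works without importing a large dependency.
        return importlib.util.find_spec(cls.package) is not None


class InstalledAnalyzer(OptionalAnalyzer):
    # Stand-in for an optional dependency that is installed.
    package = "json"


class MissingAnalyzer(OptionalAnalyzer):
    # Stand-in for an optional dependency that is not installed.
    package = "no_such_package_for_this_sketch"


def available_analyzers(candidates):
    """Return the names of the candidate classes whose package is installed."""
    return [cls.__name__ for cls in candidates if cls.is_available()]
```

Because availability is decided by a plain method call rather than by a `try`/`except ImportError` around an import, both the "installed" and the "missing" outcome can be exercised in tests, which is one way to keep coverage complete for optional backends.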

This still needs wiki documentation, maybe also a mention in the Annif tutorial.

osma · 2024-11-22T13:35:35Z

I added a brief section about this analyzer on the wiki page for Analyzers.

I don't see any mention of specific analyzers in the Annif tutorial, so I don't think this needs to be mentioned there.

juhoinkinen

👍

juhoinkinen · 2024-11-22T16:30:51Z

I added an estnltk section to the Optional features page of the wiki too (and updated the whole page to use poetry instead of pip for installing dependencies in a dev installation).

osma · 2024-11-25T07:29:22Z

I realized that EstNLTK is GPL licensed, so it should probably be mentioned alongside YAKE when we talk about licensing. I need to fix that before merging this PR.

sonarqubecloud · 2024-11-25T11:06:49Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code


osma · 2024-12-20T10:31:08Z

I asked the EstNLTK developers about their thoughts on license compatibility. They were very positive about the Annif integration. Now that the licensing situation is at least somewhat clarified in the README, I don't think there are obstacles to merging this, so I will do it now.

osma added the enhancement label Nov 12, 2024

osma self-assigned this Nov 12, 2024

github-advanced-security AI found potential problems Nov 12, 2024


Comment thread annif/analyzer/estnltk.py Fixed

Comment thread annif/analyzer/estnltk.py Fixed

osma mentioned this pull request Nov 21, 2024

Smarter initialization of optional analyzers #820

Merged

osma added 5 commits November 22, 2024 15:20

first implementation of EstNLTK analyzer support

94d29db

add estnltk dependency to CI/CD tests for Python 3.10

51e841b

remove unused imports

f9863dc

fix test for estnltk install

35b8955

refactor code to avoid flake8 warning

66f577d

osma force-pushed the feature-estnltk-analyzer branch from d2a0051 to 66f577d Compare November 22, 2024 13:23

osma marked this pull request as ready for review November 22, 2024 13:28

osma changed the title ~~[WIP] EstNLTK analyzer~~ EstNLTK analyzer Nov 22, 2024

osma requested a review from juhoinkinen November 22, 2024 13:28

juhoinkinen approved these changes Nov 22, 2024


clarify(?) licensing situation w.r.t. YAKE and EstNLTK

407a318

osma mentioned this pull request Dec 12, 2024

License compatibility / Apache License 2.0 estnltk/estnltk#123

Closed

osma merged commit 8f13d7d into main Dec 20, 2024

osma deleted the feature-estnltk-analyzer branch December 20, 2024 10:31

osma added this to the 1.3 milestone Feb 3, 2025

osma merged 6 commits into main from feature-estnltk-analyzer
