Corpus formats
Juho Inkinen edited this page Jun 19, 2025 · 15 revisions
Annif works with two kinds of corpora: subject vocabulary corpora and document corpora.
- Subject vocabulary corpora define the set of possible subjects (concepts) that can be assigned to documents. These are typically SKOS or TSV files. See Subject vocabulary formats.
- Document corpora are collections of documents (with or without assigned subjects) used for training, evaluation, or testing. See Document corpus formats.
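To make the idea of a TSV subject vocabulary concrete, here is a minimal sketch of reading one into memory. It assumes a simple two-column layout (a subject URI, optionally wrapped in angle brackets, then a tab, then a label); that layout, the function name `parse_subject_tsv`, and the `example.org` URIs are illustrative assumptions, not taken from this page. The authoritative format is described in Subject vocabulary formats.

```python
import csv
import io


def parse_subject_tsv(text: str) -> dict[str, str]:
    """Parse a two-column TSV vocabulary into a {uri: label} mapping.

    Assumed (hypothetical) layout per line: <URI> TAB label
    """
    subjects: dict[str, str] = {}
    # QUOTE_NONE: treat quote characters in labels as ordinary text
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        uri = row[0].strip()
        # Drop surrounding angle brackets if present
        if uri.startswith("<") and uri.endswith(">"):
            uri = uri[1:-1]
        subjects[uri] = row[1].strip()
    return subjects


# Made-up sample data for illustration only
sample = (
    "<http://example.org/subjects/s1>\tcats\n"
    "<http://example.org/subjects/s2>\tdogs\n"
)
vocab = parse_subject_tsv(sample)
print(vocab)
```

Keying the mapping by URI reflects the role described above: the vocabulary defines the closed set of subjects that can be assigned to documents, so a lookup by identifier is the natural access pattern.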