Juho Inkinen edited this page Jun 19, 2025 · 15 revisions

Annif works with two kinds of corpora: subject vocabularies and document corpora.

  • Subject vocabulary corpora define the set of possible subjects (concepts) that can be assigned to documents. These are typically SKOS or TSV files. See Subject vocabulary formats.
  • Document corpora are collections of documents (with or without assigned subjects) used for training, evaluation, or testing. See Document corpus formats.
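
To make the relationship between the two corpus types concrete, here is a minimal Python sketch. It assumes a two-column TSV vocabulary (a subject URI in angle brackets, a tab, then a label) and a document represented as text plus a list of subject URIs. The sample data, the `load_vocab` and `known_subjects` helpers, and the exact column layout are illustrative assumptions, not Annif's own code; the linked format pages are the authoritative description.

```python
# Hypothetical sample vocabulary; the <URI><TAB>label layout is an assumption.
SAMPLE_VOCAB_TSV = (
    "<http://example.org/subj/1>\tcats\n"
    "<http://example.org/subj/2>\tdogs\n"
)


def load_vocab(tsv_text):
    """Parse TSV text into a dict mapping subject URI to label."""
    subjects = {}
    for line in tsv_text.splitlines():
        parts = line.split("\t")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        uri = parts[0].strip().strip("<>")  # drop the surrounding angle brackets
        subjects[uri] = parts[1].strip()
    return subjects


def known_subjects(doc_subject_uris, vocab):
    """Keep only the document's subjects that exist in the vocabulary."""
    return [uri for uri in doc_subject_uris if uri in vocab]


vocab = load_vocab(SAMPLE_VOCAB_TSV)

# A hypothetical document: its text plus the subject URIs assigned to it.
doc_text = "A short article about cats."
doc_subjects = ["http://example.org/subj/1", "http://example.org/subj/99"]

valid = known_subjects(doc_subjects, vocab)
print([vocab[uri] for uri in valid])
```

The point of the sketch is that the vocabulary fixes the set of allowed subjects, while each document in a corpus only refers to subjects from that set.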

