Dokimos | LLM Evaluation Framework for Java

Get started your way

One dependency for humans. One line for agents. Pick your on-ramp.

Add the dependency

One dependency in your test scope. That is the whole install.

pom.xml

<dependency>
    <groupId>dev.dokimos</groupId>
    <artifactId>dokimos-junit</artifactId>
    <version>0.23.0</version>
    <scope>test</scope>
</dependency>

This pulls in dokimos-core as well. Gradle setup and the framework integration modules (Spring AI, Spring AI Alibaba, LangChain4j, Koog, Embabel) are covered in the install guide.

Write your first eval

Point the JUnit integration at a dataset and run it like any other test.

FirstEvalTest.java

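// ragPipeline and judge are assumed to be fields set up elsewhere in the test class.
@DatasetSource("qa-pairs.json")
@EvalTest
void evaluate(EvalTestCase testCase) {
    // Run the system under test on the dataset input.
    String answer = ragPipeline.answer(testCase.input());

    // Score the answer with an LLM-as-judge correctness check.
    assertThat(answer)
        .satisfies(new CorrectnessEvaluator(judge));
}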

Runs under mvn test and in your existing CI, with no new services to stand up.

Dataset-driven evaluation

Load test cases from JSON or CSV, or build them in code. Run the same dataset across experiments and JUnit tests, and track quality as it changes.
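To make the idea concrete, here is a minimal plain-Java sketch of dataset-driven evaluation. It does not use the Dokimos API: the Example record, the passRate helper, and the stub pipeline are all hypothetical names chosen for illustration, and the substring check stands in for a real evaluator.

```java
import java.util.List;
import java.util.function.Function;

public class DatasetEvalSketch {

    // Hypothetical test-case shape: an input and the answer we expect.
    record Example(String input, String expected) {}

    // Runs every example through the system under test and returns the
    // fraction whose output contains the expected answer (case-insensitive).
    static double passRate(List<Example> dataset, Function<String, String> system) {
        if (dataset.isEmpty()) {
            return 0.0;
        }
        long passed = dataset.stream()
                .filter(ex -> system.apply(ex.input())
                        .toLowerCase()
                        .contains(ex.expected().toLowerCase()))
                .count();
        return (double) passed / dataset.size();
    }

    public static void main(String[] args) {
        List<Example> dataset = List.of(
                new Example("What is the capital of France?", "Paris"),
                new Example("What is 2 + 2?", "4"));

        // Stand-in for a real pipeline; a real one would call an LLM.
        Function<String, String> stub = q ->
                q.contains("France") ? "The capital is Paris." : "I am not sure.";

        // One of the two examples passes, so this prints "pass rate = 0.5".
        System.out.println("pass rate = " + passRate(dataset, stub));
    }
}
```

Because the dataset is just data and the system is just a function, the same list can be replayed against a new prompt or model and the two pass rates compared.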

Built-in and agent evaluators

Hallucination, faithfulness, contextual relevance, and LLM-as-judge, plus tool-call validity, trajectory, and task completion for agents.
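As a rough illustration of what a tool-call validity check might involve, the plain-Java sketch below scores an agent trajectory by the share of calls that name a known tool and supply its required arguments. This is an assumption about the general technique, not the Dokimos evaluator: ToolCall, isValid, validityScore, and the tool names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ToolCallValiditySketch {

    // Hypothetical shape of one tool call made by an agent.
    record ToolCall(String name, Map<String, Object> arguments) {}

    // A call is valid when the tool is known and every required argument is present.
    static boolean isValid(ToolCall call, Map<String, Set<String>> requiredArgsByTool) {
        Set<String> required = requiredArgsByTool.get(call.name());
        return required != null && call.arguments().keySet().containsAll(required);
    }

    // Fraction of valid calls in a trajectory; an empty trajectory scores 1.0.
    static double validityScore(List<ToolCall> trajectory,
                                Map<String, Set<String>> requiredArgsByTool) {
        if (trajectory.isEmpty()) {
            return 1.0;
        }
        long valid = trajectory.stream()
                .filter(call -> isValid(call, requiredArgsByTool))
                .count();
        return (double) valid / trajectory.size();
    }

    public static void main(String[] args) {
        Map<String, Set<String>> tools = Map.of(
                "search", Set.of("query"),
                "get_weather", Set.of("city"));

        List<ToolCall> trajectory = List.of(
                new ToolCall("search", Map.of("query", "dokimos")),  // valid
                new ToolCall("get_weather", Map.of()),                // missing "city"
                new ToolCall("delete_all", Map.of()),                 // unknown tool
                new ToolCall("get_weather", Map.of("city", "Oslo"))); // valid

        // Two of the four calls are valid, so this prints "validity = 0.5".
        System.out.println("validity = " + validityScore(trajectory, tools));
    }
}
```

A deterministic check like this needs no judge model, which makes it cheap to run on every build alongside the LLM-as-judge evaluators.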

Framework agnostic

The core depends on no AI framework, so it works with any LLM client. Optional one-line integrations cover Spring AI, Spring AI Alibaba, LangChain4j, Koog, Embabel, and JUnit.