Get started your way
One dependency for humans. One line for agents. Pick your on-ramp.
Add the dependency
One line in your test scope. That is the whole install.
<dependency>
  <groupId>dev.dokimos</groupId>
  <artifactId>dokimos-junit</artifactId>
  <version>0.23.0</version>
  <scope>test</scope>
</dependency>

Pulls in dokimos-core. Gradle and the framework integration modules (Spring AI, Spring AI Alibaba, LangChain4j, Koog, Embabel) are in the install guide.
Write your first eval
Point the JUnit integration at a dataset and run it like any other test.
@DatasetSource("qa-pairs.json")
@EvalTest
void evaluate(EvalTestCase testCase) {
    String answer = ragPipeline.answer(testCase.input());

    assertThat(answer)
        .satisfies(new CorrectnessEvaluator(judge));
}

Runs in mvn test and your existing CI, no new services to stand up.
Dataset-driven evaluation
Load test cases from JSON or CSV, or build them in code. Run the same dataset across experiments and JUnit tests, and track quality as it changes.
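To make the two loading styles concrete, here is a minimal sketch in plain Java. It does not use the Dokimos API: QaCase and DatasetSketch are hypothetical names invented for illustration, and the CSV layout (a header row followed by input,expected rows with no quoted fields) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a test case; not the Dokimos EvalTestCase type.
record QaCase(String input, String expected) {}

final class DatasetSketch {
    // Parses "input,expected" rows, skipping the header row and blank lines.
    // Assumes no quoted fields and no commas inside the input column.
    static List<QaCase> fromCsv(String csv) {
        List<QaCase> cases = new ArrayList<>();
        String[] lines = csv.split("\\R");
        for (int i = 1; i < lines.length; i++) {
            String line = lines[i].strip();
            if (line.isEmpty()) continue;
            String[] cols = line.split(",", 2);
            if (cols.length < 2) {
                throw new IllegalArgumentException("Malformed row: " + line);
            }
            cases.add(new QaCase(cols[0].strip(), cols[1].strip()));
        }
        return cases;
    }

    // Builds the same kind of dataset directly in code.
    static List<QaCase> inCode() {
        return List.of(
            new QaCase("What is the capital of France?", "Paris"),
            new QaCase("How many days are in a week?", "7"));
    }
}
```

Both methods return the same list type, so the same cases could feed an experiment run or a test method without caring where they came from.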
Built-in and agent evaluators
Hallucination, faithfulness, contextual relevance, and LLM-as-judge, plus tool-call validity, trajectory, and task completion for agents.
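As an illustration of what a tool-call validity check can look like, the sketch below scores an agent run by the fraction of calls that name a declared tool and supply every required argument. This is one possible definition, not necessarily how the built-in evaluator works; ToolCall and ToolCallValiditySketch are hypothetical names, not Dokimos types.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical types for illustration; not part of the Dokimos API.
record ToolCall(String name, Map<String, String> args) {}

final class ToolCallValiditySketch {
    // A call is valid when the tool is declared and every required argument is present.
    static boolean isValid(ToolCall call, Map<String, Set<String>> requiredArgsByTool) {
        Set<String> required = requiredArgsByTool.get(call.name());
        return required != null && call.args().keySet().containsAll(required);
    }

    // Fraction of valid calls in a run; an empty run scores 1.0 by convention here.
    static double score(List<ToolCall> calls, Map<String, Set<String>> requiredArgsByTool) {
        if (calls.isEmpty()) return 1.0;
        long valid = calls.stream().filter(c -> isValid(c, requiredArgsByTool)).count();
        return (double) valid / calls.size();
    }
}
```

A check like this is deterministic and needs no judge model, which makes it a cheap first gate before the LLM-based evaluators run.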
Framework agnostic
The core depends on no AI framework, so it works with any LLM client. Optional one-line integrations cover Spring AI, Spring AI Alibaba, LangChain4j, Koog, Embabel, and JUnit.
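The general idea behind client independence can be sketched as follows: evaluation code is written against a plain function from prompt to answer, so any LLM client can be adapted with a lambda. This is an illustration only, not the Dokimos core API; ClientAgnosticSketch and its accuracy method are hypothetical, and substring matching is a deliberately crude scoring rule.

```java
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper for illustration; not part of the Dokimos API.
final class ClientAgnosticSketch {
    // Fraction of prompts whose answer contains the expected text (case-insensitive).
    // The model is any prompt -> answer function, so no AI framework type appears here.
    static double accuracy(Function<String, String> model, Map<String, String> expectedByPrompt) {
        if (expectedByPrompt.isEmpty()) return 0.0;
        long hits = expectedByPrompt.entrySet().stream()
            .filter(e -> model.apply(e.getKey()).toLowerCase()
                .contains(e.getValue().toLowerCase()))
            .count();
        return (double) hits / expectedByPrompt.size();
    }
}
```

Wrapping a real client then amounts to passing something like prompt -> client.call(prompt), where client is whatever object your framework provides.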