Writing Effective Test Oracles
Hiranya Prasad Bastakoti
Contents
What Should Be Checked?
Determining Correct Values
What should be checked?
When tests were performed manually, testers could observe the software and decide whether it behaved correctly using their own judgment. With automated testing, however, the correct behavior must be defined in advance and encoded into the test before it runs, usually with assertion statements such as assertEquals() in JUnit.
To make testing effective, testers must carefully decide which outputs should be checked and how often.
Test Oracle Strategy:
A test oracle defines the rules for deciding whether the test
passed or failed.
Two important features of an oracle:
Precision: How much of the output is checked
Frequency: How often the output is checked (once or
multiple times during execution)
Four Guidelines for Effective Testing:
1. Always Check Some Output
• Some testers only check whether the program crashes; this is called a null oracle strategy.
• Research shows that only 25%–56% of failures cause crashes.
• If we only check for crashes, the remaining 44%–75% of failures will be missed.
So always verify specific outputs, not just crashes.
2. Check the Right Outputs
• Good tests check outputs that are directly affected by the input or function under test.
• Bad tests check unrelated or irrelevant values.
• Each test must have a clear goal, such as:
  • Testing a specific branch or state
  • Verifying a requirement
• The output that is checked should match the purpose of the test.
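The difference between a null oracle and an oracle that checks a relevant output can be sketched as follows. This is a minimal illustration in Python rather than JUnit; `apply_discount` and its seeded bug are hypothetical and exist only for this example.

```python
def apply_discount(price, percent):
    """Hypothetical function under test, with a seeded bug:
    it adds the discount instead of subtracting it."""
    return price + price * percent / 100  # bug: should subtract


def null_oracle_test():
    """Null oracle: passes as long as no exception is raised."""
    try:
        apply_discount(200, 10)
        return True
    except Exception:
        return False


def output_oracle_test():
    """Checks the output the input directly affects: the discounted price."""
    return apply_discount(200, 10) == 180
```

Here `null_oracle_test()` passes even though the function is wrong, while `output_oracle_test()` fails and reveals the bug.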
3. Low Precision is Fine
• We don’t need to check everything in the output.
• Checking the most important and relevant parts is usually enough.
• Research shows that adding more checks yields only a small improvement in failure detection.
• Keep tests simple and focused on what matters.
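A low-precision check might look like the sketch below, which asserts only the field the test is about. The `build_invoice` function, its fields, and the 10% tax rate are assumptions made up for illustration.

```python
def build_invoice(items):
    """Hypothetical function under test: returns a dict with several fields."""
    subtotal = sum(price * qty for price, qty in items)
    tax = round(subtotal * 0.10, 2)  # assumed 10% tax rate, illustration only
    return {
        "line_count": len(items),
        "subtotal": subtotal,
        "tax": tax,
        "total": subtotal + tax,
    }


def test_total_includes_tax():
    """Goal of this test: the total includes tax, so only 'total' is asserted."""
    invoice = build_invoice([(10.0, 2), (5.0, 1)])
    assert invoice["total"] == 27.5
```

The other fields are left unchecked on purpose; tests with a different goal would assert them.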
4. Low Frequency is Also Fine
• It’s not necessary to check the output state many times during execution.
• Checking the final output once is usually enough.
• Frequent checks add little benefit and may increase effort and complexity.
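Low frequency can be illustrated with a test that runs several operations and checks the state once at the end. `ClickCounter` is a hypothetical class invented for this sketch.

```python
class ClickCounter:
    """Hypothetical class under test."""

    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1

    def reset(self):
        self.value = 0


def test_final_state_only():
    counter = ClickCounter()
    counter.increment()
    counter.increment()
    counter.reset()
    counter.increment()
    # Low frequency: one check on the final state, not one after every call.
    assert counter.value == 1
```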
4/28/2025 6
Determining Correct Values
• Knowing which parts of the output to check is one challenge, but a bigger one is knowing what the correct output is.
Four main techniques to determine correct values (test oracle strategies):
1. Specification-Based Direct Verification
• If a clear specification exists, it can define expected outputs for given inputs.
• Example: A sorting program must produce a permutation of the input in sorted order.
• Verifying output by human judgment is accurate but expensive.
• Automated checkers (e.g., verifying that a sorted list is ordered and has the same elements as the input) are helpful but can be hard to write.
• Problems:
  • Clear specifications are rare.
  • Some software (e.g., probability calculators for Petri nets) produces outputs we cannot verify manually.
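An automated checker for the sorting example could be sketched as below. The name `is_valid_sort` is an assumption for illustration; the function checks the two properties from the specification (ordered, and a permutation of the input) without computing the sorted result itself.

```python
from collections import Counter


def is_valid_sort(original, result):
    """Return True if result is ordered and is a permutation of original."""
    # Property 1: every element is <= its successor.
    ordered = all(a <= b for a, b in zip(result, result[1:]))
    # Property 2: same elements with the same multiplicities.
    same_elements = Counter(original) == Counter(result)
    return ordered and same_elements
```

A test could then call `is_valid_sort(data, sort_under_test(data))` for any input `data`, where `sort_under_test` stands for whichever sorting routine is being tested.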
2. Redundant Computations
• Use another version of the software (a “gold” implementation) to compare outputs.
• Useful when direct checking is hard or impossible.
• Example: Compare binary search with linear search results.
• Challenges:
  • Developing multiple versions is costly.
  • Independent versions may still fail in the same way on inputs that are hard for all of them.
• Still widely used in regression testing (compare the current output with that of the previous version).
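The binary search versus linear search comparison might be sketched like this. Both implementations and the helper names are written for illustration only; the linear search plays the role of the simple reference version.

```python
import random


def linear_search(items, target):
    """Simple reference ("gold") version: first index of target, or -1."""
    for i, value in enumerate(items):
        if value == target:
            return i
    return -1


def binary_search(items, target):
    """Version under test: an index of target in a sorted list, or -1."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1


def agree(items, target):
    """Both versions agree on found/not found; found indices hold target."""
    a, b = linear_search(items, target), binary_search(items, target)
    if a == -1 or b == -1:
        return a == b
    return items[a] == target and items[b] == target


def compare_on_random_inputs(trials=200, seed=0):
    """Run both versions on random sorted lists and report any disagreement."""
    rng = random.Random(seed)
    for _ in range(trials):
        items = sorted(rng.randint(0, 20) for _ in range(rng.randint(0, 15)))
        if not agree(items, rng.randint(0, 20)):
            return False
    return True
```

Because duplicates can make the two versions return different valid indices, `agree` compares what the indices point to rather than the indices themselves.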
3. Consistency Checks
Checks for internal correctness using expected properties.
Example: A container that represents a set should not contain duplicate elements.
Involves checking invariants, preconditions, and postconditions.
Tools like assertions are useful to automate consistency checks.
Based on the RIPR model (Reachability, Infection, Propagation, Revealability): sometimes errors can be detected by internal structure violations, not just incorrect outputs.
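A consistency check for the no-duplicates property could be automated with an assertion on an invariant, as in this sketch. `UniqueList` is a hypothetical container written only for illustration, and it assumes hashable elements.

```python
class UniqueList:
    """Hypothetical container that is meant to hold no duplicates."""

    def __init__(self):
        self._items = []

    def add(self, item):
        # Skip items that are already present.
        if item not in self._items:
            self._items.append(item)
        # Postcondition: the item is now in the container.
        assert item in self._items
        self._check_invariant()

    def _check_invariant(self):
        # Invariant: no element appears twice (assumes hashable elements).
        assert len(self._items) == len(set(self._items)), "duplicate element"

    def __len__(self):
        return len(self._items)
```

If a later change to `add` broke the duplicate check, the invariant assertion would fail inside the container, even in a test that never inspects the final output.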
4. Metamorphic Testing
• Compares output for related inputs.
• If it's hard to verify output for input x, test how the program behaves for a related input y.
• Example:
• For the sine function: an identity such as sin(x) = sin(π − x) relates the output for x to the output for π − x.
• Also applies to data structures:
• Adding and then removing an element from a bag should leave it unchanged.
• Real-world example: TCAS (aircraft collision system)
• Small changes in aircraft positions shouldn't change the resolution advisory unless near
boundaries.
• Helps identify unstable or incorrect behavior, especially in continuous input spaces.
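Two metamorphic relations can be sketched as follows. The sine check assumes the identity sin(x) = sin(π − x), and the bag check uses a plain Python list as a stand-in for a bag; both helper names are hypothetical.

```python
import math
from collections import Counter


def check_sine_relation(x, tol=1e-9):
    """Metamorphic relation: sin(x) should equal sin(pi - x)."""
    return math.isclose(math.sin(x), math.sin(math.pi - x), abs_tol=tol)


def check_bag_add_remove(bag, item):
    """Adding and then removing an item should leave the bag unchanged."""
    after = list(bag)
    after.append(item)  # add
    after.remove(item)  # remove one occurrence
    # Compare as multisets, since element order does not matter in a bag.
    return Counter(after) == Counter(bag)
```

Neither check needs to know the correct value of sin(x) or the expected contents of the bag; each only compares the results of related executions.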