Guiding Principle
Selected-response tests require learners to choose the correct answer or best alternative from
several choices. While they can cover a wide range of learning materials very efficiently and
measure a variety of learning outcomes, they are limited when assessing learning outcomes
that involve more complex and higher-level thinking skills. Selected-response tests include the
multiple-choice test, the true-false or alternative-response test, and the matching test. In the
following sections, the nature of the three selected-response tests and the rules for writing
them are discussed.
Multiple-Choice Test
Strengths
1. Learning outcomes from simple to complex can be measured.
2. Highly structured and clear tasks are provided.
3. A broad sample of achievement can be measured.
4. Incorrect alternatives provide diagnostic information.
5. Scores are less influenced by guessing than with true-false items.
6. Scoring is easy, objective, and reliable.
Limitations
1. Constructing good items is time-consuming.
2. It is frequently difficult to find plausible distracters.
3. This item type is ineffective for measuring some types of problem-solving and the ability
to organize and express ideas.
4. Scores can be influenced by reading ability.
Rules for Writing Multiple-Choice Items (Waugh & Gronlund, 2013)
An effective multiple-choice item presents students with a task that is both important and
clearly understood and one that can be answered correctly by anyone who has achieved
the intended learning outcome. Nothing in the content or structure of the item should
prevent an informed student from responding correctly. Similarly, nothing in the content
or structure of the item should enable an uninformed student to select the correct answer.
The following rules for item writing are intended as guides for the preparation of multiple-
choice items that function as intended.
1. Design each item to measure an important learning outcome. The problem situation
around which an item is to be built should be important and should be related to the
intended learning outcome to be measured. When writing the item, focus on the
functioning content of the item and resist the temptation to include irrelevant material
or more obscure and less significant content to increase item difficulty.
2. Present a single, clearly formulated problem in the stem of the item. The task outlined
in the stem of the item should be so clear that a student can understand it without reading
the alternatives. A good check on the clarity and completeness of a multiple-choice stem
is to cover the alternatives and determine whether it could be answered without the
choices.
Example
Poor: A table of specifications
a. indicates how a test will be used to improve learning.
b. provides a more balanced sampling of content.*
c. arranges the instructional objectives in order of their importance.
d. specifies the method of scoring to be used on a test.
Better: What is the main advantage of using a table of specifications when preparing
an achievement test?
a. It reduces the amount of time required.
b. It improves the sampling of content.*
c. It makes the construction of test items easier.
d. It increases the objectivity of the test.
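For teachers who plan tests with a spreadsheet or a script, a table of specifications can be treated as a simple two-way grid of content areas by cognitive levels. The sketch below is only an illustration of that idea, not an example from Waugh and Gronlund; the content areas, levels, counts, and the helper coverage_gaps are all invented.

```python
# A hypothetical table of specifications: rows are content areas, columns are
# cognitive levels, and each cell holds the planned number of test items.
blueprint = {
    "Test planning":        {"Remember": 2, "Understand": 3, "Apply": 1},
    "Item writing":         {"Remember": 3, "Understand": 4, "Apply": 3},
    "Score interpretation": {"Remember": 1, "Understand": 2, "Apply": 1},
}

# Each draft item is tagged with the content area and level it is meant to measure.
draft_items = [
    ("Item writing", "Understand"),
    ("Test planning", "Remember"),
]

def coverage_gaps(blueprint, items):
    """Return the (area, level) cells where the draft falls short of the plan."""
    counts = {}
    for cell in items:
        counts[cell] = counts.get(cell, 0) + 1
    return {
        (area, level): planned - counts.get((area, level), 0)
        for area, levels in blueprint.items()
        for level, planned in levels.items()
        if counts.get((area, level), 0) < planned
    }

print(coverage_gaps(blueprint, draft_items))
```

Checking a draft test against such a grid is what gives the "more balanced sampling of content" that the item above refers to.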
3. State the stem of the item in simple, clear language. The problem in the stem of a
multiple-choice item should be stated as precisely as possible and should be free of
unnecessarily complex wording and sentence structure. Poorly stated item stems
frequently introduce sufficient ambiguity to prevent a knowledgeable student from
responding correctly. Also, complex sentence structure may make the item a measure
more of reading comprehension than of the intended outcome.
Example
Poor: The paucity of plausible, but incorrect, statements that can be related to a central
idea poses a problem when constructing which one of the following types of test
items?
a. Short answer.
b. True-false.
c. Multiple choice.*
d. Essay.
Better: The lack of plausible, but incorrect, alternatives will cause the greatest
difficulty when constructing
a. short-answer items.
b. true-false items.
c. multiple-choice items.*
d. essay items.
Another common fault in stating multiple-choice items is to load the stem with irrelevant
and, thus, nonfunctioning material. This is probably caused by the instructor’s desire to
continue to teach the students – even while testing them. The following example illustrates
the use of an item stem as “another chance to inform students.”
Example
Poor: Testing can contribute to the instructional program of the school in many
important ways. However, the main function of testing in teaching is:
Better: The main function of testing in teaching is:
4. Put as much of the wording as possible in the stem of the item. Avoid repeating the same
material in each of the alternatives. By moving all the common content to the stem, it is
usually possible to clarify the problem further and to reduce the time the student needs
to read the alternatives.
Example
Poor: In objective testing, the term objective
a. refers to the method of identifying the learning outcomes.
b. refers to the method of selecting the test content.
c. refers to the method of presenting the problem.
d. refers to the method of scoring the answers.*
Better: In objective testing, the term objective refers to the method of
a. identifying the learning outcomes.
b. selecting the test content.
c. presenting the problem.
d. scoring the answers.*
In many cases, the problem is not simply to move the common words to the stem but to
reword the entire item. The following examples illustrate how an item can be improved by
revising the stem and shortening the alternatives.
Example
Poor: Instructional objectives are most apt to be useful for test-construction purposes
when they are stated in such a way that they show
a. the course content to be covered during the instructional period.
b. the kinds of performance students should demonstrate upon reaching the goal.*
c. the things the teacher will do to obtain maximum student learning.
d. the types of learning activities to be participated in during the course.
Better: Instructional objectives are most useful for test-construction purposes when
they are stated in terms of
a. course content.
b. student performance.*
c. teacher behavior.
d. learning activities.
5. State the stem of the item in positive form, wherever possible. A positively phrased item
tends to measure more important learning outcomes than a negatively stated item. This
is because knowing such things as the best method or the most relevant argument
typically has greater educational significance than knowing the poorest method or the
least relevant argument. The use of negatively stated items results all too
frequently from the ease with which such items can be constructed rather than from the
importance of the learning outcomes measured. The test maker who becomes frustrated
by the inability to think of a sufficient number of plausible distracters for an item, as in
the first following example, suddenly realizes how simple it would be to construct the
second version.
Example
Item One: Which one of the following is a category in the revised taxonomy of the
cognitive domain?
a. Understand.*
b. (distracter needed)
c. (distracter needed)
d. (distracter needed)
Item Two: Which one of the following is not a category in the revised taxonomy of
the cognitive domain?
a. Understand
b. Apply
c. Analyze
d. (answer needed)*
Note in the second version that the categories of the taxonomy serve as distracters and that
all that is needed to complete the item is a correct answer. This could be any term that
appears plausible but is not one of the categories listed in the taxonomy. Although such
items are easily constructed, they are apt to have a low level of difficulty and are likely to
measure relatively unimportant learning outcomes. Being able to identify answers that do
not apply provides no assurance that the student possesses the desired knowledge.
This solution to the lack of sufficient distracters is most likely to occur when the test
maker is committed to the use of multiple-choice items only. A more desirable procedure
for measuring the “ability to recognize the categories in the taxonomy of the cognitive
domain” is to switch to a modified true-false form, as in the following example.
Example
Directions: Indicate which of the following are categories in the taxonomy of the
cognitive domain by circling Y for yes and N for no.
*Y N Understand
Y *N Critical Thinking
Y *N Reasoning
*Y N Create
6. If negative wording is used in the stem of an item, emphasize it. Negative words are easily
overlooked, so they should be underlined or capitalized and, where possible, placed near the
end of the statement.
Example
Poor: Which one of the following is not a desirable practice when preparing multiple-
choice items?
a. Stating the stem in positive form.
b. Using a stem that could function as a short-answer item.
c. Underlining certain words in the stem for emphasis.
d. Shortening the stem by lengthening the alternatives.*
Better: All of the following are desirable practices when preparing multiple-choice
items EXCEPT
a. stating the stem in positive form.
b. using a stem that could function as a short-answer item.
c. underlining certain words in the stem for emphasis.
d. shortening the stem by lengthening the alternatives.*
7. Make certain that the intended answer is correct or clearly best. When the correct-
answer form of a multiple-choice item is used, there should be only one correct answer
and it should be unquestionably correct. With the best-answer form, the intended answer
should be one that competent authorities would agree is clearly the best. In the latter
case, it may also be necessary to include “of the following” in the stem of the item to
allow for equally satisfactory answers that have not been included in the item.
Example
Poor: What is the best method of selecting course content for test items?
Better: Which one of the following is the best method of selecting course content for
test items?
The proper phrasing of the stem of an item can also help avoid equivocal answers
when the correct-answer form is used. In fact, an inadequately stated problem frequently
makes the intended answer only partially correct or makes more than one alternative
suitable.
Example
Poor: What is the purpose of classroom testing?
Better: "One purpose of classroom testing is . . ." or "The main purpose of classroom testing is . . ."
8. Make all alternatives grammatically consistent with the stem of the item and parallel in
form. The correct answer is usually carefully phrased so that it is grammatically
consistent with the stem. The distracters, however, may be inconsistent with the stem
in tense, article, or grammatical form unless care is taken to check them against the
wording of the stem and of the correct answer. This, of course, could provide a clue to
the correct answer, or at least make some of the distracters ineffective. A general step
that can be taken to prevent grammatical inconsistency is to avoid using the articles
"a" or "an" at the end of the stem of the item.
Example
Poor: The recall of factual information can be measured best with a
a. matching item.
b. multiple-choice item.
c. short-answer item.*
d. essay question.
Better: The recall of factual information can be measured best with
a. matching items.
b. multiple-choice items.
c. short-answer items.*
d. essay questions.
The indefinite article "a" in the "poor" version makes the last distracter obviously
wrong. By simply changing the alternatives from singular to plural, it is possible to omit
the article. In other cases, it may be necessary to add an article ("a" or "an," as appropriate)
to each alternative or to rephrase the entire item.
Stating all the alternatives in parallel form also tends to prevent unnecessary clues
from being given to students. When the grammatical structure of one alternative differs
from that of the others, some students may more readily detect that alternative as a correct
or an incorrect response.
Example
Poor: Why should negative terms be avoided in the stem of a multiple-choice item?
a. They may be overlooked.*
b. The stem tends to be longer.
c. The construction of alternatives is more difficult.
d. The scoring is more difficult.
Better: Why should negative terms be avoided in the stem of a multiple-choice item?
a. They may be overlooked.*
b. They tend to increase the length of the stem.
c. They make the construction of alternatives more difficult.
d. They may increase the difficulty of the scoring.
9. Avoid verbal clues that enable students to select the correct answer or to eliminate an
incorrect alternative. One of the most common sources of extraneous clues in multiple-
choice items is the wording of the item. Some such clues are rather obvious and are
easily avoided. Others require the constant attention of the test maker to prevent them
from slipping in unnoticed. Let’s review some of the verbal clues commonly found in
multiple-choice items.
a. Similarity of wording in both the stem and the correct answer is one of the most obvious
clues. Keywords in the stem may unintentionally be repeated verbatim in the correct
answer, a synonym may be used, or the words may simply sound or look alike.
Poor: Which one of the following would you consult first to locate research articles
on achievement testing?
a. Journal of Educational Psychology
b. Journal of Educational Measurement
c. Journal of Consulting Psychology
d. Review of Educational Research*
The word “research” in both the stem and the correct answer is apt to provide a clue to the
correct answer to the uninformed but testwise student. Such obvious clues might better be
used in both the stem and an incorrect answer, to lead the uninformed away from the correct
answer.
b. Stating the correct answer in textbook language or stereotyped phraseology may
cause students to select it because it looks better than the other alternatives, or because
they vaguely recall having seen it before.
Example
Poor: Learning outcomes are most useful in preparing tests when they are
a. clearly stated in performance terms.*
b. developed cooperatively by teachers and students.
c. prepared after the instruction has ended.
d. stated in general terms.
The pat phrasing of the correct answer is likely to give it away. Even the most poorly
prepared student is apt to recognize the often-repeated phrase “clearly stated in
performance terms,” without having the foggiest notion of what it means.
c. Stating the correct answer in greater detail may provide a clue. Also, when the answer
is qualified by modifiers that are typically associated with true statements (for example,
“sometimes,” “may,” “usually”), it is more likely to be chosen.
Example
Poor: Lack of attention to learning outcomes during test preparation
a. will lower the technical quality of the items.
b. will make the construction of test items more difficult.
c. will result in the greater use of essay questions.
d. may result in a test that is less relevant to the instructional program.*
The term “may” is rather obvious in this example, but this type of error is common and
appears frequently in a subtler form.
d. Including absolute terms in the distracters enables students to eliminate them as possible
answers because such terms (“always,” “never,” “all,” “none,” “only,”) are commonly
associated with false statements. This makes the correct answer obvious or at least
increases the chances that the students who do not know the answer will guess it.
Example
Poor: Achievement tests help students improve their learning by
a. encouraging them all to study hard.
b. informing them of their progress.*
c. giving them all a feeling of success.
d. preventing any of them from neglecting their assignments.
Such absolutes tend to be used by the inexperienced test maker to ensure that the incorrect
alternatives are clearly wrong. Unfortunately, they are easily recognized by the student as
unlikely answers, making them ineffective as distracters.
e. Including two responses that are all-inclusive makes it possible to eliminate the other
alternatives since one of the two must obviously be the correct answer.
Example
Poor: Which of the following types of test items measures learning outcomes at the
recall level?
a. Supply-type items.*
b. Selection-type items.
c. Matching items.
d. Multiple-choice items.
Since the first two alternatives include the only two major types of test items, even poorly
prepared students are likely to limit their choices to these two. This, of course, gives them
a fifty-fifty chance of guessing the correct answer.
f. Including two responses that have the same meaning makes it possible to eliminate them
as potential answers. If two alternatives have the same meaning and only one answer is
to be selected, it is fairly obvious that both alternatives must be incorrect.
Example
Poor: Which of the following is the most important characteristic of achievement-test
results?
a. Consistency.
b. Reliability.
c. Relevance.*
d. Objectivity.
In this item, both “consistency” and “reliability” can be eliminated because they mean
essentially the same thing. Extraneous clues to the correct answer must be excluded from test
items if the items are to function as intended. It is frequently good practice, however, to use
such clues to lead the uninformed away from the correct answer. If not overdone, this can
contribute to the plausibility of the incorrect alternatives.
10. Make the distracters plausible and attractive to the uninformed. The distracters in a
multiple-choice item should be so appealing to the students who lack the knowledge
called for by the item that they select one of the distracters in preference to the correct
answer. This is the ideal, of course, but one toward which the test maker must work
continually. The art of constructing a good multiple-choice item depends heavily on
the development of effective distracters. You can do several things to increase the plausibility
and attractiveness of distracters; one of the most effective is to use homogeneous alternatives.
The importance of homogeneous alternatives can be seen in the following item, where the
heterogeneous distracters make the correct answer easy to identify.
Example
Poor: Obtaining a dependable ranking of students is of major concern when using
a. norm-referenced summative tests.*
b. behavior descriptions.
c. checklists.
d. questionnaires.
11. Make certain that the relative length of the alternatives does not provide a clue to the
answer. Because the correct answer usually must be carefully qualified, it tends to be longer
than the distracters unless a special effort is made to equalize the alternatives in length and
parallel form, as in the following example.
Example
Poor: One advantage of multiple-choice items over essay questions is that they
a. measure more complex outcomes.
b. depend more on recall.
c. require less time to score.
d. provide for a more extensive sampling of course content.*
Better: One advantage of multiple-choice items over essay questions is that they
a. provide for the measurement of more complex learning outcomes.
b. place greater emphasis on the recall of factual information.
c. require less time for test preparation and scoring.
d. provide for a more extensive sampling of course content.*
12. Avoid using the alternative “all of the above,” and use “none of the above” with
extreme caution. When test makers are having difficulty in locating a sufficient number
of distracters, they frequently resort to the use of “all of the above” or “none of the
above” as the final option. These special alternatives are seldom used appropriately
and almost always render the item less effective than it would be without them.
The inclusion of “all of the above” as an option makes it possible to answer the item
based on partial information. Since students are to select only one answer, they can
detect “all of the above” as the correct choice simply by noting that two of the
alternatives are correct. They can also detect it as a wrong answer by recognizing that
at least one of the alternatives is incorrect; of course, their chance of guessing the
correct answer from the remaining choices then increases proportionally. Another
difficulty with this option is that some students, recognizing that the first choice is correct,
will select it without reading the remaining alternatives. Obviously, the use of “none of the
above” is not possible with the best-answer type of multiple-choice item since the alternatives
vary in appropriateness and the criterion of absolute correctness is not applicable. When used
as the right answer in a correct-answer type of item, this option may be measuring nothing
more than the ability to detect incorrect answers. Recognizing that certain answers are wrong
is no guarantee that the student knows what is correct. For example, a student may be able to
answer the following item correctly without being able to name the categories in the
taxonomy.
Example
Poor: Which of the following is a category in the revised taxonomy of the cognitive
domain?
a. Critical Thinking.
b. Scientific Thinking.
c. Reasoning Ability.
d. None of the above.*
13. Vary the position of the correct answer in a random manner. The correct answer should
appear in each alternative position about the same number of times, but its placement
should not follow a pattern that may be apparent to the person taking the test.
Sufficient variation without a discernible pattern might also be obtained by simply
placing the responses in alphabetical order, based on the first letter in each, and letting
the correct answer fall where it will.
When the alternative responses are numbers, they should always be listed in order
of size, preferably in ascending order. This will eliminate the possibility of a clue, such
as the correct answer being the only one that is not in numerical order.
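Rule 13 is mechanical enough to automate. The following sketch, a hypothetical helper rather than anything prescribed by the source, shuffles an item's alternatives while tracking the keyed answer, and leaves numeric alternatives in ascending order as the rule recommends.

```python
import random

def arrange_alternatives(alternatives, correct, rng=random):
    """Order an item's alternatives per the rule: numeric alternatives are
    listed in ascending order; all others are shuffled so the keyed answer
    falls in a random position with no discernible pattern."""
    if all(isinstance(a, (int, float)) for a in alternatives):
        ordered = sorted(alternatives)
    else:
        ordered = list(alternatives)
        rng.shuffle(ordered)
    return ordered, ordered.index(correct)  # alternatives plus key position

# Verbal alternatives: the keyed answer lands in a random position.
print(arrange_alternatives(
    ["course content", "student performance", "teacher behavior", "learning activities"],
    "student performance"))

# Numeric alternatives: always presented in ascending order.
print(arrange_alternatives([12, 4, 8, 16], 8))
```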
14. Control the difficulty of the item either by varying the problem in the stem or by
changing the alternatives. It is usually preferable to increase item difficulty by
increasing the level of knowledge called for by making the problem more complex.
However, it is also possible to increase the difficulty by making the alternatives more
homogeneous. When this is done, care must be taken that the finer discriminations
called for are educationally significant and are in harmony with the learning outcomes
to be measured.
15. Make certain each item is independent of the other items in the test. Occasionally
information given in the stem of one item will help the students answer another item.
This can be remedied easily by a careful review of the items before they are assembled
into a test. A different type of problem occurs when the correct answer to an item depends on
knowing the correct answer to the item preceding it. The student who is unable to
answer the first item, of course, has no basis for
responding to the second. Such chains of interlocking items should be avoided. Each
item should be an independently scorable unit.
16. Use an efficient item format. The alternatives should be listed in separate lines, under
one another, like the examples provided here. This makes the alternatives easy to read
and compare. It also contributes to ease of scoring since the letters of the alternatives
all appear on the left side of the paper.
17. Follow the normal rules of grammar. If the stem is in question form, begin each
alternative with a capital letter and end with a period or other appropriate punctuation
mark. Omit the period with numerical answers, however, to avoid confusion with
decimal points. When the stem is an incomplete statement, start each alternative with
a lowercase letter and end with whatever terminal punctuation mark is appropriate.
18. Break (or bend) any of these rules if it will improve the effectiveness of the items. These
rules for constructing multiple-choice items are stated rather dogmatically as an aid to
the beginner. As experience in item writing is obtained, situations are likely to occur
where ignoring or modifying a rule may be desirable.
True-False or Alternative-Response Test
In its most common form, the true-false item presents the student with a declarative
statement that must be judged true or false.
Example
T *F True-false items are classified as supply-type items.
In some cases, the student is asked to judge each statement as true or false, and then
to change the false statements so that they are true. When this is done, a portion of each
statement is underlined to indicate the part that can be changed. In the example given, for
instance, the words “supply type” would be underlined. The key parts of true statements,
of course, must also be underlined.
Another variation is the cluster-type true-false format. In this case, a series of items
is based on a common stem.
Example
Which of the following terms indicate observable student performance? Circle Y for yes
and N for no.
*Y N 1. Explains
*Y N 2. Identifies
Y *N 3. Learns
*Y N 4. Predicts
Y *N 5. Realizes
This cluster-type format is especially useful for replacing multiple-choice items that have
more than one correct answer. Such items are impossible to score satisfactorily. This is
avoided with the cluster-type item because it makes each alternative a separate scoring unit
of one point.
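Because each alternative in a cluster-type item is a separate one-point scoring unit, scoring reduces to counting matches against the key. A minimal sketch, reusing the key of the example above (the function name is ours):

```python
def score_cluster(key, responses):
    """Score a cluster-type item: each alternative is a one-point scoring unit."""
    return sum(1 for k, r in zip(key, responses) if k == r)

# Key for the cluster item above (Explains, Identifies, Learns, Predicts, Realizes).
key     = ["Y", "Y", "N", "Y", "N"]
student = ["Y", "Y", "Y", "Y", "N"]  # incorrectly marked "Learns" as observable
print(score_cluster(key, student))   # 4 of 5 points
```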
Strengths
1. The item is useful for outcomes where there are only two possible alternatives (e.g.,
fact or opinion, valid or invalid).
2. Less demand is placed on reading ability than in multiple-choice items.
3. A relatively large number of items can be answered in a typical testing period.
4. A complex outcome can be measured when used with interpretive exercises.
5. Scoring is easy, objective, and reliable.
Limitations
1. It is difficult to write items beyond the knowledge level that are free from ambiguity.
2. Making an item false provides no evidence that the student knows what is correct.
3. No diagnostic information is provided by the incorrect answers.
4. Scores are more influenced by guessing than with any other item type.
Rules for Writing True-False Items (Waugh & Gronlund, 2013)
The purpose of a true-false item, as with all item types, is to distinguish between
those who have and those who have not achieved the intended learning outcome. Achievers
should be able to select the correct alternative without difficulty, while nonachievers should
find the incorrect alternative at least as attractive as the correct one. The rules for writing
true-false items are directed towards this end.
1. Include only one central idea in each statement. The main point of the item should occupy
a prominent position in the statement. The true-false decision should not depend on some
subordinate point or trivial detail. The use of several ideas in each statement should
generally be avoided because these tend to be confusing, and the answer is more apt to
be influenced by reading ability than the intended outcome.
Example
Poor: T F* The true-false item, which is favored by test experts, is also called an
alternative-response item.
Better: T* F The true-false item is also called an alternative-response item.
The “poor” example must be marked false because test experts do not favor the true-
false item. Such subordinate points are easily overlooked when reading the item. If the point
is important, it should be included as the main idea in a separate item.
2. Keep the statement short and use simple vocabulary and sentence structure. A short,
simple statement will increase the likelihood that the point of the item is clear. All
students should be able to grasp what the statement is saying.
Example
Poor: T* F The true-false item is more subject to guessing but it should be
used in place of a multiple-choice item, if well-constructed,
when there is a dearth of plausible distracters.
Better: T* F The true-false item should be used in place of a multiple-choice item when only
two alternatives are possible.
3. Word the statement so precisely that it can unequivocally be judged true or false. True
statements should be true under all circumstances and yet free of qualifiers (“may,”
“possible,” and so on), which might provide clues. This requires the use of precise words
and the avoidance of such vague terms as “seldom,” “frequently,” and “often.” The same
care, of course, must also be given to false statements so that their falsity is not too
readily apparent from differences in wording.
Example
Poor: T F* Lengthening a test will increase its reliability.
Better: T* F Lengthening a test by adding items like those in the test will increase its
reliability.
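The qualifier "by adding items like those in the test" matters because the classical result behind the "better" statement is the Spearman-Brown formula, which is not given in the text but predicts the reliability of a test lengthened n times, under the assumption that the added items are parallel to the existing ones. A quick sketch:

```python
def spearman_brown(reliability, n):
    """Predicted reliability of a test lengthened n times with parallel items."""
    return n * reliability / (1 + (n - 1) * reliability)

# Doubling a test whose reliability is .60 predicts a reliability of .75.
print(round(spearman_brown(0.60, 2), 2))  # 0.75
```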
4. Use negative statements sparingly and avoid double negatives. The “no” and/or “not” in
negative statements are frequently overlooked and are read as positive statements. Thus,
negative statements should be used only when the learning outcome requires it (e.g., in
avoiding a harmful practice), and then the negative words should be emphasized by
underlining or by using capital letters. Statements including double negatives tend to be
so confusing that they should be restated in positive form.
Example
Poor: T* F Correction-for-guessing is not a practice that should never be used in testing.
Better: T* F Correction-for-guessing is a practice that should sometimes be used in testing.
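The text does not define correction-for-guessing, but the conventional formula is S = R − W/(k − 1), where R is the number of right answers, W the number of wrong answers, and k the number of choices per item; omitted items are counted neither right nor wrong. A brief sketch:

```python
def corrected_score(rights, wrongs, choices):
    """Conventional correction for guessing: R - W / (k - 1).
    Omitted (blank) items are counted neither right nor wrong."""
    return rights - wrongs / (choices - 1)

# A 50-item true-false test (k = 2): 40 right, 10 wrong.
print(corrected_score(40, 10, 2))  # 30.0
```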
5. Attribute statements of opinion to some source, unless the ability to identify opinions is
being specifically measured. A statement of opinion, by itself, cannot be judged true or false;
attributing it to a source turns the item into a measure of how well the student knows the
beliefs or values of that source.
Example
Poor: T F Testing should play a major role in the teaching-learning process.
Better: T* F Gronlund believes that testing should play a major role in the teaching-learning
process.
In some cases, it is useful to use a series of opinion statements that pertain to the same
individual or organization. This permits a more comprehensive measure of how well the
student understands a belief or value system.
Example
Would the author of your textbook agree or disagree with the following statements?
Circle A for agree, D for disagree.
A* D 1. The first step in achievement testing is to state the intended learning outcomes in
performance terms.
A D* 2. True-false tests are superior to multiple-choice tests for measuring achievement.
Using about 10 items like those listed here would provide a good indication of the
students’ grasp of the author’s point of view. Another valuable use of opinion statements
is to ask students to distinguish between statements of fact and statements of opinion.
Example
Read each of the following statements and circle F if it is a fact and circle O if it is an
opinion.
F* O 1. The true-false item is a selection-type item.
F O* 2. The true-false item is difficult to construct.
F O* 3. The true-false item encourages student guessing.
F* O 4. The true-false item can be scored objectively.
In addition to illustrating the use of opinion statements in test items, the last two
examples illustrate variations from the typical true-false format. These are more logically
called alternative-response items.
6. When cause-effect relationships are being measured, use only true propositions. The
true-false item can be used to measure the “ability to identify cause-effect relationships,”
and this is an important aspect of understanding. When used for this purpose, both
propositions should be true and only the relationship between them judged true or false.
Example
Poor: T F* True-false items are classified as objective items because students must supply the
answer.
Better: T F* True-false items are classified as objective items because there are only two
possible answers.
7. Avoid extraneous clues to the answer. There are several specific determiners that provide
verbal clues to the truth or falsity of an item. Statements that include such absolutes as
“always,” “never,” “all,” “none,” and “only” tend to be false; statements with qualifiers
such as “usually,” “may,” and “sometimes” tend to be true. Either these verbal clues
must be eliminated from the statements, or their use must be balanced between true items
and false items.
Example
Poor: T F* A statement of opinion should never be used in a true-false item.
Poor: T* F A statement of opinion may be used in a true-false item.
Better: T* F A statement of opinion, by itself, cannot be marked true or false.
The length and complexity of the statement might also provide a clue. True statements
tend to be longer and more complex than false ones because of their need for qualifiers.
Thus, a special effort should be made to equalize true and false statements in these respects.
A tendency to use a disproportionate number of true statements, or false statements,
might also be detected and used as a clue. Having approximately, but not exactly, an equal
number of each seems to be the best solution. When assembling the test, it is, of course,
also necessary to avoid placing the correct answers in some discernible pattern (for
instance, T, F, T, F). Random placement will eliminate this possible clue.
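Both recommendations, approximate balance and random placement, can be satisfied by generating the answer key at random and rejecting any key that is too lopsided. A small sketch (the tolerance of two items is an illustrative choice, not a rule from the source):

```python
import random

def random_tf_key(n_items, max_imbalance=2, rng=random):
    """Generate a random T/F answer key whose counts of T and F are
    approximately, but not necessarily exactly, equal."""
    while True:
        key = [rng.choice("TF") for _ in range(n_items)]
        if abs(key.count("T") - key.count("F")) <= max_imbalance:
            return key

print("".join(random_tf_key(20)))  # e.g., TFFTTFTFFTTFTTFFTFTF
```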
8. Base items on introductory material to measure more complex learning outcomes. True-
false or alternative-response items are frequently used in interpreting written materials,
tables, graphs, maps, or pictures. The use of introductory material makes it possible to
measure various types of complex learning outcomes.
Matching Test
The matching item is simply a variation of the multiple-choice form. A good practice
is to switch to the matching format only when it becomes apparent that the same alternatives
are being repeated in several multiple-choice items.
Below are the strengths and limitations of matching items.
Strengths
1. A compact and efficient form is provided where the same set of responses fit a series
of item stems (i.e., premises).
2. Reading and response time are short.
3. This item type is easily constructed if converted from multiple-choice items having a
common set of alternatives.
4. Scoring is easy, objective, and reliable.
Limitations
1. This item type is largely restricted to simple knowledge outcomes based on
association.
2. It is difficult to construct items that contain enough homogeneous responses.
3. Susceptibility to irrelevant clues is greater than in other item types.
Example
Which test item is least useful for educational diagnosis?
a. Multiple-choice item.
b. True-false item.*
c. Short-answer item.
Example
Directions: Column A contains a list of characteristics of test items. On the line to the
left of each statement, write the letter of the test item in Column B that best fits the
statement. Each response in Column B may be used once, more than once, or not at all.
Column A (characteristics of test items)                    Column B (types of test items)
Rules for Writing Matching Items (Waugh & Gronlund, 2013)
A good matching item should function the same as a series of multiple-choice items.
As each premise is considered, all the responses should serve as plausible alternatives. The
rules for item writing are directed toward this end.
1. Include only homogenous material in each matching item. In our earlier example of a
matching item, we included only types of test items and their characteristics. Similarly,
an item might include only authors and their works, inventors and their inventions,
scientists and their discoveries, or historical events and their dates. This homogeneity is
necessary if all responses are to serve as plausible alternatives (see earlier example).
2. Keep the list of items short and place the brief responses on the right. A short list of
items (say fewer than 10) will save reading time, make it easier for the student to locate
the answer, and increase the likelihood that the responses will be homogeneous and
plausible. Placing the brief responses on the right also saves reading time.
3. Use a larger, or smaller, number of responses than premises, and permit the responses
to be used more than once. Both an uneven match and the possibility of using each
response more than once reduce the guessing factor. As we noted earlier, proper use of
the matching form requires that all responses be plausible alternatives for each premise.
This, of course, dictates that each response be eligible for use more than once.
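The guessing arithmetic behind this rule can be made concrete. With five premises matched one-to-one to exactly five responses, blind guessing amounts to a random permutation and still yields one correct match on average; with seven reusable responses, each premise becomes an independent guess with probability 1/7, for an expected score of only 5/7. The simulation below, an illustration rather than part of the source, confirms both figures:

```python
import random

def expected_correct(n_premises, n_responses, reuse, trials=100_000, rng=random):
    """Estimate the expected number of correct matches under blind guessing."""
    key = list(range(n_premises))  # answer key: premise i matches response i
    total = 0
    for _ in range(trials):
        if reuse:   # responses may be used more than once: independent guesses
            guess = [rng.randrange(n_responses) for _ in range(n_premises)]
        else:       # one-to-one match: the guesses form a random permutation
            guess = rng.sample(range(n_responses), n_premises)
        total += sum(g == k for g, k in zip(guess, key))
    return total / trials

print(expected_correct(5, 5, reuse=False))  # ~1.0 correct by chance
print(expected_correct(5, 7, reuse=True))   # ~0.71 correct by chance
```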
4. Place the responses in alphabetical or numerical order. This will make the selection of
the responses easier and avoid possible clues due to placement.
5. Specify in the directions the basis for matching and indicate that each response may be
used once, more than once, or not at all. This will clarify the task for all students and
prevent any misunderstanding. Take care, however, not to make the directions too long
and involved. The previous example illustrates adequate detail for directions.
6. Put all the matching items on the same page. This will prevent the distraction of flipping
pages back and forth and prevent students from overlooking responses on another page.
Short-Answer Test
The short-answer item requires the student to supply a word, phrase, number, or symbol in
response to a direct question or an incomplete statement. This item type also includes
computational problems and any other simple item form that requires supplying the answer
rather than selecting it. Except for its use in computational problems, the short-answer item
is used primarily to measure the simple recall of knowledge.
The short-answer item appears to be easy to write and use, but there are two major
problems in constructing short-answer items. First, it is extremely difficult to phrase the
question or incomplete statement so that only one answer is correct. Second, there is the
problem of spelling. If credit is given only when the answer is spelled correctly, the poor
spellers will be prevented from showing their true level of achievement, and the test scores
will become an uninterpretable mixture of knowledge and spelling skills.
Below are the strengths and limitations of short-answer items.
Strengths
1. It is easy to write test items.
2. Guessing is less likely than in selection-type items.
3. This item type is well suited to computational problems and other learning outcomes
where supplying the answer is important.
4. A broad range of knowledge outcomes can be measured.
Limitations
1. It is difficult to phrase statements so that only one answer is correct.
2. Scoring is contaminated by spelling ability when responses are verbal.
3. Scoring is tedious and time-consuming.
4. This item type is not very adaptable to measuring complex learning outcomes.
Rules for Writing Short-Answer Items (Waugh & Gronlund, 2013)
1. State the item so that only a single, brief answer is possible. This requires great skill in
phrasing and the use of precise terms. What appears to be a simple, clear question to the
test maker can frequently be answered in many ways.
2. Start with a direct question and switch to an incomplete statement only when greater
conciseness is possible by doing so. The use of a direct question increases the likelihood
that the problem will be stated clearly and that only one answer will be appropriate.
Also, incomplete statements tend to be less ambiguous when they are based on problems
that were first stated in question form.
Example
What is another name for true-false items? (alternative-response items)
True-false items are also called (alternative-response items)
In some cases, it is best to leave the item in question form. This may make the item clearer,
especially to younger students.
3. It is best to leave only one blank, and it should relate to the main point of the statement.
Leaving several blanks to be filled in is often confusing and the answer to one blank
may depend on the answer in another.
Example
Poor: In terms of the type of response, the (matching) item is most like the (multiple-choice)
item.
Better: In terms of the type of responses, which item is most like the matching item?
(multiple choice)
In the “poor” version, several different responses would have to be given credit,
such as “short answer,” “essay,” “true-false,” and “multiple choice.” Obviously,
the item would not function as originally intended. It is also important to avoid asking
students to respond to unimportant or minor aspects of a statement. Focus on the main idea of
the item and leave a blank only for the key response.
4. Place the blanks at the end of the statement. This permits the student to read the complete
problem before coming to the blank to be filled. With this procedure, confusion and
rereading of the item are avoided, and scoring is simplified. Constructing incomplete
statements with blanks at the end is more easily accomplished when the item is first
stated as a direct question, as suggested earlier. In some cases, it may be a matter of
rewording the item and changing the response to be made.
Example
Poor: (Reliability) is likely to increase when a test is lengthened.
Better: When a test is lengthened, reliability is likely to (increase).
With this particular item, the “better” version also provides a more clearly focused
item. The “poor” version could be answered by “validity,” “time for testing,” “fatigue,”
and other unintended but clearly correct responses. This again illustrates the great care
needed in phrasing short-answer items.
5. Avoid extraneous clues to the answer. One of the most common clues in short-answer items
is the length of the blank. If a long blank is used for a long word and a short blank for a short
word, this is obviously a clue. Thus, all blanks should be uniform in length. Another common clue
is the use of the indefinite article “a” or “an” just before the blank. It sometimes gives
away the answer or at least rules out some possible incorrect answers.
Example
Poor: The supply item used to measure the ability to organize and integrate material
is called an (essay item).
Better: Supply-type items used to measure the ability to organize and integrate
material are called (essay items).
The “poor” version rules out “short-answer item,” the only other supply-type item,
because it does not follow the article “an.” One solution is to include both articles, using
a(an). Another solution is to eliminate the article by switching to plural, as shown in the
“better” version.
6. For numerical answers, indicate the degree of precision expected and the units in which
they are to be expressed. Indicating the degree of precision (e.g., to the nearest whole number)
will clarify the task for students and prevent them from spending more time on an item
than is required. Indicating the units in which to express the answer will aid scoring by
providing a more uniform set of responses (e.g., minutes rather than fractions of an
hour). When the learning outcome requires knowing the type of unit in common use and
the degree of precision expected, this rule must then be disregarded.
Essay Test
The essay question requires students to select, organize, and express their own ideas in an
extended written response, as the following examples illustrate.
Example
Describe the relative merits of selection-type test items and essay questions for measuring
learning outcomes at the understanding level. Confine your answers to one page.
Example
Mr. Rogers, a ninth-grade science teacher, wants to measure his students’ “ability to
interpret scientific data” with a paper-and-pencil test.
Example
Evaluation Outcome: (The student is given a complete achievement test that includes
errors or flaws in the directions, in the test items, and in the
arrangement of the items.) Write a critical evaluation of this test
using as evaluative criteria the rules and standards for test
construction described in your textbook. Include a detailed
analysis of the test’s strengths and weaknesses and an evaluation
of its overall quality and probable effectiveness.
The following are the strengths and limitations of essay questions:
Strengths
1. The highest-level learning outcomes (analyzing, evaluating, creating) can be
measured.
2. Preparation time is less than that for selection-type items.
3. The integration and application of ideas are emphasized.
Limitations
1. There is an inadequate sampling of achievement due to the time needed for answering
each question.
2. It is difficult to relate to intended learning outcomes because of the freedom to select,
organize, and express ideas.
3. Scores are raised by writing skill and bluffing and lowered by poor handwriting,
misspelling, and grammatical errors.
4. Scoring is time-consuming and subjective and tends to be unreliable.
Rules for Writing Essay Questions (Waugh & Gronlund, 2013)
1. Use essay questions to measure complex learning outcomes only. Most recall-of-
knowledge outcomes profit little from being measured by essay questions. These
outcomes can usually be measured more effectively by objective items that lack the
sampling and scoring problems that essay questions introduce. There may be a few
exceptions, as when supplying the answer is a basic part of the learning outcome. But for
most recall-of-knowledge outcomes, essay questions simply provide a less reliable
measure with no compensating benefits.
2. Relate the questions as directly as possible to the learning outcomes being measured.
Essay questions will not measure complex learning outcomes unless they are carefully
constructed to do so. Each question should be specifically designed to measure one or
more well-defined outcomes. Thus, the place to start, as is the case with objective items,
is with a precise description of the performance to be measured. This will help determine
both the content and form of the item and will aid in the phrasing of it.
4. Do not permit a choice of questions unless the learning outcome requires it. In most tests
of achievement, it is best to have all students answer the same questions. If they are
permitted to write on only a fraction of the questions, such as three out of five, their
answers cannot be evaluated on a comparative basis. Also, since the students will tend
to choose those questions they are best prepared to answer, their responses will provide
a sample of their achievement that is less representative than that obtained without
optional questions. As discussed earlier, one of the major limitations of the essay test is
the limited and unrepresentative sampling it provides. Giving students a choice among
questions simply complicates the sampling problem further and introduces greater
distortion into the test results. In some situations, the use of optional questions might be
defensible. For example, if the essay is to be used as a measure of writing skill only,
some choice of topics on which to write may be desirable. This might also be the case
if the essay is used to measure some aspects of creativity, or if the students have pursued
individual interests through independent study.
5. Provide ample time for answering and suggest a time limit on each question. Since essay
questions are designed most frequently to measure intellectual skills and abilities, time
must be allowed for thinking as well as for writing. Thus, generous time limits should
be provided. Informing students of the appropriate amount of time they should spend on
each question will help them use their time efficiently; ideally, it will also provide a
more adequate sample of achievement. If the length of the answer is not clearly defined
by the problem, as in some extended-response questions, it might also be desirable to
indicate page limits.
Rules for Scoring Essay Answers (Waugh & Gronlund, 2013)
1. Evaluate answers to essay questions in terms of the learning outcomes being measured.
The essay test, like the objective test, is used to obtain evidence concerning the extent
to which clearly defined learning outcomes have been achieved. Thus, the desired
student performance specified in these outcomes should serve as a guide both for
constructing the question and for evaluating the answers.
2. Score restricted-response answers by the point method, using the model answer as a
guide. Scoring with the aid of a previously prepared scoring key is possible with the
restricted-response item because of the limitations placed on the answer. The procedure
involves writing a model answer to each question and determining the number of points
to be assigned to it and the parts within it. The distribution of points within an answer
must, of course, consider all scorable units indicated in the learning outcomes being
measured.
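In practice, the point method amounts to distributing the total points over the scorable units of the model answer and crediting each unit the reader finds. A minimal sketch of such a scoring key follows; the question, units, and point values are invented for illustration.

```python
# Hypothetical scoring key for one restricted-response question worth 10 points.
scoring_key = {
    "defines reliability as consistency of scores": 3,
    "names one factor that increases reliability (e.g., test length)": 3,
    "explains why objective scoring improves reliability": 4,
}

def point_score(scoring_key, units_credited):
    """Total the points for the scorable units the reader credited."""
    return sum(scoring_key[unit] for unit in units_credited)

print(point_score(scoring_key, [
    "defines reliability as consistency of scores",
    "explains why objective scoring improves reliability",
]))  # 7 of 10 points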
5. Evaluate answers to essay questions without knowing the identity of the writer. This is
another way to control personal bias during scoring. Answers to essay questions should
be evaluated in terms of what is written, not in terms of what is known about the writers
from other contacts with them. The best way to prevent prior knowledge from biasing
our judgment is to evaluate each answer without knowing the identity of the writer.
6. Whenever possible, have two or more persons grade each answer. The best way to check
on the reliability of the scoring of essay answers is to obtain two or more independent
judgments. Although this may not be a feasible practice for routine classroom testing, it
might be done periodically with a fellow teacher (one who is equally competent in the
area).
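One simple way to act on this rule is to correlate the two readers' scores across a set of papers: a high correlation supports the reliability of the scoring, while a low one signals that the scoring procedure needs tightening. A short sketch with invented scores:

```python
from statistics import correlation  # available in Python 3.10+

# Hypothetical scores assigned independently by two readers to six essays.
reader_a = [18, 14, 20, 9, 15, 12]
reader_b = [17, 15, 19, 11, 14, 13]

# A high correlation between readers suggests the scoring is reliable;
# a low one signals that the scoring rules need to be tightened.
print(round(correlation(reader_a, reader_b), 2))
```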