Review item quality on the test

The value of performance data depends on each item assessing the right learning outcome.

Content review

  • Does the item measure important learning?
  • Is the content being measured worth learning for the long term?
  • Does the item measure instructionally significant content or skills?
  • Does the student have to think to answer the question? Is it simple recall of information, or does it require critical thinking, problem solving, or creativity?
  • Is the item too specific or too abstract?
  • Does the question attempt to serve too many purposes? Can you tell from a student's response whether they understood the behavior or knowledge being assessed?
  • If a student incorrectly answers an item, will you be able to tell what the student did wrong?
  • Is this item cueing an answer to another question?
  • Does this item measure facts, not opinions?
  • Is there anything tricky about the item? Will all students understand the question?

Format and style review

  • Does the item use a format appropriate for the content and the age of the students?
  • Is the question so complicated that most students will not understand what is being asked?
  • Are the items formatted consistently? For example, are answer choices always listed vertically or always horizontally?
  • Is vocabulary appropriate for the student population tested?
  • Does the item or passage require too much reading? Is it worth the student’s time?

The stem (the question)

  • Does the stem have correct grammar, spelling, and punctuation?
  • Is the stem written so that all students will understand the problem?
  • Are there any words or phrases that do not need to be part of the stem?
  • Is all the information in the stem relevant to answering the item? Does the student have to weed out unnecessary information to respond?
  • Is there a better way to phrase the stem?

Correct answers and distractors (incorrect answers)

  • Does the right answer match the key?
  • Are all distractors plausible and based on common student errors?
  • Is there too much repetition of phrases in the distractors? For example, repeating the same introductory term in each answer choice.
  • Does the position of the correct answer vary across the assessment? (See the sketch after this list.)
  • For numerical answers, are the choices listed in logical or numerical order? Be careful with systems that scramble answer order to create multiple versions of an assessment; scrambling may produce unfair versions of the item.
  • Is the length of each choice similar?
  • Are there any clues that give away the correct answer, such as silly distractors?
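
If your assessment system can export the answer key, the position check above can be made concrete by tallying how often the correct answer falls in each choice position. This is only a sketch: the `answer_key` list below is a hypothetical export format, so adapt it to whatever your system actually provides.

```python
from collections import Counter

# Hypothetical answer key: one letter per item, naming the position of the
# correct choice (A = first option, B = second, and so on).
answer_key = ["B", "A", "D", "B", "C", "A", "B", "D", "C", "B"]

# Tally how often the correct answer lands in each position.
position_counts = Counter(answer_key)
total = len(answer_key)

for position in sorted(position_counts):
    count = position_counts[position]
    print(f"Position {position}: {count} of {total} items ({count / total:.0%})")

# A position that appears far more (or far less) often than the others
# suggests the key placement should be rebalanced before the test is given.
```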

Source:
Haladyna, T. (1997). Writing test items to evaluate higher order thinking. Boston: Allyn and Bacon.