Review item quality on the test

The value of performance data depends on each item assessing the intended learning outcome.

Content review

  • Does the item measure important learning?
  • Is the content measured worth long-term learning?
  • Does the item measure instructionally significant content or skills?
  • Does the student have to think to answer each question? Is it simple recall of information, or does it require critical thinking, problem-solving, or creativity?
  • Is the item overly specific or abstract?
  • Does the question attempt to serve too many purposes? Can you tell from a student response whether they understood the behavior or knowledge tested?
  • If a student answers an item incorrectly, will you be able to say what the student did wrong?
  • Does this item cue the answer to another question?
  • Does this item measure facts, not opinions?
  • Is there anything tricky about the item? Will all students understand the problem?

Format and style review

  • Does the item use a format appropriate for the content and the age of the students?
  • Is the question so complicated that most students will not understand what is being asked?
  • Are the items formatted consistently? For example, all vertically or all horizontally.
  • Is the vocabulary appropriate for the student population tested?
  • Does the item or passage require too much reading? Is it worth the student’s time?

Stem (the question)

  • Does the stem have correct grammar, spelling, and punctuation?
  • Is the stem written so that all students should understand the problem?
  • Are there any words or phrases that do not need to be part of the stem?
  • Is all information in the stem relevant to answering the item? Does the student have to weed out unnecessary information to respond?
  • Is there a better way to phrase the stem?

Correct answers and distractors (incorrect answers)

  • Does the right answer match the answer key?
  • Are all distractors plausible and based on common student errors?
  • Is there too much repetition of phrases in the distractors? For example, repeating the same introductory term in each answer choice.
  • Does the position of the correct answer vary across the assessment?
  • For numerical answers, are the choices in logical or numerical order? Be careful in systems that let you scramble answers across multiple versions of an assessment; scrambling may create unfair versions of the item.
  • Is the length of each choice similar?
  • Are there any clues that give away the correct answer, such as silly distractors?
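The caution about scrambling numerical answers can be made concrete. The sketch below is a hypothetical helper (not a feature of any particular testing system): it shuffles text choices for each test version but keeps all-numeric choices in ascending order, so no version presents an unfair ordering.

```python
import random

# Hypothetical helper for generating answer-choice order per test version.
# Assumption: choices arrive as strings; numeric choices should always be
# presented in ascending order, while text choices may be shuffled.
def order_choices(choices, rng=None):
    rng = rng or random.Random()
    try:
        # All-numeric choices: keep numerical order across every version.
        return sorted(choices, key=float)
    except ValueError:
        # Text choices: a random order per version is acceptable.
        shuffled = list(choices)
        rng.shuffle(shuffled)
        return shuffled

print(order_choices(["21", "8", "15", "12"]))  # ['8', '12', '15', '21']
print(order_choices(["mitosis", "meiosis", "osmosis"], random.Random(1)))
```

Keeping numeric choices sorted in every version removes the risk that one class sees "8, 12, 15, 21" while another must hunt through "15, 8, 21, 12" for the same item.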

Haladyna, T. (1997). Writing test items to evaluate higher order thinking. Boston: Allyn and Bacon.

Science assessments

Classroom science assessments

Effective science assessments measure student understanding and learning of concepts. Standards-based questions should work together to create a picture of student understanding.

  1. Identify the important ideas of the domain to select standards.
  2. Analyze the skills and concepts within the standards to gauge how they work together to create a learning story.
  3. Broad standards affect the coverage of learning measured in the test. Because of the complementary nature of science standards, it is common to see overlap and cueing across questions.
  4. Standards with many skills need more items, or a performance task, to allow students to show learning across the skills within a standard.
  5. It is easy for a science test to over-test a concept or skill; this happens often with popular or interesting concepts.

Math assessments

Math assessments for the classroom

Math standards work together

Include standards within a cluster or domain that complement each other to test concepts.

For example, in the elementary grades, place value can be evaluated with items on identifying place value, rounding numbers, and addition with regrouping. Students show their understanding of place value in different but connected ways.
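The place-value example can be sketched as three connected items generated from one number. This is an illustrative sketch only; the function name, prompts, and numbers are assumptions, not part of any standard item bank.

```python
# Hypothetical sketch: three connected place-value items from one number,
# mirroring the example above (identify place value, round to the nearest
# ten, add with regrouping). Choose the addend so regrouping is required.
def place_value_items(n, addend):
    tens_digit = (n // 10) % 10
    rounded = round(n, -1)  # round to the nearest ten
    total = n + addend
    return [
        (f"What digit is in the tens place of {n}?", tens_digit),
        (f"Round {n} to the nearest ten.", rounded),
        (f"Add: {n} + {addend} = ?", total),
    ]

for prompt, answer in place_value_items(347, 286):
    print(prompt, "->", answer)
```

With 347 and 286, the three answers (4, 350, and 633) each depend on the same underlying place-value understanding, assessed in a different way.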

Items address concepts in different ways

Selected response and constructed response questions working together provide a better picture of student thinking and learning.

Grade level math vocabulary

Using grade-appropriate math vocabulary, along with mixed item types, will provide a better understanding of student learning needs. It is essential to decide what modifications or supports will be available for students who may not understand a question because of reading skills rather than math proficiency.

Use grade-appropriate numbers

Vary digits, length, and the types of numbers used in the questions. For example, if testing fractions, select items with a variety of denominators.

Vary complexity and difficulty

Assessments that are too hard or too easy will not provide instructional or feedback value.

Be careful not to over-test a concept or standard

Typically, 5–10 items with mixed complexity or difficulty will provide quality performance data.