Observing Learning Using All Answers
Many professional educators are expressing concerns about machine-scored testing.1 Before we abandon multiple-choice tests entirely, let's remember that a great deal of work goes into designing their alternative answers. Statistical observations of item response patterns suggest that the answers we consider "wrong" may well be measuring something of value to teachers.
The research conducted on behalf of Better Schooling Systems suggests that a Piaget-like developmental sequence is embedded among the alternative answers.2 Given the care that goes into designing these answers, it is hardly surprising that students select their "wrong" answers systematically.3
Nor is it surprising that these answering systems are lost when the wrong answers are all scored zero (0). To find patterns among such answers, researchers must bypass the linear dependency between the right and the wrong answers: because every student chooses exactly one option per item, the right-answer score is fully determined by the wrong-answer choices. This dependency makes statistical analyses based on the general linear model (GLM) fail.
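To make the dependency concrete, here is a minimal Python sketch using hypothetical responses from five students to one four-option item (the responses and the key "B" are invented for illustration). It assumes every student answers the item, so the dummy-coded option columns always sum to 1. The right-answer column is then an exact linear combination of an intercept and the wrong-answer columns, which is the perfect collinearity that leaves a GLM without a unique solution.

```python
# Hypothetical data: five students answering one four-option item keyed "B".
# Assumes no omitted responses, so each student chooses exactly one option.
responses = ["B", "A", "B", "D", "C"]
options = ["A", "B", "C", "D"]
key = "B"

# Dummy-code the item: one 0/1 indicator column per option.
indicators = {opt: [1 if r == opt else 0 for r in responses] for opt in options}

# Each row of the indicator columns sums to 1 (one choice per student).
row_sums = [sum(indicators[opt][i] for opt in options) for i in range(len(responses))]

# Hence the right-answer column equals 1 minus the sum of the wrong-answer columns.
wrong_options = [opt for opt in options if opt != key]
rebuilt_right = [
    1 - sum(indicators[opt][i] for opt in wrong_options) for i in range(len(responses))
]

print(row_sums)                          # [1, 1, 1, 1, 1]
print(rebuilt_right == indicators[key])  # True
```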
Our way around this problem is to use an adaptation of the multinomial procedure, presented elsewhere.4 It is sufficient to say here that this procedure resolves the problem for within-item analysis.
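The adapted procedure itself is described in reference 4 and is not reproduced here. As a generic illustration only, and not the Powell and Shklov method, the sketch below treats each item's response as a single multinomial outcome over all of its options: no answer is collapsed to zero, and because the option probabilities are estimated jointly as one distribution, no redundant dummy column is needed. The maximum-likelihood estimate of each option's probability is its observed proportion. The function name and data are hypothetical.

```python
from collections import Counter


def option_proportions(responses, options):
    """Maximum-likelihood estimates of one item's multinomial option probabilities.

    responses: the option each student chose on this item (hypothetical data).
    options:   every option on the item, right and wrong alike.
    """
    counts = Counter(responses)
    n = len(responses)
    # Options nobody chose get probability 0.0 (Counter returns 0 for missing keys).
    return {opt: counts[opt] / n for opt in options}


fall = ["A", "B", "B", "C", "B", "D", "A", "B"]
print(option_proportions(fall, ["A", "B", "C", "D"]))
# {'A': 0.25, 'B': 0.5, 'C': 0.125, 'D': 0.125}
```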
We selected every 50th student from our sample of 2,810, arranged in descending order of age. The four students presented here are a purposeful selection from this systematic sample.
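The selection step is an instance of systematic sampling. A minimal sketch, assuming each student is a hypothetical (id, age_in_months) pair and that counting starts at the 50th student in the age-ordered list (the post does not say where counting began):

```python
def systematic_sample(students, step=50, start=None):
    """Every `step`-th student after sorting from oldest to youngest.

    students: list of (student_id, age_in_months) pairs (hypothetical format).
    start:    zero-based index of the first pick; defaults to step - 1, i.e. the
              50th student when step is 50 (an assumption, not stated in the post).
    """
    ordered = sorted(students, key=lambda s: s[1], reverse=True)
    if start is None:
        start = step - 1
    return ordered[start::step]


# Synthetic roster of 2,810 students aged 9 to 16 years (108 to 203 months).
roster = [(i, 108 + (i * 37) % 96) for i in range(2810)]
picked = systematic_sample(roster)
print(len(picked))  # 56
```

With 2,810 students and a step of 50, starting at the 50th student yields 56 students; starting at the first yields 57.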
For each sub-test, we used the modal age of the students selecting its answers as the norm. All sub-scores are arrayed vertically, from Concrete Right Answers at the bottom to Abstract Right Answers at the top, and this ordering strongly suggests a developmental sequence. The numbers in the body of the data table at the end of this posting are the percentages of selection for each sub-test. "F" stands for the October (Fall) administration and "S" for the March (Spring) administration of the same test.
The numbers in parentheses after each sub-scale name are that scale's norm, in years of age. The five sub-scales with modes of 8 years seem to represent Piaget's egocentric stage, and the three with modes of 9 and 10 seem to represent his concrete operations stage. The remaining sub-scales seem to represent phases of transition to the formal operations stage.
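The norming and stage-labelling steps described in the two paragraphs above can be sketched as follows. This is a minimal illustration, assuming each selected answer is recorded as a hypothetical (sub-scale, age in whole years) pair; the sub-scale names are placeholders, not the post's sub-scales. The stage cut-offs are read from the text (8 years: egocentric; 9 to 10: concrete operations; older: transition to formal operations), and treating ages below 8 as egocentric is our assumption.

```python
from collections import Counter, defaultdict


def modal_age_norms(selections):
    """Norm for each sub-scale = the most common age among students choosing it.

    selections: iterable of (subscale, age_in_years) pairs, one per answer chosen.
    Ties go to the age encountered first (Counter.most_common ordering).
    """
    ages_by_scale = defaultdict(list)
    for scale, age in selections:
        ages_by_scale[scale].append(age)
    return {scale: Counter(ages).most_common(1)[0][0] for scale, ages in ages_by_scale.items()}


def piaget_stage(modal_age):
    """Map a sub-scale's modal age to the stage the post associates with it."""
    if modal_age <= 8:  # ages below 8 lumped in here by assumption
        return "egocentric"
    if modal_age <= 10:
        return "concrete operations"
    return "transition to formal operations"


# Hypothetical selections for three placeholder sub-scales.
data = [("scale_a", 8), ("scale_a", 8), ("scale_a", 9),
        ("scale_b", 10), ("scale_b", 10), ("scale_b", 8), ("scale_c", 13)]
norms = modal_age_norms(data)
print(norms)  # {'scale_a': 8, 'scale_b': 10, 'scale_c': 13}
print({s: piaget_stage(a) for s, a in norms.items()})
```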
The bottom row of the table and its two right-hand columns give information that will be discussed in a later posting. The sub-score titles appear in the data table, and their definitions are in the definitions section of this site.
To interpret the graph, note that the number at the end of each bar is the difference between the Fall and Spring percentage scores for that sub-scale, that is, the change from the Fall to the Spring administration. Red bars extending to the left mean that the scores on those sub-scales declined from one administration to the next; green bars extending to the right mean that they increased.
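The bar values are simple differences. A small sketch, assuming one student's Fall and Spring percentages are held in dictionaries keyed by sub-scale (the names and numbers below are invented, not taken from the data table). A zero change draws no bar, which is our assumption since the post does not say how no-change scales are shown.

```python
def score_changes(fall, spring):
    """Return {subscale: (spring - fall, bar)} for one student's two administrations."""
    changes = {}
    for scale, fall_pct in fall.items():
        diff = spring[scale] - fall_pct
        if diff < 0:
            bar = "red, left"     # score declined from Fall to Spring
        elif diff > 0:
            bar = "green, right"  # score increased
        else:
            bar = "none"          # no change (assumed to draw no bar)
        changes[scale] = (diff, bar)
    return changes


fall = {"scale_a": 20, "scale_b": 10, "scale_c": 5}
spring = {"scale_a": 12, "scale_b": 18, "scale_c": 5}
print(score_changes(fall, spring))
# {'scale_a': (-8, 'red, left'), 'scale_b': (8, 'green, right'), 'scale_c': (0, 'none')}
```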
Let’s look at the learning dynamics revealed by these score changes for each of the four students.
Student 9
Age: 16 years, 9 months (Fall)
Total scores (right/wrong): Fall 45%, Spring 33%
Functioning level (all answers): about 14 years
Apparently this student is functioning at least two years below chronological age. From the norm ages, we can infer that s/he is moving out of thinking like 8- to 9-year-olds and upward through Literal (Lit) thinking toward more mature types of thought. The green bars suggest that this student has not stopped learning but is consolidating at a level a bit above concrete thinking.
The sub-scores showing the greatest increases are Literal Reductions (LR) and Over Simplifications (OS). If s/he is struggling with this particular aspect of interpreting printed language, the teacher has an immediate handle on the problem that has made this student fall behind.
In contrast, the total-correct score alone shows only a decline, from 45% in the Fall to 33% in the Spring.
Student 25
Age: 14 years, 7 months (Fall)
Total scores (right/wrong): Fall 28%, Spring 53%
Functioning level (all answers): about 15 years
The lower sub-test scales generally show declines and the higher ones show increases. This student seems to be learning at an age-appropriate level, if the modes can be considered "age appropriate." This is the pattern we would expect normal development to produce.
Student 33
Age: 13 years, 3 months (Fall)
Total scores (right/wrong): Fall 18%, Spring 25%
Functioning level (all answers): about 10 years
We see declines in the higher-order transition sub-scores (Irr, Tr and OG) and increases in the lower sub-scores (IR, WA and OS). This child's heavy use of word associations (WA) suggests a reading comprehension problem. Although this student's total correct score rose, the overall pattern is one of deteriorating cognitive maturity.
Student 49
Age: 9 years, 4 months (Fall)
Total scores (right/wrong): Fall 8%, Spring 13%
Functioning level (all answers): about 11 years
Judging by the total scores alone, this test would probably seem inappropriate for someone so young. However, when we consider the modal ages of the sub-scales showing declines and those showing gains, we get a different picture. This student appears to be moving out of concrete operations toward more abstract thinking, about two years ahead of his/her chronological age. The teacher should encourage independent study for this child. Such information is not available from total scores.
Conclusions
One advantage of this approach to scoring is that the modal ages of selection for the sub-tests give us a linear scale for observing progress.
A second advantage is that the specific gains and losses among the sub-scores give a more precise and detailed picture of each student's level of achievement and direction of change than the proportion of Abstract Right Answers alone provides.
A third advantage is that descriptions of the thinking behind each sub-score's answers give teachers an important diagnostic perspective.
Finally, the evidence that total scores can decline while performance improves (Student 9) and can improve while performance declines (Student 33) presents a serious challenge to the psychological validity of using total scores to assess student progress.
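Using only the total scores reported above, the following sketch sets each student's change in total right/wrong score beside the direction the post reads from the sub-scores. The +1/-1 codes are our paraphrase of the discussion, not values from the data table. The students flagged are those whose total score moves against the sub-score reading.

```python
# Total right/wrong percentages reported for each student, plus a paraphrase of
# the direction the post infers from the sub-score pattern (+1 progress, -1 decline).
students = {
    9: {"fall": 45, "spring": 33, "subscores": +1},   # still learning, consolidating
    25: {"fall": 28, "spring": 53, "subscores": +1},  # normal development
    33: {"fall": 18, "spring": 25, "subscores": -1},  # deteriorating maturity
    49: {"fall": 8, "spring": 13, "subscores": +1},   # moving toward abstract thought
}

total_change = {sid: s["spring"] - s["fall"] for sid, s in students.items()}
# Flag students whose total-score direction contradicts the sub-score reading.
disagree = [
    sid for sid, s in students.items()
    if (total_change[sid] > 0) != (s["subscores"] > 0)
]

print(total_change)  # {9: -12, 25: 25, 33: 7, 49: 5}
print(disagree)      # [9, 33]
```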
Could this alternative scoring approach provide the formative assessment that is becoming so important to the success of educational improvement initiatives such as No Child Left Behind?
Summary of Information Communicated by Right/Wrong and All-Answer Test Scoring

Data from which these conclusions were drawn
Contact: James C. Powell, Ph.D., P.O. Box 12833, Pittsburgh, PA 15241.
1. Anonymous. (2007). Multiple-choice tests. FairTest: The National Center for Fair and Open Testing. Posted August 17, 2007. www.fairtests.org/multiple-choice-tests
2. Powell, J. C. (1977). The developmental sequence of cognition as revealed by wrong answers. Alberta Journal of Educational Research, 23, 43-51.
3. Dorans, N. J., Liu, J., & Hammond, S. (2008). Anchor test types and population invariance: An exploration across subpopulations and test administrations. Applied Psychological Measurement, 31(1), 81-97.
4. Powell, J. C., & Shklov, N. (1992). Obtaining information about learners' thinking strategies from wrong answers on multiple-choice tests. Educational and Psychological Measurement, 52, 847-865. A full discussion of this procedure is available from Better Schooling Systems.