I present below some results reported in the Fourth National Assessment of Educational Progress in the USA, a report (one of a series) announced in 1988 and to be found in libraries under Library of Congress call number QA13.N36.

(The numbers below are percentages, but I seem to have made a mistake in transcribing the Grade 3 results in part (A), since they add to 104; sorry.)

The problems used in the NAEP examinations are given to representative samples from three age groups, sometimes or always (I’m not sure about this) the same question to all age groups. One of those given to all three groups was this one, classified under “logic”. The italicized hypothesis applies to both parts (A) and (B), which constitute two questions:

Everyone on the team is tall.

                                          Grade 3   Grade 7   Grade 11

A. If Tom is short, then

   Tom is on the team ................        5         1          1
   Tom is not on the team ............       62        62         68
   There is not enough information
     to tell if Tom is on the team ...       35        35         30
   I don't know ......................        2         2          1

B. If Jane is tall, then

   Jane is on the team ...............       68        58         47
   Jane is not on the team ...........        5         3          2
   There is not enough info ..........       13        38         50
   I don't know ......................       14         1          1

I was not acquainted with this Assessment, and only read very rapidly the text that introduced it, and its predecessors, but my understanding is that these tests are administered very personally, perhaps even read aloud to the third graders (some of whom might not be able to spell out the word "information," for example). But I'm not sure.

Yet it is clear that these questions are designed to test how well students understand implications and negations of simple propositions. In some sense the “obvious” interpretation of the result of this question is true: A distressingly large number of students fail to follow an English sentence that says something in a logical form they are not used to in daily life.

Especially Part (B). The first choice, selected by nearly half the 11th graders, is wrong. That is, "If Jane is tall, then Jane is on the team" is false. But if that choice had been put, "All the tall girls in the school are on the team," surely it would not have attracted such a following. And this rephrasing is not even a contrapositive to the one given, just a direct translation of the "if...then..." formulation to the set-inclusion formulation, "The set of all the tall girls (Jane being one of them) is contained in the team." It is evident that the question, if read properly, is laughably trivial and its answer obvious even to 3rd graders, yet half the 11th graders get it wrong. The real question becomes, what has this question measured?
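Both parts can in fact be checked mechanically. Here is a minimal sketch (my own illustration, not part of the NAEP materials), writing p for "X is on the team" and q for "X is tall", so that the stem "Everyone on the team is tall" becomes p -> q:

```python
from itertools import product

# All truth assignments to (p, q), where p = "X is on the team"
# and q = "X is tall". The premise is the implication p -> q.
def implies(a, b):
    return (not a) or b

models = list(product([False, True], repeat=2))

# Part A: in every model where p -> q holds and X is short (not q),
# p is false -- so "Tom is not on the team" genuinely follows.
assert all(not p for p, q in models if implies(p, q) and not q)

# Part B: with p -> q and X tall (q), both p = True and p = False survive,
# so the correct answer is "there is not enough information to tell".
survivors = {p for p, q in models if implies(p, q) and q}
print(sorted(survivors))  # [False, True]: membership is undetermined
```

The brute-force check is only two propositional variables, but it makes the asymmetry of the two parts plain: the premise settles Part (A) and leaves Part (B) open.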

It certainly has not measured any child's ability to reason in simple cases like this. I am quite sure that every normal child, American-born or immigrant, black or white, 3rd, 7th or 11th grade, urban or rural, living in a slum, a home for wayward children or in the lap of luxury, can and will answer this question correctly as it appears in real life, stripped of the intimidating vocabulary of formal logic.

Real life! Imagine: Here are these two basketball teams in our school, one for boys and one for girls; we cheer them on Wednesday nights; we know many of their names; we can see they are all very tall; we know that the tallest ones have great advantage over the others. In a word, we know -- even without being told -- that "All members of the team are tall." Now imagine that I am a third grader waiting for the bus with a friend. My friend points to a kid on the playground, some distance off. It's hard to see just who it is.

She is tall, but her features are so indistinct at that distance (it is raining, besides) that not much else about her can be seen or deduced. The friend asks, "Hey, isn't that girl on the basketball team?" Wouldn't I immediately, without even thinking about it, answer "I don't know"? (By which I would mean, "There is not enough information to tell if that girl is on the team," and not that I didn't understand the question.) In other words, would I not have given the correct answer to Question (B) above? (It appears to me now that there are two correct answers for Problem B, #3 and #4, but this might be because I have not fully transcribed the text. However it was worded, the number of persons who apparently believe that all tall girls are ipso facto members of the basketball team is simply unbelievable.)

Yet when exactly this question was asked by NAEP, 68% of my Grade 3 classmates apparently answered that this indistinct girl, by virtue of her being tall, and nothing else, had to be a member of the basketball team. And 58% of my 7th grade schoolmates, and 47% of my 11th grade schoolmates, too! Is this credible? Of course it is not. It is something about the question, about NAEP and the exam situation, that makes these children appear to be victims of an ignorance they cannot possibly have.

Again, imagine that my friend has pointed out a short boy, his features also indistinct, though through no fault of his own. We ask our neighbor, "Isn't that little pipsqueak over there a member of the Varsity basketball team?" The answer, "No", would come out so fast that we'd put the respondent in for a Fields Medal. Yet the NAEP responses would have us believe that 30% of the 11th graders (Eleventh graders!) in America would express uncertainty about whether, on a team where all members are tall, this short boy belonged.

(One percent had it that this short boy was on the team. There is always a part of the exam-taking group that wishes to sabotage the test, or that simply reads it wrong. One cannot take the response figures as meaningful to better than a few percentage points, especially in an exam with no individual feedback and no ill consequences for those who "flunk".)

So, why do half the kids get the answer wrong on the test? There is the allied question: Does so large a number of wrong answers to this question tell us something about the nation's success in teaching mathematics to children? The answer is yes, though rather indirectly. Of course the mathematical content of the question is trivial, and as I have pointed out, any student who understood the question would certainly get the answers right if the identical question had been posed in a real-life lunchroom or at a bus stop, no matter how little mathematics he knew, or whether he was 9, 13 or 17 years old. They all would get it right! All!

The only deductions needed are literally those that were being made every day by our Paleolithic ancestors, whose lives depended on recognizing at a distance whether an approaching shape was that of a man or a beast, a member of the tribe or a stranger. Those people had no mathematics beyond counting. Were those the kinds of skills NAEP imagines it is testing? Admittedly, such skills are important, but there is no more need for schools to teach them than to teach children (in schools) that food is for the hungry. I can see the question now:

All hungry people want food.

If Janet is hungry,

1. She will want some food

2. She will not want food

3. There is not enough information given to tell us if she wants any food

4. I don’t know if she wants food or doesn’t want food

A clever question. Notice that the adjacent pair of words “want food” that occurs in the prelude (the “stem” of the question, as it is called in exam jargon) does not occur (as an adjacent pair) in the correct answer, and does appear in two of the wrong answers, so that children who don’t know the correct answer and look to the stem for a hint (“key words”) might be impelled to choose (2) or (4), both wrong answers.
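The key-word effect just described can be made concrete with a small sketch (my own illustration, not from NAEP, using the hypothetical food question above): a guesser who merely hunts the options for the stem's adjacent pair "want food" is steered toward the two wrong answers.

```python
# Sketch (my own illustration): a naive test-taker who scans the options
# for the stem's adjacent word pair "want food".
stem = "All hungry people want food. If Janet is hungry,"
options = {
    1: "She will want some food",  # the correct answer
    2: "She will not want food",
    3: "There is not enough information given to tell us if she wants any food",
    4: "I don't know if she wants food or doesn't want food",
}

key_phrase = "want food"  # an adjacent pair lifted from the stem
hits = [number for number, text in options.items() if key_phrase in text]
print(hits)  # [2, 4]: the key-word strategy points only at wrong answers
```

The correct option (1) interposes "some" and so escapes the naive match, exactly the trap the question's wording sets.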

Clearly this question is better placed in a vocabulary test than in a mathematics test. The same goes for the basketball team question. Not that either one is very useful even as a test of vocabulary. None of the words in either question is unusual, or used in an unusual manner. Even someone who has never heard of basketball can, with some understanding of English syntax and such common words as "tall" and "short", get the basketball question right. No, none of these questions is really suitable for a vocabulary test.

Yet one of the popular objections to the multiple-choice math tests that are given these days is that they are tests of English rather than math. The critics are right, for the questions they have in mind, but those usually are relatively tangled tales compared to the basketball question. Typical, though exaggerated, is this sort of question:

Fifty-seven Canada geese flying south for the winter rest on an island in Lake Ontario for the night. In the morning 12 of them form a group and continue their flight while the rest remain behind. One-third of the geese who didn't continue south that day then fly to Lake Canandaigua and are able to keep warm under a nearby bridge. If two-thirds of the geese who elected to remain on the Lake die of bird flu while the rest are healthy, how many healthy geese remained on Lake Ontario?

This question is somewhat better, for it asks for the translation of some language into mathematical terms: Children should be able to compute 57 - 12 = 45 as the number of geese who stayed behind, (2/3) x 45 = 30 as the number who then died of bird flu, and 15 as the number of healthy geese remaining on Lake Ontario. As arithmetic it is difficult for the 3rd grade, simple for the 7th grade, and really quite trivial for the 11th grade. It is a typical problem for NAEP, though I have just made it up myself, but it is objected to on the grounds that the reading of it is difficult. And one can be sure that almost all of those who would get it wrong in Grade 7 or 11 would have had their difficulties with the writing itself. Is the NAEP mathematics test misnamed, and should some of the math questions be moved over to the "English" part of NAEP?
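The computation can be written out in a few lines (a minimal sketch following the essay's own reading of the deliberately tangled wording; the variable names are mine):

```python
# Sketch of the geese arithmetic as the essay computes it (the problem's
# wording is deliberately tangled, so this follows the author's reading).
total_geese = 57
continued_south = 12
stayed_behind = total_geese - continued_south    # 57 - 12 = 45 stayed behind
died_of_flu = 2 * stayed_behind // 3             # (2/3) x 45 = 30 died
healthy_remaining = stayed_behind - died_of_flu  # 45 - 30 = 15 healthy geese
print(stayed_behind, died_of_flu, healthy_remaining)  # 45 30 15
```

Three subtractions and a fraction; the arithmetic is the easy part, which is precisely the essay's complaint about where the question's real difficulty lies.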

Here I have to say I don't know. I believe such questions cannot reasonably be placed under the heading "logic" or "mathematics", yet as questions about reading comprehension they are corrupted by their mathematical content. Were it given as a "reading" test problem, a careless arithmetic error would make the exam imply an inability to read. A similar story about Canada geese and the flight south would have to be followed by literary questions not so subject to tiny arithmetic mistakes. The inclusion of questions of this sort, and they are common, makes me wonder about the validity of the whole Assessment. Maybe definite numerical, algebraic and geometric problems with definite answers are a better way to go? Probably that depends on what you want to do with the results of the Assessment.

In both the basketball team problem and the Canada geese problem the mathematics is very simple, and a child out in the field, acquainted with the facts of the case, would score much better than the child at a desk reading about the case. This much is plain in the basketball problem, but even in the geese problem one can imagine a magically shrunken child, the boy Nils of Lagerlöf's famous children's tale, seated on the back of one of the geese, counting to see what is happening, and of course having no problem whatever with either the logic or the numerical values that lead to the answer: seeing twelve of the 57 flying off the first morning and grieving over the 30 friends lost to the flu, and knowing, almost without having to count, that he has only 15 left for the remainder of his journey.

What is mathematics? A test such as NAEP's imagines itself to be an assessment of how well a child understands the way words in a little story translate into computational demands, but this is not the same thing as testing how the same child, outside the exam room, would attack the real-life situation the NAEP stem describes; nor is it a test of anything but the most trivial mathematics. Which is the thing we want to test? How well the words get interpreted and converted into computations, or how well such a situation as really experienced would get converted into computations? An excellent carpenter, experienced in his craft and intolerant of error, deriving one measure from another on his drawing board, is experiencing the situation itself, and might well get the answer wrong if he could only read about it in an NAEP stem. It is true that in getting it wrong he would be showing an inability in some domain of human knowledge that another carpenter would get right, but whatever that domain might be, it is not mathematics.

I believe, therefore, that the public is being given the wrong information, and is led to believe things that are simply not so. The NAEP scores purport to tell us that children fail to understand certain things touching on mathematics as it presents itself in the real world, whereas what these children are exhibiting is ignorance of something quite different: the skill of reading an examination paper and imagining the real-world situation its words are trying to describe. Very often, as happens repeatedly in other examinations where parents protest and exam publishers apologize, those words don't really do their job. But even when the writing is plain, and good, should we label this NAEP "story-problem" skill "Mathematics"?

Can we really say that half of our high school students seriously believe that, because all basketball players are tall, every tall person is not only a basketball player but a member of our own team? Yet we do solemnly say that. NAEP itself says so. They have measured it.

Ralph A. Raimi

Revised 18 October 2009