This topic is so contentious that I am going to talk about it in the abstract. Suppose we have two populations (A and B) that have been separated by a great many centuries. Each has its own language and culture. For reasons that we will assume to be benign, we decide to measure the intelligence of Group A. This is an extremely difficult thing to do. In the simplest approach, we need a number of questions that we believe will measure the inherent ability to reason, as opposed to something that has been learned by members of the group. We administer the test to a great number of ten-year-old children in Group A, and then we count how many of them answered only one question correctly, how many answered two, how many answered three, and so on. Very few of them will have answered only a few correctly; very few will have answered almost all; and a great many will have answered about half. Finally, we draw a graph showing the results. If the questions are well designed and appropriate for use with this age group, it will be a bell curve, because the mathematics of this type of problem tends to produce such a curve. (There is a bit more about this in the notes.)
Now, we need an index to characterize individual performance. Suppose a ten-year-old takes the test and gets the average score for ten-year-olds. We express this as her mental age (10) divided by her chronological age (10), which gives us 10/10 = 1, and then we multiply this by 100, which gives us an Intelligence Quotient (IQ) of 100. But if she gets the average score for twelve-year-olds, we have 12/10 = 1.2, and when we multiply this by 100, we find that she has an IQ of 120. We are assuming that the innate ability to reason develops as the child grows up, but that development levels off in adulthood: a person of 55 cannot be expected to reason any better or worse than she did at 50, so a quotient based on age stops making sense, and such tests cannot reasonably be given to adults. And we must be very certain that we aren’t testing something other than innate ability, so such tests must be revised very, very carefully. Finally, we shouldn’t think of them as more than a tool for placing students in study groups, for analyzing their performance from year to year, etc.
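The arithmetic above can be sketched in a few lines of Python. The function name `ratio_iq` and the adult "mental age" of 16 are illustrative assumptions, not values from any real test.

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Return mental age divided by chronological age, times 100."""
    if chronological_age <= 0:
        raise ValueError("chronological age must be positive")
    return 100 * mental_age / chronological_age


# The two cases from the text: a ten-year-old who scores like an average
# ten-year-old, and one who scores like an average twelve-year-old.
print(ratio_iq(10, 10))  # 100.0
print(ratio_iq(12, 10))  # 120.0

# Why the quotient misbehaves for adults: with a hypothetical, unchanged
# mental age of 16, the result falls between ages 50 and 55 only because
# the divisor has grown.
print(ratio_iq(16, 50), ratio_iq(16, 55))
```

The last line shows the quotient dropping even though nothing about the person's reasoning has changed, which is one way to see why the formula is only meaningful for children.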
Now, suppose that we want to compare the intelligence scores for Groups A and B—and a bystander might well wonder why we are trying to do that. It’s clear that we must have a single test for both populations, and that means we will have to discard what we have done, and try to create a test that will work equally well for both—a very daunting task because of differences in language use, cultural assumptions, etc. But, let’s suppose that we are able to develop a test that we can reasonably have confidence in (and I’m not sure that can be done), and it seems to say that the average intelligence of Group A, as measured by such a test, is five points higher than that of Group B. So what?
So what! If you superimpose the two curves, you can see that there is almost complete overlap. Any member of Group B with an IQ of 140 has a higher IQ than almost all of the members of Group A, the group with the supposedly higher average IQ, and one with an IQ of 110 has a higher IQ than most of the members of Group A. Period! In everyday life, each of us deals always with individuals, not with a hypothetical average.
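To put rough numbers on the overlap, here is a minimal sketch using Python's standard `statistics.NormalDist`. It assumes both groups follow a normal curve with a standard deviation of 15 and means of 105 and 100; the text gives only the five-point difference, so the other figures are assumptions chosen for illustration.

```python
from statistics import NormalDist

# Assumed for illustration only: both groups are normal with a standard
# deviation of 15, and the means differ by the five points in the text.
group_a = NormalDist(mu=105, sigma=15)
group_b = NormalDist(mu=100, sigma=15)

# Share of the area that the two curves have in common (0 to 1).
overlap = group_a.overlap(group_b)

# Share of Group A scoring below a Group B member with 140, or with 110.
below_140 = group_a.cdf(140)
below_110 = group_a.cdf(110)

print(f"overlap of the two curves: {overlap:.0%}")
print(f"Group A members below 140: {below_140:.0%}")
print(f"Group A members below 110: {below_110:.0%}")
```

Under these assumptions the two curves share roughly 87 percent of their area, a score of 140 exceeds about 99 percent of Group A, and a score of 110 exceeds about 63 percent of Group A.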
Beyond this, any designer of such tests must worry about the likelihood that they have a margin of error (not easily assessed) that is so great when used across cultural boundaries that the readings may be meaningless.
Note: It is sometimes said that you can prove anything with statistics, and you can, but only if you make questionable assumptions, falsify your results, or choose inappropriate ways of displaying the data. As the saying often attributed to Mark Twain goes, “figures don’t lie, but liars figure.” Every time you mark a child’s height on the door-frame, you are collecting data, and every time you average your expenses for a week, you are calculating a statistic, and I imagine that you are as accurate as you can be when you do these things. Because of the outcry over the book The Bell Curve, many people view the curve itself with suspicion, but it is a cornerstone of statistics, and it actually helps us to know how much error there is likely to be in our conclusions.
The easiest way to explain it is to begin by looking at Galton’s Quincunx. Suppose that we have ten rows of ¾-inch-thick plastic hexagons glued between two clear plastic sheets. They are arranged like the cells in a honeycomb, but spaced ¾ inch apart from each other so that branching channels run down from the top to the bottom. At the bottom, there is a series of vertical chutes. Now we drop a marble into one of the channels at the top (we will call this channel the “center”). The marble hits the pointed top of a hexagon in the second row, and it may bounce to the left or to the right. Whichever way it bounces, it hits a hexagon in the third row, and again it may bounce to the left or to the right. At every point, the chance of returning towards the center is just as great as the chance of moving further away. After dropping one hundred marbles down the center of this branching chute, we look at the shape made by the piles of marbles in the vertical chutes at the bottom. It is roughly a bell-shaped curve. Mathematically, this shape is governed by a formula called the binomial distribution.
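The quincunx is easy to imitate on a computer. The sketch below assumes each marble makes ten left-or-right bounces with equal chance (the exact count depends on how the rows are laid out), and the function names are made up for this illustration.

```python
import random
from math import comb


def drop_marbles(n_marbles: int, n_bounces: int, seed: int = 0) -> list:
    """Simulate the quincunx and return the marble count in each chute."""
    rng = random.Random(seed)
    chutes = [0] * (n_bounces + 1)
    for _ in range(n_marbles):
        # Each bounce goes right or left with equal chance; the chute
        # index is simply the number of rightward bounces.
        rights = sum(rng.random() < 0.5 for _ in range(n_bounces))
        chutes[rights] += 1
    return chutes


def binomial_expected(n_marbles: int, n_bounces: int) -> list:
    """Expected count per chute from the binomial distribution, p = 1/2."""
    return [n_marbles * comb(n_bounces, k) / 2 ** n_bounces
            for k in range(n_bounces + 1)]


counts = drop_marbles(n_marbles=100, n_bounces=10)
expected = binomial_expected(n_marbles=100, n_bounces=10)
for k, (seen, exp) in enumerate(zip(counts, expected)):
    print(f"chute {k:2d}: {'#' * seen:<30} seen {seen:3d}, expected {exp:5.1f}")
```

Each row of `#` marks is one chute, so the printout is the pile of marbles turned on its side. With only one hundred marbles the piles are ragged, but they follow the binomial counts printed beside them.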
In actual events, it occurs whenever we have a series of “measurements,” each dependent on many factors, each of which can cause the measurement’s final value to move in one of two opposed directions. In the case of the quincunx, the chance movements occurring when the marbles hit the hexagons are the “factors,” and the final position of each marble is a “measurement.” We can imagine a scale of such measurements marked off along the bottom of the chutes, and the number of marbles in each of the chutes provides us with a visual tally of the measurements. Of course, the Quincunx is simply a physical analogy for the statistical procedures, but it is an exact one. The binomial is not a smooth curve but a stair-step, and if we were to smooth it out we would get the “normal” curve, which is produced by the normal distribution. For our purposes, we can regard them as being the same (though the normal curve has its own formula, as it must, and working with it involves integral calculus). The shape and position of these curves on a chart are determined by two parameters, the mean (also called the arithmetic average) and the standard deviation. The mean locates the curve on the scale of measurement, and the standard deviation tells us how closely the individual measurements cluster around the mean.
This is the curve we get if we have 50 people measure one side of a building using a ruler as the unit (each estimate of the total length is a measurement, and the factors are the errors in placing the ruler). We get it if we repeatedly toss a number of coins at the same time and plot the number of heads (the measurement is the number of heads per toss, and the factors are the results for the individual coins). We get it if we take a sample of washers from a manufacturing line and measure their thickness to see if the equipment is still in adjustment (the measurement is the thickness of each washer, and the factors are the variations brought about by vibration, etc.).
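The coin example makes it easy to check how closely the stair-step binomial follows the smooth normal curve. This sketch assumes 100 fair coins per toss, a number picked only for illustration, and uses the standard results that the mean number of heads is half the number of coins and the standard deviation is half its square root.

```python
from math import comb, sqrt
from statistics import NormalDist

n_coins = 100  # assumed number of coins per toss, chosen for illustration

# Mean and standard deviation of the number of heads for fair coins.
mean = n_coins / 2
std_dev = sqrt(n_coins) / 2
normal = NormalDist(mu=mean, sigma=std_dev)


def binomial_probability(heads: int) -> float:
    """Exact chance of a given number of heads among n_coins fair coins."""
    return comb(n_coins, heads) / 2 ** n_coins


# Largest gap between a stair-step height and the smooth curve.
largest_gap = max(abs(binomial_probability(k) - normal.pdf(k))
                  for k in range(n_coins + 1))

# Under the normal curve, about 68% of the area lies within one
# standard deviation of the mean.
within_one_sd = normal.cdf(mean + std_dev) - normal.cdf(mean - std_dev)

print(f"mean = {mean}, standard deviation = {std_dev}")
print(f"largest gap between the two curves: {largest_gap:.5f}")
print(f"share within one standard deviation: {within_one_sd:.1%}")
```

The gap between the two curves is tiny at this size, which is why they can be treated as the same for our purposes. The last line shows what the standard deviation buys us: a fixed share of the measurements falls within one standard deviation of the mean, wherever the curve sits on the scale.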
And because chromosomes are paired and the genes at the same locations on the paired chromosomes influence the same trait (one gene ordinarily being dominant over the other), we get this same curve for every measurable human characteristic that is controlled by a large number of genes. (This is just like the coin example.) Height and intelligence both fit this description. But height is easily measured; intelligence is not.