Intelligence Tests (College Board AP® Psychology): Revision Note
The Stanford-Binet IQ test
The first formal intelligence test was developed by Alfred Binet in France to identify children who needed additional support in school
Binet introduced the concept of mental age, which is the level of intellectual functioning typical of a child of a given age
A child's mental age was compared to their chronological age (their actual age) to produce a score
Lewis Terman at Stanford University adapted Binet's test for American use and introduced the formula for calculating the intelligence quotient (IQ):
IQ = (mental age ÷ chronological age) × 100
E.g. a 10-year-old child with a mental age of 10 has an IQ of 100 (10 ÷ 10 × 100); a 10-year-old with a mental age of 12 has an IQ of 120
The resulting test became known as the Stanford-Binet IQ test
This was the first widely used standardized intelligence test
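As a quick check of the ratio formula above, here is a minimal Python sketch. The function name `ratio_iq` is made up for illustration and is not part of any testing standard; it simply reproduces the two worked examples.

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Ratio IQ as described above: (mental age / chronological age) x 100."""
    if chronological_age <= 0:
        raise ValueError("chronological age must be positive")
    # Multiply first so whole-number ages give exact results
    return mental_age * 100 / chronological_age


print(ratio_iq(10, 10))  # 100.0 (mental age matches chronological age)
print(ratio_iq(12, 10))  # 120.0 (mental age is ahead of chronological age)
```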
In modern use, IQ scores are calculated differently, using deviation IQ, which compares a person's score to the average performance of others of the same age
IQ scores follow a normal distribution with a mean of 100 and a standard deviation of 15
Approximately 68% of people score between 85 and 115; approximately 95% score between 70 and 130
In modern times, IQ scores are often used to identify students for educational services
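The deviation IQ idea and the 68% / 95% figures can be sketched in Python as follows. This is an illustration only: the raw score and the norm-group mean and standard deviation in the example are hypothetical, and the function names are invented for this sketch. It assumes scores are placed on a scale with a mean of 100 and a standard deviation of 15, as described above.

```python
from statistics import NormalDist

IQ_MEAN = 100  # mean of the IQ scale
IQ_SD = 15     # standard deviation of the IQ scale


def deviation_iq(raw_score: float, norm_mean: float, norm_sd: float) -> float:
    """Place a raw score on the IQ scale relative to a norm group."""
    z = (raw_score - norm_mean) / norm_sd  # distance from the norm mean in SD units
    return IQ_MEAN + IQ_SD * z


def share_between(low: float, high: float) -> float:
    """Proportion of a normal(100, 15) distribution falling between two IQ scores."""
    dist = NormalDist(mu=IQ_MEAN, sigma=IQ_SD)
    return dist.cdf(high) - dist.cdf(low)


# Hypothetical raw score one standard deviation above a hypothetical norm-group mean
print(deviation_iq(60, norm_mean=50, norm_sd=10))  # 115.0

print(round(share_between(85, 115), 3))  # 0.683, i.e. about 68%
print(round(share_between(70, 130), 3))  # 0.954, i.e. about 95%
```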
The Wechsler IQ tests
David Wechsler developed a family of intelligence tests designed to address limitations of the Stanford-Binet:
Wechsler Adult Intelligence Scale (WAIS) for adults aged 16 and over
Wechsler Intelligence Scale for Children (WISC) for children aged 6–16
Wechsler Preschool and Primary Scale of Intelligence (WPPSI) for children aged 2.5–7
The Wechsler tests differ from the Stanford-Binet in important ways:
They yield both an overall IQ score and separate scores for specific subtests, giving a more detailed profile of a person's cognitive strengths and weaknesses
Subtests cover six types of questions: information, comprehension, arithmetic, similarities, digit span, and vocabulary
This reflects Wechsler's view that intelligence involves a combination of abstract and verbal measures rather than a single score
The Wechsler tests are among the most widely used intelligence assessments in the world today
Psychometric principles: standardization, reliability, and validity
For any psychological test, including intelligence tests, to be considered useful, it must adhere to the following psychometric principles:
standardization
reliability
validity
Standardization
A test is standardized when it is administered using consistent procedures and environments for all test-takers
Standardization ensures that scores are comparable across individuals by ensuring everyone takes the test under the same conditions
A standardization sample is a large, representative group used to establish the norms against which individual scores are compared
E.g. IQ scores are standardized so that a score of 100 always represents average performance for a given age group. This is only meaningful if the test was administered consistently to a representative sample
Reliability
A test is reliable if it yields similar results each time it is administered to the same person under similar conditions
Reliability is measured using a correlation coefficient; a coefficient of 1.0 indicates perfect reliability
Two types of reliability are required:
Test-retest reliability: the same test administered to the same person on two separate occasions should produce similar scores
E.g. if you take an IQ test today and again in three months, a reliable test should produce approximately the same score both times
Split-half reliability: the test is divided into two halves and scores on each half are correlated. A reliable test should produce similar scores on both halves
A test can be reliable without being valid, but a valid test must be reliable
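To make the two reliability checks concrete, the sketch below computes a Pearson correlation coefficient for each. All scores and item responses are made-up illustrative data, not real test results. Splitting by odd and even items is one common way to form the two halves, used here as an assumption.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Test-retest: the same five (hypothetical) people tested on two occasions
first_sitting = [95, 102, 110, 88, 121]
second_sitting = [97, 100, 112, 90, 119]
test_retest = pearson_r(first_sitting, second_sitting)

# Split-half: one row per (hypothetical) person, 1 = correct, 0 = incorrect
item_responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 0, 1, 0, 0],
]
odd_half = [sum(row[0::2]) for row in item_responses]   # items 1, 3, 5
even_half = [sum(row[1::2]) for row in item_responses]  # items 2, 4, 6
split_half = pearson_r(odd_half, even_half)

print(round(test_retest, 2))
print(round(split_half, 2))
```

The closer each coefficient is to 1.0, the more consistent the scores are across occasions or across halves.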
Validity
A test is valid if it measures what it is designed to measure
Two types of validity are required:
Construct validity: the degree to which the test actually measures the theoretical construct it claims to measure
E.g. does an IQ test genuinely measure intelligence, or is it measuring something else such as test-taking skill or cultural familiarity?
Predictive validity: the degree to which the test score accurately predicts future performance on a relevant outcome
E.g. an IQ test has predictive validity if high scorers consistently go on to perform well in academic and professional settings
A test that is not valid is not useful, regardless of how reliable it is
Systemic issues: stereotype threat, Flynn effect, bias, and historical misuse
Intelligence testing does not occur in a vacuum
Scores are influenced by social, cultural, and historical factors that must be understood to interpret them responsibly
Stereotype threat and stereotype lift
Stereotype threat occurs when awareness of a negative stereotype about one's group causes a person to perform below their potential on a test
The anxiety of possibly confirming the stereotype impairs performance
E.g. research has shown that when women are reminded of the stereotype that men perform better at math before a math test, their scores decrease, even though no such difference exists under neutral conditions
Stereotype lift is the complementary effect
Members of a group that is positively stereotyped on a measure may perform better than expected when the stereotype is made salient
Researchers strive to develop assessments that are socioculturally responsive, i.e. designed to reduce the influence of stereotype threat and cultural bias on scores
The Flynn effect
The Flynn effect is the finding that IQ scores across much of the world have generally increased over time
Average IQ scores have risen by approximately 3 points per decade over the 20th century
This increase is too rapid to be explained by genetic change, so it is attributed to societal factors such as:
higher socioeconomic status and improved nutrition
better access to healthcare
increased access to education and more cognitively stimulating environments
The Flynn effect demonstrates that IQ scores are not fixed biological measurements, but are significantly influenced by environmental and social conditions
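The size of the Flynn effect is easier to appreciate with some simple arithmetic. The sketch below assumes a constant gain of about 3 points per decade, which is a simplification of the approximate figure given above. The function name is invented for illustration.

```python
POINTS_PER_DECADE = 3  # approximate rate stated above; assumed constant here


def flynn_gain(years: float, points_per_decade: float = POINTS_PER_DECADE) -> float:
    """Approximate rise in average IQ score over a span of years."""
    return years / 10 * points_per_decade


print(flynn_gain(50))  # 15.0
```

Under this assumption, 50 years corresponds to a rise of about 15 points, which is roughly one standard deviation on the IQ scale.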
Bias and group differences
IQ scores tend to vary more within any given group than between groups, meaning the differences between individuals within a group are larger than the average differences between groups
This is a critical point: knowing someone's group membership tells you very little about their individual intelligence
Personal and sociocultural biases can distort the interpretation of individual IQ scores:
Poverty, discrimination, and educational inequities have been shown to negatively influence intelligence scores of individuals and groups
IQ tests have historically been criticized for containing items that reflect the knowledge and experiences of white, middle-class, Western cultures
This places test-takers from other backgrounds at a systematic disadvantage
Historical misuse of intelligence tests
IQ scores have been used historically to limit access to jobs, military ranks, educational institutions, and immigration to the US:
During World War I, army intelligence tests were used to rank recruits
Results were used to argue for the intellectual inferiority of certain racial and immigrant groups, despite the tests being culturally biased
IQ testing was used to justify discriminatory immigration policies in the early 20th century, restricting entry to the US from certain countries on the basis of alleged low intelligence
These historical uses demonstrate that intelligence tests are not neutral instruments
Their design, administration, and interpretation are all shaped by the social and political contexts in which they are used
Examiner Tips and Tricks
For Skill 1.B, Flynn effect questions may describe rising IQ scores over time and ask you to explain the cause
Always attribute the Flynn effect to environmental and societal factors, not genetic change
For Skill 2.D, intelligence testing raises significant ethical concerns
Be prepared to evaluate whether historical uses of IQ tests followed appropriate ethical procedures
For Skill 4.B, you may be asked to evaluate the claim that IQ tests are fair and unbiased measures of intelligence
Use stereotype threat, cultural bias, and the historical misuse of IQ scores as evidence that tests have not always been socioculturally responsive, and support your claim with specific examples
Ensure that you know the difference between:
test-retest reliability (same test, same person, different occasions) and split-half reliability (same test, split into two halves, correlated)
construct validity (does it measure what it claims?) and predictive validity (does it predict future performance?) (Skill 2.D)