Intelligence Tests (College Board AP® Psychology): Study Guide

Written by: Raj Bonsor

Reviewed by: Claire Neeson

Updated on 29 March 2026

The Stanford-Binet IQ test

The first formal intelligence test was developed by Alfred Binet in France to identify children who needed additional support in school
- Binet introduced the concept of mental age, which is the level of intellectual functioning typical of a child of a given age
  - A child's mental age was compared to their chronological age (their actual age) to produce a score
Lewis Terman at Stanford University adapted Binet's test for American use and introduced the formula for calculating the intelligence quotient (IQ):
- IQ = (mental age ÷ chronological age) × 100
  - E.g. a 10-year-old child with a mental age of 10 has an IQ of 100 (10 ÷ 10 × 100); a 10-year-old with a mental age of 12 has an IQ of 120 (12 ÷ 10 × 100)
The resulting test became known as the Stanford-Binet IQ test
- This was the first widely used standardized intelligence test
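Terman's ratio formula above can be sketched in a few lines of Python. The function name `ratio_iq` is an illustrative assumption, not part of any published test; this is a minimal sketch of the arithmetic only.

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Ratio IQ as described above: (mental age / chronological age) * 100."""
    if chronological_age <= 0:
        raise ValueError("chronological_age must be positive")
    # Multiply before dividing so whole-number ages give exact results.
    return mental_age * 100 / chronological_age


# The two worked examples from the text (a 10-year-old child).
print(ratio_iq(10, 10))  # 100.0
print(ratio_iq(12, 10))  # 120.0
```

A mental age below the chronological age gives a score under 100, e.g. `ratio_iq(8, 10)` returns 80.0.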
In modern use, IQ scores are calculated differently, using deviation IQ, which compares a person's score to the average performance of others of the same age
IQ scores follow a normal distribution with a mean of 100 and a standard deviation of 15
- Approximately 68% of people score between 85 and 115; approximately 95% score between 70 and 130
In modern times, IQ scores are often used to identify students for educational services
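The deviation IQ idea and the 68% and 95% figures above can be checked with Python's standard library. This is a sketch under stated assumptions: the function names and the raw-score norm values (a mean of 50 and a standard deviation of 10) are hypothetical and do not come from any real test.

```python
from statistics import NormalDist

IQ_MEAN = 100  # mean of the IQ scale, as stated above
IQ_SD = 15     # standard deviation of the IQ scale, as stated above


def deviation_iq(raw_score: float, norm_mean: float, norm_sd: float) -> float:
    """Place a raw score on the IQ scale relative to a norm group."""
    z = (raw_score - norm_mean) / norm_sd  # distance from the group average in SDs
    return IQ_MEAN + IQ_SD * z


iq_scale = NormalDist(mu=IQ_MEAN, sigma=IQ_SD)


def share_between(low: float, high: float) -> float:
    """Proportion of a normal IQ distribution falling between two scores."""
    return iq_scale.cdf(high) - iq_scale.cdf(low)


# Hypothetical norm group: raw-score mean 50, SD 10.
print(deviation_iq(60, norm_mean=50, norm_sd=10))  # 115.0 (one SD above average)

print(round(share_between(85, 115), 3))  # 0.683
print(round(share_between(70, 130), 3))  # 0.954
```

A raw score exactly at the norm group's average always maps to 100, which is why 100 represents average performance regardless of the test's raw scoring.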

The Wechsler IQ tests

David Wechsler developed a family of intelligence tests designed to address limitations of the Stanford-Binet:
- Wechsler Adult Intelligence Scale (WAIS) for adults aged 16 and over
- Wechsler Intelligence Scale for Children (WISC) for children aged 6–16
- Wechsler Preschool and Primary Scale of Intelligence (WPPSI) for children aged 2.5–7
The Wechsler tests differ from the Stanford-Binet in important ways:
- They yield both an overall IQ score and separate scores for specific subtests, giving a more detailed profile of a person's cognitive strengths and weaknesses
- Subtests cover six types of questions: information, comprehension, arithmetic, similarities, digit span, and vocabulary
This reflects Wechsler's view that intelligence involves a combination of abstract and verbal measures rather than a single score
The Wechsler tests are among the most widely used intelligence assessments in the world today

Psychometric principles: standardization, reliability, and validity

For any psychological test, including intelligence tests, to be considered useful, it must adhere to the following psychometric principles:
- standardization
- reliability
- validity

Standardization

A test is standardized when it is administered using consistent procedures and environments for all test-takers
- Standardization ensures that scores are comparable across individuals by ensuring everyone takes the test under the same conditions
A standardization sample is a large, representative group used to establish the norms against which individual scores are compared
- E.g. IQ scores are standardized so that a score of 100 always represents average performance for a given age group. This is only meaningful if the test was administered consistently to a representative sample

Reliability

A test is reliable if it yields similar results each time it is administered to the same person under similar conditions
- Reliability is measured using a correlation coefficient; a coefficient of 1.0 indicates perfect reliability
Two types of reliability are required:
- Test-retest reliability: the same test administered to the same person on two separate occasions should produce similar scores
  - E.g. if you take an IQ test today and again in three months, a reliable test should produce approximately the same score both times
- Split-half reliability: the test is divided into two halves and scores on each half are correlated. A reliable test should produce similar scores on both halves
A test can be reliable without being valid, but a valid test must be reliable

Validity

A test is valid if it measures what it is designed to measure
Two types of validity are required:
- Construct validity: the degree to which the test actually measures the theoretical construct it claims to measure
  - E.g. does an IQ test genuinely measure intelligence, or is it measuring something else such as test-taking skill or cultural familiarity?
- Predictive validity: the degree to which the test score accurately predicts future performance on a relevant outcome
  - E.g. an IQ test has predictive validity if high scorers consistently go on to perform well in academic and professional settings
A test that is not valid is not useful, regardless of how reliable it is

Systemic issues: stereotype threat, Flynn effect, bias, and historical misuse

Intelligence testing does not occur in a vacuum
- Scores are influenced by social, cultural, and historical factors that must be understood to interpret them responsibly

Stereotype threat and stereotype lift

Stereotype threat occurs when awareness of a negative stereotype about one's group causes a person to perform below their potential on a test
- The anxiety of possibly confirming the stereotype impairs performance
  - E.g. research has shown that when women are reminded of the stereotype that men perform better at math before a math test, their scores decrease, even though no such difference exists under neutral conditions
Stereotype lift is the complementary effect
- Members of a group that is positively stereotyped on a measure may perform better than expected when the stereotype is made salient
Researchers strive to develop assessments that are socioculturally responsive, i.e. designed to reduce the influence of stereotype threat and cultural bias on scores

The Flynn effect

The Flynn effect is the finding that IQ scores across much of the world have generally increased over time
- Average IQ scores have risen by approximately 3 points per decade over the 20th century
This increase is too rapid to be explained by genetic change, so it is attributed to societal factors such as:
- higher socioeconomic status and improved nutrition
- better access to healthcare
- increased access to education and more cognitively stimulating environments
The Flynn effect demonstrates that IQ scores are not fixed biological measurements, but are significantly influenced by environmental and social conditions
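To get a feel for the size of the Flynn effect, the figure of roughly 3 points per decade can be turned into a quick calculation. This is a rough sketch that assumes a constant, linear rate, which is a simplification; the function name is illustrative.

```python
POINTS_PER_DECADE = 3  # approximate rate stated above
IQ_SD = 15             # standard deviation of the IQ scale


def flynn_gain(years: float) -> float:
    """Approximate rise in average IQ over a number of years (linear assumption)."""
    return POINTS_PER_DECADE * years / 10


gain = flynn_gain(50)
print(gain)          # 15.0 points over 50 years
print(gain / IQ_SD)  # 1.0, i.e. about one standard deviation
```

Under this assumption, the average score shifts by about one full standard deviation in 50 years, far faster than a population's genes could change.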

Bias and group differences

IQ scores tend to vary more within any given group than between groups, meaning the differences between individuals within a group are larger than the average differences between groups
- This is a critical point as knowing someone's group membership tells you very little about their individual intelligence
Personal and sociocultural biases can distort the interpretation of individual IQ scores:
- Poverty, discrimination, and educational inequities have been shown to negatively influence intelligence scores of individuals and groups
IQ tests have historically been criticized for containing items that reflect the knowledge and experiences of white, middle-class, Western cultures
- This places test-takers from other backgrounds at a systematic disadvantage

Historical misuse of intelligence tests

IQ scores have been used historically to limit access to jobs, military ranks, educational institutions, and immigration to the US:
- During World War I, army intelligence tests were used to rank recruits
  - Results were used to argue for the intellectual inferiority of certain racial and immigrant groups, despite the tests being culturally biased
- IQ testing was used to justify discriminatory immigration policies in the early 20th century, restricting entry to the US from certain countries on the basis of alleged low intelligence
These historical uses demonstrate that intelligence tests are not neutral instruments
- Their design, administration, and interpretation are all shaped by the social and political contexts in which they are used

Examiner Tips and Tricks

For Skill 1.B, Flynn effect questions may describe rising IQ scores over time and ask you to explain the cause
- Always attribute the Flynn effect to environmental and societal factors, not genetic change
For Skill 2.D, intelligence testing raises significant ethical concerns
- Be prepared to evaluate whether historical uses of IQ tests followed appropriate ethical procedures
For Skill 4.B, you may be asked to evaluate the claim that IQ tests are fair and unbiased measures of intelligence
- Use stereotype threat, cultural bias, and the historical misuse of IQ scores as evidence that tests have not always been socioculturally responsive, and support your claim with specific examples
Ensure that you know the difference between:
- test-retest reliability (same test, same person, different occasions) and split-half reliability (same test, split into two halves, correlated)
- construct validity (does it measure what it claims?) and predictive validity (does it predict future performance?) (Skill 2.D)
