Intelligence Tests (College Board AP® Psychology): Study Guide

Written by: Raj Bonsor

Reviewed by: Claire Neeson

The Stanford-Binet IQ test

  • The first formal intelligence test was developed by Alfred Binet in France to identify children who needed additional support in school

    • Binet introduced the concept of mental age, which is the level of intellectual functioning typical of a child of a given age

      • A child's mental age was compared to their chronological age (their actual age) to produce a score

  • Lewis Terman at Stanford University adapted Binet's test for American use and introduced the formula for calculating the intelligence quotient (IQ):

    • IQ = (mental age ÷ chronological age) × 100

      • E.g. a 10-year-old child with a mental age of 10 has an IQ of 100 (10 ÷ 10 × 100); a 10-year-old with a mental age of 12 has an IQ of 120 (12 ÷ 10 × 100)

  • The resulting test became known as the Stanford-Binet IQ test

    • This was the first widely used standardized intelligence test

  • Modern IQ scores are calculated differently, as a deviation IQ, which compares a person's score to the average performance of others in the same age group

  • IQ scores follow a normal distribution with a mean of 100 and a standard deviation of 15

    • Approximately 68% of people score between 85 and 115; approximately 95% score between 70 and 130

  • In modern times, IQ scores are often used to identify students for educational services
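
The ratio formula and the normal-curve percentages above can be checked with a few lines of Python. This is a minimal sketch: the names `ratio_iq` and `share_between` are illustrative and not part of any test's scoring procedure, and the normal curve is used as an idealized model of how scores are distributed.

```python
from statistics import NormalDist


def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Ratio IQ: (mental age / chronological age) x 100."""
    if chronological_age <= 0:
        raise ValueError("chronological age must be positive")
    # Multiply before dividing so whole-number ages give exact results.
    return mental_age * 100 / chronological_age


# Idealized model of modern IQ scores: mean 100, standard deviation 15.
IQ_DISTRIBUTION = NormalDist(mu=100, sigma=15)


def share_between(low: float, high: float) -> float:
    """Proportion of the idealized distribution scoring between low and high."""
    return IQ_DISTRIBUTION.cdf(high) - IQ_DISTRIBUTION.cdf(low)


print(ratio_iq(10, 10))                   # 100.0
print(ratio_iq(12, 10))                   # 120.0
print(round(share_between(85, 115), 3))   # 0.683 (about 68%)
print(round(share_between(70, 130), 3))   # 0.954 (about 95%)
```

The two ranges are one and two standard deviations either side of the mean, which is where the 68% and 95% figures come from.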

The Wechsler IQ tests

  • David Wechsler developed a family of intelligence tests designed to address limitations of the Stanford-Binet:

    • Wechsler Adult Intelligence Scale (WAIS) for adults aged 16 and over

    • Wechsler Intelligence Scale for Children (WISC) for children aged 6–16

    • Wechsler Preschool and Primary Scale of Intelligence (WPPSI) for children aged 2.5–7

  • The Wechsler tests differ from the Stanford-Binet in important ways:

    • They yield both an overall IQ score and separate scores for specific subtests, giving a more detailed profile of a person's cognitive strengths and weaknesses

    • Subtests include question types such as information, comprehension, arithmetic, similarities, digit span, and vocabulary

  • This reflects Wechsler's view that intelligence is a combination of abilities, tapped by both abstract and verbal measures, that cannot be fully captured by a single score

  • The Wechsler tests are among the most widely used intelligence assessments in the world today

Psychometric principles: standardization, reliability, and validity

  • For any psychological test, including intelligence tests, to be considered useful, it must adhere to the following psychometric principles:

    • standardization

    • reliability

    • validity

Standardization

  • A test is standardized when it is administered using consistent procedures and environments for all test-takers

    • Standardization ensures that scores are comparable across individuals by ensuring everyone takes the test under the same conditions

  • A standardization sample is a large, representative group used to establish the norms against which individual scores are compared

    • E.g. IQ scores are standardized so that a score of 100 always represents average performance for a given age group. This is only meaningful if the test was administered consistently to a representative sample
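
One way to picture how norms turn a raw test score into an IQ score is the sketch below. It assumes the simplest possible approach: measure how far a raw score sits from the norm group's mean, in standard deviation units, then rescale to a mean of 100 and a standard deviation of 15. The function name `deviation_iq` and the raw scores are hypothetical, and real standardization samples are far larger and grouped by age.

```python
from statistics import mean, stdev


def deviation_iq(raw_score, norm_scores, target_mean=100.0, target_sd=15.0):
    """Rescale a raw score against a norm group (illustrative only)."""
    norm_mean = mean(norm_scores)
    norm_sd = stdev(norm_scores)          # sample standard deviation
    z = (raw_score - norm_mean) / norm_sd  # distance from the mean in SD units
    return target_mean + target_sd * z


# Hypothetical raw scores from one age group (far too few for real norms).
norm_group = [38, 42, 45, 47, 50, 50, 53, 55, 58, 62]

print(deviation_iq(50, norm_group))             # 100.0 (exactly average)
print(round(deviation_iq(61, norm_group), 1))   # 122.5 (1.5 SDs above average)
```

If the norm group were not representative, the same raw score would map to a different IQ, which is why the quality of the standardization sample matters.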

Reliability

  • A test is reliable if it yields similar results each time it is administered to the same person under similar conditions

    • Reliability is measured using a correlation coefficient; a coefficient of 1.0 indicates perfect reliability

  • Two types of reliability are required:

    • Test-retest reliability: the same test administered to the same person on two separate occasions should produce similar scores

      • E.g. if you take an IQ test today and again in three months, a reliable test should produce approximately the same score both times

    • Split-half reliability: the test is divided into two halves and scores on each half are correlated. A reliable test should produce similar scores on both halves

  • A test can be reliable without being valid, but a valid test must be reliable
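
Both types of reliability come down to correlating two sets of scores. The sketch below assumes a plain Pearson correlation and, for split-half reliability, an odd/even split of the items (one common way to halve a test). The helper names and all scores are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2 scores)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def split_half_r(item_scores):
    """Correlate each person's total on odd-numbered vs even-numbered items."""
    odd_totals = [sum(person[0::2]) for person in item_scores]
    even_totals = [sum(person[1::2]) for person in item_scores]
    return pearson_r(odd_totals, even_totals)


# Test-retest: the same five (hypothetical) people tested three months apart.
first_sitting = [95, 100, 110, 120, 88]
second_sitting = [97, 99, 112, 118, 90]
print(round(pearson_r(first_sitting, second_sitting), 2))  # close to 1.0

# Split-half: one row per person, 1 = item correct, 0 = item incorrect.
answers = [
    [1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
]
print(round(split_half_r(answers), 2))
```

In both cases a coefficient near 1.0 suggests the test is measuring consistently. It says nothing about whether the test measures the right thing, which is a question of validity.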

Validity

  • A test is valid if it measures what it is designed to measure

  • Two types of validity are required:

    • Construct validity: the degree to which the test actually measures the theoretical construct it claims to measure

      • E.g. does an IQ test genuinely measure intelligence, or is it measuring something else such as test-taking skill or cultural familiarity?

    • Predictive validity: the degree to which the test score accurately predicts future performance on a relevant outcome

      • E.g. an IQ test has predictive validity if high scorers consistently go on to perform well in academic and professional settings

  • A test that is not valid is not useful, regardless of how reliable it is

Systemic issues: stereotype threat, Flynn effect, bias, and historical misuse

  • Intelligence testing does not occur in a vacuum

    • Scores are influenced by social, cultural, and historical factors that must be understood to interpret them responsibly

Stereotype threat and stereotype lift

  • Stereotype threat occurs when awareness of a negative stereotype about one's group causes a person to perform below their potential on a test

    • The anxiety of possibly confirming the stereotype impairs performance

      • E.g. research has shown that when women are reminded of the stereotype that men perform better at math before a math test, their scores decrease, even though no such difference exists under neutral conditions

  • Stereotype lift is the complementary effect

    • Members of a group that is positively stereotyped on a measure may perform better than expected when the stereotype is made salient

  • Researchers strive to develop assessments that are socioculturally responsive, i.e. designed to reduce the influence of stereotype threat and cultural bias on scores

The Flynn effect

  • The Flynn effect is the finding that IQ scores across much of the world have generally increased over time

    • Average IQ scores have risen by approximately 3 points per decade over the 20th century

  • This increase is too rapid to be explained by genetic change, so it is attributed to societal factors such as:

    • higher socioeconomic status and improved nutrition

    • better access to healthcare

    • increased access to education and more cognitively stimulating environments

  • The Flynn effect demonstrates that IQ scores are not fixed biological measurements, but are significantly influenced by environmental and social conditions
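
To get a feel for the size of the Flynn effect, the approximate rate of 3 points per decade can be turned into a quick calculation. This sketch assumes a constant, linear rate of increase, which is a simplification; the function name `flynn_gain` is illustrative.

```python
POINTS_PER_DECADE = 3     # approximate rate quoted above
STANDARD_DEVIATION = 15   # standard deviation of IQ scores


def flynn_gain(years, points_per_decade=POINTS_PER_DECADE):
    """Estimated rise in average IQ over a span of years (linear assumption)."""
    return years / 10 * points_per_decade


gain = flynn_gain(50)
print(gain)                       # 15.0 points over 50 years
print(gain / STANDARD_DEVIATION)  # 1.0, i.e. one full standard deviation
```

A shift of a full standard deviation in 50 years is far faster than genetic change could produce, which supports the environmental explanation.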

Bias and group differences

  • IQ scores tend to vary more within any given group than between groups, meaning the differences between individuals within a group are larger than the average differences between groups

    • This is a critical point, as knowing someone's group membership tells you very little about their individual intelligence

  • Personal and sociocultural biases can distort the interpretation of individual IQ scores:

    • Poverty, discrimination, and educational inequities have been shown to negatively influence intelligence scores of individuals and groups

  • IQ tests have historically been criticized for containing items that reflect the knowledge and experiences of white, middle-class, Western cultures

    • This places test-takers from other backgrounds at a systematic disadvantage

Historical misuse of intelligence tests

  • IQ scores have been used historically to limit access to jobs, military ranks, educational institutions, and immigration to the US:

    • During World War I, army intelligence tests were used to rank recruits

      • Results were used to argue for the intellectual inferiority of certain racial and immigrant groups, despite the tests being culturally biased

    • IQ testing was used to justify discriminatory immigration policies in the early 20th century, restricting entry to the US from certain countries on the basis of alleged low intelligence

  • These historical uses demonstrate that intelligence tests are not neutral instruments

    • Their design, administration, and interpretation are all shaped by the social and political contexts in which they are used

Examiner Tips and Tricks

  • For Skill 1.B, Flynn effect questions may describe rising IQ scores over time and ask you to explain the cause

    • Always attribute the Flynn effect to environmental and societal factors, not genetic change

  • For Skill 2.D, intelligence testing raises significant ethical concerns

    • Be prepared to evaluate whether historical uses of IQ tests followed appropriate ethical procedures

  • For Skill 4.B, you may be asked to evaluate the claim that IQ tests are fair and unbiased measures of intelligence

    • Use stereotype threat, cultural bias, and the historical misuse of IQ scores as evidence that tests have not always been socioculturally responsive, and support your claim with specific examples

  • Ensure that you know the difference between:

    • test-retest reliability (same test, same person, different occasions) and split-half reliability (same test, split into two halves, correlated)

    • construct validity (does it measure what it claims?) and predictive validity (does it predict future performance?) (Skill 2.D)
