Reliability (College Board AP® Psychology): Revision Note
Reliability
Reliability refers to the consistency of a measure or procedure
A study is reliable if it produces similar results when repeated under the same conditions
If a study is replicated and produces similar results, this demonstrates that the measure is consistent and not subject to significant fluctuation
There are two types of reliability:
Internal reliability — the extent to which a measure is consistent within itself
External reliability — the extent to which a measure is consistent over time and across different occasions
Reliability is essential to the scientific process in psychology
Replication is the primary means by which researchers verify that findings are consistent and not the result of chance or error
Unreliable findings cannot be confidently used to draw conclusions about psychological phenomena, and are unlikely to survive the peer review process
Reliability across research methods
Different research methods vary in their level of reliability:
Lab experiments tend to be the most reliable
They use standardized procedures, controlled conditions, and random assignment, making them easier to replicate and producing quantitative data that can be directly compared across studies
Field experiments are less reliable than lab experiments
Although they manipulate an IV and produce quantitative data, they are subject to uncontrolled extraneous variables, which makes conditions difficult to replicate exactly
Natural experiments and quasi-experiments are less reliable still
The naturally occurring IV cannot be controlled or replicated by the researcher, meaning conditions are unlikely to be identical across replications
Observational studies, surveys, and interviews vary in reliability depending on how well the procedure is standardized and how clearly variables are operationally defined
Measuring reliability
There are three main methods for measuring reliability, each suited to a different type of research:
the test-retest method
the split-half method
inter-rater reliability
Test-retest method
The test-retest method measures external reliability:
The same participants complete the same measure on two separate occasions, with a time gap between sessions (e.g. six months)
If each participant produces a similar score on both occasions, external reliability is established — the measure is consistent over time (a brief sketch of this comparison is given after this list)
Used to assess the reliability of surveys, questionnaires, and psychological scales
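A minimal sketch of the test-retest comparison, using hypothetical scale scores and Python with NumPy. The AP course does not require you to calculate reliability coefficients; this simply illustrates the common convention of correlating the two sets of scores:

```python
import numpy as np

# Hypothetical anxiety-scale scores for the same five participants,
# tested six months apart (illustrative data only)
session_1 = np.array([22, 35, 18, 41, 29])
session_2 = np.array([24, 33, 17, 43, 30])

# Pearson correlation between the two sessions:
# a strong positive value suggests the measure is stable over time
r = np.corrcoef(session_1, session_2)[0, 1]
print(f"Test-retest correlation: r = {r:.2f}")
```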
Split-half method
The split-half method measures internal reliability:
The researcher divides the measure in half and compares participants' responses to the first half with their responses to the second half
If similar responses are given across both halves, internal reliability is established — the measure is consistent within itself (see the sketch after this list)
Used to assess the internal consistency of surveys and psychological scales
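A minimal sketch of the split-half comparison, again assuming hypothetical item-level responses and NumPy. One common convention (an assumption here, not specified in this note) is to split the measure into odd-numbered and even-numbered items:

```python
import numpy as np

# Hypothetical responses: 4 participants x 10 questionnaire items,
# each item scored 1-5 (illustrative data only)
responses = np.array([
    [4, 5, 4, 4, 5, 3, 4, 5, 4, 4],
    [2, 1, 2, 3, 2, 2, 1, 2, 2, 3],
    [3, 3, 4, 3, 3, 4, 3, 3, 4, 3],
    [5, 4, 5, 5, 4, 5, 5, 4, 5, 5],
])

# Split the items into two halves (odd- vs even-numbered items)
# and total each participant's score on each half
half_a = responses[:, 0::2].sum(axis=1)
half_b = responses[:, 1::2].sum(axis=1)

# A strong positive correlation between the halves suggests
# the measure is internally consistent
r = np.corrcoef(half_a, half_b)[0, 1]
print(f"Split-half correlation: r = {r:.2f}")
```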
Inter-rater reliability
Inter-rater reliability measures the level of consistency between two or more trained observers who independently observe and record the same behavior or event
How it is established:
All observers agree on the behavioral categories and how they will be recorded before the observation begins
Each observer conducts the observation independently to avoid one influencing the other
After the observation, the two independent data sets are compared
A correlation is calculated between the two sets of scores — a strong positive correlation indicates good inter-rater reliability (see the sketch after this list)
If inter-rater reliability is low, behavioral categories are reviewed and refined before the observation is repeated
Good inter-rater reliability reduces the risk that researcher bias has distorted the findings
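A minimal sketch of the correlation step described above, assuming hypothetical tally counts from two observers and Python with NumPy:

```python
import numpy as np

# Hypothetical tallies from two observers who independently recorded
# the same playground observation across eight time intervals
observer_1 = np.array([3, 0, 2, 5, 1, 4, 2, 3])
observer_2 = np.array([3, 1, 2, 4, 1, 4, 2, 3])

# A strong positive correlation between the two independent data sets
# indicates good inter-rater reliability
r = np.corrcoef(observer_1, observer_2)[0, 1]
print(f"Inter-rater correlation: r = {r:.2f}")
```

If the correlation came out low, the behavioral categories would be reviewed and refined before repeating the observation, as described above.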
Improving reliability
If reliability is measured and found to be low, the researcher must take steps to improve it before the study is conducted or repeated
The appropriate improvement strategy depends on the research method being used
Lab and field experiments
Ensure all aspects of the procedure are fully standardized
Same instructions, same environment, same materials, same timing across all conditions
Ensure the IV and DV are clearly operationally defined so the study can be precisely replicated
Observational studies
Ensure behavioral categories are clearly operationally defined and measure only directly observable behavior
Ensure behavioral categories are mutually exclusive with no overlap or ambiguity
Use more than one observer and establish inter-rater reliability before the main observation begins
Surveys
Run the test-retest method and revise or remove any questions that produce inconsistent scores across sessions
Replace ambiguous open questions with clearly worded closed questions or Likert scale items that are less open to interpretation
Interviews
Use the same interviewer across all participants to reduce variability in delivery
Ensure interviewers are trained and follow a consistent approach
Remove leading questions, double-barreled questions, and ambiguous wording from the interview schedule
Reliability & the evolution of scientific conclusions
Reliability is fundamental to how psychological conclusions evolve through peer review and replication:
When a study is submitted for peer review, other experts in the field evaluate whether the methodology is sufficiently reliable to support the conclusions drawn
If a study cannot be replicated or produces inconsistent results, its findings will be challenged or rejected during peer review
When multiple independent replications of a study produce consistent findings, confidence in those conclusions increases — this is how psychological knowledge is built and refined over time
Unreliable findings, even if statistically significant, cannot contribute meaningfully to the scientific evidence base, because they cannot be consistently reproduced
Examiner Tips and Tricks
Ensure that you understand these key points:
Reliability and validity are not the same thing — a measure can be reliable without being valid
E.g. a bathroom scale that consistently overestimates weight by 5 pounds is reliable but not valid
A study does not need to produce identical results to be considered reliable — some variation is expected
Reliability requires that results are similar, not identical, across replications
Inter-rater reliability does not guarantee validity — two observers can consistently agree on what they are recording while still recording the wrong thing if the behavioral categories are poorly designed