Observational Design (College Board AP® Psychology): Study Guide

Written by: Raj Bonsor

Reviewed by: Claire Neeson

Structured & unstructured observations

Structured observation

  • A structured observation is used when the researcher wants to observe specific, predetermined behaviors in a large sample or busy environment where many different behaviors are likely to occur

    • Rather than recording everything that happens, the researcher focuses only on a limited set of clearly defined behaviors of interest

  • The emphasis in structured observation is on gathering quantitative data, e.g.

    • The number of times a child displays aggressive behavior toward a peer during a 30-minute recess period

    • The frequency with which drivers stop at a crosswalk when a pedestrian is waiting

    • The number of times a student raises their hand to answer a question during a one-hour class

Evaluation of structured observations

Strengths

  • Quantitative data can be easily analyzed, presented graphically, and converted to statistics

    • This is a strength as it allows trends and frequencies of behavior to be identified across large samples

    • This increases the reliability of the results

  • Using predetermined behavioral categories keeps the researcher focused

    • They can disregard behaviors that fall outside the categories, ensuring that what is being recorded is directly relevant to the research aim

Limitations

  • Quantitative data reveals what behavior occurred but not why

    • This means that structured observations lack explanatory power as they produce findings that are limited in depth and insight

  • Predetermined categories mean the researcher cannot record behaviors that fall outside them, even if those behaviors are interesting and relevant

    • This limits the usefulness and validity of structured observations

Unstructured observation

  • An unstructured observation is used when the researcher wants to observe the full range of behaviors occurring in a small sample or more intimate setting where interpersonal interaction is the focus

    • The researcher does not use predetermined behavioral categories — instead they record everything that occurs during the observation session

  • The emphasis in unstructured observation is on gathering qualitative data, e.g.

    • verbal and non-verbal communication between participants

    • the quality and tone of conversation (e.g. light-hearted, serious, aggressive)

    • how participants use and move within the environment

  • Examples of research scenarios suited to unstructured observation:

    • Observing how young children interact when playing with gender-stereotyped toys

    • Observing how couples communicate when discussing a source of conflict

Evaluation of unstructured observations

Strengths

  • Unstructured observations produce rich, detailed, in-depth qualitative data that captures the complexity of behavior

    • This is high in ecological validity as it reflects the genuine, unfiltered experience of the participants

  • The flexible, open-ended nature of unstructured observation allows the researcher to follow unexpected or particularly significant behaviors as they emerge

    • This can generate new insights and research questions

Limitations

  • The highly subjective nature of unstructured observations increases the risk that the researcher loses objectivity

    • They may become too close to the participants, succumb to confirmation bias, or unconsciously overlook behaviors that do not align with their expectations

    • This reduces the reliability of the findings

  • Analyzing the data from unstructured observations is time-consuming and depends heavily on the researcher's interpretation

    • This introduces subjectivity into the findings and reduces the validity of the published conclusions

Examiner Tips and Tricks

Structured observation is not the same as a controlled observation. Structured refers to the use of predetermined behavioral categories to record data, whereas controlled refers to the level of control the researcher exerts over the setting and procedure.

Behavioral categories & inter-rater reliability

Behavioral categories

  • Behavioral categories are used in structured observations to define and record the specific behaviors the researcher is interested in

  • Behavioral categories must:

    • only include directly observable behaviors — nothing that requires inference or interpretation

    • be clearly operationally defined so that there is no ambiguity about what counts as an instance of that behavior

    • be mutually exclusive — each behavior should fit into only one category

  • Examples of well-operationalized behavioral categories in a study on aggression include:

    • "Physical aggression" = punching, kicking, or shoving another person

    • "Verbal aggression" = shouting, name-calling, or threatening another person

    • "Non-aggressive behavior" = smiling, sharing, or cooperating with another person

  • Behavioral categories can be further subdivided for greater precision, e.g.

    • Physical aggression directed toward a peer of the same gender

    • Physical aggression directed toward a peer of a different gender
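
As a minimal sketch of how the aggression categories above might be applied when coding data, the Python snippet below maps each directly observable action to exactly one category and tallies a short list of observations. The `CATEGORIES` dictionary, the function names, and the sample observations are hypothetical and invented purely for illustration.

```python
from collections import Counter

# Hypothetical coding scheme based on the aggression categories above.
CATEGORIES = {
    "Physical aggression": {"punching", "kicking", "shoving"},
    "Verbal aggression": {"shouting", "name-calling", "threatening"},
    "Non-aggressive behavior": {"smiling", "sharing", "cooperating"},
}


def check_mutually_exclusive(categories):
    """Return True if no action appears in more than one category."""
    seen = set()
    for actions in categories.values():
        if seen & actions:
            return False
        seen |= actions
    return True


def code_behavior(action, categories=CATEGORIES):
    """Return the single category an observed action belongs to, or None."""
    for name, actions in categories.items():
        if action in actions:
            return name
    return None  # the action falls outside every predetermined category


def tally(observed_actions):
    """Count occurrences per category; uncategorized actions are skipped."""
    counts = Counter()
    for action in observed_actions:
        category = code_behavior(action)
        if category is not None:
            counts[category] += 1
    return counts


# Invented observations from one recess period.
observed = ["shoving", "sharing", "shouting", "kicking", "daydreaming", "smiling"]
counts = tally(observed)
print(check_mutually_exclusive(CATEGORIES))
print(dict(counts))
```

In this sketch "daydreaming" is silently dropped because it fits no category, which is the restrictiveness that structured observation is criticized for.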

Inter-rater reliability

  • Even when behavioral categories are clearly defined, observations can still be affected by researcher bias

    • Different observers may interpret the same behavior differently

  • Inter-rater reliability is the level of consistency between two or more trained observers recording the same observation independently

  • Inter-rater reliability is established in the following ways:

    • All observers agree on the behavioral categories and how they will be recorded before the observation begins

    • Each observer conducts the observation independently to avoid one observer influencing another

    • After the observation, the two independent data sets are compared

    • A correlation is calculated between the two sets of scores — a strong positive correlation indicates good inter-rater reliability

    • If inter-rater reliability is low, the behavioral categories are reviewed and refined before the observation is conducted again

  • Establishing good inter-rater reliability reduces the risk that researcher bias has distorted the findings and increases confidence in the reliability of the conclusions
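
The comparison step above can be sketched in Python. The snippet below computes a Pearson correlation between two observers' tallies; the source does not say which correlation coefficient is used, so Pearson's r is an assumption here, and the tallies are invented example data rather than real findings.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list has no variation")
    return cov / math.sqrt(var_x * var_y)


# Invented tallies: aggressive acts each observer recorded
# independently for the same six children.
observer_a = [4, 0, 7, 2, 5, 1]
observer_b = [5, 0, 6, 2, 5, 2]

r = pearson_r(observer_a, observer_b)
print(round(r, 2))
```

A value of r close to +1 means the two observers' counts rise and fall together, which is what a strong positive correlation between the data sets looks like.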

Evaluation of behavioral categories and inter-rater reliability

Strengths

  • The use of clearly defined, unambiguous behavioral categories allows the researcher to record behavior more objectively

    • Reducing subjectivity brings the observation more closely in line with the scientific method

  • Checking inter-rater reliability shows whether the findings are consistent across observers

    • This strengthens the reliability of the data and reduces the likelihood that the findings will be challenged during peer review

Limitations

  • Predetermined behavioral categories may be too restrictive

    • If behaviors occur during the observation that do not fit any of the categories, they cannot be recorded

    • This means the findings may not accurately represent what actually occurred, reducing validity

  • Inter-rater reliability does not account for the possibility that observers simply guessed when scoring ambiguous behaviors

    • High agreement between observers does not necessarily mean the categories are being applied correctly

    • This means that a high level of agreement can overestimate the true reliability of the observation

Event sampling & time sampling

  • It can be difficult to observe and record all behaviors continuously throughout an observation session

  • Therefore, researchers use sampling procedures to organize data collection

  • These include:

    • event sampling

    • time sampling

Event sampling

  • The researcher records every time a behavior from a specific behavioral category occurs throughout the entire observation session, e.g.

    • recording every instance of physical aggression during a 60-minute recess period

    • tallying every time a driver uses their phone while waiting at a traffic light
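
As a rough illustration of event sampling, the Python sketch below picks out every instance of one target behavior from an observation log covering a whole session. The log entries and the `event_sample` function are hypothetical, made up only to show the idea.

```python
# Hypothetical observation log: (minutes since start, behavior observed).
log = [
    (3, "physical aggression"),
    (11, "sharing"),
    (12, "physical aggression"),
    (27, "verbal aggression"),
    (41, "physical aggression"),
    (58, "sharing"),
]


def event_sample(log, target, session_minutes):
    """Return the times at which the target behavior occurred in the session."""
    return [
        t for t, behavior in log
        if behavior == target and 0 <= t < session_minutes
    ]


times = event_sample(log, "physical aggression", session_minutes=60)
print(len(times), times)
```

Every occurrence of the target behavior is counted regardless of when it happens, while behaviors outside the target category are ignored.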

Time sampling

  • The researcher records all behaviors that occur during a set time interval at regular points throughout the observation session, e.g.

    • recording all behaviors for 20 seconds every 10 minutes across a 2-hour observation

    • recording all behaviors for 15 minutes every 3 hours across a two-day observation

  • The researcher determines which time interval is most appropriate for the specific study
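
The first schedule above (20 seconds every 10 minutes across a 2-hour observation) can be worked through in a short Python sketch. It assumes the first window opens at the very start of the session, which the source does not specify, and the log entries and function names are hypothetical.

```python
def time_windows(session_seconds, every_seconds, window_seconds):
    """Return (start, end) pairs for windows opening at regular intervals.

    Assumes the first window opens at time 0; a real study might differ.
    """
    return [
        (start, start + window_seconds)
        for start in range(0, session_seconds, every_seconds)
        if start + window_seconds <= session_seconds
    ]


def time_sample(log, windows):
    """Keep only the log entries whose timestamp falls inside a window."""
    return [
        (t, behavior)
        for t, behavior in log
        if any(start <= t < end for start, end in windows)
    ]


SESSION = 2 * 60 * 60  # 2 hours in seconds
windows = time_windows(SESSION, every_seconds=10 * 60, window_seconds=20)
observed_seconds = sum(end - start for start, end in windows)
coverage = observed_seconds / SESSION

# Hypothetical log: (seconds since start, behavior observed).
log = [(5, "talking"), (130, "shoving"), (610, "sharing"), (1900, "shouting")]
recorded = time_sample(log, windows)

print(len(windows), observed_seconds, round(coverage * 100, 1))
print(recorded)
```

Under these assumptions there are 12 windows totaling 240 seconds, roughly 3.3% of the session, so the "shoving" and "shouting" entries in the invented log fall between windows and go unrecorded.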

Evaluation of event sampling & time sampling

Strengths

  • Event sampling ensures that specific behaviors will not be missed or overlooked

    • Every instance is recorded as it occurs, producing a complete and accurate frequency count

  • Time sampling gives the researcher flexibility to record any behaviors that occur within the time window

    • It also offers researchers the opportunity to record unexpected behaviors which may generate new research questions

Limitations

  • If too many target behaviors occur simultaneously or are particularly complex, event sampling may fail to capture all of them accurately

    • This limits the validity of the method as it would not provide a true reflection of what occurred during the observation session

  • Time sampling may miss any behaviors that occur outside of the designated time windows

    • Some behaviors may be over- or underrepresented in the findings, which limits the validity of the conclusions drawn

Examiner Tips and Tricks

Behavioral categories must describe only what can be directly seen and measured: if you cannot observe it, you cannot record it. For example, a category like "anxious" is not acceptable, but "fidgets with hands" or "avoids eye contact" is.

Be able to distinguish between event sampling and time sampling. Event sampling records every instance of a target behavior, whereas time sampling records all behaviors within a set time window at regular intervals. Confusing the two is a common and easily avoidable mistake.

Author: Raj Bonsor

Expertise: Psychology & Sociology Content Creator

Raj joined Save My Exams in 2024 as a Senior Content Creator for Psychology & Sociology. Prior to this, she spent fifteen years in the classroom, teaching hundreds of GCSE and A Level students. She has experience as Subject Leader for Psychology and Sociology, and her favourite topics to teach are research methods (especially inferential statistics!) and attachment. She has also successfully taught a number of Level 3 subjects, including criminology, health & social care, and citizenship.

Reviewer: Claire Neeson

Expertise: Psychology Content Creator

Claire has been teaching for 34 years, in the UK and overseas. She has taught GCSE, A-level and IB Psychology which has been a lot of fun and extremely exhausting! Claire is now a freelance Psychology teacher and content creator, producing textbooks, revision notes and (hopefully) exciting and interactive teaching materials for use in the classroom and for exam prep. Her passion (apart from Psychology of course) is roller skating and when she is not working (or watching 'Coronation Street') she can be found busting some impressive moves on her local roller rink.