Observational Design (College Board AP® Psychology): Revision Note
Structured & unstructured observations
Structured observation
A structured observation is used when the researcher wants to observe specific, predetermined behaviors in a large sample or busy environment where many different behaviors are likely to occur
Rather than recording everything that happens, the researcher focuses only on a limited set of clearly defined behaviors of interest
The emphasis in structured observation is on gathering quantitative data, e.g.
The number of times a child displays aggressive behavior toward a peer during a 30-minute recess period
The frequency with which drivers stop at a crosswalk when a pedestrian is waiting
The number of times a student raises their hand to answer a question during a one-hour class
Evaluation of structured observations
Strengths
Quantitative data can be easily analyzed, presented graphically, and converted to statistics
This is a strength as it allows trends and frequencies of behavior to be identified across large samples
This increases the reliability of the results
Using predetermined behavioral categories keeps the researcher focused
They can disregard behaviors that fall outside the categories, ensuring that what is being recorded is directly relevant to the research aim
Limitations
Quantitative data reveals what behavior occurred but not why
This means that structured observations lack explanatory power as they produce findings that are limited in depth and insight
Predetermined categories mean the researcher cannot record behaviors that fall outside them, even if those behaviors are interesting and relevant
This limits the usefulness and validity of structured observations
Unstructured observation
An unstructured observation is used when the researcher wants to observe the full range of behaviors occurring in a small sample or more intimate setting where interpersonal interaction is the focus
The researcher does not use predetermined behavioral categories — instead they record everything that occurs during the observation session
The emphasis in unstructured observation is on gathering qualitative data, e.g.
verbal and non-verbal communication between participants
the quality and tone of conversation (e.g. light-hearted, serious, aggressive)
how participants use and move within the environment
Examples of research scenarios suited to unstructured observation:
Observing how young children interact when playing with gender-stereotyped toys
Observing how couples communicate when discussing a source of conflict
Evaluation of unstructured observations
Strengths
Unstructured observations produce rich, detailed, in-depth qualitative data that captures the complexity of behavior
Such data tends to be high in ecological validity because it reflects the genuine, unfiltered experience of the participants
The flexible, open-ended nature of unstructured observation allows the researcher to follow unexpected or particularly significant behaviors as they emerge
This can generate new insights and research questions
Limitations
The highly subjective nature of unstructured observations increases the risk that the researcher loses objectivity
They may become too close to the participants, succumb to confirmation bias, or unconsciously overlook behaviors that do not align with their expectations
This reduces the reliability of the findings
Analyzing the data from unstructured observations is time-consuming and depends heavily on the researcher's interpretation
This introduces subjectivity into the findings and reduces the validity of the published conclusions
Examiner Tips and Tricks
Structured observation is not the same as a controlled observation. Structured refers to the use of predetermined behavioral categories to record data, whereas controlled refers to the level of control the researcher exerts over the setting and procedure.
Behavioral categories & inter-rater reliability
Behavioral categories
Behavioral categories are used in structured observations to define and record the specific behaviors the researcher is interested in
Behavioral categories must:
only include directly observable behaviors — nothing that requires inference or interpretation
be clearly operationally defined so that there is no ambiguity about what counts as an instance of that behavior
be mutually exclusive — each behavior should fit into only one category
Examples of well-operationalized behavioral categories in a study on aggression include:
"Physical aggression" = punching, kicking, or shoving another person
"Verbal aggression" = shouting, name-calling, or threatening another person
"Non-aggressive behavior" = smiling, sharing, or cooperating with another person
Behavioral categories can be further subdivided for greater precision, e.g.
Physical aggression directed toward a peer of the same gender
Physical aggression directed toward a peer of a different gender
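The idea of mutually exclusive, directly observable categories can be sketched in a few lines of Python. This is only an illustration: the behavior codes, the `CODING_SCHEME` mapping, the `tally` helper, and the observed sequence are all hypothetical, loosely based on the aggression example above.

```python
from collections import Counter

# Hypothetical coding scheme: each observable behavior maps to exactly one
# category, which is what keeps the categories mutually exclusive.
CODING_SCHEME = {
    "punch": "physical aggression",
    "kick": "physical aggression",
    "shove": "physical aggression",
    "shout": "verbal aggression",
    "name-call": "verbal aggression",
    "threaten": "verbal aggression",
    "smile": "non-aggressive behavior",
    "share": "non-aggressive behavior",
    "cooperate": "non-aggressive behavior",
}

def tally(observed_behaviors):
    """Count behaviors per category; behaviors outside the scheme are not recorded."""
    return Counter(
        CODING_SCHEME[b] for b in observed_behaviors if b in CODING_SCHEME
    )

# Hypothetical sequence noted during one recess period.
# "cry" is not in the scheme, so it is silently left out of the counts.
observed = ["shove", "smile", "shout", "kick", "share", "cry", "shove"]
counts = tally(observed)

for category, n in counts.items():
    print(f"{category}: {n}")
```

Note that the behavior outside the scheme simply disappears from the data, which is the restrictiveness that predetermined categories are often criticized for.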
Inter-rater reliability
Even when behavioral categories are clearly defined, observations can still be affected by researcher bias
Different observers may interpret the same behavior differently
Inter-rater reliability is the level of consistency between two or more trained observers recording the same observation independently
Inter-rater reliability is established in the following ways:
All observers agree on the behavioral categories and how they will be recorded before the observation begins
Each observer conducts the observation independently to avoid one observer influencing another
After the observation, the observers' independent data sets are compared
A correlation is calculated between the two sets of scores — a strong positive correlation indicates good inter-rater reliability
If inter-rater reliability is low, the behavioral categories are reviewed and refined before the observation is conducted again
Establishing good inter-rater reliability reduces the risk that researcher bias has distorted the findings and increases confidence in the reliability of the conclusions
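As a rough illustration of the correlation step, the sketch below computes a Pearson correlation coefficient between two observers' tallies. The observer data is invented for the example, and `pearson_r` is a hypothetical helper written here for clarity, not a prescribed method. Values closer to +1 indicate stronger agreement between the observers.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one set of scores never varies")
    return cov / sqrt(var_x * var_y)

# Hypothetical tallies: instances of physical aggression recorded for each
# of six children by two observers working independently.
observer_a = [4, 0, 2, 7, 1, 3]
observer_b = [5, 0, 2, 6, 1, 3]

r = pearson_r(observer_a, observer_b)
print(f"Inter-rater correlation: r = {r:.2f}")
```

Here the two observers disagree by one instance for two of the children, so the coefficient is high but not a perfect +1.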
Evaluation of behavioral categories and inter-rater reliability
Strengths
The use of clearly defined, unambiguous behavioral categories allows the researcher to record behavior objectively
Reducing subjectivity moves the process closer to the scientific method
Inter-rater reliability ensures that the findings are consistent across observers
This strengthens the reliability of the data and reduces the likelihood that the findings will be challenged during peer review
Limitations
Predetermined behavioral categories may be too restrictive
If behaviors occur during the observation that do not fit any of the categories, they cannot be recorded
This means the findings may not accurately represent what actually occurred, reducing validity
Inter-rater reliability does not account for the possibility that observers simply guessed when scoring ambiguous behaviors
High agreement between observers does not necessarily mean the categories are being applied correctly
As a result, a high level of agreement may overestimate the true reliability of the observation
Event sampling & time sampling
It can be difficult to observe and record all behaviors continuously throughout an observation session
Therefore researchers use sampling procedures to organize data collection
These include:
event sampling
time sampling
Event sampling
The researcher records every time a behavior from a specific behavioral category occurs throughout the entire observation session, e.g.
recording every instance of physical aggression during a 60-minute recess period
tallying every time a driver uses their phone while waiting at a traffic light
Time sampling
The researcher records all behaviors that occur during a set time interval at regular points throughout the observation session, e.g.
recording all behaviors for 20 seconds every 10 minutes across a 2-hour observation
recording all behaviors for 15 minutes every 3 hours across a two-day observation
The researcher determines which time interval is most appropriate for the specific study
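The difference between the two sampling procedures can be seen by applying both to the same record of behavior. The sketch below is a minimal illustration: the timestamped log, the behavior codes, and the `event_sample` and `time_sample` helpers are all hypothetical. The time sample uses a 20-second window every 10 minutes, as in the example above.

```python
# Hypothetical observation log: (seconds since start, behavior code)
log = [
    (5, "shove"), (12, "share"), (130, "shout"), (610, "shove"),
    (615, "smile"), (900, "kick"), (1205, "share"), (1210, "shove"),
]

def event_sample(log, target_behaviors):
    """Event sampling: keep every instance of the target behaviors
    across the whole session."""
    return [(t, b) for t, b in log if b in target_behaviors]

def time_sample(log, window_length, interval):
    """Time sampling: keep all behaviors that fall inside a window of
    window_length seconds at the start of each interval."""
    return [(t, b) for t, b in log if t % interval < window_length]

# Target category for event sampling: physical aggression only.
physical_aggression = {"shove", "kick"}

events = event_sample(log, physical_aggression)
windows = time_sample(log, window_length=20, interval=600)

print("Event sampling:", events)
print("Time sampling: ", windows)
```

In this made-up log, the kick at 900 seconds is captured by event sampling but falls outside every time window, while the smile and the sharing are captured by time sampling but ignored by event sampling because they are not target behaviors.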
Evaluation of event sampling & time sampling
Strengths
Event sampling ensures that specific behaviors will not be missed or overlooked
Every instance is recorded as it occurs, producing a complete and accurate frequency count
Time sampling gives the researcher flexibility to record any behaviors that occur within the time window
It also offers researchers the opportunity to record unexpected behaviors which may generate new research questions
Limitations
If too many target behaviors occur simultaneously or are particularly complex, event sampling may fail to capture all of them accurately
This limits the validity of the method as it would not provide a true reflection of what occurred during the observation session
Time sampling may miss any behaviors that occur outside of the designated time windows
Some behaviors may be over- or underrepresented in the findings, which limits the validity of the conclusions drawn
Examiner Tips and Tricks
Behavioral categories must describe only what can be directly seen and measured — if you cannot observe it, you cannot record it. E.g. a category like "anxious" is not acceptable, but "fidgets with hands" or "avoids eye contact" is.
Be able to distinguish between event sampling and time sampling. Event sampling records every instance of a target behavior, whereas time sampling records all behaviors within a set time window at regular intervals. Confusing the two is a common and easily avoidable mistake.