A Review of Psych 130-Research Methods: Research Designs in Psychology


CONTENTS:

Introduction

Questions and Hypotheses

Experimental Research

Quasi-experimental Research

Non-experimental Research

Behavioral Variables

Non-experimental Studies Revisited

Data Forms for Observational Studies

Evaluating Research

Experimental Design Tree

 

Introduction

 

    Psychology is the discipline concerned with gathering knowledge of behavioral and cognitive processes, and with the application of that knowledge in the solution of practical problems. It is distinguished from the related disciplines of anthropology, economics, political science, and sociology in that 1)it seeks to understand the behavior and cognitive processes of all species, not just our own, 2)it emphasizes understanding the individual and 3)it makes more extensive use of scientific methodology, especially experimental methods, than the other disciplines. Psychology is strongly identified with its applications in areas such as child rearing, counseling, education, and psychotherapy, but it is important to realize that these applications must be based upon a solid body of knowledge if they are to be successful. It is the gathering of psychological knowledge with which this material is concerned.

     There are many ways to gather knowledge. The artist, humanist, philosopher, novelist, and theologian have each, in their own way, contributed to our understanding of the universe. Here, however, we will consider only the scientific approach to knowledge. Scientists make several assumptions that guide their search for knowledge, and the two most important ones are determinism and empiricism.

     Determinism is the assumption that the universe and all of the events that occur within it are orderly and lawful. Science seeks to discover that order, and to understand the laws of nature. The principle of determinism also asserts that all natural events have natural causes. Scientists ultimately seek the causes of the phenomena they study.

     Empiricism is the assumption that the best (some would argue the only) source of knowledge is observation. Scientific observations are said to be empirical, which means that they are based on sensory data. Of course, scientists often use instruments that extend their senses. Scientific observations are also public, in the sense that anyone ought to be able to verify them. To be accepted by the scientific community, observations must also be reliable. This means that they must be repeatable, and therefore trustworthy. The assumption of empiricism implies that science must base its search for the orderliness of nature, and for the causes of natural events, on objective observations. Empirical methods are at the heart of the scientific approach, and distinguish it from other ways of gaining knowledge. They also limit the scope of science to natural events that can somehow be observed.

     The process of gathering knowledge is called research. Scientific research is based upon the principles of determinism and empiricism, and the methods used by science reflect these fundamental assumptions. In this way, science is different from other approaches to knowledge. The goal of behavioral research is to describe, explain, and predict behavior and cognitive processes. To the scientist, understanding implies a knowledge of causes. The attempt to discover causes is therefore an important part of science.

   The most direct product of scientific research is the discovery of facts. A scientific fact is a reliable empirical observation. Research can provide three kinds of scientific facts. A descriptive fact is an empirical description of some event or phenomenon, and often serves to define the phenomenon. Descriptive facts make up the most basic data of science. They are the events and phenomena we want to explain. A correlational fact is the observation that two or more events or phenomena are reliably related. The observation that parents who abuse their children were often abused themselves when they were children is a correlational fact. A functional fact is the observation that one event is the cause of another. Cause-effect relationships are difficult to discover.

     Notice that correlational facts are also descriptive, but not all descriptive facts are correlational. Likewise, causal facts are correlational, but not all correlational facts are causal. This is a source of considerable confusion. The fact that two events are correlated does not imply that one is the cause of the other. On the other hand, if two events are not correlated, then one cannot be the cause of the other! One of the greatest errors that a scientist or non-scientist can commit is confusing correlation with causation. Although causes must be correlated with their effects, correlation between two events does not imply that one is the cause of the other. This problem will be explored later.

     Correlational and functional facts are sometimes called empirical laws, because they represent the orderliness of nature that science seeks to discover.

     Science seeks not only facts, but also explanations of those facts. In science, explanations are provided by theories. All sciences are theoretical. Recall from our earlier look at theories that they attempt to do four things: 1)they attempt to organize a body of facts into a meaningful system of knowledge; 2)they provide tentative explanations of those facts; 3)they guide research by making predictions on the basis of their explanations; and 4)they advance knowledge and understanding. Good theories make predictions that can be tested for accuracy, which in turn makes it possible to evaluate them. In this way, they advance knowledge and understanding.

   All scientific methodology is based on observation. Scientists can make observations under many different conditions. We may observe the behavior of a wolf under the highly controlled conditions of a laboratory, or under the completely uncontrolled conditions of a natural forest setting.

     The conditions under which observations are made have two major implications. First, they limit the kinds of facts (descriptive, correlational, functional) that can be discovered. Second, they define the different methods of scientific research, i.e. the different research designs. Both of these implications are explored below.

 

Questions and Hypotheses

 

     While the overall goal of scientific research is to describe, predict, and explain the phenomena of interest, the scope of a single research project is more limited. The researcher sets out to answer one or a few specific questions and/or to test one or a few specific hypotheses.

     Research questions correspond to the three kinds of empirical facts. We can ask descriptive, correlational, and causal questions.

     A research hypothesis is a prediction of the outcome of a research study, which is tested for accuracy by the data collected in the study. It is a statement of what one expects to observe when answering a research question. Thus hypotheses can also be descriptive, correlational, or causal.

     Since research designs differ in terms of the kinds of facts that they can reveal, the researcher must select the research design that is appropriate for answering the kind of question posed, or for testing the kind of hypothesis stated.

     Questions and hypotheses about differences are often encountered in research situations, and they can be either correlational or causal. For example, we might ask if male wolves howl more than female wolves, i.e. do males and females differ? This is a correlational question because it is concerned with whether two variables (gender and howling) are related. A hypothesis about this difference would be a correlational one. But a question or hypothesis about why males and females differ in terms of howling would be causal, because it is concerned not only with the difference itself, but also with what causes, and therefore explains, the difference.

 

Experimental Research

 

   In order for a research design to be a true experiment, three things must be involved. First, an independent variable (IV) must be directly manipulated (varied) by the researcher. Second, a dependent variable (DV) must be reliably measured. The goal is to determine if and how the IV affects the DV. Finally, all other variables besides the IV that could influence the DV must be controlled. It is the control of such extraneous variables (EV's) that allows the effect of the IV on the DV to be assessed.

     In an experiment, two or more situations that are identical (all EV's controlled) except for one factor (the IV) are established. The DV is measured in each situation. Any differences in the DV between situations must be due to the only factor that was different in them, the IV. By controlling EV's, alternative explanations of observed differences are eliminated, so that the only possible explanation is in terms of the IV. To the extent that this is successful, the experiment is said to be internally valid. If some EV varies along with the IV, the two are said to be confounded. The effects of the IV cannot be separated from those of the EV. The goal in designing experiments is to insure internal validity by eliminating all possible confounds.

     Experiments thus deal with observed differences, and if internally valid, they allow the researcher to test cause-effect hypotheses. They assess the effect of the IV on the DV. Experiments are therefore able to provide descriptive, correlational, and functional facts.

Experimental Designs

     Experimental designs are classified along three dimensions: 1)the way in which the IV is manipulated, 2)the number of IV's and of their levels involved, and 3)the methods used to control EV's.

     Manipulation of the IV(s). In a between design, different groups of subjects are tested under the different levels of the IV. Each subject is tested once, and only under one level of the IV. The researcher must take steps to insure that the different groups are comparable by controlling EV's that might make them differ, especially those that cause individual differences. In within designs, subjects are not divided into groups. Instead, each subject is tested under every level of the IV. Such designs are therefore often called repeated measures designs. Since different groups are not involved, the need to control EV's that cause individual differences is eliminated. In such designs, each subject serves as his/her own control. But in within designs, a different problem arises. Since subjects are tested repeatedly, it is necessary to insure that this doesn't introduce any EV's that might influence the DV. For example, the researcher must insure that one testing does not influence performance on subsequent testings. Sequence effects (EV's) of this sort might be due to a long-lasting effect of one level of the IV, to the order in which the different levels of the IV are imposed, to the fact that the subject has practiced the behavior being measured (the DV), etc.

     In a between design, the differences between subjects cannot be separated from the differences between the groups. In a within design, however, since each subject is measured two or more times (once under each level of the IV), it is possible to measure differences between subjects apart from differences between levels of the IV. Performance of each subject averaged over all levels of the IV can be compared to that of the other subjects. In a within design, SUBJECTS is treated as an IV whose effects on the DV can be measured.
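
     The separation described above can be sketched in a few lines of Python. This is only an illustration: the four subjects, the three levels of the IV, and all of the scores are invented, and the names used are hypothetical.

```python
# Hypothetical within design: each subject is tested under every level of the IV.
# Keys are subjects; each list holds that subject's score under levels 1, 2, 3.
# All numbers are invented for illustration.
scores = {
    "S1": [10, 12, 14],
    "S2": [20, 22, 24],
    "S3": [15, 17, 19],
    "S4": [11, 13, 15],
}

def mean(values):
    return sum(values) / len(values)

# Differences between subjects: each subject averaged over all levels of the IV.
subject_means = {subject: mean(values) for subject, values in scores.items()}

# Differences between levels of the IV: each level averaged over all subjects.
n_levels = len(next(iter(scores.values())))
level_means = [mean([values[i] for values in scores.values()])
               for i in range(n_levels)]

print("Subject means:", subject_means)
print("Level means:  ", level_means)
```

     In a between design only one score per subject would be available, so the subject means and the level means could not be computed separately in this way.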

     Number of IV's and of Their Levels. The simplest possible experiment involves the manipulation of one IV at two levels. Experiments involving only one IV are called one-way designs. When only two levels of the IV are involved, the experiment is bivalent, and when more than two levels are employed, it is multivalent. Since multivalent experiments employ several levels of the IV, they can give a better, more complete picture of the effects of an IV that can have many levels. For example, to determine the effects of some drug (IV) on anxiety (DV), many dose levels of the drug would be employed in order to find the most effective one.

     Experiments involving more than one IV are called n-way designs, where n refers to the number of IV's used. In the simplest n-way design, two levels of two different IV's are combined (a 2 x 2 design). Actually, any number of levels of any number of IV's could be combined in an n-way experiment. For example, a 2 x 3 x 5 design would involve three IV's (3-way design), one manipulated at two levels, another at three levels, and the third at five levels. As the number of IV's and levels increases, the results of the experiment become more difficult to interpret.

     The major advantage of the n-way design is that it not only allows for the assessment of the effects of each IV independently from the effects of other IV's, but also shows how the different IV's combine to influence the DV. An interaction is present in an n-way design if the effect of one (or more) of the IV's depends upon the level of another IV (or IV's). For example, the effect of alcohol (one IV) on motor coordination (DV) may depend upon the presence of cocaine (another IV) in the subject's blood. Such interactions can be observed only in n-way experiments.
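
     To make the idea of an interaction concrete, here is a small Python sketch based on the alcohol and cocaine example. The cell means are invented, and higher numbers are assumed to mean poorer motor coordination. The sketch only compares sample means; deciding whether an interaction is reliable would require an appropriate statistical test.

```python
# Hypothetical cell means for a 2 x 2 design (all values invented).
# DV: motor coordination errors; higher means poorer coordination.
cell_means = {
    ("no alcohol", "no cocaine"): 5.0,
    ("alcohol", "no cocaine"): 9.0,
    ("no alcohol", "cocaine"): 6.0,
    ("alcohol", "cocaine"): 16.0,
}

def alcohol_effect(cocaine_level):
    """Effect of alcohol on the DV at one level of the cocaine IV."""
    return (cell_means[("alcohol", cocaine_level)]
            - cell_means[("no alcohol", cocaine_level)])

effect_without_cocaine = alcohol_effect("no cocaine")
effect_with_cocaine = alcohol_effect("cocaine")

# If the effect of alcohol were the same at both cocaine levels, this
# difference would be zero; a nonzero value suggests an interaction.
interaction_contrast = effect_with_cocaine - effect_without_cocaine

print("Alcohol effect without cocaine:", effect_without_cocaine)
print("Alcohol effect with cocaine:   ", effect_with_cocaine)
print("Interaction contrast:          ", interaction_contrast)
```

     With these invented numbers, alcohol raises the error score by 4 when cocaine is absent and by 10 when it is present, so the effect of one IV depends on the level of the other.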

     N-way designs are said to be factorial if every level of each IV involved is combined equally often with each level of the other IV (or IV's). Every possible combination of the levels of the IV's is used.
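
     The cells of a factorial design can be listed mechanically. The following Python sketch, with hypothetical factor names and levels, generates every combination of levels for a 2 x 2 design and counts the cells of a 2 x 3 x 5 design like the one mentioned earlier.

```python
from itertools import product

def factorial_cells(levels_per_iv):
    """Return every combination of levels, one dict per cell of the design.

    levels_per_iv maps each IV name to a sequence of its levels.
    """
    names = list(levels_per_iv)
    return [dict(zip(names, combo))
            for combo in product(*levels_per_iv.values())]

# A 2 x 2 design; the level labels are illustrative assumptions.
design_2x2 = {"alcohol": ["none", "some"], "cocaine": ["absent", "present"]}
cells = factorial_cells(design_2x2)
for cell in cells:
    print(cell)

# A 2 x 3 x 5 design with placeholder factors A, B, and C.
design_2x3x5 = {"A": range(2), "B": range(3), "C": range(5)}
n_cells = len(factorial_cells(design_2x3x5))
print("Cells in a 2 x 3 x 5 design:", n_cells)
```

     The number of cells is the product of the numbers of levels, which is one reason large n-way designs quickly become expensive to run and hard to interpret.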

     Since two or more IV's are involved, an n-way experiment could be designed in several ways. Consider the previous example. Both alcohol and cocaine could be manipulated between groups. A different group of subjects would be tested under each different combination of alcohol levels and cocaine levels.

     Or, both alcohol level and cocaine level could be manipulated within. Each subject would be tested under every alcohol level and cocaine level combination. Because many different testing orders might be possible depending upon how many levels of alcohol and cocaine are used, and therefore because sequence effects might be a real problem, the within approach would not be very attractive.

     A third possibility exists. One of the IV's, perhaps alcohol, could be manipulated between groups (a different group tested under each alcohol level), while the other IV, cocaine, could be manipulated within. Thus each subject would be tested under only one level of alcohol, but under every level of cocaine. The resulting design is sometimes called mixed, since it mixes both between and within manipulations. It is also called a nested design (note: nested is the preferred term, since there are other designs that are called mixed). This sort of design is not factorial. Why? Repeated measures are used in this design. Recall that when repeated measures are made, SUBJECTS is treated as an IV. But not all subjects (levels of the IV) are tested under every level of alcohol, since alcohol is manipulated between groups. SUBJECTS is factorial to cocaine, since cocaine is manipulated within subjects, and every subject is tested under every cocaine level. In nested designs SUBJECTS is factorial to the within IV(s), but not to the between IV(s). The nested design is very commonly used, especially in the study of learning, where the effect of repeating a task on several trials is of interest. Some IV, such as reward size, might be manipulated between groups, while every subject is given some number of trials, say 20. Trials is a within variable. This design is sometimes called subjects/groups x trials (read "subjects over groups by trials").
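
     The structure of such a design can be laid out as a simple testing schedule. In the Python sketch below, the subject labels, group sizes, and level names are all hypothetical; the point is only to show that each subject meets one alcohol level but every cocaine level.

```python
def nested_schedule(groups, within_levels):
    """List every (subject, between level, within level) test in the design.

    groups maps each level of the between IV to the subjects assigned to it.
    """
    schedule = []
    for between_level, subjects in groups.items():
        for subject in subjects:
            for within_level in within_levels:
                schedule.append((subject, between_level, within_level))
    return schedule

# Alcohol is manipulated between groups, cocaine within subjects.
groups = {"low alcohol": ["S1", "S2"], "high alcohol": ["S3", "S4"]}
cocaine_levels = ["none", "low", "high"]
schedule = nested_schedule(groups, cocaine_levels)

# Record which levels of each IV every subject is tested under.
alcohol_seen = {}
cocaine_seen = {}
for subject, alcohol, cocaine in schedule:
    alcohol_seen.setdefault(subject, set()).add(alcohol)
    cocaine_seen.setdefault(subject, set()).add(cocaine)

for subject in sorted(alcohol_seen):
    print(subject, sorted(alcohol_seen[subject]), sorted(cocaine_seen[subject]))
```

     Every subject appears with all three cocaine levels but with only one alcohol level, which is why SUBJECTS is factorial to the within IV and not to the between IV.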

     Control of Extraneous Variables. An extraneous variable (EV) is any variable other than an IV, which can influence the DV. EV's are controlled for two reasons. The first and most important reason for controlling EV's is to avoid confounds and to insure internal validity. The second reason is to reduce variability in the DV. Uncontrolled EV's that do not systematically vary with the IV(s) cause unsystematic (random) variability in the DV (sometimes called the disturbance effect of EV's). Such variability is called error variability, and it tends to obscure any systematic variability caused by the IV(s). Differences caused by the IV are harder to detect. Controlling this nonsystematic or error variability makes the experiment more sensitive to the effects of the IV(s). There are three kinds of variables that can be EV's: subject, situational, and sequence.

     Subject variables refer to the characteristics or conditions of the subjects (e.g. age, gender, IQ, etc.) that cause individuals to differ when the DV is measured. Whenever a between manipulation is used, potential confounds due to subject variables are a source of concern. The problem is to generate groups of subjects that do not differ systematically, and hence that are comparable. If the groups differ on the DV before the IV is manipulated because of subject EV's, then it is impossible to assess the effects of the IV. The most common way to control for subject EV's is to randomly assign subjects to groups. The logic of this approach is that the subject EV's will be evenly distributed among the groups. Experiments in which this technique is employed are called randomized groups designs. An alternative is to measure each subject on the EV of concern, and then to match the groups in terms of that EV. For example, the IQ of every subject could be measured, and groups that do not differ in terms of their mean IQ could be formed. If three groups were involved, the three subjects having the highest IQ scores would be randomly assigned, one to each group. They would form a set of subjects matched on IQ. A matched set of subjects is called a block. Subjects within a block are randomly assigned to groups in order to control for other subject EV's that have not been measured. The assignment procedure is repeated until all of the subjects have been placed in groups. Experiments in which this technique is employed are called matched or randomized blocks designs. In such a design BLOCKS is treated as an IV in the same way as SUBJECTS in a within design. The randomized blocks design is somewhat intermediate between randomized groups and within designs. The groups in a randomized blocks design are more similar than they would be in a randomized groups design (they are said to be nonindependent or correlated groups), and the problem of sequence EV's that arises in the within design is avoided. Of course, another way to solve the problem of subject EV's is to use a within design. In a within design, since every subject is tested under every level of the IV(s), subject variables cannot possibly be confounded with the IV(s), and error variability is automatically reduced. The within design is, in a sense, the ultimate matched or blocked design!
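
     The two assignment procedures can be sketched in Python as follows. The subject labels and IQ scores are invented, the function names are hypothetical, and the blocking function assumes that the number of subjects is a multiple of the number of groups. A fixed seed is used only so that the example is repeatable.

```python
import random

def randomized_groups(subjects, n_groups, rng):
    """Randomized groups: shuffle the subjects, then deal them into groups."""
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    return [shuffled[i::n_groups] for i in range(n_groups)]

def randomized_blocks(ev_scores, n_groups, rng):
    """Randomized blocks: match on a measured EV, randomize within blocks.

    ev_scores maps each subject to the measured EV (IQ in this example).
    Assumes len(ev_scores) is a multiple of n_groups.
    """
    ranked = sorted(ev_scores, key=ev_scores.get, reverse=True)
    groups = [[] for _ in range(n_groups)]
    for start in range(0, len(ranked), n_groups):
        block = ranked[start:start + n_groups]  # a set of matched subjects
        rng.shuffle(block)                      # random assignment within the block
        for group, subject in zip(groups, block):
            group.append(subject)
    return groups

def mean_ev(group, ev_scores):
    return sum(ev_scores[s] for s in group) / len(group)

rng = random.Random(0)  # fixed seed for a repeatable illustration
iq = {"S1": 130, "S2": 128, "S3": 127, "S4": 110, "S5": 109,
      "S6": 108, "S7": 95, "S8": 94, "S9": 92}

simple = randomized_groups(iq, 3, rng)
blocked = randomized_blocks(iq, 3, rng)

print("Randomized groups:", simple, [mean_ev(g, iq) for g in simple])
print("Randomized blocks:", blocked, [mean_ev(g, iq) for g in blocked])
```

     In the blocked version each group receives exactly one subject from each block, so the group means on the matched EV cannot drift far apart. Simple random assignment gives no such guarantee in a small sample.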

     Situational variables are variables related to the environment in which the experiment is conducted. Background noise, temperature, illumination, etc. are examples. These must be controlled in order to avoid confounds and to reduce error variability. The typical way in which situational EV's are controlled is to hold them constant for all subjects. When this is not possible, then randomization and matching procedures can be employed to insure that situational EV's do not systematically vary with the IV(s). Actually, holding them constant is equivalent to matching, but only one level of the EV is involved. Situational variables must be controlled in all experimental designs.

     Sequence variables can be a problem whenever a within manipulation is employed. Sequence effects were already described. There is no way to control sequence EV's that always works. One approach, counterbalancing, involves testing different subjects under every possible testing order. If there are two levels of the IV (1 and 2), then there are two possible testing orders: 1 first, then 2; 2 first, then 1. As the number of levels increases, the number of possible testing orders does so dramatically. With three levels, there are six possible orders, and so on. In counterbalancing, a different group of subjects is tested under each order, so the number of groups required can become quite large. With a large number of IV's and/or levels, incomplete counterbalancing is sometimes used. Instead of using all possible orders, a set of orders is selected so that each level of the IV(s) occurs equally often in each ordinal position. The result is a Latin square (or Greco-Latin square when there is more than one IV) design. Whenever counterbalancing is used, the experiment should be designed in such a way as to permit TESTING ORDER to be treated as an IV, so that any effect of testing order that might occur can be detected. If an order effect is present, it may be difficult to assess the effects of the other IV(s). Order of testing could be randomized, but if this is done, there is no way to even determine if order had an effect. In situations where sequence EV's are likely to be a problem, a between design is a better choice!
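
     Both forms of counterbalancing can be generated mechanically. The Python sketch below lists all testing orders for complete counterbalancing and builds one simple Latin square by cyclic rotation. The cyclic construction is an assumption chosen for brevity; it is only one of many possible Latin squares, and the function names are hypothetical.

```python
from itertools import permutations

def complete_counterbalancing(levels):
    """Every possible testing order of the given levels."""
    return list(permutations(levels))

def cyclic_latin_square(levels):
    """One Latin square: each row is the previous row rotated by one position.

    Each level occurs exactly once in each row (testing order) and
    exactly once in each column (ordinal position).
    """
    n = len(levels)
    return [[levels[(row + col) % n] for col in range(n)] for row in range(n)]

orders_2 = complete_counterbalancing([1, 2])
orders_3 = complete_counterbalancing([1, 2, 3])
print("Orders for two levels:  ", orders_2)
print("Orders for three levels:", len(orders_3))
print("Orders for five levels: ", len(complete_counterbalancing(range(5))))

square = cyclic_latin_square(["A", "B", "C", "D"])
for order in square:
    print(order)
```

     Note that in a cyclic square a given level is always followed by the same other level, so ordinal position is balanced but the immediate sequence of levels is not. Other Latin squares can be chosen when that matters.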

     Single-Subject Designs. All of the designs described thus far employ groups of subjects. Designs involving single subjects were suggested by B.F. Skinner, who argued that the use of statistics to describe the performance of groups of subjects obscures the most important information, the behavior of the individual. For Skinner, the orderliness and lawfulness of behavior, with which we ought to be concerned, resides in the individual, not in the group.

     In single-subject experiments, an IV is imposed and removed, and its effect on some preestablished baseline behavior is observed. A within manipulation is therefore involved. Situational EV's must, of course, be controlled in order to insure that any observed changes in the behavior are indeed due to the manipulation of the IV. The single-subject design has been used extensively in studies of the effects of the consequences of a behavior (reinforcers and punishers) on its rate of occurrence (operant rate).

 

Summary

 

     All experimental designs have in common the fact that they test difference hypotheses. They are used to determine if an IV causes a difference in a DV. Observed differences must be interpreted in terms of how well EV's were controlled, since it is the control of EV's, and thus the elimination of alternative explanations, that allows the researcher to conclude that the IV caused the difference.

 

Quasi-experimental Research

 

    Quasi-experimental designs are ones that approximate, but do not meet all of the requirements that define a true experiment. They also test difference hypotheses. There are two kinds of situations that give rise to quasi-experimental designs. The first involves IV's that cannot be manipulated. Gender is an example. We might want to test the hypothesis that male wolves are less aggressive than are females. To do so, we might observe all instances of aggression between wolves that occur during a three month period and note whether each was initiated by a male or a female. A difference between males and females might be discovered. Logically, initiation of aggression is the DV, and gender is the IV having two levels (male, female). But, since gender cannot be manipulated (subjects can only be assigned to groups on the basis of their gender), it is not a true IV. Variables of this sort are called pseudoindependent variables. The impact of this situation can easily be understood. It may be observed that males and females differ in initiation of aggression, that is, a reliable relationship between gender and initiation of aggression might be found. The two are correlated. But we cannot conclude that gender caused the observed difference. To be sure, something about females caused them to differ from males in initiating aggression, but the design does not reveal what! Whenever characteristics of subjects (gender, age, personality type, social status etc.) are used as IV's the design is quasi-experimental. Designs involving such variables are between designs. They are very common in psychology.

     Sometimes the researcher may be able to manipulate an IV, but be unable to control EV's. The result may be a quasi-experimental design. Its outcome will not allow for cause-effect conclusions, because the design has not eliminated alternative explanations for observed differences. A common design used in such situations is called the Single-Group Pretest-Posttest Design. Here, a DV is measured in a single group before and after an IV is imposed. The researcher may be interested in demonstrating that a change in the DV was caused by the IV, but this can be done only if all relevant EV's have been controlled. The design can be improved by including a control group which does not receive the IV. Subject EV's must be controlled so that the groups are comparable. The result is a Multiple-Groups Pretest-Posttest Design, analogous to the subjects/groups x trials design described earlier. If situational and sequence EV's are controlled as well, then the study could be a true experiment.
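
     The value of adding a control group can be illustrated with a small Python sketch. All scores are invented, and the comparison shown (the change in the treated group minus the change in the control group) is offered only as one simple way to summarize such data, not as a complete analysis.

```python
def mean(values):
    return sum(values) / len(values)

# Hypothetical DV scores before and after the IV is imposed (all invented).
treatment = {"pre": [10, 12, 11, 13], "post": [15, 17, 16, 18]}
control = {"pre": [11, 12, 10, 13], "post": [13, 14, 12, 15]}  # never receives the IV

# Single-Group Pretest-Posttest Design: only this change is available.
treatment_change = mean(treatment["post"]) - mean(treatment["pre"])

# Multiple-Groups Pretest-Posttest Design: the control group shows how much
# the DV changes over the same interval without the IV.
control_change = mean(control["post"]) - mean(control["pre"])
difference_in_change = treatment_change - control_change

print("Change in treated group:", treatment_change)
print("Change in control group:", control_change)
print("Difference in change:   ", difference_in_change)
```

     With these numbers the treated group changes by 5 points, but the control group changes by 2 points without the IV, so a single-group design would have overstated the apparent effect of the IV.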

     Often there is no clear-cut difference between experimental and quasi-experimental designs. The difference is often a matter of the adequacy of the methods used to control EV's. Note that there are quasi-experimental designs which are analogous to each of the experimental designs previously described.

     In terms of the conclusions permitted by experimental and quasi-experimental designs, the major difference is that cause-effect relationships can be demonstrated with experimental designs, but not with quasi-experimental ones. Quasi-experimental designs lack the internal validity that makes causal inferences possible. Both kinds of designs can reveal correlational and descriptive facts. Both test difference hypotheses, but they differ in terms of how observed differences are interpreted.

     Another example may help clarify all of this. The Ex Post Facto Design is a quasi-experimental design used to study past events. A researcher might be interested in events related to our annual “wolf roundup.” She might examine the records of aggressive behaviors among the wolves for the week before, and the week after the roundup. Suppose that she finds an increase in aggressive behavior following the roundup. Clearly, she cannot conclude that the roundup caused the increase, because other possible explanations of the increase might exist.

     Although the lack of internal validity makes generating cause-effect conclusions from quasi-experimental data questionable, such designs do provide valuable information, especially in situations where controlled experimentation is not possible. They do help us to understand and explain psychological phenomena.

     Finally, some n-way designs combine elements of experimental and quasi-experimental methods. For example, we might be interested in the effects of a drug on aggressive behavior, and decide to test the effects of three drug levels: 0 (control), 0.5, and 1.0 mg. We might suspect that the effect of the drug depends on the gender of the subjects, so we want to test both male and female wolves. The design is thus a 2 x 3 factorial. Notice, however, that drug level is a true IV that can be directly manipulated, and that gender is a pseudo IV that cannot be manipulated. This kind of design is called mixed, since it involves both experimental and quasi-experimental methods (remember that this is the second and preferred use of the term mixed).

 

Non-experimental Research

 

   There are many situations in which the researcher does not or cannot manipulate and control variables. Research of this sort is called non-experimental. There are many different kinds of non-experimental research, but only a few will be considered here.

     Correlational Designs. The goal of a correlational design is to measure two or more variables, and to determine if they are related, i.e. if they vary together. An example from human behavior is instructive. Is there a relationship between marital satisfaction and frequency of sexual intercourse with one's spouse? To answer this question, some measure of marital satisfaction would need to be developed, as well as a measure of frequency of sex. Assume that appropriate measures are available, and data are collected. One of many correlational statistics can be used to determine if a relationship exists. Suppose that a reliable relationship was found. Could the researcher conclude that a lot of sex determines (causes) the degree of marital satisfaction?

     There are two basic reasons why such a conclusion would not be justified. First, it would be impossible to know which variable was the cause, and which was the effect. It may well be that the level of marital satisfaction determines the frequency of sex, rather than being determined by it. This is sometimes called the directionality problem. The second problem is that both marital satisfaction and frequency of sex might be determined by some other variable that was not considered. For example, the economic stability of the couple might be the cause of both marital satisfaction and frequency of sex. This illustrates the third variable problem. Note that both of these problems are a matter of the failure to eliminate alternative explanations, and are avoided by the manipulation and control employed in experimental research. Note also that it would be very difficult to conduct an experiment on the issue described in this example. Correlational designs can often be used in situations where experimentation is impossible. Observed correlations can also serve as the bases for hypotheses that can be tested experimentally.
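
     The third variable problem can be made concrete with a small Python sketch. All of the data are invented, and the variable names are placeholders for the measures in the example. Partial correlation is not discussed in this material; it is used here only as one common way of asking whether two variables are still related once a measured third variable is taken into account.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def partial_r(x, y, z):
    """Correlation between x and y with the linear effect of z removed."""
    r_xy, r_xz, r_yz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Invented scores for eight hypothetical couples.
stability = [1, 2, 3, 4, 5, 6, 7, 8]     # economic stability (third variable)
satisfaction = [2, 1, 2, 5, 6, 5, 6, 9]  # marital satisfaction
frequency = [2, 1, 4, 3, 4, 7, 6, 9]     # frequency of sex

r = pearson_r(satisfaction, frequency)
r_partial = partial_r(satisfaction, frequency, stability)
print("Correlation:", round(r, 2))
print("Partial correlation, controlling for stability:", round(r_partial, 2))
```

     These data were constructed so that the two measures are strongly correlated only because both depend on the third variable. Note that this approach requires the third variable to have been measured, and that it does nothing to resolve the directionality problem.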

     Correlational designs thus yield both correlational and descriptive facts, and may point towards causal relationships. They do not permit causal conclusions. It is often said that confusing correlation with causality is one of the most serious, and most common errors. For example, the observation that convicted sex offenders read a lot of pornography (correlation) does not mean that pornography causes sex crimes (despite what many people would have you believe!). On the other hand, if there was no correlation between the two, then pornography could be eliminated as a possible cause.

     Naturalistic Studies. The key element in a naturalistic study is that the researcher does nothing that would interfere with or influence the on-going behavior being observed. Unobtrusive measurement, of which the subject is not aware, is used. The goal is to observe and describe behavior in a natural setting. Behavioral descriptions from naturalistic studies constitute the most basic data of the behavioral sciences. Naturalistic studies can involve testing correlational and difference hypotheses. They can thus yield correlational and descriptive facts. Since neither manipulation nor control is involved, they do not permit causal conclusions. They are, however, a rich source of causal hypotheses.

     Survey Research. Conceptually, survey research is quite similar to naturalistic study in that the researcher attempts to avoid influencing the observations. Control and manipulation are not involved. But responding to a questionnaire or interview makes the use of unobtrusive measures impossible. A problem in survey work is to design and present questions in such a way as to avoid biasing the subject's responses. Another problem is that the researcher can never be sure how accurate the subject's responses are. Correlational and difference hypotheses can be tested using survey techniques, but causal hypotheses cannot. Great care must be taken when interpreting survey data, even when the goal is to generate only descriptive information.

     Some writers prefer to think of surveys as measurement techniques rather than as research designs. This view is at least partially correct. For example, the DV in an experiment or quasi-experiment could be measured using a survey instrument (questionnaire or interview). But it is also the case that there is a distinct approach to research, as described in the preceding paragraph, called survey research. So one must remember the distinction between survey research and the survey as a measurement technique.

     The use of survey research is limited to human subjects. Even so, the attitudes and beliefs of our species about the behavior, value, welfare, etc. of other species can be important to the animal behaviorist.

     Case Studies. One of the oldest approaches to research is the case study. The goal of the case study is to generate an account of an individual's past history and present circumstances. In this way, it might be possible to discover the causes of his/her present behavior. The case study is most often used in human clinical settings. Data from a case study may come from many sources, including direct observations of the subject. In human studies, most often they come from the subject's own memory of the past. A major problem is that it is usually impossible to verify such information. Instead of getting a picture of the individual's past, we often get one of his/her memory of the past, which may be quite different. A similar problem exists in animal studies when the memory of a human observer is the source of the data. At best, case studies provide descriptive information. They can be the source of many hypotheses that can be tested using more rigorous methods.

 

Summary

 

   Behavior and cognitive processes are complex phenomena. No single research method is appropriate for seeking knowledge about them. All of the methods described here have their uses, and when employed appropriately and interpreted correctly, all advance our knowledge and understanding. Animal behaviorists are not limited to the use of experimental methods, but rather adopt the research methods that are most appropriate for the specific question under study.

     All research methods involve the observation and measurement of variables. The most useful measurements are those that result in the assignment of numbers to the observed levels of the variables, a process called scaling. The analysis of research data involves the use of statistical procedures, and these are of two basic sorts. Descriptive techniques are used to summarize and describe the data that have been collected. Inferential techniques are used to test hypotheses using those data. Some test hypotheses about differences, and others test hypotheses about covariation among variables. Knowledge of statistical procedures is essential to the research process.

 

Behavioral Variables

 

     Behavior can be measured in various ways for research purposes. Some of the commonly used measures are as follows:

     Latency - time from some specified event (e.g. the beginning of a recording session or the presentation of a stimulus) to the onset or first occurrence of the behavior.

     Frequency - number of occurrences of the behavior.

     Rate - number of occurrences of the behavior per unit of time.

     Duration - the amount of time for which a single occurrence of the behavior lasts.

    Speed - how quickly a behavior occurs.

     Intensity - difficult to define; one approach is to measure local rate - the number of component acts per unit of time spent performing the activity, e.g. intensity of eating could be described in terms of the number of bites of food taken per minute.

    Choice - when two or more alternative behaviors are possible (e.g. in a discrimination task) the chosen behavior is recorded.

     Event - behavior pattern of short duration; described in terms of frequency.

     State - behavior pattern of long duration; described in terms of time of onset and offset.

     Bout - same behavior is repeated several times in succession; inter-bout interval - time between bouts of the same behavior.
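
     As a rough illustration of how several of these measures relate to one another, the sketch below computes latency, frequency, rate, and duration from a continuous record of a single behavior. The record format (a list of onset and offset times in seconds), the function name, and the sample values are all assumptions made for illustration; they are not taken from any actual study.

```python
def summarize_behavior(bouts, session_start, session_end):
    """Summarize one behavior from a list of (onset, offset) times in seconds."""
    session_minutes = (session_end - session_start) / 60
    durations = [offset - onset for onset, offset in bouts]
    first_onset = min(onset for onset, _ in bouts) if bouts else None
    return {
        # Latency: time from the start of the session to the first occurrence
        "latency_s": first_onset - session_start if bouts else None,
        # Frequency: number of occurrences
        "frequency": len(bouts),
        # Rate: occurrences per unit of time (here, per minute)
        "rate_per_min": len(bouts) / session_minutes,
        # Duration: how long each single occurrence lasted
        "durations_s": durations,
        "total_duration_s": sum(durations),
    }


# Hypothetical 10-minute (600 s) session with three feeding occurrences
feeding = [(30, 45), (120, 180), (400, 410)]
summary = summarize_behavior(feeding, session_start=0, session_end=600)
print(summary)
```

     Note that frequency and rate carry the same information only when all sessions are the same length; rate is the safer measure when session lengths differ.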

 

Non-Experimental Studies Revisited: Naturalistic Observation

 

     Research often involves observation of the behavior of animals or humans as it occurs without any specific intervention on the part of the researcher. This section elaborates on the methods of naturalistic observation, which can be defined in terms of the methods used to sample the behavior and the methods used to record behavioral observations (see Table 1).


    Table 1

                                    SAMPLING METHODS

    RECORDING METHODS               Ad libitum    Focal       Scan        Behavior
                                    Sampling      Sampling    Sampling    Sampling

    Continuous Recording

    Time Sampling: Instantaneous

    Time Sampling: One-Zero


     Sampling Methods define which animals are observed, and when observations are made. The following are the basic sampling methods:

1.Ad libitum - no systematic constraints are placed on what is to be observed; the observer notes whatever is visible and seems relevant at the time, often in diary format.

2.Focal sampling - one individual (or dyad, or litter, or some other unit) is observed for a specified amount of time, and all instances of its behavior are recorded, usually for several different categories of behavior.

3.Scan sampling - a whole group of subjects is rapidly scanned at regular intervals and the behavior of each individual at that instant is recorded. Often only one or a few categories of behavior are considered.

4.Behavior sampling - a whole group of subjects is observed, and each occurrence of a particular type of behavior, together with details of which individuals were involved, is recorded.

     Recording Methods define how the behavior is recorded. The following are the basic recording methods:

1.Continuous recording (all-occurrence recording) - an exact and faithful record of the behavior is made, including frequency, duration, and time at which the behaviors started and stopped.

2.Time sampling - behavior is sampled periodically. The observation session is divided into successive, short periods of time called sample intervals. The instant in time at the end of each sample interval is the sample point.

  a.Instantaneous time sampling - at each sample point, the observer records whether or not a specific behavior is occurring, or what behavior is occurring.

  b.One-zero sampling - at the sample point, the observer records whether or not the behavior has occurred during the preceding sample interval.

   The choice of sampling and recording methods is of course determined by the goals of the research project.
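
     The difference between the two time sampling methods can be made concrete by scoring the same continuous record both ways. The sketch below is only an illustration: the function names, the 10-second sample interval, and the bout times are hypothetical, and a behavior that is occurring exactly at a sample point is assumed to count for the interval that ends there.

```python
def instantaneous_sample(bouts, sample_points):
    """Score 1 if the behavior is occurring at the sample point, else 0."""
    return [int(any(on <= t < off for on, off in bouts)) for t in sample_points]


def one_zero_sample(bouts, sample_points, interval):
    """Score 1 if the behavior occurred at any time during the preceding interval."""
    scores = []
    for t in sample_points:
        start = t - interval
        # A bout counts if any part of it falls within (start, t]
        occurred = any(on <= t and off > start for on, off in bouts)
        scores.append(int(occurred))
    return scores


# Hypothetical continuous record: (onset, offset) times in seconds
bouts = [(3, 6), (18, 25), (41, 52)]
interval = 10
sample_points = list(range(interval, 61, interval))  # 10, 20, ..., 60

instantaneous = instantaneous_sample(bouts, sample_points)
one_zero = one_zero_sample(bouts, sample_points, interval)
print(instantaneous)  # [0, 1, 0, 0, 1, 0]
print(one_zero)       # [1, 1, 1, 0, 1, 1]
```

     In this made-up record the behavior is scored at two of six sample points by instantaneous sampling but in five of six intervals by one-zero sampling, so the two methods should not be treated as interchangeable.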

 

Data Forms

 

     Although other methods are used, data collected in observational studies are most often recorded on forms or checklists. The exact nature of these forms and lists reflects the goals of the research project, the subjects observed, the sampling and recording methods used, and the behaviors of interest.

     Data forms typically include an area in which to record the name of the observer, the identity of subjects observed, the place where the observations are made, and the date and time of the observations. Additional information about wind and weather conditions, temperature, and other relevant factors should also be recorded.

     Coding data can be helpful, so long as the code is simple and easy to use. Codes for behaviors (e.g. eating = E, resting = R), animals (e.g. Denali = D, Minka = M), spatial location (e.g. northeast quadrant of enclosure = NE) and combinations of these (e.g. Minka resting on berm next to Denali = M: R-B/D) can be devised. It is important to record the code somewhere for future reference.
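
     One way to keep such a code on record is to store it alongside the data in a form a program can read. The sketch below assumes a combined entry always has the shape "animal: behavior-location/neighbor", as in the example above; that layout, the function name, and the use of B for the berm are inferences from the single example given here, not a standard notation.

```python
# Codebook taken from the examples in the text (B = berm is inferred)
ANIMALS = {"D": "Denali", "M": "Minka"}
BEHAVIORS = {"E": "eating", "R": "resting"}
LOCATIONS = {"NE": "northeast quadrant of enclosure", "B": "berm"}


def decode_entry(entry):
    """Decode an entry such as 'M: R-B/D' into plain words."""
    animal, rest = entry.split(":")
    rest = rest.strip()
    neighbor = None
    if "/" in rest:
        rest, neighbor = rest.split("/")
    location = None
    if "-" in rest:
        rest, location = rest.split("-")
    # An unknown code raises KeyError, which flags an undocumented code
    return {
        "animal": ANIMALS[animal.strip()],
        "behavior": BEHAVIORS[rest],
        "location": LOCATIONS[location] if location else None,
        "near": ANIMALS[neighbor] if neighbor else None,
    }


decoded = decode_entry("M: R-B/D")
print(decoded)
```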

     Some sample data forms are included at the end of this document.

 

Evaluating Research

 

     A specific research project can be evaluated at a number of levels. For example, we can attempt to evaluate the ethical status of the study. What is of interest here, however, is the question of whether a particular bit of research is important or valuable.

     At the outset, it must be recognized that what one person considers to be important and valuable may seem trivial to another. If a particular species of animal or a particular behavior or process is of interest to you, then you will probably think that research on that species, behavior, or process is important. The role of personal interest in dictating what we find to be important or trivial cannot be denied or avoided.

     Given that a research project attracts your attention, and you consider it important enough to pursue, how is it to be evaluated? This question must be considered both from the perspective of the person who conducted the research, and from that of the person who reads and/or wants to use the research results. Fortunately, both of these perspectives converge on the same criteria for evaluation.

     The first question to be asked is whether the results of the project are reliable. Reliable results are ones that are trustworthy. They are repeatable (they are not due to sampling error). The only direct way to assess reliability is by replication, the repetition of the project to determine if the same results are obtained. Non-scientists sometimes wonder why studies are repeated, and the need to demonstrate reliability is one of the reasons. Reliability can be indirectly assessed by means of tests of statistical significance. Statistically significant results are ones that probably are reliable.
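
     To give a sense of what a test of statistical significance does, the sketch below applies one simple kind of test, an exact permutation test, to two small groups of scores. The choice of test, the function name, and the scores themselves are assumptions made for illustration only. The test asks how often a difference between group means at least as large as the observed one would arise if the scores were divided between the groups in every other possible way.

```python
from itertools import combinations


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference between group means."""
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)

    extreme = 0
    count = 0
    # Consider every way of splitting the pooled scores into groups of these sizes
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        difference = abs(sum_a / n_a - (total - sum_a) / n_b)
        # Small tolerance guards against floating-point rounding
        if difference >= observed - 1e-12:
            extreme += 1
        count += 1
    return extreme / count


# Hypothetical scores for two groups of five subjects
group_a = [12, 15, 14, 16, 13]
group_b = [8, 9, 7, 10, 11]
p = permutation_p_value(group_a, group_b)
print(round(p, 4))
```

     A small value means that a difference this large would rarely arise from the way subjects happened to fall into groups, which is why a significant result is taken as indirect evidence of reliability. It is not a substitute for replication.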

     If we are satisfied that the results of a project are reliable, then (and only then!) can we ask if the project is internally valid (results that are not reliable cannot be valid!). The essential question is this: was the study designed and executed correctly so that it provides a valid test of the hypothesis or answer to the question posed by the researcher? Are the conclusions justified by the data? These are issues of research design. There are many potential threats to the internal validity of a research project as suggested in the previous survey of research designs, and these must be carefully considered.

    Note that results can be reliable without being valid. Repeating a poorly designed study might well produce the same results as obtained in the original study, but that does not make them valid!

     If we are satisfied that the results of a study are internally valid, we can then ask questions about external validity. Can the results be generalized? To some extent, the answer to this will be determined by the scope of the question. For example, asking whether (reliable, internally valid) results from a study of the wolves at the Julian Wolf Preserve generalize to wolves at Wolf Park, is a different question than asking if the results generalize to wolves in the wild. In general, we can ask if results generalize to a)other members of the same species, b)other situations or conditions, and c)other species. There is only one way to assess external validity, and that is by repeating the project with other individuals, situations, conditions, and/or species. These are empirical questions that can be answered only by additional research. Assessing external validity is another reason why scientists sometimes seem to be repeating a study.

     If the results of a study are reliable and valid, they have some intrinsic value, in that they point to the regularities (or “laws”) of nature that science seeks to discover. But they still might not seem to be particularly valuable to an individual who is not interested! Many people, if not most, do not appreciate the pursuit of knowledge for its own sake, and they see no value in basic scientific research.

     To many, the value of research results depends upon the uses to which they can be put. Valuable research is research that solves some practical problem, gives us more control over the forces of nature, makes life easier, etc. Such utilitarian concerns are easy to understand, and there is certainly nothing wrong with using science to improve life, but it is a mistake to believe that this is the goal of science, just as it is a mistake to dismiss research that seems to have no practical value! For one thing, history is full of examples of scientific observations that were once considered to be trivial, but later turned out to be of great practical value. For another, it is only the cumulative development of human knowledge over time that will reveal the scientific value of a particular bit of knowledge gained through research. Finally, every bit of knowledge, however insignificant, contributes something to our understanding of the universe, which is the ultimate goal of science.


Sample Data Forms

 

Sample Data Form A - All-occurrence sampling.


 

SUBMITTOR

SUBMITTEE

 

D

A

T

BE

D

 

••••

•••••

A

 

••

••

T

 

••

 

••••

BE

 

 

 

 


NOTES:

    SUBMITTOR - wolf who submits

    SUBMITTEE - wolf who is submitted to

    D, A, T, BE - codes for four different wolves

    A check is made in the appropriate box every time that one of the wolves submits to another. (Of course, submission would need to be clearly defined and described so that the observer would know when it happens!)



Sample Data Form B - Instantaneous sampling.

 Sample Period     Nearest Wolf

                   D      A      T      BE

 1                 A      D      BE     T

 2                 BE     T      A      D

 3

 4


NOTE:

    At each sample point, the wolf nearest to each wolf is recorded.



Sample Data Form C - Focal animal sampling.


Animal: Denali

 Behavior        Time of Each Occurrence

 Feeding         0915 0918 0935 0949 0955

 Grooming        0920 0950

 Resting         0855 1000

 Threatening     0914


Sample Data Form D - Instantaneous scan sampling.

 Animals, Behavior, Location

 Time    D             BE            T             A

         Beh.  Loc.    Beh.  Loc.    Beh.  Loc.    Beh.  Loc.

 0800    E     SW      R     NW      R     NW      R     NE

 0805    D     SW      R     NW      W     SW      R     NE

 0810    W     SW      R     NW      E     SW      R     NE

 0815

 0820

 0825

 0830

 0835







NOTES:

    The location and behavior of each wolf are recorded at each sample point (time).

    D, BE, T, and A are different wolves.

    Beh = behavior: E - eating, D - drinking, R - resting, W - walking

    Loc = location: SW - southwest quadrant of enclosure, NW - northwest quadrant of enclosure, NE - northeast quadrant of enclosure


Sample Data Form E - One-zero sampling.


Behavior: Walking

 Time    Wolves

         D      A      T      BE

 0700

 0702

 0704

 0706

 0708

 0710

 0712

 0714



 


 

NOTES:

    D, A, T, & BE are different wolves.

    An ✘ is entered if the animal is walking during the sample period.


Research Design Decision Tree (Experimental Designs)

 

    To determine the experimental design used in a study, answer each of the following questions:

 

 1.Was an IV directly manipulated by the researcher?

   Yes - Go to question 2.

   No - The design was either quasi-experimental or non-experimental, but not a true experiment.

 2.Were extraneous variables controlled?

   Yes - Go to question 3.

   No - The design was quasi-experimental; not a true experiment.

 3.The design was a true experiment. How many IV's did the researcher manipulate?

   One - The design is a one-way experiment. Go to question 4.

   Two or more - The design is an n-way experiment. Go to question 7.

 4.Were the subjects divided into groups, with a different group tested under every level of the IV?

   Yes - Go to question 5.

   No - Go to question 6.

 5.Were subjects matched, then randomly assigned to groups?

   Yes - The design is a one-way, randomized blocks experiment.

   No - Then the subjects were randomly assigned to groups, and the design is a one-way, randomized groups experiment.

 6.Then every subject was tested under every level of the IV, and repeated measures were thus used. The design is a one-way, within subjects design.

 7.Was there a different group of subjects tested under each and every combination of the levels of the IV's?

   Yes - The design is factorial. Go to question 8.

   No - Go to question 9.

 8.Were subjects matched, then randomly assigned to groups?

   Yes - The design is an n-way, factorial, randomized blocks experiment.

   No - Then the subjects were randomly assigned to groups, and the design is an n-way, factorial, randomized groups experiment.

 9.Was every subject tested under every combination of IV's and their levels?

   Yes - The design is an n-way, within subjects experiment.

   No - Go to question 10.

10.Was every combination of IV's and their levels used?

   Yes - Go to question 11.

   No - The design is a Latin Square or Greco-Latin Square experiment. Find out about them in an advanced course!

11.Then one (or more) of the IV's must have been manipulated between groups, and one (or more) must have been manipulated within subjects. The design is a nested experiment.
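
    The questions above can also be written out as a small function, which some readers may find easier to follow than the numbered list. This is only a sketch: the class name, field names, and returned labels are hypothetical, and the function simply follows the eleven questions in order.

```python
from dataclasses import dataclass


@dataclass
class Answers:
    iv_manipulated: bool                       # question 1
    extraneous_controlled: bool = False        # question 2
    n_ivs: int = 1                             # question 3
    group_per_condition: bool = False          # questions 4 and 7
    matched: bool = False                      # questions 5 and 8
    all_subjects_all_conditions: bool = False  # question 9
    all_combinations_used: bool = False        # question 10


def classify_design(a):
    """Walk the decision tree and return the name of the design."""
    if not a.iv_manipulated:
        return "quasi-experimental or non-experimental"
    if not a.extraneous_controlled:
        return "quasi-experimental"
    if a.n_ivs < 1:
        raise ValueError("a true experiment manipulates at least one IV")

    if a.n_ivs == 1:
        if not a.group_per_condition:
            return "one-way, within subjects"          # question 6
        if a.matched:
            return "one-way, randomized blocks"        # question 5, yes
        return "one-way, randomized groups"            # question 5, no

    if a.group_per_condition:
        if a.matched:
            return "n-way, factorial, randomized blocks"   # question 8, yes
        return "n-way, factorial, randomized groups"       # question 8, no
    if a.all_subjects_all_conditions:
        return "n-way, within subjects"                # question 9, yes
    if a.all_combinations_used:
        return "nested"                                # question 11
    return "Latin Square or Greco-Latin Square"        # question 10, no


# Example: two IVs, a separate unmatched group for every combination of levels
example = Answers(iv_manipulated=True, extraneous_controlled=True,
                  n_ivs=2, group_per_condition=True)
print(classify_design(example))
```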