Interobserver Agreement for Behavioral Data

The review was conducted in four phases. First, all papers that reported at least some directly observed data, in vivo or from video, on free-operant human behavior were retained for further examination. Work containing only automatically recorded data (mechanical or electronic) was excluded, as were articles that reported only restricted operant behaviors (e.g., data from trial-by-trial teaching or bite-by-bite feeding behavior in diet-related studies) and two articles on animal behavior. This left a total of 168 articles. All articles containing continuously recorded data reported interobserver agreement data; none reported measures of observer accuracy. Thus, interobserver agreement continues to be the method by which the quality of behavioral data is assessed (e.g., Kelly, 1977). Three methods of calculating agreement dominated (i.e., were reported more than 10 times) in the articles reviewed: block-by-block agreement, exact agreement, and time-window analysis. Figure 2 shows the cumulative frequency of articles reporting each of the three methods. These data should not be interpreted as recommending one method over another simply because it was used more often (e.g., the block-by-block method was used 46 times, three times as often as time-window analysis). Frequency of use may merely reflect the publication rates of the research groups that chose to apply the different methods. In the course of the review, it became apparent that methods for calculating interobserver agreement were not always described in detail or named consistently.

Therefore, we provide a detailed explanation of the three most popular algorithms identified in our review. Time-window analysis was developed for calculating percentage agreement with continuously collected data. The two observers' data streams are partitioned into 1-s intervals and compared second by second. If both records show an event (for discrete behaviors) or a second of occurrence (for continuously measured behaviors), that second is counted as an agreement. Every second in which only one record contains an event or occurrence of behavior is a disagreement. Percentage agreement is calculated by dividing the number of agreements by the number of agreements plus disagreements. MacLean et al. (1985) recognized that their algorithm was too stringent for data on discrete events.
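To make the time-window calculation concrete, here is a minimal Python sketch. It assumes each observer's record has already been reduced to a per-second list of booleans (True for every second in which the behavior was scored); the function name and data layout are illustrative assumptions, not taken from MacLean et al. (1985).

```python
# Minimal sketch of second-by-second (time-window) percentage agreement.
# Assumes each observer's record is a list of booleans, one entry per
# second of the session, True meaning the behavior occurred (or was
# occurring) during that second.

def second_by_second_agreement(obs1, obs2):
    """Percentage agreement for two equal-length per-second records."""
    if len(obs1) != len(obs2):
        raise ValueError("records must cover the same number of seconds")
    agreements = 0
    disagreements = 0
    for a, b in zip(obs1, obs2):
        if a and b:        # both records show the behavior this second
            agreements += 1
        elif a or b:       # only one record shows it: a disagreement
            disagreements += 1
        # seconds in which neither record shows the behavior are not counted
    if agreements + disagreements == 0:
        return 100.0       # convention: nothing scored by either observer
    return 100.0 * agreements / (agreements + disagreements)

if __name__ == "__main__":
    o1 = [True, True, False, False, True, False]
    o2 = [True, False, False, True, True, False]
    print(round(second_by_second_agreement(o1, o2), 1))  # 50.0
```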

Therefore, they recommended allowing a tolerance when counting agreements by extending the definition of an agreement to include occasions when one observer recorded an event within ±t s of the other observer's record. In the articles reviewed, t ranged from 1 s (e.g., Romaniuk et al., 2002) to 5 s (e.g., Lalli, Mauro, & Mace, 2000, Experiment 3). We reviewed all research articles in 10 recent volumes of the Journal of Applied Behavior Analysis (JABA): Vol. 28(3), 1995, through Vol. 38(2), 2005. Continuous recording was used in the majority (55%) of the 168 articles that reported data on free-operant human behavior. Three methods for reporting interobserver agreement (exact agreement, block-by-block agreement, and time-window analysis) were each used in more than 10 of the articles reporting continuous recording. Having identified these currently popular agreement algorithms, we explain them to assist researchers, software writers, and other consumers of JABA articles. Third, the remaining papers were reviewed to determine whether the continuously recorded data were analyzed to produce measures of frequency (or rate), measures of duration, or both.
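The ±t s tolerance for discrete events might be implemented as in the sketch below, which assumes each observer's record is a sorted list of event times in seconds. The greedy one-to-one matching of events is an illustrative choice on our part, not a pairing rule specified by the cited studies.

```python
# Sketch of the +/- t s tolerance rule for discrete events. An event is an
# agreement if the other observer recorded an as-yet-unmatched event within
# `tolerance` seconds of it; every unmatched event is a disagreement.

def tolerance_agreement(times1, times2, tolerance=2):
    unmatched2 = list(times2)
    agreements = 0
    disagreements = 0
    for t1 in times1:
        match = next((t2 for t2 in unmatched2 if abs(t1 - t2) <= tolerance), None)
        if match is not None:
            unmatched2.remove(match)   # each event may be matched only once
            agreements += 1
        else:
            disagreements += 1
    disagreements += len(unmatched2)   # events scored only by the second observer
    if agreements + disagreements == 0:
        return 100.0
    return 100.0 * agreements / (agreements + disagreements)

if __name__ == "__main__":
    observer_a = [3, 10, 25, 40]
    observer_b = [4, 11, 27, 55]
    print(round(tolerance_agreement(observer_a, observer_b, tolerance=2), 1))  # 60.0
```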

Fourth, these papers were re-examined to determine which algorithms were used to assess the reliability of the data: block-by-block agreement (Bailey & Bostow, cited in Page & Iwata, 1986; Bailey & Burch, 2002), exact agreement (Repp et al., 1976), time-window analysis (MacLean, Tapp, & Johnson, 1985; Tapp & Wehby, 2000), or others. Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33, 159-174.

Second, the selected articles were examined to determine whether they contained continuously or discontinuously recorded data. Continuous data collection was identified by applying the following definition: the researchers described observational records that captured, second by second, occurrences of discrete behaviors or the onsets and offsets of behaviors in time, and the results were reported in standard units of measurement or their derivatives (e.g., responses per minute, percentage of the observation session). Discontinuous methods were defined as data-collection methods that sampled behavior at moments in time or in intervals greater than 1 s. The 93 articles that contained continuous data were examined in more detail. Behavioral scientists have developed a sophisticated methodology for evaluating behavior change that depends on accurate measurement of behavior. Direct observation of behavior has traditionally been the mainstay of behavioral measurement. Therefore, researchers need to attend to the psychometric properties, such as interobserver agreement, of observational measures to ensure reliable and valid measurement.

Among the many indexes of interobserver agreement, percentage agreement is the most popular. Its use persists despite repeated warnings and empirical evidence suggesting that, because it does not account for chance agreement, it is not the most psychometrically sound statistic for determining agreement among observers. Cohen's kappa (1960) has long been proposed as the most psychometrically sound statistic for assessing interobserver agreement. Kappa is described and methods for calculating it are presented.
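As a brief illustration of the kappa calculation for two observers scoring occurrence versus nonoccurrence across intervals, the sketch below applies Cohen's (1960) formula kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e the proportion expected by chance from the observers' marginal proportions; the interval counts in the example are hypothetical.

```python
# Cohen's kappa for a two-observer occurrence/nonoccurrence record,
# computed from the four cells of the 2 x 2 agreement table.

def cohens_kappa(both_occur, both_nonoccur, only_obs1, only_obs2):
    n = both_occur + both_nonoccur + only_obs1 + only_obs2
    p_o = (both_occur + both_nonoccur) / n            # observed agreement
    p1 = (both_occur + only_obs1) / n                 # observer 1 marginal
    p2 = (both_occur + only_obs2) / n                 # observer 2 marginal
    p_e = p1 * p2 + (1 - p1) * (1 - p2)               # chance agreement
    return (p_o - p_e) / (1 - p_e)

if __name__ == "__main__":
    # Hypothetical: 60 intervals scored as occurrence by both observers,
    # 25 as nonoccurrence by both, 10 only by observer 1, 5 only by observer 2.
    print(round(cohens_kappa(60, 25, 10, 5), 3))  # 0.659
```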

Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37-46. Hoge, R. D. (1985). The validity of direct observation measures of pupil classroom behavior. Review of Educational Research, 55, 469-483.

Ciminero, A. R., Calhoun, K. S., & Adams, H. E. (Eds.). (1986). Handbook of behavioral assessment (2nd ed.). The 256 JABA research articles published between mid-1995 and mid-2005 were reviewed. Of these, 168 reported direct observational data on free-operant human behavior. Of these 168 articles, 93 (55%) reported continuously recorded data. Discontinuous methods of recording such behaviors appear to have been supplanted in published applied behavior analysis research. Figure 1 shows the rates of use of continuous and discontinuous recording methods in research articles over the 10 years examined. Of the 93 articles that reported continuously recorded human behavior, 88 (95%) reported measures of frequency (typically expressed as responses per minute). Measures of duration were reported in 33 articles (36%). Exact agreement coefficients were calculated by dividing each session into 10-s intervals. Within each interval, the two observers' records could agree on the exact number of occurrences of the behavior, agree that no behavior occurred, or disagree on the exact number of occurrences. Coefficients were calculated by dividing the number of agreements by the sum of agreements plus disagreements and multiplying by 100%.
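A minimal sketch of the exact-agreement calculation just described, assuming each observer's record is a list of event times in seconds for one session; the function name and the 60-s session in the example are illustrative.

```python
# Exact agreement: the session is split into 10-s intervals, and an interval
# counts as an agreement only when both observers recorded exactly the same
# number of events in it (including zero for both).

from collections import Counter

def exact_agreement(times1, times2, session_seconds, interval=10):
    n_intervals = -(-session_seconds // interval)     # ceiling division
    counts1 = Counter(int(t // interval) for t in times1)
    counts2 = Counter(int(t // interval) for t in times2)
    agreements = sum(
        counts1.get(i, 0) == counts2.get(i, 0) for i in range(n_intervals)
    )
    disagreements = n_intervals - agreements
    return 100.0 * agreements / (agreements + disagreements)

if __name__ == "__main__":
    observer_a = [2, 5, 14, 31, 33]
    observer_b = [3, 6, 18, 32]
    print(round(exact_agreement(observer_a, observer_b, session_seconds=60), 1))  # 83.3
```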

Compared with the extensive methodological study of discontinuous recording, there has been little research to understand, evaluate, or guide the choice of methods for assessing data quality with continuous recording. Assessing agreement between observers with continuous data has been recommended (e.g., Hollenbeck, 1978; MacLean et al., 1985), but no methodological studies have compared the different methods in use. The results of this review suggest that continuous recording is a timely subject for methodological study. Cone, J. D. (1977). The relevance of reliability and validity to behavioral assessment. Behavior Therapy, 8, 411-426. Interobserver agreement for the review process was assessed using a stratified procedure in which approximately 20% of the articles at each phase were randomly selected for independent review by the second author. Percentage agreement for the first three phases of the review was calculated by dividing the number of agreements between reviewers by the number of articles reviewed by both reviewers and converting this ratio to a percentage [...]