A method of estimating the reliability of independent ratings; a measure of the degree to which two or more raters agree in their ratings of some behavior assessed in one or more subjects.
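The simplest way to quantify how far raters agree is percent agreement: the share of subjects on which two raters assigned the same rating. The sketch below is illustrative only. The function name percent_agreement and the behavior codes are hypothetical, and it assumes two raters who each rated the same subjects in the same order.

```python
def percent_agreement(ratings_a, ratings_b):
    """Proportion of subjects on which two raters gave the same rating.

    Hypothetical helper: assumes both lists rate the same subjects in order.
    """
    if len(ratings_a) != len(ratings_b):
        raise ValueError("both raters must rate the same subjects")
    if not ratings_a:
        raise ValueError("no ratings supplied")
    # Count subjects where the two ratings are identical.
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)


# Hypothetical data: two raters code the same 5 observations.
rater_1 = ["on-task", "off-task", "on-task", "on-task", "off-task"]
rater_2 = ["on-task", "on-task", "on-task", "on-task", "off-task"]
print(percent_agreement(rater_1, rater_2))  # 0.8
```

Percent agreement is easy to read but does not account for agreement that would occur by chance alone.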
The degree to which multiple observers agree in their classification or quantification of behavior.
The extent to which two or more individuals agree. It addresses the consistency with which a rating system is applied.
The degree to which different observers agree on their observations.
This method of estimating reliability is used when a test includes performance tasks or other items that must be scored by human raters. Interrater reliability estimates the consistency, or dependability, of the scores those raters produce.
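One widely used statistic for two raters assigning categorical scores is Cohen's kappa, which compares observed agreement p_o with the agreement p_e expected by chance from each rater's score frequencies: kappa = (p_o - p_e) / (1 - p_e). Other statistics exist (for example, Fleiss' kappa for more than two raters, or intraclass correlation for continuous scores). The sketch below is a minimal illustration, not a prescribed procedure; the function name and the rubric scores are hypothetical.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters scoring the same items (hypothetical helper)."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(ratings_a)
    # Observed agreement: share of items both raters scored identically.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of each rater's marginal frequency per category.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1:
        # Both raters used one identical category for every item; kappa is
        # undefined here, so this sketch treats it as perfect agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical data: two raters score 10 performance tasks on a 1-3 rubric.
rater_1 = [3, 2, 3, 1, 2, 3, 2, 1, 3, 2]
rater_2 = [3, 2, 2, 1, 2, 3, 2, 2, 3, 2]
print(round(cohens_kappa(rater_1, rater_2), 3))  # 0.677
```

Here the raters agree on 8 of 10 tasks (p_o = 0.8), but their score frequencies alone would produce p_e = 0.38, so kappa is about 0.68 rather than 0.8.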
The reliability of measurements made on the same subjects by two different persons (raters).