Search results for "Inter-rater reliability"
Showing 10 of 33 documents
Assessment of the Tricuspid Valve Morphology by Transthoracic Real-Time-3D-Echocardiography
2005
Aim: To demonstrate the feasibility of transthoracic three-dimensional real-time echocardiography (3D-TTE) as a supplement to routine assessment of the tricuspid valve and to analyze interrater agreement. Methods: Twenty healthy subjects and 74 patients with right ventricular failure were examined with conventional 2D echocardiography and additionally with 3D-TTE (SONOS 7500, Philips, Netherlands). The 3D exams were performed and recorded by one of two raters. The recordings were evaluated offline and independently by both raters for visualization of morphological and functional features of the tricuspid valve according to a subjective 3-point scale. Statistical analyses were performed for interrater agreement and…
The reliability, distribution, and responsiveness of the Postural Control and Balance for Stroke Test
2005
Abstract Pyoria O, Talvitie U, Villberg J. The reliability, distribution, and responsiveness of the Postural Control and Balance for Stroke Test. Objectives To determine the inter- and intrarater reliability of the Postural Control and Balance for Stroke (PCBS) test and to assess its distribution and responsiveness to changes during 1-year follow-up. Design Intrarater reliability of the PCBS test was assessed by comparing the repeat ratings of videotaped test performances by each of the 5 raters. Interrater reliability was assessed by comparing the ratings of the videotaped test performances between the raters. Setting Hospital neurologic ward and outpatient department of physiotherapy as w…
The Elephant in the Machine: Proposing a New Metric of Data Reliability and its Application to a Medical Case to Assess Classification Reliability
2020
In this paper, we present and discuss a novel reliability metric that quantifies the extent to which a ground truth, generated in multi-rater settings, can serve as a reliable basis for the training and validation of machine learning predictive models. To define this metric, three dimensions are taken into account: agreement (that is, how much a group of raters mutually agree on a single case)
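Editorial note: the "agreement" dimension described in this abstract — how much a group of raters mutually agree on each case — is commonly quantified with Fleiss' kappa. A minimal sketch of that standard coefficient (the implementation below is illustrative and is not taken from the paper):

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a subjects x categories table of rater counts.

    ratings[i][j] = number of raters assigning subject i to category j;
    every subject must be rated by the same number of raters.
    """
    n_subjects = len(ratings)
    n_raters = sum(ratings[0])
    # Per-subject agreement: proportion of concordant rater pairs.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ]
    p_bar = sum(p_i) / n_subjects
    # Chance agreement from the marginal category proportions.
    totals = [sum(col) for col in zip(*ratings)]
    p_j = [t / (n_subjects * n_raters) for t in totals]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)
```

For example, two subjects each rated unanimously by three raters, `fleiss_kappa([[3, 0], [0, 3]])`, yield a kappa of 1.0, while split ratings push the coefficient toward or below zero.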
Concordance Analysis
2011
Background In this article, we describe qualitative and quantitative methods for assessing the degree of agreement (concordance) between two measuring or rating techniques. An assessment of concordance is particularly important when a new measuring technique is introduced.
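Editorial note: the abstract does not name a specific concordance statistic; one standard quantitative choice for comparing a new measuring technique against an established one is Lin's concordance correlation coefficient (Bland–Altman limits of agreement are an equally common alternative). A minimal sketch:

```python
def lins_ccc(x, y):
    """Lin's concordance correlation coefficient between two measurement
    series; penalizes both poor correlation and systematic bias."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Population (biased) variances and covariance.
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * cov / (vx + vy + (mx - my) ** 2)
```

Identical series give a coefficient of 1.0; a constant offset between the two techniques lowers the coefficient even though the Pearson correlation stays at 1, which is exactly why the CCC is preferred for method-comparison studies.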
Statement validity assessment: Inter-rater reliability of criteria-based content analysis in the mock-crime paradigm
2005
Methods. Three raters were trained in CBCA. Subsequently, they analysed transcripts of 102 statements referring to a simulated theft of money. Some of the statements were based on experience and some were confabulated. The raters used 4-point scales to judge the degree to which 18 of the 19 CBCA criteria were fulfilled in each statement. Results. The analysis of rater judgment distributions revealed that, with the judgments of individual raters varying only slightly across transcripts, the weighted kappa coefficient, the product-moment correlation, and the intra-class correlation were inadequate indices of reliability. The Finn coefficient and percentage agreement, which were cal…
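Editorial note: the breakdown of chance-corrected coefficients under restricted judgment variance that this abstract reports can be illustrated with Cohen's kappa (for two raters) on hypothetical data — when nearly all ratings cluster on one scale point, percentage agreement is high while kappa collapses to zero:

```python
def percent_agreement(a, b):
    """Share of items on which two raters give the identical rating."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Cohen's kappa for two raters over the same items."""
    n = len(a)
    cats = set(a) | set(b)
    p_o = percent_agreement(a, b)
    # Chance agreement from each rater's marginal distribution.
    p_e = sum((a.count(c) / n) * (b.count(c) / n) for c in cats)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical ratings clustered on one scale point:
# observed agreement is 90%, yet kappa is 0.
rater_a = [3] * 10
rater_b = [3] * 9 + [2]
```

Here `percent_agreement(rater_a, rater_b)` is 0.9 while `cohens_kappa(rater_a, rater_b)` is 0.0 — the "kappa paradox" that motivates the abstract's use of the Finn coefficient and percentage agreement instead.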
Cheese Hardness Assessment by Experts and Untrained Judges
2001
Although expert assessment of food characteristics is recognized as a key step in product development, consumer-based measurements are sometimes recommended as an equivalent to expert panels. Cognitive psychology offers support for the role of perceptual learning in some instances, although it may not be relevant in others. To address this point, a performance analysis of experts and untrained panelists in cheese texture evaluation was carried out. Neither the untrained panelists nor the experts were familiar with either the scales or the kind of cheese. The same Cheddar cheese was given to 44 untrained subjects in three trials to assess hardness. The results showed that the…
Ensuring content validity of psychological and educational tests – the role of experts
2020
Many test developers try to ensure the content validity of their tests by having external experts review the items in terms of relevance, difficulty, clarity, and so on. Although this approach is widely accepted, a closer look reveals several pitfalls that need to be avoided if experts’ advice is to be truly helpful. First, I offer a classification of the tasks test developers give experts, as reported in the literature on procedures for drawing on experts’ advice. Second, I review a sample of reports on test development (N = 72) to identify common current procedures for selecting and consulting experts. Results indicate that often the choice of experts seems to…
Assessing learners’ writing skills in a SLA study: Validating the rating process across tasks, scales and languages
2014
There is still relatively little research on how well the CEFR and similar holistic scales work when they are used to rate L2 texts. Using both multifaceted Rasch analyses and qualitative data from rater comments and interviews, the ratings obtained by using a CEFR-based writing scale and the Finnish National Core Curriculum scale for L2 writing were examined to validate the rating process used in the study of the linguistic basis of the CEFR in L2 Finnish and English. More specifically, we explored the quality of the ratings and the rating scales across different tasks and across the two languages. As the task is an integral part of the data-gathering procedure, the relationship of task p…
Intra- and Inter-Rater Reliability of Strength Measurements Using a Pull Hand-Held Dynamometer Fixed to the Examiner's Body and Comparison with Push …
2021
Hand-held dynamometers (HHDs) are the most widely used method to measure strength in clinical settings. There are two assessment methods: pull and push. The purpose of the present study was to evaluate the intra- and inter-rater reliability of a new measurement modality for pull HHD and to compare the inter-rater reliability and agreement of the measurements. Forty healthy subjects were evaluated by two assessors with different body composition and manual strength. Fifteen isometric tests were performed in two sessions with a one-week interval between them. Reliability was examined using the intra-class correlation (ICC) and the standard error of measurement (SEM). Agreement between …
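Editorial note: the ICC and SEM named in this abstract are linked by a standard formula, SEM = SD × √(1 − ICC), from which the minimal detectable change at the 95% level also follows. A minimal sketch (the numbers below are illustrative, not from the study):

```python
import math

def sem(sd, icc):
    """Standard error of measurement from the sample SD and reliability."""
    return sd * math.sqrt(1 - icc)

def mdc95(sd, icc):
    """Minimal detectable change at the 95% level (two test occasions)."""
    return 1.96 * math.sqrt(2) * sem(sd, icc)
```

For instance, with a between-subject SD of 10 N and an ICC of 0.91, the SEM is 3 N and the MDC95 is about 8.3 N — a real change in strength must exceed that margin to be distinguishable from measurement noise.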
Effects of Interrater Reliability of Psychopathologic Assessment on Power and Sample Size Calculations in Clinical Trials
2002
Although rater training is increasingly used to improve the quality of the investigated outcome parameters, the reliability of assessments is not perfect. Thus, empirical reliability estimates should be used instead of theoretically assumed perfect reliability. Implications of the reliability of psychiatric assessments for sample size and power calculations in clinical trials are presented. The theoretical basis of sample size and power calculations using empirical reliability scores is delineated. Examples from contemporary research on schizophrenia and depression are used to illustrate several implications for study design and interpretation of results. The tremendous impact of the lack o…
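Editorial note: the impact of imperfect reliability on sample size that this abstract describes follows from the standard attenuation model, under which the observed standardized effect shrinks by √reliability, so the required n per group inflates by a factor of 1/reliability. A minimal sketch using the usual normal-approximation formula (two-sided α = 0.05, 80% power; illustrative, not the paper's computation):

```python
import math

Z_ALPHA = 1.959964   # two-sided alpha = 0.05
Z_BETA = 0.841621    # power = 0.80

def n_per_group(true_d, reliability=1.0):
    """Approximate n per group for a two-sample comparison when the outcome
    is measured with the given reliability (attenuation: d_obs = d * sqrt(r))."""
    d_obs = true_d * math.sqrt(reliability)
    return 2 * (Z_ALPHA + Z_BETA) ** 2 / d_obs ** 2
```

For a true effect of d = 0.5, perfect reliability requires about 63 subjects per group; dropping the interrater reliability to 0.80 inflates that by exactly 1/0.8 = 1.25, to about 79 per group — which is why the abstract argues for empirical rather than assumed-perfect reliability in power calculations.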