External validation of a measurement tool to assess systematic reviews (AMSTAR).

Shea BJ, Bouter LM, Peterson J, Boers M, Andersson N, Ortiz Z, Ramsay T, Bai A, Shukla VK, Grimshaw JM - PLoS ONE (2007)

Bottom Line: Construct validity was shown by AMSTAR convergence with the results of the global assessment: Pearson's R 0.72 (95% CI: 0.53 to 0.84). For the AMSTAR total score, the limits of agreement were -0.19 +/- 1.38. Further validation of AMSTAR is needed to assess its validity, reliability and perceived utility by appraisers and end users of reviews across a broader range of systematic reviews.


Affiliation: Community Information and Epidemiological Technologies, Ottawa, Ontario, Canada. bshea@ciet.org

ABSTRACT

Background: Thousands of systematic reviews have been conducted in all areas of health care. However, the methodological quality of these reviews is variable and should routinely be appraised. AMSTAR is a measurement tool to assess systematic reviews.

Methodology: AMSTAR was used to appraise 42 reviews focusing on therapies to treat gastro-esophageal reflux disease, peptic ulcer disease, and other acid-related diseases. Two assessors applied the AMSTAR to each review. Two other assessors, plus a clinician and/or methodologist, independently applied a global assessment to each review.

Conclusions: The sample of 42 reviews covered a wide range of methodological quality. The overall scores on AMSTAR ranged from 0 to 10 (out of a maximum of 11) with a mean of 4.6 (95% CI: 3.7 to 5.6) and median 4.0 (range 2.0 to 6.0). The inter-observer agreement of the individual items ranged from moderate to almost perfect agreement. Nine items scored a kappa of >0.75 (95% CI: 0.55 to 0.96). The reliability of the total AMSTAR score was excellent: kappa 0.84 (95% CI: 0.67 to 1.00) and Pearson's R 0.96 (95% CI: 0.92 to 0.98). The overall scores for the global assessment ranged from 2 to 7 (out of a maximum score of 7) with a mean of 4.43 (95% CI: 3.6 to 5.3) and median 4.0 (range 2.25 to 5.75). The agreement was lower with a kappa of 0.63 (95% CI: 0.40 to 0.88). Construct validity was shown by AMSTAR convergence with the results of the global assessment: Pearson's R 0.72 (95% CI: 0.53 to 0.84). For the AMSTAR total score, the limits of agreement were -0.19 +/- 1.38. This translates to a minimum detectable difference between reviews of 0.64 'AMSTAR points'. Further validation of AMSTAR is needed to assess its validity, reliability and perceived utility by appraisers and end users of reviews across a broader range of systematic reviews.
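Read under the usual Bland and Altman convention (an assumption here, since the abstract does not state the formula), the reported limits of agreement are the mean inter-rater difference plus or minus 1.96 standard deviations of the differences, which places the interval at roughly -1.57 to 1.19 AMSTAR points:

\[
\mathrm{LoA} = \bar{d} \pm 1.96\, s_d, \qquad \bar{d} = -0.19, \quad 1.96\, s_d = 1.38 \;\Rightarrow\; \mathrm{LoA} \approx (-1.57,\ 1.19).
\]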


Figure 1 (pone-0001350-g001): Bland and Altman limits of agreement plot for AMSTAR scores.

Mentions: We calculated an overall agreement score using the weighted Cohen's kappa, as well as one for each item [52] (Table 1). Bland and Altman's limits of agreement methods were used to display agreement graphically [53], [54] (Fig. 1). We calculated the percentage of the theoretical maximum score. Pearson's rank correlation coefficients were used to assess the reliability of this total score. For comparisons of ratings of methodological quality we calculated chance-corrected agreement (using kappa) and chance-independent agreement (using Φ) [52], [55], [56]. We accepted a correlation of >0.66, and further scrutinized items and reviews with kappa scores below 0.66 [52]. Kappa values of less than 0 rate as less than chance agreement; 0.01–0.20 as slight agreement; 0.21–0.40 as fair agreement; 0.41–0.60 as moderate agreement; 0.61–0.80 as substantial agreement; and 0.81–0.99 as almost perfect agreement [52], [57]. We calculated Φ (phi) for each question [55], [58].
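To make these agreement statistics concrete, the sketch below computes a weighted Cohen's kappa, a Pearson correlation, and Bland and Altman limits of agreement for two raters' total scores. It is a minimal illustration only: the scores are invented and the library choices (NumPy, SciPy, scikit-learn) are assumptions, not the data or software used in the study.

# Illustrative sketch: agreement statistics for two hypothetical raters' AMSTAR totals.
# The scores below are invented for demonstration; they are not the study data.
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import cohen_kappa_score

rater_a = np.array([4, 6, 3, 8, 5, 2, 7, 4, 6, 9])   # hypothetical total scores (0-11)
rater_b = np.array([4, 5, 3, 8, 6, 2, 7, 5, 6, 10])

# Chance-corrected agreement: weighted Cohen's kappa on the integer totals
# (labels are treated as ordered categories; quadratic weights penalise larger disagreements more).
kappa = cohen_kappa_score(rater_a, rater_b, weights="quadratic")

# Reliability of the total score: Pearson correlation between the two raters.
r, _ = pearsonr(rater_a, rater_b)

# Bland and Altman limits of agreement: mean difference +/- 1.96 * SD of the differences.
diff = rater_a - rater_b
bias = diff.mean()
half_width = 1.96 * diff.std(ddof=1)

print(f"weighted kappa = {kappa:.2f}, Pearson r = {r:.2f}")
print(f"limits of agreement = {bias:.2f} +/- {half_width:.2f}")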

