OBJECTIVE: To assess the reliability of two instruments designed for the critical appraisal of economic evaluations: the Quality of Health Economic Studies (QHES) scale and the Pediatric Quality Appraisal Questionnaire (PQAQ).

METHODS: Thirty published articles were chosen at random from a recent bibliography of economic evaluations in health promotion. The quality of each study was assessed independently by two raters using each of the two instruments. Inter-rater reliability and agreement between the instruments were measured using intraclass correlation coefficients (ICCs). Cronbach's generalizability theory was also used to assess the sources of variation in the quality scores and to indicate where improvements in reliability could best be made.

RESULTS: Inter-rater reliability was excellent for both instruments (ICC = 0.81 for the QHES and 0.80 for the PQAQ). Agreement between the instruments varied by rater (ICC = 0.77 for rater 1 and 0.56 for rater 2). The largest source of variation in the scores assigned to the articles was the quality of the study itself (56% of total variance); conventional measurement error accounted for a further 31%. Variation attributable to rater (< 0.1%) and to measurement instrument (1.8%) was very low.

CONCLUSIONS: The results suggest that the two instruments perform equally well. The choice of instrument can therefore be based on other criteria: simplicity and speed of application in the case of one, and the detail of the information provided in the case of the other. There is little improvement in reliability to be gained from using more than one rater or more than one assessment of quality.
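The abstract does not state which ICC form the authors computed. As a purely illustrative sketch (the ICC(2,1) form — two-way random effects, absolute agreement, single rater — is a common choice for inter-rater reliability with a study-by-rater score matrix like the one described), the coefficient can be derived from the two-way ANOVA mean squares as follows; the function name and example data are hypothetical:

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    scores: list of rows, one row per study (target), one column per rater.
    """
    n = len(scores)        # number of studies
    k = len(scores[0])     # number of raters
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(scores[i][j] for i in range(n)) / n for j in range(k)]

    # Sums of squares from the two-way ANOVA decomposition
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((scores[i][j] - grand) ** 2
                   for i in range(n) for j in range(k))
    ss_err = ss_total - ss_rows - ss_cols

    # Mean squares: between studies, between raters, residual error
    ms_r = ss_rows / (n - 1)
    ms_c = ss_cols / (k - 1)
    ms_e = ss_err / ((n - 1) * (k - 1))

    return (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + k * (ms_c - ms_e) / n)


# Hypothetical quality scores for three studies rated by two raters:
# perfect agreement yields ICC = 1.0; disagreement pulls it below 1.
print(icc_2_1([[1, 1], [2, 2], [3, 3]]))  # 1.0
```

The same mean squares underpin the generalizability analysis reported in the results: the study, rater, and error variance components are estimated from them, which is why the ICC and the variance-percentage figures are two views of one decomposition.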