Case definition has long been an obstacle to comparing musculoskeletal pain prevalence estimates across studies; however, the test-retest reliability of the questions used to determine joint pain prevalence has not been examined. The objective of this study was to determine the reliability of these questions and the impact of question wording, question ordering and the time between questions on responses.

A Computer Assisted Telephone Interviewing (CATI) survey was used to re-administer questions collected as part of a population-based longitudinal cohort study. The questions were asked of the same sample of 203 community-dwelling respondents aged 18 years and over, originally selected at random, on two occasions 14 to 27 days apart (average 15 days). Reliability was assessed using Cohen's kappa (κ) and the intraclass correlation coefficient (ICC), and a crossover design was used to assess whether question wording effects and period effects were present.

The self-reported prevalence of doctor-diagnosed arthritis demonstrated excellent reliability (κ = 0.84 and κ = 0.79 for questionnaires 1 and 2, respectively). The reliability of the musculoskeletal pain and/or stiffness questions ranged from moderate to excellent for both question types: those asking about ever having joint pain on most days for at least a month (κ = 0.52 to 0.95) and those asking about pain and/or stiffness on most days in the last month (κ = 0.52 to 0.90). However, question wording affected the results obtained for hand, foot and back pain and/or stiffness, indicating that the area of pain may influence prevalence estimates.

Joint pain and stiffness questions are reliable and can be used to determine prevalence. However, question wording and the area of pain may affect estimates, with factors such as pain perception and the effect of pain on activities possibly playing a role in the recall of musculoskeletal pain.
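For readers unfamiliar with test-retest kappa, the sketch below shows how Cohen's kappa compares the observed agreement between two administrations of the same question with the agreement expected by chance from each administration's answer proportions. It is a minimal illustration, not the analysis code used in this study (the abstract does not say which software was used); the function name `cohens_kappa` and the 20-respondent yes/no data are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(first: Sequence[Hashable], second: Sequence[Hashable]) -> float:
    """Cohen's kappa for agreement between two administrations of one question.

    Hypothetical helper: first[i] and second[i] are respondent i's answers
    at time 1 and time 2 (e.g. "yes"/"no" to a joint pain question).
    """
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty answer lists of equal length")
    n = len(first)
    # Observed agreement: share of respondents giving the same answer both times.
    p_observed = sum(a == b for a, b in zip(first, second)) / n
    # Chance agreement: product of the two administrations' marginal
    # proportions for each category, summed over categories.
    counts1 = Counter(first)
    counts2 = Counter(second)
    p_expected = sum(counts1[c] * counts2[c] for c in counts1) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when both administrations use one category")
    return (p_observed - p_expected) / (1.0 - p_expected)


if __name__ == "__main__":
    # Hypothetical data: 8 yes/yes, 2 yes/no, 1 no/yes, 9 no/no.
    time1 = ["yes"] * 10 + ["no"] * 10
    time2 = ["yes"] * 8 + ["no"] * 2 + ["yes"] * 1 + ["no"] * 9
    print(f"{cohens_kappa(time1, time2):.2f}")  # prints 0.70
```

Here 17 of 20 hypothetical respondents agree (observed agreement 0.85), while the marginals imply chance agreement of 0.50, so kappa is (0.85 − 0.50) / (1 − 0.50) = 0.70. This is why a question with high raw agreement can still yield only a moderate kappa when most respondents give the same answer.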