The prognosis of diffuse fibrotic lung disease (DFLD) is known to be variable, but there is a paucity of literature on prognostic markers that are independent of a precise clinical diagnosis. This study aimed to assess how well three high-resolution computed tomography (HRCT) scores predict mortality in a heterogeneous population of patients with DFLD. A large group of radiologist and physician readers was used to determine agreement among readers of varying backgrounds when applying these scores.

Institutional review board approval was obtained, and informed consent was waived for this retrospective study. Eighty HRCT examinations from 68 patients with DFLD (35 men; mean age 72.9 years) were evaluated retrospectively by 18 readers, comprising thoracic and general radiologists, respiratory physicians and radiology trainees. The features scored were honeycombing, extent of disease and traction bronchiectasis. Demographic, diagnostic and pulmonary function data were collected. Patients were categorised as having idiopathic pulmonary fibrosis, fibrosis related to connective tissue disease, 'miscellaneous' DFLD or 'undefined' DFLD, the last category applying where no single entity was felt to explain the pulmonary disease fully or confidently. Agreement was assessed using the kappa statistic. Associations with mortality were analysed using the Cox marginal model.

Agreement was better for honeycombing (kappa = 0.44) and disease extent (kappa = 0.47) than for traction bronchiectasis (kappa = 0.24). The presence of honeycombing (P < 0.0005) and disease extent >30% (P = 0.002) predicted increased mortality independent of clinical diagnosis. Traction bronchiectasis was not predictive. Clinical diagnosis was not an independent predictor, but age was independently associated with mortality (P = 0.004).
Pulmonary function data were available for only 43 patients, but in a limited subanalysis, the diffusing capacity for carbon monoxide (DLCO) was independently predictive of increased mortality (P = 0.005).

The presence of honeycombing and a greater extent of fibrotic lung disease predict increased mortality independent of clinical diagnosis. Our large, mixed-expertise reader group showed moderate interobserver agreement, comparable to the agreement values reported for these scores in the literature.
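The abstract reports kappa values for 18 readers but does not name the multi-rater variant used. As an illustration only, the sketch below assumes Fleiss' kappa, one common choice when many readers each rate the same set of items, together with the widely used Landis and Koch interpretation bands. The function names and the toy counts are hypothetical and are not the study's data or code. Under those bands, the reported values of 0.44 and 0.47 fall in the "moderate" range and 0.24 in the "fair" range.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa (assumed variant) for N items each rated by the same n raters.

    counts[i][j] is the number of raters who placed item i in category j.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must be rated by the same number of raters")
    n_categories = len(counts[0])
    total = n_items * n_raters

    # Share of all ratings that fall in each category.
    p_j = [sum(row[j] for row in counts) / total for j in range(n_categories)]

    # Per-item observed agreement: fraction of rater pairs that agree.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]

    p_bar = sum(p_i) / n_items      # mean observed agreement
    p_e = sum(p * p for p in p_j)   # agreement expected by chance
    if abs(1.0 - p_e) < 1e-12:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_bar - p_e) / (1.0 - p_e)


def landis_koch(kappa: float) -> str:
    """Map a kappa value to the Landis and Koch descriptive band."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


if __name__ == "__main__":
    # Hypothetical: 4 scans, 18 readers, honeycombing [present, absent].
    toy_counts = [[16, 2], [3, 15], [10, 8], [17, 1]]
    k = fleiss_kappa(toy_counts)
    print(round(k, 3), landis_koch(k))
```

Fleiss' kappa treats readers as interchangeable, which suits a large mixed reader pool; a study could equally have averaged pairwise Cohen's kappas, so this sketch should not be read as the authors' method.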