Background: We aimed to develop a risk model based on MM-Atten-CNN for predicting esophageal fistula in patients with esophageal cancer (EC) from computed tomography (CT)-based radiomics. Methods: EC patients who did not receive esophageal surgery between July 2014 and August 2019 were collected. Of these, 186 patients (cases) who developed esophageal fistula were enrolled and compared with 372 controls (matched 1:2 on time of EC diagnosis, sex, marital status, and race). All 558 patients were randomly divided into a training set (n = 390) and a validation set (n = 168). The MM-Atten-CNN risk model was trained on 2D slices from nine planar views, with three patches per view: contextual CT, segmented tumor, and neighbouring information. In the training set (130 cases and 260 controls), data augmentation was performed by pixel shifting [-10, -5, +5, +10] and rotation [-10, +10]. In total, (130 + 260) × 16 × 2 = 12,480 samples were used for training. Finally, the risk model was evaluated on the validation set (56 cases and 112 controls) using accuracy (acc), sensitivity (sen), and specificity (spe). Results: The developed risk model achieved (acc, sen, spe) of (0.839, 0.807, 0.926) and was more predictive of esophageal fistula than CNN models using a single coronal view (acc 0.763, sen 0.581, spe 0.837), multi-view 2D contextual CT slices (acc 0.779, sen 0.656, spe 0.896), and a 3D CNN using contextual CT volumes (acc 0.781, sen 0.689, spe 0.852). Conclusions: The CT-based MM-Atten-CNN model improved esophageal fistula risk prediction and has the potential to assist individualized stratification and treatment planning in EC patients.
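The training-sample arithmetic and the three evaluation metrics named in the Methods can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes the factor of 16 comes from applying the four listed shifts independently along x and y (4 × 4), that the factor of 2 comes from the two listed rotation values, and that rotations are in degrees; the abstract states none of these explicitly. All function names are hypothetical.

```python
from itertools import product

SHIFTS = (-10, -5, 5, 10)  # pixel shifts listed in the abstract
ROTATIONS = (-10, 10)      # rotation values listed in the abstract (degrees assumed)


def augmentation_settings():
    """Enumerate (dx, dy, angle) combinations.

    Assumption: each shift is applied independently along x and y,
    giving 4 * 4 = 16 shift pairs, each combined with 2 rotations.
    """
    return [
        (dx, dy, angle)
        for dx, dy in product(SHIFTS, repeat=2)
        for angle in ROTATIONS
    ]


def augmented_sample_count(n_cases, n_controls):
    """Number of training samples after augmentation."""
    return (n_cases + n_controls) * len(augmentation_settings())


def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "acc": (tp + tn) / total,  # fraction of all predictions that are correct
        "sen": tp / (tp + fn),     # fraction of fistula cases detected
        "spe": tn / (tn + fp),     # fraction of controls correctly ruled out
    }
```

Under these assumptions, `augmented_sample_count(130, 260)` reproduces the 12,480 training samples reported above. `classification_metrics` takes raw confusion-matrix counts, which the abstract does not report, so any counts passed to it here would be illustrative rather than study data.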