Infants' visual scanning of social scenes is influenced by both exogenously and endogenously driven shifts of attention. We manipulate these factors by contrasting each infant's distribution of visual attention to the eyes relative to the mouth when viewing complex dynamic scenes with multiple communicative signals (e.g. peek-a-boo) with the same infant's distribution when viewing simpler scenes in which only a single feature moves (eyes, mouth or hands). We explore the relationship between context-dependent scanning patterns and later social and communication outcomes in two groups of infants, with and without familial risk for autism. Our findings suggest that in complex scenes requiring more endogenous control of attention, increased scanning of the mouth region relative to the eyes at 7 months is associated with superior expressive language (EL) at 36 months. This relationship holds even after controlling for outcome group. In contrast, in simple scenes where only the mouth is moving, infants who direct their attention to the repetitive moving feature, i.e. the mouth, have poorer EL at 36 months, irrespective of group membership. Taken together, our findings suggest that scanning of complex social scenes is not strikingly different at the outset in infants later diagnosed with autism.