Data citation in linguistics publications
Conference Paper


  • The creation and dissemination of reproducible research is receiving ever-growing attention in discussions of best practices in publication and education. A key element of these practices is appropriate citation of data sources. In this presentation we describe one scholar-led initiative to increase awareness of the value of data citation in scholarly communication across the discipline of linguistics. Practices in linguistics are varied, but the field is primarily a data-driven social science, in which inferences about the properties of language, human cognition, cultures and societies are drawn from observations of language. The primary data sets underlying the field are records of these observations, such as texts, audio/video recordings and annotations. While linguists have always relied on language data, they have not always made those data accessible in their publications (Berez-Kroeker et al. 2018). As a result, a great deal of published linguistic research is not reproducible, either in principle or in practice. A primary factor hindering reproducible research in linguistics is the lack of standards for data citation in scholarly publishing. Without such standards, the field continues to emphasize linguistic analyses over linguistic data, and linguists have little incentive to make the data behind their publications accessible. Since 2015, with funding from the US National Science Foundation, we have worked to develop and promote standards for citing data. We are an international team of scholars from Norway, the US, Canada and Australia, including linguistic data practitioners, scholarly communication librarians and digital archivists.
In this presentation we discuss our coordinated efforts over the past four years, including:
    Network building
      • Three international workshops to identify technical and sociological barriers to research data citation in linguistics publications.
      • The formation of the Linguistics Data Interest Group within the Research Data Alliance, with nearly 100 members from the international linguistics scholarly community.
    Outreach activities
      • Short-form technical courses and presentations offered through the Linguistic Society of America.
    Deliverable products
      • An open-access position paper (Berez-Kroeker et al. 2018).
      • The Austin Principles of Data Citation in Linguistics, which annotate the FORCE11 Joint Declaration of Data Citation Principles (Data Citation Synthesis Group 2014) for linguistic scholarship.
      • Guidelines for citing linguistic data, to be shared in late 2019 with linguistics journal editors and stylesheet curators.
      • The open-access Open Handbook of Linguistic Data Management (MIT Press Open, estimated publication date 2020).
With this presentation, we aim to encourage practitioners in other fields to initiate similar advancements, and to encourage decision-makers and publishers to actively collaborate with and support scholar-led initiatives working toward better research practices.


  • Andreassen, Helene N
  • Berez-Kroeker, Andrea
  • Collister, Lauren
  • Conzett, Philipp
  • Cox, Christopher
  • De Smedt, Koenraad
  • Gawne, Lauren
  • McDonnell, Bradley