Corpus-assisted discourse studies (CADS) results from the synergy between discourse studies (DS) and corpus linguistics (CL). Its primary goal, as Johansson (1991, 6) argues, is “the study of language(s) through corpora and other means.”
In this combination, DS explores language not “to find out about the ‘real world’ but rather to find out how ‘the real world’ is talked about” (McEnery and Hardie 2012, 135). Hence, DS contributes to this goal with probing research questions and objects of study from a mainly qualitative standpoint. CL, for its part, may be described, according to McEnery and Wilson (2001, 1), “in simple terms as the study of language based on examples of ‘real life’ language use”; it nurtures the study with ample data and quantitative methods. As a result, much CADS work produces bottom-up or top-down explorations of real communicative exchanges that are of interest to different sectors of society. Indeed, CADS has shed light on many phenomena across a large number of fields, such as sociolinguistics (Baker 2010), forensic linguistics (Cotterill in McCarthy and O’Keeffe 2010), and translation (Calzada Pérez 2018), to name but a few.
It is for this reason that we propose, as our main goal, to analyse on-the-fly language production about coronavirus from a CADS perspective. Coronavirus has entered our lives amid anguish and trepidation. Many web pages (across many genres) have been written on the issue. It is argued here that CADS is ready to provide a comparably expeditious analytical response to this linguistic emergency.
Following a CL methodology inspired by the Neo-Firthian school, the present paper proposes a bottom-up study of the CORONAVIRUS-WEB CORPUS (CWC). After a brief introduction with contextual data about the pandemic, extracted from specialised sites such as that of the European Centre for Disease Prevention and Control (https://www.ecdc.europa.eu/en) (section 1), we describe the various components of the CWC, with Wikipedia as a very prominent source of data (section 2). We then present the methods and tools of compilation (e.g. Sketch Engine) and of analysis (such as statistics, wordlists, keywords, and concordances) (sections 3 and 4). An analysis of the most prominent linguistic nodes follows (section 5), with the main aim of presenting how COVID-19 is being “talked about” (in the most typical of DS traditions) and which (didactic) lessons we can draw from our exploratory study.
Baker, Paul. 2010. Sociolinguistics and corpus linguistics. Edinburgh sociolinguistics. Edinburgh: Edinburgh University Press.
Calzada Pérez, María. 2018. “What is kept and what is lost without translation? A corpus-assisted discourse study of the European Parliament’s original and translated English”, Perspectives, 26:2, 277-291.
Johansson, Stig. 1991. “Computer corpora in English language research”. In English Computer Corpora, edited by Stig Johansson and Anna-Brita Stenström. Berlin, Boston: De Gruyter Mouton, 3-6.
McCarthy, Michael, and Anne O’Keeffe. 2010. Routledge Handbook of Corpus Linguistics. London; New York: Taylor & Francis.
McEnery, Tony, and Andrew Hardie. 2012. Corpus Linguistics: Method, Theory and Practice. Cambridge: Cambridge University Press.
McEnery, Tony, and Andrew Wilson. 2001. Corpus Linguistics. 2nd ed. Edinburgh: Edinburgh University Press.