CATESOL Book Review: Corpus Linguistics for English Teachers: New Tools, Online Resources, and Classroom Activities by Eric Friginal
By Nicole Brun-Mercer and Kara Mac Donald
Corpus linguistics (CL) is a growing field, and it offers English teachers a powerful tool for supporting their learners. As computers and web-based programs become increasingly available, classroom teachers can analyze students’ language production in both large and mini-corpora. This analysis permits a deeper understanding of learners’ vocabulary and language use patterns, which can directly inform instruction.
Corpus Linguistics for English Teachers: Overview, Definitions, and Scope
A1: Corpus Linguistics for English Teachers: An Introduction
Section A1 opens with a brief overview of the purpose of the book and a summary of some key volumes and researchers in CL. The author defines CL as the investigation of the systematic use of language; for teachers, it offers meaningful insight into learners’ naturally occurring language patterns and informs classroom instruction. Due to technological advances, the use of CL in ELT has expanded significantly, making CL accessible to teachers, but many educators are still not aware of how to use CL in the classroom.
The author emphasizes that one objective of the book is to offer up-to-date information on the growing and changing field of CL for ESL and EFL teachers, as only a limited number of texts exist for ELT educators. He provides a reading list of books that do address CL for the language classroom. The second half of Section A1 covers the fundamentals of what a corpus is, what corpus linguistics consists of, and the different types of corpora: general/reference vs. specialized corpora, written vs. spoken corpora, annotated corpora, comparable vs. parallel corpora, monitor corpora, balanced corpora, and opportunistic corpora. The author also provides a brief historical review of CL based on Friginal and Hardy (2014).
A2: Connections: CL and Instructional Technology, CALL and Data-Driven Learning
In the opening of Section A2, the author describes how CL, in its early stages of use, advanced and refocused grammar instruction by fostering the application of research to classroom instruction, with a renewed focus on form, socio-cultural discourse exploration, grammar awareness activities, and language analysis tasks. As the use of CL continued to increase, corpus-informed textbooks appeared and became more widely used.
Most recently CL has focused on three main areas: educational technology learning, computer-assisted learning, and data-driven instruction. Educational technology refers to the tools that support the teaching and learning of language elements, while technology integration focuses on how technology is used as a fundamental component of enhancing the learning process, with CL facilitating the collection and analysis of data to inform pedagogy.
Consequently, CL is situated within computer-assisted language learning (CALL), as it focuses both on technological tools for instruction and learning and on corpus analysis for research-informed pedagogy. Some skills, like grammar and vocabulary, are more closely aligned with CL, while others, like speaking and listening, are more aligned with multimedia CALL. In the next part of Section A2, the author provides a chart for evaluating CALL and CL tools, which presents Chapelle’s (1998) framework of SLA hypotheses and the associated CALL and CL usages. Another table offers guidance on standards for assessing CL tools and content, organized around assessment questions.
The final portion of Section A2 addresses data-driven learning (DDL) as a form of CL-based instruction. It discusses DDL’s theoretical foundations and the perception of its possibly limited use with proficient learners, and it offers an example exploring the teaching and learning of phraseology through CL.
A3: Analyzing and Visualizing English Using Corpora
The concordancer, the principal tool in CL, is either a stand-alone software program or a feature built into an online corpus database. It identifies words or phrases as they naturally occur in a language sample and can be used in various types of CL analysis. Section A3 opens with a discussion of different corpus analyses. For word-level analysis, researchers commonly focus on word frequency, concordance lines (sample sentences), collocations, and key word analysis. Word frequency is valuable in examining language varieties, register use, and areas for instructional focus; raw counts are normalized (i.e., normalized frequency) so that they can be compared to baseline use of the lexical item(s). Next, the section describes corpus analysis at the phrase level (i.e., language chunks), explaining concordances, collocations, multiword units, and prefabricated chunks. Other possible focuses in corpus analysis include vocabulary usage, key word analysis, lexico-syntactic measures (e.g., cohesion), and patterns of language recurrence. Samples of these focus areas appear in teacher guidance boxes, set apart from the main discussion.
The last portion of Section A3 examines CL and the visualization of linguistic data. The first topic is word clouds, which visually represent the frequency of lexical occurrence; the author describes how this data can be transformed into figures and graphs that are easier to interpret for use with students. He also shares an account of his collaboration with another academic in the development of a POS-visualizer (i.e., Text X-Ray) and writing program to support higher education writing instruction, accompanied by various samples and images of the program’s use. The section closes with a discussion of other visualizers and corpus tools, as well as other areas of CL that intersect with online and contemporary socio-linguistic genres.
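To make the two most basic operations concrete, here is a minimal Python sketch of a normalized frequency count and a keyword-in-context concordance. It is not taken from the book or from any particular concordancer: the function names (`tokenize`, `normalized_frequency`, `concordance`), the simple regular-expression tokenizer, and the per-million default are all assumptions made for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def normalized_frequency(tokens, word, per=1_000_000):
    """Occurrences of `word` per `per` tokens, so corpora of different sizes can be compared."""
    if not tokens:
        return 0.0
    return tokens.count(word.lower()) / len(tokens) * per


def concordance(tokens, word, window=3):
    """Return (left context, keyword, right context) for every occurrence of `word`."""
    word = word.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok == word:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines


if __name__ == "__main__":
    sample = "The results show that trees grow. We show the data."
    toks = tokenize(sample)
    print(normalized_frequency(toks, "show", per=1000))
    for left, kw, right in concordance(toks, "show", window=2):
        print(f"{left:>15} [{kw}] {right}")
```

Normalizing matters because a word that occurs 50 times in a 10,000-word class corpus is far more prominent than one that occurs 50 times in a million-word reference corpus; dividing by corpus size puts both on the same scale.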
Tools, Corpora, and Online Resources
B1: Corpora and Online Databases
At the outset of Section B1, the author states that he wishes to share free, readily available corpora, along with a few useful ones available for purchase. He describes the value of using an existing corpus, while also addressing questions teachers may have regarding publicly available and for-purchase corpora. The existing corpora addressed are the International Corpus of Learner English (ICLE), Michigan Corpus of Upper-Level Student Papers (MICUSP), British Academic Written English (BAWE), Michigan Corpus of Academic Spoken English (MICASE), British Academic Spoken English Corpus (BASE), Vienna-Oxford International Corpus of English (VOICE), English as a Lingua Franca in Academic Contexts (ELFA), the Louvain International Database of Spoken English Interlanguage (LINDSEI), the European Corpus of Academic Talk (EuroCoAT), the TOEFL 2000 Spoken and Written Academic Language (T2k-SWAL), the International Corpus Network of Asian Learners of English (ICNALE), British National Corpus (BNC), American National Corpus (ANC), International Corpus of English (ICE), and online collections. Many of the corpus discussions include teacher guidance samples and recommendations.
B2: Collecting Your Own (Teaching) Corpus
Section B2 addresses the corpus collection process for written and spoken texts. The section opens with an overview of how to collect a corpus yourself (i.e., DIY). Guidance is provided for the pre-collection stage, followed by the corpus collection stage. DIY corpora of student language are most often collected by teachers, serve to inform teaching and learning, and are often shared through published works.
The next portion of Section B2 provides suggestions for data collection from student written texts to explore lexico-syntactic complexity, expressions of stance, vocabulary use, comparison of informational content, and formal vs. informal syntactic features. A suggested sample for corpus collection is provided in a teacher support box, set apart from the main discussion. Published academic texts are another source of self-collected corpus data, and this discussion is accompanied by examples and supporting data figures.
The last portion of Section B2 discusses the collection and analysis of learner spoken corpora. The author shares examples of projects with descriptions of the processes used by the researchers, with tables and figures to support the readers’ understanding of the data collected. The section closes with a brief discussion of future avenues for classroom spoken corpora research.
B3: Corpus Tools, Online Resources, and Annotated Bibliography of Recent Studies
Friginal acknowledges the rapid changes in online offerings and publications and has designed Section B3 in such a way that much of the information remains valuable even several years after publication. Some of the resources listed in this section can no longer be accessed with the link provided, but they can be found through a simple internet search (e.g., Costas Gabrielatos’ annotated bibliography on published corpus-based discourse studies). Other resources are no longer available (e.g., Onlist and POS Tagger through Apps4ESL). Overall, however, this section provides ample information for the instructor and researcher to access online directories, Facebook groups, MOOCs, corpus taggers and parsers, concordancers, and other corpus-related tools. The final subsection provides annotated bibliographies of published corpus linguistics papers on (1) language teaching and learning, (2) learner writing, and (3) learner speech.
Many instructors just beginning to delve into corpus-based pedagogy might turn to the resources in this section after they have read Section C, which offers sample lessons and activities, in order to see how these corpus tools have been used in the classroom by other instructors. Reading some of the papers in the annotated bibliographies at the end of Section B3 would also offer instructors new to corpus linguistics insight into ways they could draw on these resources for their own instruction and research.
Corpus-Based Lessons and Activities in the Classroom
C1: Developing Corpus-Based Lessons in the Classroom
In Section C1, Friginal describes how he implemented corpus-based activities in an undergraduate EAP course at a School of Forestry. He compiled two corpora: one made up of student lab reports, the other of research articles from forestry and related journals. Friginal then asked students to compare the use of (1) reporting verbs, (2) linking adverbials, (3) passive structures, (4) pronouns, and (5) verb tenses.
Students examined frequency data provided by Friginal as well as concordance lines. They observed fewer tokens and types of reporting verbs in student writing, with the exception of show and find, which were both used much more frequently in student texts. Students also noted fewer linking adverbials, more present tense, less past tense, and more passive structures in student writing. The analysis of passive structures led to an activity in which students explored concordance lines for the research articles and found that professional writers in their field occasionally used we in papers by multiple authors but rarely I in single-authored articles.
Friginal includes a number of student comments about the benefits of these activities. They found the sample texts from professionals in the field particularly helpful and had specific grammar and vocabulary forms to focus on as a means of making their own writing in the field of forestry more similar to the writing of published experts.
This section would be of particular interest to an instructor teaching Writing Across the Curriculum or another content-based EAP course.
C2: CL and Vocabulary Instruction
Section C2 begins with an overview of the ways in which corpus linguistics has informed vocabulary instruction. Friginal notes that corpus-informed dictionaries provide valuable examples and frequency data so learners understand not only what a word or phrase means but also how often its various meanings or word forms are used and in which registers. In addition, corpus linguistics has shed light on semantic prosody (i.e., the positive or negative associations words have as demonstrated by highly frequent collocations) and has facilitated the creation of extensive word lists (e.g., Academic Word List, General Service List).
The remainder of Section C2 details three corpus-based vocabulary lessons. The first, authored by Jennifer Roberts, is a content-based ESP lesson that aims to improve student recognition and production of vocabulary from authentic texts. The instructor is encouraged to use WordandPhrase.info to create a word list from a course text. (Note that WordandPhrase.info is now embedded into the Corpus of Contemporary American English, COCA, https://www.english-corpora.org/coca/.) Activities can be created to provide opportunities for students to analyze and practice using the vocabulary from the word list. For instance, learners can work directly with WordandPhrase.info to examine concordance lines and excerpts from their text and determine a word’s part-of-speech, definition, synonyms, and collocates. Once they have a better understanding of the word’s meaning and use, Roberts suggests having learners create their own original sentences with the word.
The second lesson, written by Jonathan McNair, describes how to use a Japanese-English bilingual corpus in a high beginner or pre-intermediate EFL class to compare near synonyms. Students are asked to explore near synonyms (e.g., start/begin, warm/hot) in WebParaNews (WPN) to determine how idiomatic language can dictate use (e.g., start the car rather than begin the car). After this guided activity, the instructor asks students to use WPN to self-edit inappropriate word choice which the instructor has highlighted in their papers. The lesson ends with a separate COCA lesson in which students are shown how to search for the correct preposition in set phrases.
The final lesson, by Robert Nelson, explores how to use VocabProfile from LexTutor to improve essay writing. In this lesson, the target student group is studying for the TOEFL exam. Because the use of less frequent vocabulary has been shown to correlate with higher scores on the TOEFL independent writing section, this lesson is designed to help students use less frequent (more academic) vocabulary. Students copy and paste their essays into VocabProfile to determine the percentage of words that are among the (1) 1,000 most commonly used words in English, (2) 1,000 next most common words, (3) Academic Word List (AWL) words, and (4) other words (infrequent non-academic words and proper nouns). They compare their results with the results from a paper that received a higher TOEFL score. For homework, they edit their paper by replacing some highly frequent words with words from the AWL.
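The profiling step in this lesson can be approximated with a short Python sketch. This is not how VocabProfile itself is implemented; the function name `vocab_profile` and the tiny band lists in the usage example are hypothetical stand-ins, whereas a real profile would rely on full frequency lists and the complete AWL.

```python
import re
from collections import Counter


def vocab_profile(text, bands):
    """Return the percentage of tokens that fall into each band.

    `bands` maps a band name to a set of lowercase words. Bands are checked
    in the order given, and tokens found in no band are counted as "other".
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for tok in tokens:
        for name, words in bands.items():
            if tok in words:
                counts[name] += 1
                break
        else:
            # The inner loop finished without a match in any band.
            counts["other"] += 1
    total = len(tokens)
    names = list(bands) + ["other"]
    return {name: (100 * counts[name] / total if total else 0.0) for name in names}


if __name__ == "__main__":
    # Toy band lists for illustration only (hypothetical, far from complete).
    toy_bands = {
        "first_1000": {"the", "is", "of"},
        "second_1000": {"forest"},
        "awl": {"analyze", "data"},
    }
    essay = "The data is of the forest Zanzibar analyze"
    for band, pct in vocab_profile(essay, toy_bands).items():
        print(f"{band}: {pct:.1f}%")
```

Running the same function on a student essay and on a higher-scoring essay gives two sets of percentages that can be compared side by side, which mirrors the comparison step described in the lesson.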
C3: CL and Grammar Instruction
This section on corpus linguistics and grammar instruction begins with an overview of two notable corpus-based grammar resource books: Longman Grammar of Spoken and Written English (Biber et al., 1999) and Real Grammar: A Corpus-Based Approach to Language (Conrad & Biber, 2009).
Following this introduction are six grammar-related lessons. The first lesson, by Janet Beth Randall, is a homework assignment for college-level EAP students. Learners are asked to use COCA to investigate collocates and sample sentences for the verbs prove and illustrate. Finally, they complete a fill-in-the-blank activity in which they must conjugate either prove or illustrate and then write their own sentences with the two verbs.
The second lesson, authored by Sean Dunaway, focuses on telic verbs (i.e., verbs such as kick and clean that establish completion), atelic verbs (i.e., verbs such as play and swim in which no completion occurs), and stative verbs (i.e., not an event or action). After an initial set of activities to introduce students to these verb categories, the instructor asks students to find examples of the verbs in the progressive. A “further projects” section encourages students to continue their research into verb tense and aspect using both COCA and the Corpus of Global Web-Based English (GloWbE).
The third lesson, contributed by Marsha Walker, addresses quantifiers in spoken and academic registers. Students use COCA to compare the frequency of quantifiers (e.g., numerous, lots) in spoken texts and academic texts. They note examples and determine whether each quantifier can be used with count nouns, non-count nouns, or both.
The next lesson, by Tyler Heath, illustrates how to use AntConc to find and analyze the function of linking adverbials in an ESP course. Students are given a collection of legal texts to run through AntConc and asked to note down how often and where in a sentence each linking adverbial (e.g., moreover) occurs. Students then examine the text surrounding each adverbial to determine its function. The lesson includes a helpful list of linking adverbials categorized by function as well as a handout for a related homework assignment.
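The counting task in this lesson (how often each linking adverbial occurs and where in the sentence) could be sketched in Python as follows. This is an illustration rather than a description of AntConc: the function name `adverbial_positions`, the naive sentence splitter, and the restriction to single-word adverbials are all simplifying assumptions.

```python
import re
from collections import defaultdict


def adverbial_positions(text, adverbials):
    """Count sentence-initial, medial, and final uses of each single-word adverbial."""
    targets = {a.lower() for a in adverbials}
    results = defaultdict(lambda: {"initial": 0, "medial": 0, "final": 0})
    # Naive sentence split on ., !, and ? (abbreviations are not handled).
    for sentence in re.split(r"[.!?]+", text):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        for i, tok in enumerate(tokens):
            if tok not in targets:
                continue
            if i == 0:
                position = "initial"
            elif i == len(tokens) - 1:
                position = "final"
            else:
                position = "medial"
            results[tok][position] += 1
    return {word: dict(counts) for word, counts in results.items()}


if __name__ == "__main__":
    legal_text = (
        "Moreover, the court agreed. The claim, however, failed. "
        "The appeal was dismissed, therefore."
    )
    print(adverbial_positions(legal_text, ["moreover", "however", "therefore"]))
```

Counts like these only show where an adverbial appears; deciding what it does in context (addition, contrast, result) still requires students to read the surrounding text, as the lesson asks.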
Transitions are also the topic of the fifth lesson in Section C3, by Lena Emeliyanova. This lesson focuses on linking adverbials of addition, causality, and contrast/concession in the Michigan Corpus of Upper-Level Student Papers (MICUSP). Students are asked to find frequency information, position in a sentence, and sample sentences, and then to write their own sentences with some of the transition words.
The sixth lesson, contributed by Matthew Nolen, is a long-term (nine-week) project for students who have already been introduced to several corpus tools. For this project, students keep a vocabulary journal in which they choose a word or phrase they wish to explore, find the part of speech of the word, identify its frequency in a given corpus (the Michigan Corpus of Academic Spoken English, MICASE, in the example provided), find the most frequent collocates, determine patterns in collocations (e.g., the word is often followed by the passive voice), and write two sentences using the word. Nolen includes a helpful handout with step-by-step instructions and tips, along with a rubric for assessing the journals.
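The collocate step of such a journal entry can be illustrated with a small Python sketch that counts the words appearing within a few tokens of a chosen word. The function name `top_collocates` and the window size are assumptions, and the sketch uses raw co-occurrence counts only; it does not apply any statistical association measure or filter out function words.

```python
import re
from collections import Counter


def top_collocates(text, node, window=2, top_n=5):
    """Return the words that most often occur within `window` tokens of `node`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    node = node.lower()
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            # Collect neighbours on both sides, excluding the node word itself.
            neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            counts.update(neighbours)
    # Sort by descending count, then alphabetically, for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


if __name__ == "__main__":
    sample = "Strong evidence suggests that strong evidence exists."
    print(top_collocates(sample, "evidence", window=1, top_n=2))
```

With a real corpus, the most frequent neighbours are usually function words such as "the" and "of", so a student would need to look further down the list, or remove very common words first, to find the collocates worth recording.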
C4: CL and Teaching Spoken/Written Discourse
Friginal begins Section C4 with a discussion of how corpus linguistics can be used to explore register. Corpus linguistics can help elucidate differences between spoken and written texts as well as the influence of other situational factors (e.g., relationships between speakers, channel of communication). Friginal highlights work carried out to investigate (1) World Englishes, (2) speech between international teaching assistants and students, and (3) customer service call center interactions.
In the first lesson of Section C4, contributed by Maxi-Ann Campbell, students use COCA to compare the use of doing good and doing well in spoken and academic written texts. Students are asked to find frequencies and examples for these two phrases and to identify the part of speech of good and well in specific instances. Campbell’s goals for this lesson include discussing differences between spoken and written registers and showing students that English is not always used in a way that conforms to prescriptive grammar rules.
In the second lesson, Tia Gass demonstrates how to use Text Lex Compare to explore persuasion in political speeches. This lesson is particularly well suited for a public speaking, civics, composition, or political science course. Students compare two political speeches, investigating the most frequent words appearing in both texts as well as the most frequent words appearing in one but not the other. Their goal is to find patterns and to analyze differences based on political affiliation, era, or other situational features. Finally, students are encouraged to reflect on how this might inform their own word choices when speaking publicly.
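The comparison at the heart of this lesson (words shared by two texts versus words unique to each) can be sketched in a few lines of Python. This is a simplified illustration, not the tool used in the lesson; the function name `compare_texts` and the two short sample strings are invented for the example.

```python
import re
from collections import Counter


def _top(counter, n):
    """Most frequent items, ties broken alphabetically so the order is stable."""
    return sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


def compare_texts(text_a, text_b, top_n=10):
    """Return the most frequent shared words and the most frequent words unique to each text."""
    counts_a = Counter(re.findall(r"[a-z']+", text_a.lower()))
    counts_b = Counter(re.findall(r"[a-z']+", text_b.lower()))
    shared = Counter(
        {w: counts_a[w] + counts_b[w] for w in counts_a.keys() & counts_b.keys()}
    )
    only_a = Counter({w: c for w, c in counts_a.items() if w not in counts_b})
    only_b = Counter({w: c for w, c in counts_b.items() if w not in counts_a})
    return {
        "shared": _top(shared, top_n),
        "only_a": _top(only_a, top_n),
        "only_b": _top(only_b, top_n),
    }


if __name__ == "__main__":
    speech_a = "freedom freedom taxes jobs"   # invented sample text
    speech_b = "freedom jobs jobs health"     # invented sample text
    for label, words in compare_texts(speech_a, speech_b).items():
        print(label, words)
```

For classroom use, the lists unique to each speech tend to be the most revealing starting point for discussion, since they point to themes one speaker emphasizes and the other avoids.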
Professional academic writing is the topic of the third lesson, authored by Peter Dye. Students first use COCA to examine linguistic similarities and differences in different text types (spoken and written). Then AntConc is used to create a word list from a collection of instructor-provided texts. Students are asked to find the most frequent words and their collocates and then identify patterns of writing (e.g., high frequency of past tense in fiction texts and of passive voice in research articles).
The last lesson, by Cynthia Berger, describes how to use a text visualization program in a writing course. In this lesson, Text X-Ray helps students self-edit their texts by identifying words of the same part-of-speech (e.g., identify all the verbs and then check subject-verb agreement). It is also used to compare part-of-speech frequencies in one text to those in a different corpus. For instance, how does a published essay differ from a corpus of student papers in terms of the number of personal pronouns? Finally, Text X-Ray can identify which words from a target vocabulary list were used in papers that the students had written. Students then create a word cloud illustrating the most frequent words in their writing and reflect on (1) what words they are using, (2) why they are using those words, and (3) how they might incorporate other words into their writing.
The lessons throughout Section C are useful for instructors looking for innovative ways to integrate corpus linguistics into their classrooms. Each lesson ends with the author’s perspective on the challenges, benefits, and future directions of corpus linguistics in the classroom. These comments are particularly valuable for instructors who are new to corpus tools because they highlight limitations (e.g., some tools can be overwhelming for students, particularly at lower proficiency levels of English or digital literacy) and tips (e.g., when using corpus tools in the classroom, focus on one specific part of the website to avoid overwhelming students).
It should be noted, however, that the lessons vary in terms of procedure detail. Instructors new to corpus linguistics will undoubtedly need time to consider how they might adapt these lessons to their own teaching contexts and then to try out each corpus tool.
The book is a great resource for teachers who are new to CL and would like an overview of what has been, and can be, done. The author uses very accessible language to describe the foundational underpinnings of today’s tools for CL. Each section has teacher support pull-out sections to model what was discussed in the main text. The model lessons in the second portion of the text are very useful, but some may seem too complex depending on the learner population.