Chapter one lays the groundwork for the broad discussion of language program evaluation presented in subsequent chapters, with further fundamental terminology defined in chapter two. The author starts by defining applied linguistics and evaluation (as opposed to assessment and testing). Then, having established the need for language programs to be evaluated, he outlines a contextually situated model consisting of seven steps. This Context-Adaptive Model (CAM) first requires the identification of the audience and the objectives of the evaluation. Next, an inventory of the context and the language program’s essential features needs to be conducted, and a guiding inventory is offered. The third step offers guidance on how to construct a framework for the collection and analysis of data, and an example thematic framework is provided. Fourth, considerations for how to collect the necessary information are described. Fifth, the data collection process is discussed, followed by analysis of the data, with the evaluation report being the final step.
Chapter two examines the historical background of language program evaluation, with a description of the debate between positivistic, quantitative research and naturalistic, qualitative research. With the theoretical underpinning of the two major approaches to research established, the author summarizes the approaches to language program evaluation in the 1960s and 1970s, followed by more recent changes in the field. In sum, the author highlights how language program evaluation has moved away from strictly controlled quantitative research geared toward assessing student achievement of program outcomes and toward explaining and analyzing language programs’ processes, a focus that leans more toward qualitative or mixed-methods approaches.
With any research, the topic of validity is important, and chapter three is dedicated to it on both a theoretical and a practical level. First, validity from a traditional positivistic viewpoint is described, along with other positivistic conceptions of validity and a discussion of the types of threats to internal and external validity. A lot is packed into approximately ten pages, and the summary of validity from the positivistic perspective is a useful wrap-up to the discussion. Then, validity from the naturalistic perspective is defined as the connection between how researchers give account of something and their reality. Internal and external validity are replaced by descriptive, interpretive, theoretical, generalizability, and evaluative validity. The frame of trustworthiness criteria is also presented as a means to assess validity through credibility, transferability, dependability, and confirmability. Procedures for ensuring and assessing naturalistic validity are discussed: triangulation, multiple perspective negotiation, the utility criterion, and authenticity criteria, supplemented by a table with additional techniques for naturalistic validity. Again, a summary of the dense description of naturalistic validity in the preceding ten-plus pages is highly useful. The chapter closes with a comparative summary of validity from the two research perspectives.
Models for conducting positivistic research in the form of experimental and quasi-experimental designs are offered in chapter four. This chapter supports language program evaluation goals that seek to identify how well the program is operating and that are based on quantitative data from an experimental and a control group. In the case of language program evaluation, the treatment is the instructional curriculum. The experimental group consists of the particular school’s students, and the control group consists of the students to whom the school’s students are being compared. In addressing true experimental and quasi-experimental research designs, the author outlines three model structures that differ in whether a control group is used, how students are allocated to groups, and how many times data is collected. He uses example contexts (e.g. ESL school and implementation of technology mediated instruction) for each of the six research situations, which not only make the discussion tangible and relatable but also function as designs for data collection in positivistic research.
Next, chapter five offers naturalistic models for conducting program evaluation, which are based on description and interpretation of the reality or thing observed. For each evaluation model, the reader is given details on the role of the evaluator and on aspects such as evaluation goals, scope of observation, and the nature of the data gathering techniques. Additionally, this chapter features examples (vignettes) of program evaluation for each model in applied linguistics contexts, which make the models easier to understand. Notable differences between the models and their diversity in terms of perspective and methodology are also considered in this chapter (e.g. goal-oriented vs. goal-free evaluation; Smith’s metaphors for naturalistic evaluation).
As a further step toward achieving the purposes of program evaluation, chapter six focuses on the gathering and analysis of quantitative data. Norm-referenced tests (NRT) and criterion-referenced tests (CRT) are presented as the two major instruments for evaluating programs, with attention to their sensitivity and representativeness in relation to the characteristics of the context and the evaluation purposes. Additionally, the chapter highlights the most common analyses that can be used to process the data once it is collected through the tests, be they NRT or CRT. Among these, the chapter covers a wide range of statistical procedures and techniques, such as chi-square, effect size, standardized change score, analysis of covariance (ANCOVA), and value-added analysis, while providing sample analyses that offer insight into particular details such as levels of significance. All these elements give program evaluators the knowledge and tools to analyze quantitative data in most applied linguistics contexts.
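For readers unfamiliar with the procedures named above, two of them can be sketched in a few lines of Python. This is only an illustrative sketch and is not taken from the book: the effect size shown is one common formulation (Cohen's d with a pooled standard deviation), which may differ from the formula the author presents, and the function names and sample figures are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(experimental, control):
    """Effect size as the standardized mean difference (Cohen's d).

    Assumption: pooled standard deviation; other formulations divide
    by the control group's standard deviation instead.
    """
    n1, n2 = len(experimental), len(control)
    s1, s2 = stdev(experimental), stdev(control)
    pooled = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return (mean(experimental) - mean(control)) / pooled


def chi_square(observed):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (obs - expected) ** 2 / expected
    return statistic


# Hypothetical post-test scores for a program group and a comparison group.
program_scores = [2, 4, 6]
comparison_scores = [1, 3, 5]
d = cohens_d(program_scores, comparison_scores)

# Hypothetical pass/fail counts: rows are groups, columns are outcomes.
pass_fail_table = [[10, 20], [20, 10]]
chi2 = chi_square(pass_fail_table)
```

The chi-square statistic would still need to be compared against a critical value for the table's degrees of freedom in order to judge the level of significance.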
Chapter seven is devoted to the process and mechanisms of gathering and analyzing qualitative data. As in the previous chapter, a thorough description of the diverse methodologies for collecting qualitative data for program evaluation is provided, along with the steps to analyze it. In this chapter, observation is seen as one of the most important mechanisms for describing a program in a variety of ways. The chapter also provides the reader with a sample observation checklist and observation form to help the observer maintain objectivity throughout the program evaluation process. It further stresses the importance of interviews, which range from unstructured individual or group interviews (e.g. informal conversational interview) to more structured types (e.g. standardized open-ended interview), and whose key and common ingredient is listening. Questionnaires, logs, retrospective narratives, and program documents are also addressed in chapter seven as other important techniques for data gathering. Regarding the analysis of qualitative data, the use of preliminary themes, organization, systematization, coding, and classification/reduction of the data are considered essential for obtaining clear and objective interpretations and conclusions. As a final remark, the author considers it necessary to return to the data before arriving at conclusions.
In chapter eight, readers can gain insight into the possibilities of combining positivistic and naturalistic program evaluation designs with quantitative and qualitative data and analyses. The chapter features an analysis of the compatibilist and the incompatibilist perspectives, which illustrates, theoretically and methodologically, to what extent the two designs can (or cannot) be mixed, since not all evaluations can employ both. This analysis is enriched with samples of program evaluations that employed mixed designs with multiple strategies. As in previous chapters, practical guidelines and evidence are provided to reduce the risk of methodological mistakes when combining the two designs, thus avoiding contradictory or ambiguous results.
Finally, chapter nine begins by summarizing the main theoretical and practical issues surrounding language program evaluation examined in the first eight chapters before moving to the function of program evaluation in the field of applied linguistics. The author starts by highlighting three types of audiences and their role in establishing the goals of the evaluation, whose main purpose can be descriptive or judgmental. The nature of the established purposes, in turn, determines the type of data the evaluator requires (quantitative or qualitative). He emphasizes the importance of a preliminary thematic framework because it defines the starting point and the subsequent stages of the program evaluation process. The chapter serves not only as a summary of the previous chapters but also as a type of checklist for individuals doing program evaluation. The chapter closes with a discussion of how program evaluation can inform the field, as when the examination of program objectives, processes, and assessment contributes to classroom-based research, language testing, and instructional methodologies. The role of language program evaluation should not be understood as separate from applied linguistics.
Conducting a language program evaluation is a large task, and many components need to be considered beyond the salient elements that come to mind, such as student achievement and effectiveness of the curriculum. Moreover, not every program evaluation has the same function and purpose. This book provides a very accessible discussion of the theory and practices surrounding effective program evaluation, and in a sense serves as a guidebook with step-by-step instructions.