What can Data do for EFL?


In the US, something very interesting is happening as an indirect result of standards-based testing and the establishment of charter schools: a useful set of research conditions has emerged. Charter schools are themselves often experiments, established in part to be "…R&D engines for traditional public schools" (Dobbie & Fryer, 2012). As such, they often eschew many elements of traditional educational practice and instead attempt to institute the results of the last several decades of research into educational effectiveness. For many charter schools, this research-informed pedagogy assumes a unifying or rallying role, and the ideas are woven into their mission statements and school cultures as they try to take underprivileged kids and put them on the road to college through grit training, artistic expression, or higher expectations. Even a brief visit to the websites of charter systems such as Uncommon Schools, MATCH, or KIPP shows what they are intending to do and why. And some of these charter schools have become extremely successful, both in terms of academic achievement by students and in terms of popularity, and both kinds of success have created new research opportunities. You see, the best charter schools now have to resort to lotteries to choose students, and those lotteries create nicely randomized groups: those who got in and received an experimental treatment in education, and those who didn't and ended up going to more traditional schools. That gives researchers a way to compare programs by looking at what happens to the students in each group. And some of the results may surprise you.

Let’s play one of our favorite games again: Guess the Effect Size!! It’s simple. Just look at the list of interventions below and decide whether each intervention has a large (significant, important, you-should-be-doing-this) impact, or a small (minimal, puny, low-priority) impact. Ready? Let’s go!

  1. Make small classes
  2. Spend more money per student
  3. Make sure all teachers are certified
  4. Deploy as many teachers with advanced degrees as possible
  5. Have teachers give frequent feedback
  6. Make use of data to guide instruction
  7. Create a system of high-dosage tutoring
  8. Increase the amount of time for instruction
  9. Have high expectations for students

Ready to hear the answers? Well, according to Dobbie & Fryer (2012), the first four on the list are not correlated with school effectiveness, while the next five together account for a whopping 45% of the variation in school effectiveness. Looking at the list, this is not surprising, especially if you are aware of the power of formative feedback.
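If you'd like to see what an "effect size" actually is under the hood, here is a minimal sketch in Python. Everything in it is invented for illustration (the scores, the group sizes, the variable names); the lottery design described above is what justifies treating the two groups as directly comparable.

```python
import numpy as np

# Hypothetical post-treatment test scores: lottery winners (who attended
# the charter school) vs. lottery losers (who attended traditional
# schools). All numbers are invented for illustration.
winners = np.array([72, 68, 75, 81, 64, 77, 70, 73, 69, 76], dtype=float)
losers = np.array([65, 61, 70, 66, 58, 72, 63, 67, 60, 69], dtype=float)

def cohens_d(a, b):
    """Cohen's d: difference in means over the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# By Cohen's rough conventions, d near 0.2 is "small" and d near 0.8 or
# above is "large" (which is the intuition behind the game above).
print(f"Cohen's d = {cohens_d(winners, losers):.2f}")
```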

Some people might still be a little skeptical. Fine. Go and look for studies that prove Dobbie and Fryer wrong. You might find some. Then look at where the research was done. Is the setting like yours? Just going through this process means we are putting data to work. And that is much, much better than just going with our own instincts, which are of course based on our own experiences. I teach English in Japan, and I know that is a far cry from the hard-knocks neighborhoods where Dobbie and Fryer looked into the effects of interventions in MATCH schools. But I think there are enough similarities to warrant giving credence to these results, and even giving them a try at schools in Tokyo. I have several reasons. First, the research behind their list (on formative assessment, high expectations, classroom time, and pinpointed direct instruction) is extensive and robust; nothing in it is surprising. Second, in Japan, English is often as foreign to the daily lives of most students as physics or math is to the lives of many American teens, and the motivation for learning it is likewise unlikely to be very strong at the beginning. Many of the students in the MATCH system are less than confident in their ability in many subjects, and less than confident about aiming at college, a world that is often quite foreign to their lives. Many English learners in Japan similarly see English as foreign and unrelated to their lives, and the notion that they can become proficient at it and make it a part of their future social and/or professional lives requires a great leap of faith.

But through the MATCH program, students do gain in confidence, they do gain in ability, and they do get prepared for college. Given the demographic, the success of MATCH and the other "No Excuses" systems mentioned above is stunning. It also seems to be long lasting. Davis & Heller (2015) found that students who attended "No Excuses" schools were 10.0 percentage points more likely to attend college and 9.5 percentage points more likely to enroll for at least four semesters. Clearly the kids are getting more than fleeting bumps in test scores. And clearly the approach of these schools, putting proven interventions to work, is having a positive effect, although not everyone seems to be happy.

And it’s not just that they are making use of research results. These schools are putting data to use in a variety of ways. Paul Bambrick-Sotoyo of Uncommon Schools has published a book that outlines their approach very nicely. In it we can find this:

Data-driven instruction is the philosophy that schools should constantly focus on one simple question: are our students learning? Using data-based methods, these schools break from the traditional emphasis on what teachers ostensibly taught in favor of a clear-eyed, fact-based focus on what students actually learned (pg. xxv).

Driven By Data book cover

They do this by adhering to four basic principles: assessment, analysis, action, and culture. Schools must create serious interim assessments that provide meaningful data. That data must then be carefully analyzed so that it produces actionable findings. Those findings must be tied to classroom practices that build on strengths and eliminate shortcomings. And finally, all of this must occur in an environment where the culture of data-driven instruction is valued, practiced, and able to thrive. Mr. Bambrick-Santoyo also goes through a list of mistakes that most schools make, challenges that are important to meet if data is to be used to "…make student learning the ultimate test of teaching." The list reads like a checklist of standard operating procedures at almost every EFL program I have ever worked in. Inferior, infrequent, or secretive assessments? Check, check, check. Curriculum-assessment disconnect? Almost always. Separation of teaching and analysis? Usually, no analysis whatsoever. Ineffective follow-up? Har har har. I don't believe I have ever experienced or even heard of any kind of follow-up at the program level. Well, you get the point. What is happening in EFL programs in Japan now is very far removed from a system where data is put to work to maximize learning.
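To make that assessment-analysis-action loop concrete, here is a minimal sketch in Python (using pandas), and emphatically not anything taken from the book itself: a made-up set of interim assessment results, a per-objective mastery breakdown, and a hypothetical 70% bar for deciding what to reteach.

```python
import pandas as pd

# Hypothetical interim assessment results: one row per student per item,
# each item tagged with the objective it assesses. Names, objectives,
# and scores are all invented.
results = pd.DataFrame({
    "student": ["Aya", "Aya", "Ken", "Ken", "Mio", "Mio"],
    "objective": ["past tense", "listening: gist",
                  "past tense", "listening: gist",
                  "past tense", "listening: gist"],
    "correct": [1, 0, 0, 0, 1, 1],
})

# Analysis: percent correct per objective across the class.
mastery = results.groupby("objective")["correct"].mean().sort_values()
print(mastery)

# Action: flag objectives below a (hypothetical) 70% mastery bar for
# reteaching in the next instructional cycle.
for objective, rate in mastery.items():
    if rate < 0.70:
        print(f"Reteach: {objective} ({rate:.0%} mastery)")
```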

But let’s not stop at the program level. Doug Lemov has been building up a fine collection of techniques that teachers can use to improve learning outcomes. He is now up to 62 techniques that “put students on the path to college,” after starting with 49 in the earlier edition. And how does he decide on these techniques? Through a combination of videoing teachers and tracking the performance of their classes. Simple, yet revolutionary. The group I belonged to until this past April was trying to do something similar with EFL at public high schools in Japan, but the lack of standardized test taking makes it difficult to compare outcomes. But there is no doubt in my mind that this is exactly the right direction in which we should be going. Find out what works, tease out exactly what it is that is leading to improvement (identify the micro-skills), and then train people through micro-teaching to do these things and do them well and do them better still. Teaching is art, Mr. Lemov says in the introduction, but “…great art relies on the mastery and application of foundational skills” (pg.1). Mr. Lemov has done a great service to us by videoing and analyzing thousands of hours of classes, and then triangulating that with test results. And then further trialing and tweaking those results. If you don’t have copy of the book, I encourage you to do so. It’s just a shame that it isn’t all about language teaching and learning.

Interest in using data in EFL/ESL is also growing. A recent issue of Language Testing focused on diagnostic assessment, something that has grown out of the same standards-based testing that allowed the charter schools in the US to thrive. You can download a complimentary article ("Diagnostic assessment of reading and listening in a second or foreign language: Elaborating on diagnostic principles," by Luke Harding, J. Charles Alderson, and Tineke Brunfaut). You can also listen to a podcast interview with Glenn Fulcher and Eunice Eunhee Jang, one of the contributors to the special issue. It seems likely that this is an area of EFL that will continue to grow in the future.

The Slow Drive to Data in Japanese EFL


Japanese public school education, as a whole, is remarkably cost-efficient, or so it seems at first glance. Japan spends right around the OECD average per child for both primary and secondary education, and much less than the U.S., the U.K., the Scandinavian countries, or indeed most European countries. Yet Japan continually scores high on international tests of achievement in reading, math, and science. On the most recent PISA test (2012), for example, Japan was 4th in reading and science and 7th in math. This is a stunning achievement, one that most countries in the world would love to emulate.

No doubt some of these impressive results are at least partially due to factors outside the schools and classrooms of public or government-mandated education, however. We really shouldn't underestimate the effects of heavy parental spending on supplementary education, and on the expensive cram schools known as juku in particular. An industry has built up around these school-companies that boggles the minds of the uninitiated. They come in many flavors, but generally speaking they do only one thing: prepare kids to take tests, especially entrance exams. They do this through a combination of tracking entrance exams and demographics and providing intensive preparation for those tests. They are data collecting and processing machines, making extensive use of data in all parts of their operations, from advertising, to information gathering, to student performance tracking. And they do it all in a way that is extremely impressive. In Japan there is both a strong cultural emphasis on the importance of education and a climate where frequent test taking is considered normal and important. The jukus have leveraged that to create an industry that is huge, ubiquitous, and, because parents are paying 35,000-50,000 yen per month per child (roughly 420,000-600,000 yen per year), economically very significant. Combined with the general education system, the result is an education system that, although expensive and requiring serious commitments of time (evenings, holidays), is effective for the teaching of reading, math, and science.

But somehow not for English. PISA does not test English, but comparisons of norm-referenced proficiency scores across countries reveal Japan to be a poor performer. TOEFL iBT scores from 2013 show that Japan is punching below its weight. If we look at overall scores, Japan (70) is woefully behind China (77), South Korea (85), and Taiwan (79), but remarkably similar to Mongolia, Cambodia, and Laos. And if we look only at the scores for reading, the skill that receives by far the greatest amount of attention in the school system, the results are not really any better: Japan (18), China (20), South Korea (22), and Taiwan (20). The scores on the IELTS tests show a similar, though less pronounced, pattern. On the Academic version, Japan again scores lower than its Asian neighbors: Japan (5.7), South Korea (5.9), and Taiwan (6.0). Now I know some people have validity concerns about comparing countries using test data, and certainly that is true for TOEIC scores by country, because that test is so widely applied and misused. But the TOEFL iBT and the IELTS are high-stakes tests that are taken by a fairly specific, highly motivated, and well-heeled demographic. The scores say nothing about average students in those countries, not to mention the less proficient students, to be sure, but I do think they are fair to compare. And I know that students and programs are much more than the sum of students' ability to take tests, but come on. It is not totally wrong to say that almost the entire purpose of English education in junior and senior high school, and in the accompanying jukus, is to get students ready for tests, and yet the results are still pretty poor.


Percentages of students who go to juku (and how often per week), from elementary school through high school. Source: http://berd.benesse.jp/berd/data/dataclip/clip0006/

So what explains the problem? Well, this has been the subject of endless debate, from what should be taught to how it should be taught. Lots of people blame the entrance exams, but let's be careful with that. It is probably more accurate to say that the type and quality of the entrance exams prevent the power of the juku machine from helping to improve the situation. What I mean is that the types of tests jukus and most schools focus on are different from tests like the TOEFL iBT or IELTS. The TOEFL iBT and IELTS assess all four skills (reading, listening, writing, and speaking), and they do so in a way that judges whether the test taker can use those skills communicatively to understand and express ideas and information. Entrance exams in Japan, however, very often feature an abundance of contextless sentences and an abnormally large number of grammar-focused questions. Simply put: the preparation students engage in to pass high school or college entrance exams will not help all that much when they sit down to take the TOEFL iBT or IELTS.

If entrance exams tested four skills and the quality of written and spoken expression, you can bet that the jukus would find a way to prepare students for that (and a very large number of parents would be willing to pay them handsomely to do so), instead of the (mostly) discrete vocabulary and grammar items they can get away with focusing on now. You can be sure that they would find ways to bring data collection and analysis to bear if they had to deal with this new reality. The fact that their system works so well for multiple choice items, while the productive skills of English are not well-suited to multiple choice assessment, is probably one of the biggest problems for Japanese English education.

But it’s not only the tests that are a problem. The current official policy for public school classrooms favors a better balance of the four skills, using the L2 more predominantly in the classroom for procedural and communicative interaction between the teacher and the students and between the students themselves (communicative language teaching, or CLT). However, what the Course of Study pushes for and what the teachers in classrooms are able to manage is not always the same. Of the recent policy mandates, it is the Teach-English-(mostly)-in-English directive that is causing the most consternation among teachers, probably because it is so obvious and measurable. Teachers are mostly, if often tepidly, complying with this policy, and in many cases are trying hard to make it happen, according to statistics I’ve seen. These statistics on use of English are tracked regularly using questionnaires and self-reporting by teachers. And the numbers show that about 50% of teachers are now using English at least 50% of the time they are in classrooms—although there is great variation between teachers at the school level, district level, and prefectural level. Almost no one is recording classes regularly and counting the minutes, however, with this group the only exception I know. The case of CLT use is fuzzier and less reliable still, partly because interpretations of what are and are not CLT activities vary. Compliance with CLT directives is happening, but its deployment is certainly not systematic, and it is not widespread, and it is not receiving a lot of classroom time. Even these modest changes (inroads?), however, have taken tremendous effort to achieve, both in terms of government resources and effort on the part of individual English teachers who, in most cases, never experienced lessons taught in English (or using a CLT approach) themselves as students, were not trained to conduct lessons that way in pre-service education courses or training, and received very little in-service guidance or training as they attempted to comply with government directives. It’s a lot of effort and resources going toward something that might not work, something that is debatable; because not enough clear evidence exists to prove it works. Not yet, at least. Neither the public school system, nor the Education Ministry have the resources, expertise, or system for gathering English subject performance data effectively and efficiently. In classrooms, teachers rarely track performance. At the school or program level, there is no concept of tracking micro-skill development over months or years, at least none that I’ve seen. Maybe it’s happening at some private schools, but my guess is that all anyone is tracking is multiple choice test-taking performance, with maybe some vocabulary size and reading speed in programs that have their act together a little.

The reason I bring this up, however, is to make you think about what is driving this policy, and why people have the faith they have in approaches, methods, or materials without really knowing if, or to what degree, they work. In the world of EFL in Japan, a lot of faith drives a lot of programs; more specifically, a lot of faith and a lot of tradition. Walk into any mid-level high school and you can find students in English classes being prepared for multiple choice tests they will never take, for example. Within existing lessons, there is no doubt a lot of tweaking to make interventions "work" better. And while sometimes that means more effective, it could also mean more time-efficient, or easier for students to do. The honest truth is that "effective" is often hard to determine. To show a real effect, interventions (even those that have been carefully researched) must be sustained for rather long periods of time, months at least. Micro-skills are hard to settle on, hard to set goals for, and hard to track. But the potential payoff is great.

Adding more English to classrooms might make students a little better at listening (though Eiken scores comparing prefectures that differ greatly in the amount of classroom English used seem to show no correlation). And I haven't seen any data suggesting that students in Japan are doing better at anything English-wise because their teachers have tacked a little bit of "communicative" writing or speaking onto the end of regular explanation-heavy lessons. I've made this point before: a little bit of CLT dabbling is unlikely to have much effect (though this should not be interpreted as criticism of introducing more CLT, or any CLT activities, into a classroom; you gotta start somewhere, you know). I have spoken to more than a few high school and university teachers who express great alarm at the state of the grammar knowledge of the students they see regularly. The suggestion I hear is that all of this CLT stuff is coming at the expense of good old grammar teaching. While I am sure this may be impacting students' ability to tackle entrance exam questions, my own experience and my own opinion is that students these days are indeed more able to use at least a little of the knowledge of English they build up over their years in school, something that was really not the case years ago. And, by the way, if you have ever sat through grammar lessons in high schools in Japan, you probably won't think that more of that could be better for anything.

But that brings me to my point. We are all slaves to our own experience and our own perspective; as Daniel Kahneman puts it, what you see is all there is. All we seem to have is anecdotal evidence when it comes to program-level decisions. If only there were a way to take all that data generated by all that testing in Japan and make it work better for us. In closing, I'd like to leave you with a quote from John Hattie's wonderful book Visible Learning for Teachers:

“The major message, however, is that rather than recommending a particular teaching method, teachers need to be evaluators of the effect of the methods that they choose” (pg. 84)