“Okay? You’re going to an Eagles game,” the charismatic math teacher tells her eighth-grade class. She takes care to frame problems using situations that motivate students. “They’re selling hot dogs,” she continues. “They’re very good, by the way, in Philadelphia.” Students giggle. One interjects, “So are the cheesesteaks.”

The teacher brings them back to today’s lesson, simple algebraic expressions: “The hot dogs at [the] stadium where the Eagles play sell for three dollars. I want you to give me a variable expression for [the cost of] N hot dogs.” The students need to learn what it means for a letter to represent an undetermined number. It is an abstraction they must grasp in order to progress in math, but not a particularly easy one to explain.

Marcus volunteers: “N over three dollars.”

“Not over,” the teacher responds, “because that means divided.” She gives the correct expression: “Three N. Three N means however many I buy I have to pay three dollars for [each], right?” Another student is confused. “Where do you get the N from?” he asks.

“That’s the N number of hot dogs,” the teacher explains. “That’s what I’m using as my variable.” A student named Jen asks if that means you should multiply. “That’s right. So if I got two hot dogs, how much money am I spending?”

Six dollars, Jen answers correctly.

“Three times two. Good, Jen.” Another hand shoots up. “Yes?”

“Can it be any letter?” Michelle wants to know. Yes, it can.

“But isn’t it confusing?” Brandon asks.

It can be any letter at all, the teacher explains. On to part two of today’s lesson: evaluating expressions.

“What I just did with the three dollars for a hot dog was ‘evaluating an expression,’” the teacher explains. She points to “7H” on the board and asks, if you make seven dollars an hour and work two hours this week, how much would you earn? Fourteen, Ryan answers correctly. What about if you worked ten hours? Seventy, Josh says. The teacher can see they’re getting it. Soon, though, it will become clear that they never actually understood the expression; they just figured out that they should multiply whatever two numbers the teacher said aloud.

“What we just did was we took the number of hours and did what? Michelle?” Multiplied it by seven, Michelle answers. Right, but really what we did, the teacher explains, was put it into the expression where H is. “That’s what evaluating means,” she adds, “substituting a number for a variable.”

But now another girl is confused. “So for the hot-dog thing, would the N be two?” she asks. “Yes. We substituted two for the N,” the teacher replies. “We evaluated that example.” Why, then, the girl wants to know, can’t you just write however many dollars a hot dog costs times two? If N is just two, what sense does it make to write “N” instead of “2”?

The students ask more questions that slowly make clear they have failed to connect the abstraction of a variable to more than a single particular number for any given example. When she tries to move back to a realistic context—“social studies class is three times as long as math”—they are totally lost. “I thought fifth period was the longest?” one chimes in. When the students are asked to turn phrases into variable expressions, they have to start guessing.

“What if I say ‘six less than a number’? Michelle?” the teacher asks.

“Six minus N,” Michelle answers. Incorrect.

Aubrey guesses the only other possibility: “N minus six.” Great.

The kids repeat this form of platoon multiple choice. Watched in real time, it can give the impression that they understand.

“What if I gave you 15 minus B?” the teacher asks the class, telling them to transform that back into words. Multiple-choice time. “Fifteen less than B?” Patrick offers. The teacher does not respond immediately, so he tries something else. “B less than 15.” This time the response is immediate; he nailed it. The pattern repeats. Kim is six inches shorter than her mother. “N minus negative six,” Steve offers. No. “N minus six.” Good. Mike is three years older than Jill. Ryan? “Three X,” he says. No, that would be multiply, wouldn’t it? “Three plus X.” Great.

Marcus has now figured out the surefire way to get to the right answer. His hand shoots up for the next question. Three divided by W. Marcus? “W over three, or three over W,” he answers, covering his bases. Good, three over W, got it.

Despite the teacher’s clever vignettes, it is clear that students do not understand how these numbers and letters might be useful anywhere but on a school worksheet. When she asks where variable expressions might be used in the world, Patrick answers: when you’re trying to figure out math problems. Still, the students have figured out how to get the right answers on their worksheets: by shrewdly interrogating their teacher.

She mistakes the multiple-choice game they are mastering for productive exploration. Sometimes, the students team up. In staccato succession: “K over eight,” one offers, “K into eight,” another says, “K of eight,” a third tries. The teacher is kind and encouraging even if they don’t manage to toss out the right answer. “It’s okay,” she says, “you’re thinking.” The problem, though, is the way in which they are thinking.


  • • •


That was one American class period out of hundreds in the United States, Asia, and Europe that were filmed and analyzed in an effort to understand effective math teaching. Needless to say, classrooms were very different. In the Netherlands, students regularly trickled into class late, and spent a lot of class time working on their own. In Hong Kong, class looked pretty similar to the United States: lectures rather than individual work filled most of the time. Some countries used a lot of problems in real-world contexts, others relied more on symbolic math. Some classes kept kids in their seats, others had them approach the blackboard. Some teachers were very energetic, others staid. The litany of differences was long, but not one of those features was associated with differences in student achievement across countries. There were similarities too. In every classroom in every country, teachers relied on two main types of questions.

The more common were “using procedures” questions: basically, practice at something that was just learned. For instance, take the formula for the sum of the interior angles of a polygon (180 × (number of polygon sides − 2)), and apply it to polygons on a worksheet. The other common variety was “making connections” questions, which connected students to a broader concept, rather than just a procedure. That was more like when the teacher asked students why the formula works, or made them try to figure out if it works for absolutely any polygon from a triangle to an octagon. Both types of questions are useful and both were posed by teachers in every classroom in every country studied. But an important difference emerged in what teachers did after they asked a making-connections problem.
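As a minimal sketch of what a "using procedures" exercise amounts to, the polygon formula above can be applied mechanically to each shape on a worksheet. The function name is illustrative, not taken from the study.

```python
def interior_angle_sum(sides: int) -> int:
    """Sum of the interior angles, in degrees, of a simple polygon."""
    if sides < 3:
        raise ValueError("a polygon needs at least 3 sides")
    return 180 * (sides - 2)


# Apply the formula to every polygon from a triangle to an octagon.
for sides in range(3, 9):
    print(sides, interior_angle_sum(sides))
```

Running the procedure says nothing about why the formula works, which is what a making-connections question would ask.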

Rather than letting students grapple with some confusion, teachers often responded to their solicitations with hint-giving that morphed a making-connections problem into a using-procedures one. That is exactly what the charismatic teacher in the American classroom was doing. Lindsey Richland, a University of Chicago professor who studies learning, watched that video with me, and told me that when the students were playing multiple choice with the teacher, “what they’re actually doing is seeking rules.” They were trying to turn a conceptual problem they didn’t understand into a procedural one they could just execute. “We’re very good, humans are, at trying to do the least amount of work that we have to in order to accomplish a task,” Richland told me. Soliciting hints toward a solution is both clever and expedient. The problem is that when it comes to learning concepts that can be broadly wielded, expedience can backfire.

In the United States, about one-fifth of questions posed to students began as making-connections problems. But by the time the students were done soliciting hints from the teacher and solving the problems, a grand total of zero percent remained making-connections problems. Making-connections problems did not survive the teacher-student interactions.

Teachers in every country fell into the same trap at times, but in the higher-performing countries plenty of making-connections problems remained that way as the class struggled to figure them out. In Japan, a little more than half of all problems were making-connections problems, and half of those stayed that way through the solving. An entire class period could be just one problem with many parts. When a student offered an idea for how to approach a problem, rather than engaging in multiple choice, the teacher had them come to the board and put a magnet with their name on it next to the idea. By the end of class, one problem on a blackboard the size of an entire wall served as a captain’s log of the class’s collective intellectual voyage, dead ends and all. Richland originally tried to label the videotaped lessons with a single topic of the day, “but we couldn’t do it with Japan,” she said, “because you could engage with these problems using so much different content.” (There is a specific Japanese word to describe chalkboard writing that tracks conceptual connections over the course of collective problem solving: bansho.)

Just as it is in golf, procedure practice is important in math. But when it comprises the entire math training strategy, it’s a problem. “Students do not view mathematics as a system,” Richland and her colleagues wrote. They view it as just a set of procedures. Like when Patrick was asked how variable expressions connected to the world, and answered that they were good for answering questions in math class.

In their research, Richland and her collaborators highlighted the stunning degree of reliance community college students—41 percent of all undergraduate students in the United States—have on memorized algorithms. Asked whether a/5 or a/8 is greater, 53 percent of students answered correctly, barely better than guessing. Asked to explain their answers, students frequently pointed to some algorithm. Students remembered that they should focus on the bottom number, but a lot of them recalled that a larger denominator meant a/8 was bigger than a/5. Others remembered that they should try to get a common denominator, but weren’t sure why. There were students who reflexively cross-multiplied, because they knew that’s what you do when you see fractions, even though it had no relevance to the problem at hand. Only 15 percent of the students began with broad, conceptual reasoning that if you divide something into five parts, each piece will be larger than if you divide the same thing into eight parts. Every single one of those students got the correct answer.

Some of the college students seemed to have unlearned number sense that most children have, like that adding two numbers gives you a third composed of the first two. A student who was asked to verify that 462 + 253 = 715 subtracted 253 from 715 and got 462. When he was asked for another strategy, he could not come up with subtracting 462 from 715 to see that it equals 253, because the rule he learned was to subtract the number to the right of the plus sign to check the answer.

When younger students bring home problems that force them to make connections, Richland told me, “parents are like, ‘Lemme show you, there’s a faster, easier way.’” If the teacher didn’t already turn the work into using-procedures practice, well-meaning parents will. They aren’t comfortable with bewildered kids, and they want understanding to come quickly and easily. But for learning that is both durable (it sticks) and flexible (it can be applied broadly), fast and easy is precisely the problem.


  • • •

“Some people argue that part of the reason U.S. students don’t do as well on international measures of high school knowledge is that they’re doing too well in class,” Nate Kornell, a cognitive psychologist at Williams College, told me. “What you want is to make it easy to make it hard.”

Kornell was explaining the concept of “desirable difficulties,” obstacles that make learning more challenging, slower, and more frustrating in the short term, but better in the long term. Excessive hint-giving, like in the eighth-grade math classroom, does the opposite; it bolsters immediate performance, but undermines progress in the long run. Several desirable difficulties that can be used in the classroom are among the most rigorously supported methods of enhancing learning, and the engaging eighth-grade math teacher accidentally subverted all of them in the well-intended interest of before-your-eyes progress.

One of those desirable difficulties is known as the “generation effect.” Struggling to generate an answer on your own, even a wrong one, enhances subsequent learning. Socrates was apparently on to something when he forced pupils to generate answers rather than bestowing them. It requires the learner to intentionally sacrifice current performance for future benefit.

Kornell and psychologist Janet Metcalfe tested sixth graders in the South Bronx on vocabulary learning, and varied how they studied in order to explore the generation effect. Students were given some of the words and definitions together. For example, To discuss something in order to come to an agreement: Negotiate. For others, they were shown only the definition and given a little time to think of the right word, even if they had no clue, before it was revealed. When they were tested later, students did way better on the definition-first words. The experiment was repeated on students at Columbia University, with more obscure words (Characterized by haughty scorn: Supercilious). The results were the same. Being forced to generate answers improves subsequent learning even if the generated answer is wrong. It can even help to be wildly wrong. Metcalfe and colleagues have repeatedly demonstrated a “hypercorrection effect.” The more confident a learner is of their wrong answer, the better the information sticks when they subsequently learn the right answer. Tolerating big mistakes can create the best learning opportunities.*

Kornell helped show that the long-run benefits of facilitated screwups extend to primates only slightly less studious than Columbia students. Specifically, to Oberon and Macduff, two rhesus macaques trained to learn lists by trial and error. In a fascinating experiment, Kornell worked with an animal cognition expert to give Oberon and Macduff lists of random pictures to memorize, in a particular order. (Example: a tulip, a school of fish, a cardinal, Halle Berry, and a raven.) The pictures were all displayed simultaneously on a screen. By pressing them in trial-and-error fashion, the monkeys had to learn the desired order and then practice it repeatedly. But all practice was not designed equal.

In some practice sessions, Oberon (who was generally brighter) and Macduff were automatically given hints on every trial, showing them the next picture in the list. For other lists, they could voluntarily touch a hint box on the screen whenever they were stuck and wanted to be shown the next item. For still other lists, they could ask for a hint on half of their practice attempts. And for a final group of lists, no hints at all.

In the practice sessions with hints upon request, the monkeys behaved a lot like humans. They almost always requested hints when they were available, and thus got a lot of the lists right. Overall, they had about 250 trials to learn each list.

After three days of practice, the scientists took off the training wheels. Starting on day four, the memorizing monkeys had to repeat all the lists from every training condition without any hints whatsoever. It was a performance disaster. Oberon only got about one-third of the lists right. Macduff got less than one in five. There was, though, an exception: the lists on which they never had hints at all.

For those lists, on day one of practice the duo had performed terribly. They were literally monkeys hitting buttons. But they improved steadily each training day. On test day, Oberon nailed almost three-quarters of the lists that he had learned with no hints. Macduff got about half of them.

The overall experiment results went like this: the more hints that were available during training, the better the monkeys performed during early practice, and the worse they performed on test day. For the lists that Macduff spent three days practicing with automatic hints, he got zero correct. It was as if the pair had suddenly unlearned every list that they practiced with hints. The study conclusion was simple: “training with hints did not produce any lasting learning.”

Training without hints is slow and error-ridden. It is, essentially, what we normally think of as testing, except for the purpose of learning rather than evaluation—when “test” becomes a dreaded four-letter word. The eighth-grade math teacher was essentially testing her students in class, but she was facilitating or outright giving them the answers.

Used for learning, testing, including self-testing, is a very desirable difficulty. Even testing prior to studying works, at the point when wrong answers are assured. In one of Kornell’s experiments, participants were made to learn pairs of words and later tested on recall. At test time, they did the best with pairs that they learned via practice quizzes, even if they had gotten the answers on those quizzes wrong. Struggling to retrieve information primes the brain for subsequent learning, even when the retrieval itself is unsuccessful. The struggle is real, and really useful. “Like life,” Kornell and team wrote, “retrieval is all about the journey.”


  • • •

If that eighth-grade classroom followed a typical academic plan over the course of the year, it is precisely the opposite of what science recommends for durable learning—one topic was probably confined to one week and another to the next. Like a lot of professional development efforts, each particular concept or skill gets a short period of intense focus, and then on to the next thing, never to return. That structure makes intuitive sense, but it forgoes another important desirable difficulty: “spacing,” or distributed practice.

It is what it sounds like—leaving time between practice sessions for the same material. You might call it deliberate not-practicing between bouts of deliberate practice. “There’s a limit to how long you should wait,” Kornell told me, “but it’s longer than people think. It could be anything, studying foreign language vocabulary or learning how to fly a plane, the harder it is, the more you learn.” Space between practice sessions creates the hardness that enhances learning. One study separated Spanish vocabulary learners into two groups—a group that learned the vocab and then was tested on it the same day, and a second that learned the vocab but was tested on it a month later. Eight years later, with no studying in the interim, the latter group retained 250 percent more. For a given amount of Spanish study, spacing made learning more productive by making it easy to make it hard.

It does not take nearly that long to see the spacing effect. Iowa State researchers read people lists of words, and then asked for each list to be recited back either right away, after fifteen seconds of rehearsal, or after fifteen seconds of doing very simple math problems that prevented rehearsal. The subjects who were allowed to reproduce the lists right after hearing them did the best. Those who had fifteen seconds to rehearse before reciting came in second. The group distracted with math problems finished last. Later, when everyone thought they were finished, they were all surprised with a pop quiz: write down every word you can recall from the lists. Suddenly, the worst group became the best. Short-term rehearsal gave purely short-term benefits. Struggling to hold on to information and then recall it had helped the group distracted by math problems transfer the information from short-term to long-term memory. The group with more and immediate rehearsal opportunity recalled nearly nothing on the pop quiz. Repetition, it turned out, was less important than struggle.

It isn’t bad to get an answer right while studying. Progress just should not happen too quickly, unless the learner wants to end up like Oberon (or, worse, Macduff), with a knowledge mirage that evaporates when it matters most. As with excessive hint-giving, it will, as a group of psychologists put it, “produce misleadingly high levels of immediate mastery that will not survive the passage of substantial periods of time.” For a given amount of material, learning is most efficient in the long run when it is really inefficient in the short run. If you are doing too well when you test yourself, the simple antidote is to wait longer before practicing the same material again, so that the test will be more difficult when you do. Frustration is not a sign you are not learning, but ease is.

Platforms like Medium and LinkedIn are absolutely rife with posts about shiny new, unsupported learning hacks that lead to mind-blowingly rapid progress—from special dietary supplements and “brain-training” apps to audio cues meant to alter brain waves. In 2007, the U.S. Department of Education published a report by six scientists and an accomplished teacher who were asked to identify learning strategies that truly have scientific backing. Spacing, testing, and using making-connections questions were on the extremely short list. All three impair performance in the short term.

As with the making-connections questions Richland studied, it is difficult to accept that the best learning road is slow, and that doing poorly now is essential for better performance later. It is so deeply counterintuitive that it fools the learners themselves, both about their own progress and their teachers’ skill. Demonstrating that required an extraordinary study, one that only a setting like the U.S. Air Force Academy could provide.


  • • •

In return for full scholarships, cadets at the Air Force Academy commit to serve as military officers for a minimum of eight years after graduation.* They submit to a highly structured and rigorous academic program heavy on science and engineering. It includes a minimum of three math courses for every student.

Every year, an algorithm randomly assigns incoming cadets to sections of Calculus I, each with about twenty students. To examine the impact of professors, two economists compiled data on more than ten thousand cadets who had been randomly assigned to calculus sections taught by nearly a hundred professors over a decade. Every section used the exact same syllabus, the exact same exam, and the exact same post-course professor evaluation form for cadets to fill out.

After Calculus I, students were randomized again to Calculus II sections, again with the same syllabus and exam, and then again to more advanced math, science, and engineering courses. The economists confirmed that standardized test scores and high school grades were spread evenly across sections, so the instructors were facing similar challenges. The Academy even standardized test-grading procedures, so every student was evaluated in the same manner. “Potential ‘bleeding heart’ professors,” the economists wrote, “had no discretion to boost grades.” That was important, because they wanted to see what differences individual teachers made.

Unsurprisingly, there was a group of Calculus I professors whose instruction most strongly boosted student performance on the Calculus I exam, and who got sterling student evaluation ratings. Another group of professors consistently added less to student performance on the exam, and students judged them more harshly in evaluations. But when the economists looked at another, longer-term measure of teacher value added—how those students did on subsequent math and engineering courses that required Calculus I as a prerequisite—the results were stunning. The Calculus I teachers who were the best at promoting student overachievement in their own class were somehow not great for their students in the long run. “Professors who excel at promoting contemporaneous student achievement,” the economists wrote, “on average, harm the subsequent performance of their students in more advanced classes.” What looked like a head start evaporated.

The economists suggested that the professors who caused short-term struggle but long-term gains were facilitating “deep learning” by making connections. They “broaden the curriculum and produce students with a deeper understanding of the material.” It also made their courses more difficult and frustrating, as evidenced by both the students’ lower Calculus I exam scores and their harsher evaluations of their instructors. And vice versa. The calculus professor who ranked dead last in deep learning out of the hundred studied—that is, his students underperformed in subsequent classes—was sixth in student evaluations, and seventh in student performance during his own class. Students evaluated their instructors based on how they performed on tests right now—a poor measure of how well the teachers set them up for later development—so they gave the best marks to professors who provided them with the least long-term benefit. The economists concluded that students were actually selectively punishing the teachers who provided them the most long-term benefit. Tellingly, Calculus I students whose teachers had fewer qualifications and less experience did better in that class, while the students of more experienced and qualified teachers struggled in Calculus I but did better in subsequent courses.

A similar study was conducted at Italy’s Bocconi University, on twelve hundred first-year students who were randomized into introductory course sections in management, economics, or law, and then the courses that followed them in a prescribed sequence over four years. It showed precisely the same pattern. Teachers who guided students to overachievement in their own course were rated highly, and undermined student performance in the long run.

Psychologist Robert Bjork first used the phrase “desirable difficulties” in 1994. Twenty years later, he and a coauthor concluded a book chapter on applying the science of learning like this: “Above all, the most basic message is that teachers and students must avoid interpreting current performance as learning. Good performance on a test during the learning process can indicate mastery, but learners and teachers need to be aware that such performance will often index, instead, fast but fleeting progress.”


  • • •

Here is the bright side: over the past forty years, Americans have increasingly said in national surveys that current students are getting a worse education than they themselves did, and they have been wrong. Scores from the National Assessment of Educational Progress, “the nation’s report card,” have risen steadily since the 1970s. Unquestionably, students today have a mastery of basic skills that is superior to that of students of the past. School has not gotten worse. The goals of education have just become loftier.

Education economist Greg Duncan, one of the most influential education professors in the world, has documented this trend. Focusing on “using procedures” problems worked well forty years ago when the world was flush with jobs that paid middle-class salaries for procedural tasks, like typing, filing, and working on an assembly line. “Increasingly,” according to Duncan, “jobs that pay well require employees to be able to solve unexpected problems, often while working in groups. . . . These shifts in labor force demands have in turn put new and increasingly stringent demands on schools.”

Here is a math question from the early 1980s basic skills test of all public school sixth graders in Massachusetts:

Carol can ride her bike 10 miles per hour. If Carol rides her bike to the store, how long will it take?

To solve this problem, you would need to know:

  A) How far it is to the store.
  B) What kind of bike Carol has.
  C) What time Carol will leave.
  D) How much Carol has to spend.

And here is a question Massachusetts sixth graders got in 2011:

Paige, Rosie, and Cheryl each spent exactly $9.00 at the same snack bar.

  • Paige bought 3 bags of peanuts.
  • Rosie bought 2 bags of peanuts and 2 pretzels.
  • Cheryl bought 1 bag of peanuts, 1 pretzel, and 1 milk shake.
  1. What is the cost, in dollars, of 1 bag of peanuts? Show or explain how you got your answer.
  2. What is the cost, in dollars, of 1 pretzel? Show or explain how you got your answer.
  3. What is the total number of pretzels that can be bought for the cost of 1 milk shake? Show or explain how you got your answer.

For every problem like the first one, the simple formula “distance = rate × time” could be memorized and applied. The second problem requires the connection of multiple concepts that are then applied to a new situation. The teaching strategies that current teachers experienced when they were students are no longer good enough. Knowledge increasingly needs not merely to be durable, but also flexible—both sticky and capable of broad application.
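One way to see why the 2011 snack-bar question cannot be answered with a single memorized formula is to work it through: each purchase supplies one relationship, and each answer feeds the next. The sketch below is only an illustration; the variable names are assumptions, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

total = Fraction(9)  # each girl spent exactly $9.00

# Paige: 3 bags of peanuts cost $9.
peanuts = total / 3

# Rosie: 2 bags of peanuts + 2 pretzels cost $9.
pretzel = (total - 2 * peanuts) / 2

# Cheryl: 1 bag of peanuts + 1 pretzel + 1 milk shake cost $9.
milk_shake = total - peanuts - pretzel

# How many pretzels can be bought for the cost of one milk shake?
pretzels_per_shake = milk_shake / pretzel

print(float(peanuts), float(pretzel), float(milk_shake), int(pretzels_per_shake))
```

Under these relationships a bag of peanuts costs $3.00, a pretzel $1.50, and a milk shake $4.50, so one milk shake buys three pretzels.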

Toward the end of the eighth-grade math class that I watched with Lindsey Richland, the students settled into a worksheet for what psychologists call “blocked” practice. That is, practicing the same thing repeatedly, each problem employing the same procedure. It leads to excellent immediate performance, but for knowledge to be flexible, it should be learned under varied conditions, an approach called varied or mixed practice, or, to researchers, “interleaving.”

Interleaving has been shown to improve inductive reasoning. When presented with different examples mixed together, students learn to create abstract generalizations that allow them to apply what they learned to material they have never encountered before. For example, say you plan to visit a museum and want to be able to identify the artist (Cézanne, Picasso, or Renoir) of paintings there that you have never seen. Before you go, instead of studying a stack of Cézanne flash cards, and then a stack of Picasso flash cards, and then a stack of Renoir, you should put the cards together and shuffle, so they will be interleaved. You will struggle more (and probably feel less confident) during practice, but be better equipped on museum day to discern each painter’s style, even for paintings that weren’t in the flash cards.
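The difference between the two study orders in the flash-card example can be sketched in a few lines. This is a hypothetical illustration: the deck contents, function names, and the fixed seed are assumptions made for the example, not details from the research.

```python
import random
from itertools import chain


def blocked_order(decks):
    """Study every card of one artist before moving on to the next."""
    return list(chain.from_iterable(decks.values()))


def interleaved_order(decks, seed=None):
    """Pool all of the decks and shuffle so the artists are mixed together."""
    cards = blocked_order(decks)
    random.Random(seed).shuffle(cards)
    return cards


# Hypothetical flash cards: (artist, card number) pairs stand in for paintings.
decks = {
    artist: [(artist, i) for i in range(1, 5)]
    for artist in ("Cézanne", "Picasso", "Renoir")
}

blocked = blocked_order(decks)
mixed = interleaved_order(decks, seed=0)
```

Both orders contain exactly the same cards; only the sequence differs, which is the whole of the manipulation.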

In a study using college math problems, students who learned in blocks—all examples of a particular type of problem at once—performed a lot worse come test time than students who studied the exact same problems but all mixed up. The blocked-practice students learned procedures for each type of problem through repetition. The mixed-practice students learned how to differentiate types of problems.

The same effect has appeared among learners studying everything from butterfly species identification to psychological-disorder diagnosis. In research on naval air defense simulations, individuals who engaged in highly mixed practice performed worse than blocked practicers during training, when they had to respond to potential threat scenarios that became familiar over the course of the training. At test time, everyone faced completely new scenarios, and the mixed-practice group destroyed the blocked-practice group.

And yet interleaving tends to fool learners about their own progress. In one of Kornell and Bjork’s interleaving studies, 80 percent of students were sure they had learned better with blocked than mixed practice, whereas 80 percent performed in a manner that proved the opposite. The feeling of learning, it turns out, is based on before-your-eyes progress, while deep learning is not. “When your intuition says block,” Kornell told me, “you should probably interleave.”

Interleaving is a desirable difficulty that frequently holds for both physical and mental skills. A simple motor-skill example is an experiment in which piano students were asked to learn to execute, in one-fifth of a second, a particular left-hand jump across fifteen keys. They were allowed 190 practice attempts. Some used all of those practicing the fifteen-key jump, while others switched between eight-, twelve-, fifteen-, and twenty-two-key jumps. When the piano students were invited back for a test, those who underwent the mixed practice were faster and more accurate at the fifteen-key jump than the students who had only practiced that exact jump. The “desirable difficulty” coiner himself, Robert Bjork, once commented on Shaquille O’Neal’s perpetual free-throw woes to say that instead of continuing to practice from the free-throw line, O’Neal should practice from a foot in front of and behind it to learn the motor modulation he needed.

Whether the task is mental or physical, interleaving improves the ability to match the right strategy to a problem. That happens to be a hallmark of expert problem solving. Whether chemists, physicists, or political scientists, the most successful problem solvers spend mental energy figuring out what type of problem they are facing before matching a strategy to it, rather than jumping in with memorized procedures. In that way, they are just about the precise opposite of experts who develop in kind learning environments, like chess masters, who rely heavily on intuition. Kind learning environment experts choose a strategy and then evaluate; experts in less repetitive environments evaluate and then choose.


  • • •

Desirable difficulties like testing and spacing make knowledge stick. It becomes durable. Desirable difficulties like making connections and interleaving make knowledge flexible, useful for problems that never appeared in training. All slow down learning and make performance suffer, in the short term. That can be a problem, because like the Air Force cadets, we all reflexively assess our progress by how we are doing right now. And like the Air Force cadets, we are often wrong.

In 2017, Greg Duncan, the education economist, along with psychologist Drew Bailey and colleagues, reviewed sixty-seven early childhood education programs meant to boost academic achievement. Programs like Head Start did give a head start, but academically that was about it. The researchers found a pervasive “fadeout” effect, where a temporary academic advantage quickly diminished and often completely vanished. On a graph, it looks eerily like the ones that show future elite athletes catching up to their peers who got a head start in deliberate practice.

A reason for this, the researchers concluded, is that early childhood education programs teach “closed” skills that can be acquired quickly with repetition of procedures, but that everyone will pick up at some point anyway. The fadeout was not a disappearance of skill so much as the rest of the world catching up. The motor-skill equivalent would be teaching a kid to walk a little early. Everyone is going to learn it anyway, and while it might be temporarily impressive, there is no evidence that rushing it matters.

The research team recommended that if programs want to impart lasting academic benefits they should focus instead on “open” skills that scaffold later knowledge. Teaching kids to read a little early is not a lasting advantage. Teaching them how to hunt for and connect contextual clues to understand what they read can be. As with all desirable difficulties, the trouble is that a head start comes fast, but deep learning is slow. “The slowest growth,” the researchers wrote, occurs “for the most complex skills.”

Duncan landed on the Today show discussing his team’s findings. The counteropinion was supplied by parents and an early childhood teacher who were confident that they could see a child’s progress. That is not in dispute. The question is how well they can judge the impact on future learning, and the evidence says that, like the Air Force cadets, the answer is not very well.*

Before-our-eyes progress reinforces our instinct to do more of the same, but just as in the case of the typhoid doctor, the feedback teaches the wrong lesson. Learning deeply means learning slowly. The cult of the head start fails the learners it seeks to serve.

Knowledge with enduring utility must be very flexible, composed of mental schemes that can be matched to new problems. The virtual naval officers in the air defense simulation and the math students who engaged in interleaved practice were learning to recognize deep structural commonalities in types of problems. They could not rely on the same type of problem repeating, so they had to identify underlying conceptual connections in simulated battle threats, or math problems, that they had never actually seen before. They then matched a strategy to each new problem. When a knowledge structure is so flexible that it can be applied effectively even in new domains or extremely novel situations, it is called “far transfer.”

There is a particular type of thinking that facilitates far transfer—a type that Alexander Luria’s Uzbek villagers could not employ—and that can seem far-fetched precisely because of how far it transfers. And it’s a mode of broad thinking th…

Extract from the book “Range.”