About the Original Article's Tone

This is a peer-reviewed empirical journal article published in Instructional Science (2024) — a cognitive psychology / learning sciences journal, not a math education or ESOL-specific one. The intended audience is researchers in instructional design, cognitive load theory, and mathematics education. It is not written for classroom teachers.

The vibe: This feels like a careful lab report that wandered into a real classroom. You get a sense of researchers who genuinely tried to do this right under difficult conditions (during COVID, in urban sheltered-algebra classes, with a small N) — but the writing is designed to satisfy peer reviewers, not to make you want to run this in your classroom on Monday.

What it glosses over: The article says sentence frames were used but doesn't show you what they looked like at scale. The appendices have four sample WEPs (Worked Example Pairs) but no systematic description of how many frames were used, which linguistic structures they targeted, or how the phrase bank was organized. You get the forest ("sentence frames help ELs organize responses") but not the trees. For MASL purposes, this gap is exactly where the original contribution lives.

Visual Metaphor

A songbird hatches in a nest it didn't choose. Before it can form a single note, it spends weeks hearing the same pattern from its parent — the same phrase, the same rhythm, the same specific call. It cannot produce it yet. But something is being inscribed: the neural template that will govern its voice for the rest of its life. Then one morning it opens its bill and out comes the song. Not a rough draft. The actual song. Learned before it could be spoken. The scaffold wasn't training wheels. It was the template itself.

What This Is Really About

You've heard the argument before: kids learn by doing, not by being shown. Worked examples are "passive." Sentence frames are "crutches." Let students figure it out themselves and they'll develop real understanding. It's an appealing idea. It's also, for students juggling a second language and algebra at the same time, often wrong.

Ke and Newton took the existing research on comparing worked examples — which has a solid track record in mainstream algebra classrooms — and asked the obvious question no one had gotten around to asking: does this work for English Learners (ELs) too? And if so, what modifications do you actually need to make?

The Core Idea

Modified for Language Support–Worked Example Pairs (MLS-WEPs) are the standard Worked Example Pair (WEP) curriculum adapted specifically for ELs in sheltered algebra classes. The core structure is unchanged: you put two worked examples side by side, and students compare them. What changed is everything around that comparison:

  • Sentence frames that scaffold written and oral explanations
  • Simplified prompt language
  • Example-before-definition sequencing
  • Explicit permission to reason in students' native languages

The four types of WEPs used in this study — which correspond directly to different comparison purposes — are:

  1. Which is better? — Two correct methods, one more efficient; students determine which is better in which circumstances
  2. Why does it work? — Two correct methods, one showing the conceptual rationale; students explain the why, not just the how
  3. Which is correct? — One correct and one incorrect method; students identify the error and explain it
  4. How do they differ? — Two different problem types solved similarly; illuminates structural mathematical features students often conflate

What They Found

The study ran across two algebra units — Linear Equations (Unit 1) and Functions (Unit 2) — using a waitlist design: Teacher 1's students got the MLS-WEPs intervention in Unit 1 while Teacher 2's were the control, then they switched for Unit 2. This meant both teachers were in both conditions, and both groups eventually received the intervention.

The main findings, controlling for prior knowledge:

  • Intervention students outperformed controls on both calculation accuracy and written explanation quality, with large effects (f² = 0.96–2.03)
  • The benefit did not vary by English proficiency level (WIDA 1–5)
  • Unit 1 intervention students carried their explanation gains into the Unit 2 pretest, a transfer to a different mathematical topic

Why This Challenges the Status Quo

Math teachers who work with ELs are frequently told to "simplify" — reduce language demands, provide more computation, skip the explanation. The implicit assumption is that language is the obstacle and mathematics is the real goal. Ke and Newton's data suggest the opposite: structured language about mathematical procedures is itself the mechanism through which ELs develop both procedural skill AND conceptual understanding. The sentence frames didn't carry the students over the mathematical wall. They were the mechanism through which students scaled it.

There's also the proficiency-independence finding, which matters enormously for classroom organization. If MLS-WEPs works approximately equally well for Level 1 and Level 5 ELs (controlling for prior math knowledge), then teachers don't need to run separate interventions for different proficiency groups. One well-designed activity serves the room.

The Cognitive Load Story

The theoretical engine here is cognitive load theory (CLT). When ELs learn algebra in English, they're running two cognitively demanding processes simultaneously: (1) solving math problems and (2) decoding, comprehending, and producing mathematical language in a non-native tongue. Working memory is small and shared. Worked examples reduce the problem-solving load by showing the solution, freeing cognitive resources for the comparison task. Sentence frames reduce the language-production load, freeing resources for the mathematical reasoning. The two scaffolds work on different dimensions of cognitive demand — that's not a coincidence, it's the design logic.

The Big Picture

This is the first study to test worked example comparison in a sheltered ESOL algebra setting. Before this paper, you could argue (from the literature) that worked examples help with algebra, and that sentence frames help ELs, but you couldn't cite direct evidence that their combination works in actual ESOL sheltered algebra classes. Now you can. That's not nothing — it's the empirical foundation for any structured language-in-mathematics intervention aimed at secondary ELs. MASL sits squarely in this research space.

🔬 Evidence Audit

Study Snapshot

Study Type: Quasi-experimental (waitlist crossover design — not a true RCT; assignment to condition was by teacher/class, not by individual student randomization)
Population: N = 78 ELs in sheltered algebra classes (47 from Teacher 1, 31 from Teacher 2); grades 9–11, predominantly 10th grade; WIDA proficiency levels 1–5; primary languages Spanish (47%), Portuguese (11%), French (20%), Chinese (9%), other; large urban K-12 district in the Philadelphia area; conducted during COVID-19 (all virtual instruction); school year 2020–2021 (estimated)
Intervention: MLS-WEPs (Modified for Language Support–Worked Example Pairs) — supplemental curriculum with sentence frames, simplified prompt language, example-before-definition sequencing, and native language permission; implemented by classroom teachers after topics were introduced; 2 units: Linear Equations and Functions; variable implementation timing (opening activity, example, or closing activity at teachers' discretion)
Control / Comparison: Active comparison — NOT a no-instruction control. The control group received the same mathematical examples and the same language supports as the treatment group, taught via standard teacher-modeled step-by-step instruction ("business as usual"). The key difference was the comparison structure and sentence frames, not the presence of language support.
Outcome Measures: Researcher-designed pre/post unit assessments with two components: (1) calculation accuracy (1 point per correct item) and (2) explanation quality (0–1 scale: fully correct = 1, partially correct = 0.5, incorrect = 0). Secondary qualitative coding: 6-category explanation rubric (fully correct, partially correct, conceptually relevant but incorrect, irrelevant, uninterpretable, blank). Cronbach's α = 0.70–0.84 across units/components. Inter-rater reliability > 85% on one-third of the data.
Duration + Follow-up: Two algebra units (Linear Equations, Functions); no long-term follow-up after the intervention ended; the Unit 2 pretest provided one transfer measure for Unit 1 intervention students
Funding / COI: Funding source not disclosed in the article. No conflicts of interest declared. One author (Ke) is from the participating School District of Philadelphia; the other (Newton) is from Temple University. Ke's institutional affiliation with the district is a potential source of bias worth noting.
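
The Cronbach's α range quoted above (0.70–0.84) is a standard internal-consistency statistic. As a minimal sketch of how it is computed — using hypothetical item scores, not the study's data:

```python
# Illustrative only: Cronbach's alpha from per-item scores (hypothetical data).
def variance(xs):
    """Population variance of a list of scores."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(item_scores):
    """item_scores: one list per assessment item, each holding per-student scores.

    alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(item_scores)
    item_var_sum = sum(variance(item) for item in item_scores)
    totals = [sum(student) for student in zip(*item_scores)]  # per-student totals
    return k / (k - 1) * (1 - item_var_sum / variance(totals))

# Hypothetical 3-item assessment scored 0/1 for 4 students
items = [
    [1, 0, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 1, 0],
]
print(round(cronbach_alpha(items), 2))
```

Values near 0.70 are conventionally read as "acceptable" and values above 0.80 as "good," which is why the audit below calls the study's range acceptable to good.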

Evidence Quality

  • ⚠️ Sample size adequate — N = 78 is small for regression analysis across two units with multiple predictors. The authors acknowledge this as a limitation and call results "preliminary." Effect sizes are large (f² = 0.96–2.03), which helps, but replication with larger N is essential before treating these as stable estimates.
  • ⚠️ Groups comparable at baseline — Not randomized; condition was assigned by teacher. The control group started with significantly higher explanation scores at Unit 1 pretest (p = .020, d = 0.585), which the authors controlled for statistically. Appropriate remedy, but the baseline imbalance means the treatment and control teachers' classrooms were genuinely different — and those differences may extend beyond measured prior knowledge.
  • Attrition handled properly — Two students (from Teacher 1's class) who missed more than 50% of instructional time were excluded from analysis; this is disclosed and the exclusion rule is pre-specified and reasonable. No other reported attrition. Final N = 78 from 80 original participants — minimal and transparent.
  • ⚠️ Outcome measures validated — Researcher-designed assessments, not externally validated measures. Internal consistency is acceptable to good (α = 0.70–0.84). No test of construct validity beyond internal consistency. For preliminary research this is acceptable; for policy-level claims it would need external validation.
  • Effect sizes reported — Cohen's f² and power are reported for all regression analyses. Effect sizes are large by any convention (f² = 0.96 is very large). The explanation component shows especially strong effects (f² = 2.03 for Unit 1). Statistical significance and practical significance align here.
  • Pre-registration or published protocol — No pre-registration mentioned. Given the COVID context and crossover design, this is understandable but still a gap. We cannot rule out that some analytical choices (e.g., the decision not to use HLM, the choice of CLT threshold) were made after observing data patterns.
  • ⚠️ Funding independent of findings — Funding source undisclosed. One author is affiliated with the school district that hosted the study. Not a disqualifying conflict, but worth flagging: district employees have institutional reasons to report positive outcomes from district-supported programs.
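
For readers unfamiliar with Cohen's f², the conversion from regression R² is mechanical. A quick sketch, with hypothetical values chosen only to show why f² = 0.96 counts as very large:

```python
# Cohen's f^2 from regression R^2, plus Cohen's d for completeness.
# All numbers below are illustrative, not the study's data.

def cohens_f2(r2_full, r2_reduced=0.0):
    """f^2 = (R2_full - R2_reduced) / (1 - R2_full).

    With r2_reduced = 0 this reduces to the global effect size
    R^2 / (1 - R^2). Conventional benchmarks: 0.02 small,
    0.15 medium, 0.35 large.
    """
    return (r2_full - r2_reduced) / (1 - r2_full)

def cohens_d(mean_treatment, mean_control, pooled_sd):
    """Standardized mean difference between two groups."""
    return (mean_treatment - mean_control) / pooled_sd

# A model explaining 49% of outcome variance yields f^2 of about 0.96,
# nearly three times the conventional "large" threshold of 0.35.
print(round(cohens_f2(0.49), 2))
```

This is why the audit treats f² = 0.96–2.03 as unusually large: the study's models would have to explain roughly half to two-thirds of outcome variance to produce those values.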

⚑ Red Flags & Questionable Logic

Red flag 1: The crossover confounds teacher with condition.

What happened: The waitlist/crossover design means each teacher was both treatment and control, but at different times for different units. This is explicitly acknowledged (p. 839), but the discussion doesn't fully address a genuine threat: the teacher who served as treatment in Unit 1 may have carried some comparison techniques into her Unit 2 control instruction. Lesson recordings were used to verify this did NOT happen (p. 840). The authors address this head-on, which is good, but the verification relies on reviewing recordings of teacher behavior, not on blind coding of pedagogical strategy.

Why it matters: If the Unit 2 "control" teacher had absorbed some comparison orientation from Unit 1, it would deflate the apparent treatment effect in Unit 2 — meaning the true effect might actually be larger than reported. But it could also mean contamination in the other direction. The design doesn't allow us to cleanly separate teacher effect from intervention effect.

The correct approach: Future replications should use a parallel design (different teachers in treatment and control simultaneously for the same unit) rather than a crossover. The crossover was likely a pragmatic choice given school constraints, but it confounds teacher and condition.

Red flag 2: English proficiency was self-reported.

What happened: English proficiency was measured via student self-report, not official ACCESS (WIDA assessment) scores (p. 875). The authors explain that ACCESS data were unavailable due to COVID staffing issues, and that teacher review and adjustment were used to validate the self-reports.

Why it matters: The second research question — whether effectiveness varies by English proficiency — relies entirely on the accuracy of this measure. Self-reported proficiency on a 1–5 scale likely conflates multiple dimensions and may reflect students' confidence rather than their actual proficiency level. The proficiency-independence finding (one of the paper's strongest claims) rests on this compromised variable.

The correct approach: Replication should use official WIDA ACCESS scores or equivalent standardized proficiency measures. The authors are transparent about this limitation; it doesn't invalidate the finding but substantially weakens confidence in the proficiency-independence claim specifically.

Where More Evidence Is Needed

  • Replication: This is a single study with N = 78, two teachers, one district, conducted during COVID (all virtual). It needs replication with larger samples, in-person conditions, and multiple districts before any strong causal claims hold.
  • Population gaps: The study used mixed-proficiency sheltered classes. Newcomers (Level 1 only), long-term ELs, and students with interrupted formal education were all present but not analyzed separately. Effects may vary substantially across these subgroups.
  • Duration: No follow-up data beyond the Unit 2 pretest transfer measure. Do the written explanation gains persist at the end of the school year? Into the following year? The intervention's long-term value is entirely unknown.
  • Mechanism: Was it the sentence frames? The comparison structure? The native language permission? The example-before-definition sequencing? The study shows the package works; it does not isolate which ingredient is the active one. For MASL design purposes, this is a critical gap — if the frames don't drive the effect and the comparison does, that has design implications.
  • Implementation fidelity: Teachers had flexibility to implement at any point in the lesson (opening, example, or closing) and did not use small groups (unusual for WEPs). Real-world implementation with prescribed timing and partner work might produce different effects.
  • Spoken vs. written language: All assessment was written. The study measured written explanations and calculation accuracy. Whether MLS-WEPs improves spoken mathematical language — which is MASL's specific target — remains completely unmeasured and unknown.

Key Vocabulary

Terms used centrally in the article.

Cognitive Load Theory (CLT)
Simply: Your brain's working memory is like a tiny desk — you can only have a few things on it at once. Instruction that piles too much on the desk causes failure, not learning.
A framework developed by Sweller (1988, 1989) holding that working memory has severely limited capacity (roughly 4 novel elements simultaneously, retained for ~30 seconds), while long-term memory is effectively unlimited. Effective instruction manages cognitive load by reducing extraneous demands so working memory can focus on schema formation. For ELs in algebra, CLT predicts that dual language processing and content demands compete for the same limited working memory.
English Learner (EL)
Simply: A student in a U.S. school whose home language isn't English — often still developing academic English proficiency while simultaneously learning content in that language.
Any student in a U.S. public school setting whose native language is not English (Kersaint et al., 2008). In this study, ELs are enrolled in sheltered algebra classes and have WIDA proficiency levels 1–5. The study population is predominantly Spanish-speaking (47%), with significant French, Portuguese, and Chinese L1 populations. ELs are historically underrepresented in STEM fields.
ESOL (English to Speakers of Other Languages)
Simply: The program setting — a class explicitly designed for students still developing English; not a general education or mainstream classroom.
Refers to both the pedagogical approach and the classroom setting in which English language development is integrated with content instruction. In this study, ESOL context refers to sheltered algebra classes composed entirely or predominantly of ELs. The article examines whether worked example comparison — previously only studied in mainstream settings — transfers to an ESOL context.
Mathematics Literacy
Simply: The ability to read, write, speak, and listen mathematically — not just the ability to calculate, but to communicate about mathematics across all four language domains.
As used in this article, mathematics literacy encompasses the four language domains — listening, reading, writing, and speaking — as they apply to mathematical content. This operationalization treats language proficiency as integral to mathematical proficiency, not as a separate skill developed elsewhere. The MLS-WEPs intervention specifically targets written explanation as a component of mathematics literacy.
MLS-WEPs (Modified for Language Support–Worked Example Pairs)
Simply: A worked example curriculum built for ELs — two solved problems shown side by side for comparison, with sentence frames, simplified question language, and permission to use home languages added on top of the standard format.
The intervention studied in this paper, adapted by Ke and Newton from Durkin et al. (2023)'s standard WEP curriculum. Modifications include: sentence frames to scaffold written and oral explanations, simplified prompt language, example-before-definition sequencing, and explicit native language permission. MLS-WEPs are used as supplemental materials, implemented after corresponding topics have been introduced in the main curriculum.
Procedural Knowledge
Simply: Knowing how to do it — the step-by-step execution of mathematical procedures, like solving a linear equation by applying the correct sequence of operations.
The ability to perform a series of actions to solve a problem, including transferring a known procedure to a new problem (Rittle-Johnson & Star, 2007). Distinguished from conceptual knowledge (understanding why). The Unit 1 assessment (linear equations) measured predominantly procedural knowledge. The Unit 2 assessment (functions) measured both procedural and conceptual knowledge.
Conceptual Knowledge
Simply: Knowing why it works — the understanding of mathematical ideas, relationships, and structure, not just the execution of steps.
The ability to explain understanding of concepts in a field and the interconnections between those concepts or ideas (Star et al., 2015). In this study, conceptual knowledge was assessed by asking students to determine and explain whether a given representation is a function, which requires understanding the definition and applying it to varied representations, not executing a procedure.
Sentence Frames
Simply: Partial sentences with intentional blanks — like a Mad Libs for math discourse — that give students a grammatical launching pad so they can focus on the content they're expressing, not the English grammar they're struggling with.
Syntactic structures that provide grammatical scaffolding for students' written and oral mathematical communication. Example: "The similarity between the two methods is ___." Sentence frames allow ELs to focus on mathematical content by relieving the cognitive load of generating grammatical form from scratch (Donnelly & Roe, 2010; Buffington et al., 2017). In MLS-WEPs, frames scaffold comparison descriptions, similarity/difference identification, and reasoned judgment tasks.
Sheltered Math Class
Simply: An algebra class composed entirely or mostly of ELs, where both mathematical content and English language development are taught together — a separate track designed to transition students into general education courses.
An instructional approach in which ELs are taught content (here, algebra) with simultaneous English language development support. In the study's district, sheltered classes are organized by proficiency level: Levels 1.0–1.9 together, Levels 2.0–3.5 in sheltered or ESL-friendly settings. The MLS-WEPs curriculum was implemented in sheltered algebra classes where all or most students were ELs.
Waitlist (Crossover) Design
Simply: A study structure where everyone eventually gets the treatment — Group A gets it first while Group B waits and serves as the comparison, then they swap for the next unit. No one is left out, which is more ethical, but it complicates the analysis.
A quasi-experimental design in which all participants eventually receive the intervention, with the control group receiving it in a later phase. In this study, Teacher 1 implemented MLS-WEPs in Unit 1 (Teacher 2 was control) and Teacher 2 implemented MLS-WEPs in Unit 2 (Teacher 1 was control). This design is used when withholding treatment entirely is ethically or practically problematic, but it confounds teacher identity with treatment condition.
Worked Example Pairs (WEPs)
Simply: Two solved math problems shown side by side, where the whole point is comparing them — you're not supposed to just absorb one right answer; you're supposed to look at both and figure out what's different and why.
A curriculum design in which two worked examples (step-by-step solutions) are presented simultaneously for analogical comparison. Based on Rittle-Johnson and Star (2007)'s finding that comparing two worked examples facilitates algebra learning more than studying each sequentially. WEPs come in four types: Which is better? Why does it work? Which is correct? How do they differ? Each comparison type promotes different aspects of mathematical understanding.

🎯 MASL Connection

This Study Supports:

  • Worked Example: Language Frames (most direct support): Ke & Newton provide the only existing empirical study of worked examples combined with sentence frames for secondary ELs in algebra — the exact population and the exact structural design of MASL's Language Frames activity. The effect on written explanation quality (d ≈ 0.70, f² = 1.44–2.03) directly supports MASL's claim that structured language prompts during worked examples improve ELs' ability to articulate mathematical reasoning. The sentence frames in MLS-WEPs scaffold comparative discourse about procedure ("The similarity between the two methods is ___"); MASL's frames scaffold notation-register discourse ("I read this symbol as ___ because ___").
  • The proficiency-independence claim: MLS-WEPs worked approximately equally for ELs at proficiency levels 1–5 (controlling for prior math knowledge). This supports MASL's position that notation-language instruction is appropriate for all secondary ELs, not just those with higher English proficiency — and more broadly, that structured language frames are not remedial but a mechanism all students benefit from.
  • Transfer of explanation skills across mathematical concepts: Students who received the Unit 1 intervention showed significantly higher explanation scores on the Unit 2 pretest for a completely different mathematical topic. This supports MASL's design assumption that language frames for notation are generative skills, not topic-specific drills — once students have the register for describing symbolic operations, that skill is available across content domains.

Design Implications:

  • Phrase bank framing matters more than size alone: MLS-WEPs sentence frames were discourse-oriented ("The similarity between...," "I chose this method because...") — they scaffold comparative reasoning, not just fill-in-the-blank vocabulary. MASL's phrase bank of 9 terms should similarly foreground reasoning structure. The MASL capstone already responds to Barko-Alva & Chang-Bacon's overframing critique by targeting the reasoning register, not the conclusion register; Ke & Newton provide indirect support for that design choice.
  • Example-before-definition sequencing: MLS-WEPs explicitly sequenced a student-generated example before the conceptual explanation (e.g., "give an example of like terms" before "define like terms"). MASL could adopt this for notation-language instruction: ask students to write how they'd say a symbol before showing the MASL standard form. This primes the irregular-form correction and creates the "disorienting dilemma" Mezirow describes.
  • Native language permission: MLS-WEPs allowed and encouraged students to reason in their home language before presenting in English. MASL's Language Frames activity should explicitly permit this — not as accommodation but as cognitive strategy. Students reasoning through why "f(x) is NOT f times x" may do that reasoning most efficiently in Spanish, Portuguese, or Mandarin.
  • Supplemental, not replacement: MLS-WEPs were implemented after the corresponding mathematical topic was introduced. MASL's Language Frames should follow the same logic — introduce the notation in context first, then deploy the frame activity to consolidate the spoken register. The frame is not the introduction; it's the consolidation mechanism.

Evidence Strength for MASL:

This is the strongest single empirical study in MASL's research base — population match is excellent (secondary ELs in algebra), intervention structure is directly analogous (worked examples + sentence frames), and the outcomes include both procedural performance and explanation quality. However, the key gap is critical: MLS-WEPs targets mathematical discourse broadly (explaining procedures, comparing methods, describing concepts) — not spoken notation specifically. None of the sentence frames in the appendices target the spoken register of algebraic symbols. There is no measurement of how students name "f(x)" or "x²" or "±." The evidence bridges to MASL's Language Frames activity at the structural level (worked examples + sentence frames + EL population = positive outcomes) but does not bridge to MASL's specific claim about spoken notation being a distinct, teachable target. That gap is MASL's original contribution — and Ke & Newton's silence on it is exactly what justifies a new study.

Connections to MASL Framework
  • MASL Trio (Math / We Say / Meaning cards): The WEP comparison structure — particularly "Why does it work?" and "How do they differ?" types — activates the same analogical reasoning the card sort requires. Students in WEPs had to explain why two methods work; card sort students explain why "x squared" and "x to the second power" mean the same thing. Structural parallel, different linguistic target.
  • Sentence frames: Most direct parallel. MLS-WEPs sentence frames scaffold comparative discourse; MASL frames scaffold notation-reading discourse. Both share the design logic of providing grammatical structure so cognitive resources go to content reasoning. The key difference: MLS-WEPs frames are about procedure, MASL frames are about symbol-to-speech mapping.
  • Irregular forms instruction: MLS-WEPs does not address notation irregularity. None of the four appendix examples show frames targeting f(x) vs. "f times x" or x² vs. "x to the 2." This is the gap MASL fills — Ke & Newton give you the method; MASL gives you a target the method has never been aimed at.
  • Scaffolding fading: MLS-WEPs does not describe a fading plan. Sentence frames were present throughout both units. MASL's fading protocol (full → partial → blank over Lessons N to N+4) is a design upgrade Ke & Newton did not test, though the expertise reversal effect literature supports it strongly.

💬 Key Quotes

Copy-paste ready quotes for papers, discussions, and the MASL capstone.

"The results indicated that worked example comparison not only enhanced ELs' ability to solve mathematical problems, but also improved their written explanation skills and enabled them to transfer such skills to different mathematical concepts."
p. 847 (Thesis)
Why this quote: The paper's core empirical finding in one sentence — use as the primary citation whenever MASL claims that structured language activities transfer across mathematical domains.
"The general effectiveness of the MLS-WEPs intervention did not appear to vary by ELs' English language proficiency. This is an exciting finding because it demonstrated that learners with low English language proficiency can also benefit from comparison in mathematics before they develop their English proficiency."
p. 850 (Data)
Why this quote: Directly counters the common assumption that structured language activities require high English proficiency to be beneficial — critical for MASL's claim that the Language Frames activity serves the full EL population.
"Sentence frames enable ELs to focus more on the academic content, because they do not need to think about how to form answers on their own. In addition, ELs are often confused about what is being asked, and a sentence frame can provide clarification."
p. 836 (Practical)
Why this quote: The functional rationale for sentence frames — they reduce language production load, freeing working memory for content reasoning. Direct theoretical support for MASL's Language Frames design logic.
"Teachers tend to use computational tasks and fail to provide opportunities for students to explain solutions/rationales when teaching math to ELs and other marginalized groups."
p. 831 (Challenge)
Why this quote: Names the status quo the study is working against — and the practice MASL must also displace. Useful for framing the problem statement in the capstone introduction.
"Learning mathematics in a foreign language is more likely to result in a heavier cognitive load on learners, and implementing an instructional method that reduces ELs' cognitive load in WM is imperative. The goal is not to reduce the cognitive demands of the task, but to help ELs fully utilize their bilingual strengths to learn mathematics."
p. 833 (Foundational)
Why this quote: Articulates the crucial distinction between dumbing down and scaffolding — the intervention reduces cognitive load on language production, not on mathematical thinking. This is exactly the argument MASL makes about notation-language frames.
"It appears that once ELs knew how to perform written explanations, this skill was maintained. The findings suggest that the written explanation skills were transferable across mathematical concepts. However, why and how these skills can be transferred to different concepts remains unexplained."
p. 848 (Data)
Why this quote: Acknowledges both the transfer finding and its mechanism mystery — intellectually honest, and opens the door for MASL to theorize the mechanism (notation-register acquisition as a generalizable language register, not topic-specific vocabulary).
"MLS-WEPs is a supplemental curriculum for ELs that requires them to compare worked examples in order to deepen their understanding of mathematics through rich and structured mathematical discourse and writing."
p. 832 (Definition)
Why this quote: Clean definitional quote — useful for introducing MLS-WEPs in a literature review when explaining how MASL's Language Frames activity is positioned relative to the existing research base.

📚 References & Further Reading

Key sources from the paper's reference list, assessed for MASL relevance.

Core Worked Example Research (WEPs Lineage)
Rittle-Johnson, B., & Star, J. R. (2007). Does comparing solution methods facilitate conceptual and procedural knowledge? Journal of Educational Psychology, 99(3), 561–574.
Foundational

What it is: The original experimental study establishing that comparing two worked examples in algebra produces learning gains beyond sequentially studying each. Tone: Technical experimental report. Why it matters: The entire WEP lineage — including MLS-WEPs — rests on this finding. Buzz: Highly cited (800+); foundational to all subsequent WEP research. Verdict: Required background for any worked example citation; dense but the results section is readable and the implications are clear.

Durkin, K., Rittle-Johnson, B., Star, J. R., & Loehr, A. (2023). Comparing and discussing multiple strategies: An approach to improving algebra instruction. Journal of Experimental Education, 91(1), 1–19.
Must Read

What it is: The most recent update to the WEP curriculum that MLS-WEPs adapted. Tone: Accessible experimental report with practical implications. Why it matters: If you're citing Ke & Newton, you should also know what the base curriculum looks like — this is it. Verdict: Read before presenting MLS-WEPs in a capstone literature review.

Barbieri, C. A., Booth, J. L., Begolli, K. N., & McCann, N. (2021). The effect of worked examples on student learning and error anticipation in algebra. Instructional Science, 49, 419–439.
Worth Reading

What it is: Recent study extending worked examples to error anticipation in real algebra classrooms. Tone: Standard empirical report. Why it matters: Part of the "AlgebraByExample" strand directly relevant to MASL's Suggest Improvements activity. Verdict: Worth reading if building the evidence base for erroneous examples.

Cognitive Load Theory (CLT) Core References
Sweller, J. (1988). Cognitive load during problem solving: Effects on learning. Cognitive Science, 12, 257–285.
Foundational

What it is: The paper that introduced cognitive load theory — the working-memory account of why conventional problem solving can impede schema acquisition. (The intrinsic vs. extraneous vs. germane load taxonomy was formalized in later CLT work.) Tone: Dense theoretical; late-1980s cognitive psychology style. Why it matters: The theoretical engine for all worked example research. Verdict: Skim the introduction and conclusions; the specific findings on problem-solving vs. worked example studies are what matter for MASL.

Paas, F., Renkl, A., & Sweller, J. (2004). Cognitive load theory: Instructional implications of the interaction between information structures and cognitive architecture. Instructional Science, 32, 1–8.
Quick Read

What it is: A brief accessible synthesis of CLT principles for instructional designers. Tone: Relatively accessible overview. Why it matters: Good entry point for CLT if you need to explain the theoretical basis for MASL scaffolding without reading the dense originals. Verdict: 8 pages — read this before citing CLT.

Mathematics Learning for English Learners
Moschkovich, J. N. (2015). Academic literacy in mathematics for English learners. Journal of Mathematical Behavior, 40, 43–62.
Must Read

What it is: Moschkovich's synthesis of Academic Literacy in Mathematics (ALM) framework — one of MASL's eight core theoretical anchors. Tone: Accessible theoretical synthesis. Why it matters: Establishes that mathematical proficiency = mathematical practices + discourse + content inseparably; grounds the claim that language instruction benefits all students. Verdict: Required for MASL capstone.

Kersaint, G., Petkova, M., & Thompson, D. R. (2008). Teaching mathematics to English language learners. Routledge.
Worth Reading

What it is: Comprehensive practitioner-oriented book on math instruction for ELs; heavily cited in Ke & Newton. Tone: Practitioner-friendly with research backing. Why it matters: Provides the EL math instruction framework underlying MLS-WEPs design decisions. Verdict: Worth having as a reference; not necessary to read cover-to-cover for MASL.

Donnelly, W., & Roe, C. (2010). Using sentence frames to develop academic vocabulary for English learners. The Reading Teacher, 64(2), 131–136.
Quick Read

What it is: Short practitioner-facing article on sentence frames for EL academic language. Tone: Accessible; designed for classroom teachers. Why it matters: Primary citation Ke & Newton use to justify sentence frame design — useful if you need to defend the sentence frame choice in MASL. Verdict: 5-minute read; worth having in your citation toolkit.

Buffington, P., Knight, T., & Tierney-Fife, P. (2017). Supporting mathematics discourse with sentence starters and sentence frames. EDC.
Worth Reading

What it is: EDC practitioner guide on sentence frames specifically for mathematics discourse. Tone: Practitioner guide. Why it matters: The most practice-oriented source on math-specific sentence frame design — directly applicable to MASL phrase bank construction. Verdict: Read if designing the specific sentence frames for MASL Language Frames activity.

🧠 Quiz — Test Your Understanding

Six conceptual questions about the ideas — not the statistics.

1. Why did Ke and Newton add sentence frames to the standard Worked Example Pairs (WEPs) design for their EL version?

2. One of the study's most striking findings was that MLS-WEPs effectiveness "generally did not vary by English language proficiency." What does this mean for instructional design?

3. Students who received the Unit 1 MLS-WEPs intervention scored significantly higher on the Unit 2 explanation pretest — before any Unit 2 instruction. What does this transfer finding suggest?

4. The four types of Worked Example Pairs (WEPs) — "Which is better?", "Why does it work?", "Which is correct?", and "How do they differ?" — each serve different purposes. Which type is MOST focused on developing conceptual understanding rather than procedural flexibility?

5. The study found that English language proficiency DID significantly affect explanation scores in Unit 2 (Functions) but NOT in Unit 1 (Equations). The authors hypothesize this is because Unit 2 assessed conceptual knowledge while Unit 1 assessed mostly procedural knowledge. What's the instructional implication?

6. The study showed that the quality of MLS-WEPs written explanations improved substantially in the treatment group. "Blank" responses dropped from 45% to 9% in Unit 1. But the "uninterpretable explanation" category increased in the treatment group and decreased in the control group. What's the most likely explanation for this unexpected pattern?

🔬 Research Quiz

Six questions about the study design — not the content. Can you read past the authors' framing?

1. The study uses a "waitlist crossover design." What type of study is this, and what does that mean for causal claims?

2. What was the actual population studied, and who is notably absent from this sample?

3. What did the control group actually receive? This is important because it determines how large a comparative advantage the treatment really represents.

4. English language proficiency was a key variable in the study's second research question. How was it measured, and why does this matter?

5. The regression analysis for Unit 1 found Cohen's f² = 0.96 (calculation) and f² = 2.03 (explanation). How do you interpret these effect sizes in practical terms?

6. [Red Flag] At the beginning of Unit 1, the control group scored significantly higher on the explanation pretest than the treatment group (p = .020, d = 0.585). The authors controlled for this statistically and found treatment effects at posttest. What's the methodological concern, and why does it matter beyond the statistical control?
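A reference point for question 5: Cohen's f² relates to variance explained via f² = R² / (1 − R²), so it can be translated back into an R² share that is easier to reason about. The helpers below are an illustrative sketch (the function names are mine, not from the paper); the benchmarks are Cohen's (1988) conventions.

```python
def cohens_f2(r2_full, r2_reduced=0.0):
    """Cohen's f2 for the variance a model (or an added predictor set)
    explains beyond a reduced model: (R2_full - R2_reduced) / (1 - R2_full)."""
    return (r2_full - r2_reduced) / (1.0 - r2_full)

def r2_from_f2(f2):
    """Invert f2 back to the share of variance explained: R2 = f2 / (1 + f2)."""
    return f2 / (1.0 + f2)

def cohen_label(f2):
    """Cohen's (1988) conventional benchmarks: 0.02 small, 0.15 medium, 0.35 large."""
    if f2 >= 0.35:
        return "large"
    if f2 >= 0.15:
        return "medium"
    if f2 >= 0.02:
        return "small"
    return "negligible"

# The study's Unit 1 effects both clear the "large" benchmark by a wide margin:
# f2 = 0.96 corresponds to roughly 49% of variance, f2 = 2.03 to roughly 67%.
```

In practical terms, both reported values sit far above the conventional "large" threshold of 0.35 — which is itself a reason to read them cautiously given the small N.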

🃏 Match the Concepts

Drag each term from the left column to its matching description on the right.

Terms & Concepts

Four WEP comparison types
Cognitive Load Theory
Sentence frames
Transfer finding
Proficiency independence
Waitlist crossover design
MLS-WEPs
Blank → attempted explanation
Example-before-definition
Conceptual vs. procedural gap

Descriptions

Which is better? / Why does it work? / Which is correct? / How do they differ? — each targeting different types of mathematical comparison
Framework predicting that ELs face heavier cognitive demands because working memory must handle math content AND language processing simultaneously
Partial sentence structures that reduce language-production cognitive load so students focus on mathematical content
Unit 1 intervention students scored significantly higher on Unit 2 explanation pretest — a different mathematical topic — without any Unit 2 instruction yet
MLS-WEPs effectiveness (controlling for prior math knowledge) did not vary by EL proficiency level — one design served the full range
Each teacher was both treatment and control in different units; ethical (everyone gets the intervention) but confounds teacher identity with condition
Standard WEP curriculum adapted for ELs with sentence frames, simplified prompts, native language permission, and example-before-definition sequencing
The striking reduction in unanswered items in the treatment group — students were motivated to attempt explanations even before they had the tools to do them well
MLS-WEPs design choice: ask students to provide a mathematical example (e.g., "name like terms") before asking for the definition or explanation
Proficiency mattered more for Unit 2 (functions, conceptual) than Unit 1 (equations, procedural) — explaining why frames for conceptual tasks need extra precision

Replication

✅ What They Got Right

  • An active control condition, not a no-treatment comparison. The control group received the same language supports and the same mathematical examples — just via traditional instruction. This design isolates the comparison structure as the active ingredient, rather than confounding "any language support vs. none." It is methodologically stronger than most intervention studies, which compare against truly unsupported instruction.
  • Transparent about limitations. The small N, self-reported proficiency, COVID conditions, and virtual-only implementation are all named explicitly as constraints. The authors call the findings "preliminary" and explicitly invite replication. This intellectual honesty is rare and worth foregrounding when citing this study — the claims stay within the data.
  • Transfer measure built into the design. The waitlist structure accidentally created a transfer test: Unit 1 intervention students' performance on the Unit 2 explanation pretest provides a cross-topic transfer measure that most worked example studies don't include. This produced one of the study's most valuable findings at no additional cost.
  • Six-category explanation rubric with inter-rater reliability. Rather than binary correct/incorrect, the explanation coding captured qualitative variation in student responses — blank, irrelevant, uninterpretable, concept-relevant-but-incorrect, partially correct, fully correct. The > 85% inter-rater reliability and resolution-through-discussion protocol are appropriate for this type of rubric.
  • Pilot study with student feedback before finalizing design. The MLS-WEPs modifications (sentence frames, simplified prompts, native language permission) were based on a pilot study with EL students in one-on-one tutorials, not researcher assumptions. This grounding in actual student experience is the right design sequence.
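The > 85% inter-rater agreement noted above is a raw percent-agreement figure; a chance-corrected statistic such as Cohen's kappa is the standard companion check for a categorical rubric like this one. A minimal sketch — the category labels mirror the study's rubric, but the example codes are invented:

```python
from collections import Counter

def percent_agreement(codes_a, codes_b):
    """Share of items where two raters assigned the same rubric category."""
    assert len(codes_a) == len(codes_b)
    return sum(a == b for a, b in zip(codes_a, codes_b)) / len(codes_a)

def cohens_kappa(codes_a, codes_b):
    """Agreement corrected for chance: kappa = (p_o - p_e) / (1 - p_e)."""
    n = len(codes_a)
    p_o = percent_agreement(codes_a, codes_b)
    count_a, count_b = Counter(codes_a), Counter(codes_b)
    # Expected chance agreement from each rater's marginal category frequencies.
    p_e = sum(count_a[c] * count_b[c] for c in count_a.keys() | count_b.keys()) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Invented example using four of the study's six rubric categories:
a = ["blank", "fully correct", "partially correct", "uninterpretable"]
b = ["blank", "fully correct", "partially correct", "irrelevant"]
# percent_agreement(a, b) -> 0.75
```

Reporting kappa alongside percent agreement would be a cheap upgrade in a replication, since percent agreement alone is inflated when one category (e.g., "blank") dominates.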

🔧 Suggested Improvements

  • Use official WIDA ACCESS scores, not self-reported proficiency. Why: The entire second research question (does effectiveness vary by proficiency?) depends on the validity of the proficiency measure. Self-reported scores during COVID are the weakest possible proxy; ACCESS scores would allow genuine proficiency-subgroup analysis and make the proficiency-independence finding publishable as an established finding rather than a preliminary one.
  • Implement partner/small-group work as the WEP design originally specifies. Why: The standard WEP model involves paired student comparison and discussion — this was abandoned because teachers were uncomfortable with virtual small groups. MASL and most real-world implementations will use partner work; a replication under in-person conditions with partner structures would better represent how the intervention actually functions.
  • Add a spoken language outcome measure alongside the written one. Why: MLS-WEPs targets four language domains (reading, writing, listening, speaking) but only writing was measured in the assessments. Spoken mathematical explanation — the specific target of MASL's Language Frames activity — remains unmeasured. An audio or video-coded oral explanation task would capture whether the benefits extend to the spoken register.
  • Use parallel groups (same unit, same time, different teachers) rather than crossover. Why: The crossover design confounds teacher identity with condition — Teacher 1's and Teacher 2's classrooms were not equivalent at baseline (d = 0.585 on the explanation pretest). A parallel design with multiple teacher-pairs randomly assigned to condition within the same instructional unit would provide cleaner causal inference.
  • Systematically document and vary frame design features (number of frames, frame type, fading schedule). Why: The current study treats sentence frames as a single ingredient; it doesn't tell us how many frames, what linguistic structures, or whether fading frames after initial exposure changes outcomes. For MASL design, these are the critical parameters — a dismantling study would isolate which frame features drive the explanation quality gains.
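The last suggestion can be made concrete: a dismantling study is just a factorial crossing of the frame-design parameters. The factor names and levels below are hypothetical illustrations, not values documented in the paper:

```python
from itertools import product

# Hypothetical frame-design factors for a dismantling study.
# Names and levels are illustrative, not drawn from Ke & Newton.
factors = {
    "n_frames": [2, 4, 8],                             # frames per WEP
    "frame_type": ["starter", "cloze", "full model"],  # linguistic structure
    "fading": ["none", "fade after Unit 1"],           # fading schedule
}

# Full factorial crossing: every combination of levels becomes one condition.
conditions = [dict(zip(factors, levels)) for levels in product(*factors.values())]
# 3 x 3 x 2 = 18 cells; in practice a fractional design would trim this.
```

Even a partial crossing (say, frame type × fading only) would tell MASL designers more about which frame features matter than the single-ingredient treatment reported here.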