Education Research and The Problem of a Single Study in Isolation

Recently, I have had some conversations online about grading practices where my use of grades and rubrics has been challenged. Most often, my defense of the letter grade or 4-point scale is shot down by just one study: Ruth Butler's 1988 study on intrinsic motivation. This is the study that shows that feedback without a grade, just feedback alone, is the most beneficial and impactful for student learning. Now it is important to note that I absolutely love this study.

But this study, like most educational research studies, looks in isolation at one component of teaching and does not take into consideration everything else. One of the big red flags with this study is that it looks at the impact that grades, grades with comments, and comments alone have on student motivation "when no further evaluation was anticipated." It is not looking at grades and comments in a cycle of inquiry or as part of a feedback loop, and it is not looking at formative versus summative assessments. It looks at these three feedback systems when students no longer anticipate that they will be evaluated on their work. That makes this a really tricky piece of research to implement effectively, and it makes me ask the question, "Well, what about when they DO anticipate another evaluation?"

A lot of people have taken this research to mean that the grade cannot be issued until the end of the grading term because it is demotivating to students. To me, this always feels like we are making grades the enemy instead of taking advantage of an opportunity to empower students to understand and reflect on the progress they are making and the growth they are experiencing. When I think about how to implement the Butler research in my own classroom, it doesn't make me want to throw out grades, but it does make me want to improve the specific, actionable, and measurable feedback I give my students throughout the learning process.

In a recent conversation, I was asked to show the research on my own grading practices, and I took this as an opportunity to sit down and ensure that what I do has research to back it. I looked back through the major books I read that inspired me to make changes (Vatterott, Wormeli, Marzano, and Fisher & Frey, to name a few). And while I feel like I need to dig deeper into the research behind the practices, I also feel the need to explain what I know about the big practices that are a part of my classroom. So here are three big things I believe in and do in my classroom and some of the research to support them.

Separating Out Academic and Non-Academic Feedback 

This is one of the concepts I am most passionate about. Non-academic behaviors are so often a part of the grade, and this does absolutely nothing for learning; instead, it corrupts the letter grade from a measure of learning to a measure of compliance. By going standards-based, with clear learning targets and rubrics to show mastery for their grade level, I am able to ensure the feedback students get on learning is actually about the learning, while the feedback they get on behaviors is measured through a citizenship grade and rubric that never becomes a part of their academic achievement grade.

Marzano Research lists a study in math classes where they separated out academic and non-academic feedback and the effect size was 9.25.

This practice alone does not make good teaching. Teaching is not just one good practice on repeat. It is a blend of sound research-based practices, and it requires teachers to know their students and know their content to make the most of the research.

Scoring Guides and Feedback 

In multiple studies conducted by Marzano Research, the use of scoring guides or rubrics mixed with feedback has proven to have a substantial positive effect. A study of science classrooms showed an effect rate of 3.6. Another study with math teachers showed an impact of 3.2 and a third math study just on the impact of the use of a scoring guide had an effect size of 2.4. Additionally, studies on feedback alone done by Marzano without scoring guides or other learning strategies measured as part of the study had smaller effect sizes. A study on specific vs. non-specific feedback in math had an effect size of 1.87.

The reality is that it is never just one practice in isolation that we need to look at in our classroom but how the practices work collectively to help students learn. This research shows there is a lot of work to still be done to determine the best approach to feedback and scoring guides or rubrics but I think it is too early to throw out the practices when evidence shows they can have a significant positive impact.

Learning Goals 

A study on repeated learning goals showed an effect size of 2.18 on student learning. In a study of Language Arts classrooms, the use of learning goal tracking folders by students had an effect of 1.51. In my own experience, the ownership students take of their learning through learning goals and reflection is one of the most impactful practices I have tried. But I have tried it in combination with the above practices, so it is hard to say whether this practice is the most impactful on its own or whether it is the creation of a Standards-Based learning environment that makes all of these practices combined so meaningful to student learning.

The Problem With Educational Research

For almost every study I listed you can find one that shows a negative effect of the same practices. So how do we as educators move forward with the conflicting data? We have to focus on the studies where the impact has the highest effect and attempt to further replicate and study those practices more. For example, when we talk about studies on "feedback" that is way too vague of a term and each study could be measuring wildly different feedback practices. To really compare studies we need to know when and how was the feedback given? Were students given time to engage with the feedback? Was feedback only narrative? Was feedback only a rubric? And what other practices in the classroom support that strategy or practice that might not be mentioned in the study?

Teaching really is an incredibly intricate blend of science and art. It is important that we engage in conversations on what works and what doesn't, both from the lens of research and from the lens of what we do and see in our own classrooms. If we know that comments without grades are one of the strongest tools we have for student motivation according to Butler's research, how do we balance that with the research on learning goals, separating behaviors, and scoring guides?

For my classroom, I find value in making sure students know where they stand in their own personal growth. I want them to be able to articulate as easily as I can what they know, what they are struggling with, and what they are expected to do. I could get rid of letters or numbers to do this, but in the end, whether you call it an A, 4, or Meets Expectations, it is all the same: a measure of accomplishment. The culture I create in my classroom is one of growth, working with feedback to improve, and trying again. Since making these shifts, I haven't seen the same level of demotivating impact from letter grades, because we talk about grades differently in my classroom. I absolutely still have work to do. I always have that one student asking about extra credit, and I have to remind them that there is no purpose in extra credit when you can try again as often as you need to show you have met mastery.

Note on the research presented: I used the Marzano Research Database to create this post. I made the decision to use this after looking through the list of books I have read on grading practices and determining that action research from real classrooms is the best starting point in response to the challenge for research. I hope to follow up this post with more research soon but also feel the need to point to this Rick Wormeli piece.


  1. A thoughtful and thought-provoking post! It's challenging me to think through my practices.

    I do summative conferences three times a semester, so students know where they stand. But I have wanted to include more formative assessments between conversations, so students can see week over week how they are doing and what they are learning, and what they need work on (all standards-based, of course).

    A couple of elements of your post leave me with questions. This might be more for me than most other teachers, but I still don't know what an effect size is, how one is calculated, and why it matters (Outside of Hattie, do the others you mention use effect size in the way Hattie does?). I have read the opening of Visible Learning for Teachers multiple times too, and I won't pretend I get it at all. I really like your honesty in noting that your layering of research-backed practices makes it difficult to know what is having the effect, which makes me wonder if there are other variables in those studies you quote that lead to the results.

    Everything you're proposing sounds really good, and I try my best to include those in my classes as well, especially the uncoupling of grades from non-academic, compliance-based activities. These practices really help me and my students relate around their learning in a completely different manner, one that helps students build their confidence in academic pursuits.

    1. Not sure why this listed me as unknown, but this is Jeffery E. Frieden @MakeThemMastrIt.

    2. Here's a bit more info on effect sizes: I think it could be interesting to go back and review Hattie's work again with a more critical statistical eye. I find it hard to believe he was able to compile tight, matching educational contexts in a meta-analysis. I'd also like to review the significance of each effect size.

    3. If you use the Marzano Research Database it breaks down effect size for you. "The calculated effect size is the standardized mean difference between treatment and control groups. Stated differently, it is the difference in the average score of the treatment group and the control group stated in standard deviation units."

      Because of the nature of teaching, there is no perfect way to measure these things but positive effect size consistently showing over a series of classrooms using the same practices is the most solid indication we have.
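For anyone who wants to see the arithmetic behind the definition quoted above, here is a minimal Python sketch of a standardized mean difference. It assumes the two groups' standard deviations are pooled (one common convention; the quoted definition does not say which standard deviation the database uses), and the function name and the scores are made up purely for illustration.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(treatment, control):
    """Difference in group means, expressed in standard deviation units.

    Assumption: uses the pooled sample standard deviation of both groups.
    """
    n_t, n_c = len(treatment), len(control)
    # Pooled variance weights each group's sample variance by its degrees of freedom.
    pooled_var = (
        (n_t - 1) * variance(treatment) + (n_c - 1) * variance(control)
    ) / (n_t + n_c - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_var)


# Hypothetical test scores, not taken from any study.
treatment_scores = [78, 85, 90, 72, 88]
control_scores = [70, 75, 80, 68, 77]
print(standardized_mean_difference(treatment_scores, control_scores))
```

Read this way, an effect size of 1.0 would mean the treatment group's average sits one standard deviation above the control group's average.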

  2. I really appreciate this thoughtful reflection and exposition on our Twitter discussion. If you look at Hattie's work, it is Collective Teacher Efficacy that has the highest result on a child's success in school--even greater than poverty, feedback, or any other teaching strategy. This is critical to our understanding of what helps students succeed.

    Though I think it is far better to avoid grades/marks at all cost and that narrative reports on learning honor a student far better, what matters most is that a teacher has confidence in his/her practice. It is important that administration has confidence in the teacher. It is important that parents have confidence in the teacher. And, ultimately, it is important that students have confidence in the teacher.

    I have confidence in you and your teaching and it is clear that you have confidence in yourself.



Thank you so much for commenting! You can also reach out to me on twitter: @mrsbyarshistory
