Education Research and the Problem of a Single Study in Isolation
Recently, I have had some conversations online about grading practices in which my use of grades and rubrics has been challenged. My defense of the letter grade or 4-point scale is most often shot down with just one study: Ruth Butler's 1988 study on intrinsic motivation. This is the study showing that feedback without a grade, just feedback alone, is the most beneficial and impactful for student learning. It is important to note that I absolutely love this study.
But this study, like most educational research studies, looks at one component of teaching in isolation and does not take everything else into consideration. One of the big red flags with this study is that it looks at the impact that grades, comments, and grades with comments have on student motivation "when no further evaluation was anticipated." It is not looking at grades and comments in a cycle of inquiry or as part of a feedback loop, and it is not comparing formative and summative assessment. It looks at these three feedback systems when students no longer anticipate being evaluated on their work. That makes it a tricky piece of research to implement effectively, and it leaves me asking, "Well, what about when they DO anticipate another evaluation?"
Many people have taken this research to mean that a grade cannot be issued until the end of the grading term because it is demotivating to students. That always feels like making grades the enemy instead of taking advantage of an opportunity to empower students to understand and reflect on the progress they are making and the growth they are experiencing. When I think about how to implement the Butler research in my own classroom, it doesn't make me want to throw out grades, but it does make me want to improve the specific, actionable, and measurable feedback I give my students throughout the learning process.
In a recent conversation, I was asked to show the research behind my own grading practices, and I took this as an opportunity to sit down and make sure that what I do is backed by research. I looked back through the major books that inspired me to make changes (Vatterott, Wormeli, Marzano, and Fisher & Frey, to name a few). While I feel I need to dig deeper into the research behind these practices, I also feel the need to explain what I know about the big practices in my classroom. So here are three big things I believe in and do in my classroom, along with some of the research that supports them.
Separating Out Academic and Non-Academic Feedback
This is one of the concepts I am most passionate about: non-academic behaviors are so often part of the grade, and this does absolutely nothing for learning. Instead, it corrupts the letter grade from a measure of learning into a measure of compliance. By going standards-based, with clear learning targets and rubrics for showing grade-level mastery, I can ensure the feedback students get on learning is actually about the learning, while the feedback they get on behaviors is measured through a citizenship grade and rubric that never becomes part of their academic achievement grade.
Marzano Research lists a study in math classes where academic and non-academic feedback were separated; the reported effect size was 9.25.
This practice alone does not make good teaching; teaching is not just one good practice on repeat. It is a blend of sound, research-based practices, and it requires teachers to know their students and their content to make the most of the research.
Scoring Guides and Feedback
In multiple studies conducted by Marzano Research, the use of scoring guides or rubrics combined with feedback has shown a substantial positive effect. A study of science classrooms showed an effect size of 3.6. Another study with math teachers showed an effect size of 3.2, and a third math study on the use of a scoring guide alone had an effect size of 2.4. Additionally, studies by Marzano on feedback alone, without scoring guides or other learning strategies measured as part of the study, had smaller effect sizes. A study on specific vs. non-specific feedback in math had an effect size of 1.87.
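For readers less familiar with effect sizes, they are standardized measures of the difference between two groups. A minimal sketch of one common version, Cohen's d (the standardized mean difference), is below; the test scores are invented purely for illustration and are not from any of the studies above:

```python
import math

def cohens_d(group_a, group_b):
    """Standardized mean difference (Cohen's d) between two score lists."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    # Sample variances (divide by n - 1)
    var_a = sum((x - mean_a) ** 2 for x in group_a) / (n_a - 1)
    var_b = sum((x - mean_b) ** 2 for x in group_b) / (n_b - 1)
    # Pooled standard deviation across both groups
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (mean_a - mean_b) / pooled_sd

# Hypothetical scores: a class using rubrics with feedback vs. one without
treatment = [78, 85, 90, 88, 82, 91]
control = [70, 75, 80, 72, 78, 74]
print(round(cohens_d(treatment, control), 2))
```

An effect size of 1.0 means the treatment group's average sits one full standard deviation above the control group's, which is why numbers like 2.4 or 3.6 represent very large differences.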
The reality is that it is never just one practice in isolation that we need to look at in our classrooms, but how the practices work collectively to help students learn. This research shows there is still a lot of work to be done to determine the best approach to feedback and scoring guides or rubrics, but I think it is too early to throw out these practices when evidence shows they can have a significant positive impact.
Learning Goals and Tracking
A study on repeated learning goals showed an effect size of 2.18 on student learning. In a study of Language Arts classrooms, the use of learning-goal tracking folders by students had an effect size of 1.51. In my own experience, the ownership students take of their learning through learning goals and reflection is one of the most impactful practices I have tried. But I have tried it in combination with the practices above, so it is hard to say whether this practice is the most impactful on its own, or whether it is the creation of a standards-based learning environment that makes all of these practices combined so meaningful to student learning.
The Problem With Educational Research
For almost every study I listed, you can find one that shows a negative effect for the same practice. So how do we as educators move forward with conflicting data? We have to focus on the studies where the impact has the highest effect and attempt to replicate and study those practices further. For example, when we talk about studies on "feedback," that is far too vague a term, and each study could be measuring wildly different feedback practices. To really compare studies, we need to know: When and how was the feedback given? Were students given time to engage with the feedback? Was the feedback only narrative? Was the feedback only a rubric? And what other classroom practices support that strategy that might not be mentioned in the study?
Teaching really is an incredibly intricate blend of science and art. It is important that we engage in conversations about what works and what doesn't, both from the lens of research and from the lens of what we do and see in our own classrooms. If we know from Butler's research that comments without grades are one of the strongest tools we have for student motivation, how do we balance that with the research on learning goals, separating behaviors, and scoring guides?
For my classroom, I find value in making sure students know where they stand in their own personal growth. I want them to be able to articulate, as easily as I can, what they know, what they are struggling with, and what they are expected to do. I could get rid of letters or numbers to do this, but in the end, whether you call it an A, a 4, or Meets Expectations, it is all the same: a measure of accomplishment. The culture I create in my classroom is one of growth, working with feedback to improve, and trying again. Since making these shifts, I haven't seen the same level of demotivating impact from letter grades, because we talk about grades differently in my classroom. I absolutely still have work to do; I always have that one student asking about extra credit, whom I have to remind that there is no purpose in extra credit when you can try again as often as you need to show mastery.
Note on the research presented: I used the Marzano Research database to create this post. I made the decision to use it after looking through the list of books I have read on grading practices and determining that action research from real classrooms is the best starting point in response to the challenge to show my research. I hope to follow up this post with more research soon, but I also feel the need to point to this Rick Wormeli piece.