[The following research is the subject of my dissertation]
You might have had teachers tell you not to cram for tests because you’d forget the info soon after. This reflects the fact that spaced retrieval practice is better than unspaced practice (similar to cramming) if the goal is keeping the information accessible in the long term. Retrieval practice refers to studying that is structured like flashcards (a prompt followed by a response), and it is the most effective way to learn declarative information.
There is an interesting dichotomy between accuracy in the early retrieval attempts and accuracy at some point far in the future. It seems that the lower the accuracy of the early retrieval attempts (that is, the more difficult they were), the more likely that information is to be accessible in the long term. Conversely, high accuracy and low difficulty in the early retrieval attempts, as is the case in cramming, produce poor long-term retention.
This is the core of the desirable difficulty hypothesis: that there exists some ideal difficulty which increases the future likelihood of retrieval more than other difficulties. Too easy or too hard and you wouldn’t get the full benefits of practice. This suggests an effect of relative spacing on retention, not just total spacing.
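To make the distinction between total and relative spacing concrete, here is a minimal sketch. The gap values and the names `build_schedules` and `retrieval_times` are hypothetical and chosen only for illustration; they are not the schedules used in the study. All three schedules cover the same total time and differ only in how the gaps are ordered.

```python
from itertools import accumulate


def build_schedules(gaps):
    """Return three lists of gaps that share the same total spacing.

    `gaps` is a hypothetical list of intervals (arbitrary time units)
    between successive retrieval attempts.
    """
    total = sum(gaps)
    n = len(gaps)
    return {
        "expanding": sorted(gaps),                  # gaps grow over time
        "contracting": sorted(gaps, reverse=True),  # gaps shrink over time
        "uniform": [total / n] * n,                 # equal gaps
    }


def retrieval_times(schedule_gaps):
    """Cumulative time of each retrieval, measured from initial study."""
    return list(accumulate(schedule_gaps))


schedules = build_schedules([1, 3, 5])  # illustrative values only
for name, schedule_gaps in schedules.items():
    print(name, schedule_gaps, retrieval_times(schedule_gaps))
```

Because every schedule ends at the same point in time, any difference in retention between them would have to come from the ordering of the gaps (relative spacing) rather than from total spacing.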
I set out to investigate the desirable difficulty hypothesis in greater depth than the existing literature had. In my opinion, too few schedules had been contrasted to paint a full picture of how competing schedules differ from one another. Additionally, no studies had collected subjective difficulty ratings for retrieval attempts, instead relying on response time as a proxy for difficulty. I designed a study to overcome these perceived shortcomings.
One of the most notable early findings from this study was the correlation between retrieval accuracy and both 1) subjective retrieval difficulty and 2) response time. Though response time was significantly correlated with retrieval accuracy (R² = .38), subjective difficulty was far more strongly correlated with retrieval accuracy (R² = .88). It appears that the proxy most commonly used for retrieval difficulty, response time, is greatly inferior to subjective difficulty estimates. I developed the subjective difficulty response scale specifically for this study, and have further validated it in 2 subsequent studies.
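For readers unfamiliar with the statistic, the following is a minimal sketch of how an R² of this kind can be computed, assuming it is the squared Pearson correlation between a predictor and retrieval accuracy. The function name `r_squared` and the data values are hypothetical; they are not the study's data.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov ** 2 / (var_x * var_y)


# Made-up values for illustration: mean difficulty rating per condition
# and the proportion of correct retrievals in that condition.
difficulty = [1, 2, 3, 4, 5]
accuracy = [0.95, 0.85, 0.70, 0.50, 0.30]
print(r_squared(difficulty, accuracy))
```

The same function could be applied to response times in place of difficulty ratings to compare how much variance in accuracy each predictor accounts for.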
As with my Typeface Investigation, I built and hosted this study online and participants completed it within their browser, at their convenience. Given that it was a multi-day study requiring 3 hours in total, this remote administration saved hundreds of labor-hours.
The data generated are still being analyzed, but initial findings show a clear effect of relative spacing, going against previous findings. We see below that expanding schedules, in which the interval between retrievals increases with each successive retrieval, result in greater learning gains than contracting schedules, in which the interval decreases.

We also see interesting differences in the learning curves, or the progression of retrieval difficulties. There are glaring differences in the learning curves generated by expanding and contracting interval schedules, as can be seen below (the lower values at the top of the Y-axis reflect easier retrievals). These may give insight into the performance differences observed between schedules with different relative spacings.


Even forgetting the benefits the expanding schedules produce in the long term, we can see how such schedules are also more useful to the learner in the short term: as the average difficulty of each retrieval is low, the retrieval accuracy is high. Imagine learning a new language. With an expanding schedule, every retrieval attempt is likely to be successful, so you’re more likely to remember the word for ice cream at that Italian restaurant, even if you’ve only practiced the word a few times (hint: it’s gelato).
The structure of these data allowed the use of multi-level modeling analyses. One such analysis, found below, reveals an as-yet-unreported rolling window of influence. Retrieval attempts going as far as 5 attempts back were each shown to make a unique contribution to the present retrieval likelihood. Notably, all of these effects were in the same direction: the easier the retrieval, the greater the long-term accessibility. This is evidence against a desirable difficulty framework.
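The model itself is not reproduced here. As a rough illustration of the data layout a rolling-window analysis needs, the sketch below builds lagged predictors from one sequence of difficulty ratings. The function name `lagged_rows`, the ratings, and the choice to drop attempts without a full history are assumptions made for this example, not details of the actual analysis.

```python
def lagged_rows(difficulties, max_lag=5):
    """Pair each retrieval with the difficulties of the preceding attempts.

    Returns one dict per retrieval that has at least `max_lag` earlier
    attempts; earlier retrievals are skipped because their history is
    incomplete.
    """
    rows = []
    for t in range(max_lag, len(difficulties)):
        row = {"current": difficulties[t]}
        for lag in range(1, max_lag + 1):
            # lag_1 is the immediately preceding attempt, lag_5 the oldest.
            row[f"lag_{lag}"] = difficulties[t - lag]
        rows.append(row)
    return rows


# Hypothetical difficulty ratings for one item across seven retrievals.
ratings = [1, 2, 3, 4, 5, 6, 7]
for row in lagged_rows(ratings):
    print(row)
```

Rows like these, nested within items and participants, are the kind of input a multi-level model could use to estimate a separate effect for each lag.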

Since this work was undertaken, these data have been leveraged to create an algorithm that optimizes the spacing of retrieval schedules, taking into account the difficulty of each successive retrieval. This work is detailed in my dissertation.