The compiler flag suggestion was a long shot, and it wouldn't fix the race condition anyway.
For a quick test, I suggest putting a test and reset of TCNT1 at the end of the ISR (something like checking whether TCNT1 has already run past OCR1A and, if so, setting it to 0) and just rerunning your tests.
You should still see gaps in the timing, but they will be on the order of a single step instead of 32 ms. If that works, then the 0 could be increased to "OCR1A - x", where x is enough clock ticks for the ISR to return and for interrupts to be enabled again, which would reduce any gaps to a minimum. Alternatively, you could put that test and reset of TCNT1 at the end of a do-while loop to keep servicing steps until it stops overrunning the timer, but I think it's generally considered bad form to tie things up in an interrupt handler like that.
Of course, this whole situation is pretty marginal, and if it turns out this is what is causing the problem, it's just because you're running at the ragged edge of what is feasible on the hardware.