So here is a baseline: basic axis mirroring with auto squaring, but without I2C. Lag is 600 ns. After a lot of messing about, I just stuck the logic in a lookup table and deleted all my code ;-) The yellow trace is the step output from the DDCS running at 10000 mm/min, the blue trace is the Arduino step output, and the lag is shown as 'dt' at bottom right.

[Image: hantek10_2.png]
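
For anyone wondering what "logic in a lookup table" could look like, here is a minimal sketch. It is not the actual code from this build. The bit layout, the names (`IN_STEP`, `IN_SQUARE`, `IN_LIMIT_A`, `IN_LIMIT_B`, `mirrorStep`) and the squaring rule are all assumptions for illustration: the incoming step is copied to both motors, and while squaring is active a side stops stepping once its own home switch has tripped.

```cpp
#include <stdint.h>

// Hypothetical input bit layout (assumed, not taken from the post).
constexpr uint8_t IN_STEP    = 1u << 0;  // step pulse from the controller
constexpr uint8_t IN_SQUARE  = 1u << 1;  // squaring (homing) mode active
constexpr uint8_t IN_LIMIT_A = 1u << 2;  // side A home switch tripped
constexpr uint8_t IN_LIMIT_B = 1u << 3;  // side B home switch tripped

// Hypothetical output bit layout: one step line per motor.
constexpr uint8_t OUT_STEP_A = 1u << 0;
constexpr uint8_t OUT_STEP_B = 1u << 1;

// The rule in plain code. It is only used to fill the table.
constexpr uint8_t mirrorLogic(uint8_t in) {
    const bool step     = in & IN_STEP;
    const bool squaring = in & IN_SQUARE;
    const bool holdA    = squaring && (in & IN_LIMIT_A);
    const bool holdB    = squaring && (in & IN_LIMIT_B);
    uint8_t out = 0;
    if (step && !holdA) out |= OUT_STEP_A;  // mirror step to motor A
    if (step && !holdB) out |= OUT_STEP_B;  // mirror step to motor B
    return out;
}

struct StepLut { uint8_t out[16]; };

// Fill all 16 input combinations once, at compile time.
constexpr StepLut buildLut() {
    StepLut t{};
    for (uint8_t in = 0; in < 16; ++in) t.out[in] = mirrorLogic(in);
    return t;
}

constexpr StepLut kLut = buildLut();

// Hot path: a single masked, indexed read with no branching.
constexpr uint8_t mirrorStep(uint8_t inputs) {
    return kLut.out[inputs & 0x0F];
}
```

The point of the table is that the per-edge work becomes one array read, so the delay is the same for every input combination. How the inputs get packed into the index (which pins, which port) depends on the wiring and is left out here.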