The DDCS clock is 500 kHz. My code mirrors directly off the port, so overhead is low compared to the Arduino libs. I will verify propagation delay with the scope. PC817s are only on the switches, not on step/dir; this is also the opto inside the DDCS, I've just had a look. Agreed, it would be easy to build part of it with multiplexers, but I'm not sure it is required. I've got a box full of ESPs here; again, not required for this.

With 16x microstepping and 10 mm pitch screws you are at 320 steps/mm (that's with 200 step/rev motors), so 500 kHz gives about 1560 mm/s and a screw speed of roughly 9,400 rpm. We only need 1/10th of this tops, so I don't think frequency will be an issue.
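For anyone who wants to plug in their own numbers, here is a quick sketch of that arithmetic. The 200 full steps/rev figure is an assumption (a standard 1.8 degree motor, which is what 320 steps/mm implies), and the function names are just made up for illustration.

```python
# Rough step-rate headroom check. All values are from the post above
# except FULL_STEPS_PER_REV, which is assumed (typical 1.8 degree stepper).
FULL_STEPS_PER_REV = 200      # assumption, not stated in the thread
MICROSTEPS = 16
SCREW_PITCH_MM = 10.0         # mm of travel per screw revolution
MAX_STEP_RATE_HZ = 500_000    # DDCS step clock


def steps_per_mm(full_steps_per_rev, microsteps, pitch_mm):
    """Microsteps needed to move the axis 1 mm."""
    return full_steps_per_rev * microsteps / pitch_mm


def max_feed_mm_per_s(step_rate_hz, steps_mm):
    """Linear speed reached at a given step frequency."""
    return step_rate_hz / steps_mm


def screw_rpm(feed_mm_per_s, pitch_mm):
    """Screw rotation speed for a given linear feed."""
    return feed_mm_per_s / pitch_mm * 60.0


spmm = steps_per_mm(FULL_STEPS_PER_REV, MICROSTEPS, SCREW_PITCH_MM)
feed = max_feed_mm_per_s(MAX_STEP_RATE_HZ, spmm)
rpm = screw_rpm(feed, SCREW_PITCH_MM)

# Step rate actually needed if we only ever run at a tenth of that feed.
needed_hz = (feed / 10.0) * spmm

print(f"{spmm:.0f} steps/mm, {feed:.1f} mm/s, {rpm:.0f} rpm at full clock")
print(f"step rate needed at 1/10 speed: {needed_hz / 1000:.0f} kHz")
```

Change the pitch or microstepping constants at the top to see how much margin a different setup leaves.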