Tuning is for helping set axis parameters, e.g. acceleration, max speed, following error...

With KMotion, you can carry out test moves and then plot the resulting data in a few different ways, so you can see exactly how the axis is responding. On a closed-loop step/dir system, that would help establish the maximum acceleration the servo can keep up with, because you can see the following error directly rather than relying on the servo drive faulting once its following error limit has been exceeded.
Without access to that kind of information, you've just got to rely on establishing tuning figures that work, and hope following error is kept to a minimum. It is worth mentioning that there are plenty of machines that run without any accuracy issues and have been set up without any kind of data plotting.
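
To make the following error idea concrete, here is a rough sketch in plain C of what you would do with the data from a test move. It is not KMotion code and doesn't use any KFlop functions. The function names, the sample arrays and the limit value are all made up for illustration, and it assumes you have the commanded and measured positions as two arrays in the same units.

```c
#include <stddef.h>

/* Hypothetical helper, not part of KMotion: returns the largest absolute
   difference between commanded and measured position over n samples. */
double peak_following_error(const double *commanded,
                            const double *measured, size_t n)
{
    double peak = 0.0;
    for (size_t i = 0; i < n; i++) {
        double err = commanded[i] - measured[i];
        if (err < 0.0)
            err = -err;          /* absolute value without needing math.h */
        if (err > peak)
            peak = err;
    }
    return peak;
}

/* Hypothetical helper: returns 1 if the test move stayed within the
   allowed following error, 0 if it went over at any sample. */
int move_within_limit(const double *commanded, const double *measured,
                      size_t n, double max_following_error)
{
    return peak_following_error(commanded, measured, n) <= max_following_error;
}
```

The idea would be to repeat the test move with a higher acceleration each time, and back off once the peak error gets close to the limit set in the servo drive.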

Being able to maintain position is a valid reason, but I'll be honest and say it's something that sounds good to have and in reality isn't much use. I have that ability on my lathe, but after an E-stop I'll always re-home it anyway just to make sure it's where it thinks it is. The CS-Labs CSMIO-IP/A analogue interface also has this functionality.

I've never used Mach 4, so I have no idea what it can or can't currently do.
KFlop is probably the most powerful and adaptable controller in its price range, but you have to understand C programming to make the most of it.
Whereas Mach 4 does most of the controller functions, handles the inputs and outputs, and tells the controller how to respond, KFlop relies on having C programs loaded to it, and will handle the required inputs/outputs internally to complete what KMotionCNC has requested.

A good example would be a tool changer.
Mach 4 will tell the controller when to activate any outputs, will monitor the inputs, and will control the tool changer motion directly within Mach 4. A key point is that any response to inputs is reliant on the speed of communication between the controller and Mach 4. I know with Mach 3 this was limited to a 10Hz response speed, but I have no idea how fast the scripting runs in Mach 4.
KMotionCNC will rely on a C program within the KFlop, and simply tell the KFlop to load the required tool. The KFlop will activate any outputs and monitor the inputs with no further input from KMotionCNC. Response time to any input changes is only limited by the KFlop C program slice time (you can have up to 7 C programs running, with each given an equal time allocation in sequence), but even with the maximum 7 C programs running, response time is still sub-millisecond.
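
To put rough numbers on that comparison, here is a small sketch in plain C. It is just arithmetic and doesn't use any KFlop or Mach functions, and the function names are made up. The 90 microsecond slice length used in the note underneath is an assumption on my part, so check it against the KFlop documentation. The 10Hz figure is the Mach 3 one mentioned above.

```c
/* Hypothetical helper: time for the scheduler to go once round all the
   running C programs when each gets one equal slice in turn, in
   microseconds. A program waits roughly this long at worst before it
   next gets to look at its inputs. */
unsigned long round_robin_cycle_us(unsigned int programs,
                                   unsigned long slice_us)
{
    return programs * slice_us;
}

/* Hypothetical helper: time between updates for something that
   responds at rate_hz, in microseconds. */
unsigned long poll_period_us(unsigned long rate_hz)
{
    return 1000000UL / rate_hz;
}
```

With 7 programs and an assumed 90 microsecond slice, round_robin_cycle_us(7, 90) gives 630 microseconds, which fits the sub-millisecond figure. poll_period_us(10) gives 100000 microseconds, i.e. 100ms between updates at 10Hz.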

What's suitable for you depends on what you are trying to achieve.