Saturday, October 02, 2021

Main-Loop Accurate Delay

I talked in the earlier post for today about trying out a stochastic approach to computing the number of vertical retrace cycles needed to achieve a specific main-loop processing rate. What I left out was much of the why.

Think of the main-loop using any conceptual framework: I'll use OODA because it corresponds closely to what the main-loop of a game does. Roughly, per iteration of the main-loop we process user input, update internal data structures, draw sprites and/or compute collision, wait for vertical retrace, and flip the page, in that order. Most of the time, each completed iteration through this main-loop counts as one frame of [logical] video output.
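As a rough sketch in C (the phase functions here are hypothetical stand-ins, not actual LED2-20 code), the shape of that loop is:

    /* Minimal sketch of the loop shape described above; the phase
       functions are hypothetical stand-ins, not actual LED2-20 code. */
    static void process_input(void)    { /* observe: read keyboard/joystick */ }
    static void update_world(void)     { /* orient/decide: physics, collision */ }
    static void draw_frame(void)       { /* act: draw sprites to the back page */ }
    static void wait_for_retrace(void) { /* pad the iteration out to whole ticks */ }
    static void flip_page(void)        { /* show the freshly drawn page */ }

    int main(void)
    {
        for (;;) {              /* each iteration is one logical frame */
            process_input();
            update_world();
            draw_frame();
            wait_for_retrace();
            flip_page();
        }
    }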

The easiest way of programming the main-loop is to lock-step the physics engine (typified here as ``update internal data structures'' and ``compute collision'') with the actual rendering pipeline. That's why for many older games, when the frame rate is unlocked to allow hundreds of frames per second (fps), the physics can get... wonky.
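To make the failure mode concrete, here is a toy illustration (not taken from any actual game) of what lock-stepped physics looks like:

    /* Toy illustration: the speed is expressed per *frame*, so the
       sprite moves at 140 px/s when the loop runs at 70 fps, but at
       600 px/s when unlocked to 300 fps. */
    int player_x = 0;

    void update_world(void)
    {
        player_x += 2;   /* 2 pixels per frame, however long a frame is */
    }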

I know I said older games, but the truth is, this particular main-loop design is easy to build and thus shows up almost everywhere. Any time a ``frame-rate dependent physics engine abuse'' type of bug/feature shows up, this should be the first thing that comes to mind.

This is also one reason why some games seem to react more quickly when the frame rate limit is raised.

But I'm not here to talk about the merits of lock-stepping the physics with the rendering pipeline. I'm just assuming that it is the case.

For consistency in behaviour, it is often a good idea to limit the processing rate of the main-loop. This means ensuring that each iteration takes roughly the same time, which in turn means padding it out with some sort of delay mechanism. I'm working in retro-world, so I lack modern-day options like a high-precision sleep() with microsecond precision. In LED(II), I used empty FOR-loops, relying on a pre-main-loop calibration process that counted the number of iterations completed in a given amount of time. This is in contrast with the then-prevalent method of running a fixed number of loops and measuring the duration taken, a method that led to ``divide by zero'' errors on fast enough PCs (we're talking the difference between the program running well on an 80386 and hitting a ``divide by zero'' error on a Pentium I).
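A sketch of that calibration in C, assuming standard clock() as a stand-in for whatever timer the real code polled:

    #include <time.h>

    /* Count how many empty iterations fit into a fixed interval,
       rather than timing a fixed number of iterations. */
    unsigned long calibrate_loops_per_second(void)
    {
        unsigned long count = 0;
        clock_t end = clock() + CLOCKS_PER_SEC;   /* run for about 1 s */

        while (clock() < end)
            count++;          /* the ``empty FOR-loop'' workload */

        return count;         /* iterations per second on this machine */
    }

A delay of t seconds is then roughly count × t empty iterations. The inverted method divides by a measured elapsed time instead, and on a fast enough CPU that measurement rounds down to zero, hence the ``divide by zero''.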

In LED2-20, I'm using a different method: use a known higher-precision timing mechanism as the workload in a busy-wait loop instead. In this case, I'm polling the VGA status port at 0x3DA, which lets a busy-wait return in step with the vertical retrace. The timing is well-documented: for my selected screen mode, the vertical retrace rate is 70 Hz, i.e. one tick every 1/70 s, or about 0.0143 s. Compared to the 18.2 Hz of the built-in timer, this is nearly 4× more precise.
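The busy-wait itself is the classic VGA retrace poll. A sketch in C, assuming a DOS-era compiler that provides inportb() (Borland naming; Microsoft-style compilers spell it inp()):

    #include <dos.h>   /* Borland-style; provides inportb() */

    #define INPUT_STATUS_1 0x3DA   /* VGA Input Status Register #1 */
    #define VRETRACE       0x08    /* bit 3: vertical retrace active */

    /* Block until the start of the next vertical retrace,
       i.e. one 70 Hz tick in this screen mode. */
    void wait_for_retrace(void)
    {
        /* if already inside a retrace, let it finish first, so we
           always synchronise to the start of a fresh one */
        while (inportb(INPUT_STATUS_1) & VRETRACE)
            ;
        /* now wait for the next retrace to begin */
        while (!(inportb(INPUT_STATUS_1) & VRETRACE))
            ;
    }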

This delay per iteration is not necessarily constant, since it depends on how much time the actual main-loop workload consumes. I do have a low-precision timer (the 18.2 Hz one) running as an event trap in the background; it helps me recalibrate the delay according to the actual number of iterations fully completed between timer interrupts, and it was from there that I discovered that the loop count was wildly unstable. I dug in and realised that I needed delays finer than one 0.0143 s tick. Is that possible with a fixed 70 Hz clock?
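For concreteness, a hedged sketch of the kind of arithmetic involved (the names and the 35 Hz target are made up for illustration; this is not the actual LED2-20 recalibration code):

    /* Estimate the achieved loop rate from the iterations completed
       between two consecutive 18.2 Hz interrupts, then compute the
       new per-iteration delay needed to hit the target rate. */
    #define TIMER_HZ  18.2
    #define TARGET_HZ 35.0     /* hypothetical main-loop rate */

    double recalibrated_delay(unsigned long iters, double current_delay)
    {
        double achieved  = iters * TIMER_HZ;   /* loops per second; iters > 0 */
        double shortfall = 1.0 / TARGET_HZ - 1.0 / achieved;
        return current_delay + shortfall;      /* seconds per iteration */
    }

Whenever that shortfall is smaller than one 70 Hz tick, rounding the delay to whole ticks overshoots or undershoots, which is where the instability shows up.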

The answer is yes, if I allow myself to expand the definition of the target delay to mean the average delay over time. So if I need a delay of about 0.05 s, which works out to 3.5 ticks of the 70 Hz clock, I can achieve it by counting 3 ticks 50% of the time and 4 ticks the other 50% of the time.

A wholly deterministic method to obtain the same result can be found in the theory of pulse-width modulation (PWM), but I cannot use it: the delays can change between recalibrations, whereas PWM needs a defined pulsing cycle. Moreover, the function that translates the duration in floating-point seconds to screen ticks does not keep any static variables that could track where we are in the pulsing pattern [for the additional tick].
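To see why state matters, here is what a deterministic, PWM-like version of the tick conversion could look like (a sketch, not code from LED2-20):

    #include <math.h>

    /* Deterministic PWM-style conversion.  It needs a static
       accumulator to carry the fractional tick between calls,
       which is exactly the state my conversion function doesn't keep. */
    int delay_to_ticks_deterministic(double seconds)
    {
        static double carry = 0.0;          /* fractional ticks owed */
        double ticks = seconds * 70.0 + carry;
        int whole = (int)floor(ticks);

        carry = ticks - whole;              /* remember the remainder */
        return whole;                       /* 3, 4, 3, 4, ... for 3.5 */
    }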

Hence, using a random number is the best ``independent'' sort of way. The idea is to return the largest whole number of 70 Hz ticks that does not exceed the delay duration, then add 1 tick with probability equal to the remaining fraction of a tick, so that the expected delay comes out correct. The principle is therefore similar to that of PWM, except that I replace the determinism with stochastically-driven behaviour.
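A sketch of that idea in C (the function name is illustrative, and this is not the actual LED2-20 code):

    #include <stdlib.h>   /* rand(), RAND_MAX */
    #include <math.h>     /* floor() */

    /* Convert a delay in seconds into a whole number of 70 Hz ticks,
       rounding up stochastically so that the average over many calls
       matches the requested duration. */
    int delay_to_ticks(double seconds)
    {
        double ticks = seconds * 70.0;   /* e.g. 0.05 s -> 3.5 ticks */
        double whole = floor(ticks);     /* largest count not exceeding it */
        double frac  = ticks - whole;    /* leftover fraction of a tick */

        /* add the extra tick with probability equal to the fraction */
        if ((double)rand() / ((double)RAND_MAX + 1.0) < frac)
            whole += 1.0;

        return (int)whole;
    }

Each main-loop iteration then simply waits out that many retrace ticks; over many iterations the expected tick count is the floor plus the fraction, which is exactly the target.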

It seems to work: after one recalibration cycle, the main-loop processing rate settles comfortably within range of whatever I set it to.

That's enough for now.
