Not much of a gain in terms of usability (a bit of a loss in fact, things are
a bit slow and glitchy), but it's a necessary move if we want to use upcoming
grid-enabled userspace apps, such as a visual text editor.
Sending the escape after its target made things complicated for upcoming
stuff I want to add. Although it makes `recv.asm` slightly larger, it's really
worth it.
* String functions optimised
A few functions have been tweaked, but the biggest changes are in strlen, strskip and toWS, which now take around two thirds of the cycles they used to (although strskip has more overhead outside its loop). 10 bytes saved in total.
toWS gained two bytes from inlining the isWS call, and saved a byte by also inlining a jump to unsetZ. Together these changes save 29 cycles, from an original 90. I looked at other uses of isWS, and it's difficult to inline effectively in every situation, so I haven't inlined it elsewhere.
rdWS had a byte and two cycles saved by inlining a jump to unsetZ.
strskip is the same size, with the loop cut down from 35 cycles to 21 cycles, but 18 cycles are added outside the loop. I expect one-character strings are in the minority, so this should save cycles overall.
strlen had 8 bytes saved, with the loop cut down from 38 cycles to 21 cycles, and 18 cycles removed outside the loop.
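The strskip and strlen trade-offs can be checked with a little arithmetic. This sketch models each routine as `loop_cycles * n + overhead` using only the cycle figures quoted above. `n` is the number of loop iterations, and how that maps to string length (for instance whether the terminating null costs an iteration) is left as an assumption. Since only the *change* in fixed overhead is known, the smaller overhead is taken as zero.

```python
# Model of the cycle figures quoted in the commit message.
# n = number of loop iterations (assumed roughly one per character scanned).

def strskip_delta(n: int) -> int:
    """Cycles saved by the new strskip (positive means faster)."""
    old = 35 * n
    new = 21 * n + 18  # new version adds 18 cycles outside the loop
    return old - new


def strlen_delta(n: int) -> int:
    """Cycles saved by the new strlen (positive means faster)."""
    old = 38 * n + 18  # new version removes 18 cycles outside the loop
    new = 21 * n
    return old - new


# Smallest iteration count at which the new strskip comes out ahead.
break_even = next(n for n in range(1, 100) if strskip_delta(n) > 0)
```

With these figures the new strskip loses 4 cycles on a single iteration and wins from the second iteration onward, while the new strlen is faster for every length.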
* Fixed strskip
strskip wasn't preserving `a` properly. The new code uses the shadow `af` register, so although a byte and 4 cycles have been added outside the loop, it's safer and cleaner. The flags register isn't affected, but since the search goes for up to 64KB, I think it's safe to say the end of the string will always be reached.
* Remove inlining of isWS
I've tweaked nearly every function in this file, so I'll go through them one by one.
parseDecimal has been reworked a little so that `a` can be used instead of `b` for checking for overflow. I had originally intended to redo it to work like the old parseDecimal, but I think the current method (once reworked a little) is cleaner and smaller, and should be just as fast. 7 bytes and 27 cycles saved.
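As a behavioural reference for what parseDecimal has to guarantee, here is a minimal Python model. It assumes an unsigned 16-bit result and failure on an empty string, a non-digit character, or overflow. The z80 routine uses `a` for its overflow check. Here an explicit comparison against `0xFFFF` only stands in for whatever flag test the assembly performs, and `parse_decimal` is an illustrative name.

```python
from typing import Optional


def parse_decimal(s: str) -> Optional[int]:
    """Parse an unsigned decimal string into a 16-bit value, or None on error."""
    if not s:
        return None
    value = 0
    for ch in s:
        digit = ord(ch) - ord("0")
        if not 0 <= digit <= 9:
            return None  # not a decimal digit
        value = value * 10 + digit
        if value > 0xFFFF:  # stands in for the register-based overflow test
            return None
    return value
```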
parseHexadecimal now loads hex digits into `b`, `d`, `c` and `e` from the right: each new digit is inserted on the right and the digits already loaded move one place to the left. Only at the end is any shifting done, using the faster `add a, a` for left shifts. This saves 9 bytes and 78 cycles inside the loop, at the cost of 49 cycles added after the loop.
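The digit-insertion scheme can be modelled in Python. This sketch rests on several guesses that are not taken from the source. It assumes that each of `b`, `d`, `c` and `e` holds one nibble, ordered from most to least significant, and that at most four digits are accepted. It also assumes that the 16-bit result is assembled as `(b << 4) | d` for the high byte and `(c << 4) | e` for the low byte. The model shows the structure of the change: each digit only costs a move of every slot one place to the left, and the four-bit shifts happen once, after the loop, as four doublings (the `add a, a` trick).

```python
from typing import Optional

HEX_DIGITS = "0123456789abcdef"


def parse_hexadecimal(s: str) -> Optional[int]:
    # Hypothetical model: four nibble slots named after the registers,
    # most significant first (b, d, c, e). Prefix handling is assumed done.
    if not s or len(s) > 4:
        return None
    b = d = c = e = 0
    for ch in s.lower():
        digit = HEX_DIGITS.find(ch)
        if digit < 0:
            return None  # not a hex digit
        # Move every slot one place left and insert the new digit on the right.
        b, d, c, e = d, c, e, digit
    # Shifting happens only here: four doublings == shift left by 4.
    hi, lo = b, c
    for _ in range(4):
        hi += hi  # like `add a, a`
        lo += lo
    return ((hi | d) << 8) | (lo | e)
```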
parseBinaryLiteral had a few instructions moved around, saving two bytes and 5 cycles inside the loop, and a further 15 cycles saved on error.
parseLiteral has been reworked slightly: the isDigit call has been replaced with an inline parseDecimalDigit, saving a byte and around 20 to 30 cycles, with around 16 more cycles saved if the number is decimal. The .char routine is a byte smaller and 6 cycles faster on success, but 5 cycles slower on error.
isDigit has been reduced by 4 bytes and 10 cycles on success, with a few more cycles saved on failure (hard to estimate due to branching).
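A common z80 idiom tests and converts a decimal digit in one go. It may or may not match the inlined parseDecimalDigit exactly. The idiom subtracts `'0'` and then does a single unsigned compare against 10. Characters below `'0'` wrap around to large 8-bit values, so that one check covers both ends of the range. The model below reproduces the 8-bit wraparound in Python, and the function name is illustrative.

```python
from typing import Optional


def parse_decimal_digit(ch: int) -> Optional[int]:
    """Return the digit value of ASCII code ch, or None if it isn't 0-9."""
    # Subtract '0' and wrap to 8 bits, as `sub '0'` would on the z80.
    a = (ch - ord("0")) & 0xFF
    # One unsigned compare: codes below '0' wrapped to >= 0xD0, so this
    # rejects them along with everything above '9'.
    return a if a < 10 else None
```

Because the test and the conversion come out of the same subtraction, the caller gets the digit value for free. That is one plausible reason why replacing a separate isDigit check saves cycles.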
I implement the screen using XCB, which is much friendlier than z80e's
SDL+CMake setup for development machines that should only need minimal
dependencies installed (for example, a port-less OpenBSD rig).
Sub-parsers are seldom used by themselves, except for parseDecimal.
I'm tightening the code of this unit for two reasons:
1. Optimization
2. Upcoming API change where HL won't be preserved anymore, but will
point to the character following the last parsed character. This will
allow us to simplify lib/expr.
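The planned convention can be sketched in Python. Each hypothetical parser returns both its value and the index just past the last character it consumed, which is the role HL would play. A caller such as an expression evaluator can then resume from that index instead of rescanning. `parse_decimal_at` and `parse_sum` are illustrative names, not part of lib/expr.

```python
from typing import Optional, Tuple


def parse_decimal_at(text: str, pos: int) -> Optional[Tuple[int, int]]:
    """Parse digits from pos; return (value, index after the last digit)."""
    start = pos
    value = 0
    while pos < len(text) and "0" <= text[pos] <= "9":
        value = value * 10 + ord(text[pos]) - ord("0")
        pos += 1
    if pos == start:
        return None  # no digits consumed
    return value, pos


def parse_sum(text: str) -> Optional[int]:
    """Evaluate "a+b+c", resuming each parse where the previous one stopped."""
    result = parse_decimal_at(text, 0)
    if result is None:
        return None
    total, pos = result
    while pos < len(text):
        if text[pos] != "+":
            return None
        result = parse_decimal_at(text, pos + 1)
        if result is None:
            return None
        value, pos = result
        total += value
    return total
```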