My approach with RS was slightly wrong: RS' TOP was always containing current
IP. It worked, but it was problematic when came the time to introduce
RS-modifying words: it's impossible to modify RS in a word without immediately
messing your flow.
Therefore, what used to be RS' TOS has to be a variable that isn't changed
midway by RS-modifying words. I guess that's why RS is called *return* stack...
Big one.
This allows us to write higher order words directly in Forth, which is much
more convenient than writing post-immediate (see "NOT" structure in diff if
you want to see what I mean) structures in ASM.
These structures can then be written to ROM (rather than loaded in RAM for
definitions loaded at run-time).
That's quite a bit of tooling that was added, 2 compilations stages, but I
think it's well worth it.
Things are better now, but immediates inside colons are broken. However,
IF/THEN/ELSE are now immediates and it's much cleaner this way. Still, this
commit has too much stuff in it, I need to commit, I don't want to lose this
step.
The goal was to be able to implement "(" in forth, but I realised that my
INTERPRET approach was wrong. Compiling the line beforehand is, after all,
not good. I'll have to change it again.
Previous approach was broken with regards to defined word using CREATE.
Also, reduce name length by one to make space for a new flags field for
"immediate" (which isn't used yet).
This scheme of "when we handle line-by-line, compile one word at a time then
execute" so that we could allow words like "CREATE" to call "readword" before
continuing was a bad scheme. It made things like branching outside of a colon
definition impossible.
This commit implement a new "litWord". When an undefined word is encountered at
compile time, it is included as-is in a string literal word. It is at run time
that we decide what to do with it.
* String functions optimised
A few functions have been tweaked, but the biggest changes are in strlen, strskip and toWS, which take around two third of the cycles they used to (although strskip has more overhead). 10 bytes saved total.
toWS had two bytes added inlining the isWS call, and a jump to unsetZ was inlined too, saving a byte. This saved 29 cycles, with the original function being 90 cycles. I looked at other uses of isWS and it's difficult to inline it effectively in every situation, so I haven't inlined it elsewhere.
rdWS had a byte and two cycles saved by inlining a jump to unsetZ.
strskip is the same size, with the loop cut down from 35 cycles to 21 cycles, but 18 cycles are added outside the loop. I expect one character strings are in the minority, so this should save cycles overall.
strlen had 8 bytes saved, with the loop cut down from 38 cycles to 21 cycles, and 18 cycles removed outside the loop.
* Fixed strskip
Strskip wasn't preserving a properly. The new code uses the shadow af register, so whilst a byte and 4 cycles have been added outside the loop, it's safer and cleaner. The flags register isn't affected, but since the search goes for up to 64Kb I think it's safe to say the end of the string will always be reached.
* Remove inlining of isWS
I've tweaked nearly every function in this file, so I'll go through them one by one.
parseDecimal has been reworked a little so that `a` can be used instead of `b` for checking for overflow. I had originally intended to redo it to work like the old parseDecimal, but I think the current method (once reworked a little) is cleaner and smaller, and should be just as fast. 7 bytes and 27 cycles saved.
parseHexadecimal has been changed to load hex digits into `b` `d` `c` `e` from the right (so all the digits move along to the left so the new digit can be inserted on the right), and then only at the end is any shifting done, using the faster `add a, a` to do left shifts. 9 bytes saved and 78 cycles saved inside the loop, and then 49 cycles added after the loop.
parseBinaryLiteral had a few instructions moved around, saving two bytes and 5 cycles inside the loop, and a further 15 cycles saved on error.
parseLiteral has been reworked slightly, the isDigit call has been replaced with an inline parseDecimalDigit, saving a byte and around 20-30 cycles, with around 16 more cycles saved if the number is a decimal. The .char routine has been reduced by a byte, and 6 cycles saved on success, but 5 cycles added on error.
isDigit has been reduced by 4 bytes and 10 cycles on success, with a few more cycles saved on fail (hard to estimate due to branching).
Sub-parsers are seldom used by themselves, except for parseDecimal.
I'm tightening the code of this unit for two reasons:
1. Optimization
2. Upcoming API change where HL won't be preserved anymore, but will
point to char following the last parse char. This will allow us
to simplify lib/expr.
Instead of going left and right, finding operators chars and replacing them
with nulls, we parse expressions in a more orderly manner, one chunk at a
time. I think it qualifies as "recursive descent", but I'm not sure.
This allows us to preserve the string we parse and should also make the
implementation of parens much easier.