Totally reworked both parseDecimal and parseDecimalDigit
parseDecimalDigit no longer exists, as it could be replaced by an inline alternative in the 4 places it appeared. This saves one byte overall, as the inline version is 4 bytes, 1 byte more than a call, and removing the function saved 5 bytes. It has been reduced from between 52 and 35 cycles (35 on error, so we'd expect 52 cycles to be more common unless someone's really bad at programming) to 14 cycles, so 2-3 times faster.
parseDecimal has been reduced by a byte, and now the main loop is just about twice as fast, but with increased overhead. To put this into perspective, if we ignore error cases:
For decimals of length 1 it'll be 1.20x faster, for decimals of length 2, 1.41x faster, for length 3, 1.51x faster, for length 4, 1.57x faster, and for length 5 and above, at least 1.48x faster (even faster if there's leading zeroes or not the worst case scenario).
I believe there is still room for improvement, since the first iteration can be nearly replaced with "ld l, c" since 0*10=0, but when I tried this I could either add a zero check into the main loop, adding around 40 cycles and 10 bytes, or add 20 bytes to the overhead, and I don't think either of those options are worth it.
UnsetZ has been reduced by a byte, and between 17 and 28 cycles saved based on branching. Since branching is based on a being 0, it shouldn't have to branch very often and so be 28 cycles saved most the time. Including the initial call, the old version was 60 cycles, so this should be nearly twice as fast.
fmtHex has been reduced by 4 bytes and between 3 and 8 cycles based on branching.
fmtHexPair had a redundant "and" removed, saving two bytes and seven cycles.
parseHex has been reduced by 7 bytes. Due to so much branching, it's hard to say if it's faster, but it should be since it's fewer operations and now conditional returns are used which are a cycle faster than conditional jumps. I think there's more to improve here, but I haven't come up with anything yet.
* Fix for tools/zasm.sh being dependent on readlink -f (an issue on macOS, preventing builds)
* Wrap zasm.sh shebang in /usr/bin/env ; remove comment about BSDs
* Optimised intoXX functions
Rewrote intoXX functions to mainly rely on intoHL, as the HL instructions are smaller and faster. Also removed some redundant push and pop instructions. I edited the given unit tests to test these, and they seem to work as expected.
* Doesn't use self-modifying code
The number of bytes is the same as my previous attempt, with 11 more cycles in intoHL, so although I don't feel as clever this time it's still a good optimisation. I found an equivalent method for intoDE, however relying on intoHL still allows for `ex (sp), hl` to be used in intoIX, which is smaller and faster.
* Update core.asm
* Tried harder to follow coding convention
Added tabs between mnemonics and operands, and replaced a new line I accidentally removed.
I've tested RAM usage when self-assembling and there weren't as high
as I thought. zasm's defaults now use less than 0x1800 bytes of RAM,
making it possible, theoretically for now, for a Sega Master System
to assemble Collapse OS from within itself.
Yup, that's ultimately why I've just made this whole big zasm
refactoring in the previous commits. To allow for this.
But also, zasm is in much better shape now...
I'm about to split the global registry in two (labels and consts)
and the previous state of registry selection made things murky.
Now it's much better.
The dual scraptchpad thing doesn't work. Things become very
complicated when it's time to write that back to the file. We
overwrite our contents and end up with garbage.
This hard-binds ed to the filesystem (I liked the idea of working
only with blockdevs though...), but this is necessary for the
upcoming `w` command. We need some kind of way to tell the
destination to write to truncate itself.
This only has a meaning in the filesystem, but it's necessary to
let the file know that its registered file size has possibly
shrunk.
I thought of alternatives that would have allowed me to keep ed
blkdev-centered, but they were all too hackish to my own taste.
Hence, this new hard-bind on files.