UnsetZ has been reduced by a byte, and between 17 and 28 cycles saved based on branching. Since branching is based on a being 0, it shouldn't have to branch very often and so be 28 cycles saved most the time. Including the initial call, the old version was 60 cycles, so this should be nearly twice as fast.
fmtHex has been reduced by 4 bytes and between 3 and 8 cycles based on branching.
fmtHexPair had a redundant "and" removed, saving two bytes and seven cycles.
parseHex has been reduced by 7 bytes. Due to so much branching, it's hard to say if it's faster, but it should be since it's fewer operations and now conditional returns are used which are a cycle faster than conditional jumps. I think there's more to improve here, but I haven't come up with anything yet.
* Optimised intoXX functions
Rewrote intoXX functions to mainly rely on intoHL, as the HL instructions are smaller and faster. Also removed some redundant push and pop instructions. I edited the given unit tests to test these, and they seem to work as expected.
* Doesn't use self-modifying code
The number of bytes is the same as my previous attempt, with 11 more cycles in intoHL, so although I don't feel as clever this time it's still a good optimisation. I found an equivalent method for intoDE, however relying on intoHL still allows for `ex (sp), hl` to be used in intoIX, which is smaller and faster.
* Update core.asm
* Tried harder to follow coding convention
Added tabs between mnemonics and operands, and replaced a new line I accidentally removed.
Sure, it's a bit slower, but it prevents a lot of hard to debug
problems. I don't have to want to remember "don't use IX if you
have any blk* calls". Let's optimize I/O later.
When there's a mismatch, retry up to a certain number of times.
This makes random problem related to assembling big kernels go away! But
it also make SD card reading much slower...
For now, this achieves nothing else than wasting cycles, but this is the
first step in enabling CRC verifications (CMD59).
I think that this is where my random problems with assembling large
kernels from SDC come from: bad data that isn't detected. If that
happens when PGM loads programs in memory, then anything can happen.
`sdct`, when ran often enough, will error out or corrupt away (go
crazy)...