* addHL and subHL affect flags, and are smaller
Most importantly, addHL and subHL now affect the flags as you would expect from a 16 bit addition/subtraction. This seems like preferable behaviour; however, I realise any code relying on them not affecting flags would break. One byte is saved in addHL, and two bytes in subHL. Because the original code branches, speeds are difficult to compare: subHL is either 1 or 6 cycles faster depending on branching, and addHL ranges from 1 cycle slower to 3 cycles faster. If the chance of a carry is 50%, addHL is expected to be a cycle faster, but for a chance of carry below 25% (i.e. A < 0x40, assuming L is uniformly distributed) it will be up to a cycle slower.
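A rough Python model of the new behaviour and of the expected-speed argument may help reviewers. It assumes addHL computes HL + A and subHL computes HL - A (A zero-extended), and it models only the carry/borrow flag. The per-case cycle figures are an inference: "3 cycles faster on carry, 1 cycle slower otherwise" is the only assignment consistent with both the 50% and 25% claims above.

```python
from typing import Tuple

# Assumption: only the carry flag is modelled (carry out of bit 15 for
# addHL, borrow for subHL); the other Z80 flags are ignored here.


def add_hl(hl: int, a: int) -> Tuple[int, bool]:
    """Return (new HL, carry) for HL + A, with A zero-extended to 16 bits."""
    total = (hl & 0xFFFF) + (a & 0xFF)
    return total & 0xFFFF, total > 0xFFFF


def sub_hl(hl: int, a: int) -> Tuple[int, bool]:
    """Return (new HL, carry) for HL - A; carry means a borrow occurred."""
    diff = (hl & 0xFFFF) - (a & 0xFF)
    return diff & 0xFFFF, diff < 0


def carry_probability(a: int) -> float:
    """Chance that L + A carries into H, assuming L is uniformly random.

    Exactly `a` of the 256 possible values of L satisfy L + a > 0xFF.
    """
    return (a & 0xFF) / 256


def expected_add_hl_saving(p_carry: float) -> float:
    """Expected cycles saved by the new addHL (negative means slower).

    Inferred assumption: +3 cycles when a carry occurs, -1 otherwise,
    giving 4 * p_carry - 1.
    """
    return 3 * p_carry - 1 * (1 - p_carry)
```

With A = 0x80 the carry chance is 50% and the expected saving is exactly one cycle. At A = 0x40 it drops to zero, and it approaches one cycle slower as A approaches 0.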
* Update core.asm
* Reworked one use of addHL
By essentially inlining both addHL and cpHLDE, 100 cycles are saved, and since the registers no longer need preserving, a byte is saved too.
* Corrected spelling error in comment
* Reworked second use of addHL
43 cycles saved, and no more addHL in critical loops. No bytes saved or used.
* Fixed tabs and spacing, and made a comment clearer.
* Clearer comments
* Adopted push/pop notation
Pretty major improvements to both cpHLDE and writeHLinDE: cpHLDE is now 5 bytes shorter and 9 to 12 cycles faster depending on branching, and writeHLinDE is now 2 bytes shorter and 21 cycles faster.
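For reference, here is what the two routines are meant to compute, sketched in Python so that a size/speed rewrite can be checked against it. This is an assumption about their contracts, not taken from core.asm: cpHLDE is taken to set Z and C the way `cp` would for an unsigned 16-bit comparison, and writeHLinDE is taken to store HL little-endian at (DE).

```python
from typing import Tuple


def cp_hl_de(hl: int, de: int) -> Tuple[bool, bool]:
    """Return (Z, C) as `cp` would set them when comparing HL with DE.

    Assumption: Z is set when HL == DE and C is set when HL < DE,
    both compared as unsigned 16-bit values.
    """
    diff = (hl & 0xFFFF) - (de & 0xFFFF)
    return diff == 0, diff < 0


def write_hl_in_de(mem: bytearray, de: int, hl: int) -> None:
    """Store HL at (DE): L at DE, H at DE+1 (assumed little-endian layout)."""
    mem[de & 0xFFFF] = hl & 0xFF
    mem[(de + 1) & 0xFFFF] = (hl >> 8) & 0xFF
```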
* Optimised intoXX functions
Rewrote the intoXX functions to rely mainly on intoHL, as the HL instructions are smaller and faster. Also removed some redundant push and pop instructions. I edited the existing unit tests to cover these, and they seem to work as expected.
* Doesn't use self-modifying code
The byte count is the same as in my previous attempt, at the cost of 11 more cycles in intoHL, so although I don't feel as clever this time, it's still a good optimisation. I found an equivalent method for intoDE; however, relying on intoHL still allows `ex (sp), hl` to be used in intoIX, which is smaller and faster.
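The idea of building every intoXX routine on intoHL can be modelled in Python as below. This sketch assumes intoXX means "XX <- 16-bit word at (XX)". The register swaps stand in for `ex de, hl` and `ex (sp), hl`, and the Python list stands in for the Z80 stack. It is an illustration of the technique, not a transcription of the actual routines.

```python
from typing import Dict, List

Regs = Dict[str, int]


def read_word(mem: bytearray, addr: int) -> int:
    """Read a little-endian 16-bit word at addr, as `ld hl, (nn)` would."""
    return mem[addr & 0xFFFF] | (mem[(addr + 1) & 0xFFFF] << 8)


def into_hl(mem: bytearray, regs: Regs) -> None:
    """HL <- (HL): the primitive the other routines reuse."""
    regs["HL"] = read_word(mem, regs["HL"])


def into_de(mem: bytearray, regs: Regs) -> None:
    """DE <- (DE), via `ex de, hl` / call intoHL / `ex de, hl`."""
    regs["DE"], regs["HL"] = regs["HL"], regs["DE"]  # ex de, hl
    into_hl(mem, regs)
    regs["DE"], regs["HL"] = regs["HL"], regs["DE"]  # ex de, hl


def into_ix(mem: bytearray, regs: Regs, stack: List[int]) -> None:
    """IX <- (IX) with HL preserved, using `ex (sp), hl` around intoHL."""
    stack.append(regs["IX"])                       # push ix
    regs["HL"], stack[-1] = stack[-1], regs["HL"]  # ex (sp), hl
    into_hl(mem, regs)
    regs["HL"], stack[-1] = stack[-1], regs["HL"]  # ex (sp), hl
    regs["IX"] = stack.pop()                       # pop ix
```

The `ex (sp), hl` trick is what makes intoIX cheap. It swaps IX's value into HL and saves the old HL on the stack in a single instruction, so no separate push/pop of HL is needed.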
* Update core.asm
* Tried harder to follow coding convention
Added tabs between mnemonics and operands, and restored a newline I had accidentally removed.
Sure, it's a bit slower, but it prevents a lot of hard-to-debug
problems. I don't want to have to remember "don't use IX if you
have any blk* calls". Let's optimize I/O later.
When there's a mismatch, retry up to a certain number of times.
This makes random problems related to assembling big kernels go away! But
it also makes SD card reading much slower...
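The bounded-retry logic can be sketched in Python as follows. The names `read_block` and `is_valid` are hypothetical placeholders for "read one block from the card" and "check it for a mismatch". The point is that a bad read is either replaced by a good one or reported as a failure after a fixed number of attempts, never silently used.

```python
from typing import Callable, Optional


def read_with_retries(
    read_block: Callable[[], bytes],
    is_valid: Callable[[bytes], bool],
    max_tries: int,
) -> Optional[bytes]:
    """Read a block, retrying on mismatch, for at most max_tries attempts.

    Returns the first block that passes is_valid, or None if every
    attempt fails, so the caller can report an error instead.
    """
    for _ in range(max_tries):
        data = read_block()
        if is_valid(data):
            return data
    return None
```

Every retry costs another full block transfer on top of the checking itself, which is consistent with the slowdown noted above.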
For now, this achieves nothing other than wasting cycles, but this is the
first step in enabling CRC verifications (CMD59).
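For context: in SPI mode, CMD59 (CRC_ON_OFF) turns on CRC checking. Each data block is followed by a 16-bit CRC computed with the CCITT polynomial x^16 + x^12 + x^5 + 1, initial value 0. Below is a minimal Python sketch of computing and checking that CRC. `block_is_valid` is a hypothetical helper name, not something from the codebase.

```python
def crc16_sd(data: bytes) -> int:
    """CRC-16 used on SD data blocks: polynomial 0x1021, initial value 0,
    MSB-first, no final XOR (the same parameters as CRC-16/XMODEM)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def block_is_valid(block: bytes, received_crc: int) -> bool:
    """Compare the CRC the card sent after the block with our own."""
    return crc16_sd(block) == received_crc
```

A CRC like this is guaranteed to catch any single flipped bit, which is exactly the kind of silent corruption that goes unnoticed without it.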
I think that this is where my random problems with assembling large
kernels from SDC come from: bad data that isn't detected. If that
happens when PGM loads programs into memory, then anything can happen.
`sdct`, when run often enough, will error out or start corrupting
things (go crazy)...