mirror of https://github.com/hsoft/collapseos.git synced 2025-04-05 22:58:40 +11:00
Commit Graph

7 Commits

Author SHA1 Message Date
Clanmaster21
edbd775642
Fixed more errors, clearer choice of constants 2019-10-21 02:13:21 +01:00
Clanmaster21
ab4fca334d
Fixed erroring out for all numbers >0x1999
I fixed the errors for numbers >0x1999. Sadly, it is now 6 bytes bigger, so 5 bytes larger than the original, but the speed increases should still hold.
2019-10-20 23:56:31 +01:00
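The 0x1999 threshold in the message above equals floor(0xFFFF / 10) = 6553, the largest 16-bit value that can still be multiplied by 10 without overflowing. The following is a minimal Python sketch of a 16-bit decimal parser that rejects only genuinely out-of-range input. The name `parse_decimal16` is hypothetical and the code illustrates the arithmetic only; it is not the commit's Z80 routine, and the log does not say what the original bug was.

```python
MAX16 = 0xFFFF
LIMIT = MAX16 // 10  # 0x1999 == 6553, largest value that survives "* 10"

def parse_decimal16(text):
    """Return the 16-bit unsigned value of a decimal string, or None on error.

    Hypothetical helper for illustration, not the project's routine.
    """
    if not text:
        return None
    value = 0
    for ch in text:
        digit = ord(ch) - ord("0")
        if not 0 <= digit <= 9:
            return None          # not a decimal digit
        if value > LIMIT:
            return None          # value * 10 would not fit in 16 bits
        value = value * 10 + digit
        if value > MAX16:
            return None          # adding the digit overflowed
    return value
```

With this structure, a value above 0x1999 is only an error if another digit follows it: `parse_decimal16("6554")` and `parse_decimal16("65535")` succeed, while `parse_decimal16("65536")` and `parse_decimal16("65540")` return `None`.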
Clanmaster21
67adc6fcfc
Removed skip leading zeroes, added skip first multiply
Now, instead of skipping leading zeroes, the first digit is loaded directly into hl without first multiplying by 10. This means the first loop iteration is skipped in the overhead, making the method 2-3 times faster overall, and it is now faster for the more common cases with fewer digits too. The number of bytes is exactly the same, and the inner loop is slightly faster too thanks to no longer needing to load a into c.
To be more precise about the speed increase over the current code, for decimals of length 1 it'll be 3.18x faster, for decimals of length 2, 2.50x faster, for length 3, 2.31x faster, for length 4, 2.22x faster, and for length 5 and above, at least 2.03x faster. In terms of cycles, this is around 100+(132*length) cycles saved per decimal.
2019-10-20 22:26:07 +01:00
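The "skip first multiply" idea described above can be sketched in Python as follows. The function names are hypothetical, the overflow check is deliberately omitted to keep the sketch short, and `cycles_saved` simply restates the approximate formula quoted in the message; none of this is the commit's actual Z80 code.

```python
def parse_digit(ch):
    """Return 0..9 for an ASCII decimal digit, else None."""
    d = ord(ch) - ord("0")
    return d if 0 <= d <= 9 else None

def parse_decimal_skip_first(text):
    """Parse a decimal string, seeding the accumulator with the first digit.

    Because 0 * 10 + d == d, the first digit can be loaded directly and the
    multiply-by-10 step is only needed for the remaining digits.
    """
    if not text:
        return None
    value = parse_digit(text[0])      # first digit: no multiply needed
    if value is None:
        return None
    for ch in text[1:]:
        d = parse_digit(ch)
        if d is None:
            return None
        # Wraps like a 16-bit register; overflow detection is omitted here.
        value = (value * 10 + d) & 0xFFFF
    return value

def cycles_saved(length):
    """Approximate cycles saved per decimal, as quoted in the commit message."""
    return 100 + 132 * length
```

For example, `parse_decimal_skip_first("1234")` performs three multiply steps rather than four, and by the message's own estimate a 3-digit decimal saves around `cycles_saved(3)` = 496 cycles.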
Clanmaster21
9797405789
Major parsing optimisations
Totally reworked both parseDecimal and parseDecimalDigit
parseDecimalDigit no longer exists, as it could be replaced by an inline alternative in the 4 places it appeared. This saves one byte overall: the inline version is 4 bytes, 1 byte more than a call, so the 4 call sites grow by 4 bytes in total, while removing the function saved 5 bytes. Digit parsing has been reduced from between 35 and 52 cycles (35 on error, so we'd expect 52 cycles to be more common unless someone's really bad at programming) to 14 cycles, so 2-3 times faster.
parseDecimal has been reduced by a byte, and now the main loop is just about twice as fast, but with increased overhead. To put this into perspective, if we ignore error cases:
For decimals of length 1 it'll be 1.20x faster, for decimals of length 2, 1.41x faster, for length 3, 1.51x faster, for length 4, 1.57x faster, and for length 5 and above, at least 1.48x faster (even faster if there are leading zeroes or the input is not the worst case).
I believe there is still room for improvement, since the first iteration can nearly be replaced with "ld l, c" because 0*10=0. When I tried this, however, I had to either add a zero check to the main loop, adding around 40 cycles and 10 bytes, or add 20 bytes to the overhead, and I don't think either of those options is worth it.
2019-10-13 13:50:07 +01:00
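The byte accounting and the kind of inline digit check described above can be illustrated with a short Python sketch. The range check below is a common technique (one wrapped subtraction followed by one unsigned compare) and the shift-and-add multiply reflects the fact that the Z80 has no multiply instruction; whether the commit uses exactly these forms is an assumption, and all names are hypothetical.

```python
def digit_value(byte):
    """Map ASCII '0'..'9' to 0..9 using one wrapped subtraction and one compare.

    Bytes below '0' wrap around to large 8-bit values, so a single
    "less than 10" test rejects everything outside the digit range.
    """
    d = (byte - ord("0")) & 0xFF
    return d if d < 10 else None

def times10(x):
    """Compute x * 10 as 8x + 2x with shifts and an add, truncated to 16 bits."""
    return ((x << 3) + (x << 1)) & 0xFFFF

# Byte accounting from the commit message.
CALL_SITES = 4       # places where parseDecimalDigit was called
INLINE_BYTES = 4     # size of the inline replacement
CALL_BYTES = 3       # size of a call instruction
FUNCTION_BYTES = 5   # bytes saved by removing the function
net_bytes = CALL_SITES * (INLINE_BYTES - CALL_BYTES) - FUNCTION_BYTES
```

Here `net_bytes` evaluates to -1, matching the "saves one byte overall" claim, and `digit_value(ord(":"))` returns `None` because ':' is the character immediately after '9'.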
Virgil Dupras
eefadc3917 ed: add support for 'a' and 'i' 2019-07-14 17:35:21 -04:00
Virgil Dupras
951dd2206d apps/ed: add the concept of "current line" 2019-07-13 15:28:44 -04:00
Virgil Dupras
6dbbfa837d apps/ed: add (dummy) line number processing
Starting to feel interactive...
2019-07-13 11:53:30 -04:00