collapseos/apps/lib/parse.asm

239 lines
6.2 KiB
NASM
Raw Normal View History

; *** Requirements ***
; lib/util
; *** Code ***
; Parse the hex char at A and extract it's 0-15 numerical value. Put the result
; in A.
;
; On success, the carry flag is reset. On error, it is set.
parseHex:
; First, let's see if we have an easy 0-9 case
add a, 0xc6 ; maps '0'-'9' onto 0xf6-0xff
sub 0xf6 ; maps to 0-9 and carries if not a digit
ret nc
and 0xdf ; converts lowercase to uppercase
add a, 0xe9 ; map 0x11-x017 onto 0xFA - 0xFF
sub 0xfa ; map onto 0-6
ret c
; we have an A-F digit
add a, 10 ; C is clear, map back to 0xA-0xF
ret
; Parse string at (HL) as a decimal value and return value in DE.
; Reads as many digits as it can and stop when:
; 1 - A non-digit character is read
; 2 - The number overflows from 16-bit
; HL is advanced to the character following the last successfully read char.
; Error conditions are:
; 1 - There wasn't at least one character that could be read.
; 2 - Overflow.
; Sets Z on success, unset on error.
Decimal parse optimisations (#45) * Optimised parsing functions and other minor optimisations UnsetZ has been reduced by a byte, and between 17 and 28 cycles saved based on branching. Since branching is based on a being 0, it shouldn't have to branch very often and so be 28 cycles saved most the time. Including the initial call, the old version was 60 cycles, so this should be nearly twice as fast. fmtHex has been reduced by 4 bytes and between 3 and 8 cycles based on branching. fmtHexPair had a redundant "and" removed, saving two bytes and seven cycles. parseHex has been reduced by 7 bytes. Due to so much branching, it's hard to say if it's faster, but it should be since it's fewer operations and now conditional returns are used which are a cycle faster than conditional jumps. I think there's more to improve here, but I haven't come up with anything yet. * Major parsing optimisations Totally reworked both parseDecimal and parseDecimalDigit parseDecimalDigit no longer exists, as it could be replaced by an inline alternative in the 4 places it appeared. This saves one byte overall, as the inline version is 4 bytes, 1 byte more than a call, and removing the function saved 5 bytes. It has been reduced from between 52 and 35 cycles (35 on error, so we'd expect 52 cycles to be more common unless someone's really bad at programming) to 14 cycles, so 2-3 times faster. parseDecimal has been reduced by a byte, and now the main loop is just about twice as fast, but with increased overhead. To put this into perspective, if we ignore error cases: For decimals of length 1 it'll be 1.20x faster, for decimals of length 2, 1.41x faster, for length 3, 1.51x faster, for length 4, 1.57x faster, and for length 5 and above, at least 1.48x faster (even faster if there's leading zeroes or not the worst case scenario). I believe there is still room for improvement, since the first iteration can be nearly replaced with "ld l, c" since 0*10=0, but when I tried this I could either add a zero check into the main loop, adding around 40 cycles and 10 bytes, or add 20 bytes to the overhead, and I don't think either of those options are worth it. * Inlined parseDecimalDigit See previous commit, and /lib/parse.asm, for details * Fixed tabs and spacing * Fixed tabs and spacing * Better explanation and layout * Corrected error in comments, and a new parseHex 5 bytes saved in parseHex, again hard to say what that does to speed, the shortest possible speed is probably a little slower but I think non-error cases should be around 9 cycles faster for decimal and 18 cycles faster for hex as there's now only two conditional returns and no compliment carries. * Fixed the new parseHex I accidentally did `add 0xe9` without specifying `a` * Commented the use of daa I made the comments surrounding my use of daa much clearer, so it isn't quite so mystical what's being done here. * Removed skip leading zeroes, added skip first multiply Now instead of skipping leading zeroes, the first digit is loaded directly into hl without first multiplying by 10. This means the first loop is skipped in the overhead, making the method 2-3 times faster overall, and is now faster for the more common fewer digit cases too. The number of bytes is exactly the same, and the inner loop is slightly faster too thanks to no longer needing to load a into c. To be more precise about the speed increase over the current code, for decimals of length 1 it'll be 3.18x faster, for decimals of length 2, 2.50x faster, for length 3, 2.31x faster, for length 4, 2.22x faster, and for length 5 and above, at least 2.03x faster. In terms of cycles, this is around 100+(132*length) cycles saved per decimal. * Fixed erroring out for all number >0x1999 I fixed the errors for numbers >0x1999, sadly it is now 6 bytes bigger, so 5 bytes larger than the original, but the speed increases should still hold. * Fixed more errors, clearer choice of constants * Clearer choice of constants * Moved and indented comment about fmtHex's method * Marked inlined parseDecimalDigit uses * Renamed .error, removed trailing whitespace, more verbose comments.
2019-10-24 22:58:32 +11:00
parseDecimal:
; First char is special: it has to succeed.
ld a, (hl)
; Parse the decimal char at A and extract it's 0-9 numerical value. Put the
; result in A.
; On success, the carry flag is reset. On error, it is set.
Decimal parse optimisations (#45) * Optimised parsing functions and other minor optimisations UnsetZ has been reduced by a byte, and between 17 and 28 cycles saved based on branching. Since branching is based on a being 0, it shouldn't have to branch very often and so be 28 cycles saved most the time. Including the initial call, the old version was 60 cycles, so this should be nearly twice as fast. fmtHex has been reduced by 4 bytes and between 3 and 8 cycles based on branching. fmtHexPair had a redundant "and" removed, saving two bytes and seven cycles. parseHex has been reduced by 7 bytes. Due to so much branching, it's hard to say if it's faster, but it should be since it's fewer operations and now conditional returns are used which are a cycle faster than conditional jumps. I think there's more to improve here, but I haven't come up with anything yet. * Major parsing optimisations Totally reworked both parseDecimal and parseDecimalDigit parseDecimalDigit no longer exists, as it could be replaced by an inline alternative in the 4 places it appeared. This saves one byte overall, as the inline version is 4 bytes, 1 byte more than a call, and removing the function saved 5 bytes. It has been reduced from between 52 and 35 cycles (35 on error, so we'd expect 52 cycles to be more common unless someone's really bad at programming) to 14 cycles, so 2-3 times faster. parseDecimal has been reduced by a byte, and now the main loop is just about twice as fast, but with increased overhead. To put this into perspective, if we ignore error cases: For decimals of length 1 it'll be 1.20x faster, for decimals of length 2, 1.41x faster, for length 3, 1.51x faster, for length 4, 1.57x faster, and for length 5 and above, at least 1.48x faster (even faster if there's leading zeroes or not the worst case scenario). I believe there is still room for improvement, since the first iteration can be nearly replaced with "ld l, c" since 0*10=0, but when I tried this I could either add a zero check into the main loop, adding around 40 cycles and 10 bytes, or add 20 bytes to the overhead, and I don't think either of those options are worth it. * Inlined parseDecimalDigit See previous commit, and /lib/parse.asm, for details * Fixed tabs and spacing * Fixed tabs and spacing * Better explanation and layout * Corrected error in comments, and a new parseHex 5 bytes saved in parseHex, again hard to say what that does to speed, the shortest possible speed is probably a little slower but I think non-error cases should be around 9 cycles faster for decimal and 18 cycles faster for hex as there's now only two conditional returns and no compliment carries. * Fixed the new parseHex I accidentally did `add 0xe9` without specifying `a` * Commented the use of daa I made the comments surrounding my use of daa much clearer, so it isn't quite so mystical what's being done here. * Removed skip leading zeroes, added skip first multiply Now instead of skipping leading zeroes, the first digit is loaded directly into hl without first multiplying by 10. This means the first loop is skipped in the overhead, making the method 2-3 times faster overall, and is now faster for the more common fewer digit cases too. The number of bytes is exactly the same, and the inner loop is slightly faster too thanks to no longer needing to load a into c. To be more precise about the speed increase over the current code, for decimals of length 1 it'll be 3.18x faster, for decimals of length 2, 2.50x faster, for length 3, 2.31x faster, for length 4, 2.22x faster, and for length 5 and above, at least 2.03x faster. In terms of cycles, this is around 100+(132*length) cycles saved per decimal. * Fixed erroring out for all number >0x1999 I fixed the errors for numbers >0x1999, sadly it is now 6 bytes bigger, so 5 bytes larger than the original, but the speed increases should still hold. * Fixed more errors, clearer choice of constants * Clearer choice of constants * Moved and indented comment about fmtHex's method * Marked inlined parseDecimalDigit uses * Renamed .error, removed trailing whitespace, more verbose comments.
2019-10-24 22:58:32 +11:00
add a, 0xff-'9' ; maps '0'-'9' onto 0xf6-0xff
sub 0xff-9 ; maps to 0-9 and carries if not a digit
ret c ; Error. If it's C, it's also going to be NZ
; During this routine, we switch between HL and its shadow. On one side,
; we have HL the string pointer, and on the other side, we have HL the
; numerical result. We also use EXX to preserve BC, saving us a push.
parseDecimalSkip: ; enter here to skip parsing the first digit
exx ; HL as a result
Decimal parse optimisations (#45) * Optimised parsing functions and other minor optimisations UnsetZ has been reduced by a byte, and between 17 and 28 cycles saved based on branching. Since branching is based on a being 0, it shouldn't have to branch very often and so be 28 cycles saved most the time. Including the initial call, the old version was 60 cycles, so this should be nearly twice as fast. fmtHex has been reduced by 4 bytes and between 3 and 8 cycles based on branching. fmtHexPair had a redundant "and" removed, saving two bytes and seven cycles. parseHex has been reduced by 7 bytes. Due to so much branching, it's hard to say if it's faster, but it should be since it's fewer operations and now conditional returns are used which are a cycle faster than conditional jumps. I think there's more to improve here, but I haven't come up with anything yet. * Major parsing optimisations Totally reworked both parseDecimal and parseDecimalDigit parseDecimalDigit no longer exists, as it could be replaced by an inline alternative in the 4 places it appeared. This saves one byte overall, as the inline version is 4 bytes, 1 byte more than a call, and removing the function saved 5 bytes. It has been reduced from between 52 and 35 cycles (35 on error, so we'd expect 52 cycles to be more common unless someone's really bad at programming) to 14 cycles, so 2-3 times faster. parseDecimal has been reduced by a byte, and now the main loop is just about twice as fast, but with increased overhead. To put this into perspective, if we ignore error cases: For decimals of length 1 it'll be 1.20x faster, for decimals of length 2, 1.41x faster, for length 3, 1.51x faster, for length 4, 1.57x faster, and for length 5 and above, at least 1.48x faster (even faster if there's leading zeroes or not the worst case scenario). I believe there is still room for improvement, since the first iteration can be nearly replaced with "ld l, c" since 0*10=0, but when I tried this I could either add a zero check into the main loop, adding around 40 cycles and 10 bytes, or add 20 bytes to the overhead, and I don't think either of those options are worth it. * Inlined parseDecimalDigit See previous commit, and /lib/parse.asm, for details * Fixed tabs and spacing * Fixed tabs and spacing * Better explanation and layout * Corrected error in comments, and a new parseHex 5 bytes saved in parseHex, again hard to say what that does to speed, the shortest possible speed is probably a little slower but I think non-error cases should be around 9 cycles faster for decimal and 18 cycles faster for hex as there's now only two conditional returns and no compliment carries. * Fixed the new parseHex I accidentally did `add 0xe9` without specifying `a` * Commented the use of daa I made the comments surrounding my use of daa much clearer, so it isn't quite so mystical what's being done here. * Removed skip leading zeroes, added skip first multiply Now instead of skipping leading zeroes, the first digit is loaded directly into hl without first multiplying by 10. This means the first loop is skipped in the overhead, making the method 2-3 times faster overall, and is now faster for the more common fewer digit cases too. The number of bytes is exactly the same, and the inner loop is slightly faster too thanks to no longer needing to load a into c. To be more precise about the speed increase over the current code, for decimals of length 1 it'll be 3.18x faster, for decimals of length 2, 2.50x faster, for length 3, 2.31x faster, for length 4, 2.22x faster, and for length 5 and above, at least 2.03x faster. In terms of cycles, this is around 100+(132*length) cycles saved per decimal. * Fixed erroring out for all number >0x1999 I fixed the errors for numbers >0x1999, sadly it is now 6 bytes bigger, so 5 bytes larger than the original, but the speed increases should still hold. * Fixed more errors, clearer choice of constants * Clearer choice of constants * Moved and indented comment about fmtHex's method * Marked inlined parseDecimalDigit uses * Renamed .error, removed trailing whitespace, more verbose comments.
2019-10-24 22:58:32 +11:00
ld h, 0
ld l, a ; load first digit in without multiplying
Decimal parse optimisations (#45) * Optimised parsing functions and other minor optimisations UnsetZ has been reduced by a byte, and between 17 and 28 cycles saved based on branching. Since branching is based on a being 0, it shouldn't have to branch very often and so be 28 cycles saved most the time. Including the initial call, the old version was 60 cycles, so this should be nearly twice as fast. fmtHex has been reduced by 4 bytes and between 3 and 8 cycles based on branching. fmtHexPair had a redundant "and" removed, saving two bytes and seven cycles. parseHex has been reduced by 7 bytes. Due to so much branching, it's hard to say if it's faster, but it should be since it's fewer operations and now conditional returns are used which are a cycle faster than conditional jumps. I think there's more to improve here, but I haven't come up with anything yet. * Major parsing optimisations Totally reworked both parseDecimal and parseDecimalDigit parseDecimalDigit no longer exists, as it could be replaced by an inline alternative in the 4 places it appeared. This saves one byte overall, as the inline version is 4 bytes, 1 byte more than a call, and removing the function saved 5 bytes. It has been reduced from between 52 and 35 cycles (35 on error, so we'd expect 52 cycles to be more common unless someone's really bad at programming) to 14 cycles, so 2-3 times faster. parseDecimal has been reduced by a byte, and now the main loop is just about twice as fast, but with increased overhead. To put this into perspective, if we ignore error cases: For decimals of length 1 it'll be 1.20x faster, for decimals of length 2, 1.41x faster, for length 3, 1.51x faster, for length 4, 1.57x faster, and for length 5 and above, at least 1.48x faster (even faster if there's leading zeroes or not the worst case scenario). I believe there is still room for improvement, since the first iteration can be nearly replaced with "ld l, c" since 0*10=0, but when I tried this I could either add a zero check into the main loop, adding around 40 cycles and 10 bytes, or add 20 bytes to the overhead, and I don't think either of those options are worth it. * Inlined parseDecimalDigit See previous commit, and /lib/parse.asm, for details * Fixed tabs and spacing * Fixed tabs and spacing * Better explanation and layout * Corrected error in comments, and a new parseHex 5 bytes saved in parseHex, again hard to say what that does to speed, the shortest possible speed is probably a little slower but I think non-error cases should be around 9 cycles faster for decimal and 18 cycles faster for hex as there's now only two conditional returns and no compliment carries. * Fixed the new parseHex I accidentally did `add 0xe9` without specifying `a` * Commented the use of daa I made the comments surrounding my use of daa much clearer, so it isn't quite so mystical what's being done here. * Removed skip leading zeroes, added skip first multiply Now instead of skipping leading zeroes, the first digit is loaded directly into hl without first multiplying by 10. This means the first loop is skipped in the overhead, making the method 2-3 times faster overall, and is now faster for the more common fewer digit cases too. The number of bytes is exactly the same, and the inner loop is slightly faster too thanks to no longer needing to load a into c. To be more precise about the speed increase over the current code, for decimals of length 1 it'll be 3.18x faster, for decimals of length 2, 2.50x faster, for length 3, 2.31x faster, for length 4, 2.22x faster, and for length 5 and above, at least 2.03x faster. In terms of cycles, this is around 100+(132*length) cycles saved per decimal. * Fixed erroring out for all number >0x1999 I fixed the errors for numbers >0x1999, sadly it is now 6 bytes bigger, so 5 bytes larger than the original, but the speed increases should still hold. * Fixed more errors, clearer choice of constants * Clearer choice of constants * Moved and indented comment about fmtHex's method * Marked inlined parseDecimalDigit uses * Renamed .error, removed trailing whitespace, more verbose comments.
2019-10-24 22:58:32 +11:00
.loop:
exx ; HL as a string pointer
Decimal parse optimisations (#45) * Optimised parsing functions and other minor optimisations UnsetZ has been reduced by a byte, and between 17 and 28 cycles saved based on branching. Since branching is based on a being 0, it shouldn't have to branch very often and so be 28 cycles saved most the time. Including the initial call, the old version was 60 cycles, so this should be nearly twice as fast. fmtHex has been reduced by 4 bytes and between 3 and 8 cycles based on branching. fmtHexPair had a redundant "and" removed, saving two bytes and seven cycles. parseHex has been reduced by 7 bytes. Due to so much branching, it's hard to say if it's faster, but it should be since it's fewer operations and now conditional returns are used which are a cycle faster than conditional jumps. I think there's more to improve here, but I haven't come up with anything yet. * Major parsing optimisations Totally reworked both parseDecimal and parseDecimalDigit parseDecimalDigit no longer exists, as it could be replaced by an inline alternative in the 4 places it appeared. This saves one byte overall, as the inline version is 4 bytes, 1 byte more than a call, and removing the function saved 5 bytes. It has been reduced from between 52 and 35 cycles (35 on error, so we'd expect 52 cycles to be more common unless someone's really bad at programming) to 14 cycles, so 2-3 times faster. parseDecimal has been reduced by a byte, and now the main loop is just about twice as fast, but with increased overhead. To put this into perspective, if we ignore error cases: For decimals of length 1 it'll be 1.20x faster, for decimals of length 2, 1.41x faster, for length 3, 1.51x faster, for length 4, 1.57x faster, and for length 5 and above, at least 1.48x faster (even faster if there's leading zeroes or not the worst case scenario). I believe there is still room for improvement, since the first iteration can be nearly replaced with "ld l, c" since 0*10=0, but when I tried this I could either add a zero check into the main loop, adding around 40 cycles and 10 bytes, or add 20 bytes to the overhead, and I don't think either of those options are worth it. * Inlined parseDecimalDigit See previous commit, and /lib/parse.asm, for details * Fixed tabs and spacing * Fixed tabs and spacing * Better explanation and layout * Corrected error in comments, and a new parseHex 5 bytes saved in parseHex, again hard to say what that does to speed, the shortest possible speed is probably a little slower but I think non-error cases should be around 9 cycles faster for decimal and 18 cycles faster for hex as there's now only two conditional returns and no compliment carries. * Fixed the new parseHex I accidentally did `add 0xe9` without specifying `a` * Commented the use of daa I made the comments surrounding my use of daa much clearer, so it isn't quite so mystical what's being done here. * Removed skip leading zeroes, added skip first multiply Now instead of skipping leading zeroes, the first digit is loaded directly into hl without first multiplying by 10. This means the first loop is skipped in the overhead, making the method 2-3 times faster overall, and is now faster for the more common fewer digit cases too. The number of bytes is exactly the same, and the inner loop is slightly faster too thanks to no longer needing to load a into c. To be more precise about the speed increase over the current code, for decimals of length 1 it'll be 3.18x faster, for decimals of length 2, 2.50x faster, for length 3, 2.31x faster, for length 4, 2.22x faster, and for length 5 and above, at least 2.03x faster. In terms of cycles, this is around 100+(132*length) cycles saved per decimal. * Fixed erroring out for all number >0x1999 I fixed the errors for numbers >0x1999, sadly it is now 6 bytes bigger, so 5 bytes larger than the original, but the speed increases should still hold. * Fixed more errors, clearer choice of constants * Clearer choice of constants * Moved and indented comment about fmtHex's method * Marked inlined parseDecimalDigit uses * Renamed .error, removed trailing whitespace, more verbose comments.
2019-10-24 22:58:32 +11:00
inc hl
ld a, (hl)
exx ; HL as a numerical result
; same as other above
add a, 0xff-'9'
sub 0xff-9
jr c, .end
ld b, a ; we can now use a for overflow checking
Decimal parse optimisations (#45) * Optimised parsing functions and other minor optimisations UnsetZ has been reduced by a byte, and between 17 and 28 cycles saved based on branching. Since branching is based on a being 0, it shouldn't have to branch very often and so be 28 cycles saved most the time. Including the initial call, the old version was 60 cycles, so this should be nearly twice as fast. fmtHex has been reduced by 4 bytes and between 3 and 8 cycles based on branching. fmtHexPair had a redundant "and" removed, saving two bytes and seven cycles. parseHex has been reduced by 7 bytes. Due to so much branching, it's hard to say if it's faster, but it should be since it's fewer operations and now conditional returns are used which are a cycle faster than conditional jumps. I think there's more to improve here, but I haven't come up with anything yet. * Major parsing optimisations Totally reworked both parseDecimal and parseDecimalDigit parseDecimalDigit no longer exists, as it could be replaced by an inline alternative in the 4 places it appeared. This saves one byte overall, as the inline version is 4 bytes, 1 byte more than a call, and removing the function saved 5 bytes. It has been reduced from between 52 and 35 cycles (35 on error, so we'd expect 52 cycles to be more common unless someone's really bad at programming) to 14 cycles, so 2-3 times faster. parseDecimal has been reduced by a byte, and now the main loop is just about twice as fast, but with increased overhead. To put this into perspective, if we ignore error cases: For decimals of length 1 it'll be 1.20x faster, for decimals of length 2, 1.41x faster, for length 3, 1.51x faster, for length 4, 1.57x faster, and for length 5 and above, at least 1.48x faster (even faster if there's leading zeroes or not the worst case scenario). I believe there is still room for improvement, since the first iteration can be nearly replaced with "ld l, c" since 0*10=0, but when I tried this I could either add a zero check into the main loop, adding around 40 cycles and 10 bytes, or add 20 bytes to the overhead, and I don't think either of those options are worth it. * Inlined parseDecimalDigit See previous commit, and /lib/parse.asm, for details * Fixed tabs and spacing * Fixed tabs and spacing * Better explanation and layout * Corrected error in comments, and a new parseHex 5 bytes saved in parseHex, again hard to say what that does to speed, the shortest possible speed is probably a little slower but I think non-error cases should be around 9 cycles faster for decimal and 18 cycles faster for hex as there's now only two conditional returns and no compliment carries. * Fixed the new parseHex I accidentally did `add 0xe9` without specifying `a` * Commented the use of daa I made the comments surrounding my use of daa much clearer, so it isn't quite so mystical what's being done here. * Removed skip leading zeroes, added skip first multiply Now instead of skipping leading zeroes, the first digit is loaded directly into hl without first multiplying by 10. This means the first loop is skipped in the overhead, making the method 2-3 times faster overall, and is now faster for the more common fewer digit cases too. The number of bytes is exactly the same, and the inner loop is slightly faster too thanks to no longer needing to load a into c. To be more precise about the speed increase over the current code, for decimals of length 1 it'll be 3.18x faster, for decimals of length 2, 2.50x faster, for length 3, 2.31x faster, for length 4, 2.22x faster, and for length 5 and above, at least 2.03x faster. In terms of cycles, this is around 100+(132*length) cycles saved per decimal. * Fixed erroring out for all number >0x1999 I fixed the errors for numbers >0x1999, sadly it is now 6 bytes bigger, so 5 bytes larger than the original, but the speed increases should still hold. * Fixed more errors, clearer choice of constants * Clearer choice of constants * Moved and indented comment about fmtHex's method * Marked inlined parseDecimalDigit uses * Renamed .error, removed trailing whitespace, more verbose comments.
2019-10-24 22:58:32 +11:00
add hl, hl ; x2
sbc a, a ; a=0 if no overflow, a=0xFF otherwise
Decimal parse optimisations (#45) * Optimised parsing functions and other minor optimisations UnsetZ has been reduced by a byte, and between 17 and 28 cycles saved based on branching. Since branching is based on a being 0, it shouldn't have to branch very often and so be 28 cycles saved most the time. Including the initial call, the old version was 60 cycles, so this should be nearly twice as fast. fmtHex has been reduced by 4 bytes and between 3 and 8 cycles based on branching. fmtHexPair had a redundant "and" removed, saving two bytes and seven cycles. parseHex has been reduced by 7 bytes. Due to so much branching, it's hard to say if it's faster, but it should be since it's fewer operations and now conditional returns are used which are a cycle faster than conditional jumps. I think there's more to improve here, but I haven't come up with anything yet. * Major parsing optimisations Totally reworked both parseDecimal and parseDecimalDigit parseDecimalDigit no longer exists, as it could be replaced by an inline alternative in the 4 places it appeared. This saves one byte overall, as the inline version is 4 bytes, 1 byte more than a call, and removing the function saved 5 bytes. It has been reduced from between 52 and 35 cycles (35 on error, so we'd expect 52 cycles to be more common unless someone's really bad at programming) to 14 cycles, so 2-3 times faster. parseDecimal has been reduced by a byte, and now the main loop is just about twice as fast, but with increased overhead. To put this into perspective, if we ignore error cases: For decimals of length 1 it'll be 1.20x faster, for decimals of length 2, 1.41x faster, for length 3, 1.51x faster, for length 4, 1.57x faster, and for length 5 and above, at least 1.48x faster (even faster if there's leading zeroes or not the worst case scenario). I believe there is still room for improvement, since the first iteration can be nearly replaced with "ld l, c" since 0*10=0, but when I tried this I could either add a zero check into the main loop, adding around 40 cycles and 10 bytes, or add 20 bytes to the overhead, and I don't think either of those options are worth it. * Inlined parseDecimalDigit See previous commit, and /lib/parse.asm, for details * Fixed tabs and spacing * Fixed tabs and spacing * Better explanation and layout * Corrected error in comments, and a new parseHex 5 bytes saved in parseHex, again hard to say what that does to speed, the shortest possible speed is probably a little slower but I think non-error cases should be around 9 cycles faster for decimal and 18 cycles faster for hex as there's now only two conditional returns and no compliment carries. * Fixed the new parseHex I accidentally did `add 0xe9` without specifying `a` * Commented the use of daa I made the comments surrounding my use of daa much clearer, so it isn't quite so mystical what's being done here. * Removed skip leading zeroes, added skip first multiply Now instead of skipping leading zeroes, the first digit is loaded directly into hl without first multiplying by 10. This means the first loop is skipped in the overhead, making the method 2-3 times faster overall, and is now faster for the more common fewer digit cases too. The number of bytes is exactly the same, and the inner loop is slightly faster too thanks to no longer needing to load a into c. To be more precise about the speed increase over the current code, for decimals of length 1 it'll be 3.18x faster, for decimals of length 2, 2.50x faster, for length 3, 2.31x faster, for length 4, 2.22x faster, and for length 5 and above, at least 2.03x faster. In terms of cycles, this is around 100+(132*length) cycles saved per decimal. * Fixed erroring out for all number >0x1999 I fixed the errors for numbers >0x1999, sadly it is now 6 bytes bigger, so 5 bytes larger than the original, but the speed increases should still hold. * Fixed more errors, clearer choice of constants * Clearer choice of constants * Moved and indented comment about fmtHex's method * Marked inlined parseDecimalDigit uses * Renamed .error, removed trailing whitespace, more verbose comments.
2019-10-24 22:58:32 +11:00
ld d, h
ld e, l ; de is x2
add hl, hl ; x4
rla
Decimal parse optimisations (#45) * Optimised parsing functions and other minor optimisations UnsetZ has been reduced by a byte, and between 17 and 28 cycles saved based on branching. Since branching is based on a being 0, it shouldn't have to branch very often and so be 28 cycles saved most the time. Including the initial call, the old version was 60 cycles, so this should be nearly twice as fast. fmtHex has been reduced by 4 bytes and between 3 and 8 cycles based on branching. fmtHexPair had a redundant "and" removed, saving two bytes and seven cycles. parseHex has been reduced by 7 bytes. Due to so much branching, it's hard to say if it's faster, but it should be since it's fewer operations and now conditional returns are used which are a cycle faster than conditional jumps. I think there's more to improve here, but I haven't come up with anything yet. * Major parsing optimisations Totally reworked both parseDecimal and parseDecimalDigit parseDecimalDigit no longer exists, as it could be replaced by an inline alternative in the 4 places it appeared. This saves one byte overall, as the inline version is 4 bytes, 1 byte more than a call, and removing the function saved 5 bytes. It has been reduced from between 52 and 35 cycles (35 on error, so we'd expect 52 cycles to be more common unless someone's really bad at programming) to 14 cycles, so 2-3 times faster. parseDecimal has been reduced by a byte, and now the main loop is just about twice as fast, but with increased overhead. To put this into perspective, if we ignore error cases: For decimals of length 1 it'll be 1.20x faster, for decimals of length 2, 1.41x faster, for length 3, 1.51x faster, for length 4, 1.57x faster, and for length 5 and above, at least 1.48x faster (even faster if there's leading zeroes or not the worst case scenario). I believe there is still room for improvement, since the first iteration can be nearly replaced with "ld l, c" since 0*10=0, but when I tried this I could either add a zero check into the main loop, adding around 40 cycles and 10 bytes, or add 20 bytes to the overhead, and I don't think either of those options are worth it. * Inlined parseDecimalDigit See previous commit, and /lib/parse.asm, for details * Fixed tabs and spacing * Fixed tabs and spacing * Better explanation and layout * Corrected error in comments, and a new parseHex 5 bytes saved in parseHex, again hard to say what that does to speed, the shortest possible speed is probably a little slower but I think non-error cases should be around 9 cycles faster for decimal and 18 cycles faster for hex as there's now only two conditional returns and no compliment carries. * Fixed the new parseHex I accidentally did `add 0xe9` without specifying `a` * Commented the use of daa I made the comments surrounding my use of daa much clearer, so it isn't quite so mystical what's being done here. * Removed skip leading zeroes, added skip first multiply Now instead of skipping leading zeroes, the first digit is loaded directly into hl without first multiplying by 10. This means the first loop is skipped in the overhead, making the method 2-3 times faster overall, and is now faster for the more common fewer digit cases too. The number of bytes is exactly the same, and the inner loop is slightly faster too thanks to no longer needing to load a into c. To be more precise about the speed increase over the current code, for decimals of length 1 it'll be 3.18x faster, for decimals of length 2, 2.50x faster, for length 3, 2.31x faster, for length 4, 2.22x faster, and for length 5 and above, at least 2.03x faster. In terms of cycles, this is around 100+(132*length) cycles saved per decimal. * Fixed erroring out for all number >0x1999 I fixed the errors for numbers >0x1999, sadly it is now 6 bytes bigger, so 5 bytes larger than the original, but the speed increases should still hold. * Fixed more errors, clearer choice of constants * Clearer choice of constants * Moved and indented comment about fmtHex's method * Marked inlined parseDecimalDigit uses * Renamed .error, removed trailing whitespace, more verbose comments.
2019-10-24 22:58:32 +11:00
add hl, hl ; x8
rla
Decimal parse optimisations (#45) * Optimised parsing functions and other minor optimisations UnsetZ has been reduced by a byte, and between 17 and 28 cycles saved based on branching. Since branching is based on a being 0, it shouldn't have to branch very often and so be 28 cycles saved most the time. Including the initial call, the old version was 60 cycles, so this should be nearly twice as fast. fmtHex has been reduced by 4 bytes and between 3 and 8 cycles based on branching. fmtHexPair had a redundant "and" removed, saving two bytes and seven cycles. parseHex has been reduced by 7 bytes. Due to so much branching, it's hard to say if it's faster, but it should be since it's fewer operations and now conditional returns are used which are a cycle faster than conditional jumps. I think there's more to improve here, but I haven't come up with anything yet. * Major parsing optimisations Totally reworked both parseDecimal and parseDecimalDigit parseDecimalDigit no longer exists, as it could be replaced by an inline alternative in the 4 places it appeared. This saves one byte overall, as the inline version is 4 bytes, 1 byte more than a call, and removing the function saved 5 bytes. It has been reduced from between 52 and 35 cycles (35 on error, so we'd expect 52 cycles to be more common unless someone's really bad at programming) to 14 cycles, so 2-3 times faster. parseDecimal has been reduced by a byte, and now the main loop is just about twice as fast, but with increased overhead. To put this into perspective, if we ignore error cases: For decimals of length 1 it'll be 1.20x faster, for decimals of length 2, 1.41x faster, for length 3, 1.51x faster, for length 4, 1.57x faster, and for length 5 and above, at least 1.48x faster (even faster if there's leading zeroes or not the worst case scenario). I believe there is still room for improvement, since the first iteration can be nearly replaced with "ld l, c" since 0*10=0, but when I tried this I could either add a zero check into the main loop, adding around 40 cycles and 10 bytes, or add 20 bytes to the overhead, and I don't think either of those options are worth it. * Inlined parseDecimalDigit See previous commit, and /lib/parse.asm, for details * Fixed tabs and spacing * Fixed tabs and spacing * Better explanation and layout * Corrected error in comments, and a new parseHex 5 bytes saved in parseHex, again hard to say what that does to speed, the shortest possible speed is probably a little slower but I think non-error cases should be around 9 cycles faster for decimal and 18 cycles faster for hex as there's now only two conditional returns and no compliment carries. * Fixed the new parseHex I accidentally did `add 0xe9` without specifying `a` * Commented the use of daa I made the comments surrounding my use of daa much clearer, so it isn't quite so mystical what's being done here. * Removed skip leading zeroes, added skip first multiply Now instead of skipping leading zeroes, the first digit is loaded directly into hl without first multiplying by 10. This means the first loop is skipped in the overhead, making the method 2-3 times faster overall, and is now faster for the more common fewer digit cases too. The number of bytes is exactly the same, and the inner loop is slightly faster too thanks to no longer needing to load a into c. To be more precise about the speed increase over the current code, for decimals of length 1 it'll be 3.18x faster, for decimals of length 2, 2.50x faster, for length 3, 2.31x faster, for length 4, 2.22x faster, and for length 5 and above, at least 2.03x faster. In terms of cycles, this is around 100+(132*length) cycles saved per decimal. * Fixed erroring out for all number >0x1999 I fixed the errors for numbers >0x1999, sadly it is now 6 bytes bigger, so 5 bytes larger than the original, but the speed increases should still hold. * Fixed more errors, clearer choice of constants * Clearer choice of constants * Moved and indented comment about fmtHex's method * Marked inlined parseDecimalDigit uses * Renamed .error, removed trailing whitespace, more verbose comments.
2019-10-24 22:58:32 +11:00
add hl, de ; x10
rla
ld d, a ; a is zero unless there's an overflow
ld e, b
Decimal parse optimisations (#45) * Optimised parsing functions and other minor optimisations UnsetZ has been reduced by a byte, and between 17 and 28 cycles saved based on branching. Since branching is based on a being 0, it shouldn't have to branch very often and so be 28 cycles saved most the time. Including the initial call, the old version was 60 cycles, so this should be nearly twice as fast. fmtHex has been reduced by 4 bytes and between 3 and 8 cycles based on branching. fmtHexPair had a redundant "and" removed, saving two bytes and seven cycles. parseHex has been reduced by 7 bytes. Due to so much branching, it's hard to say if it's faster, but it should be since it's fewer operations and now conditional returns are used which are a cycle faster than conditional jumps. I think there's more to improve here, but I haven't come up with anything yet. * Major parsing optimisations Totally reworked both parseDecimal and parseDecimalDigit parseDecimalDigit no longer exists, as it could be replaced by an inline alternative in the 4 places it appeared. This saves one byte overall, as the inline version is 4 bytes, 1 byte more than a call, and removing the function saved 5 bytes. It has been reduced from between 52 and 35 cycles (35 on error, so we'd expect 52 cycles to be more common unless someone's really bad at programming) to 14 cycles, so 2-3 times faster. parseDecimal has been reduced by a byte, and now the main loop is just about twice as fast, but with increased overhead. To put this into perspective, if we ignore error cases: For decimals of length 1 it'll be 1.20x faster, for decimals of length 2, 1.41x faster, for length 3, 1.51x faster, for length 4, 1.57x faster, and for length 5 and above, at least 1.48x faster (even faster if there's leading zeroes or not the worst case scenario). I believe there is still room for improvement, since the first iteration can be nearly replaced with "ld l, c" since 0*10=0, but when I tried this I could either add a zero check into the main loop, adding around 40 cycles and 10 bytes, or add 20 bytes to the overhead, and I don't think either of those options are worth it. * Inlined parseDecimalDigit See previous commit, and /lib/parse.asm, for details * Fixed tabs and spacing * Fixed tabs and spacing * Better explanation and layout * Corrected error in comments, and a new parseHex 5 bytes saved in parseHex, again hard to say what that does to speed, the shortest possible speed is probably a little slower but I think non-error cases should be around 9 cycles faster for decimal and 18 cycles faster for hex as there's now only two conditional returns and no compliment carries. * Fixed the new parseHex I accidentally did `add 0xe9` without specifying `a` * Commented the use of daa I made the comments surrounding my use of daa much clearer, so it isn't quite so mystical what's being done here. * Removed skip leading zeroes, added skip first multiply Now instead of skipping leading zeroes, the first digit is loaded directly into hl without first multiplying by 10. This means the first loop is skipped in the overhead, making the method 2-3 times faster overall, and is now faster for the more common fewer digit cases too. The number of bytes is exactly the same, and the inner loop is slightly faster too thanks to no longer needing to load a into c. To be more precise about the speed increase over the current code, for decimals of length 1 it'll be 3.18x faster, for decimals of length 2, 2.50x faster, for length 3, 2.31x faster, for length 4, 2.22x faster, and for length 5 and above, at least 2.03x faster. In terms of cycles, this is around 100+(132*length) cycles saved per decimal. * Fixed erroring out for all number >0x1999 I fixed the errors for numbers >0x1999, sadly it is now 6 bytes bigger, so 5 bytes larger than the original, but the speed increases should still hold. * Fixed more errors, clearer choice of constants * Clearer choice of constants * Moved and indented comment about fmtHex's method * Marked inlined parseDecimalDigit uses * Renamed .error, removed trailing whitespace, more verbose comments.
2019-10-24 22:58:32 +11:00
add hl, de
adc a, a ; same as rla except affects Z
; Did we oveflow?
jr z, .loop ; No? continue
; error, NZ already set
exx ; HL is now string pointer, restore BC
; HL points to the char following the last success.
ret
.end:
push hl ; --> lvl 1, result
exx ; HL as a string pointer, restore BC
pop de ; <-- lvl 1, result
cp a ; ensure Z
ret
; Call parseDecimal and then check that HL points to a whitespace or a null.
parseDecimalC:
call parseDecimal
ret nz
ld a, (hl)
or a
ret z ; null? we're happy
2019-11-21 12:58:26 +11:00
jp isWS
; Parse string at (HL) as a hexadecimal value without the "0x" prefix and
; return value in DE.
; HL is advanced to the character following the last successfully read char.
; Sets Z on success.
parseHexadecimal:
ld a, (hl)
call parseHex ; before "ret c" is "sub 0xfa" in parseHex
; so carry implies not zero
ret c ; we need at least one char
push bc
ld de, 0
ld b, d
ld c, d
; The idea here is that the 4 hex digits of the result can be represented "bdce",
; where each register holds a single digit. Then the result is simply
; e = (c << 4) | e, d = (b << 4) | d
; However, the actual string may be of any length, so when loading in the most
; significant digit, we don't know which digit of the result it actually represents
; To solve this, after a digit is loaded into a (and is checked for validity),
; all digits are moved along, with e taking the latest digit.
.loop:
dec b
inc b ; b should be 0, else we've overflowed
jr nz, .end ; Z already unset if overflow
ld b, d
ld d, c
ld c, e
ld e, a
inc hl
ld a, (hl)
call parseHex
jr nc, .loop
ld a, b
add a, a \ add a, a \ add a, a \ add a, a
or d
ld d, a
ld a, c
add a, a \ add a, a \ add a, a \ add a, a
or e
ld e, a
xor a ; ensure z
.end:
pop bc
ret
; Parse string at (HL) as a binary value (010101) without the "0b" prefix and
; return value in E. D is always zero.
; HL is advanced to the character following the last successfully read char.
; Sets Z on success.
parseBinaryLiteral:
ld de, 0
.loop:
ld a, (hl)
add a, 0xff-'1'
sub 0xff-1
jr c, .end
rlc e ; sets carry if overflow, and affects Z
ret c ; Z unset if carry set, since bit 0 of e must be set
add a, e
ld e, a
inc hl
jr .loop
.end:
; HL is properly set
xor a ; ensure Z
ret
; Parses the string at (HL) and returns the 16-bit value in DE. The string
; can be a decimal literal (1234), a hexadecimal literal (0x1234) or a char
; literal ('X').
; HL is advanced to the character following the last successfully read char.
;
; As soon as the number doesn't fit 16-bit any more, parsing stops and the
; number is invalid. If the number is valid, Z is set, otherwise, unset.
parseLiteral:
ld de, 0 ; pre-fill
ld a, (hl)
cp 0x27 ; apostrophe
jr z, .char
; inline parseDecimalDigit
add a, 0xc6 ; maps '0'-'9' onto 0xf6-0xff
sub 0xf6 ; maps to 0-9 and carries if not a digit
ret c
; a already parsed so skip first few instructions of parseDecimal
jp nz, parseDecimalSkip
; maybe hex, maybe binary
inc hl
ld a, (hl)
inc hl ; already place it for hex or bin
cp 'x'
jr z, parseHexadecimal
cp 'b'
jr z, parseBinaryLiteral
; nope, just a regular decimal
dec hl \ dec hl
jp parseDecimal
; Parse string at (HL) and, if it is a char literal, sets Z and return
; corresponding value in E. D is always zero.
; HL is advanced to the character following the last successfully read char.
;
; A valid char literal starts with ', ends with ' and has one character in the
; middle. No escape sequence are accepted, but ''' will return the apostrophe
; character.
.char:
inc hl
ld e, (hl) ; our result
inc hl
cp (hl)
; advance HL and return if good char
inc hl
ret z
; Z unset and there's an error
; In all error conditions, HL is advanced by 3. Rewind.
dec hl \ dec hl \ dec hl
; NZ already set
ret
; Returns whether A is a literal prefix, that is, a digit or an apostrophe.
isLiteralPrefix:
cp 0x27 ; apostrophe
ret z
; continue to isDigit
; Returns whether A is a digit
isDigit:
cp '0' ; carry implies not zero for cp
ret c
cp '9' ; zero unset for a > '9', but set for a='9'
ret nc
cp a ; ensure Z
ret