From 976a93971cdd55dd75e06893089e49497ccb6bb7 Mon Sep 17 00:00:00 2001 From: Virgil Dupras Date: Mon, 27 May 2019 10:38:29 -0400 Subject: [PATCH] zasm: improve docs --- apps/zasm/README.md | 120 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 120 insertions(+) diff --git a/apps/zasm/README.md b/apps/zasm/README.md index 2c06264..45ebe27 100644 --- a/apps/zasm/README.md +++ b/apps/zasm/README.md @@ -12,4 +12,124 @@ provided that you have a copy libz80 living in `emul/libz80`. The resulting `zasm` binary takes asm code in stdin and spits binary in stdout. +## Literals + +There are decimal, hexadecimal and binary literals. A "straight" number is +parsed as a decimal. Hexadecimal literals must be prefixed with `0x` (`0xf4`). +Binary must be prefixed with `0b` (`0b01100110`). + +Decimals and hexadecimal are "flexible". Whether they're written in a byte or +a word, you don't need to prefix them with zeroes. Watch out for overflow, +however. + +Binary literals are also "flexible" (`0b110` is fine), but can't go over a byte. + +There is also the char literal (`'X'`), that is, two qutes with a character in +the middle. The value of that character is interpreted as-is, without any +encoding involved. That is, whatever binary code is written in between those +two quotes, it's what is evaluated. Only a single byte at once can be evaluated +thus. There is no escaping. `'''` results in `0x27`. You can't express a newline +this way, it's going to mess with the parser. + +Then comes our last literal, the string literal. It's a chain of characters +surrounded by double quotes. Example: `"foo"`. This literal can only be used +in the `.db` directive and is equivalent to each character being single-quoted +and separated by commas (`'f', 'o', 'o'`). No null char is inserted in the +resulting value (unlike what C does). + +## Labels + +Lines starting with a name followed `:` are labeled. When that happens, the +name of that label is associated with the binary offset of the following +instruction. + +For example, a label placed at the beginning of the file is associated with +offset 0. If placed right after a first instruction that is 2 bytes wide, then +the label is going to be bound to 2. + +Those labels can then be referenced wherever a constant is expeced. They can +also be referenced where a relative reference is expected (`jr` and `djnz`). + +Labels can be forward-referenced, that is, you can reference a label that is +defined later in the source file or in an included source file. + +Labels starting with a dot (`.`) are local labels: they belong only to the +namespace of the current "global label" (any label that isn't local). Local +namespace is wiped whenever a global label is encountered. + +Local labels allows reuse of common mnemonics and make the assembler use less +memory. + +Global labels are all evaluated during the first pass, which makes possible to +forward-reference them. Local labels are evaluated during the second pass, but +we can still forward-reference them through a "first-pass-redux" hack. + +Labels can be alone on their line, but can also be "inlined", that is, directly +followed by an instruction. + +## Constants + +The `.equ` directive declares a constant. That constant's argument is an +expression that is evaluated right at parse-time. + +Constants are evaluated during the second pass, which means that they can +forward-reference labels. + +However, they *cannot* forward-reference other constants. + + +## Expressions + +Wherever a constant is expected, an expression can be written. An expression +is a bunch of literals or symbols assembled by operators. For now, only `+`, `-` +and `*` operators are supported. No parenthesis yet. + +Expressions can't contain spaces. + +## The Program Counter + +The `$` is a special symbol that can be placed in any expression and evaluated +as the current output offset. That is, it's the value that a label would have if +it was placed there. + +## Includes + +The `#inc` directive is special. It takes a string literal as an argument and +opens, in the currently active filesystem, the file with the specified name. + +It then proceeds to parse that file as if its content had been copy/pasted in +the includer file, that is: global labels are kept and can be referenced +elsewhere. Constants too. An exception is local labels: a local namespace always +ends at the end of an included file. + +There an important limitation with includes: only one level of includes is +allowed. An included file cannot have an `#inc` directive. + +## Directives + +**.db**: Write bytes specified by the directive directly in the resulting + binary. Each byte is separated by a comma. Example: `.db 0x42, foo` + +**.dw**: Same as `.db`, but outputs words. Example: `.dw label1, label2` + +**.equ**: Binds a symbol named after the first parameter to the value of the + expression written as the second parameter. Example: + `.equ foo 0x42+'A'` + +**.fill**: Outputs the number of null bytes specified by its argument, an + expression. Often used with `$` to fill our binary up to a certain + offset. For example, if we want to place an instruction exactly at + byte 0x38, we would preceed it with `.fill 0x38-$`. + +**.org**: Sets the Program Counter to the value of the argument, an expression. + For example, a label being defined right after a `.org 0x400`, would + have a value of `0x400`. Does not do any filling. You have to do that + explicitly with `.fill`, if needed. Often used to assemble binaries + designed to run at offsets other than zero (userland). + +**#inc**: Takes a string literal as an argument. Open the file name specified + in the argument in the currently active filesystem, parse that file + and output its binary content as is the code has been in the includer + file. + [libz80]: https://github.com/ggambetta/libz80