mirror of https://github.com/hsoft/collapseos.git synced 2024-11-23 12:38:05 +11:00

doc: take implementation notes out of blkfs

Virgil Dupras 2020-08-16 08:08:23 -04:00
parent 5c4bbaabf4
commit d03d93668f
21 changed files with 225 additions and 250 deletions

@@ -1,7 +1,6 @@
MASTER INDEX
-30 Dictionary
-70 Implementation notes 100 Block editor
+30 Dictionary 100 Block editor
120 Visual Editor 150 Extra words
200 Z80 assembler 260 Cross compilation
280 Z80 boot code 350 Core words

@@ -1,6 +0,0 @@
Implementation notes
71 Execution model 73 Executing a word
75 Stack management 77 Dictionary
80 System variables 85 Word types
89 Initialization sequence 91 Stable ABI

blk/071

@@ -1,11 +0,0 @@
EXECUTION MODEL
After having read a line through readln, we want to interpret
it. As a general rule, we go like this:
1. read single word from line
2. Can we find the word in dict?
3. If yes, execute that word, goto 1
4. Is it a number?
5. If yes, push that number to PS, goto 1
6. Error: undefined word.

blk/073

@@ -1,16 +0,0 @@
EXECUTING A WORD
At its core, executing a word means pushing the wordref on PS
and calling EXECUTE. Then, we let the word do its thing. Some
words are special, but most of them are of the compiledWord
type, and it's their execution that we describe here.
First of all, at all times during execution, the Interpreter
Pointer (IP) points to the wordref we're executing next.
When we execute a compiledWord, the first thing we do is push
IP to the Return Stack (RS). Therefore, RS' top of stack will
contain a wordref to execute next, after we EXIT.
At the end of every compiledWord is an EXIT. This pops RS, sets
IP to it, and continues.

blk/075

@@ -1,16 +0,0 @@
Stack management
The Parameter stack (PS) is maintained by SP and the Return
stack (RS) is maintained by IX. This allows us to generally use
push and pop freely because PS is the most frequently used.
However, this causes a problem with routine calls: because in
Forth, the stack isn't balanced within each call, our return
offset, when placed by a CALL, messes everything up. This is
one of the reasons why we need stack management routines below.
IX always points to RS' Top Of Stack (TOS).
This return stack contains "Interpreter pointers", that is,
pointers to the address of a word, as seen in a compiled list of
words.
(cont.)

blk/076

@@ -1,11 +0,0 @@
Stack underflow and overflow: In each native word involving
PSP popping, we check whether the stack is big enough. If it's
not, we go into the "uflw" (underflow) error condition, then
abort.
We don't check RSP for underflow because the cost of the check
is significant and its usefulness is dubious: if RSP isn't
tightly in control, we're screwed anyway, and that, well
before we reach underflow.
An overflow condition happens when RSP and PSP meet somewhere in
the middle. That check is made at each "next" call.

blk/077

@@ -1,16 +0,0 @@
Dictionary
A dictionary entry has this structure:
- Xb name. Arbitrarily long number of characters (but it can't
be bigger than the input buffer, of course). Not null-terminated.
- 2b prev offset
- 1b size + IMMEDIATE flag
- 1b code pointer (always jumps in the <0x100 range)
- Parameter field (PF)
The prev offset is the number of bytes between the prev field
and the previous word's code pointer.
The size + flag indicate the size of the name field, with the
7th bit being the IMMEDIATE flag. (cont.)

blk/078

@@ -1,12 +0,0 @@
(cont.) The code pointer points to "word routines". These
routines expect to be called with IY pointing to the PF. They
themselves are expected to end by jumping to the address at
(IP). They will usually do so with "jp next". They are 1b
because all those routines live in the first 0x100 bytes of
the boot binary. The 0 MSB is assumed.
That's for "regular" words (words that are part of the dict
chain). There are also "special words", for example NUMBER,
LIT, FBR, that have a slightly different structure. They're
also a pointer to an executable, but of the other fields, the
only one they have is the "flags" field.

blk/080

@@ -1,16 +0,0 @@
System variables
There are some core variables in the core system that are
referred to directly by their address in memory throughout the
code. The place where they live is configurable by the SYSVARS
constant in xcomp unit, but their relative offset is not. In
fact, they're mostly referred to directly as their numerical
offset along with a comment indicating what this offset refers
to.
This system is a bit fragile because every time we change those
offsets, we have to be careful to adjust all system variables
offsets, but thankfully, there aren't many system variables.
Here's a list of them:
(cont.)

blk/081

@@ -1,16 +0,0 @@
SYSVARS FUTURE USES +3c BLK(*
+02 CURRENT +3e A@*
+04 HERE +40 A!*
+06 C<? +42 FUTURE USES
+08 C<* override +51 CURRENTPTR
+0a NLPTR +53 (emit) override
+0c C<* +55 (key) override
+0e WORDBUF +57 FUTURE USES
+2e BOOT C< PTR
+30 IN>
+32 IN(* +70 DRIVERS
+34 BLK@* +80 RAMEND
+36 BLK!*
+38 BLK>
+3a BLKDTY
(cont.)

blk/082

@@ -1,16 +0,0 @@
CURRENT points to the last dict entry.
HERE points to current write offset.
IP is the Interpreter Pointer
PARSEPTR holds routine address called on (parse)
C<* holds routine address called on C<. If the C<* override
at 0x08 is nonzero, this routine is called instead.
IN> is the current position in IN(, which is the input buffer.
IN(* is a pointer to the input buffer, allocated at runtime.
(cont.)

blk/083

@@ -1,16 +0,0 @@
C<? is a flag indicating whether a character is waiting in the
input stream. 1 means yes, 0 means no. It is the responsibility
of C<* to update that flag.
WORDBUF is the buffer used by WORD
BOOT C< PTR is used when Forth boots from in-memory
source. See "Initialization sequence" below.
(cont.)

blk/084

@@ -1,14 +0,0 @@
CURRENTPTR points to current CURRENT. The Forth CURRENT word
doesn't return RAM+2 directly, but rather the value at this
address. Most of the time, it points to RAM+2, but sometimes,
when maintaining alternative dicts (during cross compilation
for example), it can point elsewhere.
NLPTR points to an alternative routine for NL (by default,
CRLF).
BLK* see B416.
FUTURE USES section is unused for now.
DRIVERS section is reserved for recipe-specific drivers.

blk/085

@@ -1,15 +0,0 @@
Word types
There are 4 word types in Collapse OS. Whenever you have a
wordref, it's pointing to a byte with numbers 0 to 3. This
number is the word type and the word's behavior depends on it.
0: native. This word's PFA contains native binary code and is
jumped to directly.
1: compiled. This word's PFA contains an atom list and its
execution is described in "EXECUTION MODEL" above.
2: cell. This word is usually followed by a 2-byte value in its
PFA. Upon execution, the address of the PFA is pushed to PS.
(cont.)

@@ -1,6 +0,0 @@
3: DOES>. This word is created by "DOES>" and is followed
by a 2-byte value as well as the address where "DOES>" was
compiled. At that address is an atom list exactly like in a
compiled word. Upon execution, after having pushed its cell
addr to PS, it executes its reference exactly like a
compiled word.

blk/089

@@ -1,16 +0,0 @@
Initialization sequence
On boot, we jump to the "main" routine in B289 which does
very few things.
1. Set SP to PS_ADDR and IX to RS_ADDR
2. Set HERE to SYSVARS+0x80.
3. Set CURRENT to the value of the LATEST field in stable ABI.
4. Execute the word referred to by 0x04 (BOOT) in stable ABI.
In a normal system, BOOT is in core words at B396 and does a
few things:
1. Initialize all overrides to 0.
2. Write LATEST in BOOT C< PTR ( see below )
3. Set "C<*", the word that C< calls to (boot<). (cont.)

blk/090

@@ -1,10 +0,0 @@
4. Call INTERPRET, which interprets boot source code until
ASCII EOT (4) is met. This usually initializes drivers.
5. Initialize the rdln buffer and the _sys entry (for EMPTY),
print "CollapseOS" and then call (main).
6. (main) interprets from rdln input (usually from KEY) until
EOT is met, then calls BYE.
In a RAM-only environment, we will typically have a
"CURRENT @ HERE !" line during init to have HERE begin at the
end of the binary instead of RAMEND.

blk/091

@@ -1,16 +0,0 @@
Stable ABI
Across all architectures, some addresses and words are referred
to by offsets that don't change (well, not without some binary
manipulation). Here's the complete list of these references:
04 BOOT addr 06 (uflw) addr 08 LATEST
13 (oflw) addr 2b (s) wordref 33 2>R wordref
42 EXIT wordref 53 (br) wordref 67 (?br) wordref
80 (loop) wordref bf (n) wordref
BOOT, (uflw) and (oflw) exist because they are referred to
before those words are defined (in core words). LATEST is a
critical part of the initialization sequence.
(cont.)

blk/092

@@ -1,16 +0,0 @@
Stable wordrefs are there for more complicated reasons. When
cross-compiling Collapse OS, we use immediate words from the
host and some of them compile wordrefs (IF compiles (?br),
LOOP compiles (loop), etc.). These compiled wordrefs need to
be stable across binaries, so they're part of the stable ABI.
Another layer of complexity is the fact that some binaries
don't begin at offset 0. In that case, the stable ABI doesn't
begin at 0 either. The EXECUTE word handles those cases
specially: any wordref < 0x100 has the binary offset
applied to it.
But that's not the end of our problems. If an offsetted binary
cross compiles a binary with a different offset, stable ABI
references will be > 0x100 and be broken.
(cont.)

@@ -1,3 +0,0 @@
For this reason, any stable wordref compiled in the "hot zone"
(B397-B400) has to be compiled by direct offset reference to
avoid having any binary offset applied to it.

doc/impl.txt Normal file

@@ -0,0 +1,224 @@
# Implementation notes
# Execution model
After having read a line through readln, we want to interpret
it. As a general rule, we go like this:
1. read single word from line
2. Can we find the word in dict?
3. If yes, execute that word, goto 1
4. Is it a number?
5. If yes, push that number to PS, goto 1
6. Error: undefined word.
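The loop above can be sketched in Python. This is a simplified model rather than Collapse OS code: `dictionary` is a hypothetical Python dict mapping word names to callables that operate on a list standing in for PS, and only decimal literals count as numbers.

```python
def interpret_line(line, dictionary, ps):
    """Interpret one line of input, following steps 1-6 above (sketch).

    `dictionary` (hypothetical) maps a word name to a callable taking PS;
    `ps` is a Python list standing in for the Parameter Stack.
    """
    for word in line.split():           # 1. read a single word from the line
        if word in dictionary:          # 2. can we find the word in the dict?
            dictionary[word](ps)        # 3. yes: execute it, then goto 1
            continue
        try:                            # 4. is it a number? (decimal only here)
            number = int(word)
        except ValueError:
            raise NameError(f"undefined word: {word}") from None  # 6. error
        ps.append(number)               # 5. yes: push it to PS, then goto 1
```

For example, with `+` and `DUP` defined as callables, interpreting `2 3 + DUP` leaves two 5s on the list used as PS, while an unknown word such as `FOO` raises an error.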
# Executing a word
At its core, executing a word means pushing the wordref on PS
and calling EXECUTE. Then, we let the word do its thing. Some
words are special, but most of them are of the "compiled"
type (regular non-native words), and it's their execution that
we describe here.
First of all, at all times during execution, the Interpreter
Pointer (IP) points to the wordref we're executing next.
When we execute a compiled word, the first thing we do is push
IP to the Return Stack (RS). Therefore, RS' top of stack will
contain a wordref to execute next, after we EXIT.
At the end of every compiled word is an EXIT. This pops RS, sets
IP to it, and continues.
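The IP/RS mechanics above can be modeled in a few lines of Python. This is a sketch under stated assumptions, not the actual implementation: wordrefs are names, a hypothetical `words` table maps each name to either a Python callable (standing in for a native word) or a list of wordrefs ending with `EXIT` (a compiled word), and IP is a `(body, index)` pair instead of a memory address.

```python
EXIT = "EXIT"  # wordref of the EXIT word in this model

def execute(entry, words, ps):
    """Run the word `entry` with an explicit IP and Return Stack (sketch)."""
    rs = []            # Return Stack: saved IP values
    ip = None          # IP: (body, index) of the wordref to execute next
    ref = entry
    while True:
        if ref == EXIT:
            ip = rs.pop()                 # EXIT pops RS and sets IP to it
        elif callable(words[ref]):
            words[ref](ps)                # native word: run it, IP unchanged
        else:
            rs.append(ip)                 # compiled word: push IP to RS...
            ip = (words[ref], 0)          # ...then IP points at its first wordref
        if ip is None:                    # no IP left: outermost word finished
            return
        body, i = ip
        ref, ip = body[i], (body, i + 1)  # fetch next wordref, advance IP
```

Because each compiled word pushes the old IP before starting, nested calls unwind in the right order: EXIT always resumes right after the wordref that called the word.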
# Stack management
In all supported arches, the tops of the Parameter Stack (PS)
and the Return Stack (RS) are tracked by registers assigned to
this purpose. For example, on the z80, SP and IX do that. The
values in those registers are referred to as the PS Pointer
(PSP) and the RS Pointer (RSP).
Those stacks are contiguous and grow in opposite directions: PS
grows "down", RS grows "up".
Stack underflow and overflow: In each native word involving
PS popping, we check whether the stack is big enough. If it's
not, we go into the "uflw" (underflow) error condition, then
abort.
We don't check RS for underflow because the cost of the check
is significant and its usefulness is dubious: if RS isn't
tightly in control, we're screwed anyway, and that, well
before we reach underflow.
An overflow condition happens when RSP and PSP meet somewhere in
the middle. That check is made at each "next" call.
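The following Python sketch models the two stacks sharing one memory region. The region size is a hypothetical parameter, addresses are cell indices rather than byte addresses, and, for simplicity, overflow is checked on every push instead of at each "next" call as the text describes.

```python
class StackError(Exception):
    """Raised with "uflw" or "oflw", after the error conditions above."""

class Stacks:
    """PS and RS sharing one region of `ncells` cells (sketch).

    RS grows "up" from the bottom, PS grows "down" from the top. Addresses
    are cell indices rather than byte addresses, for brevity.
    """
    def __init__(self, ncells):
        self.mem = [0] * ncells
        self.ps_addr = ncells   # PSP == ps_addr means PS is empty
        self.psp = ncells       # PSP: index of PS' top of stack
        self.rsp = -1           # RSP: index of RS' top of stack (-1: empty)

    def ps_push(self, value):
        if self.psp - 1 <= self.rsp:      # the two stacks would meet
            raise StackError("oflw")
        self.psp -= 1
        self.mem[self.psp] = value & 0xFFFF

    def ps_pop(self):
        if self.psp >= self.ps_addr:      # nothing left to pop
            raise StackError("uflw")
        value = self.mem[self.psp]
        self.psp += 1
        return value

    def rs_push(self, value):
        if self.rsp + 1 >= self.psp:      # the two stacks would meet
            raise StackError("oflw")
        self.rsp += 1
        self.mem[self.rsp] = value & 0xFFFF

    def rs_pop(self):
        # Deliberately unchecked, like RS underflow in the text above.
        value = self.mem[self.rsp]
        self.rsp -= 1
        return value
```

With four cells, one RS item and three PS items fill the region, so the next push on either stack reports "oflw"; popping PS past its bottom reports "uflw".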
# Dictionary entry
A dictionary entry has this structure:
- Xb name. Arbitrarily long number of characters (but it can't
be bigger than the input buffer, of course). Not null-terminated.
- 2b prev offset
- 1b name size + IMMEDIATE flag (7th bit)
- 1b entry type
- Parameter field (PF)
The prev offset is the number of bytes between the prev field
and the previous word's entry type byte (what its wordref points
to).
The size + flag byte indicates the size of the name field, with
the 7th bit being the IMMEDIATE flag.
The entry type is simply a number corresponding to a type which
will determine how the word will be executed. See "Word types"
below.
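The layout above can be exercised with a small Python model. It assumes a `bytearray` as memory, a little-endian prev offset (as on the z80), ASCII names and, as a hypothetical convention for this sketch, a prev offset of 0 marking the first entry in the chain. `add_entry` and `find` are illustrative names, not Collapse OS words.

```python
IMMEDIATE = 0x80  # 7th bit of the name size byte

def add_entry(mem, latest, name, entry_type, pf=b"", immediate=False):
    """Append a dictionary entry to `mem`; return its wordref (sketch).

    The wordref is the address of the entry type byte. `latest` is the
    previous entry's wordref, or None for the first entry.
    """
    encoded = name.encode("ascii")
    mem.extend(encoded)                          # Xb name, not null-terminated
    prev_field = len(mem)
    offset = 0 if latest is None else prev_field - latest
    mem.extend(offset.to_bytes(2, "little"))     # 2b prev offset
    mem.append(len(encoded) | (IMMEDIATE if immediate else 0))  # size + flag
    wordref = len(mem)
    mem.append(entry_type)                       # 1b entry type
    mem.extend(pf)                               # parameter field
    return wordref

def find(mem, latest, name):
    """Walk the chain from `latest`; return (wordref, is_immediate) or None."""
    wordref = latest
    while wordref is not None:
        size_byte = mem[wordref - 1]
        size = size_byte & 0x7F                  # strip the IMMEDIATE flag
        prev_field = wordref - 3
        if mem[prev_field - size:prev_field] == name.encode("ascii"):
            return wordref, bool(size_byte & IMMEDIATE)
        offset = int.from_bytes(mem[prev_field:prev_field + 2], "little")
        wordref = prev_field - offset if offset else None
    return None
```

Since every field before the wordref has a fixed size except the name, the name is found by stepping back over the type-independent fields and then `size` bytes.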
# Word types
There are 4 word types in Collapse OS. Whenever you have a
wordref, it's pointing to a byte with numbers 0 to 3. This
number is the word type and the word's behavior depends on it.
0: native. This word's PFA contains native binary code and is
jumped to directly.
1: compiled. This word's PFA contains an atom list and its
execution is described in "Execution model" above.
2: cell. This word is usually followed by a 2-byte value in its
PFA. Upon execution, the address of the PFA is pushed to PS.
3: DOES>. This word is created by "DOES>" and is followed
by a 2-byte value as well as the address where "DOES>" was
compiled. At that address is an atom list exactly like in a
compiled word. Upon execution, after having pushed its cell
addr to PS, it executes its reference exactly like a
compiled word.
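Dispatching on the type byte can be sketched as below. `natives` and `run_atoms` are hypothetical stand-ins for native code and for the atom-list execution described earlier, and the DOES> layout used here (2-byte value at the PFA, then the little-endian address of the DOES> code) is an assumption read from the description above, not a verified memory map.

```python
def execute_word(mem, wordref, ps, natives, run_atoms):
    """Dispatch on the word type byte that `wordref` points to (sketch).

    `natives` (hypothetical) maps a PFA to a Python callable standing in for
    native code; `run_atoms(addr, ps)` (hypothetical) runs the atom list at
    `addr` as described in "Executing a word".
    """
    pfa = wordref + 1                   # the PF follows the type byte
    word_type = mem[wordref]
    if word_type == 0:                  # native: "jump" to the code at PFA
        natives[pfa](ps)
    elif word_type == 1:                # compiled: PFA holds an atom list
        run_atoms(pfa, ps)
    elif word_type == 2:                # cell: push the address of the PFA
        ps.append(pfa)
    elif word_type == 3:                # DOES>: push the cell addr, then run
        ps.append(pfa)                  # assumed: 2-byte value at PFA...
        does_addr = int.from_bytes(mem[pfa + 2:pfa + 4], "little")
        run_atoms(does_addr, ps)        # ...and the DOES> address after it
    else:
        raise ValueError(f"unknown word type {word_type}")
```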
# System variables
There are some core variables that are referred to directly by
their address in memory throughout the code. The place where
they live is configurable by the SYSVARS constant in the xcomp
unit, but their relative offsets are not. In fact, they're
mostly referred to directly by their numerical offset, along
with a comment indicating what this offset refers to.
This system is a bit fragile because every time we change those
offsets, we have to be careful to adjust all references to them,
but thankfully, there aren't many system variables.
Here's a list of them, as offsets from SYSVARS:
+00 FUTURE USES
+02 CURRENT
+04 HERE
+06 C<?
+08 C<* override
+0a NLPTR
+0c C<*
+0e WORDBUF
+2e BOOT C< PTR
+30 IN>
+32 IN(*
+34 BLK@*
+36 BLK!*
+38 BLK>
+3a BLKDTY
+3c BLK(*
+3e A@*
+40 A!*
+42 FUTURE USES
+51 CURRENTPTR
+53 (emit) override
+55 (key) override
+57 FUTURE USES
+70 DRIVERS
+80 RAMEND
CURRENT points to the last dict entry.
HERE points to the current write offset.
IP is the Interpreter Pointer.
PARSEPTR holds the routine address called on (parse).
C<* holds the routine address called on C<. If the C<* override
at 0x08 is nonzero, the routine it points to is called instead.
IN> is the current position in IN(, which is the input buffer.
IN(* is a pointer to the input buffer, allocated at runtime.
CURRENTPTR points to the current CURRENT. The Forth CURRENT word
doesn't return SYSVARS+0x02 directly, but rather the value at
CURRENTPTR. Most of the time, it points to SYSVARS+0x02, but
sometimes, when maintaining alternative dicts (during cross
compilation, for example), it can point elsewhere.
NLPTR points to an alternative routine for NL (by default,
CRLF).
For BLK*, see B416.
FUTURE USES section is unused for now.
DRIVERS section is reserved for recipe-specific drivers.
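To make the offsets and the CURRENTPTR indirection concrete, here is a Python sketch. The SYSVARS address is hypothetical (the real one is set in the xcomp unit), only a subset of the offsets is included, and cells are assumed to be little-endian 16-bit values.

```python
SYSVARS = 0x1000  # hypothetical; really set by the SYSVARS constant in xcomp

# A subset of the offsets listed above, relative to SYSVARS.
OFFSETS = {"CURRENT": 0x02, "HERE": 0x04, "C<?": 0x06,
           "C<* override": 0x08, "C<*": 0x0c, "CURRENTPTR": 0x51}

def addr(name):
    return SYSVARS + OFFSETS[name]

def fetch(mem, a):
    """Forth @: read a little-endian 16-bit cell (z80 byte order assumed)."""
    return int.from_bytes(mem[a:a + 2], "little")

def store(mem, a, value):
    """Forth !: write a little-endian 16-bit cell."""
    mem[a:a + 2] = (value & 0xFFFF).to_bytes(2, "little")

def current(mem):
    """The CURRENT word: yields the address stored in CURRENTPTR."""
    return fetch(mem, addr("CURRENTPTR"))

def c_lt_routine(mem):
    """Routine C< calls: the +08 override if nonzero, else C<* at +0c."""
    override = fetch(mem, addr("C<* override"))
    return override if override else fetch(mem, addr("C<*"))
```

Pointing CURRENTPTR at another cell is all it takes for `CURRENT @` to see a different dictionary, which is what makes alternative dicts during cross compilation cheap.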
# Initialization sequence
(this describes the z80 boot sequence, but other arches have
a very similar sequence, and, of course, once we enter Forth
territory, identical)
On boot, we jump to the "main" routine in B289 which does
very few things.
1. Set SP to PS_ADDR and IX to RS_ADDR
2. Set HERE to SYSVARS+0x80.
3. Set CURRENT to the value of the LATEST field in stable ABI.
4. Execute the word referred to by 0x04 (BOOT) in stable ABI.
In a normal system, BOOT is in core words at B396 and does a
few things:
1. Initialize all overrides to 0.
2. Write LATEST in BOOT C< PTR ( see below )
3. Set "C<*", the word that C< calls to (boot<).
4. Call INTERPRET, which interprets boot source code until
ASCII EOT (4) is met. This usually initializes drivers.
5. Initialize the rdln buffer and the _sys entry (for EMPTY),
print "CollapseOS" and then call (main).
6. (main) interprets from rdln input (usually from KEY) until
EOT is met, then calls BYE.
In a RAM-only environment, we will typically have a
"CURRENT @ HERE !" line during init to have HERE begin at the
end of the binary instead of RAMEND.
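Steps 2 to 4 of BOOT amount to reading in-memory source until EOT. The Python sketch below assumes that (boot<) returns the byte at the address held in BOOT C< PTR and then advances that pointer; it models the pointer as a closure variable, uses hypothetical helper names, and leaves out the actual interpretation.

```python
EOT = 4  # ASCII end-of-transmission, which ends the boot source

def make_boot_reader(mem, boot_ptr):
    """A (boot<)-like reader: each call returns the next byte at boot_ptr.

    In the real system the pointer is assumed to live in the BOOT C< PTR
    sysvar, initialized to LATEST; here it is a closure variable.
    """
    def boot_c_lt():
        nonlocal boot_ptr
        c = mem[boot_ptr]
        boot_ptr += 1
        return c
    return boot_c_lt

def read_boot_source(c_lt):
    """Collect characters from c_lt until EOT, where INTERPRET stops."""
    chars = []
    while (c := c_lt()) != EOT:
        chars.append(chr(c))
    return "".join(chars)
```

Starting the pointer at LATEST means the boot source is simply whatever text was appended right after the binary, terminated by an EOT byte.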
# Stable ABI
Across all architectures, some addresses and words are referred
to by offsets that don't change (well, not without some binary
manipulation). Here's the complete list of these references:
04 BOOT addr 06 (uflw) addr 08 LATEST
13 (oflw) addr 2b (s) wordref 33 2>R wordref
42 EXIT wordref 53 (br) wordref 67 (?br) wordref
80 (loop) wordref bf (n) wordref
BOOT, (uflw) and (oflw) exist because they are referred to
before those words are defined (in core words). LATEST is a
critical part of the initialization sequence.
Stable wordrefs are there for more complicated reasons. When
cross-compiling Collapse OS, we use immediate words from the
host and some of them compile wordrefs (IF compiles (?br),
LOOP compiles (loop), etc.). These compiled wordrefs need to
be stable across binaries, so they're part of the stable ABI.
Another layer of complexity is the fact that some binaries
don't begin at offset 0. In that case, the stable ABI doesn't
begin at 0 either. The EXECUTE word handles those cases
specially: any wordref < 0x100 has the binary offset applied
to it.
But that's not the end of our problems. If an offset binary
cross-compiles a binary with a different offset, stable ABI
references will be > 0x100 and thus broken.
For this reason, any stable wordref compiled in the "hot zone"
(B397-B400) has to be compiled by direct offset reference to
avoid having any binary offset applied to it.
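The offset rule EXECUTE applies, and why the hot zone needs direct offsets, can be shown with a few lines of Python. `resolve` is a hypothetical helper and the binary offsets in the example are made up; the stable ABI values come from the list above.

```python
# Stable ABI offsets from the list above.
STABLE = {"BOOT": 0x04, "(uflw)": 0x06, "LATEST": 0x08, "(oflw)": 0x13,
          "(s)": 0x2b, "2>R": 0x33, "EXIT": 0x42, "(br)": 0x53,
          "(?br)": 0x67, "(loop)": 0x80, "(n)": 0xbf}

def resolve(wordref, bin_offset):
    """Apply the binary offset to a wordref the way EXECUTE is described to.

    Only wordrefs below 0x100, i.e. in the stable ABI range, are offset.
    """
    return wordref + bin_offset if wordref < 0x100 else wordref
```

With a host binary at offset 0x8000, `resolve(0x67, 0x8000)` gives 0x8067. If that absolute address were compiled into a target binary at offset 0x4000, EXECUTE there would leave it untouched (it is > 0x100) and it would point into the wrong binary, whereas compiling the raw 0x67 resolves to 0x4067 in the target.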