Collapse OS' Forth implementation notes

*** EXECUTION MODEL

After having read a line through readln, we want to interpret it. As a general rule, we go like this:

1. Read a single word from the line.
2. Can we find the word in the dict?
3. If yes, execute that word, goto 1.
4. Is it a number?
5. If yes, push that number to PS, goto 1.
6. Error: undefined word.

*** EXECUTING A WORD

At its core, executing a word is pushing the wordref on PS and calling EXECUTE. Then, we let the word do its thing. Some words are special, but most of them are of the compiledWord type, and it's their execution that we describe here.

First of all, at all times during execution, the Interpreter Pointer (IP) points to the wordref we're executing next.

When we execute a compiledWord, the first thing we do is push IP to the Return Stack (RS). Therefore, RS' top of stack will contain the wordref to execute next, after we EXIT.

At the end of every compiledWord is an EXIT. This pops RS, sets IP to it, and continues.

*** Stack management

The Parameter stack (PS) is maintained by SP and the Return stack (RS) is maintained by IX. This lets us use push and pop freely in general, which is what we want because PS is the most frequently used stack. However, this causes a problem with routine calls: because in Forth the stack isn't balanced within each call, the return offset that a CALL places on that same stack messes everything up. This is one of the reasons why we need the stack management routines below.

IX always points to RS' Top Of Stack (TOS). This return stack contains "Interpreter pointers", that is, pointers to the address of a word, as seen in a compiled list of words.

*** Dictionary

A dictionary entry has this structure:

- Xb name. Arbitrarily long number of characters (but it can't be bigger than the input buffer, of course). Not null-terminated.
- 2b prev offset
- 1b size + IMMEDIATE flag
- 2b code pointer
- Parameter field (PF)

The prev offset is the number of bytes between the prev field and the previous word's code pointer.
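To make the dictionary layout and the interpret loop concrete, here's a small Python model. It is a sketch, not Collapse OS code. It rests on these assumptions:

- the wordref (what CURRENT holds) is the address of an entry's code pointer field;
- a prev offset of 0 ends the chain;
- the "7th bit" of the size byte means bit 7 (0x80);
- a Python function stands in for the machine code each code pointer would point to.

`Dictionary`, `create`, `find` and `interpret` are hypothetical names.

```python
IMMEDIATE = 0x80  # assumption: the "7th bit" of the size byte is bit 7


class Dictionary:
    """Hypothetical byte-level model of the dict chain (little-endian, like the Z80)."""

    def __init__(self):
        self.mem = bytearray()
        self.current = None  # wordref (address of the code pointer) of the last entry

    def create(self, name, code, immediate=False, pf=b""):
        nb = name.encode("ascii")
        if len(nb) >= IMMEDIATE:
            raise ValueError("name too long for the 7-bit size field")
        self.mem += nb                                   # Xb name, not null-terminated
        prev = len(self.mem)                             # address of the prev field
        # prev offset: bytes from this prev field back to the previous code pointer.
        # 0 marks the first entry (an assumption of this sketch).
        off = 0 if self.current is None else prev - self.current
        self.mem += off.to_bytes(2, "little")            # 2b prev offset
        self.mem.append(len(nb) | (IMMEDIATE if immediate else 0))  # 1b size + flag
        wordref = len(self.mem)
        self.mem += code.to_bytes(2, "little")           # 2b code pointer
        self.mem += pf                                   # parameter field
        self.current = wordref
        return wordref

    def find(self, name):
        """Walk the chain from CURRENT; return the matching wordref or None."""
        nb = name.encode("ascii")
        w = self.current
        while w is not None:
            size = self.mem[w - 1] & 0x7F                # strip the IMMEDIATE flag
            prev = w - 3                                 # prev field sits before size byte
            if self.mem[prev - size:prev] == nb:
                return w
            off = int.from_bytes(self.mem[prev:prev + 2], "little")
            w = None if off == 0 else prev - off
        return None


def interpret(d, line, ps, routines):
    """Outer interpreter: find the word, else parse a decimal number, else error."""
    for token in line.split():
        w = d.find(token)
        if w is not None:
            routines[w](ps)          # stand-in for pushing the wordref and calling EXECUTE
            continue
        try:
            ps.append(int(token))    # a bare-system (parse) reads decimals only
        except ValueError:
            raise LookupError(f"undefined word: {token}") from None


d = Dictionary()
PLUS = d.create("+", code=0x1000)    # hypothetical code pointer values
DUP = d.create("DUP", code=0x1010)
routines = {
    PLUS: lambda ps: ps.append(ps.pop() + ps.pop()),
    DUP: lambda ps: ps.append(ps[-1]),
}
ps = []
interpret(d, "2 3 + DUP", ps, routines)  # ps is now [5, 5]
```

In this layout "DUP" ends up with a prev offset of 5. That is the distance from its prev field (byte 9) back to the code pointer of "+" (byte 4).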
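The compiledWord mechanics from "Executing a word" (push IP to RS, run the body, EXIT pops RS back into IP) can also be modeled in a few lines of Python. This is a hypothetical sketch with these simplifications:

- IP is a (body, index) pair instead of a real address;
- native words are Python functions acting on PS;
- `Native`, `Compiled` and `run` are invented names.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Native:
    """A word whose routine is plain code (here, a Python function on PS)."""
    name: str
    fn: Callable[[list], None]


@dataclass
class Compiled:
    """A compiledWord: its PF is a list of wordrefs ending with EXIT."""
    name: str
    body: list


EXIT = Native("EXIT", lambda ps: None)  # never called; run() handles it specially


def run(word, ps):
    """EXECUTE `word`, then keep executing the wordrefs at IP until RS unwinds."""
    rs = []    # Return Stack
    ip = None  # IP: (body, index) of the next wordref, or None at top level

    def execute(w):
        nonlocal ip
        if isinstance(w, Compiled):
            rs.append(ip)        # first thing: push IP to RS
            ip = (w.body, 0)     # then continue with the word's own wordrefs
        elif w is EXIT:
            ip = rs.pop()        # EXIT pops RS, sets IP to it, and continues
        else:
            w.fn(ps)

    execute(word)
    while ip is not None:        # "next": fetch the wordref at IP, advance IP, run it
        body, i = ip
        ip = (body, i + 1)
        execute(body[i])
    return rs


DUP = Native("DUP", lambda ps: ps.append(ps[-1]))
PLUS = Native("+", lambda ps: ps.append(ps.pop() + ps.pop()))
DOUBLE = Compiled("DOUBLE", [DUP, PLUS, EXIT])
QUAD = Compiled("QUAD", [DOUBLE, DOUBLE, EXIT])

ps = [3]
leftover_rs = run(QUAD, ps)  # ps is now [12] and RS is empty again
```

Because IP is advanced before the wordref is executed, the value a nested compiledWord pushes to RS is the wordref to run after its EXIT, as the notes describe.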
The size + flag field indicates the size of the name field, with the 7th bit being the IMMEDIATE flag.

The code pointer points to "word routines". These routines expect to be called with IY pointing to the PF. They themselves are expected to end by jumping to the address at (IP). They will usually do so with "jp next".

That's for "regular" words (words that are part of the dict chain). There are also "special words", for example NUMBER, LIT, FBR, that have a slightly different structure. They also have a pointer to an executable, but of the other fields, the only one they have is the "flags" field.

*** System variables

There are some core variables in the core system that are referred to directly by their address in memory throughout the code. The place where they live is configurable by the RAMSTART constant in conf.fs, but their relative offsets are not. In fact, they're mostly referred to directly by their numerical offset, along with a comment indicating what this offset refers to.

This system is a bit fragile because every time we change those offsets, we have to be careful to adjust all system variable offsets, but thankfully, there aren't many system variables. Here's a list of them:

RAMSTART  INITIAL_SP
+02       CURRENT
+04       HERE
+06       IP
+08       FLAGS
+0a       PARSEPTR
+0c       CINPTR
+0e       WORDBUF
+2e       SYSVNXT
+4e       INTJUMP
+51       MMAPPTR
+53       RESERVED
+60       SYSTEM SCRATCHPAD
+80       RAMEND

INITIAL_SP holds the initial Stack Pointer value so that we know where to reset it on ABORT.

CURRENT points to the last dict entry.

HERE points to the current write offset.

IP is the Interpreter Pointer.

FLAGS holds global flags. Only used for prompt output control for now.

PARSEPTR holds the routine address called on (parse).

CINPTR holds the routine address called on C<.

WORDBUF is the buffer used by WORD.

SYSVNXT is the buffer+tracker used by (sysv).

INTJUMP: All RST offsets (well, not *all* at this moment, I still have to free those slots...) in boot binaries are made to jump to this address.
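Since the system variables are referred to by raw offsets, one way to sanity-check the table is to derive each variable's absolute address and size from it. The Python sketch below does that. RAMSTART = 0x8000 is a hypothetical value; the real one comes from conf.fs. The sketch also assumes each variable extends up to the next offset in the table, with RAMEND only marking the end. `SYSVARS`, `layout` and `addr` are made-up names.

```python
# Offsets from RAMSTART, copied from the table above; RAMEND only marks the end.
SYSVARS = [
    (0x00, "INITIAL_SP"),
    (0x02, "CURRENT"),
    (0x04, "HERE"),
    (0x06, "IP"),
    (0x08, "FLAGS"),
    (0x0A, "PARSEPTR"),
    (0x0C, "CINPTR"),
    (0x0E, "WORDBUF"),
    (0x2E, "SYSVNXT"),
    (0x4E, "INTJUMP"),
    (0x51, "MMAPPTR"),
    (0x53, "RESERVED"),
    (0x60, "SYSTEM SCRATCHPAD"),
    (0x80, "RAMEND"),
]
OFFSETS = {name: off for off, name in SYSVARS}

# Guard against the fragility mentioned above: offsets must stay strictly increasing.
assert all(a < b for (a, _), (b, _) in zip(SYSVARS, SYSVARS[1:]))


def addr(name, ramstart):
    """Absolute address of a system variable for a given RAMSTART."""
    return ramstart + OFFSETS[name]


def layout(ramstart):
    """Map each variable to (absolute address, size in bytes)."""
    return {
        name: (ramstart + off, next_off - off)
        for (off, name), (next_off, _) in zip(SYSVARS, SYSVARS[1:])
    }


sysvars = layout(0x8000)  # hypothetical RAMSTART
```

For instance, INTJUMP comes out 3 bytes wide, which is exactly the size of a Z80 `jp nn` instruction. WORDBUF and SYSVNXT both get 0x20 bytes.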
If you use one of those slots for an interrupt, write a jump to the appropriate offset in that RAM location.

MMAPPTR: Address behind (mmap), which is called before every !/C!/@/C@ word to give it the opportunity to change the address of the call.

SYSTEM SCRATCHPAD is reserved for temporary system storage or can be reserved by low-level drivers. These are the current usages of this space throughout the project (offsets are relative to RAMSTART):

* 0x60-0x62: (c<) pointer during in-memory initialization (see below)
* 0x62-0x6a: ACIA buffer pointers in RC2014 recipes.

*** Initialization sequence

On boot, we jump to the "main" routine in boot.fs, which does very few things. It sets up the SP register, sets CURRENT and HERE to LATEST (saved in stable ABI), then looks for the BOOT word and calls it.

In a normal system, BOOT is in icore and does a few things:

1. Find "(parse)" and set "(parse*)" to it.
2. Find "(c<)" and set CINPTR to it (what C< calls).
3. Write LATEST in SYSTEM SCRATCHPAD (see below).
4. Find "INIT". If found, execute it. Otherwise, execute "INTERPRET".

On a bare system (only boot+icore), this sequence will result in "(parse)" reading only decimals and (c<) reading characters from memory starting from CURRENT, which at this point equals LATEST. This is why we put it in SYSTEM SCRATCHPAD: it tracks the current position.

This means that you can put initialization code in source form right into your binary, right after your last compiled dict entry, and it's going to be executed as such until you set a new (c<).

Note that there is no EMIT in a bare system. You have to take care of supplying one before you load core.fs and its higher levels.

Also note that this initialization code is fighting for space with HERE: new entries to the dict will overwrite that code! Also, because we're bare-bones, we can't have comments. This leads to peculiar code in this area. If you see weird whitespace usage, it's probably because not using that whitespace would result in dict entry creation overwriting the code before it has had the chance to be interpreted.
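Here's a Python sketch of the bare-system (c<) that reads source from memory. It keeps its position in the SYSTEM SCRATCHPAD slot at 0x60, and it checks whether writing at HERE would clobber source that hasn't been read yet. It is a hypothetical model with these assumptions:

- the embedded source ends with a NUL byte;
- whitespace means space, tab and newline;
- RAMSTART and LATEST take made-up values;
- `InMemorySource`, `word` and `clobbers_source` are invented names.

```python
CIN_SLOT = 0x60       # SYSTEM SCRATCHPAD bytes 0x60-0x62 hold the (c<) pointer
WHITESPACE = b" \t\n"


class InMemorySource:
    """Hypothetical bare-system (c<): reads source straight from memory and keeps
    its position in the SYSTEM SCRATCHPAD rather than in a Python field."""

    def __init__(self, mem, ramstart, latest):
        self.mem = mem                  # bytearray standing in for the address space
        self.slot = ramstart + CIN_SLOT
        self.ptr = latest               # BOOT step 3: write LATEST in the scratchpad

    @property
    def ptr(self):
        return int.from_bytes(self.mem[self.slot:self.slot + 2], "little")

    @ptr.setter
    def ptr(self, value):
        self.mem[self.slot:self.slot + 2] = (value & 0xFFFF).to_bytes(2, "little")

    def c_lt(self):
        """Return the byte at the pointer and advance the pointer stored in RAM."""
        c = self.mem[self.ptr]
        self.ptr = self.ptr + 1
        return c

    def word(self):
        """Read one whitespace-delimited word, or None at the NUL terminator
        (assumption). The NUL is left unread so later calls keep returning None."""
        c = self.c_lt()
        while c in WHITESPACE:
            c = self.c_lt()
        if c == 0:
            self.ptr -= 1
            return None
        buf = bytearray()
        while c != 0 and c not in WHITESPACE:
            buf.append(c)
            c = self.c_lt()
        if c == 0:
            self.ptr -= 1
        return buf.decode("ascii")

    def clobbers_source(self, here, nbytes):
        """True if writing nbytes at HERE would overwrite source not yet read."""
        return here + nbytes > self.ptr


mem = bytearray(0x10000)
RAMSTART = 0x8000     # hypothetical; the real value comes from conf.fs
LATEST = 0x0100       # hypothetical end of the compiled dict in the binary
src = b"2 3 +\0"
mem[LATEST:LATEST + len(src)] = src

s = InMemorySource(mem, RAMSTART, LATEST)
words = []
w = s.word()
while w is not None:
    words.append(w)
    w = s.word()
```

HERE also starts at LATEST, so `clobbers_source` shows why the read pointer has to stay ahead of HERE. Once the three words are consumed, writing 5 bytes at LATEST is safe, but a sixth byte would land on unread source. That gap is what the extra whitespace mentioned above buys.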
*** Memory maps

We have a mechanism to map memory ranges to something else. We call these memory maps.

There is a reserved address in memory for a memory mapping routine. The word (mmap*) returns that address. By default, it's zero, which means no mapping.

Each call to @, C@, ! or C! calls the routine stored there, if nonzero, before executing. The routine receives the target address and leaves the address to actually use, as foo does in the example below. This allows you to do pretty much anything. Try to be efficient in your programming, however, because those words are called *very* often.

Here's a toy example of memory map usage:

> 8 0x8000 DUMP
:00 0000 0000 0000 0000 ........
> : foo DUP 0x8000 = IF 2 + THEN ;
> ' foo (mmap*) !
> 8 0x8000 DUMP
:00 0000 0000 0000 0000 ........
> 0x1234 0x8000 !
> 8 0x8000 DUMP
:00 3412 3412 0000 0000 4.4.....
> 0 (mmap*) !
> 8 0x8000 DUMP
:00 0000 3412 0000 0000 ..4.....
>
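The toy session above can be replayed with a small Python model of the memory words. This is a sketch under assumptions:

- `MappedMemory` and its methods are invented names;
- a Python callable (or None) stands in for the routine address stored at (mmap*);
- DUMP is modeled as reading memory two bytes at a time through @, which is what reproduces the transcript.

```python
class MappedMemory:
    """Hypothetical model of @, C@, ! and C! going through an (mmap*) routine."""

    def __init__(self, size=0x10000):
        self.mem = bytearray(size)
        self.mmap = None          # stands in for the (mmap*) slot; None == 0 == no mapping

    def _map(self, addr):
        return self.mmap(addr) if self.mmap else addr

    def fetch(self, addr):        # @  (16-bit, little-endian)
        a = self._map(addr)
        return self.mem[a] | (self.mem[a + 1] << 8)

    def store(self, value, addr):  # !  (Forth order: value, then address)
        a = self._map(addr)
        self.mem[a] = value & 0xFF
        self.mem[a + 1] = (value >> 8) & 0xFF

    def cfetch(self, addr):       # C@
        return self.mem[self._map(addr)]

    def cstore(self, value, addr):  # C!
        self.mem[self._map(addr)] = value & 0xFF

    def dump(self, n, addr):
        """Assumption: DUMP reads through @, two bytes at a time."""
        raw = bytearray()
        for off in range(0, n, 2):
            v = self.fetch(addr + off)
            raw += bytes([v & 0xFF, v >> 8])
        hexpart = " ".join(raw[i:i + 2].hex() for i in range(0, n, 2))
        text = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in raw)
        return f":00 {hexpart} {text}"


m = MappedMemory()
before = m.dump(8, 0x8000)
m.mmap = lambda a: a + 2 if a == 0x8000 else a  # : foo DUP 0x8000 = IF 2 + THEN ;
m.store(0x1234, 0x8000)                          # 0x1234 0x8000 !  lands at 0x8002
mapped = m.dump(8, 0x8000)
m.mmap = None                                    # 0 (mmap*) !
unmapped = m.dump(8, 0x8000)
```

The mapping only redirects the exact address 0x8000. That is why the mapped dump shows 0x1234 twice: both @ 0x8000 and @ 0x8002 end up reading at 0x8002. A per-byte C@ dump would have shown `3400 3412` instead.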