Eigenratios of Self-Interpreters: A self-written, self-modifying self-interpreter for a subleq based OISC machine!

In one of my posts to the ICFP list a while ago I threw in a couple of questions about eigenratios. One question was:
What does an eigenratio really tell us anyway?
and Antonio Vargas replied,

That a OISC machine with only sub-and-jump-if-negative instruction is __measurably__ crappier than a real machine...

Sometime after that I started thinking that it might be an interesting challenge to write a self-interpreter for a OISC machine and see how well Antonio's answer stacked up. More recently, after convincing myself that the um.um self-interpreter could be described "exactly" in matrix form (by using multiple rows/columns for some instructions), the OISC idea resurfaced, also because I thought all those potential branches (one for "subleq" instruction) could be a good test of the idea of splitting instructions into multiple rows/columns in the matrix, one for each variation.

I didn't really expect much else from the exercise. So I set to work and all went well for a while. In fact, after I'd gained some experience with "thinking in subleq" I reached a stage where I thought I was getting near to having a working self-interpreter, and it didn't seem particularly large or complex. I was even starting to think the eigenratio might eventually turn out to be quite low! Then I hit the wall. How to achieve an indirect copy with subleq? (In other words, something like "*A = **B" in "C".) It took me a while before finally realising that the only solution seemed to require self-modifying code. Ha! In fact, it turned out that I had to use the same technique several times, with self-modifying code modifying another chunk later. So in the end it wasn't nearly as straightforward as I first thought it might be! I guess OISC machines might not be the wave of the future after all...

I'm still working on building "the matrix" for this interpreter. My first attempt was a 2 x 2 matrix, with one row/column for each for the branching and non-branching variations of the "subleq" instruction. This is what I ended up with:

|28 23|
|13 12|

This implied an eigenratio of about 39.0526, but testing the self-interpreter directly seemed to show the value should be a little higher, more like 39.0606. When checking my working more carefully I realised there were possibly other "variations" that needed to be considered if I wanted an exact result, so now I'm working on an 8 x 8 matrix to achieve that. I'll post that once I'm sure I've got the correct figures. The current version is apparently very "close" but when I use it directly to predict the exact counts of instructions executed for a "tower of self-interpreters" of some height, the results are slightly different from what I get by direct measurement in my underlying Python interpreter.

Anyway, I'm not sure if anybody else has attempted anything quite as silly as a "OISC" self-interpreter before so I've decided to post my (assembler) code for this interpreter here. So here, in all its "glory", is my self-written, self-modifying self-interpreter! (Possibly even a world first" for an OSIC?! :-) There are almost certainly improvements that can be made (which I leave as an exercise for the more "subleq" capable coders), but as far as I can tell it does at least work properly!


# Self interpreter for OISC ("One Instruction Set Computer") using "subleq".
#
# This uses "subleq" as the "one instruction" where A, B and C are memory
# addresses, and in the explanations and comments [A] means the integer value
# stored at address A. Memory is a zero based array of integers of some
# suitable "width". Signed 32 bit values are perfectly adequate.
#
# A single "subleq" instruction written like this
#
#    subleq A B C
#
# and is stored in memory as three addresses A B and C in consecutive
# locations. The following pseudocode describes what happens when one
# instruction is executed.
#
#    [B] = [B] - [A]
#    if [B] <= 0 goto C
#
# Note that value of C is taken to be the value before the subtraction.
#
# Omitting the C operand (i.e. writing just "subleq A B") is understood to be
# shorthand for specifying C as the address of the next instruction. When the
# program starts running, the program counter is 0. It halts when a branch
# to a negative address happens.
#
# This self-interpreter expects the program that it should run to appear
# directly after its own code/data in memory.
#
# Written by Clive Gifford, 29/30 August 2006 based on the Wikipedia
# description of a OISC using the "subleq" instruction.

start:

# [A1] = [PC]

      subleq A1   A1
      subleq PC   ZERO
      subleq ZERO A1
      subleq ZERO ZERO

# Code below (after modification from above) is equivalent to [A] = [[A1]]

      subleq A    A
A1:   data   0
      data   ZERO
      data   A2
A2:   subleq ZERO A
      subleq ZERO ZERO

# [A] = [A] + [LEN]

      subleq LEN  ZERO
      subleq ZERO A
      subleq ZERO ZERO

# [B1] = [PC] + 1

      subleq B1   B1
      subleq PC   ZERO
      subleq ZERO B1
      subleq ZERO ZERO

      subleq NEG1 B1

# Code below (after modification from above) is equivalent to [B] = [[B1]]

      subleq B    B
B1:   data   0
      data   ZERO
      data   B2
B2:   subleq ZERO B
      subleq ZERO ZERO

# [B] = [B] + [LEN]

      subleq LEN  ZERO
      subleq ZERO B
      subleq ZERO ZERO

# We have to copy [C] now, just in case the emulated subtraction modifies [[PC] + 2].
# [C1] = [PC] + 2

      subleq C1    C1
      subleq PC    ZERO
      subleq ZERO  C1
      subleq ZERO  ZERO

      subleq NEG2  C1

# Code below (after modification from above) is equivalent to [C] = [[C1]]

      subleq C    C
C1:   data   0
      data   ZERO
      data   C2
C2:   subleq ZERO C
      subleq ZERO ZERO

# [[B]] = [[B]] - [[A]];
# if [[B]] <= 0 goto LEQZ

# Earlier code referring to addresses A and B has modified the next
# couple of words to create the equivalent of "subleq [A] [B] LEQZ"
# This is where we're "emulating" the subtraction...

A:    data   0
B:    data   0
      data   LEQZ

# [PC] += 3

      subleq NEG3  PC
      subleq ZERO  ZERO START

# We come to address LEQZ when the emulated subleq is going to take
# the branch to the emulated address C.

LEQZ:

# First save the "raw" value in PC
# [PC] = [C]

      subleq PC    PC
      subleq C     ZERO
      subleq ZERO  PC
      subleq ZERO  ZERO

# Check if [C] is less than zero. Halt if it is. 
# We don't care about changing [C] as we've already copied the value to [PC] above.

      subleq NEG1  C     -1

# [PC] = [PC] + [LEN]

      subleq LEN   ZERO
      subleq ZERO  PC
      subleq ZERO  ZERO  START

NEG3: data  -3
NEG2: data  -2
NEG1: data  -1
ZERO: data   0

C:    data   0
PC:   data   PROG
LEN:  data   PROG

# Code for the program to be interpreted must start at the PROG address...
PROG:

And that all turns into an "executable" containing the following values (arranged as triples for the addresses that will be interpreted as subleq instructions, and one value per line for the other data at the end):


 15  15   3
134 132   6
132  15   9
132 132  12
 96  96  15
  0 132  18
132  96  21
132 132  24
135 132  27
132  96  30
132 132  33
 51  51  36
134 132  39
132  51  42
132 132  45
131  51  48
 97  97  51
  0 132  54
132  97  57
132 132  60
135 132  63
132  97  66
132 132  69
 87  87  72
134 132  75
132  87  78
132 132  81
130  87  84
133 133  87
  0 132  90
132 133  93
132 132  96
  0   0 105
129 134 102
132 132   0
134 134 108
133 132 111
132 134 114
132 132 117
131 133  -1
135 132 123
132 134 126
132 132   0
 -3
 -2
 -1
  0
  0
136
136

To build a tower of this interpreter, just concatenate as many copies as you desire, and finish it off with the "top level program" code to be executed.

Eigenratios of Self-Interpreters

Thursday, August 31, 2006

A self-written, self-modifying self-interpreter for a subleq based OISC machine!

0 Comments:

About Me

Previous Posts