Friday, November 30, 2007

Searching for the Holy (Golden?) Grail of Eigenratios

When do mathematicians typically first learn about continued fractions?

I've had some tertiary level training in pure maths (not that you can probably tell!) but I'm pretty sure these beautiful creatures never popped up in any serious way in any of my courses, and ditto for my earlier schooling. They aren't complete strangers to me, but I don't think I'd ever thought much beyond "wow, that's a strange and beautiful thing" whenever I bumped into one - at least up until a few days ago, when I started looking a bit harder and discovered all sorts of curious and interesting facts about them. Here are two examples of infinite continued fractions:

\sqrt{2} = 1 + \cfrac{1}{2 + \cfrac{1}{2 + \cfrac{1}{2 + \ddots}}}

and

\pi = 3 + \cfrac{1}{6 + \cfrac{9}{6 + \cfrac{25}{6 + \cfrac{49}{6 + \cfrac{81}{6 + \cfrac{121}{\ddots\,}}}}}}
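
Both expansions can be checked numerically by truncating the fraction at some depth and evaluating it from the bottom up. Here's a minimal Python sketch (my own illustration, nothing official):

from math import sqrt, pi

def sqrt2_cf(depth):
    # Evaluate 1 + 1/(2 + 1/(2 + ...)) truncated at the given depth.
    x = 2.0
    for _ in range(depth):
        x = 2.0 + 1.0 / x
    return 1.0 + 1.0 / x

def pi_cf(depth):
    # Evaluate 3 + 1/(6 + 9/(6 + 25/(6 + ...))), where the numerators
    # are the squares of the odd numbers.
    x = 6.0
    for k in range(depth, 0, -1):
        x = 6.0 + (2 * k + 1) ** 2 / x
    return 3.0 + 1.0 / x

print(sqrt2_cf(20), sqrt(2))  # agree to full double precision
print(pi_cf(2000), pi)        # converges too, though more slowly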

In my previous post I talked a bit about Alex Smith's proof that the Wolfram Prize 2 state, 3 symbol Turing machine was universal. His proof relies on an infinite non-periodic initial pattern on the tape. Since then Alex has come up with a language called 1cnis, with the idea that this could in some sense formalise the kind of patterns that he thinks should be allowed on the initial tape. Perhaps we can even think of it as a kind of compiler, taking a finite source description and turning that into an infinite amount of machine language for the 2,3 machine. Talk about code bloat!

Anyway, Alex's email about 1cnis reminded me of some idle thoughts I'd had a few days earlier about a possible hierarchy of classes of initial tape patterns. The traditional Turing machine blank background can be seen as the infinite decimal expansion of zero, and other non-blank periodic backgrounds as the decimal expansions of simple fractions. When I'd been thinking along these lines I'd thought that perhaps a continued fraction of some kind might be the next most general class of possible background pattern.

Well, you can't realistically use any old continued fraction, because every real number can be represented as a continued fraction and many of those are non-computable. But some real numbers have regular periodic representations as continued fractions - for example, the positive square root of any natural number that isn't a perfect square. These kinds of infinite continued fractions (with some kind of regular or repeating pattern) are clearly computable real numbers. In some vague sense the idea of continued fractions made me think of the kind of pattern used in Alex Smith's proof, although viewing that pattern as the sum of an infinite sequence is probably far more natural.
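
Incidentally, the periodic expansion of sqrt(n) can be generated with a well-known recurrence. A small Python sketch of that standard algorithm:

from math import isqrt

def sqrt_cf(n, terms=12):
    # Continued fraction expansion [a0; a1, a2, ...] of sqrt(n) via the
    # standard recurrence; the tail is eventually periodic whenever n
    # is not a perfect square.
    a0 = isqrt(n)
    if a0 * a0 == n:
        return [a0]  # perfect square: nothing more to expand
    m, d, a = 0, 1, a0
    expansion = [a0]
    for _ in range(terms - 1):
        m = d * a - m
        d = (n - m * m) // d
        a = (a0 + m) // d
        expansion.append(a)
    return expansion

print(sqrt_cf(2))   # [1, 2, 2, 2, ...] - period 1
print(sqrt_cf(14))  # [3, 1, 2, 1, 6, 1, 2, 1, 6, ...] - period 4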

Changing course slightly, I also exchanged a couple of emails with "r.e.s." recently after spotting a message from him about some small universal tag systems and details on how to create them. As Alex Smith's proof also went via Cyclic Tag Systems I'd got this crazy idea into my head that it might be fun to try to write a self-interpreter for one of the variants of tag systems directly, and so the information posted by r.e.s. caught my eye.

You can view a specific encoding scheme for some particular universal machine (e.g. a universal cyclic tag system with its set of productions, or a particular universal Turing machine) as a specification for a programming language. We can use that language to write a specific program to compute some function. That program can then be understood as a description of another machine (along with appropriate data) that is purpose-built to do the required computation. So, while some "languages" already exist for some universal tag systems (and corresponding "self-interpreters" could also be realised), I'm pretty sure they are not the most efficient, because they go via an emulation of some Turing machine, which in turn emulates a tag system. I'm interested in finding the most efficient self-interpreter (in the sense that it has the lowest eigenratio) and it seems likely to me that would be a more direct encoding system. In other words, I want to find the encoding scheme that allows the most efficient implementation of a universal tag system - one that can "directly" emulate any other tag system - measuring efficiency by looking at the asymptotic limit of the increase in runtime as the system emulates additional copies of itself and some other fixed program/data "on top".

To slightly paraphrase a comment from "r.e.s.", my goal is "to find a Golden Implementation in a Golden Language, for which the lower bound is the Golden Ratio". Ha ha! I had mentioned this blog to "r.e.s." - he has had a look and must have eventually found "Clive's Conjecture". This was a suggestion that the lower bound for any eigenratio for any language/system self-interpreter was the Golden Ratio. I really just plucked that number out of thin air, more "for fun" than based on any serious reasoning or belief. I did consider a few other candidates at the time, e and 2 for example. But some of the properties of the Golden Ratio that I knew about seemed vaguely "right" and so that eventually swung it for me.

That was all a long time ago, so the funny thing (to me) is that it was only a month or two ago that I finally stumbled over the fact that there is a very nice small matrix that has precisely the Golden Ratio as its dominant eigenvalue. So if someone can find a computational system with a self-interpreter that is "described" by this matrix then my conjecture might actually start to look slightly more serious!

Anyway, for now I think I'll call this the Golden Matrix. It can be understood as describing how to get the next term of the Fibonacci sequence from the previous two. Just use it to multiply a column vector containing F_{k+1} as the first element and F_k as the second to get a new vector with F_{k+2} and F_{k+1}.

\begin{pmatrix} F_{k+2} \\ F_{k+1} \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} F_{k+1} \\ F_{k} \end{pmatrix}

As k goes to infinity, the limit of F_{k+1}/F_k is the Golden Ratio. This was one of the things I knew when I made my "conjecture" but I didn't find out about the connection to the matching Golden Matrix representation until much more recently!
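
Both facts are easy to verify numerically. A quick sketch with numpy:

import numpy as np

G = np.array([[1, 1],
              [1, 0]])

# The dominant eigenvalue of the Golden Matrix is phi = (1 + sqrt(5))/2.
print(np.linalg.eigvalsh(G).max())  # 1.618033988749895

# Repeated multiplication advances the Fibonacci sequence, and the ratio
# of successive terms approaches the same value.
v = np.array([1, 1])  # (F_2, F_1)
for _ in range(30):
    v = G @ v
print(v[0] / v[1])    # also ~1.6180339887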

So let's have a look at the continued fraction representation of the Golden Ratio (Phi):

\varphi = [1; 1, 1, 1, \dots] = 1 + \cfrac{1}{1 + \cfrac{1}{1 + \cfrac{1}{1 + \ddots}}}

Isn't that the clincher?! How can anything so beautiful not be the answer?

Tuesday, November 13, 2007

A very small Turing machine, some very big numbers, and an oracle!

A few weeks ago, Wolfram Research announced that Alex Smith had won the $25000 prize (first offered some months earlier) for a proof that a particular 2 state, 3 colour (symbol) Turing machine was universal. (He could also have won the prize for proving it was not universal.) If this proof holds up, then this 2,3 machine is as "small" as any universal Turing machine can be, as smaller machines are already known not to be universal.

Note though that there were some slightly unusual and fuzzy areas in the competition description: for one thing the prize machine has no explicit halt state and so doesn't really qualify as a "classical" Turing machine, and also the definition of "universal" was left a bit open-ended.

And then, it also turns out that Alex Smith's proof pushed the boundaries of what has previously been generally accepted as "the rules" for Turing machines. In particular his proof relies on the construction of an infinite non-periodic initial configuration. Classical Turing machine descriptions require the equivalent of infinite "blank" cells either side of a finite region (where the real input data is "written"), although periodic patterns have also been explored more recently and may be considered acceptable by some. So, one issue with Alex's proof that has caused some ongoing debate is whether or not his infinite and non-periodic initial tape configuration is "cheating" in some way - if you're not extremely careful, an initial tape configuration like this can actually be helping the machine act as if it were universal even though it really isn't.

But putting that aside, another thing that really bothered me about Alex's proof (and the particular style of infinite initial configuration it requires) was that it claimed to show us a "universal Turing machine", yet this machine had no way to emulate a copy of itself! That apparently didn't matter to the judges, but how could I ever hope to calculate an eigenratio if there wasn't a way to construct a self-interpreter to start with?! :-)

All the controversy about Alex's proof has been concerned with the infinite non-periodic nature of his initial tape configuration, and exactly how it is "constructed". But there's an important first step in Alex's proof (before it goes anywhere near the construction of the infinite initial configuration), and that is where he shows how to construct a finite initial configuration for this machine. Each such finite initial configuration will emulate a given cyclic tag system for any particular finite number of "cycles". So, if you only need to run that cyclic tag system for, say, a billion cycles then Alex's proof can show you how to set up the appropriate finite initial configuration. You might have a more difficult system to emulate though - perhaps you need a googol of cycles to be sure there's enough juice in the tank to do the required computations. No problem again. And we can keep on going, there is no finite upper limit. Emulating a googolplex of cycles of your cyclic tag system is really just about as easy. Just plug "googolplex" into Alex's program for generating finite initial configurations and you'll have your answer "shortly". :-)

Now all of this might make you think that Alex has already shown in the first part of the proof how this machine can emulate "any number" of cycles (with finite non-blank initial tape content) and so is already "good enough" to be called universal. The trouble is that we don't (in general) know in advance exactly how many cycles the cyclic tag system will execute to do what it needs to do. That's the whole thing with these types of systems - knowing how many steps they'll need to finish, or even if they'll ever stop running at all is undecidable!

But maybe there's another way to look at this. It turns out that Turing machines (and other similar computational models such as cyclic tag systems) have their own version of what is known as the Busy Beaver Problem. You can follow the link for more details but basically there is a maximum number of steps or cycles for each "size" of one of these machines, such that if a particular "run" ever goes past that number of steps or cycles then we also know it will never halt. (It's stuck in some kind of infinite loop for example.) So if we get to that point and the machine hasn't stopped, then we can be absolutely confident that it will never stop by itself, and just "pull the plug" instead (to "save some time" as it were)! Now, these Busy Beaver type problems produce very very very very big numbers in general. And that's still an understatement! In fact, trying to imagine them makes me feel a little bit queasy! Worse, they aren't computable (not even in a theoretical sense - unless the Church-Turing thesis turns out to be false and some form of effective computing system more powerful than Turing machines is discovered), but we do know they exist. (Actually, some of the Busy Beaver values are known, but these are only for the very smallest sizes of a few machines.) In any case, my main point is that these Busy Beaver values exist!

When thinking about this yesterday I realised it also means that for any particular cyclic tag system, we know there is always some number of cycles that we can run it for (including via emulation by the 2,3 machine), and we can be sure it will have produced a "usable result" by the end, if not earlier. Either the cyclic tag system will have terminated by that stage, in which case we can read off the result, or if it hasn't terminated by then we know it also never will. And we also know that the initial configuration of the tape for this massive emulation is also guaranteed to be finite. The only catch is that we don't know how to work out this Busy Beaver number for any non-trivial cyclic tag system, and so we also can't actually produce the finite initial configuration!
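
Given the magic bound, though, the decision procedure itself is trivial. A tiny Python sketch of this hypothetical setup (the step function, halting test and oracle-supplied bound are all assumed - and the bound is precisely the non-computable part):

def decide_with_oracle(step, halted, state, bound):
    # Run the system one step/cycle at a time, up to the oracle's bound.
    for _ in range(bound):
        if halted(state):
            return ("halts", state)
        state = step(state)
    # Past the Busy Beaver bound without halting: it never will.
    return ("halts", state) if halted(state) else ("never halts", None)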

But just knowing it exists is enough for me - there is a finite configuration out there somewhere, generated directly from a very large (and non-computable) number, and any higher value will do as well for that matter. If we could find an oracle to tell us that magic number, then we could actually construct the initial configuration "simply" by just plugging it into Alex's Perl script, although we might need to add some more RAM to our machine first!

Anyway, now that we know a suitable finite initial pattern always exists, we can (in a hypothetical sense) build a self-interpreter for this machine. As for determining its eigenratio, I'm leaving that as "an exercise for the reader". But be warned, I suspect it could be rather large! To get you started, working backward, I think you'll have to do something like (a) find a classical universal Turing machine and set it up to emulate the prize 2,3 machine and its (finite) data tape (which could be emulating some arbitrary cyclic tag system for example, but not necessarily), then (b) construct a tag system to emulate that Turing machine and matching input, (c) construct a cyclic tag system to emulate the tag system, and finally use Alex Smith's proof (and your pet oracle again) to show how to set up the finite initial configuration that allows the prize 2,3 machine to emulate the whole circular contraption for "enough" cycles...

PS: if you want to read more about big numbers, then you may want to read Scott Aaronson's "Who Can Name the Bigger Number?".

Monday, September 03, 2007

A Self-Interpreter for Conway's "Game of Life"

John Horton Conway's "Game of Life" was one of many amazing things Martin Gardner wrote about in his Mathematical Games column in Scientific American a few decades ago. I used to read those columns as a schoolboy in the 70s, and "Life" was something I found totally fascinating back then and have revisited periodically ever since. It was probably a significant influence on me eventually heading into a career in computer programming. Back in 1982 I even ended up writing a Z80 assembly language implementation on my trusty TRS-80 Model I. I tried to sell it but the software distributor wasn't interested, claiming they already had several other versions of something similar from others. But I still have a printout of that code somewhere - my one and only significant programming project done all in assembly! (Copies available on request. :-)

Anyway, ever since starting this blog, a thought has been lurking in the back of my mind about the possibility of a self-interpreter for "Life". I imagined this would be a rather large pattern that could be used to tile the entire Life universe, and each such copy of the pattern would (somehow) emulate one cell of Life. If you can work out how to do that, then you can have that emulation run another set of copies of the same pattern, and on, ad infinitum! Something led me back to this again in the last few days and I started trawling the web to see if it had already been done. Although I didn't find anything in that vein, I did find a bunch of other amazing stuff that various people have done in more recent years, including Paul Rendell's construction of a Turing machine in Life. So I decided to contact him to ask if he knew of any existing pattern of the kind I was interested in, and if he had any advice to offer a newbie Life pattern constructor if not. Today I received his reply, pointing me to David Bell's "Unit Cell" pattern, created over ten years ago.

I have to admit to feeling some disappointment, because I'd started to hope that nobody else had thought of this idea and perhaps I could even be the first to construct it. Hah! It actually turned out the pattern was bundled in a collection that comes with Golly, an open source life simulator that I already had on my machine - I just hadn't looked hard enough through all those sample patterns and obviously also hadn't searched for the right terms when using Google! By the way, Golly is amazingly fast - I concede perhaps even a tad quicker than my old Z80 version - but no doubt only because of the super clever Hashlife algorithm it uses! I don't have a full understanding of this algorithm yet but it seems it was invented by Bill Gosper about the same time I was writing my Z80 code. Obviously I was talking to the wrong people back then, although implementing Hashlife in 16K of RAM and in assembler might have been somewhat challenging also!

Dragging myself back on topic again, Life's "Self-Interpreter" has already been created, and it's a thing of beauty. Grab Golly and have a look - in the version 1.2 distribution the pattern is second to last in the Signal-Circuitry group. The pattern below that ("Deep Cell") turns out to be a modified version that actually emulates two independent Life cells at the same time (can't say I've seen that trick in any other self-interpreters!) and a bit more poking around also turned up Brice Due's Metapixel pattern, a more recent and further major development that lets Life emulate a whole class of other cellular automata that use different rules as well as itself. I also want to mention Jason Summers' "unit cell" that emulates Wolfram's Rule 110 which is known to be universal. In fact, there's so much cool stuff in this whole area that I hardly know where to start or end.

But the main point as far as this blog goes is that "Life" has a self-interpreter, and its eigenratio is exactly 5760! (That's the number of generations that need to be run in David Bell's unit cell at level N in order to emulate a single generation at level N+1.) One interesting aspect of this is that the ratio is the same between all levels of any such similar Life "self-interpreter" that you might want to stack on top of each other - rather than just being the limit as the number of levels goes to infinity as in self-interpreters for more conventional computational systems. I'm not quite sure what to think of the corresponding matrix though - I guess it is just a scalar value in this case?

Anyway, I must go and play with Golly again. I can only recommend that you do the same!

Saturday, September 01, 2007

When is a self-interpreter not a "real" self-interpreter?

I started this blog about a year ago, after participating in the 2006 ICFP Programming Contest. The 2007 contest was run over a 72 hour period on 20 - 23 July, and so I tried my hand once again along with a little assistance from my regular team mate. Thanks Shaun, and Happy Birthday blog!

Unfortunately, like many others in this year's contest we didn't progress beyond a fairly minimal level. A slice of 50th place out of some hundreds of entries might not sound too bad but it was really the start of a very long flat tail on the scoreboard. And in my case this was due (in large part) to a very stupid bug and my inability to find and fix it fast enough. (I have since honed my skills with Python code testing and checking tools, set up a directory ready for next year, and sworn to do it right in 2008!)

But putting all that aside, the contest task was quite similar to that from last year in many ways. However, instead of the UM language from 2006, this time we had to interpret "Fuun DNA". The Fuun DNA language is a kind of string rewriting system, and sometime after the contest was over (and also after I had spent huge amounts of time trying to progress further with only limited success!) I suggested that perhaps someone should write a self-interpreter for Fuun DNA. Jochen Hoenicke promptly did this - he's the smart guy who was also the first person acknowledged by the organisers to have generated the exact output image required by this year's contest, albeit also about a month after the 72 hour contest closed! I should also say that he knows this first shot at a self-interpreter has a couple of bugs/limitations. You can find the "source" and "DNA" for it here (in selfinterpreter.*).

But there was also some discussion (see the comments for this post on Max Rabkin's blog) about what a self-interpreter for Fuun DNA actually needs to do! The language doesn't support input directly, but appending the "input" onto the end of the "DNA" representing your self-interpreter seems a reasonable way to achieve this. Because the next "pattern/template" instruction executed is always read from the start of the current DNA string, this also means you could put any kind of NOP program code in front of the DNA to be interpreted and claim to have a self-interpreter! This doesn't gel with my notion of a self-interpreter at all, and so I've ended up revisiting various definitions in an attempt to find a clear-cut definition of a self-interpreter that would disallow "tricks" such as this.

Most definitions seem pretty basic. Something like this (from the Wikipedia entry):

A self-interpreter is a programming language interpreter written in the language it interprets.


So then we need to understand what a "programming language interpreter" is. From the relevant Wikipedia entry for "interpreter" we are told:

...an interpreter is a computer program that executes, or performs, instructions written in a computer programming language


It then goes on to talk about the structure of an interpreter:

An interpreter needs to be able to analyze, or parse, instructions written in the source language. It also needs to represent any program state, such as variables or data structures, that a program may create. It needs to be able to move around in the source code when instructed to do so by control flow constructs such as loops or conditionals. Finally, it usually needs to interact with an environment, such as by doing input/output with a terminal or other user interface device.


This is all well and good, and sits pretty comfortably with my understanding of what a self-interpreter should normally do. It would also seem to exclude empty strings and other variations on that theme, and I'm happy about that also! However it's all still rather informal.

In their paper "A Very Short Self-Interpreter" (describing their brainfuck self-interpreter), Oleg Mazonka and Daniel B. Cristofani add a few more conditions. (Note that this extended definition may not be widely accepted as being correct.)

Here we define self-interpreter as an interpreter which is written in the same language it interprets, and which also meets these three requirements:

1. the language must be Turing-complete;
2. the behavior of programs when interpreted by a self-interpreter must be the same as their behavior when interpreted by any other interpreter, i.e. the two must produce identical observable output for any legitimate input, though not necessarily at the same speed; and
3. the self-interpreter must not use language constructs which are designed to facilitate recognition and interpretation of these and other language constructs (self-interpretation), e.g. LISP's eval.

The second requirement looks like a tautology saying that a self-interpreter is an interpreter. Nevertheless this requirement rules out languages whose pragmatics are not sufficient to permit a self-interpreter. Later we show that not every Turing-complete language can have a self-interpreter in this sense. This property forms a distinction between languages: some languages are environmentally complete, while others are not. Thus, Turing-completeness comes from the semantics of the language and environmental completeness from its pragmatics.


Item 3 seems the least well defined to me, particularly when you look at languages such as that defined by this year's ICFP Contest, the "Fuun DNA". Being a string rewriting system, almost every aspect of that language seems to fit in item 3! Now, I must admit I haven't checked this very carefully but I think that Jochen's self-interpreter basically works by parsing the code to be interpreted just enough so that the next "instruction" can be wrapped within another instruction which is then placed at the front of the current "DNA" where it will be the next thing executed. The wrapper skips past the rest of the self-interpreter before executing the wrapped instruction more or less in the original environment that it was copied from. Repeat until done. Is that "cheating"? I suspect a panel of judges would say it was, but perhaps only by a majority decision, not unanimously. And just to be clear, this is not meant to be a criticism of what Jochen did! In fact I think he wrote it that way mainly to demonstrate how to circumvent a different idea put forward earlier by someone else to stop "cheating".

It seems to me that most languages could be described in terms of some number of atomic units of computation. For the Fuun DNA, these include the ability to match specific bases, the ability to skip past some fixed number of bases, to search for a particular sequence and so on. The language is defined in terms of these basic things that it is able to do. In my opinion, a self-interpreter needs to be emulating the target language/program at least down to that level, even if it is technically possible to take short-cuts. I guess this is really what the last item in Oleg and Daniel's requirements is trying to get at.

Even when you get down to that level of detail, there are choices to be made. Once again the Wikipedia article talks about this in some detail so I'll just quote it again:

There is an important design dimension in the implementation of a self-interpreter, namely whether a language feature in the interpreted language is implemented by using the same feature in the implementation language of the interpreter (the host language). A typical example is whether a closure in a Lisp-like language is implemented using closures in the interpreter language or implemented "manually" using a data structure that stores the environment explicitly. The more features are implemented by the same feature in the interpreter language, the less control the programmer of the interpreter has. For example, a different behavior for dealing with number overflows cannot be realized if the arithmetic operations are just delegated to the corresponding operations in the host language.

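As a toy illustration of that design dimension (my own example, in Python): the same "add" operation implemented by delegating to the host language, versus implemented manually so the interpreter writer keeps control of overflow behaviour.

def add_delegated(a, b):
    # Delegate to the host: inherits Python's unbounded integers, so the
    # interpreter writer gets no say in what "overflow" means.
    return a + b

def add_manual(a, b, bits=8):
    # Implement the feature "manually": explicit 8-bit wrap-around, the
    # kind of control you give up when delegating to the host.
    return (a + b) % (1 << bits)

print(add_delegated(200, 100))  # 300
print(add_manual(200, 100))     # 44 (wraps modulo 256)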

Of course, this blog is supposedly about "eigenratios" of self-interpreters, and up until now eigenratios haven't been mentioned in this post. In my earlier experiments, the eigenratios that I've managed to determine exactly have been derived from a matrix whose elements correspond to counts of how many times particular "instructions" (or "variations" of them) are executed in the self-interpreter when interpreting one of the same "instructions", etc., from the target program. Although this is a very one-sided view of the world, obviously I would like all "self-interpreters" to emulate the target language in a sufficiently detailed fashion as to allow a corresponding matrix to be constructed! Perhaps something like this could be another requirement for a "real" self-interpreter? And in a sense I think it is another way of "interpreting" item 3 from the paper mentioned earlier.
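
To make that concrete, here's a sketch with an entirely made-up count matrix for a hypothetical three-instruction language; the eigenratio falls out as the dominant eigenvalue:

import numpy as np

# M[i][j] = how many times the self-interpreter executes instruction i
# while emulating one instance of instruction j (invented numbers).
M = np.array([[5, 2, 1],
              [1, 4, 2],
              [0, 1, 3]])

# The eigenratio is the dominant eigenvalue: the asymptotic factor by
# which runtime grows per added level of self-interpretation.
print(max(abs(np.linalg.eigvals(M))))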

Finally, if you want more (although not exactly on the same topic), there's a lot of other interesting reading about self-interpreters and meta-circular interpreters (in particular) in a post by Reg Braithwaite on his blog (raganwald).

Tuesday, February 20, 2007

The Minus Language

It's been more than two months since I posted anything here - obviously the Christmas break sent me to sleep.

But recently I've been prodded back into a more wakeful state again by Darren Smith who has created an esoteric language called Minus. We exchanged a few emails about Minus and his plans to write a self-interpreter for it. I haven't done anything with Minus except to briefly try out Darren's interpreter and some of his example programs but it reminds me of an amalgam of various "one instruction set computers" as described on Wikipedia - mostly like the subleq variant for the core instruction, but with a different branching mechanism plus some extensions to support console I/O and other features.

Anyway, I see today that Darren has now posted details of his first self-interpreter including the observed eigenratio and his analysis to back that up. A big "thumbs up" from me. I hope future authors of new languages take note of this new trend, and also publish a self-interpreter plus matching eigenratio along with the language specifications! :-)

I think it's fair to say Darren's first self-interpreter is optimised more for size than speed. Its eigenratio is just over 87, which seems a bit higher than what I would have thought possible given its apparent similarity to the OISC/subleq machine. However I'm not volunteering to write a faster version - I'll leave that to him!

Also, awoken from my slumber when it comes to eigenratios, I decided to revisit my original efforts to create a language and self-interpreter with the explicit goal of achieving a "low eigenratio". My latest revision of that project achieves an eigenratio of about 8.138. When I get to the point where I can't find any more easy/obvious improvements (and have tidied up the various bits and pieces) I'll publish it for anybody who wants to have a look - not that it's particularly clever.

Monday, December 18, 2006

Composite Self-Interpreters

It's funny how little snippets of more or less unrelated pieces of information can fuse in your mind when the conditions are right. I had one of those moments earlier tonight so I'm recording the result now before it has a chance to disintegrate and disappear!

First, a little while ago Dean Scarff mentioned in an email the idea of using a matrix (not necessarily square) to describe a cross-language interpreter in much the same way that I've been trying to use matrices to describe various self-interpreters. I think the cross-language aspect had floated through my head at some earlier point in time also but wasn't more than a fleeting thought at best.

Then earlier today Oleg Mazonka mentioned in another email some slightly surprising results he got from running combinations of my new cgbfi2.b BF self-interpreter and Daniel B. Cristofani's dbfi.b. I had also played around a little with running combinations in different orders during testing but hadn't thought all that much about it.

Also fresh in my thoughts was something I read about in the last day or two pointing out that if you have two matrices A and B, with A being m by n and B being n by m, then the eigenvalues of AB are the same as the eigenvalues of BA (plus m-n eigenvalues of zero if we also assume m is greater than or equal to n).
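
This is easy to check numerically with random matrices:

import numpy as np

rng = np.random.default_rng(0)
m, n = 4, 2
A = rng.random((m, n))
B = rng.random((n, m))

# AB is m by m, BA is n by n; their non-zero eigenvalues coincide.
print(np.sort_complex(np.linalg.eigvals(A @ B)))  # two (near-)zeros, then BA's pair
print(np.sort_complex(np.linalg.eigvals(B @ A)))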

And the final thing was a discussion I saw recently (on some site about esoteric programming languages) about one person who had written an interpreter in language L1 for a language L2, and someone else had written an interpreter in language L2 for L1. The comment was made that between the two of them, they now effectively had the first self-interpreter for one of those languages! (Update: 21 Dec. Found another copy of the discussion - actually about a Befunge to ETA compiler written in Ruby, and an ETA interpreter written in Befunge, but in this case my bad memory didn't do any real harm!)

Anyway, all this stuff kind of merged in my head not long ago, leading me to realise there is another interesting (but also probably totally useless!) thing we can say about self-interpreters that are composites of other interpreters.

Say we have a pair of complementary cross-language interpreters as described above. The pair can be combined to make a logical single unit in two different ways - effectively creating a self-interpreter for either of the two languages involved. If we also assume we can build a matrix for each separate cross-language interpreter, then we can also get square matrices AB and BA that must correctly describe the two "composite" self-interpreters. Then we can apply the result about eigenvalues of AB and BA being the same (except for additional zero eigenvalues if A and B are not square) and see that the eigenratios (dominant eigenvalue) for each such pair of composite self-interpreters must be the same, even though they are for different languages!

Finally, we can also apply the same idea to two different self-interpreters for the same language. The "product" of those two can be built in two different ways, and again we can see that the eigenratio of these composite self-interpreters must be identical.

This all seems "very cool" to me... although I better go and check it out with an example or two because so far this is all "just theory"!

Thank you Oleg and Dean for providing the seeds for an interesting thought!

The Mark II Brainf*ck Self-Interpreter

I obsess, therefore I am...

This will be short and sweet, because all I really want to do is to "release" my second version of a faster BF self-interpreter to the world at large.

The commented version of cgbfi2.b is probably easier to understand but here's a lean and mean version anyway!


>>>>+[->>>++>+>+++++++[<++++>>++<-]++>>+>+>+++++[>++>++
++++<<-]+>>>,<++[[>[->>]<[>>]<<-]<[<]<+>>[>]>[<+>-[[<+>
-]>]<[[[-]<]++<-[<+++++++++>[<->-]>>]>>]]<<]>[-]+<<[--[
[-]>>-<<<+>]>>[-<<<<[>+<-]+<<+[->[<+>>>>>>+<<<<<-]<[>+<
-]>>>>>>>+<[-[-[-[-[-[-[-[-[[-]>-<]>[-<<+++++++>>]<]]]>
[-]<]>[-<<+++>>]<]>[-<<+>>]<]>[-]<]>[-<<<<<<<+>>>>>>>]<
]>[-]<<<<<]>>>[<<+>>-]<+<[-[-[-[-[-[-[-[-[-[-[[-]>-<<<[
-]<<+>>]]]]>[-]<]>[-<<<[-]<<+++++++>>>]<]]]>[-]<]>[-<<<
[-]<<+++++++>>>]<]]>[-]<<<<<<[-]>>[-<<<[>+>>+<<<-]>[<+>
-]>>[-[-[[-]>>]>[<<[<+>>+<-]>[<+>-]+<<-[-[-[-[-[-[-[-[-
[-[-<+>]<+++++++++++++>>[-]>->-<<]]]>>[->>>>>]<<]>>[-<<
<++++++++++++>>[-]>>-]<<]>>[->>>>>]<<]>>[-<<<++++++++++
+>>[-]>>-]<<]>>[-<<<++++++++++>>[-]>>-]<<]]>>[->>>>>]<<
]<]]>[>>]<<<]>>+>>>]<<]>>[->+>]<<<]<<[<<]>>[[<+>>+<-]+<
-[-[-[-[-[-[-[-[-[-[-[-[-[-[-[->->>[>>]>>[>>]<[<-<<-<]<
[<<]<<[<<]<]>[->>[>>]>>[>>]<[>+>>+>]<[<<]<<[<<]]<]>[->>
[>>]>>[>>]<[-]<[<<]<<[<<]]<]>[->>[>>]>>[>>]<[<-<]<[<<]<
<[<<]]<]>[->>[>>]>>[>>]<[>+>]<[<<]<<[<<]]<]>[->>[>>]>>[
>>]<<-<<-<<[<<]<<[<<]]<]>[->>[>>]>>[>>]+>>+[<<]<<[<<]]<
]>[->>[>>]>>[>>]<++<[<<]<<[<<]]<]>[->>[>>]>>[>>]<+<[<<]
<<[<<]]<]>[->>[>>]>>[>>]<,<[<<]<<[<<]]<]>[->>[>>]>>[>>]
<-<[<<]<<[<<]]<]>[->>[>>]>>[>>]<.<[<<]<<[<<]]<]>[->>[>>
]>>[>>]<<-<<[<<]<<[<<]]<]>[->>[>>]>>[>>]+[<<]<<[<<]]<]>
[->>[>>]>>[>>]<[>+>>+<<<-]>[<+>-]>>[<<+>>[-]]+<<[>>-<<-
]>>[<<+>>>>+<<-]>>[<<+>>-]<<[>>+<<-]+>>[<<->>-]<<<<[-<<
[<<]<<[<<]<<<<<++>>]>>[-<<<<<<[<<]<<[<<]<]>]<]>[->>[>>]
>>[>>]<[>+>>+<<<-]>[[<+>-]>>[-]+<<]>>[<<+>>>>+<<-]>>[<<
+>>-]<<[>>+<<-]+>>[<<->>-]<<<<[-<<[<<]<<[<<]<<<<<+>>]>>
[-<<<<<<[<<]<<[<<]<]>]>[<+>-]<<<<<<[>>+<<-[->>->>+[>>>[
-<+>>+<]+<-[-[[-]>-<]>[-<<<+>>>]<]>[-<<<->>>]>[-<+>]<<<
<[>>+<<-]>>]<<<<<<]>>[-<<+[>>>[-<+>>+<]+<-[-[[-]>-<]>[-
<<<->>>]<]>[-<<<+>>>]>[-<+>]<<<<[<<+>>-]<<]]]>>>>>>>]


Isn't it lovely? :-)

In a nutshell, cgbfi2.b is a derivative of cgbfi1.b that replaces "++", ">>", "<<", "[-]", "[>]", "[<]", "[>>]" and "[<<]" sequences in its input program with some new single-value codes internally. This means those sequences are treated as single instructions in the main interpreter loop, thus requiring only one return trip between the emulated code and data in each case instead of (often) many more. In addition this strategy reduces the memory footprint of the emulated code, which also helps.

Of course, cgbfi2.b is a lot larger than cgbfi1.b (1758 instructions versus 950 once the comments are stripped out) but cgbfi2.b effectively sees itself as a 1178 instruction program after those common sequences described above are "compiled" to single instructions in memory. 1178 is still larger than 950 but the other gains made by the reduced trips to the data and back at runtime generally make this new version much faster. The other negative is that this new interpreter does more work when loading a program (it was hard work recognising those few short sequences!) so it can be slower than other interpreters when running some shorter/simpler programs. But for larger and more complex programs (including itself!) it is definitely much faster.
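
For anyone who'd rather not decode the BF above, here's the loading strategy sketched in Python - an illustration of the idea only, not a literal translation of cgbfi2.b:

SEQUENCES = ["[>>]", "[<<]", "[-]", "[>]", "[<]", "++", ">>", "<<"]

def compile_bf(src):
    # Fold the common sequences into single internal opcodes so that each
    # costs only one round trip between emulated code and data at runtime.
    ops, i = [], 0
    while i < len(src):
        for seq in SEQUENCES:  # multi-character sequences tried first
            if src.startswith(seq, i):
                ops.append(seq)  # one fused "virtual instruction"
                i += len(seq)
                break
        else:
            ops.append(src[i])  # ordinary single BF instruction
            i += 1
    return ops

print(compile_bf("++>>[-]<<[>>]"))  # ['++', '>>', '[-]', '<<', '[>>]']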

In terms of finding an eigenratio, it appears that this new interpreter might have an eigenratio somewhere in the "low thousands". This guesstimate is again only based on results from running a few very small programs on a three level deep stack of the interpreter. But if this is not too far from the real value then it seems a fast PC might manage to run a four level stack in a few days. I'll have to find a faster machine than mine to test this with!

However I did manage to run a 4 level deep stack with a slight "cheat" about 10 days ago when I was experimenting with different strategies. The "cheat" was to create a version of the interpreter that expects its input to already be converted to binary values 1-8 instead of the original "][><.-,+" BF instructions, and values 9-16 for the "virtual instructions", plus 0 for the end of input marker. This means the loader routine was extremely short and simple (something like ">>>>>>+[-<,[>>>+<]>]>+<<<<[<<]>>"), but the main interpreter loop was unchanged. In that experiment, the four level stack of the "binary BF self-interpreter" appeared to be converging towards an eigenratio of about 600. Once again, on a faster machine this would seem to indicate that a 5 level stack might be able to be run (with a very simple program "on top") in a few days or so.

More later hopefully, once I find a faster machine, or someone else produces a faster version! For now, I think I'm just about out of ideas, energy and enthusiasm in that department - not to mention skill. Just finding a way to do basically the same thing as cgbfi2.b in a more compact implementation should make it a lot faster.

Friday, December 08, 2006

Eliminate the infinite!

Well - maybe not eliminate it - but at least push it back indefinitely!

But before I get to that, there were at least two other brainf*ck (BF) self-interpreters in my collection that I overlooked in the initial testing of my own self-interpreter (as described in my previous post) - one by Dean Scarff (bfi446.b) and another by "Keymaker" (kbfi.b). However, I've since tested both of those and it seems that mine is still the "fastest". :-)

I also sent an email to all the authors of these various other BF self-interpreters telling them about my quest for a BF self-interpreter eigenratio, hoping that perhaps someone might be tempted to create an even faster version. Dean Scarff and I have exchanged a couple of emails since then and his comments have motivated me to try to collect/clarify some of my half-baked thoughts about aspects of "eigenratio theory", such as it is.

Meanwhile, I've also been trying to find faster ways to run BF programs, and experimenting with some other ideas for improving cgbfi1.b (my first BF self-interpreter) further. I ended up looking at bfc, a BF to C optimising compiler, and although I haven't really used it much, some quick tests seem to show it is the new leader in the "fastest way to run a given BF program" competition. However, Oleg Mazonka has also been working on making a faster interpreter and has posted some new code and results on his BF page, so perhaps bfc might also be overtaken later.

So, back to the "eliminating the infinite"...

I have had some doubts (on and off) over recent times about whether any particular BF self-interpreter would end up with an eigenratio or not. "Infinite memory" and a distance (from current position) related cost for accessing memory were probably the main reasons for my doubts. So now I want to try to outline my reasons for renewed confidence (long may it last!) that a BF self-interpreter will (like the OISC/subleq interpreter and um.um from the ICFP 2006 contest and those for my own toy languages) also eventually turn out to have an eigenratio. (And if I'm wrong, then this post will probably haunt me forever!)

In essence I believe we can find an eigenratio for a self-interpreter in a bounded memory version of a BF machine. Then we can expand that machine to arbitrarily larger sizes and the eigenratio will not change. Thus the eigenratio for the same self-interpreter running on the "infinite memory" (Turing complete) BF machine is also the same.

A more detailed version of my reasoning follows below. (This was significantly modified on the 10th, and again on the 11th of December, in a possibly futile attempt to make it more readable and understandable!)

1. A self-interpreter for a deterministic Finite State Machine (FSM) has an eigenratio (which can be calculated from a corresponding square matrix).

The meaning of a particular language can be described in terms of a state transition system where each state represents a particular snapshot of some 'machine'. (See: operational semantics.) If the underlying machine is 'finite' in all respects and deterministic, then we are dealing with a deterministic FSM, and if the language also only supports a finite number of "instructions" (for example, there are eight in BF, although handling some such as the input instruction would require some additional variations) then there must also be a finite set of transitions possible between those states.

A self-interpreter emulates this machine - to some degree at least. Ideally it would emulate the machine exactly with every state and transition in the "real" machine being faithfully mirrored in the emulated machine. However, on a finite machine the emulation must have a somewhat smaller set of states and transitions, because some resources on the underlying machine are being consumed by the interpreter in order to represent the program it is running and to take care of any other bookkeeping that it has to do.

The "execution" of a single "instruction" in the emulated machine causes a particular (emulated) transition to occur at that level. The self-interpreter handles this by executing some sequence of instructions causing a matching sequence of transitions on the underlying host machine. Let's also say that each transition has some fixed positive cost (execution time) associated with it, possibly a different value for different transitions.

From all this, we can (in a theoretical sense at least!) construct a matrix describing the "inner workings" of the self-interpreter (as described in earlier posts) with exactly one row for each transition in the underlying machine, and one column for every transition in the emulated machine. We must also make this into a square matrix by adding sufficient "dummy" zero-filled columns for the transitions that don't actually exist in the emulated machine even though they do exist in the underlying machine (and may be used by the self-interpreter in various ways). Finally, we can reorder the rows and columns so that those same zero-filled columns (and matching rows) appear at the bottom (for the rows) and at the right (for the columns) of the matrix. (This makes things easier to follow later.)

So now we have a large square matrix, with a matching row/column for every possible transition. The dominant eigenvalue of this matrix must also be the eigenratio for the self-interpreter, even if we cannot actually stack an "infinite" number of interpreters in this finite machine! By considering each possible transition as separate entity (instead of trying to group them into relatively few "variations") we are effectively using the "highest resolution" version of a matrix possible for that self-interpreter.

Note that because of the zero-filled columns on the right (corresponding to the transitions in the underlying machine that can't be emulated due to a lack of resources), those columns and the corresponding rows don't contribute to the value of the eigenratio. Therefore the smaller square matrix consisting of only the first N rows and columns, where N is the number of transitions in the emulated FSM, will also produce the correct eigenratio (dominant eigenvalue), even though that smaller matrix also can't be used to calculate the full number of transitions "executed" in the running of any particular program via the interpreter.

If we needed (in a theoretical sense only!) to use the "full" matrix (the one including the extra zero-filled columns) to do something like calculate the exact number of transitions "executed" when interpreting some other program, then the vectors of counts will also have to include a matching number of zero values corresponding to those "impossible" (in the emulated machine) transitions, but everything else works essentially as described in earlier posts.
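
Here's a quick numerical illustration of that last point (entirely made-up numbers): a matrix with the block structure described above - zero-filled columns on the right - has exactly the eigenvalues of its leading block, plus zeros.

import numpy as np

# N describes the properly emulated transitions; X sits in the extra rows
# (transitions the interpreter uses but cannot emulate). The right-hand
# columns are all zero, as in step 1.
N = np.array([[3.0, 1.0],
              [2.0, 2.0]])
X = np.array([[5.0, 7.0],
              [1.0, 4.0]])

full = np.block([[N, np.zeros((2, 2))],
                 [X, np.zeros((2, 2))]])

print(np.linalg.eigvals(N))     # 4 and 1 (in some order)
print(np.linalg.eigvals(full))  # the same 4 and 1, plus two zeros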

2. Now consider a reduced version of our BF machine, where instead of unbounded memory it has a finite number of data cells and a finite range of possible values in each cell, and also an upper bound on the length of the program that can be run, but is still large enough to run some particular self-interpreter. This version of a BF machine is therefore a FSM instead of being Turing complete, and (from 1) the self-interpreter has some fixed eigenratio and corresponding matrix in that machine.

In a finite version of the BF machine, a "state" represents one unique snapshot of the entire data array plus the position of the data pointer. For all of these states (except those representing situations where the data pointer is already at the rightmost data cell), there will be a valid transition for the '>' instruction, taking us to another state where the data pointer has moved to the "right" by one but the contents of memory haven't changed at all, and similarly with all other instructions that are valid at that point. There would need to be multiple transitions from each state for the input instruction, one for each possible value that could be input.

3. Now we show that we can expand the size of this smaller BF machine (by increasing the number of memory cells and/or increasing the range of values each cell can hold) without essentially changing the behaviour of the same self-interpreter when it is used in the expanded machine to run any program that was also able to run in the smaller machine from step 2.

As we add more memory cells (or increase the range of values each cell can hold) we are adding new states to our machine. Therefore there will also be new transitions (some from the "edges" of the state transition graph (for the smaller FSM) connecting to some new states, and others between various pairs of the newly added states). However, we can consider the states (and transitions between them) from the smaller machine to be unchanged as long as we extend the meaning of each state to include all of the newly added memory cells (or newly added memory bits in each data cell) being zero. (The language, BF, hasn't changed.) Thus the matrix for the self-interpreter in this larger machine can be constructed so that it consists of a copy of the matrix from the smaller machine that is then extended with new columns to the right and new rows at the bottom. Also, at least some of the dummy columns filled with zeroes in step 1 will now be for transitions that are emulated properly in the larger machine and so these will now generally have different values. But importantly, we also know that the elements in the new rows added at the bottom will be zero where they appear in the columns corresponding to the properly emulated transitions in the smaller machine. (The first N columns from that smaller machine matrix, using the same value for N as in part 1.)

Why? Remember that each column essentially describes how one transition in the emulated machine is mapped onto some combination of transitions in the underlying machine. This means the new values added to the bottom of any of those first N columns from the matrix for the smaller machine must correspond to transitions in the underlying machine that were not needed in the emulation of the transition corresponding to that column in the smaller machine, and so won't be needed in this larger machine either.

Now consider a program that could be run in the interpreter in the smaller BF machine. In other words it only used a subset of those N transitions. In this expanded machine, the vector representing how often each transition in the underlying machine is used during execution of the program is the same as it was in the smaller machine for those first N elements, then extended with zeroes for the additional transitions in the expanded machine that it does not use. The same thing applies to the vector representing the start-up section of the self-interpreter.

Thus when we apply the same matrix operations as described in earlier posts using the expanded matrix and vectors, we get essentially the same results (but also with additional zeroes padding things out in various vectors). Thus the total execution time for any such smaller program when run on top of a fixed size stack of our self-interpreter in the expanded machine must be the same as what it was in the smaller machine, and hence the eigenratio must also be the same, at least when we consider the set of smaller programs that could be run on the smaller BF machine.

Really, all of this is a long-winded way of showing that a small program that could run in the interpreter on the smaller machine will also run in the interpreter on the larger machine using exactly the same states and transitions as it did previously - which may seem rather obvious anyway!

4. But this also means the same self-interpreter must have the same eigenratio for any valid programs in this larger machine! For if it had a different eigenratio when running one of the larger programs (one that couldn't be run in the smaller machine but can be in the expanded machine) we would then be contradicting the result from step one that says a self-interpreter on some particular FSM has a fixed eigenratio for all programs that can be run in that machine.

So now we see that when a reduced BF machine (but still big enough to run a self-interpreter) is expanded to a larger "size", the same self-interpreter must still have the same eigenratio in the new machine, even though it could now also be running some programs that require a larger amount of memory and/or larger range of values in each data cell.

5. We can repeat this expansion of a smaller BF machine into a larger version, as often as we need and by as much as we need, without changing the eigenratio of any particular self-interpreter that could run on the smaller version. Hence we can make a BF machine arbitrarily "big", and from this conclude that a self-interpreter running on the Turing complete version (with an "infinite" number of data cells and/or "infinite" range of values in each cell) will exhibit the same eigenratio.

When we are looking at a Turing complete version of BF, there are countably infinite states (and transitions) to consider. But we can still (in a theoretical sense at least) construct a kind of matrix that corresponds exactly to a self-interpreter on that machine by allowing each column to extend down forever, and each row to extend to the right forever. Similarly the vector of costs is infinite, and so is the vector of counts corresponding to the execution of any particular program (one that also halts). In other words, exactly like the finite dimensional approach used previously but extended (to the "right" and "down") to infinity. Then using an extended notion of matrix multiplication we can use this infinite matrix and related vectors to calculate exactly the number of state transitions required to run any particular program through the self-interpreter. We'll be summing an infinite sequence of products but there are only a finite number of non-zero values in each case.

Finally (with my tongue firmly in my cheek), if anybody has any doubts about any of this, we will now also have enough memory to actually build an infinite stack of copies of our self-interpreter to test that the theoretical eigenratio (dominant eigenvalue of the corresponding matrix) exactly matches the observed one (time taken for N+1 stacked copies of the self-interpreter to run any other valid halting program, divided by the time taken for N stacked copies to run the same top level program)!

Time to get back to finding that eigenratio...