The first part of the contest involved writing an emulator for a virtual machine (dubbed the "Universal Machine", or UM for short) according to the provided specifications. You used your emulator to run a UM binary provided by the organisers, and once you got that running correctly (and fast enough!) it generated output that was used to create yet another UM binary. And that one was a fun-filled simulation of a UNIX-style operating system called UMIX! The remainder of the contest involved hacking into various accounts on UMIX, then finding and solving various problems in order to gain additional points. At least I think it did... I say that because our team has yet to even find some of those problems! We spent far too much time trying to get a Python version (and then a Python/Pyrex version) running correctly and fast enough. In the end we reverted to C. But that's another story...
After the contest the organisers released what they called a "reference implementation" of an interpreter for the UM, called - wait for it - "um.um"! This was written in the UM machine language directly and can run any other UM binary, including itself. Thus it is a "self-interpreter" (or alternatively, a "meta-circular interpreter").
Of course, being written in the UM language itself, you still need another working interpreter (such as your own) to run it. (The UM doesn't yet exist as hardware!) But once you have that, it is quite easy to do something like have your own interpreter run a copy of um.um, which runs another copy of itself, which runs another copy (insert as many levels as you feel like!), and then finally runs some other UM program. Running that final ("top-level") program directly in your own interpreter is easier of course (and faster), but perhaps not quite as mind-bending!
I did a few experiments with um.um after the contest ended. In particular, I was interested in how much slower a stack of multiple copies of um.um ran some program compared with a stack containing one fewer copy. What I found was that once you had added "enough" copies, each additional copy seemed to make the whole thing about 24.5 times slower. This ratio didn't seem to change with different programs at the very top of the stack (although it might take more levels of um.um before "converging"), and it also didn't seem to matter which (other language) interpreter I used at the very bottom level. I had several to choose from for that bottom level: three based on the implementations I worked on during the contest (and tidied up later), plus some other Python variants created after the contest in an attempt to find a faster way to do it.
It seemed that the um.um interpreter (once you had enough instances in the "stack") always made things run about 24.5 times slower for each additional instance. I wondered how small that ratio could be made, and also what the lower bound for such a ratio could be for any self-interpreter, in any language, and on any possible computer architecture!
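To get a feel for how quickly this compounds, here is a minimal sketch (in Python, since that's what we started with) of the multiplicative model the measurements suggest: each extra um.um level multiplies the run time by the observed ratio of about 24.5. The base run time used here is a made-up placeholder, not a measured number.

```python
# Rough model of the observed slowdown: each additional um.um
# instance in the stack multiplies run time by about 24.5.
# RATIO comes from the measurements described above; base_seconds
# is a purely hypothetical baseline for illustration.

RATIO = 24.5          # observed per-level slowdown
base_seconds = 1.0    # hypothetical time with zero um.um levels


def stacked_time(levels, base=base_seconds, ratio=RATIO):
    """Estimated run time with `levels` copies of um.um in the stack."""
    return base * ratio ** levels


if __name__ == "__main__":
    for n in range(4):
        print(f"{n} level(s) of um.um: ~{stacked_time(n):.2f} s")
```

So a mere three levels of um.um already costs a factor of roughly 14,700 over running the program directly, which explains why "enough" levels is not very many before the stack becomes unusably slow.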
Next post... what I discovered when I dug a little deeper.