countercomplex

The 16-byte frontier:

Extreme results from extremely small programs.

While mainstream software has been getting bigger and more bloated year after year, the algorithmic artists of the demoscene have been following the opposite route: building ever smaller programs to generate ever more impressive audiovisual show-offs.

The traditional competition categories for size-limited demos are 4K and 64K, limiting the size of the stand-alone executable to 4096 and 65536 bytes, respectively. However, as development techniques have gone forward, the 4K size class has adopted many features of the 64K class, or as someone summarized it a couple of years ago, "4K is the new 64K". There are development tools and frameworks specifically designed for 4K demos. Low-level byte-squeezing and specialized algorithmic beauty have given way to high-level frameworks and general-purpose routines. This has moved a lot of "sizecoding" activity into more extreme categories: 256B has become the new 4K. For a fine example of a modern 256-byter, see Puls by Rrrrola.

The next hexadecimal order of magnitude down from 256 bytes is 16 bytes. Yes, there are some 16-byte demos, but this size class has not yet established its status on the scene. At the time of writing this, the smallest size category in the pouet.net database is 32B. What's the deal? Is the 16-byte limit too tight for anything interesting? What prevents 16B from becoming the new 256B?

Perhaps the most important platform for "bytetros" is MS-DOS, using the no-nonsense .COM format that has no headers or mandatory initialization at all. Also, in .COM files we only need a couple of bytes to obtain access to most of the vital things such as the graphics framebuffer. At the 16-byte size class, however, these "couples of bytes" quickly fill up the available space, leaving very little room for the actual substance. For example, here's a disassembly of a "TV noise" effect (by myself) in fifteen bytes:

addr  bytes     asm
0100  B0 13     MOV AL,13H
0102  CD 10     INT 10H
0104  68 00 A0  PUSH A000H
0107  07        POP ES
0108  11 C7     ADC DI,AX
010A  14 63     ADC AL,63H
010C  AA        STOSB
010D  EB F9     JMP 0108H
The first four lines, summing up to a total of eight bytes, initialize the popular 13h graphics mode (320x200 pixels with 256 colors) and set the segment register ES to point in the beginning of this framebuffer. While these bytes would be marginal in a 256-byte demo, they eat up a half of the available space in the 16-byte size class. Assuming that the infinite loop (requiring a JMP) and the "putpixel" (STOSB) are also part of the framework, we are only left with five (5) bytes to play around with! It is possible to find some interesting results besides TV noise, but it doesn't require many hours from the coder to get the feeling that there's nothing more left to explore.

What about other platforms, then? Practically all modern mainstream platforms and a considerable portion of older ones are out of the question because of the need for long headers and startup stubs. Some platforms, however, are very suitable for the 16-byte size class and even have considerable advantages over MS-DOS. The hardware registers of the Commodore 64, for example, are more readily accessible and can be manipulated in quite unorthodox ways without risking compatibility. This spares a lot of precious bytes compared to MS-DOS and thus opens a much wider space of possibilities for the artist to explore.

So, what is there to be found in the 16-byte possibility space? Is it all about raster effects, simple per-pixel formulas and glitches? Inferior and uglier versions of the things that have already made in 32 or 64 bytes? Is it possible to make a "killer demo" in sixteen bytes? A recent 23-byte Commodore 64 demo, Wallflower by 4mat of Ate Bit, suggests that this might be possible:

The most groundbreaking aspect in this demo is that it is not just a simple effect but appears to have a structure reminiscent of bigger demos. It even has an end. The structure is both musical and visual. The visuals are quite glitchy, but the music has a noticeable rhythm and macrostructure. Technically, this has been achieved by using the two lowest-order bytes of the system timer to calculate values that indicate how to manipulate the sound and video chip registers. The code of the demo follows:

* = $7c
ora $a2
and #$3f
tay
sbc $a1
eor $a2
ora $a2
and #$7f
sta $d400,y
sta $cfd7,y
bvc $7c
When I looked into the code, I noticed that it is not very optimized. The line "eor $a2", for example, seems completely redundant. This inspired me to attempt a similar trick within the sixteen-byte limitation. I experimented with both C-64 and VIC-20, and here's something I came up with for the VIC-20:
* = $7c
lda $a1
eor $9004,x
ora $a2
ror
inx
sta $8ffe,x
bvc $7c
Sixteen bytes, including the two-byte PRG header. The visual side is not that interesting, but the musical output blew my mind when I first started the program in the emulator. Unfortunately, the demo doesn't work that well in real VIC-20s (due to an unemulated aspect of the I/O space). I used a real VIC-20 to come up with good-sounding alternatives, but this one is still the best I've been able to find. Here's an MP3 recording of the emulator output (with some equalization to silent out the the noisy low frequencies).

And no, I wasn't the only one who was inspired by Wallflower. Quite soon after it came out, some sceners came up with "ports" to ZX Spectrum (in 12 or 15 bytes + TAP header) and Atari XL (17 bytes of code + 6-byte header). However, I don't think they're as good in the esthetic sense as the original C-64 Wallflower.

So, how and why does it work? I haven't studied the ZX and XL versions, but here's what I've figured out of 4mat's original C-64 version and my VIC-20 experiment:

The layout of the zero page, which contains all kinds of system variables, is quite similar in VIC-20 and C-64. On both platforms, the byte at the address $A2 contains a counter that is incremented 60 times per second by the system timer interrupt. When this byte wraps over (every 256 steps), the byte at the address $A1 is incremented. This happens every 256/60 = 4.27 seconds, which is also the length of the basic macrostructural unit in both demos.

In music, especially in the rhythms and timings of Western pop music, binary structures are quite prominent. Oldschool homecomputer music takes advantage of this in order to maximize simplicity and efficiency: in a typical tracker song, for example, four rows comprise a beat, four beats (16 rows) comprise a bar, and four bars (64 rows) comprise a pattern, which is the basic building block for the high-level song structure. The macro-units in our demos correspond quite well to tracker patterns in terms of duration and number of beats.

The contents of the patterns, in both demos, are calculated using a formula that can be split into two parts: a "chaotic" part (which contains additions, XORs, feedbacks and bit rotations), and an "orderly" part (which, in both demos, contains an OR operation). The OR operation produces most of the basic rhythm, timbres and rising melody-like elements by forcing certain bits to 1 at the ends of patterns and smaller subunits. The chaotic part, on the other hand, introduces an unpredictable element that makes the output interesting.

It is almost a given that the outcomes of this approach are esthetically closer to glitch art than to the traditional "smooth" demoscene esthetics. Like in glitching and circuit-bending, hardware details have a very prominent effect in "Wallflower variants": a small change in register layout can cause a considerable difference in what the output of a given algorithm looks and sounds like. Demoscene esthetics is far from completely absent in "Wallflower variants", however. When the artist chooses the best candidate among countless of experiments, the judgement process strongly favors those programs that resemble actual demos and appear to squeeze a ridiculous amount of content in a low number of bytes.

When dealing with very short programs that escape straightforward rational understanding by appearing to outgrow their length, we are dealing with chaotic systems. Programs like this aren't anything new. The HAKMEM repository from the seventies provides several examples of short audiovisual hacks for the PDP-10 mainframe, and many of these are adaptations of earlier PDP-1 hacks, such as Munching Squares, dating back to the early sixties. Fractals, likewise producing a lot of detail from simple formulas, also fall under the label of chaotic systems.

When churning art out of mathematical chaos, be that fractal formulas or short machine-code programs, it is often easiest for the artist to just randomly try out all kinds of alternatives without attempting to understand the underlying logic. However, this easiness does not mean that there is no room for talent, technical progress or rational approach in the 16-byte size class. Random toying is just a characteristic of the first stages of discovery, and once a substantial set of easily discoverable programs have been found, I'm sure that it will become much more difficult to find new and groundbreaking ones.

Some years ago, I made a preliminary design for a virtual machine called "Extreme-Density Art Machine" (or EDAM for short). The primary purpose of this new platform was to facilitate the creation of extremely small demoscene productions by removing all the related problems and obstacles present in real-world platforms. There is no code/format overhead; even an empty file is a valid EDAM program that produces a visual result. There will be no ambiguities in the platform definition, no aspects of program execution that depend on the physical platform. The instruction lengths will be optimized specifically for visual effects and sound synthesis. I have been seriously thinking about reviving this project, especially now that there have been interesting excursions to the 16-byte possibility space. But I'll tell you more once I have something substantial to show.


Comment this post on Blogspot.