This is my (Viznut/PWP) entry to the Binary Golf Grand Prix challenge in
July 2020.

The task was to write an executable that is an ambigram, i.e. a palindrome:
a file that is identical when its byte order is reversed. It also had to be
as small as possible and execute as many of its bytes as possible. Any
executable file format on any OS and architecture was allowed.

Entry name: VIZPALAPZIV
Executable format: Commodore 8-bit PRG format on Commodore VIC-20
Number of bytes: 20 (including the 2-byte start address)
Executed bytes: 18 (every byte except the start address, i.e. 90%)

Bytes: 7c 00 8f 0f 90 25 48 48 73 a9 a9 73 48 48 25 90 0f 8f 00 7c

To generate the binary in a unix-like shell:
echo fACPD5AlSEhzqalzSEglkA+PAHw= | base64 -d > vizpalapziv.prg
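
Where no base64 tool is at hand, the same step can be sketched in Python.
This is only an illustration: the function names build_prg and write_prg
are made up here, and the palindrome check is an extra sanity test rather
than part of the original command.

```python
import base64

# Base64 text copied from the shell command above.
ENCODED = "fACPD5AlSEhzqalzSEglkA+PAHw="


def build_prg(encoded: str = ENCODED) -> bytes:
    """Decode the entry and check that it really is a palindrome."""
    data = base64.b64decode(encoded)
    if data != data[::-1]:
        raise ValueError("file is not a byte-wise palindrome")
    return data


def write_prg(path: str = "vizpalapziv.prg") -> int:
    """Write the decoded bytes to disk and return the file size."""
    with open(path, "wb") as handle:
        return handle.write(build_prg())
```

Calling write_prg() should leave the same 20-byte file on disk as the shell
command does.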

You can also download it.

HOW TO RUN:

Manually:
  Put the program on a disk (or mount the directory in an emulator)
  LOAD"VIZPALAPZIV",8,1
  After it is loaded in memory, just press RETURN another time.

Automatically with the VICE emulator:
  Pass it as a command-line argument: xvic vizpalapziv.prg
  or:
  start the emulator and choose "autostart image" from the menu.

The memory configuration does not matter. It even runs on the C-64 (although
it obviously does not do anything visible because the video chip is mapped
elsewhere).

WHAT IT DOES:

The program effectively scans through the address space and flashes the
screen colors according to each byte it reads. There is no exit condition,
so the scan wraps to $0000 after $FFFF. The different areas of the memory
space produce different kinds of flash patterns.

The executable obtains much of its functionality from the RAM environment it
injects itself into - that is, a RAM-resident subroutine called CHRGET.

CODE:

Bytes: 7c 00 8f 0f 90 25 48 48 73 a9 a9 73 48 48 25 90 0f 8f 00 7c

The first two bytes specify the start address ($007c, stored low byte
first). The rest of the bytes are loaded into memory starting from this
address.
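
As a rough Python sketch of how such a file is split (split_prg is a
made-up helper name, not part of any tool mentioned here): the start
address is read low byte first, and everything after it is the payload.

```python
PRG = bytes.fromhex(
    "7c 00 8f 0f 90 25 48 48 73 a9 a9 73 48 48 25 90 0f 8f 00 7c"
)


def split_prg(prg: bytes):
    """Return (load_address, payload) for a PRG image."""
    # The load address is stored low byte first.
    load_address = prg[0] | (prg[1] << 8)
    return load_address, prg[2:]


load_address, payload = split_prg(PRG)
last_address = load_address + len(payload) - 1
print(f"${load_address:04x}-${last_address:04x}, {len(payload)} bytes")
```

The 18 payload bytes therefore occupy $007c-$008d.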

Assembly source (compiles with the ACME Cross-Assembler):

!cpu 6510 ; there's a misconception that the undoc ops are 6510-specific
!to "vizpalapziv.prg",cbm
*=$7c
sax $900f
and $48
pha
rra ($a9),y
lda #$73
pha
pha
and $90
slo $008f
!byte $7c

MEMORY BEHAVIOR:

In the VIC-20 BASIC environment, there is a RAM-resident subroutine called
CHRGET starting from the address $0073. This is a very commonly called
routine, and it is placed in RAM so that it can modify itself (the operand
of the LDA instruction doubles as the text pointer).
Normally, it looks like this:

0073 INC 7A     e6 7a 
0075 BNE 0079   d0 02 
0077 INC 7B     e6 7b 
0079 LDA xxxx   ad xx xx (this operand is subject to modification)
007c CMP #3A    c9 3a 
007e BCS 008A   b0 0a 
0080 CMP #20    c9 20 
0082 BEQ 0073   f0 ef 
0084 SEC        38 
0085 SBC #30    e9 30 
0087 SEC        38 
0088 SBC #D0    e9 d0 
008a RTS        60

The next few bytes are not supposed to be code, but I'll disassemble them
anyway:

008b NOP #4F    80 4f 
008d DCP 52     c7 52 
008f CLI        58 
0090 RTI        40

There are some tiny C-64 programs (including demoscene prods) that overwrite
CHRGET, so this trick isn't mine. The benefit of this technique is that it
saves a couple of bytes compared to the standard executable format (no need
for a BASIC stub that starts the machine-language portion). On the downside,
the maximum size of the executable is very limited.

So, once VIZPALAPZIV is loaded in the memory, the CHRGET routine will look
like this:

0073 INC 7A     e6 7a      increment low byte of source address
0075 BNE 0079   d0 02      did it wrap to zero?
0077 INC 7B     e6 7b      if yes, increment the high byte.
0079 LDA xxxx   ad xx xx   load byte from the source address.
007c SAX 900F   8f 0f 90   write it (anded with X) to screen color register.
007f AND 48     25 48      clear the accumulator ($0048 contains zero)
0081 PHA        48         push it on the stack
0082 RRA (A9),Y 73 a9      (mess the accumulator)
0084 LDA #73    a9 73      load immediate value #73 to accumulator
0086 PHA        48         push it on the stack
0087 PHA        48         ... twice
0088 AND 90     25 90      (mess the accumulator)
008a SLO 008F   0f 8f 00   (mess the operand of the next instruction)
008d NOP 5852,X 7c 52 58   (do nothing)
0090 RTI        40         "return from interrupt", i.e. jump to $0073
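
The layout above can be reproduced by overlaying the payload on the default
bytes. The following Python sketch does that; the names are illustrative,
and the two LDA operand bytes are filled with 00 00 purely as a placeholder
because their real values vary at run time.

```python
BASE = 0x0073

# Default bytes at $0073-$0090, taken from the two listings above.
# The LDA operand (xx xx) is a placeholder here.
DEFAULT = bytes.fromhex(
    "e6 7a d0 02 e6 7b ad 00 00 c9 3a b0 0a c9 20 f0 ef"
    " 38 e9 30 38 e9 d0 60 80 4f c7 52 58 40"
)
PAYLOAD = bytes.fromhex(
    "8f 0f 90 25 48 48 73 a9 a9 73 48 48 25 90 0f 8f 00 7c"
)


def overlay(memory: bytes, base: int, load_address: int, payload: bytes) -> bytes:
    """Copy payload into memory at load_address; memory starts at base."""
    image = bytearray(memory)
    start = load_address - base
    image[start:start + len(payload)] = payload
    return bytes(image)


LOADED = overlay(DEFAULT, BASE, 0x007C, PAYLOAD)
# Bytes from $008d onward: the file's last byte plus three untouched ones.
print(LOADED[0x008D - BASE:].hex(" "))
```

The file's final byte 7c lands at $008d, so the untouched bytes 52 58 become
its operand and the 40 at $0090 is left in place as the RTI.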

THE UNCONDITIONAL JUMP:

Since CHRGET injections are not "actual" executables, it is very easy to
create pieces of code that just break the routine without preventing it from
returning. A short palindromic example would be 7d 00 7d (that changes the
CMP instruction at $007c into CMP #$7D). I wanted the program to do
something more substantial (as if it were an actual program instead of a
mere bunch of injected opcodes), so I decided to make it run in a loop.

Since I wanted to maximize the percentage of executed bytes, I couldn't
write my own branch instruction at the end of the code: the palindrome
forces the file to end with the bytes 00 7c, and since I couldn't find any
use for them after a branch, I would've had to leave them unexecuted.

I could have used the branch instruction at $0082 (BEQ 0073) if I had been
able to find a way to make sure that Z=1 when it is executed. I couldn't
find a way to do this in the limited space I had.

I was able to reappropriate the RTS instruction at $008a while retaining the
palindrome constraint (bytes: 7c 00 a5 98 48 73 a9 a9 73 48 98 a5 00 7c).
However, I still couldn't fit in anything graphically interesting, so I
abandoned this solution as well.

Eventually, I decided to use the byte $40 at $0090 as an RTI instruction.
RTI pops three bytes from the stack (processor status byte and the 16-bit
return address). My program pushes the bytes 00 73 73 in order to make it
jump to $0073.
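
The stack trick can be modelled in a few lines of Python. This is a
simplified sketch, not an emulator: the stack is a plain list and rti is a
made-up helper that only mirrors the pop order (status first, then the low
and high bytes of the return address).

```python
def rti(stack):
    """Pop status, then return address low and high byte, as RTI does."""
    status = stack.pop()
    low = stack.pop()
    high = stack.pop()
    return status, (high << 8) | low


stack = []
for value in (0x00, 0x73, 0x73):  # push order used by the program
    stack.append(value)

status, target = rti(stack)
print(f"status=${status:02x} target=${target:04x}")
```

The last byte pushed ends up in the status register, the middle one becomes
the low byte of the address, and the first one (zero) the high byte.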

PROCESS:

To make experimentation easier, I wrote a Ruby program that prompts for a
hexadecimal sequence, appends the reversed half and disassembles the result
as it would appear on the standard VIC-20 zero page. This was a quick job of
about 30 minutes (including making the disassembler, thanks to the 6502
opcode matrix at http://oxyron.de/html/opcodes02.html ). Since I wanted to
be constantly aware of all the bytes I used, I always wrote them in hex.
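
The mirroring step of such a tool is tiny. The sketch below is in Python
rather than Ruby and is not the original program; mirror is a hypothetical
name, and the disassembler part is left out entirely.

```python
def mirror(hex_half: str) -> bytes:
    """Turn the first half (as hex text) into a full palindrome."""
    half = bytes.fromhex(hex_half)
    return half + half[::-1]


full = mirror("7c 00 8f 0f 90 25 48 48 73 a9")
print(full.hex(" "))
```

Feeding it the first ten bytes of the entry gives back the complete 20-byte
file.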

I was able to reduce the number of junk instructions in a few places, e.g.
by using the opcode of PHA ($48) as a memory operand in the AND instruction
that clears the accumulator.

OUTSIDE THE CONTEST RULES:

Without the palindrome constraint, 4+2 bytes are enough for the same effect:

*=$7e
sta $900f
txa

When the user presses RETURN after the code is loaded in, CHRGET will be run
a couple of times with different values of X, including 0. The actual
busyloop will start when X=0.

I've also released a 5+2-byte VIC-20 demo called Five Bytes that is somewhat
similar but does more interesting stuff with the VIC chip registers.
However, since I couldn't assert Z=1, I had to spend another byte on
turning the BEQ into a BVC:

*=$7e
rol $8f10,x ; rolls in the C resulting from the preceding CMP instruction
inx         ; increments X so that all VIC registers get covered
!byte $50   ; BVC to ensure busyloop
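
Both short variants reuse the branch operand $EF that stays at $0083. A
6502 branch offset is a signed byte counted from the address following the
two-byte instruction, which this small Python sketch (branch_target is a
made-up helper name) can be used to check:

```python
def branch_target(opcode_address: int, operand: int) -> int:
    """Destination of a 6502 relative branch located at opcode_address."""
    # Operands $80-$ff are negative offsets (two's complement).
    offset = operand - 0x100 if operand & 0x80 else operand
    return (opcode_address + 2 + offset) & 0xFFFF


print(f"${branch_target(0x0082, 0xEF):04x}")
```

With the opcode at $0082 and the operand $EF, the destination is $0073
whether the opcode byte is BEQ ($f0) or BVC ($50).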