Quine Programming Language
A long time ago, I worked at a company that has since merged with another large company. Back in the day, my employers kept a technical reference library that included back issues of a journal, Software: Practice and Experience. A few of the earliest issues of that journal had a “Computer Recreations” column. Volume 2, pages 397-400, had a column on self-reproducing programs.
I chanced across that column when I scrounged around the technical library for some obscure structural analysis reference circa 1988. I had recently bought an AT&T 3B1 which ran System V R3 Unix. I paid extra for the software package that contained the fabulous new Korn Shell. What could be better and more fun than to write a self-reproducing Korn Shell script? Almost certainly nobody had ever done that! It took me an entire sweaty summer weekend to write that shell script.
I stumbled across my old 1988 shell script a few days ago.
st='echo st=$sq${st}$sq;echo dq=$sq${dq}$sq;echo sq=$dq${sq}$dq;echo $st'
dq='"'
sq="'"
echo st=$sq${st}$sq;echo dq=$sq${dq}$sq;echo sq=$dq${sq}$dq;echo $st
I also found myself thinking about Unix-shell-style variable interpolation in strings.
Shells treat single- and double-quoted string literals differently when doing interpolation,
escape characters can change semantics,
and there are two ways to denote a variable to substitute,
$varname and ${varname}.
I concluded that Unix-style shell scripting is scruffy but lovable,
and that systems like DEC’s DCL
or CDC’s NOS/VE
SCL command language, which strive for absolute consistency and uniformity,
are both not as charming and not as useful.
With variable interpolation on my mind,
seeing the self-replicating (change in terms!)
shell script made me think that it wouldn’t be too hard to implement a
very simple programming language that could execute the shell script.
I realized that I wouldn’t even have to implement all of Unix-style variable substitution.
My self-replicating script used double-quoted strings solely to get a string consisting
of one single-quote assigned to a variable.
My quine didn’t need ${name} notation at all:
it didn’t need to interpolate variables’ values in locations
where the variable name couldn’t otherwise be distinguished.
My quine only assigns string values to variables
and echoes output.
I ended up writing an interpreter for a language
that could assign strings to variables,
had an echo command,
and did variable interpolation at the right place.
I simplified the quine a bit, to remove the ${name} shell script idiom:
st='echo st=$sq$st$sq;echo dq=$sq$dq$sq;echo sq=$dq$sq$dq;echo $st'
dq='"'
sq="'"
echo st=$sq$st$sq;echo dq=$sq$dq$sq;echo sq=$dq$sq$dq;echo $st
Interpreter source code repository. My interpreter is written in the Go programming language.
The interpreted language has variables whose names follow the C identifier format
(letters, digits and underscore, leading letter),
strings, which undergo variable interpolation when unquoted
or double-quoted but not when single-quoted,
and an echo output command.
The interpreted language’s syntax is very shell-like:
no spaces can appear before or after the = in an assignment statement.
The interpreter reads its input from a file named on the command line,
rather than line-by-line (potentially interactively).
Interpreter design
I wrote the variable interpolation code first,
because that was what I was initially investigating.
I wrote a function that takes a dictionary of variable values,
keyed by variable name, and an original string that potentially contains
variable names.
Variable names are substrings of the original string,
occurring between $ and the next non-identifier character.
Variable value interpolation doesn’t respect single vs double quoted strings: the function calling the variable interpolation function has to decide whether string literals allow variable value interpolation or not.
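A minimal sketch of what such an interpolation function could look like. This is not the repository's code: the names interpolate, isIdentStart and isIdentChar are made up for illustration, and expanding an unset variable to the empty string is an assumption borrowed from shell behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// isIdentStart reports whether r can begin a variable name (a letter).
func isIdentStart(r rune) bool {
	return (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z')
}

// isIdentChar reports whether r can appear inside a variable name.
func isIdentChar(r rune) bool {
	return isIdentStart(r) || r == '_' || (r >= '0' && r <= '9')
}

// interpolate replaces every $name in s with vars[name].
// It makes a single pass: substituted values are never rescanned.
// Unset variables expand to "" (assumption, as in a shell).
func interpolate(s string, vars map[string]string) string {
	runes := []rune(s)
	var out strings.Builder
	for i := 0; i < len(runes); {
		// A '$' not followed by a letter is copied through unchanged.
		if runes[i] != '$' || i+1 >= len(runes) || !isIdentStart(runes[i+1]) {
			out.WriteRune(runes[i])
			i++
			continue
		}
		// The name runs from after '$' to the next non-identifier character.
		j := i + 1
		for j < len(runes) && isIdentChar(runes[j]) {
			j++
		}
		out.WriteString(vars[string(runes[i+1:j])])
		i = j
	}
	return out.String()
}

func main() {
	vars := map[string]string{"a": "a word ", "b": "or phrase"}
	fmt.Println(interpolate("$a$b", vars))
}
```

Note that the function knows nothing about quotes. Whether a given piece of text should be passed to it at all is left to the caller.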
After getting variable value interpolation working,
I realized that a two-stage lexing process had to take place.
First, input lines had to be divided into “commands” by ; (semicolon) characters,
if one or more appeared in a line,
or end-of-line markers (newlines).
Second, whether or not variable value interpolation into individual commands
should happen had to be decided.
Single quoted string literals do not get variable values interpolated,
while double-quoted or unquoted strings do.
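The first stage could be sketched like this. The function name splitCommands is hypothetical, and the rule that a ; or newline inside quotes does not end a command is an assumption, though the quine needs it, since its first line has semicolons inside a single-quoted string.

```go
package main

import "fmt"

// splitCommands divides src into commands at ';' and newline characters.
// Separators inside single or double quotes are kept as ordinary text.
// Empty commands (blank lines, doubled separators) are dropped.
func splitCommands(src []rune) []string {
	var cmds []string
	var cur []rune
	var quote rune // 0 outside quotes, otherwise the opening quote character

	flush := func() {
		if len(cur) > 0 {
			cmds = append(cmds, string(cur)) // string(cur) copies the runes
		}
		cur = cur[:0]
	}

	for _, r := range src {
		switch {
		case quote != 0:
			// Inside a quoted string: only the matching quote closes it.
			if r == quote {
				quote = 0
			}
			cur = append(cur, r)
		case r == '\'' || r == '"':
			quote = r
			cur = append(cur, r)
		case r == ';' || r == '\n':
			flush()
		default:
			cur = append(cur, r)
		}
	}
	flush() // the last command may not end with a separator
	return cmds
}

func main() {
	src := []rune("a='x;y'\necho $a;echo done\n")
	for _, c := range splitCommands(src) {
		fmt.Printf("%q\n", c)
	}
}
```

The quotes stay in the returned commands, so the second stage can still tell single-quoted text from everything else.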
I wrote a lexing construct that could determine individual commands in an input buffer, and return them one by one to a function that “executed” those commands, one command per invocation. Because the interpreted language has variable assignment statements, I passed in a Go map that the function adds to whenever it executes an assignment, and reads from in subsequent invocations whenever it finds a variable name to interpolate.
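Here is a rough sketch of an "execute one command" function that shares a map across invocations. It is a guess at the shape, not the repository's code: execute and expand are made-up names, and the standard library's os.Expand stands in for a hand-written interpolation function (it also accepts ${name} and a few special shell parameters, which this language doesn't need).

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// expand resolves quoting in text: single-quoted runs are copied literally,
// double-quoted and unquoted runs get variable values interpolated once.
func expand(text string, vars map[string]string) string {
	lookup := func(name string) string { return vars[name] }
	var out strings.Builder
	for len(text) > 0 {
		q := text[0]
		if q != '\'' && q != '"' {
			// Unquoted run: interpolate up to the next quote character.
			end := strings.IndexAny(text, `'"`)
			if end < 0 {
				end = len(text)
			}
			out.WriteString(os.Expand(text[:end], lookup))
			text = text[end:]
			continue
		}
		// Quoted run: an unterminated quote takes the rest (assumption).
		body, rest := text[1:], ""
		if end := strings.IndexByte(body, q); end >= 0 {
			body, rest = body[:end], body[end+1:]
		}
		if q == '"' {
			body = os.Expand(body, lookup)
		}
		out.WriteString(body)
		text = rest
	}
	return out.String()
}

// execute runs one command. Assignments update vars and return "";
// echo returns the line it would print.
func execute(cmd string, vars map[string]string) string {
	cmd = strings.TrimSpace(cmd)
	if cmd == "echo" || strings.HasPrefix(cmd, "echo ") {
		return expand(strings.TrimSpace(cmd[4:]), vars) + "\n"
	}
	if name, value, ok := strings.Cut(cmd, "="); ok {
		vars[name] = expand(value, vars) // interpolate before assigning
	}
	return ""
}

func main() {
	vars := map[string]string{}
	for _, cmd := range []string{"b=phrase", `a="a word or $b"`, "echo $a"} {
		fmt.Print(execute(cmd, vars))
	}
}
```

Returning the echo text instead of printing it directly is only a convenience for testing; writing straight to stdout would work just as well.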
Input is from a single file, which simplifies procuring input.
Input becomes a slice of Go rune values,
rather than having to read an input stream line-by-line or byte-by-byte.
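Reading a whole file into a rune slice takes only a few lines of Go. This is a sketch under assumptions: loadProgram and toRunes are hypothetical names, and the usage message is invented.

```go
package main

import (
	"fmt"
	"os"
)

// toRunes decodes UTF-8 bytes into a slice with one element per code point.
func toRunes(data []byte) []rune {
	return []rune(string(data))
}

// loadProgram reads the entire file at path into memory as runes.
func loadProgram(path string) ([]rune, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	return toRunes(data), nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: interp program-file")
		return
	}
	program, err := loadProgram(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("read %d runes\n", len(program))
}
```

With the whole program in memory, a lexer can index forward and backward freely instead of managing a buffered reader.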
Program language examples
This input:
a="a word or phrase"
echo $a
and this input:
a='a word '
b='or phrase'
echo $a$b
Both produce output “a word or phrase” on stdout.
So does this input:
b=phrase
a="a word or $b"
echo $a
Variable interpolation happens only once per command.
It happens before assignment, or before echo output.
b=phrase
a='a word or $b'
echo $a
The output of those 3 lines is “a word or $b”.
This demonstrates that the single-quoted string assigned to a
doesn’t get its variables interpolated again after the value of $a
is interpolated ahead of the echo output.
Results
My interpreter works as desired.
The quine replicates itself when interpreted by modern zsh, bash, dash,
ksh shells, and my interpreter.
All that’s necessary to write a shell script quine is assignment of string literals
to named variables,
interpolating values of variables into larger strings,
and a way to output those larger strings.
I did not need to have my interpreter distinguish between double- and single-quoted strings
in the context of variable interpolation.
I did need two different quote characters to allow interpolation of those quote characters
by variable value.
I learned that it’s difficult to write about an interpreter and the interpreted language without confusing the two dreadfully.
With slightly different interpreted language semantics,
it would be possible to get away with only one quote character.
Suppose that assignment of one quote character (name=" for example)
created a single-character string value for the variable,
and that single character is the quote character.
A quine in this slightly different interpreted language would look like this:
g='echo g=$q$g$q;echo q=$q;echo $g'
q='
echo g=$q$g$q;echo q=$q;echo $g
This is getting pretty far away from my original motivation of seeing what shell features a working quine actually requires. I think we, as a society, could use a little work on “philosophy of shells”.