Risk Limiting Audits
In 2020, when one particular political faction created a controversy by claiming that national elections might be rigged, I decided to check into how my state, Colorado, audited the results of elections.
The Colorado Secretary of State (Jena Griswold) did an excellent job getting ahead of rubbish conspiracy theories about voting and election denial in 2020.
Part of the success of vote-by-mail in Colorado is that Colorado does “Risk-limiting Audits” of ballots. The State of Colorado mails every eligible voter a paper ballot to mark by hand. The ballot is both human and machine readable, a “voter verified paper record” (VVPR). After each election, some ballots are chosen randomly and tallied to verify (within defined limits) that the correct candidate was elected.
The State of Oregon has similar procedures, as does the State of Washington. Election denial and efforts to require in-person, same day voting are motivated by something other than desire for secure, reliable voting.
These states base their procedures on a solid mathematical and statistical basis, and on the principle of allowing anyone to check the results.
A Gentle Introduction to Risk-Limiting Audits
The Colorado Secretary of State references a paper, A Gentle Introduction to Risk-Limiting Audits, by Mark Lindeman and Philip B. Stark, from IEEE Security & Privacy, Special Issue on Electronic Voting, 2012.
The Colorado web page links to a preprint of the paper, but as sometimes happens, the preprint differs significantly from the official, published version. The preprint, in the ballot auditing section, has some inconsistency about using percentages or proportions (i.e. 47 vs 0.47) in calculations.
Get the official version if you can, matey!
Simulation
I chose to understand A Gentle Introduction by writing a program to simulate an election, then audit the results. The idea is to use a pseudo-random number generator to create a set of data structures, each representing a physical ballot, in percentages chosen by the human user. Count the ballots, determine a winner, and that winner’s percentage of votes.
Using the ballots and the winner’s percentage of votes received, run the ballot auditing process from A Gentle Introduction. Optionally choose the “wrong” candidate as the winner to see if ballot auditing can detect large problems.
Build and Run the Simulation
I wrote the program in Go. It does not use any non-standard packages. It should be portable, but I developed and ran it under Linux.
$ go build rla.go
That should leave you with an executable named rla if you’re running a sane operating system.
Options
- -b int
- count of ballots (default 1000)
- -c string
- Something like: A:51,B:49 or Smith:37,Wesson:20,Glock:43. There’s no preset limit on the number of candidates or their names. The percentages of votes they got (37%, 20%, 43% in the second example) do have to sum to 100.0
- -f int
- Voting fineness (default 1000). How many buckets to break up the probability distribution.
- -t float
- tolerance for ballot audit, 0.0 chooses maximum tolerance
- The winning proportion of votes minus tolerance has to be greater than 0.50
- -z
- Lie about who won. Instead of the winning candidate, returns the runner up, with the winning candidate’s vote count
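To make the -c format concrete, here is a rough sketch of how such a string could be parsed and checked. The candidate type, the parseCandidates function and the error messages are all hypothetical, invented for this example, and rla may well do it differently. It uses strings.Cut, so it assumes Go 1.18 or later.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// candidate pairs a name with the percentage of votes requested for it.
type candidate struct {
	name string
	pct  float64
}

// parseCandidates turns "A:51,B:49" into a slice of candidates and checks
// that the percentages add up to 100. (Hypothetical helper, not from rla.)
func parseCandidates(spec string) ([]candidate, error) {
	var out []candidate
	sum := 0.0
	for _, field := range strings.Split(spec, ",") {
		name, pctText, ok := strings.Cut(field, ":")
		if !ok || name == "" {
			return nil, fmt.Errorf("bad candidate %q, want name:percent", field)
		}
		pct, err := strconv.ParseFloat(pctText, 64)
		if err != nil || math.IsNaN(pct) || pct < 0 {
			return nil, fmt.Errorf("bad percentage in %q", field)
		}
		out = append(out, candidate{name, pct})
		sum += pct
	}
	// Allow a little floating point slack when comparing the sum to 100.
	if math.Abs(sum-100) > 1e-9 {
		return nil, fmt.Errorf("percentages sum to %g, want 100", sum)
	}
	return out, nil
}

func main() {
	specs := []string{"A:51,B:49", "Smith:37,Wesson:20,Glock:43", "A:51,B:48"}
	for _, spec := range specs {
		cands, err := parseCandidates(spec)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(cands)
	}
}
```

The last spec in main sums to 99, so it takes the error path instead of printing candidates.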
Running it
$ ./rla -c A:51.0,B:49.0 -t .10 -b 100000
Desired results for 100000 ballots:
Candidate A: 51.00%
Candidate B: 49.00%
Generated results:
Candidate Count Desired Generated
A 50830 51.00 50.83
B 49170 49.00 49.17
Vote counting results:
A declared winner, with 50.8300%
Audit results:
17054 ballots of 100000 examined
Audit confirms winner
This simulates an election with two candidates, imaginatively named “A” and “B”. I wanted “A” to get about 51% of the votes, and “B” to get 49%. “A” got a simulated 50.83%, and “B” got 49.17%. The simulated results are generated via a pseudo-random number generator, so they’re not identical from run to run, even if all the command line flags are identical.
I told the program to use a tolerance t of 0.10%.
Because it was a fairly tight contest,
the ballot audit had to examine 17,054 ballots, or 17.054% of the total,
to be 90% sure that “A” really did win.
The same election with the -z flag, after which the audit should tell you to do a hand recount:
$ ./rla -c A:51.0,B:49.0 -t .10 -b 100000 -z
Desired results for 100000 ballots:
Candidate A: 51.00%
Candidate B: 49.00%
Generated results:
Candidate Count Desired Generated
A 51019 51.00 51.02
B 48981 49.00 48.98
Vote counting results:
B declared winner, with 51.0190%
Audit results:
9140 ballots of 100000 examined
Hand recount to confirm
The audit does detect that “B” did not actually receive 51% of the votes, and indicates a recount should occur. The program only had to examine 9140 ballots, about 9.1% of the total, to determine that a recount should occur.
Experience
Futzing around with the rla program does show that the number of ballots
to check in an audit goes up when the winner gets closer to 50% of the votes.
The audit does reliably detect when the winner has more than 50% of the vote,
or when the runner up is claimed as the winner.
Counter-intuitively, but as the paper says,
choosing to use the maximum tolerance causes the program to examine many more
ballots during the audit.
Simulating extremely close elections (-c A:50.1,B:49.9) can cause the algorithm
to examine many times the total number of ballots.
I think this is acceptable,
because such a small margin of victory typically triggers hand recounts.
Technically this works because ballot selection for audit is done “with replacement”.
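A back-of-the-envelope estimate shows why the counts blow up. The sketch below is not part of rla. It assumes the audit multiplies a running statistic by 2s for a winner ballot and by 2(1-s) otherwise, stopping above 9.9, and it divides the log of that threshold by the average growth of the log per draw. That ignores overshoot and the small chance of ending in a recount, so read the output as rough. The name expectedDraws is made up.

```go
package main

import (
	"fmt"
	"math"
)

// expectedDraws roughly estimates how many ballots a correct outcome needs
// before the audit statistic reaches confirmAt. p is the winner's true share
// of the vote; s is the share used in the multipliers (reported share minus
// tolerance). Both are proportions, with s > 0.5.
func expectedDraws(p, s, confirmAt float64) float64 {
	// Average growth of log(T) per ballot drawn.
	drift := p*math.Log(2*s) + (1-p)*math.Log(2*(1-s))
	return math.Log(confirmAt) / drift
}

func main() {
	// Closer contests need more draws.
	for _, share := range []float64{0.60, 0.51, 0.501} {
		fmt.Printf("winner share %.3f: about %.0f draws\n",
			share, expectedDraws(share, share, 9.9))
	}
	// A large tolerance pushes s toward 0.5, which also needs more draws.
	fmt.Printf("winner share 0.510, s=0.5001: about %.0f draws\n",
		expectedDraws(0.51, 0.5001, 9.9))
}
```

Under these assumptions a 50.1% winner needs on the order of a million draws, more than ten times a 100000-ballot election, which is only possible because a ballot can be drawn more than once.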
Understanding
I don’t understand the ballot auditing algorithm. It’s based on Sequential Tests of Statistical Hypotheses, Abraham Wald, The Annals of Mathematical Statistics, Vol. 16, No. 2 (Jun., 1945), pp. 117-186. That’s 70 pages of material in 1940s technical prose.
Wald is the famed “airplane bullet holes” analyst. He was part of the WW2 Statistical Research Group, another of the wartime tendrils that seem to permeate the basis of modern society. One interesting fact from the first few pages of Sequential Tests: The work was initially classified. There’s no hint about the WW2 motivation for it.