Ballot Encoding
Introduction
When encrypting ballots using ElGamal over multiplicative groups, ballot information must converted to an integer. This integer m
, needs to fall
in the range 1 <= m <= q
where q
is the order of the subgroup used in the election parameters. Otherwise the ballot would require
multiple ciphertexts and that would make things more complex and slow.
Thus we have devised a specific procedure that is reasonably efficient to encode the ballots. It is a two steps process:
- First a json ballot needs to be encoded into a mixed-radix representation, which is equivalent to a list of bases and an equally sized list of digits.
- Then we obtain the integer corresponding to the mixed-radix representation, following a generalization of horner's method as applied to positional number systems.
So the first step we apply is to encode the ballot as a list of bases and a list of digits, which consitutute the mixed-radix representation of the ballot. Then we convert this mixed-radix representation to its equivalent integer, to be encrypted with ElGamal.
Note that the required bases present in a mixed-radix representation are determined on a per-contest basis, according to the range of ballot values that contest needs to encode. These bases are therefore common to all ballots for the contest. Conversely, the list of values that a particular ballot encodes is naturally specific to that ballot.
Basics
If in a plurality-at-large
there are three candidates A
, B
,
and C
with answer ids 0
, 1
and 2
, and the voter wants to
vote to candidates A
and C
, then his ballot choices (obtained
using encodeRawBallot) will be choices = [1, 0, 1]
and the encoded
choices will be encoded this way:
/-------- candidate A
| /----- candidate B
| | /-- candidate C
| | |
bases = [2, 2, 2]
choices = [1, 0, 1]
encodedChoices = choices[0] + choices[1]*bases[0] + choices[2]*bases[0]*bases[1]
encodedChoices = choices[0] + bases[0]*(choices[1] + bases[1]*choices[2])
encodedChoices = 1 + 2*(0 + 2*1) = 1 + 4*1 = 5
And the bases are bases = [2, 2, 2]
. The reason the bases are 2 here
is because plurality-at-large is a non-preferential voting system
and each base is representing if the voter chose (then we use
choices[x] = 1
) or not (then we use choices[x] = 0
), and the base is in
this case max(v[x])+1`.
Preferential systems
In a preferential system, the voter can choose a specific ordering.
If we reuse the previous example, the voter might have chosen for
the first choice in his ballot candidate A
, and for his second
choice candidate B
. Not choosing a candidate would be encoded as
value 0
, so choosing it as first position would be value 1
and
so on. If the voter can choose up to 3 candidates, then the base
would be maxChoices+1 = 3+1 = 4
, and thus bases will be
bases = [4, 4, 4]
and choices would be choices = [1, 0, 2]
and the
encoded choices would be calculated as:
/-------- candidate A
| /----- candidate B
| | /-- candidate C
| | |
bases = [4, 4, 4]
choices = [1, 0, 2]
encodedChoices = choices[0] + choices[1]*bases[1] + choices[2]*bases[1]*bases[2]
encodedChoices = choices[0] + bases[0]*(choices[1] + bases[1]*choices[2])
encodedChoices = 1 + 4*(0 + 4*2) = 1 + 16*2 = 33
Invalid Ballot Flag
What was outlined before is the basics, but actually it does not
work exactly like that. The first value (choices[0]
) in the raw
ballot does not really represent the vote for the first candidate
answer, but it's always a flag saying if the ballot was marked as
invalid or not by the voter. Note that this is not the only way to
create an invalid ballot. For example the voter could vote to more
options than allowed, and that would also be an invalid ballot.
We asume the invalid ballot flag is represented in the question
as a answer inside question.answers
and it is flagged by having
an element in answer.urls
as
{title: 'invalidVoteFlag', url: 'true'}
. If this invalid vote
flag candidate is not present, the voter interface will simply not
allow to cast an invalid vote.
Using the last example of a preferential vote, the bases would not
be bases = [4, 4, 4]
but bases = [2, 4, 4, 4]
. The first base
encodes always the invalid flag, whose max value is 1
so the base
is always 2
.
The choices would not be choices = [1, 0, 2]
but, if the vote was
not marked as invalid, choices = [0, 1, 0, 2]
and thus the encoded
choices would be calculated as:
/----------- invalid vote flag
| /-------- candidate A
| | /----- candidate B
| | | /-- candidate C
| | | |
bases = [2, 4, 4, 4]
choices = [0, 1, 0, 2]
encodedChoices = choices[0] + bases[0]*(choices[1] + bases[1]*(choices[2] + bases[2]*choices[3])
encodedChoices = 0 + 2*(1 + 4*(0 + 4*2)) = 2*1 + 2*4*4*2
encodedChoices = 2*1 + 32*2 = 66
Cumulative voting system
In a cumulative voting system, the voter would have a total number of integer points to assign to candidates, and the voter can assign them to the available candidates with a maximum number of options that can be assigned to each candidate.
For example, the voter might be able to assign up to 2 points to each candidate and assign a total of 3 points. In practice, the encoding is done in a very similar format as with preferential voting system. For each candidate, the value we assign is a number that represents the points assigned to the candidate, and the base used is the maximum number of assignable points plus one.
Retaking the previous example used for plurality-at-large and used
for a preferential voting system, if the voter can assign a
maximum of 4 points, and he wants to assign 2 points to candidate
A
and 2 points to candidate C
and he didn't mark his ballot
as invalid, then his choices would be choices = [0, 2, 0, 1]
,
the bases would be bases = [2, 5, 5, 5]
and the encoded choices
would be calculated as:
/----------- invalid vote flag
| /-------- candidate A
| | /----- candidate B
| | | /-- candidate C
| | | |
choices = [0, 2, 0, 1]
bases = [2, 5, 5, 5]
encodedChoices = choices[0] + bases[0]*(choices[1] + bases[1]*(choices[2] + bases[2]*choices[3])
encodedChoices = 0 + 2*(2 + 5*(0 + 5*1)) = 2*2 + 2*5*5*1
encodedChoices = 2*2 + 50*1 = 54
Write-ins
This encoder supports write-ins. The idea of write-ins is that the
voter can choose candidates that are not in the preconfigured list
of candidates. The maximum number of write-ins allowed is
calculated automatically by suppossing the voter tries to
distribute his vote entirely just for write-in candidates, which
is usually question.max
.
The vote for each write-in is encoded using the same procedure as
for normal candidates, in order and as if the write-ins were in
the list of candidates. It asumes all write-ins (even if not
selected) are in the list of candidates and they are flagged as
such simply by an element in answer.urls
as
{"title": "isWriteIn", "url": "true"}
.
For example in a plurality-at-large question example with three
normal candidates A
, B
and C
and two write-ins, where the
voter can choose up to 2 candidates, if the voter wants to cast
a valid ballot to his 2 write-ins, then the bases, the choices
and the encoded choices would be:
/----------------- invalid-flag
| /-------------- A
| | /----------- B
| | | /-------- C
| | | | /----- 1st write-in
| | | | | /-- 2nd write-in
| | | | | |
bases = [2, 2, 2, 2, 2, 2]
choices = [0, 0, 0, 0, 1, 1]
encodedChoices = 1*2^4 + 1*2^5 = 48
Write-in names
Of course that's not a complete ballot with write-ins. If the voter
voted to the write-ins, we would also have to encode the free text
string of the name of the write-ins. This is done by converting the
text from UTF-8 to numeric bytes, and encoding each byte using
2^8 = 256 as a base. The separation between the different write-in
names is done using an empty byte (so choices[x] = 0
).
So if in our case the name of the voter's two write-ins is D
and
Ed
, and knowing that character D
is encoded as number 68
, E
is 69
and d
is 100
, then the bases, the choices and the
encoded choices would be:
/----------------- invalid-flag
| /-------------- A vote
| | /----------- B vote
| | | /-------- C vote
| | | | /----- 1st write-in vote
| | | | | /-- 2nd write-in vote
| | | | | |
| | | | | | /---- 1st write-in name ("D")
| | | | | | |
| | | | | | | /---- 2nd write-in name ("Ed")
| | | | | | | |
| | | | | | /----\ /---------\
| | | | | | | | | |
bases = [2, 2, 2, 2, 2, 2, 256, 256, 256, 256, 256]
choices = [0, 0, 0, 0, 1, 1, 68, 0, 69, 100, 0 ]
encodedChoices = 1*(2^4) + 1*(2^5) + 68*(2^6) + 69*(2^6)*(256^2) + 100*(2^6)*(256^3) = 107663593776
Even if the voter voted to no write-in, we still encode each of the write-ins
with a 0 choice. So for example if the voter voted to the first candidate (A
in our example), the full encoding would be:
/----------------- invalid-flag
| /-------------- A vote
| | /----------- B vote
| | | /-------- C vote
| | | | /----- 1st write-in vote
| | | | | /-- 2nd write-in vote
| | | | | |
| | | | | | /---- 1st write-in name (empty string)
| | | | | | |
| | | | | | | /---- 2nd write-in name (empty string)
| | | | | | | |
| | | | | | | |
bases = [2, 2, 2, 2, 2, 2, 256, 256]
choices = [0, 1, 0, 0, 0 0, 0, 0 ]
encodedChoices = 1*(2**1) = 2