This paper shows how scalar blinding can provide protection against side channel attacks when performing elliptic curve operations with modest cost, even if the characteristic of the field has a sparse representation. This may indicate that, for hardware implementations, random primes might not have as large of an advantage over special primes as previously claimed.

Elliptic curves are a useful tool within cryptography. An Elliptic Curve is a mathematical group, and some Elliptic Curves have this useful property: given a group member (point) \(G\) and an integer \(k\), the point \(H = kG\) can be computed in time proportional to \(\log k\); however given two points \(G, H\), computing the integer \(k\) such that \(H = kG\) takes time proportional to \(\sqrt{k}\) (using the best known algorithm). By selecting \(k\) (and the Elliptic Curve) to be an appropriate size, we can make finding \(H\) given \(k\) and \(G\) (known as point multiplication) relatively quick, while making finding \(k\) given \(H\) and \(G\) (known as the discrete logarithm problem) infeasible.

The most common Elliptic Curves used in practice are defined over a prime field \(GF(p)\), for a large (perhaps 256 bit) prime \(p\) (the characteristic) that we pick when we generate the curve. One thing this means in practice is that when we compute a point multiplication \(kG\), we spend the majority of the time computing the modular multiplication \(a \times b \pmod p\) for two values \(a, b \in GF(p)\).

To accelerate this operation, one approach is to select a prime of the form \(p = 2^e - c\), where \(c\) has a simple representation in binary, and is considerably smaller than \(2^e\). This allows us to accelerate the computation of the modular reduction by taking advantage of the identity: \[a \cdot 2^e + b \equiv a \cdot c + b \pmod{2^e-c}\] If the binary representation of \(c\) is simple enough, we can compute \(a \cdot c\) without doing a full multiply, and hence we can compute this modular reduction significantly faster than we could for an arbitrary prime. This allows us to compute the modular multiplication of two numbers in not much more time than it takes to perform a bignum multiplication of those two numbers. Examples of Elliptic Curves that allow this optimization include the so-called NIST curves(NIST 1999), Curve25519(Bernstein 2005), and the Microsoft NUMS curves(B. Black 2014).

If we instead select a random prime without such a special structure (such as was done when defining the Brainpool curves(Lochter 2010)), there are still some optimizations we can do beyond the obvious ’perform a multiply, and then perform a generic modulo reduction’. We can implement Montgomery Multiplication, which replaces the modulo operation with some multiplies and shifts; the net result is that a multiplication followed by a modular reduction can be done in the time of approximately two bignum multiplications; in other words, modular multiplication of a special form prime can be done approximately twice as fast as an arbitrary prime.

If this were the only consideration, the choice of whether we should use a prime with special structure would be an obvious one. However, there is another issue. Sometimes, Elliptic Curves are implemented by hardware that needs to operate in hostile environments, and can be expected to be subject to side channel attacks, such as Differential Power Analysis. In these types of attacks, the cryptanalyst runs the system, performing the same operation repeatedly, and takes careful measurements of power consumed (or EM radiation emitted) on a cycle-by-cycle basis; by statistically combining these measurements, the attacker hopes to recover the internal states (which includes the private key).

To combat these sorts of attacks, one of the strategies that we need to employ is blinding; we include random data in our computations, and while the end results is independent of the random value, the intermediate values are strongly dependent, and thus the correlations between the intermediate states and anything that the attacker wants (such as the private key) is much weaker.

One such method of blinding Elliptic Curve calculations (first published by Coron(Coron 1999)) takes advantage of a property of Elliptic Curve groups; we know an integer \(n\) such that \(nG = 0\) (this value \(n\) is known as the order of the point \(G\)). Coron’s method to compute \(kG\) would be to select a random value \(t\) and computing first \(nt+k\), and then \((nt + k)G\). Everytime we would perform a point multiplication, we would select a random \(t\), and hence the bits of the integer we’re giving to the point multiplication logic are independent of the integer \(k\) we’re actually multiplying by. Because the time taken by the point multiplication is proportional to the log of the integer, this blinding method increases the time by a value proportional to be size of \(t\); if we select (for example) a 64 bit \(t\), this increase is relatively small compared to the time we would have taken computing \(kG\) anyways.

However, as observed in (Ciet 2003), this straight-forward approach turns out not to work as well from primes with special structure. The order of the curve is always within the Hasse Interval; that is, we have: \[p + 1 - 2\sqrt{p} \le hn \le p + 1 + 2\sqrt{p}\]

where \(h\) is the cofactor of the curve, and is usually a small power of 2. What this implies is that \(n \approx p/h\), and if the upper bits of \(p\) have a sparse structure, then the upper bits of \(n\) will as well. In other words, if \(p\) is a special structure prime, and if \(t < \sqrt{p}\), then some of the bits of \(nt+k\) will be strongly correlated to some bits of \(k\), and hence this supposed blinding operation does leak some information about \(k\). This would appear to imply that primes with special structure would require significantly larger \(t\) values than random primes. And because the time taken to do a point multiplication is proportional to the length of the integer being multiplied, this would appear to imply that primes with special structure can be slower than random primes when implemented on hardware.

One common way to compute the point multiplication \(kG\) is to express \(k\) in base \(b\), as: \[k = d_i b^i + d_{i-1} b^{i-1} + d_{i-2} b^{i-2} + ... + d_2 b^2 + d_1 b^1 + d_0 b^0\] and then perform the computation: \[kG = d_0G + b \cdot ( d_1 G + b \cdot (d_2 G + ... + b \cdot(d_{i-1}G + b \cdot (d_i G)))...)))\]

In the straight-forward way, this takes \(b-2\) additions to evaluate the values \((0G, 1G, 2G, ..., (b-1)G)\), and then \(i\) cycles of multiplying an intermediate point by the small integer \(b\) and adding the point that corresponds to the next digit. There are a number of variants to this approach, both to try to achieve constant time, and to reduce the number of additions required (for example, by using the digits in the range \((-b/2, b/2)\), taking advantage of the fact that we can compute the inverse \(-G\) cheaply within an Elliptic Curve group).

The obvious choice is to make \(b\) a power of two (so \(b = 2^m\)); this yields two immediate advantages:

If \(k\) is already expressed in binary, the decomposition into the form \((d_i, d_{i-1}, ..., d_0)\) is just extracting bits

The operation of multiplying a point by \(b\) can be efficiently done by doing \(m\) point doublings

However, if we look at that value of \(n\) expressed in such a base \(b\) if the prime has special structure, we see a regular pattern. For example, the value of \(n\) for the Elliptic Curve Curve25519 (which has the special form prime \(2^{255}-19\)) expressed in base \(b=32\) is:

04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 20 27 27 28 29 29 08 23 23 19 19 11 05 16 04 19 03 03 09 14 15 11 20 31 13

Note the long string of zero’s at the beginning; these are what makes scalar randomization less effective. As one might expect, \(tn \approx t2^{252} + t2^{124.4}\), and if \(t < 2^{128}\), then bits 251 and below of \(k + nt\) will be strongly correlated to the corresponding bits of \(k\) (because the bits of \(nt\) with nontrivial contributions to those bits of the sum will be zero). Other special form primes don’t have quite as striking of a form (I chose Curve25519 because the form of its \(n\) makes it quite obvious), but the other special form primes also have long strings of 0’s or \(b-1\) digits at the beginning, which yields the corresponding weakness.

However, let us consider what happens if we consider a \(b\) which is not a power of 2. For example, if we were to take the same \(n\) expressed in base \(b=48\), we get:

01 28 34 41 23 00 42 31 16 04 20 44 19 13 01 16 37 17 42 41 16 36 22 39 40 06 14 29 09 44 17 13 24 04 31 28 46 05 19 16 46 07 18 12 42 13

Here, we don’t get any regular pattern, and this value would, at first glance, appear random. And, in fact, if we look at the digits of \(k+nt\) (for modest random \(t\)) expressed in base 48, we don’t detect any correlation between the digits of that and the digits of \(k\) expressed in base 48. This implies that blinding with this value (in base 48) is likely as effective as blinding based on a random prime in base 32.

This is not only true of Curve25519; the same thing happens for the special prime curve P256, which has \(n\) in base \(b=32\) as:

01 31 31 31 31 31 31 16 00 00 00 00 00 03 31 31 31 31 31 31 31 31 31 31 31 31 29 28 28 27 29 10 27 09 24 23 19 26 02 15 07 14 14 10 24 11 30 06 06 09 10 17

(where the streaks of 31’s and 00’s cause a similar effect to the long string of 00’s in the Curve25519 \(n\)). In contrast, in base \(b=48\), it is:

25 27 29 39 32 12 33 29 24 47 03 05 12 17 33 24 45 02 26 34 11 37 13 27 43 25 45 05 37 02 04 08 39 09 42 06 04 27 15 01 47 30 12 25 23 01

Of course, once we consider nonpower-of-2 bases, we lose the two advantages that we formally had; let us examine how we can accommodate the loss of these two advantages.

The first thing we want to look at is how much more expensive it is to use a base which is not a power of 2. After all, a base which is a power of 2 allows us to implement the fixed multiplication by \(b\) by using a handful of doublings, which is the most efficient method possible. However, it turns out there are other bases that are almost as cheap.

To make a concrete comparison, we’ll outline the costs with both a power-of-2 base, and a nonpower-of-2 base. In both cases we’ll assume that we’re dealing with an Elliptic Curve with a 256 bit subgroup, and that we’ll use a 64 bit blinding value \(r\), yielding an exponent which is 320 bits long. We’ll use the radix method, using a balanced representation of the digits (that is, the digits are values in the range \([-b/2, b/2]\), and we’ll assume that the addition-by-0 is masked somehow (whether the low-level addition routines handle it without a special case, or because in that case we’ll add by an arbitrary point and discard the result).

To implement this using radix \(b=32\) (which is optimal over all powers-of-2 in this scenario), we’ll first compute the digits \((-16G, -15G, ..., 15G, 16G)\) with 7 doublings, 7 additions^{1} and 15 negations^{2}. Then, we implement the actual addition chain; in this case, 320 bits is 64 digits^{3}; this is 63 rounds of multiplying the current point by 32 (which is 5 doublings), and adding in the next digit (which is a single addition); this step takes us 315 doublings, and 63 additions, for a total of 322 doublings, 70 additions, and 15 negations.

Now, let us look at the radix \(b=48\) case; computing the digits \((-24G, -23G, ..., 24G)\) requires 11 doublings, 11 additions and 23 negations. Then, we implement the actual addition chain; in this case, 320 bits can be expressed in 58 base-48 digits; this is 57 rounds of multiplying the current point by 48 (which can be implemented by 5 doublings and an addition), and adding in the next digit (which is a single addition); this step requires 285 doublings, and 114 additions, giving us a grand total of 296 doublings, 125 additions and 23 negations.

If our Elliptic Curve representation makes addition as cheap as doubling (which some do), and we ignore the negations (which are comparatively cheap), then the base-32 method turns out to be 7.3% faster than the base-48 method. If we instead assume that a doubling is 80% of the cost of an addition (another common assumption), then the base-32 method turns out to be 10.4% faster than the base-48 method. In other words, from this perspective, we can implement the blinding on a special format prime, and be within 7-10% of the performance of a random prime. These results are fairly stable if we tweak our assumptions (e.g. change the size of the group order or the size of \(t\))

When we multiply by a fixed point (for example, the curve generator \(G\)), one common optimization is to precompute various multiples

## Share on Social Media