acm-header
Sign In

Communications of the ACM

Research Highlights

On the (In)Security of ElGamal in OpenPGP


turnstile jumper

Credit: MTA

Roughly four decades ago, Taher ElGamal put forward what is today one of the most widely known and best understood public key encryption schemes. ElGamal encryption has been used in many different contexts, chiefly among them by the OpenPGP email encryption standard. Despite its simplicity, or perhaps because of it, in reality there is a large degree of ambiguity on several key aspects of the cipher. Each library in the OpenPGP ecosystem seems to have implemented a slightly different "flavor" of ElGamal encryption. While-taken in isolation-each implementation may be secure, we reveal that in the interoperable world of OpenPGP, unforeseen cross-configuration attacks become possible. Concretely, we propose different such attacks and show their practical efficacy by recovering plaintexts and even secret keys.

Back to Top

1. Introduction

The 1984 ElGamal cryptosystem9 is one of the oldest and best-known public key encryption schemes. In the 80s and 90s, it earned wide adoption for being simultaneously efficient and patent-free. Its most prominent use is in OpenPGP,6 a standard for consumable and interoperable email security, where it has been the default and most popular encryption option for decades. At the time of writing, still at least one in six registered OpenPGP keys have an ElGamal subkey, with about 1000 new registrations per year.

The ElGamal scheme builds on elegant mathematical structures and can be defined very compactly. This simplicity, together with the opportunity to mature for roughly four decades now, suggests that a crisp specification with clear parameter choices, rules, and algorithms would be present in international standards, in particular in OpenPGP. Surprisingly, this turns out not to be the case: Our research reveals that OpenPGP's understanding of ElGamal encryption is open to interpretation, with several choices subject to the discretion of the implementer.

In this paper, we consider cross-configuration attacks on OpenPGP. Such attacks emerge when different interpretations ('configurations') of the same standard interact insecurely with each other. Toward identifying such conditions for ElGamal encryption, we need to first understand the universe of OpenPGP interpretations that are used in practice. We approach this challenge from various angles: We carefully study RFC48806 which defines OpenPGP, we inspect the source code of a number of relevant OpenPGP-implementing software libraries and we conduct a large-scale examination of millions of keys registered on OpenPGP key servers.

Our results reveal an insecure posture. For instance, we develop a plaintext recovery attack that can be mounted on ciphertexts produced by the ubiquitous GNU Privacy Guard against keys generated following the original ElGamal specification.9 The attack is effective against 2048-bit keys, a bit length that is considered practically secure today. We further illustrate how cross-configuration attacks can be combined with known side-channel exploitation techniques like FLUSH+RELOAD23 or PRIME+PROBE.20 Concretely, using the example of gcrypt—the cryptographic library underlying the GNU Privacy Guard that has already been fixed twice after seminal work14, 23 on side-channel attacks—we demonstrate that if standards conform 2048 bit ElGamal key is used to decrypt a cipher-text, then an attacker that is OS- or VM-colocated with the decrypter might be able to fully recover the decryption key. Given that interoperability is the explicit and almost exclusive goal of any standardization effort, and common-place in the OpenPGP world, we conclude that our cross-configuration attack conditions are as realistic as the attack results in awakening.

This manuscript is organized as follows: In Section 2, we survey (a) the meaningful options available when implementing ElGamal encryption, (b) the options adopted by different cryptographic libraries, and (c) the options picked by over 800,000 users in practice (as far as reflected on key server databases). In Section 3, we recall various standard algorithms for solving discrete logarithms. In Section 4, we describe "vanilla" cross-configuration attacks, and in Section 5, we describe those combined with side-channel attacks. In Section 6, we conduct end-to-end exploits and describe how we bring in the required side-channel information. We conclude in Section 7.

* 1.1. Related work

Since ElGamal encryption was first proposed, a number of research results revealed security issues, both in implementations (e.g., CVE-2018-6829, CVE-2018-6594) and at the specification level. For instance, Bone et al.5 identified security conditions if ElGamal encryption is used in textbook form. However, as OpenPGP employs ElGamal exclusively for key transport, the attacks do not seem to apply in practice. To the best of our knowledge, our work is the first to challenge the security of ElGamal encryption in the specific way OpenPGP builds and relies on it.

Protecting modular exponentiation from side-channel attacks has been the subject of research for years.2,10,11,15 Specifically, cache-based side channels have been explored extensively, using various techniques such as EVICT+TIME,4 PRIME+PROBE,20 or FLUSH+RELOAD.3, 23 Yarom and Falkner23 in particular target the gcrypt implementation of modular exponentiation which at the time was using square-and-multiply. Prompted by that work, the implementation was significantly revised. The changes are however not sufficient to protect against the attack we present in Section 5. Most cache-based side-channel attacks target L1 or L2. However, Liu et al.14 show that exploitation based on last-level cache is possible for attacks targeting both data and instructions. The work also targets the gcrypt implementation of modular exponentiation and the library has also been fixed to avoid leaking table accesses. Once again, however, this is not sufficient to prevent the attack described in Section 5.

Back to Top

2. Elgamal Encryption

After providing an abstract formulation of ElGamal encryption, we present common ways to instantiate the scheme in practice.

Let G be a group and gG a generator. To create a key pair (sk, pk), pick a random integer x, compute the element X = gx, and output (sk, pk) := (x, X). Given pk, to encrypt a message M, pick an ephemeral random integer y, compute the elements Y = gy and Z = Xy = gxy, and output C = (C1, C2) := (Y, M · Z) as the ciphertext. Given sk, to decrypt C, first recover element Z from C1 as per Z = Yx = gyx and then use C2, Z to recover M = C2/Z.

To instantiate the scheme, the following details have to be fixed: Which group G shall be used? How is generator g chosen? Shall g generate the full group G or just a subgroup? From which sets and according to which distributions are exponents x, y picked? Multiple configurations for the parameters are possible and promise to lead to correct and secure public key encryption instances. Below we describe four such configurations that have in common that G is a "prime field group," that is, the multiplicative group ℤ*p = (ℤ / pℤ)* of a field ℤ/pℤ where p is a large prime number (also referred to as the modulus). In such cases the order of G is given by ord(G) = p − 1 and the order of any subgroup G'G is an integer divisor of p − 1.

Configuration A: "The original". The original proposal9 demands choosing g such that it generates p − 1 elements, that is, the full group. Exponents x, y are picked uniformly from [1 … p − 1]. The only condition on p is that p − 1 has at least one large prime factor [This is to counter the Pohlig-Hellman attack on Discrete Logarithm Problem (DLP).]

If Configuration A is used, any element in G could become a public key. However, some of these public keys would have a lower multiplicative order than others, that is, generate a smaller subgroup, and thus promise less security. (E.g., in the extreme case of X = 1, the encryption operation does effectively nothing.) This can be remedied by restricting the ElGamal group operations to a prime order subgroup G'G. Indeed, if G' = 〈g〉 has prime order, then all its elements necessarily have the same order (with the one exception of the neutral element which is easily tested for). Let hence p − 1 = q0qn be the prime factor decomposition of ord(G). As p is a large prime number and hence odd, we know that one of these prime factors is 2, and we hence w.l.o.g. write p − 1 = 2q1qn. Now, for any prime q = qi in this list of factors there exists a subgroup G'G of order q. The idea of using such a group G' in cryptography goes back to Schnorr.18

Configuration B: "ElGamal over Schnorr groups". Pick a large prime group order q and a prime p such that q | (p − 1), choose a generator γ with 〈γ〉 = G and let g = γ(p−1)/q and G' = 〈g〉. Note that this implies ord(G') = q. Pick exponents x, y in [1 … q − 1].

This setup satisfies the Pohlig–Hellman condition and ensures that all non-trivial elements of G' have the same order. Further, if q << p, as exponents x, y are now picked from [1 … q − 1] instead of [1 … p − 1], implementations of Configuration B are very efficient.

While Configuration B removes the risk of accidentally relying on small order elements, an adversary can still exploit the existence of small subgroups in an active attack: Observe that for any t with t | (p − 1) there exists an element α of order t, that is, with αt = 1 and 1 ∉ {α, α2, …, αt−1}. Fix such a pair (t, α), let C = (C1, C2) be an honestly generated ciphertext as above, and assume an adversary intercepts C and re-injects it as cacm6606_q.gif, for h ∈ [1 … t]. When processing C', the decrypter obtains cacm6606_r.gif. Note that M'M iff αh-x = 1 iff h ≡ x (mod t); otherwise M'M. That is, by testing whether the injected ciphertext C' properly decrypts, the adversary can confirm or refute a guess on x (mod t). Recovering x (mod t1), x (mod t2), … for carefully picked t1, t2, … allows recovering the full exponent x via the Chinese Remainder Theorem. As for each t−value the adversary has to request up to t decryptions, small t-values are a precondition for the attack to be effective. So is the existence of sufficiently many suitable t-values. Hence, one countermeasure against the attack is to ensure that the number of distinct divisors t of p − 1 is very small, and another is to arrange that all admissible t-values are very large. Configurations C and D formalize these ideas and can be traced back to Pollard17 and Lim and Lee.13

Configuration C: "ElGamal over safe primes". The number of subgroups is minimized: Pick a large prime group order q and a prime modulus p such that p − 1 = 2q, choose a generator γ with 〈γ〉 = G and let g = γ(p−1)/q and G' = 〈g〉. Pick exponents x, y in [1 … q − 1]. Note that pq. Choosing g = 4 = (±2)2 is always feasible, leading to smaller parameters.

Configuration D: "ElGamal over Lim-Lee primes". All non-trivial subgroups are large: Pick a large prime modulus p such that for the prime factorization p − 1 = 2q1qn all qi have a similar, large bit length. (Breaking DLP is then hard for all group orders qi.) Choose a generator γ with 〈γ〉 = G and let g = γ(p−1)/q and G' = 〈g〉. Pick exponents x, y in [1 … q − 1].

As exponents in Configuration C became large again, some implementations work with "short exponents" for improved performance. Configuration D maintains good performance without such tricks. However, its parameters are larger. When building a secure system from scratch, it seems advisable to use one of these two options.

* 2.1. ElGamal in OpenPGP

The standard. A widely-deployed cryptography standard that suggests using, and mandates implementing, ElGamal encryption is OpenPGP. While its first version was put forward in 1998, the latest official version appeared in 2007 as RFC4880.6 We observe that the RFC refers to external documents for the specification of the ElGamal cryptosystem. Precisely, Section 9.1 provides two references that both ultimately resolve to ElGamal's original paper,9 that is, our "Configuration A". We note that Sections 5.1 and 5.5, which define how ElGamal ciphertexts, public keys, and secret keys are to be represented as binary strings, add no further conditions on the generation and use of ElGamal keys and ciphertexts.

OpenPGP libraries. We inspected how OpenPGP libraries interpret the standard. Here are our findings on three popular ones:

  • Go Go does not provide code to generate ElGamal keys; it only provides algorithms to encrypt and decrypt. The ephemeral exponent y used in encryption is chosen from [0 … p − 1]. This fully conforms with RFC4880.
  • Crypto++ The Crypto++ library follows Configuration C by generating a random safe prime and choosing for the generator the smallest quadratic residue. It though deviates from Configuration C by picking both x and y from a short interval of size roughly twice the security parameter. (See the full version7, 8 for details.) As the group generator is not primitive, this setting conflicts with RFC4880.
  • gcrypt The gcrypt library generates a Lim-Lee modulus like in Configuration D; the minimum size of the prime factors qi | (p − 1) is roughly twice the security parameter. (See the full version7, 8 for details.) Unlike in Configuration D, the generator is chosen to be the smallest integer generating the full group ℤxp. Both exponents x and y are sampled from short intervals of size roughly q3/2. Due to the short exponents, this setting conflicts with RFC4880.

ElGamal parameters in the wild. To get a complete picture of the use of ElGamal in OpenPGP, it is not sufficient to look at the prominent open-source libraries. Indeed a large portion of the user base relies on proprietary or exotic implementations, which are impossible to track. To address these, we look at the OpenPGP keys registered on public key servers. We analyze an OpenPGP server dump1 produced on Jan 15, 2021 subkeys.

An OpenPGP ElGamal public key consists of the triplet (p, g, X). This information alone is not sufficient to ascribe the key to one configuration with certainty, however, partial information can be deduced by attempting to factor p − 1. For example, safe primes are easily recognized by running a primality test on (p − 1)/2, then a quadratic residuosity test reveals whether g generates the prime order subgroup or the full group. For random primes of the sizes we look at, it is in general infeasible to obtain a full factorization of p − 1, however, partial factorizations and residuosity tests let us at least formulate credible hypotheses on the key generation process.

To classify public keys, we conducted trial division on p − 1 with primes up to 225, then repeatedly applied the elliptic curve factorization method (ECM),12 until we felt guilty for the carbon emissions. Then we applied n-th residuosity tests to g for the factors we found. We did not attempt to gain information on the exponent x that defines X = gx. In our findings, 69.4% of the keys use safe primes, 25.3% use moduli that were very likely generated using the Lim–Lee method, and 5.0% use what we call "quasi-safe primes", that is, primes of the form p = Q · q + 1 where q is an unusually large prime (0.988 times the size of p on average) and Q > 2. Finally, there are only 2158 moduli (less than 0.4%) for which we found non-trivial factors, but we were not able to finish the factorization, indicating that they were either chosen at random or like in Configuration B. These would almost be irrelevant if it was not for the attack we describe in Section 4.

Back to Top

3. Computing Discrete Logarithms

In the next sections, we will need to solve discrete logarithms given partial knowledge of the exponent. We review here the necessary algorithms. In what follows, let g be a group generator of order N, and let X = gx be the element of which we seek the discrete logarithm.

Pollard's Rho algorithm17 is the most efficient generic algorithm to compute discrete logarithms. On average, it performs cacm6606_s.gif group operations and uses a constant amount of memory. Pollard17 also introduced the lesser-known Lambda method, which performs better when it is known that x < B << N, requiring only cacm6606_t.gif group operations and O(log(B)) memory (Section 5.1 of van Oorschot and Wiener22).

When N = LM, we can compute x mod L by solving a discrete logarithm in the group of order L generated by gM. The Pohlig–Hellman algorithm16 applies this to all prime factors of N, and then recovers x via the Chinese Remainder Theorem (CRT).

We will combine all of these techniques in the case where N = q0 ··· qn, and x < B. Assume q0 < ··· < qn, and let Q = q0 ··· qn−1. We first compute x mod qi for 0 ≤ i < n using Pollard's Rho, then use the CRT to compute w := x mod Q. Thus gx = gzQ+w for some unknown z < ⌈B/Q⌉. Finally, we recover z as the discrete logarithm of gy/gw to base gQ, using Pollard's Lambda. The total cost is cacm6606_u.gif time and O(log(B/Q)) storage. We stress that this strategy is only better than Pohlig-Hellman when B << N. It was proposed by van Oorschot and Wiener to reduce the security of variants of Configuration A that use short exponents for public keys.21 We will use it, instead, to recover ephemeral secrets in cross-configuration scenarios described in Section 4.

In Section 5, we will need to solve discrete logarithm instances where some non-adjacent bits of x are known. Neither the Rho nor the Lambda method can take advantage of this information, but the simpler baby-step/giant-step (BSGS) method19 can. If x has n unknown bits, BSGS performs 1.5 · 2n/2 group operations on average, and stores 2n/2 hash table entries. A linear time/memory trade-off is possible in BSGS.

However, as n becomes larger, BSGS has two important drawbacks: it uses unrealistically large amounts of memory, and it parallelizes poorly. A better alternative is van Oorschot and Wiener's (vOW) parallel collision search applied to meet-in-the-middle algorithms (Section 5.3 of van Oorschot and Wiener22), which is much more memory efficient, and promises a linear parallel speed-up. Based on their analysis, vOW is expected to require 7 · 23n/4-m/2-1n group operations, where 2m is the amount of storage available, counted as a number of hash table entries, and subject to the constraint mn/2.

Back to Top

4. Cross-Configuration Attacks

The disagreements on the interpretation of the OpenPGP standard may raise doubts about the interoperability between the libraries. For instance, in an imaginary setting where the Crypto++ code is used to generate a key pair, the Go code is used to encrypt a message to the public key, and the gcrypt code is used to decrypt the ciphertext, it has to be asked whether confidentiality is maintained. While in a basic scenario these three libraries can, to the best of our knowledge, in fact, interoperate securely, we shall now see that some choices made by Crypto++ and gcrypt prove to be fatal in a broader context.

As van Oorschot and Wiener21 have already observed: if (1) p − 1 contains enough small factors, and (2) g generates the full group (e.g., in Configuration A), then using short exponents in the public key X = gx can lead to key recovery, as described in Section 3. As we have seen, Crypto++ uses safe primes, so it is not at risk. Although gcrypt generates p − 1 with many distinct factors, none of these is small, and it is thus also safe.

However, both Crypto++ and gcrypt also use short exponents in the ephemeral key Y = gy. But the generator g is part of a public key that may have been generated by a different library, possibly one adhering to Configuration A. Our analysis of registered keys shows that such public keys, albeit rare, do exist. This immediately leads to a message recovery attack: given a ciphertext (gy, M·gxy), the attacker finds y, uses the public key to compute gxy, and recovers the cleartext M.

To recap, whenever: (1) the public key of the receiver defines a generator g of a group containing "enough" small-order subgroups, and (2) the sender uses short exponents in the ephemeral key, an attacker who intercepts messages can decrypt any communication from sender to receiver. The exact cost of the attack depends on the number and size of the small subgroups, which we analyze next.

* 4.1. Practical exploitation

Like in Section 3, let N = q0 ··· qn =: Qqn be the group order of a public key, with qn not necessarily prime. Assume that ephemeral exponents for encryption are sampled from a short interval [0 … B[, with B << N. As we saw, the cost of recovering the exponent depends on two factors: the largest divisor qn−1 of Q, and B/Q (note that, while Q and the qi's depend on the receiver's public key, the bound B is set by the sender).

To quantify the threat, we look at these parameters for those groups in the OpenPGP key dump that contain small factors. None of the "quasi-safe" moduli are seriously at risk, since Q is too small. From the remaining 2158 keys, we narrow down the selection to those that had 2048 bits modulus, were not created before 2016, and were not expired nor revoked, leaving us with 2071 keys. In Figure 1, we plot the (normalized) cumulative counting function

f1.jpg
Figure 1. Normalized cumulative distribution function Φ(λ,ρ) of group orders N that contain a 2ρ-smooth factor Q of at least λ bits.

ueq01.gif

which counts, among the 2071 keys, how many are exposed to message recovery with a Rho effort at most cacm6606_v.gif and a Lambda effort at most cacm6606_w.gif. Note that the factors of a single group order can be arranged in more than one way, and thus appear in more than one bucket.

From ϕ(λ,ρ), we obtain the number of keys for which message recovery is possible with a given effort with respect to a given encrypting library. For example, for 2048-bit moduli gcrypt implements the sampling of y such that y < 2344: limiting the Lambda and Rho efforts to roughly 240 modular multiplications, a computation well within reach of a casual attacker, we find that there are at least ϕ(344-80, 80) = 15 vulnerable keys. Even more concerning is the case of Crypto++, which samples exponents in [1 … 2226]: in this case, the number of weak keys goes up to ϕ(226-80, 80) ≥ 231. It is plausible that a state-level attacker could target as many as ϕ(344-140, 140) ≥ 83 (gcrypt) and ϕ(226-140, 140) ≥ 895 (Crypto++) keys.

To confirm the vulnerability we used GPG to encrypt a message to the only key contributing to ϕ(344-56, 59), expecting to complete the attack using roughly 230 modular multiplications. We were able to recover the plaintext in less than 2.5 hours on a single Intel E5-2640 core clocked at 2.40GHz.

Back to Top

5. Attacks Enabled By Side-Channels

The attack conditions exploited in Section 4 did not depend on implementation flaws but were solely implied by different implementers coming up with different conceptual interpretations of ElGamal encryption. The current section considers cross-configuration attacks that are also based on implementation weaknesses. We focus specifically on runtime leakage of the modular exponentiation algorithm of gcrypt, which represents a crucial component of its ElGamal implementation. Specifically, an exponentiation algorithm (EA) computes R = Bx from B and x, where base B is an element of a cyclic group and exponent x is a non-negative integer. While the gcrypt implementation took explicit measures to prevent side-channel leakage (in particular, to protect exponent x), by exploiting a condition that the authors allegedly overlooked we present a key recovery attack that significantly reduces the security margin of gcrypt. In isolation, that is, not in the cross-configuration setting, the attack is only practically exploitable against gcrypt-generated keys with moduli up to 1024 bits. However, we show that the vulnerability is more worrisome in an interoperability context where gcrypt decryption is used in combination with a Crypto++ key: in this case, we experimentally confirm a practical key recovery attack against a non-negligible fraction of Crypto++ keys with 2048 bits modulus. In the full version,7,8 we provide similar analyses of the EAs of the Go and Crypto++ libraries.

The gcrypt library implements its EA via the sliding-window method, which is derived from the basic Square-and-Multiply algorithm. The algorithm depends on a parameter w (for "width" or "window length") that is instantiated such that 2 ≤ w ≤ 5. The first step of the EA is independent of exponent x and tabulates the values B1, B3, …, B2w−1. The second step is independent of base B and regroups the exponent bits into an odd-digit representation (ODR). Precisely, an ODR of a non-negative integer x is a sequence Γ = (γl, …, γ0) such that

ueq02.gif

The exponent x is first converted into ODR form that has leading coefficient γl { = 1,a the ODR is then encoded into a sequence (e0, c1, e1, c2, e2, …, cL, eL, cL+1) such that its odd-index subsequence (e0, e1, …, eL) coincides with the support (γj)jj≠0 of Γ (i.e., the set of non-zero elements; note here we have e0 = γl = 1, and for all ei we have ei ∈ {1, 3, …, 2w−1}), and the values c1, c2, …, cL+1 correspond with the lengths of the vanishing subsequences of Γ (i.e., the contiguous runs of zero elements). Precisely, the encoding is such that

eq01.gif

The third step of the EA is a computation-intensive online phase in which the results of the first two steps are combined.

We provide a pseudo-code version of gcrypt's EA in Figure 2. The function fw invoked in line 2 (and specified in the full version7,8) computes the exponent encoding (e0, c1, e1, c2, e2, …, cL, eL, cL+1) as follows: it sets e0 = 1, then alternates between (a) advancing through the bit sequence of x (from most to least significant bit) until it encounters a bit set to 1, and (b) greedily outputting the largest possible pair (ci, ei) subject to ei ∈ {1, 3, …, 2w−1}. Also lines 5–8 might require further explanation: The code is functionally equivalent with 'For j ← 1 to ci: RR · R' followed by 'RR · T[ei]', but the implementation reduces side-channel leakage by hiding for each multiplication whether the factor W is R or T[ei], and, in the latter case, which index ei is used for the table look-up.b

f2.jpg
Figure 2. Core of exponentiation routine of gcrypt.

Despite its built-in protection measures, we identify a side-channel condition in gcrypt's EA that can lead to full exponent recovery. The root of the problem is that the algorithm implements sliding-window exponentiation. Intuitively, this method covers the digits of the bit-representation of the exponent x with a sequence of w−wide non-overlapping windows such that every 1-bit of the exponent is covered by a window and conditioned on this the number of windows is minimized. In Equation (1), value L corresponds with the number of windows used for this, any coefficient i corresponds with the position of the i-th window, and any coefficient ei encodes the w exponent bits that the i-th window covers. In the EA implementation, processing a window corresponds with a multiplication RR · T[ei] while bridging a gap between two windows corresponds with a sequence of squaring operations RR·R. These operations are jointly implemented in lines 7,8, and each iteration of the loop of line 4 processes one gap-window pair.

The approach of our side-channel attack is to closely monitor the execution of line 4: The number of iterations of the loop during one exponentiation immediately reveals the encoding length L, and the time taken for the i-th iteration is, by line 6, linear in ci + 1 and thus leaks, one by one, the coefficients c1, …, cL.c Finally, we can recover the coefficient cL+1 by similarly monitoring the loop of line 10. That is, if (e0, c1, e1, …, cL, eL, cL+1) = fw(x) is the encoding of exponent x then our measurements reveal all the ci components while, for now, the ei components remain hidden.

Before assessing the options for amplifying the extracted information toward recovering the full exponent, we make a final observation on gcrypt's EA. Recall that line 7 tries to hide both the table index ei and whether the multiplication in line 8 is with T[ei] or R. However, as our method recovers the coefficients ci straightaway, and exclusively the last iteration of the loop of line 6 will perform multiplication with W = T[ei] (rather than with W = R), line 7 de facto just hides the ei coefficients. Thus, replacing lines 6–8 with the instructions "For j ← 1 to ci: RR·R" followed by "Securely RR · T[ei]" would result in code that is equivalent from a security perspective yet simpler and more efficient.d We, thus, conclude that the authors of gcrypt were likely not aware of the fact that the coefficients c1, …, cL+1 leak so easily.

Now that we retrieve copies of the coefficients c1, …, cL+1, the next subsections are dedicated to assessing the value of this information toward an attack that recovers the full exponent.

* 5.1. Modeling the leakage

As a warm-up, observe that when w = 1 the coefficients ci leak the whole exponent, indeed all ei are necessarily equal to 1 in this case. For larger values of w, however, a search over the possible values of ei is necessary. Our goal here is to quantify this partial leakage.

Letting n+1 = ⌈log2(x+1)⌉ be the bit length of x, we start by observing that n = Σci and is thus fully known. The sequence of ci's indicates the position of each ei in the binary writing of x, and from the description of fw we know that ei is odd and ei < min(2w, 2ci), leaving us with min(w, ci) − 1 unknown bits in ei. Note that there is no eL+1 associated to cL+1, which indeed leaks the number of trailing zeros of x. Thus, the total number of unknown bits of x, which we shall call the entropy of x, is exactly

eq02.gif

Given h, hx and the leakage for x, the computational effort for finding x grows exponentially with Hw(x). The value Hw(x) depends on x, on the window size w, and on the algorithm fw used for the encoding. We are thus interested in estimating statistics on Hw(x) for secrets x of n bits.

To this end, we model the encoding function fw as a Markov process: Model x as a continuous stream of independent and uniformly distributed bits, for any fixed w the leakage of the encoding fw is described by a finite state machine (see the full version7,8 for a detailed description), which we then convert to a Markov chain.

Using the Markov chain central limit theorem, we compute that as n → ∞ the entropy Hw(x) tends to a normal variate of mean and variance 2, where μ and σ2 are the "per bit" mean and variance tabulated in Table 1. For example, for w = 2, the average per-bit entropy is μ = 7/24, meaning that, given a uniformly sampled n+1-bits exponent and its leakage c1, …, cL+1, on average 7n/24 bits remain to be found.

t1.jpg
Table 1. Per bit statistics of the entropy Hw(x) as n → ∞ for different values of the window width w.

Of course, these asymptotic approximations only give a rough estimate of the actual probability distribution of Hw(x) for a given bit length n, however, in our experiments we found them to be quite accurate in practice even for small values of n. Using these approximations, we will give rather precise estimates of the computational effort needed to recover random secret keys generated by gcrypt and other libraries.

* 5.2. Key recovery

To assess the practical impact of the leakage of sliding window exponentiation exposed so far we need to look into the exact parameters used by the various libraries. Recall that gcrypt uses short secret exponents. The window size used for exponentiation goes from 1 to 5, depending on the bit size of the exponent. Table 2 summarizes the parameters adopted by gcrypt for some common modulus sizes.

t2.jpg
Table 2. Group (|q|) and secret exponent (|x|) bit sizes, and window size (w) used by gcrypt for each modulus size (|p|). E(Hw(x)) is the mean leakage entropy estimated using Table 1. "Pollard" indicates the (base 2 log of the) expected running time of Pollard's Rho algorithm in a group of size q, as a number of modular multiplications. "vOW" indicates the expected running time of van Oorschot and Wiener's algorithm using a table of 260 entries.

Based on gcrypt's parameters, Table 2 also reports on the mean leakage entropy Hw(x) as deduced from Table 1. We can thus compare the expected difficulty of recovering the secret exponent via the leakage against the target difficulty of solving discrete logarithms directly. gcrypt's moduli are parameterized so that the best algorithm for computing discrete logarithms is, indifferently, index calculus in ℤp, or Pollard's Rho in the subgroup of largest prime order q. For simplicity, we use the cost of Pollard's Rho, namely cacm6606_x.gif, as our reference difficulty.e

To exploit the leakage, we resort to vOW parallel collision search (see Section 3), which costs 7.23Hw(x)/4-m/2-1Hw(x) modular multiplications, where 2m is the amount of RAM available, counted as a number of hash table entries (e.g., for |p| = 2048 an entry occupies slightly more than 256 bytes). Both Pollard's Rho and vOW can be parallelized with a linear speed-up (Section 5.1–3 of van Oorschot and Wiener22), justifying the comparison.

The last two columns of Table 2 report the base 2 logarithm of the expected efforts of Pollard's Rho, and of vOW assuming a hash table with at most 260 entries (note that even this "little" memory is quite unrealistic). Comparing the columns, the leakage of windowed exponentiation does not appear to be a serious threat for the average gcrypt secret key. However this assessment is deceiving on two accounts: (a) the standard deviation of Hw(x) is rather large, and thus some keys will be much easier to break than others; and (b) gcrypt is assumed to be secure even when it operates on secret keys generated by a different library, as long as these are valid OpenPGP keys. Some libraries, such as Crypto++, generate secret exponents with as few as |q| bits, their keys are thus easier targets.

Taking for reference the most commonly used modulus size |p| = 2048, Crypto++ exponents are between 1 and 226 bits long, and the window size used by gcrypt is at most w = 3, leading to an average entropy of 98.0 bits. Worse still, more than one in 10,000 keys is expected to have at most 80 bits of leakage entropy: for such a small search space, vOWis expected to only require 250 modular multiplications (roughly 235 Gcycles in software) for a hash table of 235 entries (roughly 4 TiB). We thus conclude that attacking a non-negligible proportion of Crypto++-generated public keys within gcrypt's decryption routine is well within reach of a moderately resourceful adversary.f

* 5.3. Proof of concept

To confirm the reality of the threat, without going as far as spending several thousands of dollars, we implement a simple parallelized BSGS using GMP 6.1.2, and test it on a node equipped with 20 Xeon E5-2640 cores clocked at 2.40GHz and 64GiB of RAM. Due to memory contention, the parallel speed-up is sublinear, nevertheless, we observe a speed-up slightly above 10x using all 20 cores, thus justifying our preference for BSGS over the more complex vOW algorithm which is marred by larger constants. Our implementation uses hash table entries of 16 bytes, independently of the modulus size; the node can thus accommodate for tables with as many as 232 entries, ensuring the running time scales like the square root of the entropy, up to 64 bits of entropy. Figure 3 plots the running time of BSGS for increasing values of Hw(x), and super-imposes the cumulative distribution function of Hw(x) for Crypto++ and gcrypt, indicating what percentage of keys is broken in a given time. Since BSGS is deterministic, we use for the benchmarks the exponent that is tested by BSGS last, thus the expected running time for an average exponent of given entropy is half the time reported in Figure 3.

f3.jpg
Figure 3. Running time of BSGS for increasing values of the entropy Hw(x), on our node equipped with 20 cores and 64GiB of RAM. The dashed line indicates extrapolated running times. Also represented the cumulative distribution functions of the normal distributions of parameters (225 · 7/16, 225 · 0.098632) and (339 · 341/640, 339 · 0.126731), approximately corresponding to the entropy distributions of Crypto++ and gcrypt for a modulus of 2,048 bits.

As a final confirmation, we instrument the Crypto++ code to generate random secret exponents in a tight loop, and output them along with their leakage entropy. Over 228 samples, we obtained a 226 bits exponent with 62 bits of leakage entropy. Our probing strategy described in Section 6 produces the expected leakage pattern, from which our BSGS code recovers the exponent in approximately 30 min using 20 cores.

Back to Top

6. Practical Exploitation

In this section, we prototype the attacks described in Section 5 to show that practical exploitation is possible. The attacks are mounted on a Core i7-8650U CPU with Ubuntu 18.04.5. We demonstrate how FLUSH+RELOAD may be employed to extract leakage from the sliding window modular exponentiation implementation of gcrypt and use that to obtain full key recovery. The exploitation of gcrypt presents interesting aspects for two reasons: (i) the library was patched to protect its modular exponentiation routine against cache-based side channels attacks highlighted by Yarom and Falkner23 and by Liu et al.14; and (ii) the attack can be conducted on commodity hardware specifically owing to the ElGamal interoperability issues highlighted in Section 2.

* 6.1. Threat model

We consider a victim process that uses a private key to decrypt ElGamal ciphertexts. We assume the attacker has knowledge of the victim program and of the specific decryption algorithm that is used. We also assume that the attacker may cause the victim process to execute, for example, because the victim uses a PGP-enabled mail client that automatically decrypts emails upon reception. Consequently, we assume that the attacker is able to synchronize side channel measurements with the victim execution. No physical access is required to perform the attacks. Given that we target cache-side channels, the attacker may either be local, or it may be running on a different, colocated, VM. The former may target any cache level, the latter may only target LLC (unless the VM is hyperthread-colocated). Wlog we conduct the exploitation against the L1 cache: as shown by Liu et al.,14 the same exploitation may be achieved against higher cache levels.

* 6.2. Environment

We target the function _gcry_mpi_powm of the latest version gcrypt shipped by Ubuntu 18.04.5 LTS. The function implements the approach described in Figure 2. The implementation takes a few precautions to avoid leaking sensitive data and instruction cache accesses. In particular, it hides (i) whether the operation at line 8 is a modular multiplication or squaring; and (ii) whether, at line 7, W is initialized with R or with an entry from table T (and in that case, it also hides which position ei of the table is loaded). This is accomplished by way of a conditional memcpy-like operation (mpi_set_cond) that—depending on whether a flag argument is one or zero—copies a source operand into a destination operand, or leaves the destination operand unchanged. Masking is employed to ensure that an attacker cannot learn the nature of the operation by observing the control flow or changes to data or instruction cache. Furthermore, the flag is set using branchless operations.

Despite these precautions, considerable leakage can still be obtained by an attacker who can observe control flow-dependent cache perturbations: in particular, the full set of values c1, …, cL can be recovered by counting the number of iterations of the for the loop at line 6 for each of the iterations of the loop at line 4. The attacker may obtain this information by monitoring the state of the instruction cache line that contains code that is executed as part of an iteration of the outer loop and not of the inner loop. In practice, this can be achieved with the FLUSH+RELOAD side channel applied on a line of the instruction cache that is executed once per outer loop before the start of the inner one. The probe reveals when such a cache line is executed, and the time between two probes would depend on the execution time of the inner for loop. Crucially, the latter depends on the set of values c1, …, cL, which would thus be leaked. This strategy is applicable to the _gcry_mpi_powm function: the implementation inlines the logic to determine the ci values between the start of the first and the start of the second loop and so it fills more than one cache line; the invocation of the multiplication in the inner loop makes one iteration sufficiently long and constant-time so that inter-probe timings measurably encode the number of iterations of the inner loop.

* 6.3. Evaluation

To validate that the assumptions are correct, we prototype an end-to-end attack in the SMT-colocated threat model. The prototype uses two SMT-colocated processes, attacker and victim. The victim process may be triggered by the attacker to perform ElGamal decryption. We use Crypto++ to generate a 2048-bit ElGamal instance. The attacker process uses L1i FLUSH+RELOAD on the virtual address of the memory-mapped libgcrypt.so a shared object that contains code that is executed by the outer loop before the inner loop begins. To collect side channel data, the attacker partitions time into N slots using the rdtsc instruction: at the beginning of each slot, the target line is loaded into the cache and the access time is measured; after that, the cache line is flushed and finally, the remaining (if any) time of the slot is spent in a busy loop. As we shall discuss later, we do not have to change any of the settings that might influence the running time of the operation (e.g., c/p states or Turbo Boost). The attacker thus collects N timing samples for load instructions from the target cache line. We establish a threshold for the target system to determine whether the load is likely to have been served from the cache (cache hit). We then collect 10,000 measurements by repeatedly triggering the victim. We keep track of each slot across all runs with a set of per-slot counters, all initialized to zero. For each run, whenever the sample of a slot indicates a hit based on the threshold, we increment the value of the corresponding counter.

Figure 4a plots the value of the counter of each slot for all runs. While patterns are discernible, the leakage cannot be obtained yet without further data processing. We employ a clustering strategy based on execution time, under the hypothesis that Figure 4a contains samples that are generated at different clock frequencies. Influencing factors are likely to include p-states, c-states, frequency scaling, and the power state of certain execution units. Figure 4b confirms the hypothesis: it plots a histogram of the running times for the operation. The distribution of running times has a long tail, but most of the mass is concentrated in the three peaks. We use this information to cluster samples and show the results of the clustering in Figures 4c and 4d, showing probes whose running time falls in the interval covered by the highest, and second highest peak, respectively. The peak intervals in Figures 4c and 4d accurately encode the ci leakage for the key used in the exponentiation. Figures 4c and 4d are identical modulo a scaling factor which depends on the difference in running time. With the leakage thus obtained, we refer back to Section 5 for a detailed description of how the secret exponent is fully recovered.

f4.jpg
Figure 4. FLUSH+RELOAD attack using 10,000 decryption operations with gcrypt's modular exponentiation function. a): number of samples with an icache hit on the target cache line across all runs. b): histogram of the running times (tsc count) for the operation. c) as a) but only considering samples whose total running time lies in the [2.25 · 106, 2.31 · 106] interval. d) as a) but only considering samples whose total running time lies in the [2.2 · 106, 2.25 · 106] interval.

Back to Top

7. Conclusion

We analyze one of the oldest, best understood and most historically used asymmetric encryption schemes, ElGamal.9 We reveal that despite its popularity and longevity, when we speak about ElGamal we are referring to several different flavors of it, with key choices being left at the discretion of vendors and implementers. We show how some of these choices create interoperability challenges that lead to insecurity. We propose two "cross-configuration" attacks that are attributable to different, and—from a security standpoint—incompatible, configurations that operate together in the interconnected OpenPGP ecosystem. We believe our work can improve the security of the OpenPGP community and influence the new revision of the standard that is being drafted at the time of writing.

Back to Top

Acknowledgments

The authors thank Werner Koch, Anil Kurmus, Yutaka Niibe, and Filippo Valsorda for the discussions and feedback, and the CCS'21 reviewers for their comments that helped us improve the paper.

Back to Top

References

1. OpenPGP server key dump from 15/01/2021. https://pgp.key-server.io/dump/, 2021. [Accessed 15/01/2021].

2. Aciiçmez, O., Koç, Ç., Seifert, J.-P. Predicting secret keys via branch prediction. In CT-RSA 2007. M. Abe, ed. Volume 4377 of LNCS (2007). Springer, Heidelberg, 225–242.

3. Apecechea, G.I., Inci, M.S., Eisenbarth, T., Sunar, B. Wait a minute! A fast, cross-VM attack on AES. IACR Cryptology ePrint Archive, Report 2014/435 2014, https://eprint.iacr.org/2014/435.

4. Bernstein, D.J. Cache-timing attacks on AES. Technical report, 2005.

5. Boneh, D., Joux, A., Nguyen, P.Q. Why textbook ElGamal and RSA encryption are insecure. In ASIACRYPT 2000. T. Okamoto, ed. Volume 1976 of LNCS (2000). Springer, Heidelberg, 30–43.

6. Callas, J., Donnerhacke, L., Finney, H., Shaw, D., Thayer, R. RFC4880: OpenPGP message format, 2007.

7. De Feo, L., Poettering, B., Sorniotti, A. On the (in)security of ElGamal in OpenPGP. In ACM CCS 2021, G. Vigna and E. Shi, eds. ACM, NY, 2021, 2066–2080.

8. De Feo, L., Poettering, B., Sorniotti, A. On the (in)security of ElGamal in OpenPGP. Cryptology ePrint Archive, Report 2021/923, 2021. https://eprint.iacr.org/2021/923.

9. ElGamal, T. A public key cryptosystem and a signature scheme based on discrete logarithms. In CRYPTO'84. G.R. Blakley and D. Chaum, eds. Volume 196 of LNCS (1984). Springer, Heidelberg, 10–18.

10. Genkin, D., Pachmanov, L., Pipman, I., Tromer, E. Stealing keys from PCs using a radio: Cheap electromagnetic attacks on windowed exponentiation. In CHES 2015. T. Güneysu and H. Handschuh, eds. Volume 9293 of LNCS (2015). Springer, Heidelberg, 207–228.

11. Hachez, G., Quisquater, J.-J. Montgomery exponentiation with no final subtractions: Improved results. In CHES 2000. Ç.K. Koç and C. Paar, eds. Volume 1965 of LNCS (2000). Springer, Heidelberg, 293–301.

12. Lenstra, H.W. Factoring integers with elliptic curves. Ann. Math. 126, (1987), 649–673.

13. Lim, C.H. Lee, P.J. A key recovery attack on discrete log-based schemes using a prime order subgroup. In CRYPTO'97. B.S. Kaliski Jr., ed. Volume 1294 of LNCS (1997). Springer, Heidelberg, 249–263.

14. Liu, F., Yarom, Y., Ge, Q., Heiser, G., Lee, R.B. Last-level cache side-channel attacks are practical. In 2015 IEEE Symposium on Security and Privacy, SP 2015 (San Jose, CA, USA, May 17–21, 2015), IEEE Computer Society, NY, 2015, 605–622.

15. Messerges, T.S., Dabbish, E.A., Sloan, R.H. Power analysis attacks of modular exponentiation in smartcards. In CHES'99. Ç.K. Koç and C. Paar, eds. Volume 1717 of LNCS (1999). Springer, Heidelberg, 144–157.

16. Pohlig, S., Hellman, M. An improved algorithm for computing logarithms over GF(p) and its cryptographic significance. IEEE Trans. Inf. Theor. 24, 1 (Jan. 1978), 106–110.

17. Pollard, J.M. Monte Carlo methods for index computation mod p. Math. Comput. 32 (1978), 918–924.

18. Schnorr, C.-P. Efficient identification and signatures for smart cards. In, CRYPTO'89. G. Brassard, ed. Volume 435 of LNCS (1990). Springer, Heidelberg, 239–252.

19. Shanks, D. Class number, a theory of factorization, and genera. Proc. Symp. Pure Math. 20 (1971), 41–440.

20. Tromer, E., Osvik, D.A., Shamir, A. Efficient cache attacks on AES, and countermeasures. J. Cryptol. 23, 1 (2010), 37–71.

21. van Oorschot, P.C., Wiener, M.J. On Diffie-Hellman key agreement with short exponents. In EUROCRYPT'96. U.M. Maurer, ed. Volume 1070 of LNCS (1996). Springer, Heidelberg, 332–343.

22. van Oorschot, P.C. Wiener, M.J. Parallel collision search with cryptanalytic applications. J. Cryptol. 12, 1 (Jan. 1999), 1–28.

23. Yarom, Y., Falkner, K. FLUSH+RELOAD: A high resolution, low noise, L3 cache side-channel attack. In Proceedings of the 23rd USENIX Security Symposium (San Diego, CA, USA, August 20–22, 2014). K. Fu and J. Jung, eds. (2014). USENIX Association, Berkeley, CA, 719–732.

Back to Top

Authors

Luca De Feo ([email protected]), IBM Research Europe, Zurich, Switzerland.

Bertram Poettering ([email protected]), IBM Research Europe, Zurich, Switzerland.

Alessandro Sorniotti ([email protected]), IBM Research Europe, Zurich, Switzerland.

Back to Top

Footnotes

a. Every x ≥ 1 has at least one, but often multiple, ODRs with leading coefficient 1.

b. Precisely, line 7 implements the instruction "If jci then WR; else, if j = ci + 1, then WT[ei]" but ensures it neither leaks which if-branch is taken nor what the value of ei is.

c. This argument assumes multiplications and squarings require uniform time. This is the case for the gcrypt routines.

d. The performance gain comes from the possibility of removing some of the decoy code required for securely implementing line 7. For further details, see the source code.

e. Since in practice q is unknown to an attacker, this ignores the cost of factoring p − 1 first, but accounting for it would only make the side channel attack look better.

f. At the time of writing, 8 Amazon EC2 r6g.metal instances with 0.5 TiB of RAM and 64 virtual CPUs each cost about 12,000$ per month.

To view the accompanying Technical Perspective, visit doi.acm.org/10.1145/3592836

The original version of this paper was published in Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, Nov. 2021, 2066–2080; https://doi.org/10.1145/3460120.3485257


Copyright held by authors/owners. Publication rights licensed to ACM.
Request permission to publish from [email protected]

The Digital Library is published by the Association for Computing Machinery. Copyright © 2023 ACM, Inc.


 

No entries found