212 lines
12 KiB
Markdown
212 lines
12 KiB
Markdown
|
# Why I have settled on XChaCha20+Blake3 as the AE suite of choice for my projects
|
||
|
|
||
|
This might get me some looks, but I have pretty solidly decided to go in on
|
||
|
using XChaCha20+Blake3 as the AE of choice for my future open source work. There
|
||
|
are numerous reasons for this decision, but it mainly comes down to the desire
|
||
|
for defense in depth, and a deep dislike of fundamental properties of polynomial
|
||
|
MACs.
|
||
|
|
||
|
## The Why Nots
|
||
|
|
||
|
### Why not just use poly1305?
|
||
|
|
||
|
The burning question, I can hear it now:
|
||
|
|
||
|
> Why not just use ChaCha20-poly1305? Its standard, and its good enough for
|
||
|
> \<insert application here>? Why roll your own crypto?
|
||
|
|
||
|
The reason for this gets at a fundamental property of polynomial MACs[^1] that
|
||
|
opens them up to all kinds of fun attacks, like the partitioning oracle[^2], you
|
||
|
didn't even know you needed to worry about. Polynomial MACs violate intuitive
|
||
|
expected properties of cryptosystems, and are so finicky to work with that even
|
||
|
the specification for AES-GCM contained invalid proofs of its security[^3].
|
||
|
|
||
|
There is a not-infrequently important property of cryptosystems, referred to as
|
||
|
"commiting to their keys", or simply "being commiting"[^4]. Simply put, a system
|
||
|
has this property when it is impractical to produce a message that will
|
||
|
successfully decrypt or verify under multiple chosen keys. As a fundamental
|
||
|
result of their construction, polynomial MACs lack this property, and its even
|
||
|
worse than simply being able to construct a message that will decrypt under two
|
||
|
chosen keys. Even for the more well behaved polynomial MACs, such as Poly1305,
|
||
|
it is not only practical to construct garden variety collisions, it is practical
|
||
|
to construct multi-collisions that will decrypt under a large number of chosen
|
||
|
keys[^5] .
|
||
|
|
||
|
This lack of even a facade of collision resistance can actually turn into quite
|
||
|
a big deal, just as Mega[^6] famously experienced a loss of security due to
|
||
|
their inappropriate use of CBC-MAC (another type of MAC that also lacks
|
||
|
collision resistance) polynomial MACs, in practice, worm their way into all
|
||
|
kinds of places they shouldn't be, places where a collision resistant MAC would
|
||
|
have been the only appropriate choice. Even in the context of more seemingly
|
||
|
sane systems, this lack of collision resistance can cause no end of problems,
|
||
|
such as the partitioning oracle attack [^1], which was practically exploitable
|
||
|
for over the network password recovery in Shadowsocks, and the always
|
||
|
interesting Invisible Salamander[^7].
|
||
|
|
||
|
These issues alone are, personally, enough to turn me off from using poly1305 or
|
||
|
any other polynomial MACs in any of my future projects, given that it's 2021 and
|
||
|
hashes are fast now, but, it is my opinion that polynomial MACs being
|
||
|
non-collision resistant represents a huge foot gun, and as I am not a fan of
|
||
|
handing foot guns to the consumers of my code, I will refrain from offering this
|
||
|
one, and provide a committing, collision resistant scheme instead.
|
||
|
|
||
|
### Why not use AES
|
||
|
|
||
|
This one should be a little bit easier to answer, its not exactly a hot take to
|
||
|
say that AES is a lot closer to the bottom of the acceptable ciphers barrel than
|
||
|
ChaCha20[^8]. Even discounting the standard complaints, like how AES's
|
||
|
construction is somewhat-intrinsically vulnerable to cache based timing attacks
|
||
|
when implemented in software[^9] (though bit-slicing and other tricks mitigate
|
||
|
this, they are surprisingly rare to see in the wild, even in software running on
|
||
|
devices that lack hardware accelerated AES), and how the block size is too
|
||
|
small[^10], AES still has other concerning factors in its corner, like how the
|
||
|
PRP-PRF distinguishing attack reduces the effective security of AES in counter
|
||
|
mode.
|
||
|
|
||
|
AES just, all around, isn't ideal. Sure, ChaCha is naturally vulnerable to
|
||
|
electromagnetic side channel attacks[^12], which have been an item of, ehm,
|
||
|
_increasing concern_ for me[^13], AES is just as vulnerable in that regard, with
|
||
|
even side-channel resistant hardware implementations being a rare thing, and for
|
||
|
most practical purposes, only existing in the literature[^14]. ChaCha20's much
|
||
|
larger block size (512-bit) and more timing side channel resistant construction
|
||
|
just lends itself to fewer foot guns, and while hardware assisted AES is a bit
|
||
|
faster, I don't think there is enough of a speed differential to warrant the
|
||
|
loss of misuse resistance in most applications.
|
||
|
|
||
|
### Why not Panic
|
||
|
|
||
|
This is not intended to be an indictment of any particular protocol, while the
|
||
|
issues I've brought up are real, and can result in vulnerabilities in the real
|
||
|
world, none of them are automatically going to lead to an exploitable
|
||
|
vulnerability. The people making the systems you rely on to keep your private
|
||
|
data safe by and large understand the state of the research on the matter, and
|
||
|
are careful to design systems to avoid the foot guns. Your AES-GCM or
|
||
|
ChaCha20-Poly1305 TLS stream is not in immediate peril.
|
||
|
|
||
|
Cryptographic primitives don't exist in a vacuum, any analysis of
|
||
|
vulnerabilities must be done on complete systems, as decisions on any level
|
||
|
above the choice of fundamental primitives may sink _or_ save the ship[^15].
|
||
|
|
||
|
My interest in writing this post is in informing the design of new systems,
|
||
|
where I much prefer foot guns be avoided categorically instead of case-by-case,
|
||
|
when practical, and I believe the time has come where the use of cryptosystems
|
||
|
that categorically avoid these foot guns is practical in all but niche edge case
|
||
|
scenarios.
|
||
|
|
||
|
## The Whys
|
||
|
|
||
|
### Why use XChaCha20
|
||
|
|
||
|
ChaCha20 has a number of benefits over AES, and I don't really feel the need to
|
||
|
go _too_ deep into them, but lets give the overview. ChaCha20 is a stream
|
||
|
cipher, but it acts like a block cipher being used in CTR mode. In this respect,
|
||
|
it has two major advantages over AES-CTR:
|
||
|
|
||
|
1. The block size is 512 bits, compared to AES's 128 bits, making a wide variety
|
||
|
of attacks, such as birthday bound or PRP-PRF distinguishing attacks _many_
|
||
|
orders of magnitude less practical to pull of against ChaCha
|
||
|
1. The nonce is provided to the underlying cipher independently from the stream
|
||
|
position. This has the nice practical effect of making accidental nonce-reuse
|
||
|
_that_ much harder, since while a given AES-CTR stream consumes a range of
|
||
|
nonces, a given ChaCha stream consumes only one
|
||
|
|
||
|
ChaCha20 also has other advantages, like being more energy efficient when
|
||
|
dedicated hardware isn't available, and being naturally resistant to timing side
|
||
|
channel attacks. The only real disadvantage is that it is slower than commonly
|
||
|
available hardware accelerated AES, but not by an amount that I think makes a
|
||
|
critical difference for all but the pickiest of applications.
|
||
|
|
||
|
The choice of XChaCha20 over that of plain ChaCha20 is also easy to explain.
|
||
|
ChaCha20 only gives you a 64 bit nonce, which, while enough to encrypt the world
|
||
|
with a given key, means that choosing a nonce at random can be incredibly
|
||
|
dangerous, even AES-GCM's 96-bit nonce is much too small, leading to concrete
|
||
|
failures in production[^16], and even the 128-bit nonce of straight AES-CTR
|
||
|
leaves me feeling uneasy, the birthday bound there is a lot lower than it feels
|
||
|
like it is. XChaCha20's choice of a 192 bit nonce, however, leaves plenty of
|
||
|
headroom to feel safe, even when randomly selecting nonces, taking another foot
|
||
|
gun out of the equation.
|
||
|
|
||
|
### Why use Blake3
|
||
|
|
||
|
This is another easy one to answer. The biggest reason, is, well Blake3 is
|
||
|
_fast_, several times faster than ChaCha20 on my machine, and having an HMAC
|
||
|
that can out run your cipher several times over is always handy when building
|
||
|
HMAC based cryptosystems. In my opinion having a hash, like Blake3, that's truly
|
||
|
_fast_ changes the trade off analysis, making it much less worth the trade off
|
||
|
of using a polynomial MAC.
|
||
|
|
||
|
Blake3 also has some other nice properties, its a modern, secure,
|
||
|
length-extension resistant hash based on a merkle tree construction, which
|
||
|
provides lots of other side benefits, such as the ability to use the hash for
|
||
|
verified streaming. It additionally has the nice property of not _really_ having
|
||
|
variants, with the HMAC, KDF, hash, and XOF modes all operating in a very
|
||
|
similar and consistent manner, easing the cognitive load on the programmer.
|
||
|
|
||
|
Being effectively an HMAC, Blake3 also provides a scheme that commits to its
|
||
|
keys, almost for free as a side effect of collision resistance, directly
|
||
|
sidestepping my concerns about polynomial MACs.
|
||
|
|
||
|
## Conclusion
|
||
|
|
||
|
I believe that using XChaCha20-Blake3 in the encrypt-then-hmac construction
|
||
|
provides a more-than-fast-enough base to build more than useable cryptosystems
|
||
|
on top of, and acts as a primitive for doing so that presents substantially
|
||
|
fewer foot guns than other competing options, while still remaining more than
|
||
|
useably fast, even in the absence of dedicated hardware acceleration. Humans are
|
||
|
such fallible creatures, and accidental de-footings are _much_ less likely to
|
||
|
happen when we are provided fewer chances to do so.
|
||
|
|
||
|
[^1]: A family of very closely related Message Authentication Codes, including
|
||
|
AES-GCM's [GHASH](https://en.wikipedia.org/wiki/Galois/Counter_Mode),
|
||
|
AES-GCM-SIV's Polyval, and
|
||
|
[Poly1305](https://en.wikipedia.org/wiki/Poly1305)
|
||
|
|
||
|
[^2]: Julia Len, Paul Grubbs and Thomas Ristenpart:
|
||
|
[Partitioning Oracle Attacks (USENIX '21)](https://www.usenix.org/conference/usenixsecurity21/presentation/len)
|
||
|
|
||
|
[^3]: Tetsu Iwata, Keisuke Ohashi, Kazuhiko Minematsu:
|
||
|
[Breaking and Repairing GCM Security Proofs (CRYPTO 2012)](https://www.iacr.org/cryptodb/data/paper.php?pubkey=24296)
|
||
|
|
||
|
[^4]: [OPAQUE protocol specification section discussing this](https://datatracker.ietf.org/doc/html/draft-krawczyk-cfrg-opaque-03#section-3.1.1)
|
||
|
|
||
|
[^5]: Kryptos Logic:
|
||
|
[Faster Poly1305 key multicollisons](https://www.kryptoslogic.com/blog/2021/01/faster-poly1305-key-multicollisions/)
|
||
|
|
||
|
[^6]: [Megafail](https://fail0verflow.com/blog/2013/megafail/)
|
||
|
|
||
|
[^7]: Yevgeni Dodis, Paul Grubbs, Thomas Ristenpart, Joanne Woodage:
|
||
|
[Fast Message Franking: From Invisible Salamanders to Encryptment (CRYPTO 2018)](https://eprint.iacr.org/2019/016.pdf)
|
||
|
|
||
|
[^8]: Source: My Ass
|
||
|
|
||
|
[^9]: Daniel J. Bernstein:
|
||
|
[Cache-timing attacks on AES](https://cr.yp.to/antiforgery/cachetiming-20050414.pdf)
|
||
|
|
||
|
[^10]: Even AES256 has a block size of 128 bits, which means you can start
|
||
|
expecting collisions after 2^64 encryptions. This sounds like a lot, and the
|
||
|
exact details of how this can cause a mode of operation to break down depend
|
||
|
on the particular mode, but that's a not an unachievable amount of data, the
|
||
|
internet does that much every few months (assuming 1 block per encryption),
|
||
|
and the security can start to break down well before the 2^64 mark.
|
||
|
[https://www.youtube.com/watch?v=v0IsYNDMV7A](https://www.youtube.com/watch?v=v0IsYNDMV7A)
|
||
|
|
||
|
[^12]: Alexandre Adomnicai, Jacques J.A. Fournier, Laurent Masson:
|
||
|
[Bricklayer Attack: A Side-Channel Analysis on the ChaCha Quarter Round](https://eprint.iacr.org/2017/1021.pdf)
|
||
|
|
||
|
[^13]: Giovanni Camurati, Sebastian Poeplau, Marius Muench, Tom Hayes, Aurélien
|
||
|
Francillon:
|
||
|
[Screaming Channels: When Electromagnetic Side Channels Meet Radio Transceivers](https://www.s3.eurecom.fr/docs/ccs18_camurati_preprint.pdf)
|
||
|
|
||
|
[^14]: Raghavan Kumar, Vikram Suresh, Monodeep Kar, et al:
|
||
|
[A 4900-μm2 839-Mb/s Side-Channel Attack-Resistant AES-128 in 14-nm CMOS With Heterogeneous Sboxes, Linear Masked MixColumns, and Dual-Rail Key Addition](https://ieeexplore.ieee.org/document/8952789)
|
||
|
|
||
|
[^15]: One good example of it _saving_ the ship is HMAC-MD5. While the MD5 hash has
|
||
|
been blown pretty widely open, the common HMAC construction gets many of its
|
||
|
security guarantees from the first-preimage resistance of the underlying
|
||
|
hash, and as there are no known practical first preimage attacks against
|
||
|
MD5, HMAC-MD5 remains largely secure (please do not use it for new systems
|
||
|
though)
|
||
|
|
||
|
[^16]: Hanno Böck, Aaron Zauner, Sean Devlin, Juraj Somorovsky, Phillip Jovanovic:
|
||
|
[Nonce-Disrespecting Adversaries: Practical](https://eprint.iacr.org/2016/475.pdf)
|