website/projects/Markdown/CryptoSuite.md

# Why I have settled on XChaCha20+Blake3 as the AE suite of choice for my projects

This might get me some looks, but I have pretty solidly decided to go in on
using XChaCha20+Blake3 as the AE of choice for my future open source work. There
are numerous reasons for this decision, but it mainly comes down to the desire
for defense in depth, and a deep dislike of fundamental properties of polynomial
MACs.

## The Why Nots

### Why not just use poly1305?

The burning question, I can hear it now:

> Why not just use ChaCha20-poly1305? Its standard, and its good enough for
> \<insert application here>? Why roll your own crypto?

The reason for this gets at a fundamental property of polynomial MACs[^1] that
opens them up to all kinds of fun attacks, like the partitioning oracle[^2], you
didn't even know you needed to worry about. Polynomial MACs violate intuitive
expected properties of cryptosystems, and are so finicky to work with that even
the specification for AES-GCM contained invalid proofs of its security[^3].

There is a not-infrequently important property of cryptosystems, referred to as
"commiting to their keys", or simply "being commiting"[^4]. Simply put, a system
has this property when it is impractical to produce a message that will
successfully decrypt or verify under multiple chosen keys. As a fundamental
result of their construction, polynomial MACs lack this property, and its even
worse than simply being able to construct a message that will decrypt under two
chosen keys. Even for the more well behaved polynomial MACs, such as Poly1305,
it is not only practical to construct garden variety collisions, it is practical
to construct multi-collisions that will decrypt under a large number of chosen
keys[^5] .

This lack of even a facade of collision resistance can actually turn into quite
a big deal, just as Mega[^6] famously experienced a loss of security due to
their inappropriate use of CBC-MAC (another type of MAC that also lacks
collision resistance) polynomial MACs, in practice, worm their way into all
kinds of places they shouldn't be, places where a collision resistant MAC would
have been the only appropriate choice. Even in the context of more seemingly
sane systems, this lack of collision resistance can cause no end of problems,
such as the partitioning oracle attack [^1], which was practically exploitable
for over the network password recovery in Shadowsocks, and the always
interesting Invisible Salamander[^7].

These issues alone are, personally, enough to turn me off from using poly1305 or
any other polynomial MACs in any of my future projects, given that it's 2021 and
hashes are fast now, but, it is my opinion that polynomial MACs being
non-collision resistant represents a huge foot gun, and as I am not a fan of
handing foot guns to the consumers of my code, I will refrain from offering this
one, and provide a committing, collision resistant scheme instead.

### Why not use AES

This one should be a little bit easier to answer, its not exactly a hot take to
say that AES is a lot closer to the bottom of the acceptable ciphers barrel than
ChaCha20[^8]. Even discounting the standard complaints, like how AES's
construction is somewhat-intrinsically vulnerable to cache based timing attacks
when implemented in software[^9] (though bit-slicing and other tricks mitigate
this, they are surprisingly rare to see in the wild, even in software running on
devices that lack hardware accelerated AES), and how the block size is too
small[^10], AES still has other concerning factors in its corner, like how the
PRP-PRF distinguishing attack reduces the effective security of AES in counter
mode.

AES just, all around, isn't ideal. Sure, ChaCha is naturally vulnerable to
electromagnetic side channel attacks[^12], which have been an item of, ehm,
_increasing concern_ for me[^13], AES is just as vulnerable in that regard, with
even side-channel resistant hardware implementations being a rare thing, and for
most practical purposes, only existing in the literature[^14]. ChaCha20's much
larger block size (512-bit) and more timing side channel resistant construction
just lends itself to fewer foot guns, and while hardware assisted AES is a bit
faster, I don't think there is enough of a speed differential to warrant the
loss of misuse resistance in most applications.

### Why not Panic

This is not intended to be an indictment of any particular protocol, while the
issues I've brought up are real, and can result in vulnerabilities in the real
world, none of them are automatically going to lead to an exploitable
vulnerability. The people making the systems you rely on to keep your private
data safe by and large understand the state of the research on the matter, and
are careful to design systems to avoid the foot guns. Your AES-GCM or
ChaCha20-Poly1305 TLS stream is not in immediate peril.

Cryptographic primitives don't exist in a vacuum, any analysis of
vulnerabilities must be done on complete systems, as decisions on any level
above the choice of fundamental primitives may sink _or_ save the ship[^15].

My interest in writing this post is in informing the design of new systems,
where I much prefer foot guns be avoided categorically instead of case-by-case,
when practical, and I believe the time has come where the use of cryptosystems
that categorically avoid these foot guns is practical in all but niche edge case
scenarios.

## The Whys

### Why use XChaCha20

ChaCha20 has a number of benefits over AES, and I don't really feel the need to
go _too_ deep into them, but lets give the overview. ChaCha20 is a stream
cipher, but it acts like a block cipher being used in CTR mode. In this respect,
it has two major advantages over AES-CTR:

1. The block size is 512 bits, compared to AES's 128 bits, making a wide variety
   of attacks, such as birthday bound or PRP-PRF distinguishing attacks _many_
   orders of magnitude less practical to pull of against ChaCha
1. The nonce is provided to the underlying cipher independently from the stream
   position. This has the nice practical effect of making accidental nonce-reuse
   _that_ much harder, since while a given AES-CTR stream consumes a range of
   nonces, a given ChaCha stream consumes only one

ChaCha20 also has other advantages, like being more energy efficient when
dedicated hardware isn't available, and being naturally resistant to timing side
channel attacks. The only real disadvantage is that it is slower than commonly
available hardware accelerated AES, but not by an amount that I think makes a
critical difference for all but the pickiest of applications.

The choice of XChaCha20 over that of plain ChaCha20 is also easy to explain.
ChaCha20 only gives you a 64 bit nonce, which, while enough to encrypt the world
with a given key, means that choosing a nonce at random can be incredibly
dangerous, even AES-GCM's 96-bit nonce is much too small, leading to concrete
failures in production[^16], and even the 128-bit nonce of straight AES-CTR
leaves me feeling uneasy, the birthday bound there is a lot lower than it feels
like it is. XChaCha20's choice of a 192 bit nonce, however, leaves plenty of
headroom to feel safe, even when randomly selecting nonces, taking another foot
gun out of the equation.

### Why use Blake3

This is another easy one to answer. The biggest reason, is, well Blake3 is
_fast_, several times faster than ChaCha20 on my machine, and having an HMAC
that can out run your cipher several times over is always handy when building
HMAC based cryptosystems. In my opinion having a hash, like Blake3, that's truly
_fast_ changes the trade off analysis, making it much less worth the trade off
of using a polynomial MAC.

Blake3 also has some other nice properties, its a modern, secure,
length-extension resistant hash based on a merkle tree construction, which
provides lots of other side benefits, such as the ability to use the hash for
verified streaming. It additionally has the nice property of not _really_ having
variants, with the HMAC, KDF, hash, and XOF modes all operating in a very
similar and consistent manner, easing the cognitive load on the programmer.

Being effectively an HMAC, Blake3 also provides a scheme that commits to its
keys, almost for free as a side effect of collision resistance, directly
sidestepping my concerns about polynomial MACs.

## Conclusion

I believe that using XChaCha20-Blake3 in the encrypt-then-hmac construction
provides a more-than-fast-enough base to build more than useable cryptosystems
on top of, and acts as a primitive for doing so that presents substantially
fewer foot guns than other competing options, while still remaining more than
useably fast, even in the absence of dedicated hardware acceleration. Humans are
such fallible creatures, and accidental de-footings are _much_ less likely to
happen when we are provided fewer chances to do so.

[^1]: A family of very closely related Message Authentication Codes, including
    AES-GCM's [GHASH](https://en.wikipedia.org/wiki/Galois/Counter_Mode),
    AES-GCM-SIV's Polyval, and
    [Poly1305](https://en.wikipedia.org/wiki/Poly1305)

[^2]: Julia Len, Paul Grubbs and Thomas Ristenpart:
    [Partitioning Oracle Attacks (USENIX '21)](https://www.usenix.org/conference/usenixsecurity21/presentation/len)

[^3]: Tetsu Iwata, Keisuke Ohashi, Kazuhiko Minematsu:
    [Breaking and Repairing GCM Security Proofs (CRYPTO 2012)](https://www.iacr.org/cryptodb/data/paper.php?pubkey=24296)

[^4]: [OPAQUE protocol specification section discussing this](https://datatracker.ietf.org/doc/html/draft-krawczyk-cfrg-opaque-03#section-3.1.1)

[^5]: Kryptos Logic:
    [Faster Poly1305 key multicollisons](https://www.kryptoslogic.com/blog/2021/01/faster-poly1305-key-multicollisions/)

[^6]: [Megafail](https://fail0verflow.com/blog/2013/megafail/)

[^7]: Yevgeni Dodis, Paul Grubbs, Thomas Ristenpart, Joanne Woodage:
    [Fast Message Franking: From Invisible Salamanders to Encryptment (CRYPTO 2018)](https://eprint.iacr.org/2019/016.pdf)

[^8]: Source: My Ass

[^9]: Daniel J. Bernstein:
    [Cache-timing attacks on AES](https://cr.yp.to/antiforgery/cachetiming-20050414.pdf)

[^10]: Even AES256 has a block size of 128 bits, which means you can start
    expecting collisions after 2^64 encryptions. This sounds like a lot, and the
    exact details of how this can cause a mode of operation to break down depend
    on the particular mode, but that's a not an unachievable amount of data, the
    internet does that much every few months (assuming 1 block per encryption),
    and the security can start to break down well before the 2^64 mark.
    [https://www.youtube.com/watch?v=v0IsYNDMV7A](https://www.youtube.com/watch?v=v0IsYNDMV7A)

[^12]: Alexandre Adomnicai, Jacques J.A. Fournier, Laurent Masson:
    [Bricklayer Attack: A Side-Channel Analysis on the ChaCha Quarter Round](https://eprint.iacr.org/2017/1021.pdf)

[^13]: Giovanni Camurati, Sebastian Poeplau, Marius Muench, Tom Hayes, Aurélien
    Francillon:
    [Screaming Channels: When Electromagnetic Side Channels Meet Radio Transceivers](https://www.s3.eurecom.fr/docs/ccs18_camurati_preprint.pdf)

[^14]: Raghavan Kumar, Vikram Suresh, Monodeep Kar, et al:
    [A 4900-μm2 839-Mb/s Side-Channel Attack-Resistant AES-128 in 14-nm CMOS With Heterogeneous Sboxes, Linear Masked MixColumns, and Dual-Rail Key Addition](https://ieeexplore.ieee.org/document/8952789)

[^15]: One good example of it _saving_ the ship is HMAC-MD5. While the MD5 hash has
    been blown pretty widely open, the common HMAC construction gets many of its
    security guarantees from the first-preimage resistance of the underlying
    hash, and as there are no known practical first preimage attacks against
    MD5, HMAC-MD5 remains largely secure (please do not use it for new systems
    though)

[^16]: Hanno Böck, Aaron Zauner, Sean Devlin, Juraj Somorovsky, Phillip Jovanovic:
    [Nonce-Disrespecting Adversaries: Practical](https://eprint.iacr.org/2016/475.pdf)