We found two subtle yet impactful bugs in the ArkWorks library's implementation of the Poseidon sponge. This discovery highlights the complexity and precision required in cryptographic implementations. In this article, we first explain what the Poseidon sponge is and how it works, then delve into the two bugs we uncovered.
Poseidon is a hash function based on the sponge construction. It can absorb inputs (elements to hash) and squeeze outputs (hashed elements). The values we squeeze depend directly on the values we absorb.
The inputs and outputs are field elements (values reduced modulo some modulus). We can also build convenience functions that absorb elements of another field, or bytes, for example: we just have to convert this data into elements of the base field.
Poseidon is widely used in the Zero-Knowledge Proof domain (and by extension in the SNARK/STARK/blockchain domain) because of its efficiency and its compatibility with the "types" this kind of cryptography works with, like finite field elements.
Poseidon also has a practical advantage. In ZKP, there is a principle named the Fiat-Shamir heuristic, which consists of replacing the random challenges of an interactive protocol with hashes of all previous values; it turns an interactive protocol (with random values) into a non-interactive one (with hashed values). For the heuristic to introduce no flaws, we must hash every previous value (and ideally some protocol parameters as well). This can be difficult to manage: each time we want a new hashed value, we have to be sure we forgot nothing. With the sponge principle, we can keep the same sponge throughout the protocol without worrying about feeding all the right parameters back in each time.
In terms of hash functions, we can cite SHA-256, which is widely used for its speed and security; but in the ZKP domain we mostly work with finite field elements, and converting them into bits (the input of the SHA hash) is very expensive. We can also cite Pedersen, another hash function used in ZKP, but because it works with elliptic curves it is not as efficient as Poseidon. Finally, we can cite MiMC and Rescue, two other hash sponges designed for ZKP, both a little less efficient than Poseidon.
Poseidon is not a "classical" hash function where you hash some values and get a fixed-size output. Using the sponge construction, we can input as many values to hash as we want, and recover as many hashed values as we want. The operation of inputting values to hash is called absorb(), and the operation of recovering hashed values is called squeeze().
There are several possible constructions for a sponge; among others, there is the Simplex construction and the Duplex construction. ArkWorks uses the Duplex construction, but we will first explain the Simplex construction to better understand the Duplex one. Before going into the details, let's take a look at what the two constructions have in common.
In Poseidon, when we absorb N field elements, we do not absorb every element directly. We successively consider only a certain number of field elements (defined by the parameters) at a time, until we have treated them all. We call this number of elements that can be treated at once the rate r of the sponge.
In the same way, when we squeeze I elements, we actually work on blocks of r elements (and not directly on the I elements).
The Poseidon sponge contains a vector of field elements of size r+c, called the state, initially filled with zeros. absorb() modifies this internal state, and squeeze() outputs certain values of the state. Between (almost) every call to absorb() and squeeze(), we apply a function P (permute()) to the whole state. This function mixes the state in a pseudo-random-looking way to propagate differences; the security of the sponge depends on how well P mixes.
The Simplex and Duplex constructions don't have the same use! With the Simplex construction, once we have called squeeze(), we can't absorb() anymore: it is a one-shot function (though we can absorb() as much as we want before the squeeze()). With the Duplex construction, we can absorb() and squeeze() in any order, as much as we want.
In the Simplex construction, absorb()ing an input of length r consists of:

- adding the input to the first r elements of the state (term-to-term)
- applying permute() to the whole state

If the input length is not a multiple of the rate r, one could pad the input to the right length, but we won't discuss padding here.
And squeeze()ing I elements consists of repeating, until I elements have been output:

- outputting the first r elements of the state
- applying permute() to the whole state
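The two Simplex operations can be sketched as follows (toy field and placeholder permutation, inputs assumed already padded to a multiple of r; none of this is the real Poseidon code):

```python
P = 2**31 - 1           # toy modulus (assumption)
RATE, CAPACITY = 2, 1   # toy rate and capacity

def permute(state):
    """Placeholder for Poseidon's permutation P (not the real one)."""
    for _ in range(3):
        state = [(x**3 + i) % P for i, x in enumerate(state)]
        total = sum(state) % P
        state = [(x + total) % P for x in state]
    return state

def simplex_absorb(state, inputs):
    """Add each r-sized block into the state term-to-term, then permute."""
    assert len(inputs) % RATE == 0, "padding is out of scope here"
    for k in range(0, len(inputs), RATE):
        for j, v in enumerate(inputs[k:k + RATE]):
            state[j] = (state[j] + v) % P   # term-to-term addition
        state = permute(state)              # permute the whole state
    return state

def simplex_squeeze(state, n):
    """Output the first r state elements, permuting between blocks."""
    out = []
    while len(out) < n:
        out.extend(state[:RATE])
        state = permute(state)
    return out[:n], state

state = simplex_absorb([0] * (RATE + CAPACITY), [1, 2, 3, 4])
digest, state = simplex_squeeze(state, 3)
```

The sketch is deterministic (same inputs give the same digest), and changing any absorbed element changes the squeezed values.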
In the Duplex construction, we don't directly have the absorb() and squeeze() functions. Instead, we have what we call duplex(), which takes as parameters an input σ (as for the absorb() function) of length r and a number of outputs I. A call to duplex() consists of:

- adding σ to the first r elements of the state (term-to-term)
- applying permute() to the whole state
- outputting the first I elements of the state

Again, if the input length is not the rate r, one could pad the input to the right length, but we won't discuss padding here.
We can simulate absorb() on an input of arbitrary length by calling duplex() on each r-length chunk of the input, with I = 0. If the input length is not a multiple of the rate r, one could pad the input to the right length, but we won't discuss padding here.
Another way to manage the problem is to wait for squeeze() to pad: if we have two (or more) absorb() calls one after the other, the values of the next absorb() can complete the missing values of the previous absorb() up to r values in total, so it is cheaper to pad only on the last call to absorb(). Since the last call to absorb() is the one just before a call to squeeze(), the padding can be managed in the squeeze() function.
We have illustrated the principle previously explained so that you have a good intuition (this is not a rigorous protocol, just an example):
We can simulate squeeze(n) by:

- calling duplex() with σ = Ø and I = r until there are fewer than r elements left to squeeze
- calling duplex() with σ = Ø and I equal to the remaining number of values to squeeze
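That squeeze(n) simulation can be sketched like this (toy field and placeholder permutation; a sketch of the described scheme, not arkworks code):

```python
P = 2**31 - 1   # toy modulus (assumption)
RATE = 2        # toy rate

def permute(state):
    """Placeholder for Poseidon's permutation P (not the real one)."""
    for _ in range(3):
        state = [(x**3 + i) % P for i, x in enumerate(state)]
        total = sum(state) % P
        state = [(x + total) % P for x in state]
    return state

def duplex(state, sigma, i):
    """Add sigma term-to-term, permute, output the first i state elements."""
    state = list(state)
    for j, v in enumerate(sigma):
        state[j] = (state[j] + v) % P
    state = permute(state)
    return state[:i], state

def squeeze(state, n):
    """squeeze(n) via duplex(): empty input and I = r while at least r values
    remain, then one final call with I set to the remainder."""
    out = []
    while n - len(out) >= RATE:
        block, state = duplex(state, [], RATE)
        out.extend(block)
    if len(out) < n:
        block, state = duplex(state, [], n - len(out))
        out.extend(block)
    return out, state
```

Note that squeezing n values and then truncating to a prefix gives the same values as squeezing fewer values directly, as expected from the construction.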
We can do as for absorb() when we call the squeeze() function several times in a row: instead of forgetting the excess values in the truncation step, we can store them, and if squeeze() is called again just after, we first output the stored values and, if needed, apply the classic squeeze() procedure for the remaining number of values.
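The leftover-caching idea can be sketched like this (toy permutation and hypothetical helper names, not library code; the point is only that serving stored values first yields the same stream as one big squeeze):

```python
P = 2**31 - 1   # toy modulus (assumption)
RATE = 2        # toy rate

def permute(state):
    """Placeholder permutation (not Poseidon's real P)."""
    for _ in range(3):
        state = [(x**3 + i) % P for i, x in enumerate(state)]
        total = sum(state) % P
        state = [(x + total) % P for x in state]
    return state

def plain_squeeze(state, n):
    """Reference squeeze: output r values, permute, repeat, then truncate."""
    out = []
    while len(out) < n:
        out.extend(state[:RATE])
        state = permute(state)
    return out[:n], state

def cached_squeeze(state, cache, n):
    """Squeeze that first serves values stored by a previous call, and stores
    the excess of the final block instead of discarding it."""
    out = list(cache[:n])
    cache = list(cache[n:])
    while len(out) < n:
        block = state[:RATE]
        state = permute(state)
        take = n - len(out)
        out.extend(block[:take])
        cache = list(block[take:])
    return out, state, cache
```

Squeezing 1 value and then 2 values with the cache yields the same three values as a single plain_squeeze of 3, while saving a permutation.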
We have illustrated the principle previously explained so that you have a good intuition (this is not a rigorous protocol, just an example):
We are developing a differential fuzzer that lets us compare the results of protocols implemented by different libraries. For the Poseidon case, typically, we compare the implementations of arkworks, aleo, the custom one of starknet, and lambdaworks; the one of aztec is currently being integrated.
In another project, we had already found a bug in January, and this differential fuzzer allowed us to find that bug again, plus another one in the same protocol. Both bugs come from the arkworks library (specifically the part implementing Poseidon).
As mentioned above, when we squeeze elements, we output r values of the state successively, and between each block of r outputted values, we permute(). The problem here is that there is a case where we don't permute(). Indeed, due to line 178, we permute() only when the number of remaining elements to squeeze is not the rate r. So, if we first squeeze() I values with 0 < I < r, and then squeeze() r values, the last I outputs of the second squeeze() will be the same as the I values of the first squeeze(). The second squeeze() first outputs the values of the first squeeze() which have not been squeezed (the last r-I values of the state), then checks the number of values left to squeeze() (which is still the rate r, since the length of the output vector has not been updated; that is only done at line 182), and permutes or not depending on the result of the comparison. In our case, the comparison leads to no permute() (since the length of the output vector is r), so we enter the first if branch at line 160 without a permute() beforehand. We literally recover the values of the state (which are the same as before) and return them.
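To make the scenario concrete, here is a toy Python model of the logic just described (a simplified reconstruction with an invented state and r = 2, not arkworks' actual code): the permutation check compares the full requested length against the rate, so a squeeze() of exactly r values right after a partial squeeze() reuses stale state values.

```python
P = 2**31 - 1   # toy modulus (assumption)
RATE = 2        # toy rate

def permute(state):
    """Placeholder permutation (not Poseidon's real P)."""
    for _ in range(3):
        state = [(x**3 + i) % P for i, x in enumerate(state)]
        total = sum(state) % P
        state = [(x + total) % P for x in state]
    return state

class ToySponge:
    """Toy model of the buggy squeeze logic (not the arkworks source)."""
    def __init__(self):
        self.state = [11, 22, 33]   # pretend state after some absorb
        self.offset = 0             # next unread position in the rate part

    def squeeze(self, n):
        out = []
        # First serve rate elements left over from the previous call.
        while self.offset < RATE and len(out) < n:
            out.append(self.state[self.offset])
            self.offset += 1
        while len(out) < n:
            # BUG: the check uses the *total* requested length n instead of
            # the number of values still missing, so when n == r there is no
            # permutation and stale state values are output again.
            if n != RATE:
                self.state = permute(self.state)
            self.offset = 0
            while self.offset < RATE and len(out) < n:
                out.append(self.state[self.offset])
                self.offset += 1
        return out

sponge = ToySponge()
first = sponge.squeeze(1)    # one value out, one rate element left unread
second = sponge.squeeze(2)   # its last value repeats the first squeeze's output
```

In this model, first is [11] and second is [22, 11]: the 11 comes back because no permutation separated the two reads of the same state slot.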
We represented the error in the following diagrams:
This error is very simple to understand. What happens if we squeeze() 0 elements? The instinctive and logical answer is that nothing happens, i.e. there is no permutation (the internal state of the sponge remains intact) and no element is output.
But in the current implementation, calling squeeze (squeeze_native_field_elements in the code) with 0 right after an absorb results in a permutation of the internal state (instead of doing nothing).
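In a toy model (invented state and placeholder permutation, not the arkworks source), the difference between the expected and the buggy behavior looks like this:

```python
P = 2**31 - 1   # toy modulus (assumption)

def permute(state):
    """Placeholder permutation (not Poseidon's real P)."""
    for _ in range(3):
        state = [(x**3 + i) % P for i, x in enumerate(state)]
        total = sum(state) % P
        state = [(x + total) % P for x in state]
    return state

def expected_squeeze(state, n):
    """Expected behavior: squeezing 0 elements touches nothing."""
    if n == 0:
        return [], state
    state = permute(state)
    return state[:n], state

def buggy_squeeze(state, n):
    """Buggy behavior: the permutation runs before n is checked."""
    state = permute(state)
    return state[:n], state

s = [11, 22, 33]
_, s_ok = expected_squeeze(s, 0)    # state unchanged
_, s_bug = buggy_squeeze(s, 0)      # state permuted even though n == 0
```

The silent state change matters: any later squeeze from s_bug diverges from the values a correct implementation would produce, which is exactly what a differential fuzzer detects.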
The discovery of these two bugs in the Arkworks implementation of Poseidon Sponge underscores the critical need for precision in cryptographic systems. Cryptographic libraries, even those widely trusted, are not immune to subtle implementation errors, which can lead to significant consequences for the systems that rely on them.
By leveraging differential fuzzing across multiple implementations, we have exposed flaws in ArkWorks and potential vulnerabilities elsewhere. This methodology is proving invaluable in uncovering hidden issues and ensuring the robustness of cryptographic protocols.
These findings are likely the tip of the iceberg. As we continue our analysis, we anticipate uncovering more unexpected behaviors in implementations. This reinforces the need for rigorous testing and validation as the cornerstone of cryptographic security.
Patrick Ventuzelo / @Pat_Ventuzelo
Ronan Thoraval / @RonanThoraval
Founded in 2021 and headquartered in Paris, FuzzingLabs is a cybersecurity startup specializing in vulnerability research, fuzzing, and blockchain security. We combine cutting-edge research with hands-on expertise to secure some of the most critical components in the blockchain ecosystem.
Contact us for an audit or long term partnership!