Poseidon edge cases
Description
In circuits/src/utils/hashing.rs
, the function var_len_poseidon_no_len_check
makes some assumptions that are not fully documented or could be checked in code to prevent incorrect usage.
The following is an extract of the function's implementation:
/// Computes the poseidon hash of `assigned[..num_parts]`.
///
/// # Note
///
/// `num_parts` must be constrained to be smaller than `assigned.max_parts()`.
pub fn var_len_poseidon_no_len_check<F, T>(
ctx: &mut Context<F>,
domain_tag_str: Option<&str>,
chip: &GateChip<F>,
num_parts: AssignedValue<F>,
assigned: &T,
) -> AssignedValue<F>
where
F: EccPrimeField,
T: InCircuitPartialHash<F>,
{
let max_parts = assigned.max_parts();
// `PoseidonHasher` absorbs the field elements in `POSEIDON_R`-sized chunks.
// For possible number of parts, we compute how many chunks `PoseidonHasher` would
// be absorbing (more precisely, the corresponding chunk indices), as well as the
// remainder, i.e., how many extra field elements are there which wouldn't fill a chunk.
let number_of_elements = (0..max_parts)
.map(|parts| {
assigned.parts_to_num_elements(parts)
+ (domain_tag_str.is_some() as usize)
})
.collect_vec();
let (chunk_indices, remainder_lens): (Vec<_>, Vec<_>) = number_of_elements
.iter()
.map(|num_elts| (num_elts / POSEIDON_R, num_elts % POSEIDON_R))
.unzip();
// ...
// Retrieve all intermediate states (after the absorption of each chunk).
let (intermediate_states, is_remainder_len_zero) = hasher.poseidon.intermediate_states(ctx, chip);
// ...
// Take only the states with the relevant chunk indexes. Note that these
// indices are given by type T's implementation of `InCircuitPartialHash`,
// so they don't need to be selected in-circuit: they are constant and don't
// depend on `num_parts`.
let intermediate_states = intermediate_states
.into_iter()
.enumerate()
.filter(|(index, _)| chunk_indices.contains(index))
.map(|(_, state)| state.to_vec())
.collect_vec();
// Compute the `num_parts`-th bit bitmask.
let bitmask = ith_bit_bitmask(ctx, chip, num_parts, max_parts as u64);
// Select the right intermediate state
let state = select_with_bitmask(ctx, chip, &bitmask, intermediate_states);
// ...
}
We can see that number_of_elements
is a list where the component with index i
contains the number of field elements that the first i
parts of assigned
would use, possibly together with the domain tag if relevant. Then, chunk_indices
is calculated as the list that as component with index i
contains the number of full chunks that would be filled by number_of_elements[i]
many field elements. For example, it could be that each part consists of six field elements, and there is no domain tag. Then, with POSEIDON_R = 2
, we would have number_of_elements = [0, 6, 12, 18, ...]
and chunk_indices = [0, 3, 6, 9, ...]
.
In the first assignment of intermediate_states
, it will be a list whose component at index i
is the state after absorbing the first i
chunks. To be able to talk about the reassigned intermediate_states
, we will denote it by intermediate_states_new
. The intention is that intermediate_states_new[i] = intermediate_states[chunk_indices[i]]
. So in our example above, the intention is that intermediate_states_new = [intermediate_states[0], intermediate_states[3], intermediate_states[6], intermediate_states[9], ...]
.
This is implemented by filtering intermediate_states
and only keeping those components whose index appears in chunk_indices
. This works for the example considered. However, it assumes that the components of chunk_indices
do not repeat. If InCircuitPartialHash
were implemented for a type for which each part takes no field elements to store (for example, the edge case of lists of tuples of length zero), then we would have chunk_indices = [0, 0, 0, ...]
, and the intention would be to get intermediate_states_new = [intermediate_states[0], intermediate_states[0], intermediate_states[0], ...]
. However, instead we will have intermediate_states_new = [intermediate_states[0]]
, and the call to select_with_bitmask
will panic because bitmask
and intermediate_states_new
will have different lengths.
Another assumption that is made by the implementation of var_len_poseidon_no_len_check
that is not fully documented is that assigned.max_parts()
and assigned.parts_to_num_elements(parts)
need to be independent of values (so only depend on types).
Impact
The current in-scope codebase satisfies the assumptions needed for var_len_poseidon_no_len_check
to work correctly. To prevent issues arising with future code changes, we nevertheless recommend documenting the assumptions made by the implementation.
Recommendations
We recommend to document that this function assumes that assigned.max_parts()
and assigned.parts_to_num_elements(parts)
need to be independent of values. Furthermore, we recommend to document that assigned.parts_to_num_elements
must be injective (not have reoccurring return values) and assert in code that no components of chunk_indices
reoccur. Alternatively, change the implementation of the snippet
let intermediate_states = intermediate_states
.into_iter()
.enumerate()
.filter(|(index, _)| chunk_indices.contains(index))
.map(|(_, state)| state.to_vec())
.collect_vec();
to also cover the case of reoccurring chunk indexes by implementing this (pseudocode): intermediate_states = [intermediate_states[chunk_indices[i]] for i in 0..max_parts]
.
Remediation
This issue has been acknowledged by Nebra, and a fix was implemented in commit 59df2dc3↗.