Assessment reports>Universal Proof Aggregator>Informational findings>Poseidon edge cases
Category: Code Maturity

Poseidon edge cases

Informational Severity
Informational Impact
N/A Likelihood

Description

In circuits/src/utils/hashing.rs, the function var_len_poseidon_no_len_check makes some assumptions that are not fully documented or could be checked in code to prevent incorrect usage.

The following is an extract of the function's implementation:

/// Computes the poseidon hash of `assigned[..num_parts]`.
///
/// # Note
///
/// `num_parts` must be constrained to be smaller than `assigned.max_parts()`.
pub fn var_len_poseidon_no_len_check<F, T>(
    ctx: &mut Context<F>,
    domain_tag_str: Option<&str>,
    chip: &GateChip<F>,
    num_parts: AssignedValue<F>,
    assigned: &T,
) -> AssignedValue<F>
where
    F: EccPrimeField,
    T: InCircuitPartialHash<F>,
{
    let max_parts = assigned.max_parts();
    // `PoseidonHasher` absorbs the field elements in `POSEIDON_R`-sized chunks.
    // For possible number of parts, we compute how many chunks `PoseidonHasher` would
    // be absorbing (more precisely, the corresponding chunk indices), as well as the
    // remainder, i.e., how many extra field elements are there which wouldn't fill a chunk.
    let number_of_elements = (0..max_parts)
        .map(|parts| {
            assigned.parts_to_num_elements(parts)
                + (domain_tag_str.is_some() as usize)
        })
        .collect_vec();
    let (chunk_indices, remainder_lens): (Vec<_>, Vec<_>) = number_of_elements
        .iter()
        .map(|num_elts| (num_elts / POSEIDON_R, num_elts % POSEIDON_R))
        .unzip();

    // ...
    
     // Retrieve all intermediate states (after the absorption of each chunk).
    let (intermediate_states, is_remainder_len_zero) = hasher.poseidon.intermediate_states(ctx, chip);

    // ...
    
    // Take only the states with the relevant chunk indexes. Note that these
    // indices are given by type T's implementation of `InCircuitPartialHash`,
    // so they don't need to be selected in-circuit: they are constant and don't
    // depend on `num_parts`.
    let intermediate_states = intermediate_states
        .into_iter()
        .enumerate()
        .filter(|(index, _)| chunk_indices.contains(index))
        .map(|(_, state)| state.to_vec())
        .collect_vec();

    // Compute the `num_parts`-th bit bitmask.
    let bitmask = ith_bit_bitmask(ctx, chip, num_parts, max_parts as u64);
    // Select the right intermediate state
    let state = select_with_bitmask(ctx, chip, &bitmask, intermediate_states);
    
    // ...
}

We can see that number_of_elements is a list where the component with index i contains the number of field elements that the first i parts of assigned would use, possibly together with the domain tag if relevant. Then, chunk_indices is calculated as the list that as component with index i contains the number of full chunks that would be filled by number_of_elements[i] many field elements. For example, it could be that each part consists of six field elements, and there is no domain tag. Then, with POSEIDON_R = 2, we would have number_of_elements = [0, 6, 12, 18, ...] and chunk_indices = [0, 3, 6, 9, ...].

In the first assignment of intermediate_states, it will be a list whose component at index i is the state after absorbing the first i chunks. To be able to talk about the reassigned intermediate_states, we will denote it by intermediate_states_new. The intention is that intermediate_states_new[i] = intermediate_states[chunk_indices[i]]. So in our example above, the intention is that intermediate_states_new = [intermediate_states[0], intermediate_states[3], intermediate_states[6], intermediate_states[9], ...].

This is implemented by filtering intermediate_states and only keeping those components whose index appears in chunk_indices. This works for the example considered. However, it assumes that the components of chunk_indices do not repeat. If InCircuitPartialHash were implemented for a type for which each part takes no field elements to store (for example, the edge case of lists of tuples of length zero), then we would have chunk_indices = [0, 0, 0, ...], and the intention would be to get intermediate_states_new = [intermediate_states[0], intermediate_states[0], intermediate_states[0], ...]. However, instead we will have intermediate_states_new = [intermediate_states[0]], and the call to select_with_bitmask will panic because bitmask and intermediate_states_new will have different lengths.

Another assumption that is made by the implementation of var_len_poseidon_no_len_check that is not fully documented is that assigned.max_parts() and assigned.parts_to_num_elements(parts) need to be independent of values (so only depend on types).

Impact

The current in-scope codebase satisfies the assumptions needed for var_len_poseidon_no_len_check to work correctly. To prevent issues arising with future code changes, we nevertheless recommend documenting the assumptions made by the implementation.

Recommendations

We recommend to document that this function assumes that assigned.max_parts() and assigned.parts_to_num_elements(parts) need to be independent of values. Furthermore, we recommend to document that assigned.parts_to_num_elements must be injective (not have reoccurring return values) and assert in code that no components of chunk_indices reoccur. Alternatively, change the implementation of the snippet

let intermediate_states = intermediate_states
    .into_iter()
    .enumerate()
    .filter(|(index, _)| chunk_indices.contains(index))
    .map(|(_, state)| state.to_vec())
    .collect_vec();

to also cover the case of reoccurring chunk indexes by implementing this (pseudocode): intermediate_states = [intermediate_states[chunk_indices[i]] for i in 0..max_parts].

Remediation

This issue has been acknowledged by Nebra, and a fix was implemented in commit 59df2dc3.

Zellic © 2024Back to top ↑