Underflow possible in evaluate_non_native_field_multiplication
Description
This finding primarily concerns the method evaluate_non_native_field_multiplication of the class UltraCircuitBuilder_. We refer to section ref↗ for background on how this function is intended to work. In Finding ref↗, we discuss how the calculations done in that function can overflow modulo the native circuit prime modulus due to sums of products becoming too large. In this finding, we discuss how subtractions happening in the calculations can cause an underflow modulo the native modulus, similarly causing incorrect results, though in this case resulting in a completeness rather than a soundness issue (under mild assumptions on caller behavior).
The core computation of the function is summarized by the following snippet containing the out-of-circuit calculation of the witnesses, which are then constrained elsewhere in the function:
FF lo_0 = a[0] * b[0] - r[0] + (a[1] * b[0] + a[0] * b[1]) * LIMB_SHIFT;
FF lo_1 = (lo_0 + q[0] * input.neg_modulus[0] +
(q[1] * input.neg_modulus[0] + q[0] * input.neg_modulus[1] - r[1]) * LIMB_SHIFT) *
LIMB_RSHIFT_2;
FF hi_0 = a[2] * b[0] + a[0] * b[2] + (a[0] * b[3] + a[3] * b[0] - r[3]) * LIMB_SHIFT;
FF hi_1 = hi_0 + a[1] * b[1] - r[2] + (a[1] * b[2] + a[2] * b[1]) * LIMB_SHIFT;
FF hi_2 = (hi_1 + lo_1 + q[2] * input.neg_modulus[0] +
(q[3] * input.neg_modulus[0] + q[2] * input.neg_modulus[1]) * LIMB_SHIFT);
FF hi_3 = (hi_2 + (q[0] * input.neg_modulus[3] + q[1] * input.neg_modulus[2]) * LIMB_SHIFT +
(q[0] * input.neg_modulus[2] + q[1] * input.neg_modulus[1])) *
LIMB_RSHIFT_2;
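This computation can be modeled out of circuit as follows. The sketch below is for illustration only, not the implementation: it assumes the BN254 scalar field as the native modulus p, LIMB_SHIFT = 2^68, and LIMB_RSHIFT_2 being the multiplicative inverse of 2^136 modulo p. The key point is that every operation happens modulo p, so an intermediate value that would be negative over the integers wraps around to a huge field element.

```python
# Sketch of the lo_1 computation in native-field arithmetic (assumptions:
# BN254 scalar field as native modulus p; LIMB_SHIFT = 2^68;
# LIMB_RSHIFT_2 = (2^136)^-1 mod p).
p = 21888242871839275222246405745257275088548364400416034343698204186575808495617
LIMB_SHIFT = 1 << 68
LIMB_RSHIFT_2 = pow(1 << 136, -1, p)  # multiplicative inverse of 2^136 mod p

def compute_lo_1(a, b, q, r, neg_modulus):
    # All arithmetic is mod p: a subtraction that would go negative over the
    # integers instead wraps around to a value close to p.
    lo_0 = (a[0] * b[0] - r[0] + (a[1] * b[0] + a[0] * b[1]) * LIMB_SHIFT) % p
    return ((lo_0 + q[0] * neg_modulus[0]
             + (q[1] * neg_modulus[0] + q[0] * neg_modulus[1] - r[1]) * LIMB_SHIFT)
            * LIMB_RSHIFT_2) % p
```

When the bracketed sum is non-negative and divisible by 2^136, the multiplication by LIMB_RSHIFT_2 acts as an exact right shift by 136 bits; when the sum is negative over the integers, the result is instead some large field element.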
Note that the four remainder limbs are subtracted here. Thus, if the other terms of the expressions for the crucial end results lo_1/LIMB_RSHIFT_2 and hi_3/LIMB_RSHIFT_2 are smaller than the terms that get subtracted, which are r[0] + r[1]*LIMB_SHIFT and r[2] + r[3]*LIMB_SHIFT, respectively, then an underflow and wraparound modulo the native modulus will happen. Calls to evaluate_non_native_field_multiplication usually have remainders that passed a reduction check before, so r[0], r[1], r[2], and r[3] should all be at most about 117 bits wide, and so the subtracted terms must be significantly smaller than the native modulus. Thus, on wraparound due to underflow, we must obtain a value that is very large (close to the native modulus). Putting aside the issue described in Finding ref↗, the range check on lo_1 or hi_3 would then fail. Thus, the possibility of an underflow in these calculations amounts to a completeness issue.
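To make the wraparound concrete: subtracting a large term from a small one in the native field yields a value just below the modulus, far outside any roughly 117-bit range check. This is a sketch assuming the BN254 scalar field as the native modulus.

```python
# A wrapped (underflowed) subtraction in the native field lands close to p
# (assumption for illustration: BN254 scalar field as native modulus p).
p = 21888242871839275222246405745257275088548364400416034343698204186575808495617
wrapped = (5 - (1 << 184)) % p  # small minus large, reduced mod p
```

Here, wrapped equals p + 5 - 2^184, a 254-bit value under this choice of p, which no range check of around 117 bits can accept.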
We now discuss a realistic example that is impacted by this completeness issue. Suppose we construct two bigfield elements with binary limbs (using little-endian notation) (0, 1<<116, 0, 0) and (1, 0, 0, 0) and divide the former by the latter. The former represents the element 1<<(116+68) = 1<<184, and the second the element 1, so the division should result in 1<<184 again. Note that both elements will pass the reduction check.
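For concreteness, the value represented by a vector of binary limbs can be sketched with the following hypothetical helper (the 68-bit limb width matches the convention used throughout this finding):

```python
LIMB_BITS = 68  # limb width used throughout this finding

def limbs_to_value(limbs):
    """Interpret little-endian binary limbs, limb i weighted by 2^(68*i)."""
    return sum(limb << (i * LIMB_BITS) for i, limb in enumerate(limbs))
```

This recovers the two example values: (0, 1<<116, 0, 0) represents 1<<184, and (1, 0, 0, 0) represents 1.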
The call to operator/ will cause a call to internal_div with numerators = [(0, 1<<116, 0, 0)] and denominator = (1, 0, 0, 0). This function then calculates some additional values as follows:
const uint1024_t left = uint1024_t(numerator_values);
const uint1024_t right = uint1024_t(denominator.get_value());
const uint1024_t modulus(target_basis.modulus);
uint512_t inverse_value = right.lo.invmod(target_basis.modulus).lo;
uint1024_t inverse_1024(inverse_value);
inverse_value = ((left * inverse_1024) % modulus).lo;
const uint1024_t quotient_1024 =
(uint1024_t(inverse_value) * right + unreduced_zero().get_value() - left) / modulus;
const uint512_t quotient_value = quotient_1024.lo;
// ...
quotient = create_from_u512_as_witness(ctx, quotient_value, false, num_quotient_bits);
inverse = create_from_u512_as_witness(ctx, inverse_value);
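Tracing this computation through for our example values can be done with the following sketch. The concrete choices are hypothetical: it assumes the BN254 base field as the target modulus, and models unreduced_zero as k * modulus for a small k (unreduced_zero represents zero, so its value is a multiple of the modulus; the actual multiple depends on the implementation, but the quotient stays small either way).

```python
# internal_div witness computation traced for numerator 1<<184, denominator 1
# (hypothetical parameters: BN254 base field as the target modulus,
# unreduced_zero = k * modulus for a small k).
modulus = 21888242871839275222246405745257275088696311157297823662689037894645226208583
left = 1 << 184            # value of numerators = [(0, 1<<116, 0, 0)]
right = 1                  # value of denominator = (1, 0, 0, 0)
k = 100                    # hypothetical small multiplier for unreduced_zero
unreduced_zero_value = k * modulus

inverse_value = pow(right, -1, modulus)            # inverse of 1 is 1
inverse_value = (left * inverse_value) % modulus   # = 1 << 184, as modulus > 2^184
quotient_value = (inverse_value * right + unreduced_zero_value - left) // modulus
```

With these choices, inverse_value is exactly 1<<184, that is, limbs (0, 0, 1<<48, 0), and quotient_value equals k, consistent with the at-most-7-bit quotient described below.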
Here we will then have inverse_value = 1<<(116+68). Thus, inverse will be (0, 0, 1<<48, 0). As unreduced_zero() is 263 bits wide if the target modulus is 256 bits wide, we will get that the quotient is at most a 7-bit value of the form (q, 0, 0, 0). In the end, internal_div calls unsafe_evaluate_multiply_add as follows:
unsafe_evaluate_multiply_add(denominator, inverse, { unreduced_zero() }, quotient, numerators);
Here is the signature of that function:
template <typename Builder, typename T>
void bigfield<Builder, T>::unsafe_evaluate_multiply_add(const bigfield& input_left,
                                                        const bigfield& input_to_mul,
                                                        const std::vector<bigfield>& to_add,
                                                        const bigfield& input_quotient,
                                                        const std::vector<bigfield>& input_remainders)
In our case, we will have the following values:
input_left = (1, 0, 0, 0)
input_to_mul = (0, 0, 1<<48, 0)
to_add = [(u0, u1, u2, u3)]
input_quotient = (q, 0, 0, 0)
input_remainders = [(0, 1<<116, 0, 0)]
Here, each u is at most 68 bits wide, and q is at most 7 bits wide. We assume we are in the HasPlookup<Builder> case and that needs_normalize is true. The function will add various contributions from input_remainders and to_add together. At the end, we will get this:
remainder_limbs = (-u0 + (1<<(68+116)) - (u1<<68), 0, -u2 - (u3<<68), 0)
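The combination of limb pairs into remainder_limbs can be sketched as follows. This is a simplified model of the step, not the actual unsafe_evaluate_multiply_add code: adjacent limb pairs of the remainder minus the to_add element are folded together, with the odd limb shifted up by 68 bits.

```python
def fold_remainder_limbs(remainder, to_add):
    """Simplified model: combine limb pairs (0,1) and (2,3) of
    remainder - to_add into slots 0 and 2, odd limb shifted by 68 bits."""
    slot0 = (remainder[0] - to_add[0]) + ((remainder[1] - to_add[1]) << 68)
    slot2 = (remainder[2] - to_add[2]) + ((remainder[3] - to_add[3]) << 68)
    return (slot0, 0, slot2, 0)
```

For remainder (0, 1<<116, 0, 0) and to_add limbs (u0, u1, u2, u3), this reproduces the limbs shown above.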
Then, the function UltraCircuitBuilder_::evaluate_non_native_field_multiplication is called with the following parameters:
a = (1, 0, 0, 0)
b = (0, 0, 1<<48, 0)
q = (q, 0, 0, 0)
r = (-u0 + (1<<(68+116)) - (u1<<68), 0, -u2 - (u3<<68), 0)
The function evaluate_non_native_field_multiplication will calculate and constrain the following values, with lo_1 being returned to be range checked:
FF lo_0 = a[0] * b[0] - r[0] + (a[1] * b[0] + a[0] * b[1]) * LIMB_SHIFT;
FF lo_1 = (lo_0 + q[0] * input.neg_modulus[0] +
(q[1] * input.neg_modulus[0] + q[0] * input.neg_modulus[1] - r[1]) * LIMB_SHIFT) *
LIMB_RSHIFT_2;
In the case under consideration, this will amount to the following:
lo_0 = u0 + (u1<<68) - (1<<(68+116))
lo_1 = (u0 + (u1<<68) - (1<<(68+116)) + q*neg_modulus[0] + (q*neg_modulus[1])<<68) * LIMB_RSHIFT_2
= (u0 + (u1<<68) + q*neg_modulus[0] + (q*neg_modulus[1])<<68 - (1<<(68+116))) * LIMB_RSHIFT_2
As stated before, u0 and u1 are at most 68 bits wide, so the first two summands are at most 68 and 68+68=136 bits wide, respectively. As q was estimated to be at most 7 bits wide and neg_modulus[0] and neg_modulus[1] are at most 68 bits wide, we furthermore conclude that the third and fourth summands are at most 7+68=75 and 7+68+68=143 bits wide. Thus, the sum of the four positive summands can be at most 144 bits wide. However, 1<<(68+116) is larger than even any 68+116=184-bit-wide value. Thus, we will get an underflow. Note that while our estimates might be off by a small number of bits on intermediate steps (for example, if the modulus in question has fewer than 256 bits), the gap of 40 bits is so large that such estimation errors do not change the conclusion.
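The bound arithmetic can be checked directly. The following sketch evaluates the worst case of the four positive summands under the stated width assumptions and compares it against the subtracted term:

```python
# Worst-case values under the stated bounds: u0, u1 < 2^68; q < 2^7;
# neg_modulus[0], neg_modulus[1] < 2^68.
u_max = (1 << 68) - 1
q_max = (1 << 7) - 1
neg_mod_max = (1 << 68) - 1

positive_sum = (u_max                             # u0
                + (u_max << 68)                   # u1 << 68
                + q_max * neg_mod_max             # q * neg_modulus[0]
                + ((q_max * neg_mod_max) << 68))  # (q * neg_modulus[1]) << 68
```

positive_sum stays below 2^144, while the subtracted term 1<<184 exceeds it by the 40-bit gap discussed above, so the bracket is negative over the integers and wraps.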
The following test case demonstrates the problem. To show that the values needed to trigger the error can occur naturally, we instantiate a bigfield element with limbs at most 68 bits wide (the second limb being 1<<67, the others zero) and then double this element nine times by adding it to itself, obtaining an element with 1<<76 as the second limb. Running this test case, we empirically see that the issue is triggered by the limb value 1<<76, while it is not triggered by 1<<75. This is as expected, as 76 is exactly 40 less than 116, matching the gap of 40 bits we arrived at above.
static void test_zellic_division_underflow()
{
std::cout << "\nIn Zellic division underflow test!" << std::endl;
auto builder = Builder();
fq_ct numerator = fq_ct::create_from_u512_as_witness(&builder, uint256_t(1) << (68 + 67));
ASSERT(numerator.binary_basis_limbs[0].element.get_value() == 0);
ASSERT(numerator.binary_basis_limbs[1].element.get_value() == uint256_t(1)<<67);
ASSERT(numerator.binary_basis_limbs[2].element.get_value() == 0);
ASSERT(numerator.binary_basis_limbs[3].element.get_value() == 0);
numerator.binary_basis_limbs[0].maximum_value = 0;
numerator.binary_basis_limbs[1].maximum_value = uint256_t(1)<<67;
numerator.binary_basis_limbs[2].maximum_value = 0;
numerator.binary_basis_limbs[3].maximum_value = 0;
for (size_t i = 0; i < 9; i++) {
numerator = numerator + numerator;
info("Doubled ", i, " times, second limb: ", numerator.binary_basis_limbs[1].element);
}
info("\nnumerator:");
for (size_t limb_index = 0; limb_index < 4; limb_index++) {
info("Limb ", limb_index, ": ", numerator.binary_basis_limbs[limb_index]);
}
fq_ct denominator = fq_ct::create_from_u512_as_witness(&builder, uint256_t(1));
info("\ndenominator:");
for (size_t limb_index = 0; limb_index < 4; limb_index++) {
info("Limb ", limb_index, ": ", denominator.binary_basis_limbs[limb_index]);
}
fq_ct result = numerator / denominator;
info("\nback in test case:");
info("\nresult:");
for (size_t limb_index = 0; limb_index < 4; limb_index++) {
info("Limb ", limb_index, ": ", result.binary_basis_limbs[limb_index]);
}
bool checked = CircuitChecker::check(builder);
EXPECT_TRUE(checked);
}
This test case fails with the following output:
Failed Arithmetic relation at row idx = 1835
Failed at block idx = 1
/home/user/aztec-packages/barretenberg/cpp/src/barretenberg/stdlib/primitives/bigfield/
bigfield.test.cpp:372: Failure
Value of: checked
Actual: false
Expected: true
[ FAILED ] stdlib_bigfield/1.zellicdivisionunderflow, where TypeParam = bb::UltraCircuitBuilder_<bb::UltraArith<bb::field<bb::Bn254FrParams> > > (26 ms)
[----------] 1 test from stdlib_bigfield/1 (26 ms total)
[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (26 ms total)
[ PASSED ] 0 tests.
[ FAILED ] 1 test, listed below:
[ FAILED ] stdlib_bigfield/1.zellicdivisionunderflow, where TypeParam = bb::UltraCircuitBuilder_<bb::UltraArith<bb::field<bb::Bn254FrParams> > >
1 FAILED TEST
The test case fails for the UltraCircuitBuilder, as expected based on the discussion above. With the StandardCircuitBuilder, the test case fails as well, likely for analogous reasons; however, we have not analyzed the precise flow for that case.
Impact
Some divisions and possibly other arithmetic operations for bigfield elements will fail to be proven in the current implementation, amounting to a completeness issue.
Recommendations
As limbs come with maximum bounds but not minimum bounds, this problem cannot be easily fixed merely by checking whether certain bounds are satisfied and reducing if they are not. Note that, considering evaluate_non_native_field_multiplication itself, there is no reasonable upper bound for the remainder that can reliably prevent an underflow in lo_1: the lower two limbs of a and b could be zero, with q being 1, so that one would be forced to impose the requirement r[1] <= input.neg_modulus[1]. However, as input.neg_modulus[1] is just a 68-bit value, this would restrict r[1] from taking some larger 68-bit values, making it impossible to represent some remainders for passing to the evaluate_non_native_field_multiplication function.
A solution might thus involve restricting the maximum value of the remainder limbs and then adding to lo_1 and hi_3, before the multiplication by LIMB_RSHIFT_2, an appropriate multiple of 1<<(2*68) that is ensured to be larger than the negative contributions from the remainder. This would prevent an underflow from happening. Care must be taken to prevent an overflow; see Finding ref↗. Estimates for how wide of a range check lo_1 and hi_3 need to be checked with will need to be updated as well. Note that hi_2 needs to be corrected to account for the modification of lo_1.
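As a sketch of this direction (a hypothetical correction, not a reviewed fix): assuming remainder limbs bounded below 117 bits, so that r[0] + (r[1]<<68) < 2^185, one can add the offset 1<<186, a multiple of 1<<136 that dominates that bound, to the bracket before the exact division by 2^136. The extra multiple of 2^50 would then have to be accounted for in hi_2 and in the widened range checks.

```python
# Hypothetical correction sketch: shift the bracket of lo_1 by a constant
# multiple of 2^136 before the exact division by 2^136, so it never goes
# negative (assumptions: r limbs below 117 bits; BN254 scalar field as
# native modulus p).
p = 21888242871839275222246405745257275088548364400416034343698204186575808495617
LIMB_RSHIFT_2 = pow(1 << 136, -1, p)
OFFSET = 1 << 186  # multiple of 2^136 exceeding r[0] + (r[1] << 68) < 2^185

def corrected_lo_1(bracket):
    # bracket: integer value of the sum inside lo_1; for a valid witness it is
    # divisible by 2^136 and, under the assumed bounds, larger than -OFFSET
    return ((bracket + OFFSET) * LIMB_RSHIFT_2) % p
```

For example, even for a worst-case bracket of -(1<<184), the corrected value stays small and would pass a modest range check, instead of wrapping to a value close to p.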