Assessment reports>Scroll zkEVM>Critical findings>Missing constraints in the copy circuit for ,MCOPY, allow inserting illegitimate entries in the rw table
Category: Coding Mistakes

Missing constraints in the copy circuit for MCOPY allow inserting illegitimate entries in the rw table

Critical Severity
Critical Impact
High Likelihood

Description

This finding concerns constraints missing in the copy circuit regarding copy events associated to MCOPY instructions, which ultimately allow inserting illegitimate entries in the rw table. To describe this issue, we will start by describing the relevant aspects of the rw table and the intended way to ensure there are no illegitimate entries in the rw table, via the EVM circuit's counting of legitimate rw lookups. We will then discuss how, in the case of MCOPY instructions, the correctness of this count depends on the copy circuit correctly constraining a certain value. Finally, we can discuss how the copy circuit constrains that value, and we will see that it is insufficiently constrained.

Purpose of the rw table and state circuit

The rw table is used to keep track of consistency of read and write operations that write to or read from, among other locations, storage, memory, and stack. This table is constrained by the state circuit, which ensures that, for example, reads that follow a write to some location will read the value written, that reads do not change the value, and so on. The state circuit does not however check that the entries are actually legitimate, so from the perspective of the state circuit only, any writes may appear in the table. Non-Start entries of the rw table are intended to have a unique ID, given by the counter, which is to increment on each read or write. Entries of the Start type, which are used as padding at the start of the table, have separate counters, counting up on each row. By looking up a Start entry with counter being 1 and one Start entry with counter being a high number, an upper bound on the number of non-Start entries in the rw table can be enforced.

Ensuring that there are no malicious entries in the rw table

Other circuits make lookups to the rw table to perform reads and writes to storage, memory, and so on. In order to ensure that all non-Start entries in the rw table actually arise from such legitimate lookups, the EVM circuit keeps track of the current counter for the next entry of the rw table in state.rw_counter of each step. This is done by adding the number of rw lookups that were done when transitioning from one step to the next. At the very end, the rw_counter should reflect the total number of legitimate lookups done by the EVM circuit or by the copy circuit. The EVM circuit then, at the end of the block, looks up two entries in the rw table of the Start type to ensure that the remaining number of rows available (not of type Start) are at most as many as expected for legitimate lookups:

// 3. Verify rw_counter counts to the same number of meaningful rows in
// rw_table to ensure there is no malicious insertion.
// Verify that there are at most total_rws meaningful entries in the rw_table
cb.rw_table_start_lookup(1.expr());
cb.rw_table_start_lookup(max_rws.expr() - total_rws.expr());
// Since every lookup done in the EVM circuit must succeed and uses
// a unique rw_counter, we know that at least there are
// total_rws meaningful entries in the rw_table.
// We conclude that the number of meaningful entries in the rw_table
// is total_rws.

This will ensure that there are indeed no malicious entries in the rw table as long as two assumptions hold:

  1. The count of legitimate rw lookups done is accurate.

  2. No two legitimate rw lookups are identical.

If either of these assumptions is violated, there will be additional rows available in the rw table in which an adversary may insert arbitrary malicious entries, such as writes to storage of their choosing.

This finding concerns the possibility of violating both assumptions for MCOPY instructions due to missing constraints in the copy circuit.

Constraints on rw_counter in the EVM circuit for MCOPY

In zkevm-circuits/src/evm_circuit/execution/mcopy.rs, the MCopyGadget::configure function adds the constraints for the MCOPY operation. Here is a relevant snippet:

// copy_rwc_inc used in copy circuit lookup.
let copy_rwc_inc = cb.query_cell();
cb.condition(memory_src_address.has_length(), |cb| {
    cb.copy_table_lookup(
        cb.curr.state.call_id.expr(),
        CopyDataType::Memory.expr(),
        cb.curr.state.call_id.expr(),
        CopyDataType::Memory.expr(),
        // src_addr
        memory_src_address.offset(),
        // src_addr_end
        memory_src_address.end_offset(),
        // dest_addr
        memory_dest_address.offset(),
        memory_dest_address.length(),
        // rlc_acc is 0 here.
        0.expr(),
        copy_rwc_inc.expr(),
    );
});

cb.condition(not::expr(memory_src_address.has_length()), |cb| {
    cb.require_zero(
        "if no bytes to copy, copy table rwc inc == 0",
        copy_rwc_inc.expr(),
    );
});

The crucial circuit variable to consider here is copy_rwc_inc, which is part of the arguments for copy_table_lookup, which will add the copy table lookup. Otherwise copy_rwc_inc is not constrained here as long as the length of the the copy operation is positive. In the case that the copy operation is for length zero, copy_rwc_inc is constrained to be zero as well. We will only consider the case for positive length in the following.

The circuit builder's EVMConstraintBuilder::copy_table_lookup function is implemented in zkevm-circuits/src/evm_circuit/util/constraint_builder.rs as follows:

pub(crate) fn copy_table_lookup(
    &mut self,
    src_id: Expression<F>,
    src_tag: Expression<F>,
    dst_id: Expression<F>,
    dst_tag: Expression<F>,
    src_addr: Expression<F>,
    src_addr_end: Expression<F>,
    dst_addr: Expression<F>,
    length: Expression<F>,
    rlc_acc: Expression<F>,
    rwc_inc: Expression<F>,
) {
    self.add_lookup(
        "copy lookup",
        Lookup::CopyTable {
            is_first: 1.expr(), // is_first
            src_id,
            src_tag,
            dst_id,
            dst_tag,
            src_addr,
            src_addr_end,
            dst_addr,
            length,
            rlc_acc,
            rw_counter: self.curr.state.rw_counter.expr() + self.rw_counter_offset(),
            rwc_inc: rwc_inc.clone(),
        },
    );

    self.rw_counter_offset = self.rw_counter_offset.clone() + self.condition_expr() * rwc_inc;
}

The copy_rwc_inc circuit variable from MCopyGadget::configure is used here as rwc_inc in the actual copy table lookup. It is also used to increase the read write counter offset accordingly. This means that if the operation currently executing actually is MCOPY (so the condition expression is 1), then the circuit builder's rw_counter_offset will increase by copy_rwc_inc.

Back in MCopyGadget::configure in zkevm-circuits/src/evm_circuit/execution/mcopy.rs, the state transition is constrained by the following snippet:

let step_state_transition = StepStateTransition {
    rw_counter: Transition::Delta(cb.rw_counter_offset()),
    program_counter: Transition::Delta(1.expr()),
    stack_pointer: Transition::Delta(3.expr()),
    memory_word_size: Transition::To(memory_expansion.next_memory_word_size()),

    gas_left: Transition::Delta(-gas_cost),
    ..Default::default()
};
let same_context = SameContextGadget::construct(cb, opcode, step_state_transition);

So the rw_counter will increase by cb.rw_counter_offset for the next execution step. In effect this means that rw_counter will increase by copy_rwc_inc, and the only other way this value is constrained in the evm circuit is via the copy table lookup.

The copy circuit

In the copy circuit, the column containing copy_rwc_inc is called rwc_inc_left. The copy circuit is thus responsible to constrain the value of the rwc_inc_left column in the first row of the copy event to be the number of distinct rw table lookups that are performed by it.

However, this is not the case. The top-level function introducing constraints for the copy circuit is CopyCircuitConfig::new in zkevm-circuits/src/copy_circuit.rs. The only place rwc_inc_left is used there is in a call to constrain_rw_counter in zkevm-circuits/src/copy_circuit/copy_gadgets.rs.

That function introduces two different types of constraints, depending on whether or not the copy operation is a memory-to-memory copy.

The case of copies that are not memory-to-memory

The copy table contains a row for each byte read or written during the copy operation. However, reads and writes actually happen per word, which is 32 bytes of length. Multiple bytes will thus be read or written from the same lookup in the rw table. Some relevant columns (or expressions formed from columns) for the discussion here are as follows:

Copy events have rows with read and write rows alternating. For copies that are not memory-to-memory, the first row will read the first byte, the second row will write the first byte, and so on. There is some additional padding logic if the copy operation begins or ends in the middle of a word, but we will ignore this in our exposition here, as it is not relevant to the finding.

We depict below an example for a copy operation that involves copying two words. The r/w column indicates whether the row is a read or write row, the word column indicates which word is being copied, and word_index is the index of the byte currently copied within the current word. In this example, we have not yet taken into account the constraints imposed by constrain_rw_counter.

Here, we indicated the rw_counter and rwc_inc_left entries in the first row with and to indicate that these are (as we have not yet imposed the constraints from constrain_rw_counter) arbitrary values — but looked up by the EVM circuit. Entries marked with a question mark indicate that these are fully unconstrained cells (apart from rw_counter being used in lookups into the rw table).

We are now ready to consider the constraints imposed by constrain_rw_counter in the case where is_memory_copy is 0 (indicating we are not doing a memory-to-memory copy).

// Decrement rwc_inc_left for the next row, when an RWrwc_inc_left operation happens.
let rwc_diff = is_rw_type.expr() * is_row_end.expr();
let new_value = meta.query_advice(rwc_inc_left, CURRENT) - rwc_diff;
let is_last = meta.query_advice(is_last_col, CURRENT);

// ...

// At the end, it must reach 0.
let update_or_finish = select::expr(
    not::expr(is_last.clone()),
    meta.query_advice(rwc_inc_left, NEXT_ROW),
    0.expr(),
);

// ...

cb.condition(not::expr(is_memory_copy.clone()), |cb| {
    cb.require_equal(
        "rwc_inc_left[1] == rwc_inc_left[0] - rwc_diff, or 0 at the end",
        new_value.clone(),
        update_or_finish,
    );
});

// ...

// Maintain rw_counter based on rwc_inc_left. Their sum remains constant in all cases.
cb.condition(not::expr(is_last.expr()), |cb| {
    cb.require_equal(
        "rw_counter[0] + rwc_inc_left[0] == rw_counter[1] + rwc_inc_left[1]",
        meta.query_advice(rw_counter, CURRENT) + meta.query_advice(rwc_inc_left, CURRENT),
        meta.query_advice(rw_counter, NEXT_ROW) + meta.query_advice(rwc_inc_left, NEXT_ROW),
    );
});

As is_rw_type will be 1, rwc_diff will be the same as is_row_end. Thus, new_value will be rwc_inc_left decremented by one at rows for the last byte of a word, otherwise it will be rwc_inc_left unchanged.

There are then two cases to distinguish depending on whether is_last is 1 or 0. If it is 0, then update_or_finish will be rwc_inc_left of the next row. Thus, rwc_inc_left of the next row will be constrained to be the same as for the current row or decremented by 1 in the case of the last byte in a word. We can fill out some parts of the table with this information:

When we are on the very last row, where is_last = 1, we will have update_or_finish = 0 instead. As on that row is_row_end = 1 as well, we have new_value = rwc_inc_left - 1. Thus, the constraint new_value = update_or_finish enforces rwc_inc_left = 1. We can see that we already filled in that cell with . This implies that we must have in this example:

Finally, we consider the remaining constraint:

// Maintain rw_counter based on rwc_inc_left. Their sum remains constant in all cases.
cb.condition(not::expr(is_last.expr()), |cb| {
    cb.require_equal(
        "rw_counter[0] + rwc_inc_left[0] == rw_counter[1] + rwc_inc_left[1]",
        meta.query_advice(rw_counter, CURRENT) + meta.query_advice(rwc_inc_left, CURRENT),
        meta.query_advice(rw_counter, NEXT_ROW) + meta.query_advice(rwc_inc_left, NEXT_ROW),
    );
});

This ensures that the sum of rw_counter and rwc_inc_left cannot change within a copy event. As the sum in the first row of our example is , it must thus be in the remaining rows as well, so we can fill in the remainder of the table:

We would like to emphasize in particular here that the constraints enforce that the value of rwc_inc_left in the first row accurately reflect the number of words read or written. There is a single countdown for read and write rows together.

The case of memory-to-memory copies

In the case of memory-to-memory copies, the table we obtained above would not be adequate. To see why that is, imagine that the copy operation is to copy words 0 and 1 to words 1 and 2. With the above table, the reads and writes happening would be as follows:

Note that with this access order, the first write will overwrite the original value at word 1. However, MCOPY should operate as if an intermediate buffer were used. Thus this access order will not work for MCOPY. It is instead necessary to first carry out all reads and only then perform the writes.

The table that is intended in the case of our example above, but for MCOPY, would be as follows:

Let us now consider the constraints for this case:

// Decrement rwc_inc_left for the next row, when an RWrwc_inc_left operation happens.
let rwc_diff = is_rw_type.expr() * is_row_end.expr();
let new_value = meta.query_advice(rwc_inc_left, CURRENT) - rwc_diff;
let is_last = meta.query_advice(is_last_col, CURRENT);
let is_last_two = meta.query_advice(is_last_col, NEXT_ROW);

// ...

let update_or_finish_mcopy = meta.query_advice(rwc_inc_left, NEXT_STEP);

// ...

// handle is_memory_copy case: for all read steps, `rwc_inc_left` decrease by 1 or 0,
// for all write steps, `rwc_inc_left` decrease by 1 or 0 as well. this is not the same as
// normal case( `rwc_inc_left` decrease by 1 or 0 for consecutive read steps--> write step
// --> read step -->write step ...).
cb.condition(
    is_memory_copy * not::expr(is_last_two) * not::expr(is_last.clone()),
    |cb| {
        cb.require_equal(
            "rwc_inc_left[2] == rwc_inc_left[0] - rwc_diff, or 0 at the end",
            new_value,
            update_or_finish_mcopy,
        );
    },
);

// Maintain rw_counter based on rwc_inc_left. Their sum remains constant in all cases.
cb.condition(not::expr(is_last.expr()), |cb| {
    cb.require_equal(
        "rw_counter[0] + rwc_inc_left[0] == rw_counter[1] + rwc_inc_left[1]",
        meta.query_advice(rw_counter, CURRENT) + meta.query_advice(rwc_inc_left, CURRENT),
        meta.query_advice(rw_counter, NEXT_ROW) + meta.query_advice(rwc_inc_left, NEXT_ROW),
    );
});

For rows except the last two, rwc_inc_left of two rows down (NEXT_STEP is 2, whereas NEXT_ROW is 1) is constrained to be new_value, which is the possibly decremented value of rwc_inc_left of the current row, as before. For the last two rows, there is no constraint imposed. We can fill in the table with this information again. The constraint that the sum of rw_counter and rwc_inc_left stays constant is the same as before, so we fill that in as well. Here we gave the value of rw_counter in the second row the value Z, which is arbitrary and not a constraint.

Implications for the EVM circuit

Using the notation from the above example, the EVM circuit fixes , as this is the current rw_counter at the start of the copy operation. The value of is copy_rwc_inc, which is used in the state transition to obtain the correct rw_counter for the next operation and ultimately constrain the number of nonpadding entries available in the rw table.

As we could see, this value can be chosen arbitrarily. By choosing a value that is larger than the actual number of read and write lookups done for the copy event, an adversary will cause an insufficient number of padding rows to be enforced in the rw table. They will then be able to insert additional rows between the reads and writes done by the copy event and the next legitimate reads and writes. This could include arbitrary writes to storage, including in other contexts.

Another value that can be chosen arbitrarily, and independently of , is , the rw_counter of the first write performed by the copy event. This enables another variant with which an adversary could insert malicious entries into the rw table, even if is set to the intended value. They could repeat the same MCOPY operation twice, and in such a way that during writes done during the copy operation, the previous values are also the same. In that case, the writes looked up by the two MCOPYs will only differ in possibly the rw_counter. The adversary can then actually use the same rw_counter (by setting to the same value), looking up the same entry in the rw table twice. Thus, the number of distinct write lookups done by the two copy events will be half of the intended and counted number. This frees up rows in the rw table that can be used for malicious writes.

Note that choosability of also means that writes can potentially happen in a different order than intended; for example by having two MCOPYs but the writes of the first are performed at the time of the second MCOPY and vice versa.

Impact

For transactions involving execution of MCOPY instructions of nonzero length, and adversary can insert arbitrary malicious writes to the rw table. The correct ordering of reads and writes is also not ensured for MCOPY.

We tested this issue with the variant in which copy_rwc_inc is set to a larger than intended value, allowing to insert a malicious write to the rw table. To do so, we made the following changes to the bus mapping.

--- a/bus-mapping/src/evm/opcodes/mcopy.rs
+++ b/bus-mapping/src/evm/opcodes/mcopy.rs
@@ -66,6 +66,9 @@ fn gen_copy_event(
     let (read_steps, write_steps, prev_bytes) =
         state.gen_copy_steps_for_memory_to_memory(exec_step, src_addr, dst_addr, length)?;
 
+    // Adding additional rw operation
+    state.memory_write_word(exec_step, 0x20u64.into(), Word::from(4247u64))?;
+
     Ok(CopyEvent {
         src_type: CopyDataType::Memory,
         // use call_id as src id for memory --> memory type.

This above first change to gen_copy_event adds an additional memory write when generating the data for the MCOPY execution step. This is the malicious write we insert. To be able to pass the constraints on the number of padding start rows, we must also make copy_rwc_inc too large. For this, we changed CopyEvent::rw_counter_delta to increase the value of copy_rwc_inc by one as well.

--- a/bus-mapping/src/circuit_input_builder/execution.rs
+++ b/bus-mapping/src/circuit_input_builder/execution.rs
@@ -522,8 +522,7 @@ impl CopyEvent {
             // operations.
             return self.full_length();
         }
-
-        (self.is_source_rw() as u64 + self.is_destination_rw() as u64) * (self.full_length() / 32)
+        (self.is_source_rw() as u64 + self.is_destination_rw() as u64) * (self.full_length() / 32) + 1
     }
 }

The code in bus-mapping is used to generate witnesses, so these changes only impact assignments to the circuit, not the constraints (and thus also not the verification key).

We then added the following test case to zkevm-circuits/src/evm_circuit/execution/mcopy.rs:

fn zellic_mcopy_rwc_inc() {
    let src_offset = Word::from(0x20);
    let dest_offset = Word::from(0x40);
    let length = Word::from(0x20);
    let mut code = Bytecode::default();
    code.append(&bytecode! {
        PUSH32(word!("123456789"))
        PUSH2(0x20)
        MSTORE
        PUSH32(length)
        PUSH32(src_offset)
        PUSH32(dest_offset)
        #[start]
        MCOPY
        PUSH2(0x20)
        MLOAD
        PUSH2(0x40)
        MLOAD
        STOP
    });

    let ctx = TestContext::<3, 1>::new(
        None,
        |accs| {
            accs[0]
                .address(address!("0x000000000000000000000000000000000000cafe"))
                .code(code);
            accs[1]
                .address(address!("0x0000000000000000000000000000000000000010"))
                .balance(Word::from(1u64 << 20));
        },
        |mut txs, accs| {
            txs[0]
                .to(accs[0].address)
                .from(accs[1].address)
                .gas(1_000_000.into());
        },
        |block, _tx| block.number(0x1111111),
    )
    .unwrap();

    CircuitTestBuilder::new_from_test_ctx(ctx)
        .params(CircuitsParams {
            max_copy_rows: 1750,
            ..Default::default()
        })
        .run();
}

Note that an MCOPY from 0x20 to 0x40 happens, copying the value 0x123456789 = 4886718345. Then the contents of 0x20 and 0x40 are loaded to the stack, which should both be 0x123456789 = 4886718345.

We ran this test with RUST_BACKTRACE=1 RUST_LOG=trace cargo test zellic_mcopy_rwc_inc -- --nocapture. The test passes, indicating that all constraints are satisfied. Below is an excerpt of the output produced by the state circuit, with the lines reordered so that rw_counter is increasing:

[2024-05-23T13:59:00Z TRACE zkevm_circuits::state_circuit] state circuit assign offset:65479 row:Memory { rw_counter: 41, is_write: true, call_id: 1, memory_address: 32, value: 4886718345, value_prev: 0 }
[2024-05-23T13:59:00Z TRACE zkevm_circuits::state_circuit] state circuit assign offset:65483 row:Memory { rw_counter: 42, is_write: true, call_id: 1, memory_address: 64, value: 0, value_prev: 0 }
[2024-05-23T13:59:00Z TRACE zkevm_circuits::state_circuit] state circuit assign offset:65480 row:Memory { rw_counter: 49, is_write: false, call_id: 1, memory_address: 32, value: 4886718345, value_prev: 4886718345 }
[2024-05-23T13:59:00Z TRACE zkevm_circuits::state_circuit] state circuit assign offset:65484 row:Memory { rw_counter: 50, is_write: true, call_id: 1, memory_address: 64, value: 4886718345, value_prev: 0 }
[2024-05-23T13:59:00Z TRACE zkevm_circuits::state_circuit] state circuit assign offset:65481 row:Memory { rw_counter: 51, is_write: true, call_id: 1, memory_address: 32, value: 4247, value_prev: 4886718345 }
[2024-05-23T13:59:00Z TRACE zkevm_circuits::state_circuit] state circuit assign offset:65482 row:Memory { rw_counter: 55, is_write: false, call_id: 1, memory_address: 32, value: 4247, value_prev: 4247 }
[2024-05-23T13:59:00Z TRACE zkevm_circuits::state_circuit] state circuit assign offset:65485 row:Memory { rw_counter: 56, is_write: false, call_id: 1, memory_address: 64, value: 4886718345, value_prev: 4886718345 }
[2024-05-23T13:59:00Z TRACE zkevm_circuits::state_circuit] state circuit assign offset:65486 row:Memory { rw_counter: 60, is_write: false, call_id: 1, memory_address: 64, value: 4886718345, value_prev: 4886718345 }
[2024-05-23T13:59:00Z TRACE zkevm_circuits::state_circuit] state circuit assign offset:65487 row:Memory { rw_counter: 61, is_write: false, call_id: 1, memory_address: 96, value: 0, value_prev: 0 }

For rw_counter being 49 and 50, we can see the read and write happening due to the copy event. At 51, we then see the illegitimate write that we inserted; a value of 4247 is written, which should not happen according to the bytecode we are executing. Then at counter 55, this smuggled-in value is read again, as part of the MLOAD. There is also a corresponding line with this value being written to the stack, confirming this:

[2024-05-23T13:59:00Z TRACE zkevm_circuits::state_circuit] state circuit assign offset:65478 row:Stack { rw_counter: 54, is_write: true, call_id: 1, stack_pointer: 1023, value: 4247 }

Recommendations

It should be ensured that the read and write rwc_inc_left countdowns match up and end with 1. Concretely, rwc_inc_left for the last read must be one bigger than rwc_inc_left for the first write, and rwc_inc_left for the last write must be 1.

Currently, there are essentially two constraints: rwc_inc_left[i+2] = rwc_inc_left - (1 if last byte of a word, else 0) and rw_counter[i+1] + rwc_inc_left[i+1] = rw_counter[i] + rwc_inc_left[i]. In this notation, rwc_inc_left[i] refers to the value of the rwc_inc_counter column in the i-th row of the copy event. The latter constraint can be rewritten and split up in three cases by plugging in the first one:

  1. For the first read and write, rw_counter[1] - rw_counter[0] = rwc_inc_left[0] - rwc_inc_left[1].

  2. When at the last byte of a word, rw_counter[i+2] = rw_counter[i] + 1.

  3. When not at the last byte of a word, rw_counter[i+2] = rw_counter[i].

To fix this issue, we suggest to add for the memory-to-memory case the constraints:

A. Constrain rwc_inc_left[0] = 2 * rwc_inc_left[1]. In other words, conditional on being on the first row of a copy event that is memory-to-memory, constrain rwc_inc_left to be twice the value of rwc_inc_left in the next row.

B. Constrain the last row's rwc_inc_left to be 1.

We will now discuss why this fixes the issues. Let n be the number of writes and reads (i.e., n writes and also n reads). Then constraint B implies that rwc_inc_left must be 1 for the last write. Together with the constraint rwc_inc_left[i+2] = rwc_inc_left - (1 if last byte of a word, else 0), this implies that for the first write we must have rwc_inc_left[1] = n. Now constraint A implies rwc_inc_left[0] = 2*n. This means the i-th read has rwc_inc_left = 2*n - i and the i-th write has rwc_inc_left = n - i. Now constraint 1 implies that rw_counter[1] = rw_counter[0] + n. Constraints 2 and 3 then imply that the i-th read has rw_counter = rw_counter[0] + i and the i-th write has rw_counter = rw_counter[0] + n + i. That rwc_inc_left[0] = 2*n means that the rw counter delta used back in the EVM circuit will be 2*n, and we just showed that the actual reads and writes looked up by the copy circuit are using rw counters rw_counter[0] through rw_counter[0] + 2*n - 1. There is thus no way anymore to look up a different write or to obtain an incorrect rw counter delta.

See ref for a discussion of a proposal to increase robustness against such vulnerabilities more long-term.

Remediation

This issue has been acknowledged by Scroll, and fixes were implemented in the following commits:

Zellic © 2024Back to top ↑