Endianness for Bytes32
Description
The Bytes32
type (sdk/api_bytes32.go) is intended as an in-circuit representation of the similarly named Solidity type, as the documentation comment states:
// Bytes32 is an in-circuit representation of the solidity bytes32 type.
type Bytes32 struct {
	Val [2]frontend.Variable
}
A Solidity bytes32
type is a sequence of 32 bytes. For storage and the stack, Ethereum operates with 32-byte words, so one bytes32
is exactly one such word. Only memory is addressed bytewise. When storing a bytes32
in memory, the first byte is stored at the lowest address. The following Solidity example demonstrates this:
pragma solidity ^0.8.0;

contract Example {
    function demonstrateBytes32InMemory() public pure returns (string memory) {
        // Initialize bytes32 with a constant example string
        bytes32 b32 = "Hello world!";
        // Overwrite the second byte (lowest address + 1) with a new value, say 'X'
        // 'X' ASCII value is 88
        assembly {
            mstore(mload(0x40), b32) // Store b32 in memory
            mstore8(add(mload(0x40), 1), 88) // Store 0x58 (ASCII 'X') at the second byte position
            b32 := mload(mload(0x40)) // Load b32 back from memory
        }
        // Return the bytes32 as a string
        // Create a new bytes memory representation for string conversion
        bytes memory result = new bytes(32);
        assembly {
            // Copy the bytes32 value into the result bytes array
            mstore(add(result, 32), b32) // Store the bytes32 value in the new bytes array
        }
        return string(result); // Convert bytes to string and return
    }
}
When a type consisting of successive bytes or bits such as bytes32
gets interpreted as an unsigned integer, endianness needs to be taken into account: one may interpret the first byte/bit as the least significant (little endian) or the most significant (big endian). Ethereum uses the big-endian convention for such conversions, as described in the first sentence of Appendix H of the Yellow Paper.
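To make the two conventions concrete, the following minimal Go sketch interprets the same two bytes both ways. The helper names bigEndianToInt and littleEndianToInt are hypothetical and chosen for illustration only; they are not part of the Brevis SDK.

```go
package main

import (
	"fmt"
	"math/big"
)

// bigEndianToInt interprets b with b[0] as the most significant byte,
// which is the convention Ethereum uses for such conversions.
func bigEndianToInt(b []byte) *big.Int {
	return new(big.Int).SetBytes(b)
}

// littleEndianToInt interprets b with b[0] as the least significant byte.
// It reverses a copy of b and then reuses the big-endian conversion.
func littleEndianToInt(b []byte) *big.Int {
	rev := make([]byte, len(b))
	for i, v := range b {
		rev[len(b)-1-i] = v
	}
	return new(big.Int).SetBytes(rev)
}

func main() {
	b := []byte{0x01, 0x02}
	fmt.Println(bigEndianToInt(b))    // 0x0102 = 258
	fmt.Println(littleEndianToInt(b)) // 0x0201 = 513
}
```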
Let us return to the Bytes32
type that is part of the Brevis SDK. Internally, it stores its data in two circuit variables Val[0]
and Val[1]
. The reason two circuit variables must be used is that the finite field of prime order over which the circuit is defined has only order r
, where r
is a 254-bit prime, and is thus too small to hold 32 bytes (256 bits) of data. The most natural expectation would be that Val[0]
will encode bytes 0 through k
for some k
, and Val[1]
will encode bytes k+1
through 31.
Let us consider the toBinaryVars
function, which converts Bytes32
to 256 circuit variables representing the bits making up the 32-byte-long byte string.
// toBinaryVars defines the circuit that decomposes the Variables into little endian bits
func (v Bytes32) toBinaryVars(api frontend.API) []frontend.Variable {
	var bits []frontend.Variable
	bits = append(bits, api.ToBinary(v.Val[0], numBitsPerVar)...)
	bits = append(bits, api.ToBinary(v.Val[1], 32*8-numBitsPerVar)...)
	return bits
}
This implementation fits with the interpretation of Val[0]
and Val[1]
just given; the first numBitsPerVar
bits, which should correspond to the first numBitsPerVar / 8
bytes, are stored in Val[0]
, with the remaining bytes stored in Val[1]
. The api.ToBinary
function decomposes field elements into little-endian bits. Thus, we come to the conclusion that Bytes32
is stored by dividing the bytes up into the first numBitsPerVar / 8
bytes and the remainder, with the former being stored in Val[0]
by interpreting the bytes as a little-endian representation of an unsigned integer in base 256, and, similarly, the latter being stored in Val[1]
.
The FromBinary
function fits with this interpretation:
func (api *Bytes32API) FromBinary(vs ...Uint248) Bytes32 {
	var list List[Uint248] = vs
	values := list.Values()
	for i := len(vs); i < 256; i++ {
		values = append(values, 0)
	}
	res := Bytes32{}
	res.Val[0] = api.g.FromBinary(values[:numBitsPerVar]...)
	res.Val[1] = api.g.FromBinary(values[numBitsPerVar:]...)
	return res
}
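The layout implied by toBinaryVars and FromBinary can be modeled outside the circuit as follows. This is only a sketch: packLittleEndian and leToInt are hypothetical helpers, plain big.Int values stand in for circuit variables, and numBitsPerVar is assumed to be 248 (the actual constant is defined in the SDK).

```go
package main

import (
	"fmt"
	"math/big"
)

// Assumption: the SDK constant numBitsPerVar equals 248.
const numBitsPerVar = 248

// leToInt interprets b as a little-endian base-256 integer (b[0] is least significant).
func leToInt(b []byte) *big.Int {
	rev := make([]byte, len(b))
	for i, v := range b {
		rev[len(b)-1-i] = v
	}
	return new(big.Int).SetBytes(rev)
}

// packLittleEndian is a hypothetical model of the layout implied by
// toBinaryVars and FromBinary: the first numBitsPerVar/8 bytes go into
// lo (Val[0]) and the remaining bytes into hi (Val[1]), each little endian.
func packLittleEndian(data [32]byte) (lo, hi *big.Int) {
	split := numBitsPerVar / 8
	return leToInt(data[:split]), leToInt(data[split:])
}

func main() {
	var data [32]byte
	data[0] = 0xAA
	data[31] = 0xBB
	lo, hi := packLittleEndian(data)
	// Under this layout, data[0] is the least significant byte of lo,
	// and data[31] is the only byte of hi.
	fmt.Printf("lo = %#x, hi = %#x\n", lo, hi)
}
```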
However, ConstBytes32
functions differently:
// ConstBytes32 initializes a constant Bytes32 circuit variable. Panics if the
// length of the supplied data bytes is larger than 32.
func ConstBytes32(data []byte) Bytes32 {
	if len(data) > 32 {
		panic(fmt.Errorf("ConstBytes32 called with data of length %d", len(data)))
	}
	bits := decomposeBits(new(big.Int).SetBytes(data), 256)
	lo := recompose(bits[:numBitsPerVar], 1)
	hi := recompose(bits[numBitsPerVar:], 1)
	return Bytes32{[2]frontend.Variable{lo, hi}}
}
This function is passed a slice of bytes data
. Based on what was discussed before regarding the other functions, we would expect that data[0]
through data[(numBitsPerVar / 8) - 1]
are stored in Val[0]
and the remaining bytes in Val[1]
.
Instead, the function first uses new(big.Int).SetBytes(data)
to obtain a big.Int
from data
. This will interpret data
in big endian. So data[0]
will be the most significant byte of the resulting integer. This integer is then converted to bits with decomposeBits
, which orders the bits in little endian. Thus, for a full 32-byte input, data[0]
, as the most significant byte, will occur as bits 248 to 255. Finally, the bits are recomposed (using little-endian interpretation again) into two values. The value lo
, used for Val[0]
, will consist of the first numBitsPerVar
bits, which will thus correspond to bytes data[31-0]
to data[31-((numBitsPerVar / 8) - 1)]
. So the least significant eight bits of Val[0]
will be data[31]
, the next least significant eight bits will be data[30]
, and so on, up to the most significant eight bits data[31-((numBitsPerVar / 8) - 1)]
. The second field Val[1]
will have as the least significant eight bits the byte data[31-(numBitsPerVar / 8)]
.
The way ConstBytes32 handles its argument thus does not fit an interpretation of the Bytes32 data type that is also compatible with the other functions; ConstBytes32 reverses the order of the bytes.
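The byte reversal can be reproduced outside the circuit with the sketch below. It is a model rather than the SDK code: constBytes32Model is a hypothetical name, the bit decomposition and recomposition are replaced by an equivalent mask and shift on a big.Int, and numBitsPerVar is assumed to be 248.

```go
package main

import (
	"fmt"
	"math/big"
)

// Assumption: the SDK constant numBitsPerVar equals 248.
const numBitsPerVar = 248

// constBytes32Model mimics what ConstBytes32 computes: data is read as a
// big-endian integer, whose low numBitsPerVar bits become lo (Val[0]) and
// whose remaining high bits become hi (Val[1]).
func constBytes32Model(data []byte) (lo, hi *big.Int) {
	n := new(big.Int).SetBytes(data) // data[0] is the most significant byte
	mask := new(big.Int).Lsh(big.NewInt(1), numBitsPerVar)
	mask.Sub(mask, big.NewInt(1))
	lo = new(big.Int).And(n, mask)
	hi = new(big.Int).Rsh(n, numBitsPerVar)
	return lo, hi
}

func main() {
	data := make([]byte, 32)
	data[0] = 0xAA
	data[31] = 0xBB
	lo, hi := constBytes32Model(data)
	// The last input byte lands in the least significant byte of lo,
	// while the first input byte lands in hi.
	fmt.Printf("lo = %#x, hi = %#x\n", lo, hi)
}
```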
It would be instructive to also look at the following function, SlotOfArrayElement
, from sdk/circuit_api.go:
// SlotOfArrayElement computes the storage slot for an element in a solidity
// array state variable. arrSlot is the plain slot of the array variable.
// index determines the array index. offset determines the
// offset (in terms of bytes32) within each array element.
func (api *CircuitAPI) SlotOfArrayElement(arrSlot Bytes32, elementSize int, index, offset Uint248) Bytes32 {
	//api.Uint248.AssertIsLessOrEqual(offset, ConstUint248(elementSize))
	o := api.g.Mul(index.Val, elementSize)
	return Bytes32{Val: [2]variable{
		api.g.Add(arrSlot.Val[0], o, offset.Val),
		arrSlot.Val[1],
	}}
}
Here, slots in storage are addressed with Bytes32
. By the Ethereum specification, these should be interpreted as big endian when converting them to the unsigned integers that index slots, in order to add the offset. However, the function adds the (assumed small) offset to Val[0]
, which suggests that the slot address is actually stored in little endian in the Bytes32
given as argument, and similarly for the return value.
This is compatible with ConstBytes32
, if a []byte
input is obtained by converting the address to 32 bytes using Ethereum's big-endian standard and then converted to Bytes32
using ConstBytes32
, which thus flips the order of the bytes. The return value of SlotOfArrayElement
could then be compared against similar addresses also obtained in flipped representation using ConstBytes32
. Both ConstBytes32
and SlotOfArrayElement
are implemented with a surprising reversal of the order of the bytes, but these cancel each other out so that they are compatible with each other.
Impact
The interface for Bytes32
and its use is confusing and inconsistent regarding how the type is to be interpreted and in which order the bytes are stored. This can cause mistakes when users of the SDK use this type.
The root cause of this is that Bytes32 is used in SlotOfArrayElement and ConstBytes32 as if it were a type of 256-bit unsigned integers. Endianness questions arise whenever one converts between a type consisting of a list of values (such as a list of bytes) and a type for numeric values. Using the same type with both interpretations makes the need for such conversions particularly confusing. The Bytes32
type should thus not be used like this; for 256-bit unsigned integers, a Uint256
type should be used. This would allow for explicit and thereby cleaner and more transparent conversions.
Recommendations
We recommend clearly documenting which functions flip the order of bytes and, on conversion, which endianness is used.
We also recommend resolving the current discrepancy with regard to the ordering of the bytes/bits between ConstBytes32
and the toBinaryVars
and FromBinary
functions.
The option we would suggest would be to use a new type Uint256
for use cases such as SlotOfArrayElement
. It could be documented that this type stores its data in little endian. If the current ConstBytes32 were then copied for use with Uint256 and renamed to something like ConstFromBigEndianBytes, it would be transparent how this type behaves. As the function takes arguments in big-endian order but stores data in little-endian order, it reverses the order, which is then the expected behavior. The ConstBytes32
function for Bytes32
should in this case be changed to not flip the order of the bytes, to establish compatibility with toBinaryVars
and FromBinary
.
An alternative would be to change ConstBytes32
to store the first byte in Val[0]
and the last 31 bytes in Val[1]
. Those 31 bytes should be stored so that the least significant eight bits of Val[1]
correspond to byte 31. If one does it this way, then Val[0]
and Val[1]
would be ordered in the expected way, with Val[0]
holding lower indexed bytes than Val[1]
, and compatibly with toBinaryVars
and FromBinary
, if they are changed to take into account that Val[1]
now stores a list of bytes in big-endian instead of little-endian order as before. Additionally, it would still be possible to do the addition needed in SlotOfArrayElement
by just adding to one variable (now Val[1]
).
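A minimal out-of-circuit sketch of this alternative layout is given below. The name splitFirstByte is hypothetical, and plain big.Int values stand in for the circuit variables; it is meant only to illustrate the byte placement described above, not a definitive implementation.

```go
package main

import (
	"fmt"
	"math/big"
)

// splitFirstByte is a hypothetical model of the alternative layout:
// v0 (Val[0]) holds byte 0, and v1 (Val[1]) holds bytes 1 through 31
// in big-endian order, so byte 31 is the least significant byte of v1.
func splitFirstByte(data [32]byte) (v0, v1 *big.Int) {
	v0 = big.NewInt(int64(data[0]))
	v1 = new(big.Int).SetBytes(data[1:])
	return v0, v1
}

func main() {
	var data [32]byte
	data[0] = 0xAA
	data[31] = 0x05
	v0, v1 := splitFirstByte(data)
	// Adding a small slot offset only touches v1 in this layout.
	v1.Add(v1, big.NewInt(7))
	fmt.Println(v0, v1) // 170 12
}
```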
Remediation
In , Brevis renamed the ConstBytes32
function to ConstFromBigEndianBytes
.