Shallow copies leading to unintended side effects
Description
Some functions provided by the sdk take in pointer-like types as arguments (possibly occurring in a nested manner, as fields of a struct) and return these pointer-like types again (usually as part of a struct). In most such instances, a deep copy is performed, with the data that is being pointed at being copied. In some cases, however, the pointer is copied instead. This results in only a single backing data, so any modification done via one of the two identical pointers will also change the data visible through the other one.
A concrete example of this occurs in the ZipMap2
function implemented in sdk/datastream.go, which is implemented as follows:
// ZipMap2 zips a data stream with a list and apply the map function over the
// zipped data. The underlying toggles of the result data stream depends on the
// toggles from the source data stream. Panics if the underlying data lengths
// mismatch
// Example: ZipMap2([1,2,3], [4,5,6], mySumFunc) -> [5,7,9]
func ZipMap2[T0, T1, R CircuitVariable](
a *DataStream[T0], b List[T1],
zipFunc ZipMap2Func[T0, T1, R],
) *DataStream[R] {
if la, lb := len(a.underlying), len(b); la != lb {
panic(fmt.Errorf("cannot zip: inconsistent underlying array lengths %d and %d", la, lb))
}
res := make([]R, len(a.underlying))
for i := range a.underlying {
va, vb := a.underlying[i], b[i]
res[i] = zipFunc(va, vb)
}
return newDataStream(a.api, res, a.toggles)
}
Note that data streams consist of a slice of some type T
and a slice of the same length of Boolean toggles, which are interpreted as indicating whether the corresponding component of the data slice is enabled. So the ZipMap2
function zips the list coming from a data stream with another list by applying a zipping function and returns a new data stream with this data and with toggles passed through from the input data stream.
Instead of copying the slice of toggles, ZipMap2
merely reuses the slice for the return data stream. But slices are pointer-like, so the issue described above occurs.
The following test demonstrates what can happen because of this:
type TestZellicZip2Circuit struct {
}
func TestZellicZip2(t *testing.T) {
c := &TestZellicZip2Circuit{}
err := test.IsSolved(c, c, ecc.BN254.ScalarField())
check(err)
}
func (c *TestZellicZip2Circuit) Define(g frontend.API) error {
api := NewCircuitAPI(g)
u248 := api.Uint248
// The user creates some datastream...
input := DataPoints[Uint248]{
Raw: newU248s([]frontend.Variable{1, 2, 3}...),
Toggles: []frontend.Variable{1, 0, 0},
}
in := NewDataStream(api, input)
in1 := newU248s([]frontend.Variable{5, 5, 5}...)
// ... and computes a zip involving it.
zippedOnlyFirst := ZipMap2(in, in1, func(a Uint248, b Uint248) Uint248 { return u248.Add(a, b) })
// The user assumes that zippedOnlyFirst will not be mutated by changing the original `toggles` list, and so reuses it.
in.underlying[0].Val = 42
in.underlying[1].Val = 43
in.underlying[2].Val = 44
in.toggles[1] = 1
in.toggles[2] = 1
zippedFull := ZipMap2(in, in1, func(a Uint248, b Uint248) Uint248 { return u248.Add(a, b) })
// However, zippedOnlyFirst has been changed as well now.
fmt.Println("The original ZipMap2 result: Data is as expected, but toggles have been unintentionally modified: now data is enabled that was not indended to be!")
zippedOnlyFirst.Show()
fmt.Println("The new ZipMap2 result: Data and toggles are updated as expected.")
zippedFull.Show()
return nil
}
In this test, a data stream is instantiated with certain toggles disabled. After applying ZipMap2
to this data stream, the data stream's toggles are modified by enabling some that were previously disabled. This will also enable entries of the zipped data stream, as can be seen in the output of the above test:
=== RUN TestZellicZip2
The original ZipMap2 result: Data is as expected, but toggles have been unintentionally modified: now data is enabled that was not indended to be!
+---+------+--------+
| # | DATA | TOGGLE |
+---+------+--------+
| 0 | 6 | 1 |
| 1 | 7 | 1 |
| 2 | 8 | 1 |
+---+------+--------+
The new ZipMap2 result: Data and toggles are updated as expected.
+---+------+--------+
| # | DATA | TOGGLE |
+---+------+--------+
| 0 | 47 | 1 |
| 1 | 48 | 1 |
| 2 | 49 | 1 |
+---+------+--------+
There are also other parts of the codebase where such shallow copies may be surprising to users, such as the ZipMap3
function that is analogous to ZipMap2
, the Clone
function for CircuitInput
in sdk/circuit_input.go, or functions like NewDataStream
, though in the latter case, this behavior might be slightly less surprising.
Impact
Users that are unaware of the shallow copies might unintentionally change data that they did not wish to change.
Let us describe one particular hypothetical scenario where a user may make some choices that can result in significant problems due to the above.
In this scenario, the application developer is working with a list of some type of data. Some part of that data (referred from now as part A) may be what is to be used for some purpose. But then there are some additional entries (part B) that should also be included in some other computations. Finally, there may be some padding to a constant length. The developer may want to make use of the DataStream
type to organize their data. For part A, they may use such a data stream with toggles enabling only those entries that are needed for part A, and analogously they would want a second data stream with toggles enabling the entries of part A as well as those for part B. In pseudocode, part of their circuit may look like this:
// This function gets some data (consisting ultimately of circuit variables), as well as circuit variables that should be Boolean and indicate which entries of data are part A and which are part B.
// We assume that part A data is something that the prover is not supposed to be able to choose, so a commitment is exposed that will be verified by the verifier. But the additional part B data entries are more freely choosable witnesses by the prover.
func complexABComputation(data []T, togglesA []frontend.Variable, togglesB []frontend.Variable) (/* return types */) {
// Checks that togglesA[i] and togglesB[i] are Boolean and not both enabled.
// ...
toggles := make([]frontend.Variable, len(togglesA))
copy(toggles, togglesA)
input := DataPoints[T]{
Raw: data // circuit variables with the data
Toggles: toggles // circuit variables reflecting which entries are part A
}
ds := NewDataStream(api, input)
// To ensure malicious provers can not cheat with the part A data, we will expose a commitment that commits to the toggles for part A, and commits to the A-enabled data entries.
commitment := calculateCommitment(ds.underlying, ds.toggles)
// `commitment` will be some kind of commitment that is constrained properly with regards to the circuit variables data[i] and togglesA[i]
resultA := ZipMap2(ds, someOtherData, someFunction)
// A result was now computed based on the datastream.
// The intention is that only those entries of the resulting datastream will be enabled that held that were A-entries, as encoded in the toggles commited to with `commitment`.
// If nothing further was done, this would be the case.
// Now the developer wants to work with the datastream with part B enabled as well. They do not need to do further computations with the original A-data anymore, and they do not know that `ZipMap2` reused the toggles slice rather than copying it. They thus believe it is fine to assign new circuit variables to the entries of the toggles slice.
for i := 0; i < len(ds.toggles); i++ {
ds.toggles[i] = api.Or(togglesA[i], togglesB[i])
}
// However, now resultA will have toggles that use the circuit variables just created (api.Or(togglesA[i], togglesB[i])) rather than the original circuit variables togglesA[i]!
// This makes resultA incorrect.
resultB := // some computation using ds
return commitment, resultA, resultB
// The caller will ultimately expose commitment, and use resultA and resultB.
}
In the example outlined above, a malicious prover may assign some part B entries specially crafted data that will cause computations done with resultsA
(where these entries are unintendedly enabled) to result in a profitable outcome for them. The commitment, which is supposed to prevent this, is ineffective, as it only commits to the entries and toggles for part A. Thus, this could amount to a critical vulnerability in this hypothetical application circuit.
The above example is hypothetical; however, it does seem plausible that usage with issues such as the described one could occur.
Recommendations
We recommend to make the sdk more robust against these kind of problems by deep copying. We have not done an exhaustive search for shallow copies, so the list given in the description subsection above should not be taken to be complete. In those cases, where public functions perform shallow copies of pointer-like types and the implementation is not changed, we recommend to clearly document this to avoid users making the discussed mistake.
Remediation
In , ZipMap2
and ZipMap3
were modified to copy rather than reuse the toggles slice.