Unchecked UTF-8 decoding enables memory corruption
Description
The following function fails to ensure the vector of strings is a well-formed UTF-8 before returning it:
public fun to_vector_string(v: vector<u8>): vector<String> {
from_bytes<vector<String>>(v)
}
So, it is possible to obtain a String containing invalid UTF-8 sequences.
Impact
A malformed UTF-8 String may be able to cause an out-of-bounds (OOB) memory read, leading to a segmentation fault or potentially providing an attacker with an arbitrary read primitive — which would allow an attacker to break chain consensus by reading OOB, nondeterministic memory. Arbitrary read primitives may also be used in conjunction with other vulnerabilities to achieve arbitrary code execution.
Additionally, there is a possibility that the String may allow an attacker to overwrite bytes on the heap. Often, heap-corruption vulnerabilities can be directly exploited to achieve code execution; that is, in the worst case scenario, an attacker could create a malicious module that exploits this vulnerability to get remote code execution (RCE) on validators' machines.
Due to the limited time available for this assessment, we did not attempt to exploit this vulnerability beyond the following simple proof of concept, which demonstrates the ability to obtain a String containing invalid UTF-8 sequences:
#[test]
fun zellic_malformed_utf8_oob_read() {
let invalid_utf8_vec = b"\x01\x01\x80";
// obtain invalid String
let res = to_vector_string(invalid_utf8_vec);
assert!(!vector::is_empty(&res), 0);
let s = vector::pop_back(&mut res);
// trigger crash
let needle = string::utf8(b"");
string::index_of(&s, &needle);
}
The output of the test is the following:
thread '<unnamed>' panicked at library/core/src/panicking.rs:156:5:
unsafe precondition(s) violated: hint::unreachable_unchecked must never be reached
stack backtrace:
0: rust_begin_unwind
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/
panicking.rs:645:5
1: core::panicking::panic_nounwind_fmt::runtime
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/
panicking.rs:110:18
2: core::panicking::panic_nounwind_fmt
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/
panicking.rs:123:9
3: core::panicking::panic_nounwind
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/
panicking.rs:156:5
4: core::hint::unreachable_unchecked::precondition_check
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/
intrinsics.rs:2799:21
5: core::hint::unreachable_unchecked
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/
hint.rs:101:5
6: core::option::Option<T>::unwrap_unchecked
at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/
option.rs:1041:30
7: core::str::validations::next_code_point
...
Following the stack trace, we can identify the cause of the crash as being the following code↗:
pub const unsafe fn unwrap_unchecked(self) -> T {
match self {
Some(val) => val,
// SAFETY: the safety contract must be upheld by the caller.
None => unsafe { hint::unreachable_unchecked() },
}
}
Recommendations
Ensure all Strings in the vector are valid before returning the vector:
public fun to_vector_string(v: vector<u8>): vector<String> {
- from_bytes<vector<String>>(v)
+ let vec_string = from_bytes<vector<String>>(v);
+ vector::for_each_ref(&vec_string, |s| {
+ assert!(string::internal_check_utf8(string::bytes(s)), EINVALID_UTF8);
+ });
+ vec_string
}