Lack of comprehensive test suite
Description
During this audit, we observed a number of severe findings that affect the core logic of the codebase. Some of these findings could result in the failure of working on the production environment if not fixed, even if no malicious attack is assumed.
When building a complex contract ecosystem with multiple moving parts and dependencies, comprehensive testing is essential. This includes testing for both positive and negative scenarios. Positive tests should verify that each function's side effect is as expected, while negative tests should cover every revert, preferably in every logical branch.
The test coverage for this project should be expanded to include more than just surface-level functions. It is important to test the invariants required for ensuring security.
Good test coverage has multiple effects.
It finds bugs and design flaws early (preaudit or prerelease).
It gives insight into areas for optimization (e.g., gas cost).
It displays code maturity.
It bolsters customer trust in your product.
It improves understanding of how the code functions, integrates, and operates — for developers and auditors alike.
It increases development velocity long-term.
The last point seems contradictory, given the time investment to create and maintain tests. To expand upon that, tests help developers trust their own changes. It is difficult to know if a code refactor — or even just a small one-line fix — breaks something if there are no tests. This is especially true for new developers or those returning to the code after a prolonged absence. Tests have your back here. They are an indicator that the existing functionality most likely was not broken by your change to the code.
Impact
Writing comprehensive test suites can prevent many logical errors existing in the codebase. Finding ref↗ could have been discovered and remediated before this audit if it was attempted to test the behavior of Aggregation ISM that skips ISMs without metadata.
It is also important to write comprehensive negative tests as well as positive tests. This helps not only uncover complicated issues but also understand the exact behavior of the codebase. Finding ref↗ discovers the straightforward counterexample of the premature optimization, and we believe that writing negative tests would have found the same counterexample and prevented this issue in advance.
We strongly recommend Pragma to strive for 100% code coverage. It has come to our attention that some trivial functions, such as upgrade()
, do not have corresponding tests. Finding ref↗ is a case where achieving 100% code coverage could have discovered the issue.
We also recommend enhancing the integration tests because issues affecting the interactions between contracts cannot be easily discovered with unit tests. For instance, Finding ref↗ discovers the design error of the fee-forwarding mechanism, and we believe it could have been discovered with proper integration tests, including the protocol-fee hook contract.
Recommendations
Consider building a rigorous test suite that includes all functions and possibly attainable edge cases for the Hyperlane protocol.