Finality provider can get stuck in an infinite loop
Description
See Finding ref↗ for a more detailed background on the finality provider.
The retryCheckRandomnessUntilBlockFinalized()
function is used by finality providers to wait for one of two conditions:
Exit-condition 1 — The block's randomness number was committed.
Exit-condition 2 — The block was already finalized.
The function additionally has logic to only retry a specific number of times. This is used to handle the case where a bug in the randomnessCommitmentLoop()
prevents the block's randomness number from being committed. If enough finality providers were to run into such a bug, then all of them would be stuck in retryCheckRandomnessUntilBlockFinalized()
forever if there was not a limit to the number of retries.
However, the issue here is that the retry logic only occurs through an error condition, which is only reachable through a query to Babylon. This error condition will realistically never be hit and thus prevents the retry logic from working:
hasRand, err := fp.hasRandomness(targetBlock)
if err != nil {
fp.logger.Debug(
"failed to check last committed randomness",
zap.String("pk", fp.GetBtcPkHex()),
zap.Uint32("current_failures", failedCycles),
zap.Uint64("target_block_height", targetBlock.Height),
zap.Error(err),
)
// Zellic: retry logic here
failedCycles += 1
if failedCycles > uint32(fp.cfg.MaxSubmissionRetries) {
return fmt.Errorf("reached max failed cycles with err: %w", err)
}
}
Impact
This issue leads to an infinite loop in the finality providers. If enough finality providers get stuck in this loop, it will lead to a finality halt.
The impact is reduced to informational due to the following reasons:
It requires a separate bug in the
randomnessCommitmentLoop()
that prevents the randomness number from being committed.There is no attacker. The bug will trigger by itself under certain conditions, thus having a low likelihood.
Recommendations
Modify the retry logic such that it occurs every time the for loop in retryCheckRandomnessUntilBlockFinalized()
reruns.
Remediation
This issue has been acknowledged by Babylon, and a fix was implemented in commit 9fe04d26↗.