Finality provider can crash when submitting signature on finalized block
Description
The finality-provider is a tool run by all finality providers. It automatically fetches new Babylon blocks and commits a randomness number to these blocks before providing finality signatures.
Finality-signature submission occurs in the finalitySigSubmissionLoop() function. At a high level, this function does the following:
1. Ensures the finality provider has voting power at the current block height.
2. Waits until the randomnessCommitmentLoop() function has committed a randomness number to the current block, or until the block has already achieved finality because other finality providers supplied enough signatures.
3. Tries to submit a finality signature for the block, retrying at an interval.
Step 2 uses the retryCheckRandomnessUntilBlockFinalized() function. The important things to note are its two exit conditions:
Exit-condition 1 — The block's randomness number was committed.
Exit-condition 2 — The block was already finalized.
In both cases, the function returns nil, signaling that finalitySigSubmissionLoop() should go on to submit a finality signature. This is incorrect when exit-condition 2 was the reason retryCheckRandomnessUntilBlockFinalized() returned.
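To make the ambiguity concrete, here is a minimal sketch of a wait function with the same two exit conditions. It is not the finality provider's actual code: the blockState type, the poll callback, and the maxPolls bound are hypothetical simplifications (the real function retries on a timer). The point is only that the caller receives nil in both cases and cannot tell them apart.

```go
package main

import "fmt"

// blockState is a hypothetical stand-in for what the finality provider
// can observe about a block. It is not a type from the real code base.
type blockState struct {
	randomnessCommitted bool
	finalized           bool
}

// waitForRandomnessOrFinality mimics the shape of
// retryCheckRandomnessUntilBlockFinalized(): it polls until one of the
// two exit conditions holds and returns nil for both of them.
func waitForRandomnessOrFinality(poll func() blockState, maxPolls int) error {
	for i := 0; i < maxPolls; i++ {
		s := poll()
		if s.randomnessCommitted {
			return nil // exit-condition 1
		}
		if s.finalized {
			return nil // exit-condition 2, indistinguishable from 1
		}
	}
	return fmt.Errorf("gave up after %d polls", maxPolls)
}

func main() {
	committed := func() blockState { return blockState{randomnessCommitted: true} }
	finalizedOnly := func() blockState { return blockState{finalized: true} }

	// Both calls report success, although only the first block is
	// actually safe to sign.
	fmt.Println(waitForRandomnessOrFinality(committed, 3) == nil)     // true
	fmt.Println(waitForRandomnessOrFinality(finalizedOnly, 3) == nil) // true
}
```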
The issue with exit-condition 2 is that, when the finality provider attempts to submit a finality signature, the block's randomness number has likely not been committed at all. In this scenario, the AddFinalitySig() message handler in Babylon returns the ErrPubRandNotFound error:
// ensure the finality provider has committed public randomness
pubRand, err := ms.GetPubRand(ctx, fpPK, req.BlockHeight)
if err != nil {
	return nil, types.ErrPubRandNotFound
}
This error, in turn, is treated as one of many unrecoverable errors on the finality provider:
var unrecoverableErrors = []*sdkErr.Error{
	finalitytypes.ErrBlockNotFound,
	finalitytypes.ErrInvalidFinalitySig,
	finalitytypes.ErrNoPubRandYet,
	finalitytypes.ErrPubRandNotFound,
	finalitytypes.ErrTooFewPubRand,
	btcstakingtypes.ErrFpAlreadySlashed,
}
Therefore, when the finality provider attempts to submit a finality signature for this block, it receives ErrPubRandNotFound and subsequently exits.
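The effect of classifying an error as unrecoverable can be sketched as follows. This is an illustration using only the standard library, not the finality provider's implementation: the sentinel errors, isUnrecoverable(), and submitWithRetry() are hypothetical names, and the real list holds Cosmos SDK error values rather than plain errors. A transient error is retried, while an unrecoverable one ends the loop on the first attempt.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinel errors standing in for the finalitytypes errors.
var (
	errBlockNotFound   = errors.New("block not found")
	errPubRandNotFound = errors.New("public randomness not found")
	errTimeout         = errors.New("request timed out")
)

// unrecoverable lists the errors that must not be retried.
var unrecoverable = []error{errBlockNotFound, errPubRandNotFound}

// isUnrecoverable reports whether err matches (or wraps) a listed error.
func isUnrecoverable(err error) bool {
	for _, target := range unrecoverable {
		if errors.Is(err, target) {
			return true
		}
	}
	return false
}

// submitWithRetry calls submit until it succeeds, returns an
// unrecoverable error, or maxAttempts is reached. It returns the number
// of attempts made and the last error.
func submitWithRetry(submit func() error, maxAttempts int) (int, error) {
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		err = submit()
		if err == nil || isUnrecoverable(err) {
			return attempt, err
		}
	}
	return maxAttempts, err
}

func main() {
	fatal := func() error { return fmt.Errorf("submit: %w", errPubRandNotFound) }
	n, err := submitWithRetry(fatal, 3)
	fmt.Println(n, isUnrecoverable(err)) // 1 true

	transient := func() error { return errTimeout }
	n, err = submitWithRetry(transient, 3)
	fmt.Println(n, isUnrecoverable(err)) // 3 false
}
```

In this sketch the caller decides what to do with an unrecoverable error; per the description above, the finality provider exits.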
Impact
If enough finality providers run into this issue, the chain will be left in a state where block finality can never reach quorum, thus leading to a finality halt.
A finality halt is critical in nature. However, for the following reasons, we think this bug has a medium level of impact:
There is no attacker. The bug triggers by itself only under certain conditions, so its likelihood is low.
The finality providers can just restart to fix the issue.
Recommendations
Add logic to retryCheckRandomnessUntilBlockFinalized() such that it returns a distinct value when exit-condition 2 was the reason for exiting the function (i.e., the block was finalized). The finalitySigSubmissionLoop() function can then skip the block.
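One possible shape for this recommendation is sketched below. It is an assumption-laden illustration, not the fix that Babylon shipped: errBlockFinalized, blockState, waitForRandomness(), and processBlock() are hypothetical names, and polling is bounded by a counter instead of a timer. The wait function returns a sentinel error for the finalized case, and the caller treats that sentinel as "skip this block" rather than as a failure.

```go
package main

import (
	"errors"
	"fmt"
)

// errBlockFinalized is a hypothetical sentinel that marks exit-condition 2.
var errBlockFinalized = errors.New("block already finalized")

// blockState is a hypothetical view of a block's status.
type blockState struct {
	randomnessCommitted bool
	finalized           bool
}

// waitForRandomness returns nil only when randomness was committed and
// returns errBlockFinalized when the block was finalized first.
func waitForRandomness(poll func() blockState, maxPolls int) error {
	for i := 0; i < maxPolls; i++ {
		s := poll()
		if s.randomnessCommitted {
			return nil
		}
		if s.finalized {
			return errBlockFinalized
		}
	}
	return fmt.Errorf("gave up after %d polls", maxPolls)
}

// processBlock submits a signature only when randomness was committed.
// It reports whether a signature was submitted.
func processBlock(poll func() blockState, submit func() error) (bool, error) {
	if err := waitForRandomness(poll, 3); err != nil {
		if errors.Is(err, errBlockFinalized) {
			return false, nil // skip the block, nothing to sign
		}
		return false, err
	}
	if err := submit(); err != nil {
		return false, err
	}
	return true, nil
}

func main() {
	submit := func() error { return nil }

	finalized := func() blockState { return blockState{finalized: true} }
	ok, err := processBlock(finalized, submit)
	fmt.Println(ok, err) // false <nil>

	committed := func() blockState { return blockState{randomnessCommitted: true} }
	ok, err = processBlock(committed, submit)
	fmt.Println(ok, err) // true <nil>
}
```

A sentinel error keeps the existing error-returning signature. A boolean or enum return value would work equally well if changing the signature is acceptable.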
Remediation
This issue has been acknowledged by Babylon, and a fix was implemented in commit 9fe04d26.