The _toLower function incorrectly handles Unicode characters
Description
The internal _toLower function converts the hashtag string from uppercase to lowercase. It works byte by byte and correctly processes only the ASCII characters A to Z. However, the input string can contain Unicode characters, a far larger set that extends well beyond the basic Latin alphabet. If the input contains uppercase characters outside the ASCII range, the function does not convert them, and they remain unchanged in the output.
function _toLower(string memory str) internal pure returns (string memory) {
    bytes memory bStr = bytes(str);
    bytes memory bLower = new bytes(bStr.length);
    for (uint i = 0; i < bStr.length; i++) {
        // Uppercase character...
        if ((uint8(bStr[i]) >= 65) && (uint8(bStr[i]) <= 90)) {
            // So we add 32 to make it lowercase
            bLower[i] = bytes1(uint8(bStr[i]) + 32);
        } else {
            bLower[i] = bStr[i];
        }
    }
    return string(bLower);
}
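To make the behavior concrete, the following is a minimal sketch that wraps a copy of the function above in a hypothetical ToLowerDemo contract (the contract name and the demo function are assumptions for illustration, not part of the audited code). It relies on the fact that "É" (U+00C9) is encoded in UTF-8 as the two bytes 0xC3 0x89, neither of which falls in the range 65 to 90, so the function copies both bytes through unchanged.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical demo contract; only _toLower is taken from the finding.
contract ToLowerDemo {
    function _toLower(string memory str) internal pure returns (string memory) {
        bytes memory bStr = bytes(str);
        bytes memory bLower = new bytes(bStr.length);
        for (uint i = 0; i < bStr.length; i++) {
            // Only bytes 65..90 (ASCII 'A'..'Z') are shifted to lowercase
            if ((uint8(bStr[i]) >= 65) && (uint8(bStr[i]) <= 90)) {
                bLower[i] = bytes1(uint8(bStr[i]) + 32);
            } else {
                bLower[i] = bStr[i];
            }
        }
        return string(bLower);
    }

    function demo() external pure returns (bool asciiLowered, bool unicodeUnchanged) {
        // ASCII input is lowercased as intended
        asciiLowered =
            keccak256(bytes(_toLower("HASHTAG"))) == keccak256(bytes("hashtag"));

        // Uppercase "É" is two bytes in UTF-8 (0xC3 0x89); both are outside
        // 65..90, so the output still contains the uppercase character
        unicodeUnchanged =
            keccak256(bytes(_toLower(unicode"É"))) == keccak256(bytes(unicode"É"));
    }
}
```

Under these assumptions, both flags returned by demo() should be true: the ASCII hashtag is normalized, while the non-ASCII character passes through untouched.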
Impact
If the input hashtag string contains uppercase non-ASCII characters, the _toLower function leaves them unchanged instead of converting them to lowercase. Other functionality of this contract that depends on the result may therefore behave incorrectly.
Recommendations
Consider restricting the input character set to only the required ASCII characters if the project does not intend to support the extended Unicode character set.
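One possible shape for such a restriction is sketched below. The library name HashtagCharset, the function isAsciiAlphanumeric, and the allowed set (A to Z, a to z, 0 to 9) are all assumptions made for illustration. The fix actually applied in the referenced commit may differ, and the project may need a different set of permitted characters.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical helper, not taken from the audited code base.
library HashtagCharset {
    // Returns true only if every byte of str is an ASCII letter or digit.
    // Any byte >= 0x80 fails the check, so every multi-byte UTF-8
    // sequence (i.e. every non-ASCII character) is rejected.
    function isAsciiAlphanumeric(string memory str) internal pure returns (bool) {
        bytes memory b = bytes(str);
        for (uint256 i = 0; i < b.length; i++) {
            uint8 c = uint8(b[i]);
            bool isUpper = c >= 65 && c <= 90;  // 'A'..'Z'
            bool isLower = c >= 97 && c <= 122; // 'a'..'z'
            bool isDigit = c >= 48 && c <= 57;  // '0'..'9'
            if (!(isUpper || isLower || isDigit)) {
                return false;
            }
        }
        return true;
    }
}
```

A caller could guard the entry point with something like require(HashtagCharset.isAsciiAlphanumeric(hashtag), "invalid hashtag") before calling _toLower. Note that this sketch accepts the empty string, so a length check would need to be added separately if empty hashtags are not allowed.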
Remediation
This issue has been acknowledged by SAX, and a fix was implemented in commit dfb4e3f1.