Skip to content

Fixes from security review#1067

Open
leethomason wants to merge 5 commits intomasterfrom
leethomason/security-review
Open

Fixes from security review#1067
leethomason wants to merge 5 commits intomasterfrom
leethomason/security-review

Conversation

@leethomason
Copy link
Owner

Fix potential overflows.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses potential numeric overflow/invalid-range handling when parsing numeric character references, and adds regression coverage to ensure boundary behavior is correct.

Changes:

  • Add regression tests for numeric character references at/above the Unicode max (U+10FFFF / U+110000) and for extremely long hex entities.
  • Harden XMLUtil::GetCharacterRef to reject out-of-range numeric references during accumulation to prevent wrap/overflow-based bypasses.
  • Remove an outdated build/formatting comment block from tinyxml2.h.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
xmltest.cpp Adds security regression tests covering max-code-point and overflow-sized numeric entities.
tinyxml2.cpp Adds in-loop bounds checks in numeric entity parsing and fixes an incorrect length pointer check.
tinyxml2.h Removes an obsolete comment block with build/formatting notes.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +2706 to +2723
// Regression: U+10FFFF is the last valid Unicode code point and must
// parse correctly. Fix #2 must not reject the maximum valid value.
XMLDocument doc;
doc.Parse( "<t v='&#x10FFFF;'/>" );
XMLTest( "Numeric ref U+10FFFF: no error", false, doc.Error() );
const char* v = doc.FirstChildElement()->Attribute( "v" );
// U+10FFFF encodes to the 4-byte UTF-8 sequence F4 8F BF BF.
const char expected[] = {
static_cast<char>(0xF4), static_cast<char>(0x8F),
static_cast<char>(0xBF), static_cast<char>(0xBF), 0
};
XMLTest( "Numeric ref U+10FFFF: correct UTF-8 output", expected, v );
}
{
// Fix #2 boundary: U+110000 is one above the maximum code point.
// The in-loop overflow guard must catch this before ucs is written,
// leaving the entity as a literal (starting with '&').
XMLDocument doc;
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new test comments refer to "Fix #2", but there’s no local context explaining what #2 is (issue/PR/CVE), which makes the intent harder to understand later. Consider rewording these to reference the specific behavior being tested (e.g., “GetCharacterRef max code point handling”) or link to the relevant issue/CVE in the comment.

Copilot uses AI. Check for mistakes.
Comment on lines 554 to 562
const unsigned int digitScaled = mult * digit;
// Reject before adding: if digitScaled alone exceeds MAX_CODE_POINT,
// or if adding it to ucs would exceed it (checked without overflow by
// testing ucs > MAX_CODE_POINT - digitScaled, safe since digitScaled
// <= MAX_CODE_POINT at this point).
if (digitScaled > MAX_CODE_POINT || ucs > MAX_CODE_POINT - digitScaled) {
return 0;
}
ucs += digitScaled;
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

digitScaled is declared as unsigned int while mult, digit, ucs, and MAX_CODE_POINT are uint32_t. On platforms where unsigned int is narrower than 32 bits, digitScaled can truncate and bypass the overflow/range checks. Make digitScaled a uint32_t (or wider) to keep the math and comparisons in a consistent width.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants