Fixes a StackOverflow issue with the Lexer system.#259
Fixes a StackOverflow issue with the Lexer system.#259GitHubProUser67 wants to merge 1 commit intonuskey8:mainfrom
Conversation
When doing a lua script with a whole lot of comments starting with "--", it will throw a StackOverflow assertion due to excessive amount of MoveNext calls. This commit fixes that bug by manually doing recursive jobs.
There was a problem hiding this comment.
Pull request overview
This PR aims to prevent stack overflows in the Lua lexer caused by recursively calling MoveNext() while skipping large volumes of -- comments.
Changes:
- Added an iterative loop to consume runs of spaces/tabs before continuing lexing.
- Reworked
-handling to attempt iterative skipping of line/block comments instead of recursively callingMoveNext().
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @@ -131,13 +136,24 @@ public bool MoveNext() | |||
| ReadUntilEOL(ref span, ref offset, out _); | |||
| } | |||
|
|
|||
| return MoveNext(); | |||
| } | |||
| else | |||
| { | |||
| current = SyntaxToken.Subtraction(position); | |||
| return true; | |||
| // prepare for next iteration | |||
| if (offset < span.Length) | |||
| { | |||
| c2 = (offset + 1 < span.Length) ? span[offset + 1] : char.MinValue; | |||
| if (span[offset] != '-') break; // next char is not a comment, exit loop | |||
| } | |||
| else | |||
| { | |||
| break; | |||
| } | |||
| } | |||
|
|
|||
| // after skipping comments, if we reached end, return false | |||
| if (offset >= span.Length) | |||
| return false; | |||
|
|
|||
| current = SyntaxToken.Subtraction(position); // if single '-' remains | |||
| return true; | |||
There was a problem hiding this comment.
This change is intended to prevent stack overflows on inputs with many -- comments, but the current test suite only checks a few small comment cases. Please add a regression test that feeds the lexer a large number of comment lines (and optionally comments followed by real code) to ensure MoveNext() can iterate through them without a stack overflow and still produces correct tokens afterward.
| @@ -131,13 +136,24 @@ public bool MoveNext() | |||
| ReadUntilEOL(ref span, ref offset, out _); | |||
| } | |||
|
|
|||
| return MoveNext(); | |||
| } | |||
| else | |||
| { | |||
| current = SyntaxToken.Subtraction(position); | |||
| return true; | |||
| // prepare for next iteration | |||
| if (offset < span.Length) | |||
| { | |||
| c2 = (offset + 1 < span.Length) ? span[offset + 1] : char.MinValue; | |||
| if (span[offset] != '-') break; // next char is not a comment, exit loop | |||
| } | |||
| else | |||
| { | |||
| break; | |||
| } | |||
| } | |||
|
|
|||
| // after skipping comments, if we reached end, return false | |||
| if (offset >= span.Length) | |||
| return false; | |||
|
|
|||
| current = SyntaxToken.Subtraction(position); // if single '-' remains | |||
| return true; | |||
There was a problem hiding this comment.
The comment-skipping logic in the '-' case does not continue lexing after a comment unless the file ends. After skipping a line/block comment, this code falls through to current = SyntaxToken.Subtraction(position) without consuming the current character at offset, so inputs like "--comment\nprint(1)" will incorrectly emit a subtraction token (and can get stuck reprocessing the same character). Restructure this branch so that when a comment is recognized (c2 == '-'), it skips the comment and then restarts tokenization from the new offset (iteratively, not by emitting Subtraction).
| @@ -131,13 +136,24 @@ public bool MoveNext() | |||
| ReadUntilEOL(ref span, ref offset, out _); | |||
| } | |||
|
|
|||
| return MoveNext(); | |||
| } | |||
| else | |||
| { | |||
| current = SyntaxToken.Subtraction(position); | |||
| return true; | |||
| // prepare for next iteration | |||
| if (offset < span.Length) | |||
| { | |||
| c2 = (offset + 1 < span.Length) ? span[offset + 1] : char.MinValue; | |||
| if (span[offset] != '-') break; // next char is not a comment, exit loop | |||
| } | |||
There was a problem hiding this comment.
Within the while loop, position is captured once before entering the '-' case and is never recomputed after skipping a full comment line. If the loop consumes multiple comments (or reaches code after a comment), any diagnostics raised from later iterations will report the wrong start position. A safer approach is to avoid trying to process multiple logical tokens inside the same '-' switch arm; after skipping one comment, restart the main lexing loop so startOffset/position are recalculated for the next token.
| var pos = position; | ||
| Advance(1); | ||
| Advance(1); // consume first '-' | ||
|
|
||
| // block comment | ||
| if (span.Length > offset + 1 && span[offset] is '[' && span[offset + 1] is '[' or '=') | ||
| if (span.Length > offset + 1 && span[offset] == '[' && (span[offset + 1] == '[' || span[offset + 1] == '=')) | ||
| { | ||
| Advance(1); | ||
| Advance(1); // consume second '-' | ||
| var (_, _, isTerminated) = ReadUntilLongBracketEnd(ref span); |
There was a problem hiding this comment.
The inline comments on the Advance(1) calls are misleading. At this point the first '-' has already been consumed before the switch, so Advance(1) here consumes the second '-', and the subsequent Advance(1) inside the block-comment branch consumes the '[' that starts the long-bracket delimiter. Please update these comments to reflect what is actually being consumed to avoid future regressions.
When doing a lua script with a whole lot of comments starting with "--", it will throw a StackOverflow assertion due to excessive amount of MoveNext calls. This commit fixes that bug by manually doing recursive jobs.