Fixes a StackOverflow issue with the Lexer system. by GitHubProUser67 · Pull Request #259 · nuskey8/Lua-CSharp

GitHubProUser67 · 2026-02-18T17:17:23Z

When doing a lua script with a whole lot of comments starting with "--", it will throw a StackOverflow assertion due to excessive amount of MoveNext calls. This commit fixes that bug by manually doing recursive jobs.

Copilot

Pull request overview

This PR aims to prevent stack overflows in the Lua lexer caused by recursively calling MoveNext() while skipping large volumes of -- comments.

Changes:

Added an iterative loop to consume runs of spaces/tabs before continuing lexing.
Reworked - handling to attempt iterative skipping of line/block comments instead of recursively calling MoveNext().

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-02-24T04:12:14Z

src/Lua/CodeAnalysis/Syntax/Lexer.cs

@@ -131,13 +136,24 @@ public bool MoveNext()
                        ReadUntilEOL(ref span, ref offset, out _);
                    }

-                    return MoveNext();
-                }
-                else
-                {
-                    current = SyntaxToken.Subtraction(position);
-                    return true;
+                    // prepare for next iteration
+                    if (offset < span.Length)
+                    {
+                        c2 = (offset + 1 < span.Length) ? span[offset + 1] : char.MinValue;
+                        if (span[offset] != '-') break; // next char is not a comment, exit loop
+                    }
+                    else
+                    {
+                        break;
+                    }
                }
+
+                // after skipping comments, if we reached end, return false
+                if (offset >= span.Length)
+                    return false;
+
+                current = SyntaxToken.Subtraction(position); // if single '-' remains
+                return true;


This change is intended to prevent stack overflows on inputs with many -- comments, but the current test suite only checks a few small comment cases. Please add a regression test that feeds the lexer a large number of comment lines (and optionally comments followed by real code) to ensure MoveNext() can iterate through them without a stack overflow and still produces correct tokens afterward.

Copilot · 2026-02-24T04:12:14Z

src/Lua/CodeAnalysis/Syntax/Lexer.cs

@@ -131,13 +136,24 @@ public bool MoveNext()
                        ReadUntilEOL(ref span, ref offset, out _);
                    }

-                    return MoveNext();
-                }
-                else
-                {
-                    current = SyntaxToken.Subtraction(position);
-                    return true;
+                    // prepare for next iteration
+                    if (offset < span.Length)
+                    {
+                        c2 = (offset + 1 < span.Length) ? span[offset + 1] : char.MinValue;
+                        if (span[offset] != '-') break; // next char is not a comment, exit loop
+                    }
+                    else
+                    {
+                        break;
+                    }
                }
+
+                // after skipping comments, if we reached end, return false
+                if (offset >= span.Length)
+                    return false;
+
+                current = SyntaxToken.Subtraction(position); // if single '-' remains
+                return true;


The comment-skipping logic in the '-' case does not continue lexing after a comment unless the file ends. After skipping a line/block comment, this code falls through to current = SyntaxToken.Subtraction(position) without consuming the current character at offset, so inputs like "--comment\nprint(1)" will incorrectly emit a subtraction token (and can get stuck reprocessing the same character). Restructure this branch so that when a comment is recognized (c2 == '-'), it skips the comment and then restarts tokenization from the new offset (iteratively, not by emitting Subtraction).

Copilot · 2026-02-24T04:12:14Z

src/Lua/CodeAnalysis/Syntax/Lexer.cs

@@ -131,13 +136,24 @@ public bool MoveNext()
                        ReadUntilEOL(ref span, ref offset, out _);
                    }

-                    return MoveNext();
-                }
-                else
-                {
-                    current = SyntaxToken.Subtraction(position);
-                    return true;
+                    // prepare for next iteration
+                    if (offset < span.Length)
+                    {
+                        c2 = (offset + 1 < span.Length) ? span[offset + 1] : char.MinValue;
+                        if (span[offset] != '-') break; // next char is not a comment, exit loop
+                    }


Within the while loop, position is captured once before entering the '-' case and is never recomputed after skipping a full comment line. If the loop consumes multiple comments (or reaches code after a comment), any diagnostics raised from later iterations will report the wrong start position. A safer approach is to avoid trying to process multiple logical tokens inside the same '-' switch arm; after skipping one comment, restart the main lexing loop so startOffset/position are recalculated for the next token.

Copilot · 2026-02-24T04:12:15Z

src/Lua/CodeAnalysis/Syntax/Lexer.cs

                    var pos = position;
-                    Advance(1);
+                    Advance(1); // consume first '-'

                    // block comment
-                    if (span.Length > offset + 1 && span[offset] is '[' && span[offset + 1] is '[' or '=')
+                    if (span.Length > offset + 1 && span[offset] == '[' && (span[offset + 1] == '[' || span[offset + 1] == '='))
                    {
-                        Advance(1);
+                        Advance(1); // consume second '-'
                        var (_, _, isTerminated) = ReadUntilLongBracketEnd(ref span);


The inline comments on the Advance(1) calls are misleading. At this point the first '-' has already been consumed before the switch, so Advance(1) here consumes the second '-', and the subsequent Advance(1) inside the block-comment branch consumes the '[' that starts the long-bracket delimiter. Please update these comments to reflect what is actually being consumed to avoid future regressions.

Fixes a StackOverflow issue with the Lexer system.

f50d6d2

When doing a lua script with a whole lot of comments starting with "--", it will throw a StackOverflow assertion due to excessive amount of MoveNext calls. This commit fixes that bug by manually doing recursive jobs.

akeit0 requested a review from Copilot February 24, 2026 04:07

Copilot started reviewing on behalf of akeit0 February 24, 2026 04:08 View session

Copilot AI reviewed Feb 24, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Comments

Fixes a StackOverflow issue with the Lexer system.#259

Fixes a StackOverflow issue with the Lexer system.#259
GitHubProUser67 wants to merge 1 commit intonuskey8:mainfrom
GitHubProUser67:main

GitHubProUser67 commented Feb 18, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Feb 24, 2026

Uh oh!

Copilot AI Feb 24, 2026

Uh oh!

Copilot AI Feb 24, 2026

Uh oh!

Copilot AI Feb 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Comments

Conversation

GitHubProUser67 commented Feb 18, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant