fix: parse EXCLUDE column_list after non-star columns in Redshift #7019
veeceey wants to merge 1 commit into tobymao:main
In Redshift, the EXCLUDE clause can appear at the end of the entire projection list, not just immediately after a `*`. For example:

`SELECT *, 4 AS col4 EXCLUDE (col2, col3) FROM t`

This previously raised a ParseError because the parser only recognized EXCLUDE as part of the Star expression.

The fix adds an `exclude` field to Select, overrides `_parse_projections` in the Redshift parser to extract a trailing EXCLUDE clause, and handles generation by either emitting the EXCLUDE directly (for Redshift) or wrapping the query in a derived table with `* EXCEPT (...)` (for other dialects).

Fixes tobymao#6963
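To make the parsing step concrete, here is a minimal, string-level sketch of separating a trailing EXCLUDE clause from the rest of a projection list. It is not the code in this PR: `split_trailing_exclude` and the regex are hypothetical names introduced for illustration, and the real parser works on tokens rather than on text.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical helper for illustration only; the actual parser consumes tokens.
# Matches a final "EXCLUDE (a, b, ...)" at the very end of the projection text.
_TRAILING_EXCLUDE = re.compile(r"\s+EXCLUDE\s*\(([^()]*)\)\s*$", re.IGNORECASE)


def split_trailing_exclude(projections: str) -> Tuple[List[str], Optional[List[str]]]:
    """Split projection text into (projections, excluded column names or None)."""
    exclude = None
    match = _TRAILING_EXCLUDE.search(projections)
    if match:
        exclude = [name.strip() for name in match.group(1).split(",") if name.strip()]
        # Drop the EXCLUDE clause so only the real projections remain.
        projections = projections[: match.start()]
    # Naive comma split: assumes no commas nested inside function calls.
    columns = [part.strip() for part in projections.split(",")]
    return columns, exclude


print(split_trailing_exclude("*, 4 AS col4 EXCLUDE (col2, col3)"))
# (['*', '4 AS col4'], ['col2', 'col3'])
```

The sketch treats a star-only list such as `* EXCLUDE (col2)` the same way, whereas the parser described here keeps that form attached to the Star expression.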
Manual Testing Results

I've tested the fix for parsing `EXCLUDE` clauses on aliased expressions in the Redshift dialect. All tests passed.

Test 1: Issue SQL Parsing

Original SQL from the issue:

    SELECT *, 4 AS col4 EXCLUDE (col2, col3) FROM (SELECT 1 AS col1, 2 AS col2, 3 AS col3)

✅ Parses successfully, with no more parse errors.

Test 2: Round-trip Verification

    import sqlglot

    # Parse and regenerate (sql is the query from Test 1)
    sql = "SELECT *, 4 AS col4 EXCLUDE (col2, col3) FROM (SELECT 1 AS col1, 2 AS col2, 3 AS col3)"
    parsed = sqlglot.parse_one(sql, dialect='redshift')
    regenerated = parsed.sql(dialect='redshift')

✅ Preserves syntax correctly:

    SELECT *, 4 AS col4 EXCLUDE (col2, col3) FROM (SELECT 1 AS col1, 2 AS col2, 3 AS col3)

Test 3: Cross-Dialect Transpilation

    # Spark
    SELECT * EXCEPT (col2, col3) FROM (SELECT *, 4 AS col4 FROM (SELECT 1 AS col1, 2 AS col2, 3 AS col3))
    # BigQuery
    SELECT * EXCEPT (col2, col3) FROM (SELECT *, 4 AS col4 FROM (SELECT 1 AS col1, 2 AS col2, 3 AS col3))
    # DuckDB
    SELECT * EXCLUDE (col2, col3) FROM (SELECT *, 4 AS col4 FROM (SELECT 1 AS col1, 2 AS col2, 3 AS col3))

✅ Correctly transpiles to dialect-specific syntax (`EXCEPT` vs `EXCLUDE`).

Test 4: Edge Cases

All edge cases passed.

All tests confirm the fix resolves the original issue and handles various edge cases properly.
Hey @veeceey, we really appreciate the contribution but there's currently an open PR working towards the fix so I'll go ahead and close this one.
Summary
Fixes #6963

In Redshift, the `EXCLUDE` clause can appear at the end of the entire SELECT projection list, not just immediately after `*`. For example, `SELECT *, 4 AS col4 EXCLUDE (col2, col3) FROM t` is valid Redshift SQL.

Previously this raised `ParseError: Invalid expression / Unexpected token` because the parser only recognized `EXCLUDE` as part of the `Star` expression (i.e., `* EXCLUDE (...)`).

Changes
- `expressions.py`: Added `exclude` field to `Select` to store the trailing EXCLUDE column list
- `parser.py`: Changed `_parse_projections` return type to include an optional exclude list
- `redshift.py` (Parser): Override `_parse_projections` to detect and parse a trailing `EXCLUDE` clause after all projections
- `redshift.py` (Generator): Set `STAR_EXCEPT = "EXCLUDE"` and `STAR_EXCLUDE_REQUIRES_DERIVED_TABLE = False` for correct Redshift output
- `generator.py`: Added `STAR_EXCLUDE_REQUIRES_DERIVED_TABLE` flag; when `True` (default), transpiles to a derived table with `* EXCEPT (...)`; when `False` (Redshift), emits `EXCLUDE` directly
- `tsql.py`: Updated `_parse_projections` override to match new return type
- `test_redshift.py`: Added test cases for identity, normalization, and cross-dialect transpilation

Transpilation behavior

| Input (Redshift) | Target | Output |
|---|---|---|
| `SELECT *, 4 AS col4 EXCLUDE (col2, col3) FROM t` | Redshift | `SELECT *, 4 AS col4 EXCLUDE (col2, col3) FROM t` |
| `SELECT *, 4 AS col4 EXCLUDE (col2, col3) FROM t` | Other dialects | `SELECT * EXCEPT (col2, col3) FROM (SELECT *, 4 AS col4 FROM t)` |

Test plan

- … `test_executor` and `test_optimizer`, which require duckdb)
- `EXCLUDE` after non-star columns
- `* EXCLUDE (...)` (star-only) still works as before
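
The two generation paths listed under Changes can be illustrated with a small standalone sketch. It assumes a simplified model: `SelectSketch` and `render` are hypothetical names, not sqlglot classes or methods, and the two keyword arguments only mimic the roles of `STAR_EXCLUDE_REQUIRES_DERIVED_TABLE` and `STAR_EXCEPT`.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SelectSketch:
    """Hypothetical stand-in for a SELECT node; not sqlglot's Select class."""
    projections: List[str]
    source: str
    exclude: List[str] = field(default_factory=list)


def render(select: SelectSketch, requires_derived_table: bool = True,
           star_except: str = "EXCEPT") -> str:
    """Render the sketch node, handling a trailing exclude list in one of two ways."""
    inner = f"SELECT {', '.join(select.projections)}"
    if not select.exclude:
        return f"{inner} FROM {select.source}"
    columns = ", ".join(select.exclude)
    if not requires_derived_table:
        # Redshift-style output: keep the clause at the end of the projection list.
        return f"{inner} {star_except} ({columns}) FROM {select.source}"
    # Default: move the original projections into a derived table and
    # apply the exclusion to the outer star.
    return f"SELECT * {star_except} ({columns}) FROM ({inner} FROM {select.source})"


query = SelectSketch(["*", "4 AS col4"], "t", ["col2", "col3"])
print(render(query, requires_derived_table=False, star_except="EXCLUDE"))
# SELECT *, 4 AS col4 EXCLUDE (col2, col3) FROM t
print(render(query))
# SELECT * EXCEPT (col2, col3) FROM (SELECT *, 4 AS col4 FROM t)
```

Wrapping in a derived table lets the exclusion apply to every column of the original projection list, including aliased expressions such as `col4`, in dialects that only accept the clause directly after `*`.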