@xanderbailey xanderbailey commented Jan 16, 2026

Which issue does this PR close?

Rationale for this change

The previous implementation used UUID-based aliasing as a workaround to prevent duplicate names for literals in Substrait plans. This approach had several drawbacks:

  • Non-deterministic plan names that made testing difficult (requiring UUID regex filters)
  • Only addressed literal naming conflicts, not the broader issue of name deduplication
  • Added unnecessary dependency on the uuid crate
  • Didn't properly handle cases where the same qualified name could appear with different schema representations

What changes are included in this PR?

  1. Enhanced NameTracker: Refactored to track both schema_name() and qualified_name() separately, ensuring uniqueness across both naming dimensions
  2. Removed UUID dependency: Eliminated the uuid crate from datafusion/substrait
  3. Removed literal-specific aliasing: The UUID-based workaround in project_rel.rs is no longer needed as the improved NameTracker handles all naming conflicts consistently
  4. Deterministic naming: Name conflicts now use predictable __temp__N suffixes instead of random UUIDs
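The deduplication idea described above can be sketched roughly as follows. This is a simplified illustration and not the actual DataFusion code: the type `NameTrackerSketch`, the method `unique_name`, and the helper `demo` are hypothetical names, and the sketch assumes that a generated alias occupies both the schema-name and qualified-name namespaces.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified tracker (not the real DataFusion NameTracker).
/// It remembers every schema name and every qualified name seen so far.
#[derive(Default)]
struct NameTrackerSketch {
    seen_schema: HashSet<String>,
    seen_qualified: HashSet<String>,
}

impl NameTrackerSketch {
    /// Returns the name to use for an expression. If either of its names
    /// was already seen, a deterministic `__temp__N` alias is generated.
    fn unique_name(&mut self, schema_name: &str, qualified_name: &str) -> String {
        let clash = self.seen_schema.contains(schema_name)
            || self.seen_qualified.contains(qualified_name);
        if !clash {
            self.seen_schema.insert(schema_name.to_string());
            self.seen_qualified.insert(qualified_name.to_string());
            return schema_name.to_string();
        }
        // Try suffixes 0, 1, 2, ... until one is free in both sets.
        let mut counter = 0usize;
        loop {
            let candidate = format!("{}__temp__{}", schema_name, counter);
            if !self.seen_schema.contains(&candidate)
                && !self.seen_qualified.contains(&candidate)
            {
                // Assumption: the alias is reserved in both namespaces.
                self.seen_schema.insert(candidate.clone());
                self.seen_qualified.insert(candidate.clone());
                return candidate;
            }
            counter += 1;
        }
    }
}

/// Small usage example: repeated literals and a qualified-name clash.
fn demo() -> Vec<String> {
    let mut tracker = NameTrackerSketch::default();
    let lit = "Utf8(\"people\")";
    vec![
        tracker.unique_name(lit, lit),     // first use keeps its name
        tracker.unique_name(lit, lit),     // gets the __temp__0 suffix
        tracker.unique_name(lit, lit),     // gets the __temp__1 suffix
        tracker.unique_name("x", "t.x"),   // both names unused
        tracker.unique_name("y", "t.x"),   // qualified name already seen
    ]
}
```

Because the suffix comes from a counter rather than a random UUID, the same input always yields the same names, which is what lets snapshot tests compare plans without regex filters.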

Note: This doesn't fully fix all the issues in #17508, which also allows some special-casing of CAST; that special-casing is not included here.

Are these changes tested?

Yes:

  • Updated snapshot tests to reflect the new deterministic naming (e.g., Utf8("people")__temp__0 instead of UUID-based names)
  • Modified some roundtrip tests to verify semantic equivalence (schema matching and execution) rather than exact string matching, which is more robust
  • All existing integration tests pass with the new naming scheme

Are there any user-facing changes?

Minimal. The generated plan names are now deterministic and more readable (using __temp__N suffixes instead of UUIDs), but this is primarily an internal representation change. The functional behavior and query results remain unchanged.


Labels

substrait Changes to the substrait crate

Development

Successfully merging this pull request may close these issues.

Improve substrait NameTracker so it doesn't require uuids
