[SPARK-55062][Protobuf] Support proto2 extensions in protobuf functions #53828

Closed
dichlorodiphen wants to merge 13 commits into apache:master from dichlorodiphen:proto-rebase

@dichlorodiphen (Contributor) commented Jan 16, 2026

What changes were proposed in this pull request?

This PR adds support for proto2 extensions to from_protobuf and to_protobuf when a file descriptor set is provided. Extensions are not supported when the message is specified by Java class, because Java classes do not contain enough information to resolve them.

This is done by building an ExtensionRegistry and an index that maps each message descriptor's name to its extensions. The registry is passed in when the DynamicMessage is constructed so that the Protobuf library can resolve the extension fields. The index is plumbed through the various helper classes for use in schema conversion and serde.
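
For reviewers who want a mental model of the index, the grouping step can be sketched in plain Python. This is only an illustration and not the code in this PR: `ExtensionField` and `build_extension_index` are hypothetical names, the package names are made up, and the real change works on protobuf-java descriptors rather than on simple records.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtensionField:
    """Hypothetical stand-in for a protobuf extension field descriptor."""
    full_name: str        # e.g. "example.age"
    number: int           # field number inside the extended message
    containing_type: str  # full name of the message being extended


def build_extension_index(extensions):
    """Group extension fields by the full name of the message they extend."""
    grouped = defaultdict(list)
    for ext in extensions:
        grouped[ext.containing_type].append(ext)
    # Sort by field number so that lookups return a stable order.
    return {
        name: sorted(fields, key=lambda f: f.number)
        for name, fields in grouped.items()
    }


# Extensions can be declared in any file of the descriptor set,
# so the input is a flat list collected across all files.
declared = [
    ExtensionField("other.nickname", 101, "example.Person"),
    ExtensionField("example.age", 100, "example.Person"),
    ExtensionField("example.zip_code", 100, "example.Address"),
]
index = build_extension_index(declared)
person_extensions = [f.full_name for f in index["example.Person"]]
print(person_extensions)
```

Keying the index by the extended message's full name means schema conversion for a given message only needs one lookup to find every extension that applies to it, regardless of which file declared it.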

This new functionality is gated behind the Spark config property spark.sql.function.protobufExtensions.enabled.

Why are the changes needed?

Proto2 extensions are a valid, if somewhat uncommon, feature of Protobuf. It therefore makes sense to include them in the schema when they are provided, rather than silently dropping them, which can confuse users.

Does this PR introduce any user-facing change?

Yes. Previously, extension fields would be dropped by both from_protobuf and to_protobuf. Now, they are retained. This can be demonstrated with the minimal example below. See the unit tests for more examples.

syntax = "proto2";

message Person {
    optional int32 id = 1;
    extensions 100 to 200;
}
extend Person {
    optional int32 age = 100;
}
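
As background on why the extension was dropped before, the sketch below hand-encodes a Person with id = 7 and age = 30 and decodes it twice using only the Python standard library. It relies on the fact that an extension is encoded on the wire exactly like a regular field with the same number. The `decode` helper and its `known_fields` argument are hypothetical and only handle varint fields; they are not part of Spark or the Protobuf library.

```python
def encode_varint(value):
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data, pos):
    """Decode a varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_varint_field(number, value):
    """Encode one field with wire type 0 (varint)."""
    return encode_varint((number << 3) | 0) + encode_varint(value)


def decode(data, known_fields):
    """Split varint fields into known (by name) and unknown (by number)."""
    known, unknown = {}, {}
    pos = 0
    while pos < len(data):
        tag, pos = decode_varint(data, pos)
        number, wire_type = tag >> 3, tag & 0x7
        if wire_type != 0:
            raise ValueError("this sketch only handles varint fields")
        value, pos = decode_varint(data, pos)
        if number in known_fields:
            known[known_fields[number]] = value
        else:
            unknown[number] = value
    return known, unknown


# Person { id = 7 } with the extension age = 30 set.
payload = encode_varint_field(1, 7) + encode_varint_field(100, 30)

# A reader that only knows the base message sees field 100 as unknown.
without_ext = decode(payload, {1: "id"})
# A reader that also knows the extension resolves it by name.
with_ext = decode(payload, {1: "id", 100: "age"})
print(without_ext, with_ext)
```

The first call models a parser with no extension information, where field 100 ends up among the unknown fields and never reaches the schema. The second call models a parser that has been told about the extension.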

How was this patch tested?

Unit tests were added for the new behavior, including basic behavior, extending nested messages, and extensions defined in separate files.

Was this patch authored or co-authored using generative AI tooling?

Initial draft authored with Claude Code.

Generated-by: claude-4.5-opus

@github-actions

JIRA Issue Information

=== Improvement SPARK-55062 ===
Summary: from_protobuf and to_protobuf do not support proto2 extensions
Assignee: None
Status: Open
Affected: ["4.1.1"]


This comment was automatically generated by GitHub Actions

@anishshri-db (Contributor) left a comment

lgtm pending green CI
