[SPARK-55062][Protobuf] Support proto2 extensions in protobuf functions #53828
Closed
dichlorodiphen wants to merge 13 commits into apache:master
JIRA Issue Information: Improvement SPARK-55062 (this comment was automatically generated by GitHub Actions)
zikangh reviewed Jan 16, 2026
connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufFunctionsSuite.scala (outdated)
dichlorodiphen commented Jan 21, 2026
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/ProtobufDataToCatalyst.scala
zikangh approved these changes Jan 28, 2026
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufUtils.scala
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/SchemaConverters.scala
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufUtils.scala (outdated)
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufUtils.scala
8995c4c to 1afffd0
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufUtils.scala
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/SchemaConverters.scala
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/checkpointing/OffsetSeq.scala
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/checkpointing/OffsetSeq.scala
7d45de1 to 039a9b2
connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufFunctionsSuite.scala (outdated)
sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/OffsetSeqLogSuite.scala (outdated)
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/SchemaConverters.scala
anishshri-db (Contributor) approved these changes Feb 21, 2026
lgtm pending green CI
What changes were proposed in this pull request?
This PR adds support for proto2 extensions to from_protobuf and to_protobuf when a file descriptor set is provided (Java classes do not contain enough information to support extensions).
This is done by building an ExtensionRegistry and a map from descriptor name to its extensions. The registry is used during construction of the DynamicMessage so that the Protobuf library can see the extensions. The map (the extension index) is plumbed through the various helper classes for use in schema conversion and serde.
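The "map from descriptor name to its extensions" can be pictured with a small, library-free model. This is only an illustrative sketch, not the PR's code: the actual change works with the Protobuf Java library's ExtensionRegistry and field descriptors, whereas ExtensionField, build_extension_index, and schema_field_names below are hypothetical names, and the way extension fields end up named in the Spark schema is an assumption made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtensionField:
    """Hypothetical stand-in for a Protobuf extension field descriptor."""
    full_name: str  # fully qualified extension name, e.g. "demo.nickname"
    number: int     # field number within the extended message
    extendee: str   # full name of the message being extended


def build_extension_index(extensions):
    """Group extension fields by the full name of the message they extend."""
    index = defaultdict(list)
    for ext in extensions:
        index[ext.extendee].append(ext)
    # Sort by field number so later lookups see a stable order.
    return {name: sorted(fields, key=lambda f: f.number)
            for name, fields in index.items()}


def schema_field_names(message_name, declared_fields, index):
    """Declared fields first, then any extensions indexed for the message."""
    extras = [ext.full_name for ext in index.get(message_name, [])]
    return list(declared_fields) + extras


# Extensions may be declared in any file or scope; the index regroups them
# by the message they extend.
extensions = [
    ExtensionField("demo.nickname", 101, "demo.Person"),
    ExtensionField("demo.age", 100, "demo.Person"),
    ExtensionField("demo.zip", 100, "demo.Address"),
]
index = build_extension_index(extensions)
person_fields = schema_field_names("demo.Person", ["name"], index)
print(person_fields)
```

Keying the index by the extended message's full name means schema conversion and serde can look up a message's extensions without rescanning every file in the descriptor set, and a message with no extensions simply falls through to its declared fields.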
This new functionality is gated behind the Spark config property spark.sql.function.protobufExtensions.enabled.
Why are the changes needed?
Proto2 extensions are a valid, if somewhat uncommon, feature of Protobuf, and it therefore makes sense to incorporate them into the schema when provided so as to not confuse the user.
Does this PR introduce any user-facing change?
Yes. Previously, extension fields would be dropped by both from_protobuf and to_protobuf. Now, they are retained. This can be demonstrated with the minimal example below. See the unit tests for more examples.
How was this patch tested?
Unit tests were added for the new behavior, including basic behavior, extending nested messages, and extensions defined in separate files.
Was this patch authored or co-authored using generative AI tooling?
Initial draft authored with Claude Code.
Generated-by: claude-4.5-opus