feat: implement cast from whole numbers to binary format and bool to decimal #3083
Conversation
@parthchandra, @andygrove these are the changes to support the int to binary cast and the bool to decimal cast operations. I also had to change the base test to make sure we use the DSL (since Int -> Binary casts are only supported on the DSL side of Spark and would raise an encoding exception if we use Spark SQL). Also, Int -> Binary casts do not support Try or ANSI eval modes in Spark, and I have followed the same convention here as well.
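As a hedged illustration of that DSL-vs-SQL distinction (the DataFrame, view, and column names here are invented, and this is a sketch of the behavior described above, not code from the PR):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.BinaryType

val spark = SparkSession.builder().master("local[1]").getOrCreate()
import spark.implicits._

val df = Seq(1, 2, 3).toDF("a")

// DSL cast: builds the Cast expression directly, so Int -> Binary is accepted
df.select(col("a").cast(BinaryType)).show()

// SQL cast: per the behavior described above, this path raises an
// encoding exception instead
// df.createOrReplaceTempView("t")
// spark.sql("SELECT CAST(a AS BINARY) FROM t").show()
```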
Fixed format issue with
Codecov Report
❌ Patch coverage is
Additional details and impacted files

@@             Coverage Diff              @@
##              main    #3083      +/-   ##
============================================
+ Coverage     56.12%   59.98%    +3.85%
- Complexity      976     1464      +488
============================================
  Files           119      175       +56
  Lines         11743    16171     +4428
  Branches       2251     2685      +434
============================================
+ Hits           6591     9700     +3109
- Misses         4012     5114     +1102
- Partials       1140     1357      +217

☔ View full report in Codecov by Sentry.
andygrove left a comment
Thanks for adding int→binary and boolean→decimal cast support!
Issue: Dead code
The function canCastToBinary (lines 354-358 in CometCast.scala) is defined but never called anywhere. This appears to be unused code that should be removed.
private def canCastToBinary(fromType: DataType): SupportLevel = fromType match {
  case DataTypes.ByteType | DataTypes.ShortType | DataTypes.IntegerType | DataTypes.LongType =>
    Compatible()
  case _ => Unsupported(Some(s"Cast from BinaryType to $fromType is not supported"))
}

The actual logic for int→binary casts is correctly handled in the canCastFromByte, canCastFromShort, canCastFromInt, and canCastFromLong functions.
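As a sketch of where that logic lives instead (this is an assumed shape based on the description above, not the PR's exact code), each per-source function declares binary support for its own source type:

```scala
// Assumed shape, not the PR's exact code: the BinaryType arm sits in the
// per-source-type function rather than in a shared canCastToBinary.
private def canCastFromInt(toType: DataType): SupportLevel = toType match {
  case DataTypes.BinaryType => Compatible()
  // ... existing arms for other target types elided ...
  case _ => Unsupported(None)
}
```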
Otherwise, the implementation looks good:
- Correctly limits int→binary to LEGACY mode only (matching Spark)
- Boolean→Decimal implementation handles precision/scale correctly (see the sketch after this comment)
- Tests use column data from parquet (not just literals)
- Compatibility docs updated
- ANSI and Try modes correctly disabled for int→binary tests
This review was generated with AI assistance.
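To make the boolean→decimal bullet concrete, here is a minimal sketch of the Spark semantics being matched (the column name is invented): true maps to 1 and false to 0, rendered at the requested precision/scale.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DecimalType

val spark = SparkSession.builder().master("local[1]").getOrCreate()
import spark.implicits._

// true -> 1, false -> 0, then the target precision/scale is applied
Seq(true, false).toDF("b")
  .select(col("b").cast(DecimalType(10, 2)).as("d"))
  .show()
// expected output: 1.00 and 0.00
```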
There is a test failure:
Thank you @andygrove, I am working on the test failures (I believe there is a bug on the DSL side vs. using Spark SQL).
Force-pushed 910fe4f to 69d8290
There is a mismatch in the error message thrown by Spark vs. Comet for a cast test from StringType to DecimalType(2,2). I will raise a new PR to fix that error message. Steps to reproduce:
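A purely hypothetical illustration of the cast in question (the input literal is invented and is not the author's actual repro): DECIMAL(2,2) has zero integer digits, so a value like '1.23' is out of range, and with ANSI enabled the failing cast is where the two engines' error messages diverge.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[1]").getOrCreate()

// Hypothetical illustration only, not the author's actual repro steps:
// DECIMAL(2,2) cannot represent '1.23', so with ANSI enabled the cast
// throws, exposing the error-message mismatch between Spark and Comet.
spark.conf.set("spark.sql.ansi.enabled", "true")
spark.sql("SELECT CAST('1.23' AS DECIMAL(2,2))").show()
```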
cc: @andygrove, @parthchandra
I also found that the tests are not running on truly deterministic inputs. For example, this test fails if I disable or change the inputs for a test that runs before it:
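A generic sketch (not the suite's actual code) of one way to make such inputs order-independent: derive each test's data from a fixed per-test seed instead of shared generator state.

```scala
import scala.util.Random

// Generic sketch: a fixed per-test seed makes the generated inputs
// identical regardless of which earlier tests ran or were disabled.
def generateTestInts(seed: Long, n: Int): Seq[Int] = {
  val r = new Random(seed)
  Seq.fill(n)(r.nextInt())
}

val inputs = generateTestInts(seed = 42L, n = 1000) // same values every run
```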
Related PR to fix a test in decimal cast: #3248
Force-pushed 8ccd20a to fb8814f
Force-pushed d1f1512 to 05e7e50
@andygrove, @parthchandra I rebased the feature branch on main and the CometCastSuite finishes successfully on my local machine. Please review and kick off CI whenever you get a chance.
if (testTry) {
  data.createOrReplaceTempView("t")
  // try_cast() should always return null for invalid inputs
  // not using spark DSL since it `try_cast` is only available from Spark 4x
Not clear to me what this comment means. Do you mean that this should be run only for Spark 4.0? Maybe this should be in a different set of tests that assumes 4.0 is available?
Sure. The try_cast method is only available in Spark 4, and my new cast implementation doesn't even support that eval mode.
Spark added a DSL method to support try_cast starting in v4.0.0. In order to keep the build working, I kept try_cast as a SQL statement. Also, these changes to support Int/Long -> Binary don't support ANSI or Try eval modes, so I thought it best to keep the changes minimal.
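A hedged sketch of that eval-mode gating (the CometEvalMode and SupportLevel names are assumed from the surrounding discussion; the PR's actual code may differ):

```scala
// Sketch only: declare integral -> binary compatible in LEGACY mode and
// unsupported in ANSI/TRY, per the convention described above.
def intToBinarySupport(evalMode: CometEvalMode.Value): SupportLevel =
  evalMode match {
    case CometEvalMode.LEGACY => Compatible()
    case _ =>
      Unsupported(Some("Cast from IntegerType to BinaryType supports LEGACY mode only"))
  }
```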
So what does this do in Spark 3.5? The query in the test has a try_cast in it.
It seems try_cast was introduced in Spark 3.2: https://issues.apache.org/jira/browse/SPARK-34881
True. That is the SQL version of it; the DSL version was added in v4.0.0. I initially tried to convert all SQL statements to DSL, but had to hold back on converting try_cast from SQL to DSL, given that it is only available starting with v4 and would cause compilation issues on other Spark 3.x versions.
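A compact sketch of that compatibility constraint (table and column names invented): the SQL form runs on Spark 3.2+, while the equivalent Column method only compiles against Spark 4.0.0+.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[1]").getOrCreate()
import spark.implicits._

val df = Seq("1", "abc").toDF("a")
df.createOrReplaceTempView("t")

// Available since Spark 3.2: try_cast as a SQL string
spark.sql("SELECT try_cast(a AS INT) FROM t").show() // invalid input -> null

// Only compiles against Spark 4.0.0+ (per the discussion above):
// df.select(col("a").try_cast(IntegerType))
```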
Oh I get it now.
nit: can you align the // ?
parthchandra left a comment
lgtm. (minor comment)
Thank you for the approval @parthchandra

Which issue does this PR close?
(try_cast is only supported starting Spark 4.x.) Relevant Spark code:
https://github.com/apache/spark/blob/f89fbd49c45045b44259341c6c73e30ee9c46dd0/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L682
(Note that only LEGACY mode routes the code here, while Try and ANSI are not supported.)
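For reference, a minimal sketch of the LEGACY semantics being mirrored (an assumption based on Spark's documented cast behavior, not code from this PR): the cast emits the big-endian two's-complement bytes of the integer.

```scala
import java.nio.ByteBuffer

// Sketch: LEGACY int -> binary yields the value's big-endian bytes.
def intToBinary(i: Int): Array[Byte] =
  ByteBuffer.allocate(4).putInt(i).array() // ByteBuffer defaults to big-endian

intToBinary(1)  // Array(0, 0, 0, 1)
intToBinary(-1) // Array(-1, -1, -1, -1), i.e. 0xFF 0xFF 0xFF 0xFF
```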
Closes #.
Rationale for this change
What changes are included in this PR?
How are these changes tested?