Skip to content

Conversation

@coderfender
Copy link
Contributor

@coderfender coderfender commented Jan 13, 2026

Which issue does this PR close?

  1. Add int (byte/short/int/long) → binary cast support for LEGACY mode only (per Spark)
  2. Add boolean → decimal cast support
  3. Update cast tests to use DSL instead of SQL for regular casts (except try mode since try_cast is only supported starting Spark 4x)
  4. Skip try_cast tests for casts not supported by Spark SQL

Relavent spark code :
https://github.com/apache/spark/blob/f89fbd49c45045b44259341c6c73e30ee9c46dd0/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala#L682

(Note that only legacy mode routes the code here while we Try and ANSI are not supported)

Closes #.

Rationale for this change

  1. Add new comet native cast functionality

What changes are included in this PR?

How are these changes tested?

  1. CastTestSuite tests and additional tests to cast Boolean to various Decimal Types

@coderfender coderfender changed the title feat : implement cast from int (whole numbers) to binary feat : implement cast from int (whole numbers) to binary format Jan 14, 2026
@coderfender coderfender changed the title feat : implement cast from int (whole numbers) to binary format feat: implement cast from int (whole numbers) to binary format Jan 14, 2026
@coderfender
Copy link
Contributor Author

@parthchandra , @andygrove these are the changes to support int to binary cast and bool to decimal cast operation . I have also had to change the base test to make sure that we use DSL (since Int -> Binary casts are only supported in the DSL side of Spark and would raise an encoding exception if we use Spark SQL) . Also, Int -> Binary casts do not support Try or ANSI eval modes in Spark and I have followed the same convention here as well

@coderfender coderfender changed the title feat: implement cast from int (whole numbers) to binary format feat: implement cast from int whole numbers to binary format Jan 14, 2026
@coderfender coderfender changed the title feat: implement cast from int whole numbers to binary format feat: implement cast from whole numbers to binary format Jan 14, 2026
@coderfender
Copy link
Contributor Author

coderfender commented Jan 14, 2026

Fixed format issue with compatibility.md ( I initially didn't add it but turns out the builds are failing with git errors)

@codecov-commenter
Copy link

codecov-commenter commented Jan 14, 2026

Codecov Report

❌ Patch coverage is 63.82979% with 17 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.98%. Comparing base (f09f8af) to head (6419906).
⚠️ Report is 918 commits behind head on main.

Files with missing lines Patch % Lines
...scala/org/apache/comet/expressions/CometCast.scala 63.82% 0 Missing and 17 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3083      +/-   ##
============================================
+ Coverage     56.12%   59.98%   +3.85%     
- Complexity      976     1464     +488     
============================================
  Files           119      175      +56     
  Lines         11743    16171    +4428     
  Branches       2251     2685     +434     
============================================
+ Hits           6591     9700    +3109     
- Misses         4012     5114    +1102     
- Partials       1140     1357     +217     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Member

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding int→binary and boolean→decimal cast support!

Issue: Dead code

The function canCastToBinary (lines 354-358 in CometCast.scala) is defined but never called anywhere. This appears to be unused code that should be removed.

private def canCastToBinary(fromType: DataType): SupportLevel = fromType match {
  case DataTypes.ByteType | DataTypes.ShortType | DataTypes.IntegerType | DataTypes.LongType =>
    Compatible()
  case _ => Unsupported(Some(s"Cast from BinaryType to $fromType is not supported"))
}

The actual logic for int→binary casts is correctly handled in the canCastFromByte, canCastFromShort, canCastFromInt, and canCastFromLong functions.


Otherwise, the implementation looks good:

  • Correctly limits int→binary to LEGACY mode only (matching Spark)
  • Boolean→Decimal implementation handles precision/scale correctly
  • Tests use column data from parquet (not just literals)
  • Compatibility docs updated
  • ANSI and Try modes correctly disabled for int→binary tests

This review was generated with AI assistance.

@andygrove
Copy link
Member

There is a test failure:

2026-01-14T23:17:19.7092172Z - cast StringType to DecimalType(2,2) *** FAILED *** (684 milliseconds)
2026-01-14T23:17:19.7092965Z   "[[CAST_INVALID_INPUT] The value '
2026-01-14T23:17:19.7100086Z 3' of the type "STRING" cannot be cast to "DECIMAL(2,2)" because it is malformed. Correct the value as per the syntax, or change its target type. Use `try_cast` to tolerate malformed input and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error]." did not equal "[[NUMERIC_VALUE_OUT_OF_RANGE] 3 cannot be represented as Decimal(2, 2). If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error, and return NULL instead]." (CometCastSuite.scala:1424)
2026-01-14T23:17:19.7102421Z   Analysis:
2026-01-14T23:17:19.7102696Z   "[[CAST_INVALID_INPUT] The value '
2026-01-14T23:17:19.7105340Z 3' of the type "STRING" cannot be cast to "DECIMAL(2,2)" because it is malformed. Correct the value as per the syntax, or change its target type. Use `try_cast` to tolerate malformed input and return NULL instead. If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error]." -> "[[NUMERIC_VALUE_OUT_OF_RANGE] 3 cannot be represented as Decimal(2, 2). If necessary set "spark.sql.ansi.enabled" to "false" to bypass this error, and return NULL instead]."

@coderfender
Copy link
Contributor Author

Thank you @andygrove ,I am working on the test failures (I believe there is a bug on the DSL side vs using Spark SQL)

@coderfender coderfender force-pushed the support_short_to_binary branch from 910fe4f to 69d8290 Compare January 17, 2026 02:35
@coderfender
Copy link
Contributor Author

There is a mismatch in the error message thrown by spark vs comet for a cast test from StringType to DecimalType(2,2) . I will raise a new PR to fix that error message .

Steps to Reproduce :


  test("cast StringType to DecimalType(2,2) -test error message") {
    withSQLConf(CometConf.getExprAllowIncompatConfigKey(classOf[Cast]) -> "true") {
      println("testing with simple input")
      val values = Seq("       3").toDF("a")
      Seq(true, false).foreach(ansiEnabled =>
        castTest(values, DataTypes.createDecimalType(2, 2), testAnsi = ansiEnabled))
    }
  }

@coderfender
Copy link
Contributor Author

cc : @andygrove , @parthchandra

@coderfender
Copy link
Contributor Author

I also found that the tests are not running on truly deterministic inputs . For example this test is failing (if I disable / change inputs for a test before that) :

  test("cast StringType to DecimalType(38,10) high precision") {
    withSQLConf(CometConf.getExprAllowIncompatConfigKey(classOf[Cast]) -> "true") {
      // TODO fix for Spark 4.0.0
      assume(!isSpark40Plus)
      val values = gen.generateStrings(dataSize, numericPattern, 38).toDF("a")
      Seq(true, false).foreach(ansiEnabled =>
        castTest(values, DataTypes.createDecimalType(38, 10), testAnsi = ansiEnabled))
    }
  }

[0e31,0E-10]                                   [0e31,null]

@coderfender
Copy link
Contributor Author

Related PR to fix test in decimal cast : #3248

@coderfender coderfender force-pushed the support_short_to_binary branch from 8ccd20a to fb8814f Compare January 25, 2026 08:33
@coderfender coderfender changed the title feat: implement cast from whole numbers to binary format feat: implement cast from whole numbers to binary format (depends on #3248) Jan 26, 2026
@coderfender coderfender changed the title feat: implement cast from whole numbers to binary format (depends on #3248) feat: implement cast from whole numbers to binary format (blocked by #3248) Jan 26, 2026
@coderfender coderfender force-pushed the support_short_to_binary branch from d1f1512 to 05e7e50 Compare February 1, 2026 20:56
@coderfender coderfender changed the title feat: implement cast from whole numbers to binary format (blocked by #3248) feat: implement cast from whole numbers to binary format Feb 1, 2026
@coderfender
Copy link
Contributor Author

@andygrove , @parthchandra I rebased the feature with main and the CometCastSuite finishes successfully on my local machine. Please review and kickoff CI whenever you get a chance

if (testTry) {
data.createOrReplaceTempView("t")
// try_cast() should always return null for invalid inputs
// not using spark DSL since it `try_cast` is only available from Spark 4x
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not clear to me what this comment means. Do you mean that this should be run only for Spark 4.0? Maybe this should be in a different set of tests that assumes 4.0 is available?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure . The try cast method is only available in spark 4 and my new cast impl doesnt even support that eval mode

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So spark added DSL method to support try_cast starting v 4.0.0 . In order to keep the build working, I kept the try_cast as a SQL statement. Also, these changes to support Int / Long -> Binary dont support ANSI or Try Eval modes so I thought to keep the changes minimal

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what does this do in Spark 3.5? The query in the test has a try_cast in it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try_cast was introduced in Spark 3.2 it seems https://issues.apache.org/jira/browse/SPARK-34881

Copy link
Contributor Author

@coderfender coderfender Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. That is the SQL version of it. DSL version of it got added in v4.0.0 . I initially tried to convert all SQL statements to DSL but had to hold back on updating try_cast DSL from SQL given that it is only available starting v.4 and wold cause compilation issues in other spark 3x versions

image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I get it now.
nit: can you align the // ?

@coderfender coderfender changed the title feat: implement cast from whole numbers to binary format feat: implement cast from whole numbers to binary format and bool to decimal Feb 3, 2026
Copy link
Contributor

@parthchandra parthchandra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. (minor comment)

if (testTry) {
data.createOrReplaceTempView("t")
// try_cast() should always return null for invalid inputs
// not using spark DSL since it `try_cast` is only available from Spark 4x
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I get it now.
nit: can you align the // ?

@coderfender
Copy link
Contributor Author

Thank you for the approval @parthchandra

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants