feat: enable decimal to decimal cast of different precision and scale #1086

Merged · 6 commits · Nov 22, 2024
@@ -70,9 +70,13 @@ object CometCast {
       case _ =>
         Unsupported
     }
-    case (_: DecimalType, _: DecimalType) =>
-      // https://github.com/apache/datafusion-comet/issues/375
-      Incompatible()
+    case (from: DecimalType, to: DecimalType) =>
+      if (to.precision < from.precision) {
+        // https://github.com/apache/datafusion/issues/13492
+        Incompatible(Some("Casting to smaller precision is not supported"))
+      } else {
+        Compatible()
+      }
     case (DataTypes.StringType, _) =>
       canCastFromString(toType, timeZoneId, evalMode)
     case (_, DataTypes.StringType) =>

Contributor Author:
Higher-to-lower precision conversion in DataFusion changes the integer part, hence this case is marked Incompatible().

Contributor:
I will change the PR description to "part of #375" instead of closing it...
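For context, a minimal, self-contained Spark sketch (not part of the PR; the object name and sample values are invented) of the hazard discussed above: Decimal(6, 2) has too few integer digits for the value, and Spark with ANSI mode off returns null on overflow rather than altering the integer part, which is the divergence apache/datafusion#13492 tracks on the DataFusion side.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DecimalType

// Hypothetical demo, not from the PR.
object DecimalPrecisionDemo extends App {
  val spark = SparkSession.builder().master("local[1]").appName("demo").getOrCreate()
  import spark.implicits._

  // Decimal(6, 2) holds at most 9999.99: 4 integer digits, 2 fraction digits.
  val df = Seq(BigDecimal("12345.6789"), BigDecimal("123.4567")).toDF("a")
  df.withColumn("b", col("a").cast(DecimalType(6, 2))).show()
  // With spark.sql.ansi.enabled=false (the default): 12345.6789 -> null
  // (overflow), 123.4567 -> 123.46 (rounded half-up to scale 2).
  spark.stop()
}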
28 changes: 28 additions & 0 deletions spark/src/test/scala/org/apache/comet/CometCastSuite.scala
@@ -895,6 +895,34 @@ class CometCastSuite extends CometTestBase with AdaptiveSparkPlanHelper {
}
}

test("cast between decimals with different precision and scale") {

Member:
It's great to see that some basic tests now pass. I assume there must have been improvements in DataFusion since we started this project.

I'd like to see the tests cover more scenarios, such as:

  • Casting from a smaller scale to a larger scale, e.g. Decimal(10, 1) to Decimal(10, 4)
  • Edge cases such as negative scale and zero scale

Contributor:

+1 for more scenarios:

  • edge-case inputs like null
  • negative scale may not be allowed in Spark, IIRC

Contributor Author:

I'm adding more cases; negative scale throws a scale exception in Spark as well.

// cast from the default Decimal(38, 18) to Decimal(6, 2)
val values = Seq(BigDecimal("12345.6789"), BigDecimal("9876.5432"), BigDecimal("123.4567"))
val df = withNulls(values)
.toDF("b")
.withColumn("a", col("b").cast(DecimalType(6, 2)))
checkSparkAnswer(df)

Contributor:
FYI, ideally this should be checkSparkAnswerAndOperator once casting to smaller precision is supported.

}

test("cast between decimals with higher precision than source") {
// cast from Decimal(10, 2) to Decimal(10, 4)
castTest(generateDecimalsPrecision10Scale2(), DataTypes.createDecimalType(10, 4))
}

test("cast between decimals with negative precision") {
// cast to negative scale
checkSparkMaybeThrows(
spark.sql("select a, cast(a as DECIMAL(10,-4)) from t order by a")) match {
case (expected, actual) =>
assert(expected.contains("PARSE_SYNTAX_ERROR") === actual.contains("PARSE_SYNTAX_ERROR"))
}
}
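For reference, a standalone sketch (hypothetical snippet, not part of the PR) of the parse-time failure the negative-scale test above expects: Spark's SQL parser rejects a negative scale before any Comet code runs, and on recent Spark versions the error carries the PARSE_SYNTAX_ERROR class.

import scala.util.Try

import org.apache.spark.sql.SparkSession

// Hypothetical demo, not from the PR.
object NegativeScaleDemo extends App {
  val spark = SparkSession.builder().master("local[1]").appName("demo").getOrCreate()
  // spark.sql parses eagerly, so the failure surfaces here, not at collect().
  val err = Try(spark.sql("select cast(1 as DECIMAL(10,-4))")).failed.toOption
  err.foreach(e => println(e.getMessage)) // typically mentions PARSE_SYNTAX_ERROR
  spark.stop()
}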

test("cast between decimals with zero precision") {
// cast between Decimal(10, 2) to Decimal(10,0)
castTest(generateDecimalsPrecision10Scale2(), DataTypes.createDecimalType(10, 0))
}

private def generateFloats(): DataFrame = {
withNulls(gen.generateFloats(dataSize)).toDF("a")
}
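The tests above rely on the suite's withNulls helper to cover null inputs, as requested in review. A hedged sketch of what such a helper plausibly does; the actual implementation lives in CometCastSuite and may differ:

// Hypothetical reconstruction, not the suite's code: wrap each value in Some
// and append a None so the resulting DataFrame column includes a null row.
def withNullsSketch[T](values: Seq[T]): Seq[Option[T]] =
  values.map(Some(_)) :+ None

With spark.implicits._ in scope, withNullsSketch(values).toDF("b") then produces the same shape of input as the withNulls(values).toDF("b") call in the first test.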
6 changes: 2 additions & 4 deletions spark/src/test/scala/org/apache/spark/sql/CometTestBase.scala
@@ -231,11 +231,9 @@ abstract class CometTestBase
df: => DataFrame): (Option[Throwable], Option[Throwable]) = {
var expected: Option[Throwable] = None
withSQLConf(CometConf.COMET_ENABLED.key -> "false") {
-    val dfSpark = Dataset.ofRows(spark, df.logicalPlan)
-    expected = Try(dfSpark.collect()).failed.toOption
+    expected = Try(Dataset.ofRows(spark, df.logicalPlan).collect()).failed.toOption

Contributor Author:

If plan parsing encounters a problem, such as a negative scale, this will catch it.

Contributor:

So the change here is just formatting, changing from 2 lines to 1?

Contributor Author:

Earlier, only the df.collect() part was inside the Try; I included df.logicalPlan as well, so any exception thrown during parsing is also captured. In this example, it was throwing a parse error for:

select a, cast(a as DECIMAL(10,-4)) from t order by a

}
-    val dfComet = Dataset.ofRows(spark, df.logicalPlan)
-    val actual = Try(dfComet.collect()).failed.toOption
+    val actual = Try(Dataset.ofRows(spark, df.logicalPlan).collect()).failed.toOption
(expected, actual)
}
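A self-contained sketch (hypothetical names, plain Scala, no Spark required) of the evaluation-order point discussed above: df is a by-name parameter, so the expression that builds it only runs when referenced, and that reference must sit inside the Try for construction-time failures to be captured.

import scala.util.Try

// Hypothetical helper mirroring the pattern in checkSparkMaybeThrows.
def failureOf[T](body: => T): Option[Throwable] =
  Try(body).failed.toOption

// A by-name value that fails during "plan construction":
def badPlan: Seq[Int] = throw new IllegalArgumentException("PARSE_SYNTAX_ERROR")

// Captured, because badPlan is first evaluated inside the Try:
val captured = failureOf(badPlan.map(_ * 2)) // Some(IllegalArgumentException)

// The old two-line shape evaluated the plan outside the Try, so the exception
// would propagate instead of being recorded:
// val plan = badPlan // throws here, before Try is reached
// val escaped = Try(plan.map(_ * 2)).failed.toOption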
