-
Notifications
You must be signed in to change notification settings - Fork 28.6k
[SPARK-49966][SQL] Codegen Support for JsonToStructs(from_json
)
#48466
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
code""" | ||
|${eval.code} | ||
|$resultType $resultTerm = ($resultType) $refEvaluator.evaluate( | ||
| ${eval.isNull} ? null : ${eval.value}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need this check? seems like evaluate()
does this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's redundant. I have already removed it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code generated by an example is roughly like this:
- Before
/* 031 */ boolean localtablescan_isNull_0 = localtablescan_row_0.isNullAt(0);
/* 032 */ UTF8String localtablescan_value_0 = localtablescan_isNull_0 ?
/* 033 */ null : (localtablescan_row_0.getUTF8String(0));
/* 034 */ InternalRow project_result_0 = (InternalRow) ((org.apache.spark.sql.catalyst.expressions.json.JsonToStructsEvaluator) references[1] /* evaluator */).evaluate(
/* 035 */ localtablescan_isNull_0 ? null : localtablescan_value_0);
/* 036 */ boolean project_isNull_0 = project_result_0 == null;
/* 037 */ InternalRow project_value_0 = null;
/* 038 */ if (!project_isNull_0) {
/* 039 */ project_value_0 = project_result_0;
/* 040 */ }
/* 041 */ project_mutableStateArray_0[0].reset();
/* 042 */
/* 043 */ project_mutableStateArray_0[0].zeroOutNullBytes();
- After
/* 031 */ boolean localtablescan_isNull_0 = localtablescan_row_0.isNullAt(0);
/* 032 */ UTF8String localtablescan_value_0 = localtablescan_isNull_0 ?
/* 033 */ null : (localtablescan_row_0.getUTF8String(0));
/* 034 */ InternalRow project_result_0 = (InternalRow) ((org.apache.spark.sql.catalyst.expressions.json.JsonToStructsEvaluator) references[1] /* evaluator */).evaluate(localtablescan_value_0);
/* 035 */ boolean project_isNull_0 = project_result_0 == null;
/* 036 */ InternalRow project_value_0 = null;
/* 037 */ if (!project_isNull_0) {
/* 038 */ project_value_0 = project_result_0;
/* 039 */ }
/* 040 */ project_mutableStateArray_0[0].reset();
/* 041 */
/* 042 */ project_mutableStateArray_0[0].zeroOutNullBytes();
- Obviously unnecessary

- So it has been removed.
+1, LGTM. Merging to master. |
|
||
override def nullSafeEval(json: Any): Any = evaluator.evaluate(json.asInstanceOf[UTF8String]) | ||
|
||
override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it possible to use Invoke
with Literal(new JsonToStructsEvaluator(...), ObjectType(...))
to rewrite the expression?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me investigate it.
…on`) ### What changes were proposed in this pull request? The pr aims to use `Invoke` to implement `JsonToStructs`(`from_json`). ### Why are the changes needed? Based on cloud-fan's suggestion, I believe that implementing `JsonToStructs`(`from_json`) with `Invoke` can greatly simplify the code. #48466 (comment) ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Update existed UT. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #48509 from panbingkun/SPARK-49966_FOLLOWUP. Authored-by: panbingkun <[email protected]> Signed-off-by: Max Gekk <[email protected]>
### What changes were proposed in this pull request? The pr aims to add `Codegen` Support for `JsonToStructs`(`from_json`). ### Why are the changes needed? - improve codegen coverage. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Pass GA & Existed UT (eg: JsonFunctionsSuite#`*from_json*`) ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#48466 from panbingkun/SPARK-49966. Authored-by: panbingkun <[email protected]> Signed-off-by: Max Gekk <[email protected]>
…on`) ### What changes were proposed in this pull request? The pr aims to use `Invoke` to implement `JsonToStructs`(`from_json`). ### Why are the changes needed? Based on cloud-fan's suggestion, I believe that implementing `JsonToStructs`(`from_json`) with `Invoke` can greatly simplify the code. apache#48466 (comment) ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Update existed UT. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#48509 from panbingkun/SPARK-49966_FOLLOWUP. Authored-by: panbingkun <[email protected]> Signed-off-by: Max Gekk <[email protected]>
I found that this one has made the SubExprEliminationBenchmark unexecutable.
then
@panbingkun Do you have time to take a look at this issue? Thanks ~ also cc @MaxGekk and @cloud-fan |
spark/sql/core/src/test/scala/org/apache/spark/sql/execution/SubExprEliminationBenchmark.scala Lines 80 to 86 in be0ae13
If |
Thank you for noticing this issue, let me investigate it. |
It seems that there is a performance regression in the |
Thank you for reporting, @LuciferYang . |
The number of data rows in this benchmark is a bit small (100 rows). spark/sql/core/src/test/scala/org/apache/spark/sql/execution/SubExprEliminationBenchmark.scala Line 125 in 8bd7789
Let me first verify the performance data locally with more data rows. |
I think it's because there seem to be some issues with the |
### What changes were proposed in this pull request? This PR aims to regenerate benchmark results as a new baseline before cutting branch-4.0. ### Why are the changes needed? To make the result up-to-date and make it easy to validate the regression. - `SubExprEliminationBenchmark` benchmark result is not included because it's broken currently. - #48466 (comment) ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Manual review. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #49409 from dongjoon-hyun/bm_20250107. Authored-by: Dongjoon Hyun <[email protected]> Signed-off-by: yangjie01 <[email protected]>
What changes were proposed in this pull request?
The pr aims to add
Codegen
Support forJsonToStructs
(from_json
).Why are the changes needed?
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Pass GA & Existed UT (eg: JsonFunctionsSuite#
*from_json*
)Was this patch authored or co-authored using generative AI tooling?
No.